Theory at the Institute and Beyond, July 2023

by Venkatesan Guruswami (Simons Institute)

Summer is typically a quiet time on university campuses, but not at the Simons Institute, where two programs — one on Analysis and TCS, and another on Quantum Computing — are buzzing along. One might recall that one of the inaugural programs hosted by the Simons Institute, back in Fall 2013, was Real Analysis in Computer Science. In the decade since, the field has cultivated influential new themes such as global hypercontractivity and spectral independence, incorporated methods based on high-dimensional expanders and stochastic calculus, and also enabled striking applications in hardness of approximation, Markov chain analysis, and coding theory. All this progress makes this an excellent time to reconvene a program on this topic. The Quantum Computing program has a special focus on the power of noisy intermediate-scale quantum (NISQ) devices, a subject of great current practical interest aimed at demonstrating quantum advantage with noisy devices (pre-quantum error correction), with unique challenges for and great synergy with theory.

These programs come on the heels of a very busy spring semester that hosted a program on Meta-Complexity, which I wrote about earlier, and an extended reunion on the theory and practice of Satisfiability. The participants in the latter program were exposed to both the theoretical and the practical aspects of SAT solving in parallel, and especially for junior researchers, getting such a perspective early in their careers provides an unparalleled platform from which to embark on interdisciplinary and high-impact research.

Generating primes, almost deterministically
One of the papers to come out of the Meta-Complexity program, co-authored by a team of five full-time participants in the program (Chen, Lu, Oliveira, Ren, and Santhanam), gives a randomized algorithm that on infinitely many inputs n, runs in poly(n) time and with high probability generates a canonical n-bit prime. A randomized polynomial-time algorithm to generate some n-bit prime is easy, as one can just sample O(n) random n-bit numbers, test them for primality, and output a prime among them. But such an algorithm can output different primes on different runs. A deterministic algorithm outputting a single prime always would be desirable, but such an algorithm remains elusive. A pseudodeterministic algorithm is an intriguing middle ground and is a randomized algorithm that on any input outputs a unique answer with high probability. Motivated by the question of generating canonical primes, the concept of pseudodeterministic algorithms was introduced by Gat and Goldwasser in 2011 and has since received much attention. But a pseudodeterministic polynomial-time algorithm for prime generation remained open, and the present work solves it modulo the caveat of only working infinitely often. Actually, the result has nothing to do with generating primes per se, and works for any property of numbers that occurs sufficiently often and that can be checked in polynomial time (both of which hold for primality).

A few years ago, a subset of the present authors gave a subexponential (i.e., exp(n^0.01)) pseudodeterministic algorithm for generating primes (again only for infinitely many lengths). This was based on a win-win argument that converts a conditional hardness-randomness trade-off (a specific uniform version due to Trevisan and Vadhan) into an unconditional pseudodeterministic algorithm. Namely, if a certain PSPACE-complete language L that they construct is not in BPP, then one can build a pseudorandom set of subexponential size that fools polynomial-time randomized algorithms (infinitely often). So in this case, one can derandomize the trivial randomized algorithm to generate primes and get a deterministic subexponential-time algorithm. On the other hand, if L is in BPP, then the polynomial-space algorithm that searches over all n-bit numbers to find the lexicographically smallest prime yields a polynomial-time pseudodeterministic algorithm.
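The gap between "some prime" and "a canonical prime" can be made concrete with a small sketch. The Python below is illustrative only and is not the algorithm of the paper: the function names, the Miller-Rabin round count, and the sampling budget are assumptions of this example, and the exhaustive search for the smallest n-bit prime takes time exponential in n (it stands in for the polynomial-space search described above).

```python
import random

def is_probable_prime(m, rounds=20, rng=random):
    """Miller-Rabin test: always accepts primes, rejects composites w.h.p."""
    if m < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if m % p == 0:
            return m == p
    # Write m - 1 = d * 2**s with d odd.
    d, s = m - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False  # a is a witness that m is composite
    return True

def random_prime(n, rng=random):
    """Trivial randomized generator (n >= 2): the output varies from run to run."""
    for _ in range(100 * n):  # O(n) samples suffice with high probability
        candidate = rng.randrange(2 ** (n - 1), 2 ** n)
        if is_probable_prime(candidate, rng=rng):
            return candidate
    return None

def smallest_prime(n):
    """Canonical answer (n >= 2): the smallest n-bit prime, by exhaustive search."""
    for candidate in range(2 ** (n - 1), 2 ** n):
        if is_probable_prime(candidate):
            return candidate
    return None
```

Calling `random_prime(16)` repeatedly will typically return different primes, whereas `smallest_prime(16)` returns the same value on every run; a pseudodeterministic algorithm aims for the second behavior at roughly the cost of the first.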

Mechanisms: Inside or In-Between?

by Issa Kohler-Hausmann (Senior Law and Society Fellow, Spring 2022, Simons Institute)1

This work was made possible by the Simons Institute’s Causality program in the spring of 2022, where I was the Law and Society fellow and had the opportunity to learn and discuss with a collection of brilliant scholars thinking about and working on causality and causal modeling. Special gratitude goes to Robin Dembroff, Maegan Fairchild, and Shamik Dasgupta, who participated in the April 2022 Theoretically Speaking event “Noncausal Dependence and Why It Matters for Causal Reasoning.”

The term “mechanism” or “causal mechanism” is used in two possibly conflicting ways in causal inference literature. Sometimes “causal mechanism” is used to refer to the chain of causal relations that is unleashed between some stipulated triggering event (let’s call it X) and some outcome of interest (let’s call it Y). When people use the term in this sense, they mean “a causal process through which the effect of a treatment on an outcome comes about.”2 One could think of this use of the term as slowing down a movie about the causal process between the moment when X is unleashed and when Y obtains so that we can see more distinct frames capturing ever-finer-grained descriptions of prior events triggering subsequent events as they unfold over time. This is the in-between sense of “mechanism,” or, as Craver says, “causal betweenness.”3 An expansive methodological literature engages causal mechanisms in the in-between sense under the banner of mediation or indirect effects.4 When used in this way, a causal mechanism M lies in the middle of a causal pathway between X and Y: X → M → Y.

But there is a different sense of “mechanism” that refers to whatever it is about the triggering variable (let’s call it X again) that endows it with the causal powers it has. When people use the term in this inside sense, they mean to pick out the constituents of X, the parts and relations that compose it, or the grounds by virtue of which it obtains. Instead of slowing down the movie of a causal process unfolding over time, this use of “mechanism” calls for zooming into X at a particular slice in time.5

Causal models encode mechanisms in the inside sense insofar as denoting a variable (e.g., X) in the model entails denoting the stuff that builds the innards of X in the model.6 Designating variables expresses how the modeler has chosen to carve up states or events in the world. It entails expressing the boundaries of the relata (represented by variables) in the model, as variables marked out as, for example, X and M are taken to be distinct.7 But variable definition often leaves the innards of each relata designated by a variable name — what’s inside of X and M — opaque. And because most causal models we work with are not expressed in terms of fundamental entities (whatever those are — quarks and leptons, or something), variables are built out of or constituted by other things and connections between those things. The variables take the various states designated in the model because certain facts obtain. Inside causal mechanisms are the intravariable relata and relations that compose the variables and give them their distinctive causal powers.

Questions about mechanisms could be posed in one or the other sense of the term. For example, imagine you have a pile of pills and know with absolute certainty that each pill contains the identical chemical substance and dosage. Now imagine you conduct a randomized controlled trial with these pills to see whether ingesting these pills reduces reported headaches, and you document some average causal effect. Upon completion of the study, you might say: “We still do not know the causal mechanisms involved here.” There are simply two meanings to that query.

One meaning is that you do not know what physical processes in the body ingestion of the pill triggered — what physiological pathways ingestion of the substance brought about and unfolded over time such that headache pain was reduced. This version of the query asks about causal mechanisms in the in-between sense. Alternatively, you could mean that you do not know what was in the pill! That is, you have no idea what stuff did the triggering — you do not know the chemical compound that constituted the little pills you gave to your treated subjects.8 This version of the query asks about causal mechanisms in the inside sense, asking what facts obtained such that the thing designated as the cause occurred.

Sometimes people blur these two uses together.9 However, it is important to maintain this conceptual distinction because the relationship between mechanisms in the inside and in-between senses sets some limits on variable definition within a causal model. Specifically, if you posit some mechanism in the in-between sense in a causal model, then the state or event picked out by that mediator cannot be inside another variable.

Using Theory to Design Better Interactions with Visualized Data

by Jessica Hullman (Senior Law and Society Fellow (Fall 2022), Simons Institute)

Theories of sequential decision-making have been around for decades but continue to flourish. The Simons Institute’s Fall 2022 program on Data-Driven Decision Processes provided an excellent overview of recent results in online learning and sequential decision-making. Visiting the Simons Institute as a researcher whose background is in interactive data visualization, I spent some time thinking about how learning theory might advance more applied research related to human-data interaction.

First, it’s worth noting that theories of inference and decision-making remain relatively unintegrated in fields that research data interfaces, including human-computer interaction and visualization. While we might sometimes visualize data simply to generate a historical record, such as to compare points scored across NBA players, most of the time visualization is used to support inference tasks like extrapolating beyond the particular sample to the larger population. Yet beyond a smattering of experimental papers that make use of decision theory, only a handful of works have advocated for theorizing the function of visualizations in the context of frameworks that could provide prescriptive guidance (for example, within the traditions of model checking,1 Bayesian cognition,2 and hypothesis testing3).

A natural question is why. I suspect it may have something to do with the status of visualization as a general-purpose tool that can be put to work to summarize data, persuade viewers to draw certain conclusions from data, or support inferential and decision tasks, sometimes simultaneously. More formal theory requires pinning down the problem more precisely, which might seem reductive. Visualization has also long been associated with the tradition of exploratory data analysis in statistics, in which John Tukey pioneered exposure of the unexpected through graphical displays as an overlooked part of statistical practice.4 Maybe the understanding that visualization is valuable for producing unpredictable insights keeps researchers away from attempting to theorize.

The power of visualization is often touted as providing an external representation that enables the amplification of cognition by allowing natural perceptual processes to aid in identifying patterns in data and freeing up valuable working memory. A key part of this is that a good visualization enables its human user to bring their prior domain knowledge to bear. Similar to how some statistical modelers shy away from the idea of formalizing prior knowledge in applied Bayesian statistics, the role of prior knowledge in visualization-aided analysis may contribute to a seeming bias in the literature toward leaving the human part of the equation untheorized. Instead, we’re left to trust that in supporting exploratory analysis in its various forms, visualization interactions don’t need modeling because the analyst will “know when they see it” and also know what to do about it, whether that means transforming data, collecting more data, making a decision, etc.

All this means that there are many opportunities where statistical theory, data economics, and online learning theory could be helpful for providing a more rigorous theoretical framework in which to answer questions that get at the heart of what visualization is about.

Theory at the Institute and Beyond, February 2023

by Venkatesan Guruswami (Simons Institute)

This semester at the Simons Institute, the Meta-Complexity program is buzzing along with intense activity in the form of multiple reading groups and a weekly seminar, on top of the usual three workshops and boot camp. A large number of complexity theorists have converged at the Institute, including several students, postdocs, and junior researchers. And all this complexity action is intermixed with some good and varied fun in the form of many self-organized social activities.

Theory readers know what complexity is, but what is the “meta” for? An online dictionary definition of “meta” is a term describing something that “consciously references or comments upon its own subject or features” — e.g., “A movie about making a movie is just so meta—especially when the actors criticize the acting.”

So meta-complexity is the study of complexity of problems that themselves pertain to complexity. A prototypical example is circuit minimization, where the goal is to find the smallest circuit for a given specification. This is modeled as the minimum circuit size problem (MCSP), which is one of the lead actors in meta-complexity: Given the truth table of an n-bit Boolean function f, and a size parameter s, does f have a Boolean circuit of size at most s? (Note that the input size is 2^n.)

The MCSP problem was explicitly defined in 2000 by Kabanets and Cai, inspired by Razborov and Rudich’s seminal work on natural proofs from 1994. (Upon some reflection, one can realize that a natural property against some class of circuits is really an average-case, one-sided error algorithm for MCSP on circuits of that class.) Being a “natural” problem, MCSP was in fact considered implicitly even earlier, including in early works on complexity in the former Soviet Union. Yablonski claimed in 1959 that MCSP is not in P (to use modern terminology; the class P wasn’t even defined then). MCSP is easily seen to be in NP, since one can guess a circuit and then check if it computes the function correctly on all inputs, so we of course can’t yet prove that it lies outside P.
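The "guess a circuit and check it" argument can be sketched as a verifier. The Python below is a minimal illustration under assumptions of this example: the gate encoding (tuples over AND, OR, NOT), the convention that the last wire is the output, and the function names are all hypothetical choices, not a standard format.

```python
from itertools import product

def eval_circuit(gates, x):
    """Evaluate a circuit on input bits x.

    Wires 0..n-1 are the inputs; gate k writes wire n+k. Each gate is
    ("NOT", i), ("AND", i, j) or ("OR", i, j), referring to earlier wires.
    The last wire is the output.
    """
    wires = list(x)
    for gate in gates:
        op = gate[0]
        if op == "NOT":
            wires.append(1 - wires[gate[1]])
        elif op == "AND":
            wires.append(wires[gate[1]] & wires[gate[2]])
        elif op == "OR":
            wires.append(wires[gate[1]] | wires[gate[2]])
        else:
            raise ValueError(f"unknown gate {op!r}")
    return wires[-1]

def verify_mcsp_witness(truth_table, n, s, gates):
    """Accept iff the circuit has at most s gates and matches the truth table.

    The truth table lists f on inputs in lexicographic order, so the check
    takes time polynomial in the input length 2**n.
    """
    if len(truth_table) != 2 ** n or len(gates) > s:
        return False
    inputs = product((0, 1), repeat=n)
    return all(eval_circuit(gates, x) == bit for x, bit in zip(inputs, truth_table))

# XOR on two bits as (x OR y) AND NOT (x AND y): four gates.
XOR_TABLE = [0, 1, 1, 0]
XOR_CIRCUIT = [("OR", 0, 1), ("AND", 0, 1), ("NOT", 3), ("AND", 2, 4)]
```

Here `verify_mcsp_witness(XOR_TABLE, 2, 4, XOR_CIRCUIT)` accepts, while the same circuit is rejected for `s = 3`. Verifying a witness is easy; the open question is how hard it is to find one, or to certify that none exists.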

Leading up to his seminal work on NP-completeness, Levin was in fact trying to show NP-hardness of MCSP, but didn’t succeed. One of the six problems that Levin showed NP-hardness of was DNF-MCSP, where one tries to find a DNF formula of minimum size to compute a given truth table. (Technically, Levin proved only NP-hardness of the partial-function version of the question, where we don’t care about the function value on some inputs.) In fact, this was Problem 2 on Levin’s list, between Problem 1, which was set cover, and Problem 3, which was Boolean formula satisfiability.

The complexity of MCSP remains open — we do not know if it is in P or NP-complete. Resolving this is a natural challenge, but on the surface it might seem like a rather specific curiosity. One reason to care about MCSP is that any NP-hardness proof faces some fundamental challenges — in particular, the function produced on No instances must have large circuits, giving an explicit function in EXP that has large circuits, which seems beyond the reach of current techniques. One might try to circumvent this via randomized or exponential-time reductions, and this has indeed fueled some very interesting recent results, as well as intriguing connections to learning theory, average-case complexity, cryptography, and pseudorandomness.

An “extraneous” reason to care about MCSP and meta-complexity is that the underlying techniques have led to some of the best insights on certain foundational questions that have nothing to do with meta-complexity per se. Meta-complexity has been energized by some significant recent progress on both of these (internal and extraneous) fronts. These advances and momentum are fueling a lot of activity and optimism in the Simons Institute program this semester.


The Blooming of the \(c^3\) LTC Flowers

by Prahladh Harsha

The last year (2021–22) has seen some amazing new constructions of locally testable codes with constant rate and constant fractional distance and testable with a constant number of queries, sometimes referred to as \(c^3\) LTCs [DELLM22, PK22]. The construction of Panteleev and Kalachev’s LTCs [PK22] is almost identical to that of Dinur, Evra, Livne, Lubotzky, and Mozes [DELLM22]; while Dinur et al.’s main goal was to construct \(c^3\) LTCs, Panteleev and Kalachev were motivated by considerations of constructing quantum LDPC codes. In this short note, I give an informal description of some of the ideas that led to the \(c^3\) LTC construction of Dinur et al., focusing more on the interplay between high-dimensional expanders and codes that led to this construction and less on the technical proofs. The original paper of [DELLM22] is extremely well written, and the reader is encouraged to read either the paper for the technical details or Goldreich’s exposition [Gol21] for a less group-theoretic presentation.

What are locally testable codes (LTCs)?

Let us begin by recalling what a code is. A code \(\mathcal{C}\) with blocklength \(n\) over the Boolean alphabet refers to a subset of \(\{0,1\}^n\), and the elements of \(\mathcal{C}\) are usually referred to as codewords. The rate of the code \(\mathcal{C}\), denoted by \(R(\mathcal{C})\), is \(\frac{\log_2|\mathcal{C}|}{n}\) while the (fractional) distance, denoted by \(\delta(\mathcal{C})\), is the minimum fractional Hamming distance between any two distinct codewords in \(\mathcal{C}\). In this note, we will restrict our attention to linear codes, where the underlying alphabet \(\{0,1\}\) is identified with the field \(GF(2)\) and the code \(\mathcal{C}\) is a \(GF(2)\)-vector-space. In this case, the distance \(\delta\) is the minimum fractional weight of a nonzero codeword in \(\mathcal{C}\). If a code \(\mathcal{C}\) is linear, it can easily be described by the set of (dual) constraints that describe the vector space \(\mathcal{C}\). These constraints are usually referred to as parity checks. If, furthermore, each of these parity checks has constant arity (i.e., each constraint involves only a constant number of codeword locations), then the code is said to be a low-density parity-check code (LDPC). Given a string \(x \in \{0,1\}^n\) and a subset \(Q \subset [n]\) of the coordinate locations, \(x|_Q\) refers to the projection of the string \(x\) to the coordinates in \(Q\). The projected code \(\mathcal{C}|_Q\) is defined as the set of projected codewords, formally, \[\mathcal{C}|_Q := \{ x \in \{0,1\}^Q \colon \exists y \in \{0,1\}^{[n]\setminus Q} \text{ such that } (x,y) \in \mathcal{C}\}.\]
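These definitions can be checked by brute force on a tiny example. The sketch below uses the [7,4] Hamming code, which is a choice of this illustration and not a code discussed in the note; the helper names are likewise hypothetical, and the enumeration is only feasible for very small blocklengths.

```python
from itertools import product
from math import log2

def span(generators):
    """All GF(2)-linear combinations of the generator rows."""
    n = len(generators[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % 2 for i in range(n)
        )
        words.add(word)
    return words

def rate(code):
    """R(C) = log2|C| / n."""
    n = len(next(iter(code)))
    return log2(len(code)) / n

def distance(code):
    """For a linear code: minimum fractional weight of a nonzero codeword."""
    n = len(next(iter(code)))
    return min(sum(w) for w in code if any(w)) / n

def project(code, Q):
    """The projected code C|_Q: restrict every codeword to the coordinates in Q."""
    return {tuple(w[i] for i in Q) for w in code}

# Generator matrix of the [7,4] Hamming code in systematic form.
G = [
    (1, 0, 0, 0, 0, 1, 1),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 1, 1, 0),
    (0, 0, 0, 1, 1, 1, 1),
]
C = span(G)
```

For this code, `rate(C)` is 4/7 and `distance(C)` is 3/7, and projecting onto the first four coordinates gives all of \(\{0,1\}^4\). A single small code says nothing about asymptotics; the interest in the note is in families where both quantities stay bounded away from zero as \(n\) grows.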

A linear code \(\mathcal{C}\) is said to be \(q\)-locally testable1 if there exists a distribution \(\mathcal{Q}\) over subsets \(Q \subset [n]\) of size at most \(q\) such that the following is satisfied. \[\text{For every } x \in \{0,1\}^n, \text{ we have } \Pr_{Q \sim \mathcal{Q}}[ x|_Q \not\in \mathcal{C}|_Q] = \Omega(\delta(x,\mathcal{C})).\] Here, \(\delta(x,\mathcal{C})\) refers to \(\min_{c \in \mathcal{C}}\delta(x,c)\), the fractional Hamming distance between \(x\) and the code \(\mathcal{C}\).

In general, we will be interested in constructing a family \(\{\mathcal{C}_n\}_{n=1}^{\infty}\) of codes with increasing blocklengths. The holy grail in this area, which was attained by the recent constructions, was to obtain a family of codes that had constant rate and constant fractional distance and were testable with a constant number of queries. Codes that satisfy the first two properties (constant rate and constant fractional distance) are usually referred to as good codes. We will first take a detour and see how good codes are constructed and then return to the question of \(c^3\) LTCs.


The Blind Men and the Quantum Elephants

by Chinmay Nirkhe (IBM Research and UC Berkeley)

In the parable of the blind men and the elephant, a group of blind men encounters a very large object, not knowing that it is in fact an elephant. They start exploring the object by touch, with each blind man trying to understand the small fraction of the object in front of him. But each blind man feels a different part of the elephant, such as its trunk, its tusks, its feet, etc. For instance, one feels the elephant’s trunk and remarks, “This feels like a thick snake.” Each develops a different view as to what the object he encounters must be, as each of the blind men is offered only a small perspective on the global object.

Only once the blind men bicker and argue over their different views do they finally agree that they are observing something greater and finally recognize that the object they are observing is an elephant. In other words, only from the union of the small local views — each local view being indistinguishable from other objects (such as the elephant’s trunk and a snake) — are the blind men able to surmise that the global object in front of them is an elephant.

Today, we consider a variation on the parable more suited for our modern understanding of physics. In this parable, instead of one object that the blind men observe, there are two. The blind men approach the first object, and each man goes to his location around the object. The first man feels the front of the object and again remarks, “This feels like a thick snake,” while the other men then proceed to feel the object in front of them and make observations as to what they feel. The men then proceed to the second object and again proceed to make observations as to what they feel. The first man again remarks, “This feels like a thick snake,” and, curiously, each blind man makes the exact same observation for the second object as he did for the first. The blind men confer and note that since each man’s observations were the same for both objects and together they had observed the entirety of both objects, then the two objects must be the same and therefore the same elephant. They must have been presented with the same elephant twice.

But what if the blind men were wrong and they were truly observing two very different objects? Is it possible that the two objects were very different but were the same under every local observation? Well, it is possible if they were quantum elephants! An amazing property of two pure quantum objects is that their local views can be exactly the same and yet globally the states are very different (even orthogonal). We call such quantum objects locally indistinguishable, and they are fundamentally interesting objects in quantum information theory. For one, due to being locally indistinguishable, these quantum states (objects) have the property that their entanglement is nontrivial — in this case, this means that for the state to be expressed as the output of a quantum circuit, the depth of the circuit required is large. If a high-depth quantum circuit is required, then this means that the state is far from all classical states and is fundamentally quantum.
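A standard textbook instance of this phenomenon, chosen here for illustration and not taken from the article, is the pair of three-qubit states (|000> + |111>)/sqrt(2) and (|000> - |111>)/sqrt(2). The pure-Python sketch below (all names are hypothetical) computes the reduced density matrix seen by an observer who can access only a subset of the qubits, and checks that the two states agree on every proper subset while being orthogonal globally.

```python
from itertools import combinations, product
from math import sqrt

N = 3  # number of qubits

def ghz(sign):
    """Amplitudes of (|000> + sign * |111>) / sqrt(2), indexed by bit tuples."""
    psi = {bits: 0.0 for bits in product((0, 1), repeat=N)}
    psi[(0,) * N] = 1 / sqrt(2)
    psi[(1,) * N] = sign / sqrt(2)
    return psi

def reduced_density_matrix(psi, keep):
    """Trace out every qubit not in `keep`; returns {(row, col): entry}."""
    rest = [i for i in range(N) if i not in keep]
    rho = {}
    for bits_a, amp_a in psi.items():
        for bits_b, amp_b in psi.items():
            # Only pairs that agree on the traced-out qubits contribute.
            if all(bits_a[i] == bits_b[i] for i in rest):
                row = tuple(bits_a[i] for i in keep)
                col = tuple(bits_b[i] for i in keep)
                # Amplitudes are real here, so no complex conjugate is needed.
                rho[(row, col)] = rho.get((row, col), 0.0) + amp_a * amp_b
    return rho

def close(rho1, rho2, tol=1e-12):
    return rho1.keys() == rho2.keys() and all(
        abs(rho1[k] - rho2[k]) < tol for k in rho1
    )

def inner(psi, phi):
    return sum(psi[b] * phi[b] for b in psi)

plus, minus = ghz(+1), ghz(-1)
proper_subsets = [s for k in (1, 2) for s in combinations(range(N), k)]
locally_same = all(
    close(reduced_density_matrix(plus, s), reduced_density_matrix(minus, s))
    for s in proper_subsets
)
```

Here `locally_same` is true and `inner(plus, minus)` is zero: every observer restricted to one or two qubits sees the same thing, yet the global states are perfectly distinguishable. Only the full three-qubit density matrices differ.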

Illustration by Chinmay Nirkhe

Locally indistinguishable quantum states are easy to concoct, but it is far more interesting when locally indistinguishable quantum states are “naturally” occurring. A physicist might phrase this as a local Hamiltonian system (i.e., a quantum energy landscape) with the property that after cooling the system to its minimum energy (temperature), the state of the system is a locally indistinguishable quantum state — i.e., is there any hope that a group of blind physicists go on a trek through the quantum jungle and stumble upon a pair of quantum elephants?


Where Do Q-Functions Come From?

by Sean Meyn (University of Florida) and Gergely Neu (Pompeu Fabra University)

One theoretical foundation of reinforcement learning is optimal control, usually rooted in the Markovian variety known as Markov decision processes (MDPs). The MDP model consists of a state process, an action (or input) process, and a one-step reward that is a function of state and action. The goal is to obtain a policy (function from states to actions) that is optimal in some predefined sense. Chris Watkins introduced the Q-function in the 1980s as part of a methodology for reinforcement learning. Given its importance for over three decades, it is not surprising that the question of the true meaning of Q was a hot topic for discussion during the Simons Institute’s Fall 2020 program on Theory of Reinforcement Learning.

This short note focuses on interactions at the start of the program, and research directions inspired in part by these interactions. To start with, who is Q? Was this code for one of Watkins’ friends at Cambridge? The question was posed early on, which led to an online investigation. The mystery was shattered through a response from Chris: we now know that the letter Q stands for quality, not Quinlyn or Quincy. To discuss further questions and potential answers requires some technicalities.

The discounted-cost optimality criterion is a favorite metric for performance in computer science and operations research, and is the setting of the original Q-function formulation. The definition requires a state process \(\{X_k : k\ge 0\}\) and an action (or input) process \(\{A_k : k\ge 0\}\), evolving on respective spaces (which are assumed discrete in this note). There is a controlled transition matrix \(P\) that describes dynamics: \(X_{k+1}\) is distributed according to \(P(\cdot|x,a)\) when \(X_k=x\) and \(A_k=a\), for any action sequence that is adapted to the state sequence.

With \(\gamma\) denoting the discount factor, the Q-function is the solution to a nonlinear fixed-point equation \(T^*Q = Q\) in which \(T^*\) is the Bellman operator: \[\left(T^*Q\right)(x,a) = r(x,a) + \gamma \mathbb{E}_{X'\sim P(\cdot|x,a)}\left[\max_{a'} Q(X',a')\right]\] This must hold for each state-action pair \((x,a)\), with the maximum over all possible actions. This is a version of the dynamic programming (DP) equation that has been with us for about seven decades.
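When the model \(P\) and reward \(r\) are known and the spaces are small, the fixed point can be found by simply iterating \(T^*\), which is a contraction for \(\gamma < 1\). The sketch below does this on a two-state, two-action MDP; the MDP, the names, and the stopping rule are all hypothetical choices made for this example.

```python
STATES = (0, 1)
ACTIONS = ("stay", "move")
GAMMA = 0.9

# Toy model: P[(x, a)] maps next state -> probability; R[(x, a)] is the reward.
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "move"): {0: 0.8, 1: 0.2},
}
R = {(0, "stay"): 0.0, (0, "move"): -0.1, (1, "stay"): 1.0, (1, "move"): -0.1}

def bellman(Q):
    """Apply T* once: (T*Q)(x,a) = r(x,a) + gamma * E[max_a' Q(X',a')]."""
    return {
        (x, a): R[(x, a)]
        + GAMMA * sum(p * max(Q[(y, b)] for b in ACTIONS) for y, p in P[(x, a)].items())
        for x in STATES
        for a in ACTIONS
    }

def solve(tol=1e-10, max_iters=10_000):
    """Iterate T* from Q = 0 until successive iterates differ by less than tol."""
    Q = {(x, a): 0.0 for x in STATES for a in ACTIONS}
    for _ in range(max_iters):
        Q_next = bellman(Q)
        if max(abs(Q_next[k] - Q[k]) for k in Q) < tol:
            return Q_next
        Q = Q_next
    return Q

def greedy_policy(Q):
    """The policy that picks an action maximizing Q in each state."""
    return {x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in STATES}
```

In this toy model the optimal behavior is to move from state 0 to state 1 and then stay there, which gives \(Q(1,\text{stay}) = 1/(1-\gamma) = 10\); `greedy_policy(solve())` recovers exactly that.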

The magic of Q-learning, which is based on this DP equation, is that the maximum appears within an expectation. This makes possible the application of Monte Carlo methods to obtain an approximate solution based solely on observations of the actual system to be controlled, or through simulations.
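Because the maximum sits inside the expectation, a single sampled transition gives an unbiased estimate of the right-hand side of the DP equation, and this is what tabular Q-learning exploits. The sketch below is a minimal illustration on a hypothetical two-state MDP: the uniform sampling of state-action pairs from a simulator, the constant step size, and all names are assumptions of this example, not the specific scheme analyzed in the literature.

```python
import random

STATES = (0, 1)
ACTIONS = ("stay", "move")
GAMMA = 0.9

# Toy simulator: P[(x, a)] maps next state -> probability; R[(x, a)] is the reward.
P = {
    (0, "stay"): {0: 1.0},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 1.0},
    (1, "move"): {0: 0.8, 1: 0.2},
}
R = {(0, "stay"): 0.0, (0, "move"): -0.1, (1, "stay"): 1.0, (1, "move"): -0.1}

def sample_next_state(x, a, rng):
    """Draw X' ~ P(.|x, a); the learner only ever sees such samples."""
    dist = P[(x, a)]
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def q_learning(steps=200_000, alpha=0.02, seed=0):
    """Tabular Q-learning with a constant step size alpha."""
    rng = random.Random(seed)
    Q = {(x, a): 0.0 for x in STATES for a in ACTIONS}
    for _ in range(steps):
        x, a = rng.choice(STATES), rng.choice(ACTIONS)
        x_next = sample_next_state(x, a, rng)
        # One-sample estimate of (T*Q)(x, a).
        target = R[(x, a)] + GAMMA * max(Q[(x_next, b)] for b in ACTIONS)
        Q[(x, a)] += alpha * (target - Q[(x, a)])
    return Q
```

With a constant step size the estimates hover near the fixed point instead of converging exactly, so `q_learning()` should return values close to, but not equal to, the exact solution (for instance, roughly 10 for the pair `(1, "stay")`). In the tabular case this behaves well; the stability concerns raised in the note arise once the table is replaced by a function class.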

One core idea of modern reinforcement learning (RL) is to find approximate solutions of the DP equation within a function class (e.g., neural networks, as popularized by the deep Q-learning approach of Mnih et al., 2015). While success stories are well-known, useful theory is scarce: we don’t know if a solution exists to an approximate DP equation except in very special settings, and we don’t know if a good approximation will lead to good performance for the resulting policy. We don’t even know if the recursive algorithms that define Q-learning will be stable — estimates may diverge to infinity.

There are many ways to read these negative results, and indeed many articles have been written around this subject. Our own reading is probably among the most radical: without understanding the issues around the existence of solutions to these DP equation approximations or their interpretation, we should search for alternative approximations of dynamic programming suitable for application in RL.

These concerns were raised at Sean Meyn’s boot camp lecture, where he called on listeners to revisit an alternate foundation of optimal control: the linear programming (LP) approach introduced by Manne (1960) and further developed by Denardo (1970) and d’Epenoux (1963). The message was greeted with enthusiasm from some attendees, including Gergely Neu, who responded, “You have blown my mind!” He had been working on his own formulation of this idea, which became logistic Q-learning (more on this below).


Theory at the Institute and Beyond, February 2022

by Prasad Raghavendra (Simons Institute)

It has been only a few months since the last time this column appeared. Yet there is so much to write about that it feels like a year has passed. For one area in particular, a decade’s worth of developments seems to have emerged in a few months’ time. I am talking of course about the phenomenal developments on error-correcting codes that we witnessed in the past few months.

Error-correcting codes
Error-correcting codes encode a message into a longer codeword so that the original message can be recovered even if many of the bits of the codeword are corrupted. Clearly, there is a trade-off between the amount of redundancy introduced by the error-correcting code and the number of errors it can recover from. A gold standard for error-correcting codes is being “constant rate, constant distance,” also referred to as being “asymptotically good.” An asymptotically good code can recover from a constant fraction of bit errors in the codeword, and the codeword is only a constant factor longer than the original message.

Error-correcting codes can be described using a family of “parity checks,” where a parity check on a subset \(S\) of bits is a constraint that there are an even number of \(1\)s within the subset \(S\). A codeword is then a sequence of bits that satisfy all of the parity check constraints. In a low-density parity check (LDPC) code, there is a family of parity checks on constantly many bits each, that together define the code.

Introduced by Gallager in 1960, LDPC codes are central objects in both the theory and the practice of error correction. In 1996, Sipser and Spielman showed how to use expander graphs to construct LDPCs with constant distance and constant rate. Their construction is a thing of beauty. Pick an expander graph and associate one bit with each edge of the graph. For each vertex of the expander graph, impose the parity check that the bits on its incident edges sum to an even number. This construction also admits linear-time encoding and decoding algorithms.
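The "bits on edges, parity checks at vertices" recipe can be tried on a toy graph. The sketch below uses the complete graph on four vertices, which is only a stand-in chosen for this example: a single small graph is not an expander family, so it illustrates how the code is defined, not the rate and distance guarantees. All names are hypothetical.

```python
from itertools import combinations, product

VERTICES = range(4)
EDGES = list(combinations(VERTICES, 2))  # the 6 edges of the complete graph K4

def satisfies_checks(word):
    """word[i] is the bit on EDGES[i]; each vertex must see an even sum."""
    for v in VERTICES:
        incident_sum = sum(bit for bit, edge in zip(word, EDGES) if v in edge)
        if incident_sum % 2 != 0:
            return False
    return True

# Brute-force enumeration of the code: all edge labelings passing every check.
codewords = [w for w in product((0, 1), repeat=len(EDGES)) if satisfies_checks(w)]
min_weight = min(sum(w) for w in codewords if any(w))
```

The codewords are exactly the edge sets in which every vertex has even degree: the empty set, the four triangles, and the three 4-cycles, for 8 codewords of blocklength 6 with minimum weight 3. Each check touches only the three edges at one vertex, which is the low-density property.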

Asymptotically good locally testable codes and quantum LDPCs
A locally testable code is one that admits a highly efficient procedure to detect the presence of errors. More precisely, it admits a testing procedure that queries a small number of randomly selected bits in the codeword, and will reject a corrupted codeword with, say, 1% of errors with constant probability. The gold standard would be a constant-query locally testable code, where the testing procedure queries only constantly many (say, 100) bits of the corrupted codeword.

Locally testable codes have been central to many of the developments in complexity theory for close to three decades now. Linearity testing, aka locally testing the Hadamard code, is still the prime example for property testing and lies at the gateway to the world of probabilistically checkable proofs.
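The linearity test itself is short enough to write down. The sketch below implements the classical three-query test (pick random x and y, and check that f(x) + f(y) = f(x + y) over GF(2)) and estimates its rejection rate empirically. The function names, the trial count, and the particular corruption pattern are assumptions of this example.

```python
import random

def blr_test(f, n, rng):
    """One round of the 3-query linearity test on f: {0,1}^n -> {0,1}.

    Inputs are encoded as n-bit integers, so XOR is addition over GF(2)^n.
    """
    x = rng.randrange(2 ** n)
    y = rng.randrange(2 ** n)
    return (f(x) ^ f(y)) == f(x ^ y)

def rejection_rate(f, n, trials=2000, seed=0):
    """Empirical fraction of rounds in which the test rejects."""
    rng = random.Random(seed)
    return sum(not blr_test(f, n, rng) for _ in range(trials)) / trials

def parity_of(mask):
    """The linear function x -> <mask, x> mod 2, i.e. a Hadamard codeword."""
    return lambda x: bin(x & mask).count("1") % 2

def corrupt(f):
    """Flip f on every input divisible by 10 (about 10% of the inputs)."""
    return lambda x: f(x) ^ (1 if x % 10 == 0 else 0)

N_BITS = 10
linear = parity_of(0b1011001101)
noisy = corrupt(linear)
```

A genuinely linear function is never rejected, so `rejection_rate(linear, N_BITS)` is exactly 0, while `rejection_rate(noisy, N_BITS)` is bounded away from 0. The test reads only three positions of a codeword of length \(2^n\), which is also why the Hadamard code is the classic example of local testability bought at the price of a very poor rate.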

Intuitively, for a code to be locally testable, it needs to admit a large number of constant-size parity checks. In other words, local testability seemingly necessitates the code to have a lot of redundancy. In fact, the classic examples of locally testable codes, such as Hadamard codes or Reed-Muller codes, have codewords that are superpolynomially larger than the message.

Irit Dinur’s ingenious proof of the PCP theorem yielded as a by-product the first locally testable codes where the codewords were only slightly superlinear in size. It was then that one would dare to ask: can local testability be achieved for “free”? That is, can one have the best of all worlds: a constant-rate, constant-distance code that also admits a constant-query local-tester?

This long-standing open question always seemed out of reach until late last year, when it was resolved affirmatively by two independent groups of researchers simultaneously!


Theory at the Institute and Beyond, September 2021

by Prasad Raghavendra (Simons Institute)

Being in one of the talks in the Simons Institute auditorium, witnessing live and lively interaction with the speaker, feels like the closest thing to normal since the start of the pandemic. There is a sense of tangible joy among the participants just to be sharing the same physical space, let alone the fantastic environs of the Institute. The renewed energy is all there to witness in the programs this semester on Computational Complexity of Statistical Inference (CCSI) and Geometric Methods in Optimization and Sampling (GMOS), both of which are now in full swing. Although masking is maintained, it doesn’t seem to change the quintessential Simons program experience even a little bit. I am referring, of course, to the constant feeling of missing out on all the incredibly interesting activities going on, many of which one is unable to fit into one’s schedule.

At least some of the palpable energy can be attributed to over 40 postdocs and research fellows who have arrived at the Institute this semester, many of whom will stay on for a year or two. This extraordinary group of young researchers covers the whole gamut of topics, ranging from cryptography, quantum computing, and fairness to machine learning, data structures, algorithms, and complexity theory. Each of these postdocs and fellows gave a 10-minute presentation at the “Meet the Fellows” welcome event that the Institute held on September 8 and 9. Check out their talks for glimpses of the cutting edge in all these subfields of theory.

An advance in algebraic circuit complexity
This time around, there is some good news from the front lines on circuit complexity, one of the most challenging arenas within theoretical computer science.

An algebraic circuit consists of gates, each of which carries out either addition or multiplication over some field, say, the real numbers. The depth of the circuit is the length of the longest path from the output to one of its inputs. Naturally, an algebraic circuit computes a polynomial in its inputs.
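
The definitions above can be sketched in a few lines of Python. The Var and Gate classes and the example polynomial x*y + y*z are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """An input of the circuit."""
    name: str

@dataclass(frozen=True)
class Gate:
    """An addition ("+") or multiplication ("*") gate over its input nodes."""
    op: str
    inputs: tuple

def evaluate(node, assignment):
    """Value of the polynomial computed at `node` under the given input assignment."""
    if isinstance(node, Var):
        return assignment[node.name]
    values = [evaluate(child, assignment) for child in node.inputs]
    if node.op == "+":
        return sum(values)
    if node.op == "*":
        result = 1
        for value in values:
            result *= value
        return result
    raise ValueError(f"unknown gate type: {node.op!r}")

def depth(node):
    """Length of the longest path from this node down to an input."""
    if isinstance(node, Var):
        return 0
    return 1 + max(depth(child) for child in node.inputs)

x, y, z = Var("x"), Var("y"), Var("z")
# A depth-two circuit: a sum of two products, computing x*y + y*z.
circuit = Gate("+", (Gate("*", (x, y)), Gate("*", (y, z))))
```

Here evaluate(circuit, {"x": 2, "y": 3, "z": 5}) computes 2*3 + 3*5, and depth(circuit) counts the two layers of gates.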

In the world of Boolean circuits with AND/OR/NOT gates, lower bounds against constant-depth circuits, aka AC0 circuit lower bounds, have been known since the 1980s and are among the most influential results in complexity theory. For general algebraic circuits over a large field (say, the reals), even superpolynomial lower bounds for depth-three circuits had remained elusive. In a remarkable paper, Nutan Limaye, Srikanth Srinivasan, and Sébastien Tavenas have obtained the first superpolynomial lower bounds against general algebraic circuits of all constant depths over fields of characteristic zero (say, the reals). Furthermore, the lower bound is shown for a simple polynomial known as “iterated matrix multiplication,” whose input consists of \(d\) matrices \(X_1,\ldots,X_d\) of dimension \(n \times n\), and the goal is to compute a fixed entry of their product \(X_1 \cdot X_2 \cdots X_d\). The same work also obtains a depth hierarchy theorem for algebraic circuits, showing that for every depth \(D\), there is an explicit polynomial that can be computed by a depth-\(D\) circuit of size \(s\) but requires circuits of size superpolynomial in \(s\) if the depth is \(D-1\).
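
To see what the iterated matrix multiplication polynomial looks like, the sketch below computes a fixed entry of the product in two ways: by multiplying the matrices in sequence, and by expanding the entry as a sum of monomials, one for each choice of intermediate indices. The function names and the small integer matrices are assumptions chosen for illustration; this is not the lower-bound argument itself.

```python
from itertools import product

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def imm_entry_by_product(matrices, row=0, col=0):
    """Entry (row, col) of X_1 * X_2 * ... * X_d, multiplying left to right."""
    result = matrices[0]
    for matrix in matrices[1:]:
        result = mat_mul(result, matrix)
    return result[row][col]

def imm_entry_by_monomials(matrices, row=0, col=0):
    """Same entry, written as a sum of n^(d-1) monomials of degree d."""
    n, d = len(matrices[0]), len(matrices)
    total = 0
    for middle in product(range(n), repeat=d - 1):
        indices = (row, *middle, col)
        term = 1
        for t, matrix in enumerate(matrices):
            term *= matrix[indices[t]][indices[t + 1]]
        total += term
    return total

example = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
    [[2, 0], [1, 3]],
]
```

The two functions agree on every input. The second makes explicit that the polynomial has degree \(d\) and \(n^{d-1}\) monomials when written out as a plain sum of products.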

Remarkable work of Matthew Brennan
The theory community suffered a terrible loss this year with the tragic and untimely passing of one of our rising stars, Matthew Brennan. While still a graduate student at MIT, Matthew almost single-handedly pushed forward an ambitious research program at the intersection of computational complexity and statistics. Here we will try to give a glimpse of Matthew’s extensive body of research.


Trends in Machine Learning Theory

Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference ALT 2021, including plenary talks, tutorials, trends in learning theory, and more! To reach a broad audience, the series is disseminated as guest posts on different blogs in machine learning and theoretical computer science. This initiative is organized by the Learning Theory Alliance and is overseen by Gautam Kamath. All posts in ALT Highlights are indexed on the official Learning Theory Alliance blog.

This is the sixth and final post in the series, on trends in machine learning theory, written by Margalit Glasgow, Michal Moshkovitz, and Cyrus Rashtchian.

Throughout the last few decades, we have witnessed unprecedented growth of machine learning. Originally a topic formalized by a small group of computer scientists, machine learning now impacts many areas: the physical sciences, medicine, commerce, finance, urban planning, and more. The rapid growth of machine learning can be partially attributed to the availability of large amounts of data and the development of powerful computing devices. Another important factor is that machine learning has foundations in many other fields, such as theoretical computer science, algorithms, applied mathematics, statistics, and optimization. 

If machine learning is already mathematically rooted in many existing research areas, why do we need a field solely dedicated to learning theory? According to Daniel Hsu, “Learning theory serves (at least) two purposes: to help make sense of machine learning, and also to explore the capabilities and limitations of learning algorithms.” Besides finding innovative applications for existing tools, learning theorists also provide answers to long-standing problems and ask new fundamental questions. 

Modern learning theory goes beyond classical statistical and computer science paradigms by: 

  • developing insights about specific computational models (e.g., neural networks) 
  • analyzing popular learning algorithms (e.g., stochastic gradient descent)
  • taking into account data distributions (e.g., margin bounds or manifold assumptions)
  • adding auxiliary goals (e.g., robustness or privacy), and 
  • rethinking how algorithms interact with and access data (e.g., online or reinforcement learning).

By digging deep into the basic questions, researchers generate new concepts and models that change the way we solve problems and help us understand emerging phenomena.

This article provides a brief overview of three key areas in machine learning theory: new learning paradigms, trustworthy machine learning, and reinforcement learning. We describe the main thrust of each of these areas, as well as point to a few papers from ALT 2021 (the 32nd International Conference on Algorithmic Learning Theory) that touch each of these topics. To share a broader view, we also asked experts in the areas to comment on the field and on their recent papers. Needless to say, this article only scratches the surface. At the end, we point to places to learn more about learning theory.