Meta-Complexity: A Basic Introduction for the Meta-Perplexed

by Adam Becker (science communicator in residence, Spring 2023)

Think about the last time you faced a problem you couldn’t solve. Say it was something practical, something that seemed small — a leaky faucet, for example. There’s an exposed screw right on the top of the faucet handle, so you figure all you need to do is turn the faucet off as far as it will go, and then tighten that screw. So you try that, and it doesn’t work. You get a different screwdriver, a better fit for the screw, but you can’t get it to budge. You grab a wrench and take apart the faucet handle, and that doesn’t help much either — it turns out there’s far more under there than you’d expected, and you can barely put it back together again. You’re about to give up and call a plumber, but first you want to see whether you’re close. Maybe it really is easy to fix the problem, and you just need to know where to look. Or maybe it’s far more difficult than you think. So now you’re trying to solve a new problem, a meta-problem: instead of fixing the leaky faucet, you’re trying to figure out how hard it will be to fix the leaky faucet. You turn to the internet, and find that there are many different kinds of faucets and sinks, some of which are practically indistinguishable, and there are different reasons they can leak, unique to each type of sink. Simply determining the difficulty of fixing your leaky faucet is itself turning out to be more difficult than you expected.

Theoretical computer scientists have been facing their own version of this problem for decades. Many of the questions they ask are about complexity: How hard must a computer (really, an idealized version of one) work to perform a particular task? One such task, famous in the annals of both mathematics and computer science — theoretical computer science is where the two disciplines meet — is the traveling salesperson problem. Imagine a traveling salesperson, going from city to city. Starting from her home, she has a list of cities she must visit, and a map with the distances between those cities. Her budget limits the total distance she can travel to a certain maximum, so she’d like to find a route shorter than that maximum distance that allows her to visit each of the cities on her list, returning to her home city at the end. Given her list of cities and her budget, does such a route exist?

There is no known method for solving this problem quickly in a general way — a method that would work for all possible budgets and lists of cities that the salesperson might have. There are ways of doing it, but all of them require a large number of calculations relative to the number of cities on the list, and thus take a great deal of time, especially as the number of cities increases. In fact, the fastest guaranteed methods known for solving the traveling salesperson problem take, in general, exponentially more time as the number of cities on the list increases; no known method escapes this exponential blowup, which keeps them closer in spirit to brute-forcing the problem by checking every possible route than to a genuinely fast algorithm. Compare this with verifying a solution to the traveling salesperson problem: that’s easy. All you have to do is confirm that the proposed route does in fact visit every city exactly once, and that its total distance is shorter than the maximum allowed by the salesperson’s budget.
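
To make the asymmetry concrete, here is a small Python sketch, with a made-up four-city distance matrix, contrasting the brute-force search over all routes with the quick check of a proposed route. It illustrates the two tasks described above and is not meant as a serious TSP solver.

```python
from itertools import permutations

def route_length(dist, route):
    """Total length of a tour that starts and ends at route[0]."""
    return sum(dist[route[i]][route[(i + 1) % len(route)]] for i in range(len(route)))

def verify(dist, route, cities, budget):
    """Fast check: the route visits every city exactly once and fits the budget."""
    return sorted(route) == sorted(cities) and route_length(dist, route) <= budget

def brute_force(dist, cities, budget):
    """Slow search: try every ordering of the cities after the home city."""
    home, rest = cities[0], cities[1:]
    for perm in permutations(rest):
        route = [home] + list(perm)
        if route_length(dist, route) <= budget:
            return route
    return None

# A made-up symmetric distance matrix for four cities (city 0 is home).
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
cities, budget = [0, 1, 2, 3], 25
tour = brute_force(dist, cities, budget)   # exponential in the number of cities
print(tour, verify(dist, tour, cities, budget))  # verification is a quick scan
```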

This property of the traveling salesperson problem — it seems like it can be solved in general only by a lengthy brute-force method, but it’s fast to verify a given solution — places it into a class of “computational complexity” known as NP. (This stands for “nondeterministic polynomial time,” and it’s not particularly important to understand that name in order to understand what’s going on here.) Compare this with a problem like determining whether the last entry on a list of numbers is the largest, for which there are known (and straightforward) methods that don’t scale exponentially with the length of the list. Such problems, which can be solved and verified quickly, are in a complexity class called P, a special subset of NP.

On the face of it, NP and P seem to be different; the traveling salesperson problem (TSP) can’t be solved quickly by any known method. But the trouble, for computer scientists, begins with those words “known method.” While nobody knows a fast way of solving a problem like the traveling salesperson problem, that doesn’t mean no such method exists. Finding such a method would show that TSP actually belongs in P. In fact, it would show more than that, because computer scientists have proved that TSP is not just a member of NP — it is NP-complete: if there were an efficient solution to TSP, it could be adapted to solve every other problem in NP quickly too. Therefore, a fast solution to TSP wouldn’t just show that TSP is part of P — it would show that every problem in NP is a member of P, making P and NP the same complexity class. But if instead someone were to prove that there is no universally fast method for solving TSP, this would mean that TSP and many other similarly difficult problems in NP aren’t in P, meaning that P and NP are not the same complexity class.

Continue reading

Theory at the Institute and Beyond, January 2024

by Nikhil Srivastava (senior scientist, Simons Institute for the Theory of Computing)

The rain in Berkeley is busy washing Calvin Lab for a shiny new semester at the Simons Institute. Familiar and unfamiliar faces trickle in as we begin this spring’s programs on Error-Correcting Codes: Theory and Practice and on Quantum Algorithms, Complexity, and Fault Tolerance. There is a great deal of interaction between these areas via the hot topic of quantum error-correcting codes, which has seen a wave of progress in recent years. The structure of the semester’s workshops reflects this synergy: there is a common boot camp for both programs, and each of the quantum workshops begins with a day of tutorials to welcome outsiders.

This is a good moment to reflect on some of the exciting happenings of the past semester. One subject very much in the air during the Data Structures and Optimization for Fast Algorithms program was recent progress on the classical question of computing a maximum st-flow in a directed graph. I like this topic not just because the results are surprising, but also because they leverage decades of understanding developed by the TCS community on a staggering array of topics spanning the discrete and continuous worlds: interior point methods, data structures, online algorithms, Laplacian solvers, expander graphs, oblivious routing, low-stretch trees, spanners, sparsifiers, and more. It is a truly modern body of work displaying the maturity and unity of the theory of algorithms, as well as its gainful interactions with neighboring fields such as optimization and numerical linear algebra.

The rest of this article is my attempt at a genealogy of the ideas behind the almost-linear time algorithm for maximum st-flow by Chen et al. and its further refinements. Some of the themes discussed below have been cultivated in previous Institute programs, notably Algorithmic Spectral Graph Theory (2014) and Bridging Continuous and Discrete Optimization (2017). I encourage you to watch Aaron Sidford’s stunning boot camp introduction for a thoughtful overview of the surrounding algorithmic landscape (and a good example of how to run a boot camp, for future organizers!).

Four decades of max flow
The min-cost flow problem is the following: given a directed graph G with nonnegative integer edge capacities and costs, a flow value F, and two distinguished vertices s and t, find a feasible st-flow of value F of minimum cost, if one exists. When the costs are all 1, this problem is the feasibility version of max flow.
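
To pin down the objects involved, here is a Python sketch of the feasibility question for max flow: given capacities and a target value F, decide whether an st-flow of value F exists. It uses the textbook Edmonds-Karp augmenting-path method on a made-up four-node graph; this is the classical baseline, nothing like the almost-linear-time machinery discussed below.

```python
from collections import deque

def max_flow(cap, s, t):
    """Return the maximum st-flow value; `cap` maps u -> {v: capacity}."""
    residual = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u in list(cap):
        for v in cap[u]:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse residual edges
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # bottleneck along the path, then update residual capacities
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

capacities = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
F = 4
print(max_flow(capacities, "s", "t") >= F)  # is a flow of value F feasible?
```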

Min-cost flow (with unit capacities) was first studied by A.N. Tolstoi in 1930 in the context of cheaply moving goods on a rail network in the then relatively young Soviet Union. Max flow was first studied by T.E. Harris and Gen. F.S. Ross in the United States during the 1950s, with the dual interest of finding the most efficient way to destroy the capability of the same Soviet rail network (corresponding to a minimum cut). Max flow is now taught in undergraduate CS courses and can be used to solve other fundamental graph problems such as bipartite matching and sparsest cut. It is a special case of linear programming.

More than 90 years after its introduction, in 2022, Chen, Kyng, Liu, Peng, Probst Gutenberg, and Sachdeva [CKLPPS’22] designed an almost-linear time algorithm for exactly solving min-cost flow on general directed graphs with integer capacities and costs. The paper masterfully combines two ingredients from continuous and discrete algorithms — specifically, a certain kind of interior point method (IPM) for linear programming, and a randomized data structure for quickly producing the updates required by that IPM, exploiting the fact that the flow changes slowly over the course of the algorithm. There has recently been a further distillation of these techniques in Chen, Kyng, Liu, Meierhans, and Probst Gutenberg [CKLMP’23], so it is a natural time to appreciate what has been done.

Let me briefly explain the two components and their historical evolution. The first is a potential reduction interior point method for linear programming, originated by Karmarkar (1984). The min-cost flow LP aims to minimize a linear objective inside a polytope, which is an affine section of a (scaled) box. The idea of Karmarkar’s IPM is to define a smooth potential function on the interior of the polytope that blows up to -∞ at the facet/vertex on which the optimum is reached, and to +∞ elsewhere on the boundary, and then generate a sequence of iterates converging to its minimum by locally minimizing its quadratic Taylor series repeatedly. One striking thing about this method is that the potential is not convex, and yet repeated local optimization is guaranteed to reach a global minimum; the geometric reason why this is expected comes from Karmarkar’s original framing of his algorithm in terms of projective transformations, but it is nonetheless not hard to prove once imagined. The reason for using IPMs is that they exhibit log(1/ϵ) type convergence, enabling an exact solution, as opposed to first-order methods like gradient descent, which can obtain only poly(1/ϵ) type convergence. First-order methods were used 10 years ago by Kelner, Lee, Orecchia, and Sidford (2014) and Sherman (2014) to obtain the first almost-linear time algorithms for approximate max flow on undirected graphs.
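
For concreteness, one classical potential of this kind, written here for a generic LP (minimize c^T x over {x : Ax = b, x ≥ 0}, with optimal value z*) as an illustration rather than the exact potential used in the max flow papers, is
\[\phi(x) \;=\; q\,\log\left(c^{\top}x - z^{*}\right) \;-\; \sum_{j=1}^{n}\log x_j,\]
with the weight q chosen somewhat larger than the number of variables n. The first term drags the potential to -∞ as the objective gap closes at the optimal face, while the barrier term sends it to +∞ as any coordinate hits zero elsewhere on the boundary, matching the behavior described above.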

Continue reading

Workshop Report: Structural Results

by Jordan Ellenberg (science communicator in residence, Summer 2023)

Some of the richest and most lasting mathematical ideas come about when two distinct fields come together and pay close attention to their intersections. That was the goal of the workshop on Structural Results held at the Simons Institute in July 2023, in which extremal combinatorists and theoretical computer scientists specializing in complexity convened to talk about the overlap between their two fields.

One highlight was Cosmin Pohoata’s talk about his recent breakthrough with Alex Cohen and Dmitrii Zakharov (both still graduate students!) on the Heilbronn triangle problem. It seems like a simple one: If you put n points in a unit square, what is the smallest area of a triangle formed by any three of the points? The answer, of course, depends on the location of the points. The question Hans Heilbronn asked in the 1940s was: How large can the smallest triangle be? In other words, how can we arrange the n points so that no three of them form a small triangle? Your first thought might be to arrange the points so that no two of them come too close to each other. But that’s not good enough — if three points are very close to lying on a straight line, the triangle they form will be long but very skinny, and will still have a small area! Indeed, if you toss n points randomly in the square, the smallest triangle you find will typically have an area around 1/n^3, but its diameter will be close to that of the square itself! Long skinny triangles are ubiquitous. Komlós, Pintz, and Szemerédi showed in 1981 that there has to be a triangle whose area is at most n^(-8/7), and there the problem stood for 40 years, until Pohoata and his collaborators broke the 8/7 barrier and showed that the smallest triangle actually has to be a shade smaller — its area is at most n^(-8/7 - 1/2000).
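
As a quick sanity check on that 1/n^3 heuristic, here is a small Python sketch that tosses n random points into the unit square and scans every triple for the smallest triangle. The scan is brute force and cubic in n, so n is kept small; it illustrates only the random baseline, not the careful constructions the problem is really about.

```python
import random
from itertools import combinations

def smallest_triangle_area(points):
    """Minimum area over all triangles formed by triples of the points."""
    best = float("inf")
    for (ax, ay), (bx, by), (cx, cy) in combinations(points, 3):
        area = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0
        best = min(best, area)
    return best

# For random points, the smallest triangle is typically on the order of 1/n^3.
for n in [20, 40, 80]:
    pts = [(random.random(), random.random()) for _ in range(n)]
    print(n, smallest_triangle_area(pts), 1 / n**3)
```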

What do long skinny triangles have to do with computer science? Nothing, per se. But problems about sets of points in space are surprisingly applicable. Take Tselil Schramm’s talk, for instance, about her joint work with Liu, Mohanty, and Yang. The notion of an expander graph — a network in which a random walk disperses optimally quickly — is crucial to computer science, allowing advances in pseudorandomness, coding theory, and many other areas. Fascinatingly, for many years the situation was the following: it was known that random graphs were expanding with very high probability, but no specific family of graphs could be proved to be expanding! Nowadays there is an abundance of explicit expanders. But the situation is different in the newer field of high-dimensional expansion, in which instead of networks (which are composed of vertices, some pairs of which are connected by edges) one has complexes, in which the collections of vertices one keeps track of are not just pairs, but triples (or quadruples, or even larger groupings, but the work presented at the workshop was about triples). What Schramm and her collaborators have done is to show that two-dimensional expanders are plentiful (albeit with vertex degree somewhat higher than what one can achieve in the one-dimensional case), by showing that random constructions really do enjoy this property; in particular, you can take n points at random on a sphere whose dimension is about log n, and take the “edges” to be triples of points that are all within a small distance of one another. Small triangles again! (But this time, “small” in the sense of “all three vertices being mutually close to each other,” not “having small area.”)
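
Here is a minimal Python sketch of the kind of random construction just described: sample n points on a sphere of dimension roughly log n and keep the triples whose pairwise distances all fall below a threshold. The dimension and the threshold tau are arbitrary illustrative choices, not the parameters of Liu, Mohanty, Schramm, and Yang, and the sketch only builds the complex; it certifies nothing about its expansion.

```python
import itertools
import math
import random

def random_unit_vector(d):
    """Uniform random point on the unit sphere in R^d (normalized Gaussian)."""
    v = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_geometric_complex(n, tau):
    """Keep triples of points that are pairwise within distance tau."""
    d = max(2, round(math.log(n)))          # sphere dimension about log n
    pts = [random_unit_vector(d) for _ in range(n)]
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    triples = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if dist(pts[i], pts[j]) < tau and dist(pts[j], pts[k]) < tau and dist(pts[i], pts[k]) < tau
    ]
    return pts, triples

pts, tris = random_geometric_complex(60, tau=1.0)
print(len(tris), "triples kept")
```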

What Pohoata’s and Schramm’s work have in common is the difficulty of going from 2 to 3: problems about pairs of points are well understood, but problems about triangles are drastically harder! (This phenomenon is familiar from dynamics, where the two-body problem is a simple matter of ellipses, while the three-body problem remains frustratingly unsolved and an active topic of research.) But where the two papers part ways is in their relationship with randomness. The point of Schramm’s work is to prove that a random configuration of points achieves maximum performance. In the Heilbronn triangle problem, randomly chosen points definitely do not make the smallest triangle maximally big; to get to the optimum, one will have to choose points very, very carefully, perhaps according to the dictates of some yet-undiscovered structure.

This dichotomy between structure and randomness was an ever-present theme during the workshop. We saw it in Julia Wolf’s talk about her joint work with Caroline Terry, which was about a new regularity result concerning subsets of the space of Boolean vectors (strings of 0’s and 1’s). One information-theoretic way in which such a subset A could be thought of as “simple” would be if membership in A depended only on a few coordinates. More generally, one can call A regular if there is some linear projection p to a much lower-dimensional space of Boolean vectors such that membership of v in A depends only on the image of v under p, which carries much less information. What Wolf and Terry show is that a different, combinatorial notion of tameness, called “k-stability,” implies “approximate regularity” — that there is some projection p such that the value of p(v) determines membership of v in A with high probability. Now we are getting still closer to questions of computational complexity! And the relation was made even more apparent in Shachar Lovett’s Richard M. Karp Distinguished Lecture on communication complexity. In this talk, the setup is that two players, Xavier and Yelena, are trying to compute the value of a function f(x,y) which takes values in {1,-1}. The function is known to both players, but the problem is that Xavier knows only the value of x and Yelena knows only the value of y. Of course they could share their private values and compute f; but the game here is to compute f(x,y) using as few bits of communication between Xavier and Yelena as possible. This complexity problem is visibly of the same flavor as the regularity question contemplated by Terry and Wolf; and here, too, the exciting work arises when we try to compare this notion of complexity with other notions of a more linear-algebraic flavor.
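
One linear-algebraic quantity of this flavor is the rank of the {1,-1} communication matrix M[x][y] = f(x,y): deterministic communication complexity is at least the logarithm of this rank, and the log-rank conjecture asks whether it is also at most polynomial in that logarithm. The toy Python sketch below, an illustration of the object rather than of anything presented in the lecture, builds this matrix for the equality function and prints its rank.

```python
import numpy as np

n = 4  # Xavier and Yelena each hold an n-bit string, encoded as 0 .. 2^n - 1

def equality(x, y):
    """The equality function, with outputs in {1, -1}."""
    return 1 if x == y else -1

inputs = range(2 ** n)
# Communication (sign) matrix: rows indexed by Xavier's input, columns by Yelena's.
M = np.array([[equality(x, y) for y in inputs] for x in inputs])
print("matrix shape:", M.shape, " rank:", np.linalg.matrix_rank(M))  # full rank 2^n here
```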

One of the high points of the week was the open-problem session held on the second afternoon of the workshop. In this informal setting, the pure mathematicians and computer scientists jumped to the board, interrogated each other, and laid out a whole spray of conundrums, some stemming directly from the talks we’d heard or were going to hear, others from farther afield. The structure, shall we say, allowed for a certain amount of randomness.

Theory at the Institute and Beyond, July 2023

by Venkatesan Guruswami (Simons Institute)

Summer is typically a quiet time on university campuses, but not at the Simons Institute, where two programs — one on Analysis and TCS, and another on Quantum Computing — are buzzing along. One might recall that one of the inaugural programs hosted by the Simons Institute, back in Fall 2013, was Real Analysis in Computer Science. In the decade since, the field has cultivated influential new themes such as global hypercontractivity and spectral independence, incorporated methods based on high-dimensional expanders and stochastic calculus, and also enabled striking applications in hardness of approximation, Markov chain analysis, and coding theory. All this progress makes this an excellent time to reconvene a program on this topic. The Quantum Computing program has a special focus on the power of noisy intermediate-scale quantum (NISQ) devices, a subject of great current practical interest aimed at demonstrating quantum advantage with noisy devices (pre-quantum error correction), with unique challenges for and great synergy with theory.

These programs come on the heels of a very busy spring semester that hosted a program on Meta-Complexity, which I wrote about earlier, and an extended reunion on the theory and practice of Satisfiability. The participants in the latter program were exposed to both the theoretical and the practical aspects of SAT solving in parallel, and especially for junior researchers, getting such a perspective early in their careers provides an unparalleled platform from which to embark on interdisciplinary and high-impact research.

Generating primes, almost deterministically
One of the papers to come out of the Meta-Complexity program, co-authored by a team of five full-time participants in the program (Chen, Lu, Oliveira, Ren, and Santhanam), gives a randomized algorithm that, on infinitely many inputs n, runs in poly(n) time and with high probability generates a canonical n-bit prime. A randomized polynomial-time algorithm to generate some n-bit prime is easy, as one can just sample O(n) random n-bit numbers, test them for primality, and output a prime among them. But such an algorithm can output different primes on different runs. A deterministic algorithm that always outputs the same prime would be desirable, but such an algorithm remains elusive. A pseudodeterministic algorithm is an intriguing middle ground: a randomized algorithm that, on any input, outputs a unique answer with high probability. Motivated by the question of generating canonical primes, the concept of pseudodeterministic algorithms was introduced by Gat and Goldwasser in 2011 and has since received much attention. But a pseudodeterministic polynomial-time algorithm for prime generation remained open, and the present work solves it modulo the caveat of only working infinitely often. Actually, the result has nothing to do with generating primes per se, and works for any property of numbers that occurs sufficiently often and that can be checked in polynomial time (both of which hold for primality).
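
Here is a Python sketch of that trivial randomized generator: sample random n-bit numbers, test each with Miller-Rabin, and return the first that passes. The cap of 100n attempts is an arbitrary illustrative choice, and the sketch exhibits exactly the weakness noted above: different runs will typically return different primes, which is what a pseudodeterministic algorithm is meant to avoid.

```python
import random

def is_probable_prime(m, rounds=40):
    """Miller-Rabin primality test (probabilistic)."""
    if m < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if m % p == 0:
            return m == p
    d, r = m - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False
    return True

def random_n_bit_prime(n, attempts=None):
    """Sample random n-bit odd numbers and return the first probable prime found."""
    attempts = attempts or 100 * n
    for _ in range(attempts):
        candidate = random.getrandbits(n) | (1 << (n - 1)) | 1  # force n bits, odd
        if is_probable_prime(candidate):
            return candidate
    return None

print(random_n_bit_prime(64))  # a different 64-bit prime on almost every run
```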

A few years ago, a subset of the present authors gave a subexponential (i.e., exp(n^0.01)) pseudodeterministic algorithm for generating primes (again only for infinitely many lengths). This was based on a win-win argument that converts a conditional hardness-randomness trade-off (a specific uniform version due to Trevisan and Vadhan) into an unconditional pseudodeterministic algorithm. Namely, if a certain PSPACE-complete language L that they construct is not in BPP, then one can build a pseudorandom set of subexponential size that fools polynomial-time randomized algorithms (infinitely often). So in this case, one can derandomize the trivial randomized algorithm to generate primes and get a deterministic subexponential-time algorithm. On the other hand, if L is in BPP, then the polynomial-space algorithm that searches over all n-bit numbers to find the lexicographically smallest prime yields a polynomial-time pseudodeterministic algorithm.

Continue reading

Mechanisms: Inside or In-Between?

by Issa Kohler-Hausmann (Senior Law and Society Fellow, Spring 2022, Simons Institute)1

This work was made possible by the Simons Institute’s Causality program in the spring of 2022, where I was the Law and Society fellow and had the opportunity to learn and discuss with a collection of brilliant scholars thinking about and working on causality and causal modeling. Special gratitude goes to Robin Dembroff, Maegan Fairchild, and Shamik Dasgupta, who participated in the April 2022 Theoretically Speaking event “Noncausal Dependence and Why It Matters for Causal Reasoning.”

Introduction
The term “mechanism” or “causal mechanism” is used in two possibly conflicting ways in causal inference literature. Sometimes “causal mechanism” is used to refer to the chain of causal relations that is unleashed between some stipulated triggering event (let’s call it X) and some outcome of interest (let’s call it Y). When people use the term in this sense, they mean “a causal process through which the effect of a treatment on an outcome comes about.”2 One could think of this use of the term as slowing down a movie about the causal process between the moment when X is unleashed and when Y obtains so that we can see more distinct frames capturing ever-finer-grained descriptions of prior events triggering subsequent events as they unfold over time. This is the in-between sense of “mechanism,” or, as Craver says, “causal betweenness.”3 An expansive methodological literature engages causal mechanisms in the in-between sense under the banner of mediation or indirect effects.4 When used in this way, a causal mechanism M lies in the middle of a causal pathway between X and Y: X → M → Y.
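
As a minimal illustration of the in-between sense, here is a toy simulation of a structural causal model with a single mediator on the path from X to Y. The functional forms and coefficients are illustrative assumptions only, not drawn from the literature cited here.

```python
import random

def simulate_one(x):
    """X -> M -> Y: the effect of X on Y is routed entirely through the mediator M."""
    m = 2.0 * x + random.gauss(0, 0.1)   # M responds to X (illustrative coefficient)
    y = 1.5 * m + random.gauss(0, 0.1)   # Y responds to X only through M
    return m, y

treated = [simulate_one(1)[1] for _ in range(10_000)]
control = [simulate_one(0)[1] for _ in range(10_000)]
print("average total effect of X on Y:",
      sum(treated) / len(treated) - sum(control) / len(control))  # roughly 2.0 * 1.5 = 3.0
```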

But there is a different sense of “mechanism” that refers to whatever it is about the triggering variable (let’s call it X again) that endows it with the causal powers it has. When people use the term in this inside sense, they mean to pick out the constituents of X, the parts and relations that compose it, or the grounds by virtue of which it obtains. Instead of slowing down the movie of a causal process unfolding over time, this use of “mechanism” calls for zooming into X at a particular slice in time.5

Causal models encode mechanisms in the inside sense insofar as denoting a variable (e.g., X) in the model entails denoting the stuff that builds the innards of X in the model.6 Designating variables expresses how the modeler has chosen to carve up states or events in the world. It entails expressing the boundaries of the relata (represented by variables) in the model, as variables marked out as, for example, X and M are taken to be distinct.7 But variable definition often leaves the innards of each relata designated by a variable name — what’s inside of X and M — opaque. And because most causal models we work with are not expressed in terms of fundamental entities (whatever those are — quarks and leptons, or something), variables are built out of or constituted by other things and connections between those things. The variables take the various states designated in the model because certain facts obtain. Inside causal mechanisms are the intravariable relata and relations that compose the variables and give them their distinctive causal powers.

Questions about mechanisms could be posed in one or the other sense of the term. For example, imagine you have a pile of pills and know with absolute certainty that each pill contains the identical chemical substance and dosage. Now imagine you conduct a randomized controlled trial with these pills to see whether ingesting them reduces reported headaches, and you document some average causal effect. Upon completion of the study, you might say: “We still do not know the causal mechanisms involved here.” There are two possible meanings to that query.

One meaning is that you do not know what physical processes in the body ingestion of the pill triggered — what physiological pathways ingestion of the substance brought about and unfolded over time such that headache pain was reduced. This version of the query asks about causal mechanisms in the in-between sense. Alternatively, you could mean that you do not know what was in the pill! That is, you have no idea what stuff did the triggering — you do not know the chemical compound that constituted the little pills you gave to your treated subjects.8 This version of the query asks about causal mechanisms in the inside sense, asking what facts obtained such that the thing designated as the cause occurred.

Sometimes people blur these two uses together.9 However, it is important to maintain this conceptual distinction because the relationship between mechanisms in the inside and in-between senses sets some limits on variable definition within a causal model. Specifically, if you posit some mechanism in the in-between sense in a causal model, then the state or event picked out by that mediator cannot be inside another variable.

Continue reading

Using Theory to Design Better Interactions with Visualized Data

by Jessica Hullman (Senior Law and Society Fellow, Fall 2022, Simons Institute)
 

Theories of sequential decision-making have been around for decades but continue to flourish. The Simons Institute’s Fall 2022 program on Data-Driven Decision Processes provided an excellent overview of recent results in online learning and sequential decision-making. Visiting the Simons Institute as a researcher whose background is in interactive data visualization, I spent some time thinking about how learning theory might advance more applied research related to human-data interaction.

First, it’s worth noting that theories of inference and decision-making remain relatively unintegrated in fields that research data interfaces, including human-computer interaction and visualization. While we might sometimes visualize data simply to generate a historical record, such as to compare points scored across NBA players, most of the time visualization is used to support inference tasks like extrapolating beyond the particular sample to the larger population. Yet beyond a smattering of experimental papers that make use of decision theory, only a handful of works have advocated for theorizing the function of visualizations in the context of frameworks that could provide prescriptive guidance (for example, within the traditions of model checking,1 Bayesian cognition,2 and hypothesis testing3).

A natural question is why. I suspect it may have something to do with the status of visualization as a general-purpose tool that can be put to work to summarize data, persuade viewers to draw certain conclusions from data, or support inferential and decision tasks, sometimes simultaneously. More formal theory requires pinning down the problem more precisely, which might seem reductive. Visualization has also long been associated with the tradition of exploratory data analysis in statistics, in which John Tukey championed graphical displays as a way of exposing the unexpected, an overlooked part of statistical practice.4 Maybe the understanding that visualization is valuable for producing unpredictable insights keeps researchers away from attempting to theorize.

The power of visualization is often touted as providing an external representation that enables the amplification of cognition by allowing natural perceptual processes to aid in identifying patterns in data and freeing up valuable working memory. A key part of this is that a good visualization enables its human user to bring their prior domain knowledge to bear. Similar to how some statistical modelers shy away from the idea of formalizing prior knowledge in applied Bayesian statistics, the role of prior knowledge in visualization-aided analysis may contribute to a seeming bias in the literature toward leaving the human part of the equation untheorized. Instead, we’re left to trust that in supporting exploratory analysis in its various forms, visualization interactions don’t need modeling because the analyst will “know it when they see it” and also know what to do about it, whether that means transforming data, collecting more data, making a decision, etc.

All this means that there are many opportunities where statistical theory, data economics, and online learning theory could be helpful for providing a more rigorous theoretical framework in which to answer questions that get at the heart of what visualization is about.

Continue reading

Theory at the Institute and Beyond, February 2023

by Venkatesan Guruswami (Simons Institute)

This semester at the Simons Institute, the Meta-Complexity program is buzzing along with intense activity in the form of multiple reading groups and a weekly seminar, on top of the usual three workshops and boot camp. A large number of complexity theorists have converged at the Institute, including several students, postdocs, and junior researchers. And all this complexity action is intermixed with some good and varied fun in the form of many self-organized social activities.

Theory readers know what complexity is, but what is the “meta” for? An online dictionary defines “meta” as a term describing something that “consciously references or comments upon its own subject or features” — e.g., “A movie about making a movie is just so meta—especially when the actors criticize the acting.”

So meta-complexity is the study of complexity of problems that themselves pertain to complexity. A prototypical example is circuit minimization, where the goal is to find the smallest circuit for a given specification. This is modeled as the minimum circuit size problem (MCSP), which is one of the lead actors in meta-complexity: Given the truth table of an n-bit Boolean function f, and a size parameter s, does f have a Boolean circuit of size at most s? (Note that the input size is 2^n.)

The MCSP problem was explicitly defined in 2000 by Kabanets and Cai, inspired by Razborov and Rudich’s seminal work on natural proofs from 1994. (Upon some reflection, one can realize that a natural property against some class of circuits is really an average-case, one-sided error algorithm for MCSP on circuits of that class.) Being a “natural” problem, MCSP was in fact considered implicitly even earlier, including in early works on complexity in the former Soviet Union. Yablonski claimed in 1959 that MCSP is not in P (to use modern terminology — the class P wasn’t even defined then). MCSP is easily seen to be in NP — indeed, one can guess a circuit and then check if it computes the function correctly on all inputs — so we of course can’t yet prove that it lies outside P (doing so would separate P from NP).
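
To make the NP upper bound concrete, here is a Python sketch of the verification step: a witness is a small circuit, and the verifier evaluates it on all 2^n inputs and compares against the truth table. The straight-line encoding of circuits below is an ad hoc choice made for brevity, not a standard one.

```python
from itertools import product

def eval_circuit(circuit, x):
    """circuit: list of gates (op, in1, in2); wire i < n is input bit i, and wire
    n + j is the output of gate j (NOT ignores its second input). Returns the
    value of the last gate."""
    wires = list(x)
    for op, a, b in circuit:
        if op == "AND":
            wires.append(wires[a] & wires[b])
        elif op == "OR":
            wires.append(wires[a] | wires[b])
        elif op == "NOT":
            wires.append(1 - wires[a])
    return wires[-1]

def verify_mcsp_witness(truth_table, n, circuit, s):
    """Accept iff the circuit has at most s gates and matches the truth table,
    which is listed in lexicographic order of the inputs."""
    if len(circuit) > s:
        return False
    return all(eval_circuit(circuit, x) == truth_table[i]
               for i, x in enumerate(product([0, 1], repeat=n)))

# Example: XOR on 2 bits (truth table 0,1,1,0), computed by 5 gates:
# ~x0, ~x1, x0 & ~x1, ~x0 & x1, and their OR.
xor_circuit = [("NOT", 0, 0), ("NOT", 1, 1),
               ("AND", 0, 3), ("AND", 2, 1), ("OR", 4, 5)]
print(verify_mcsp_witness([0, 1, 1, 0], 2, xor_circuit, s=5))  # True
```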

Leading up to his seminal work on NP-completeness, Levin was in fact trying to show NP-hardness of MCSP, but didn’t succeed. One of the six problems that Levin showed NP-hardness of was DNF-MCSP, where one tries to find a DNF formula of minimum size to compute a given truth table. (Technically, Levin proved only NP-hardness of the partial-function version of the question, where we don’t care about the function value on some inputs.) In fact, this was Problem 2 on Levin’s list, between Problem 1, which was set cover, and Problem 3, which was Boolean formula satisfiability.

The complexity of MCSP remains open — we do not know if it is in P or NP-complete. Resolving this is a natural challenge, but on the surface it might seem like a rather specific curiosity. One reason to care about MCSP is that any NP-hardness proof faces some fundamental challenges — in particular, the function produced on no-instances must have large circuits, giving an explicit function in EXP that has large circuits, which seems beyond the reach of current techniques. One might try to circumvent this via randomized or exponential-time reductions, and this has indeed fueled some very interesting recent results, as well as intriguing connections to learning theory, average-case complexity, cryptography, and pseudorandomness.

An “extraneous” reason to care about MCSP and meta-complexity is that the underlying techniques have led to some of the best insights on certain foundational questions that have nothing to do with meta-complexity per se. Meta-complexity has been energized by some significant recent progress on both of these (internal and extraneous) fronts. These advances and momentum are fueling a lot of activity and optimism in the Simons Institute program this semester.

CONTINUE READING

The Blooming of the \(c^3\) LTC Flowers

by Prahladh Harsha

The last year (2021–22) has seen some amazing new constructions of locally testable codes with constant rate and constant fractional distance that are testable with a constant number of queries, sometimes referred to as \(c^3\) LTCs [DELLM22, PK22]. The construction of Panteleev and Kalachev [PK22] is almost identical to that of Dinur, Evra, Livne, Lubotzky, and Mozes [DELLM22]; while Dinur et al.’s main goal was to construct \(c^3\) LTCs, Panteleev and Kalachev were motivated by the construction of quantum LDPC codes. In this short note, I give an informal description of some of the ideas that led to the \(c^3\) LTC construction of Dinur et al., focusing more on the interplay between high-dimensional expanders and codes that led to this construction and less on the technical proofs. The original paper of [DELLM22] is extremely well written, and the reader is encouraged to read either the paper for the technical details or Goldreich’s exposition [Gol21] for a less group-theoretic presentation.

What are locally testable codes (LTCs)?

Let us begin by recalling what a code is. A code \(\mathcal{C}\) with blocklength \(n\) over the Boolean alphabet refers to a subset of \(\{0,1\}^n\), and the elements of \(\mathcal{C}\) are usually referred to as codewords. The rate of the code \(\mathcal{C}\), denoted by \(R(\mathcal{C})\), is \(\frac{\log_2|\mathcal{C}|}{n}\) while the (fractional) distance, denoted by \(\delta(\mathcal{C})\), is the minimum fractional Hamming distance between any two distinct codewords in \(\mathcal{C}\). In this note, we will restrict our attention to linear codes, where the underlying alphabet \(\{0,1\}\) is identified with the field \(GF(2)\) and the code \(\mathcal{C}\) is a \(GF(2)\)-vector-space. In this case, the distance \(\delta\) is the minimum fractional weight of a nonzero codeword in \(\mathcal{C}\). If a code \(\mathcal{C}\) is linear, it can easily be described by the set of (dual) constraints that describe the vector space \(\mathcal{C}\). These constraints are usually referred to as parity checks. If, furthermore, each of these parity checks has constant arity (i.e., each constraint involves only a constant number of codeword locations), then the code is said to be a low-density parity-check code (LDPC). Given a string \(x \in \{0,1\}^n\) and a subset \(Q \subset [n]\) of the coordinate locations, \(x|_Q\) refers to the projection of the string \(x\) to the coordinates in \(Q\). The projected code \(\mathcal{C}|_Q\) is defined as the set of projected codewords, formally, \[\mathcal{C}|_Q := \{ x \in \{0,1\}^Q \colon \exists y \in \{0,1\}^{[n]\setminus Q} \text{ such that } (x,y) \in \mathcal{C}\}.\]

A linear code \(\mathcal{C}\) is said to be \(q\)-locally testable1 if there exists a distribution \(\mathcal{Q}\) over subsets \(Q \subset [n]\) of size at most \(q\) such that the following is satisfied. \[\text{For every } x \in \{0,1\}^n, \text{ we have } \Pr_{Q \sim \mathcal{Q}}[ x|_Q \not\in \mathcal{C}|_Q] = \Omega(\delta(x,\mathcal{C})).\] Here, \(\delta(x,\mathcal{C})\) refers to \(\min_{c \in \mathcal{C}}\delta(x,c)\), the fractional Hamming distance between \(x\) and the code \(\mathcal{C}\).
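As a toy illustration of these definitions (emphatically not a \(c^3\) LTC, since the Hamming code family has vanishing rate), here is a Python sketch of the [7,4,3] Hamming code presented by three arity-4 parity checks, together with the natural tester that queries the coordinates of a single random parity check.

```python
import itertools
import random

H = [  # parity-check matrix over GF(2); each row is a parity check of arity 4
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]

codewords = [c for c in itertools.product([0, 1], repeat=7)
             if all(sum(h * b for h, b in zip(row, c)) % 2 == 0 for row in H)]

def delta(x, words):
    """Fractional Hamming distance from x to the nearest codeword."""
    return min(sum(a != b for a, b in zip(x, c)) for c in words) / len(x)

def local_test(x):
    """Query only the (at most 4) coordinates of one random parity check."""
    row = random.choice(H)
    support = [i for i, h in enumerate(row) if h]
    return sum(x[i] for i in support) % 2 == 0

x = (1, 0, 0, 0, 0, 0, 0)  # a word at fractional distance 1/7 from the code
rejection_rate = sum(not local_test(x) for _ in range(10_000)) / 10_000
print(delta(x, codewords), rejection_rate)
```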

In general, we will be interested in constructing a family \(\{\mathcal{C}_n\}_{n=1}^{\infty}\) of codes with increasing blocklengths. The holy grail in this area, which was attained by the recent constructions, was to obtain a family of codes that had constant rate and constant fractional distance and were testable with a constant number of queries. Codes that satisfy the first two properties (constant rate and constant fractional distance) are usually referred to as good codes. We will first take a detour and see how good codes are constructed and then return to the question of \(c^3\) LTCs.

CONTINUE READING

The Blind Men and the Quantum Elephants

by Chinmay Nirkhe (IBM Research and UC Berkeley)

In the parable of the blind men and the elephant, a group of blind men encounters a very large object, not knowing that it is in fact an elephant. They start exploring the object by touch, with each blind man trying to understand the small fraction of the object in front of him. But each blind man feels a different part of the elephant, such as its trunk, its tusks, its feet, etc. For instance, one feels the elephant’s trunk and remarks, “This feels like a thick snake.” Each develops a different view as to what the object he encounters must be, as each of the blind men is offered only a small perspective on the global object.

Only once the blind men bicker and argue over their different views do they finally agree that they are observing something greater and finally recognize that the object they are observing is an elephant. In other words, only from the union of the small local views — each local view being indistinguishable from other objects (such as the elephant’s trunk and a snake) — are the blind men able to surmise that the global object in front of them is an elephant.

Today, we consider a variation on the parable more suited for our modern understanding of physics. In this parable, instead of one object that the blind men observe, there are two. The blind men approach the first object, and each man goes to his location around the object. The first man feels the front of the object and again remarks, “This feels like a thick snake,” while the other men then proceed to feel the object in front of them and make observations as to what they feel. The men then proceed to the second object and again proceed to make observations as to what they feel. The first man again remarks, “This feels like a thick snake,” and, curiously, each blind man makes the exact same observation for the second object as he did for the first. The blind men confer and note that since each man’s observations were the same for both objects and together they had observed the entirety of both objects, then the two objects must be the same and therefore the same elephant. They must have been presented with the same elephant twice.

But what if the blind men were wrong and they were truly observing two very different objects? Is it possible that the two objects were very different but were the same under every local observation? Well, it is possible if they were quantum elephants! An amazing property of two pure quantum objects is that their local views can be exactly the same and yet globally the states are very different (even orthogonal). We call such quantum objects locally indistinguishable, and they are fundamentally interesting objects in quantum information theory. For one, due to being locally indistinguishable, these quantum states (objects) have the property that their entanglement is nontrivial — in this case, this means that for the state to be expressed as the output of a quantum circuit, the depth of the circuit required is large. If a high-depth quantum circuit is required, then this means that the state is far from all classical states and is fundamentally quantum.
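
A textbook example, kept deliberately small here and separate from the Hamiltonian question raised below, is the pair of n-qubit cat states |0…0⟩ ± |1…1⟩: they are orthogonal, yet every reduced density matrix on fewer than n qubits is identical. The NumPy sketch below checks both facts directly for n = 4.

```python
import numpy as np

n = 4
zeros = np.zeros(2**n); zeros[0] = 1.0    # |0...0>
ones = np.zeros(2**n); ones[-1] = 1.0     # |1...1>
plus = (zeros + ones) / np.sqrt(2)        # cat state |0...0> + |1...1>
minus = (zeros - ones) / np.sqrt(2)       # cat state |0...0> - |1...1>

def reduced_density(state, keep):
    """Trace out all qubits not listed in `keep` and return the reduced density matrix."""
    psi = state.reshape([2] * n)
    traced = [i for i in range(n) if i not in keep]
    rho = np.tensordot(psi, psi.conj(), axes=(traced, traced))
    dim = 2 ** len(keep)
    return rho.reshape(dim, dim)

print("overlap:", abs(plus @ minus))                        # 0: globally orthogonal
for keep in [[0], [0, 1], [1, 2, 3]]:
    same = np.allclose(reduced_density(plus, keep), reduced_density(minus, keep))
    print("qubits", keep, "identical marginals:", same)     # True on any proper subset
```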

Illustration by Chinmay Nirkhe

Locally indistinguishable quantum states are easy to concoct, but it is far more interesting when locally indistinguishable quantum states are “naturally” occurring. A physicist might phrase this as asking for a local Hamiltonian system (i.e., a quantum energy landscape) with the property that after cooling the system to its minimum energy (temperature), the state of the system is a locally indistinguishable quantum state — i.e., is there any hope that a group of blind physicists might go on a trek through the quantum jungle and stumble upon a pair of quantum elephants?

CONTINUE READING

Where Do Q-Functions Come From?

by Sean Meyn (University of Florida) and Gergely Neu (Pompeu Fabra University)

One theoretical foundation of reinforcement learning is optimal control, usually rooted in the Markovian variety known as Markov decision processes (MDPs). The MDP model consists of a state process, an action (or input) process, and a one-step reward that is a function of state and action. The goal is to obtain a policy (function from states to actions) that is optimal in some predefined sense. Chris Watkins introduced the Q-function in the 1980s as part of a methodology for reinforcement learning. Given its importance for over three decades, it is not surprising that the question of the true meaning of Q was a hot topic for discussion during the Simons Institute’s Fall 2020 program on Theory of Reinforcement Learning.

This short note focuses on interactions at the start of the program, and research directions inspired in part by these interactions. To start with, who is Q? Was this code for one of Watkins’ friends at Cambridge? The question was posed early on, which led to an online investigation. The mystery was shattered through a response from Chris: we now know that the letter Q stands for quality, not Quinlyn or Quincy. To discuss further questions and potential answers requires some technicalities.

The discounted-cost optimality criterion is a favorite metric for performance in computer science and operations research, and is the setting of the original Q-function formulation. The definition requires a state process \(\{X_k : k\ge 0\}\) and an action (or input) process \(\{A_k : k\ge 0\}\), evolving on respective spaces (which are assumed discrete in this note). There is a controlled transition matrix \(P\) that describes dynamics: \(X_{k+1}\) is distributed according to \(P(\cdot|x,a)\) when \(X_k=x\) and \(A_k=a\), for any action sequence that is adapted to the state sequence.

With \(\gamma\) denoting the discount factor, the Q-function is the solution to a nonlinear fixed-point equation \(T^*Q = Q\) in which \(T^*\) is the Bellman operator: \[\left(T^*Q\right)(x,a) = r(x,a) + \gamma \mathbb{E}_{X'\sim P(\cdot|x,a)}\left[\max_{a'} Q(X',a')\right]\] This must hold for each state-action pair \((x,a)\), with the maximum over all possible actions. This is a version of the dynamic programming (DP) equation that has been with us for about seven decades.

The magic of Q-learning, which is based on this DP equation, is that the maximum appears within an expectation. This makes possible the application of Monte Carlo methods to obtain an approximate solution based solely on observations of the actual system to be controlled, or through simulations.
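
As a concrete illustration, here is a tabular Q-learning sketch on a made-up two-state, two-action MDP: the update uses only sampled transitions, yet it drives the table toward the fixed point of the Bellman operator above. The MDP, step-size schedule, and iteration count are arbitrary illustrative choices, and this is plain Watkins-style Q-learning, not the logistic Q-learning mentioned below.

```python
import random

states, actions, gamma = [0, 1], [0, 1], 0.9
# P[s][a] = list of (probability, next_state); r[s][a] = one-step reward.
P = {0: {0: [(0.9, 0), (0.1, 1)], 1: [(0.2, 0), (0.8, 1)]},
     1: {0: [(0.7, 0), (0.3, 1)], 1: [(0.1, 0), (0.9, 1)]}}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}

def sample_next(s, a):
    """Sample the next state from the transition distribution P(. | s, a)."""
    u, acc = random.random(), 0.0
    for prob, s2 in P[s][a]:
        acc += prob
        if u <= acc:
            return s2
    return P[s][a][-1][1]

Q = {(s, a): 0.0 for s in states for a in actions}
s = 0
for k in range(200_000):
    a = random.choice(actions)                   # exploratory behavior policy
    s2 = sample_next(s, a)
    target = r[s][a] + gamma * max(Q[(s2, b)] for b in actions)  # max inside a sample
    alpha = 1.0 / (1 + k / 1000)                 # slowly decaying step size
    Q[(s, a)] += alpha * (target - Q[(s, a)])    # stochastic fixed-point update
    s = s2
print({key: round(v, 2) for key, v in Q.items()})
```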

One core idea of modern reinforcement learning (RL) is to find approximate solutions of the DP equation within a function class (e.g., neural networks, as popularized by the deep Q-learning approach of Mnih et al., 2015). While success stories are well-known, useful theory is scarce: we don’t know if a solution exists to an approximate DP equation except in very special settings, and we don’t know if a good approximation will lead to good performance for the resulting policy. We don’t even know if the recursive algorithms that define Q-learning will be stable — estimates may diverge to infinity.

There are many ways to read these negative results, and indeed many articles have been written around this subject. Our own reading is probably among the most radical: without understanding the issues around the existence of solutions to these DP equation approximations or their interpretation, we should search for alternative approximations of dynamic programming suitable for application in RL.

These concerns were raised at Sean Meyn’s boot camp lecture, where he called on listeners to revisit an alternate foundation of optimal control: the linear programming (LP) approach introduced by Manne (1960) and further developed by Denardo (1970) and d’Epenoux (1963). The message was greeted with enthusiasm from some attendees, including Gergely Neu, who responded, “You have blown my mind!” He had been working on his own formulation of this idea, which became logistic Q-learning (more on this below).

CONTINUE READING