Lattice Blog Reduction – Part I: BKZ

Posted on April 1, 2020 by Michael Walter

This is the first entry in a (planned) series of at least three, potentially four or five, posts about lattice block reduction. The purpose of this series is to give a high level introduction to the most popular algorithms and their analysis, with pointers to the literature for more details. The idea is to start with the obvious – the classic BKZ algorithm. In the next two posts we will look at two lesser-known algorithms, which allow us to highlight useful tools in lattice reduction. These three posts will focus on provable results. I have not decided how to proceed from there, but I could see the series being extended to topics involving heuristic analyses, practical considerations, and/or a survey of more exotic algorithms that have been considered in the literature.

Target Audience

I will assume that readers of this series are already familiar with basic concepts of lattices, e.g. bases, determinants, successive minima, Minkowski’s bound, Gram-Schmidt orthogonalization, dual lattices and dual bases, etc. If any of these concepts seem new to you, there are great resources to familiarize yourself with them first (see e.g. lecture notes by Daniele, Oded, Daniel/Léo). It will probably help if you are familiar with the LLL algorithm (also covered in the aforementioned notes), but I’ll try to phrase everything so it is understandable even if you aren’t.

Ok, so let’s get started. Before we look at BKZ in particular, first some comments about lattice block reduction in general.

The Basics

The Goal

Why would anyone use block reduction? There are (at least) two reasons.

1) Block reduction allows you to find short vectors in a lattice. Recall that finding the shortest vector in a lattice (i.e. solving SVP) is really hard (as far as we know, this takes at least $2^{\Omega(n)}$ time or even $n^{\Omega(n)}$ if you are not willing to also spend exponential amounts of memory). On the other hand, finding somewhat short vectors that are longer than the shortest vector by “only” an exponential factor is really easy (see LLL). So what do you do if you need something that is shorter than what LLL gives you, but you don’t have enough time to actually find the shortest vector? (This situation arises practically every time you use lattice reduction for cryptanalysis.) You can try to find something in between and hope that it doesn’t take as long. This is where block reduction comes in: it gives you a smooth trade-off between the two settings. It is worth mentioning that when it comes to approximation algorithms, block reduction is essentially the only game in town, i.e. there are, as far as I know, no non-trivial approximation algorithms that cannot be viewed as block reduction. (In fact, this is related to an open problem that Noah stated during the program: to come up with a non-trivial approximation algorithm that does not rely on a subroutine to find the shortest lattice vector in smaller dimensions.) The only exceptions to this are quantum algorithms that are able to find subexponential approximations in polynomial time in lattices with certain (cryptographically highly relevant) structure (see [CDPR16] and follow-up work).

2) Block reduction actually gives you more than just short vectors. It gives you guarantees on the “quality” of the basis. What do we mean by the quality of the basis? Consider the Gram-Schmidt vectors ${\mathbf{b}}_i^*$ (GSO vectors) associated to a lattice basis ${\mathbf{B}}$. What we want is that the length of these Gram-Schmidt vectors (the GSO norms) does not drop off too quickly. The reason why this is a useful measure of quality for lattice bases is that it gives a sense of how orthogonal the basis vectors are: conditioned on being bases of the same lattice, the less accentuated the drop off in the GSO vectors, the more orthogonal the basis, and the more useful this basis is to solve several problems in a lattice. In fact, recall that the product of the GSO norms is equal to the determinant of the lattice and thus remains constant. Accordingly, if the GSO norms do not drop off too quickly, the first vector can be shown to be relatively short. So by analyzing the quality of the basis that block reduction achieves, a guarantee on the length of the first vector comes for free (see goal 1)). If you are familiar with the analysis of LLL, this should not come as a surprise to you.

Tools

In order to ensure that the GSO norms do not drop off too quickly, it seems useful to be able to reduce them locally. To this end, we will work with projected lattice blocks (this is where the term “block” in block reduction comes from). More formally, given a basis ${\mathbf{B}}$ we will consider the block ${\mathbf{B}}_{[i,j]}$ for $i < j$ as the basis formed by the basis vectors ${\mathbf{b}}_i, {\mathbf{b}}_{i+1}, \dots, {\mathbf{b}}_{j}$ projected orthogonally to the first $i-1$ basis vectors. So ${\mathbf{B}}_{[i,j]}$ is a basis for the lattice given by the sublattice formed by ${\mathbf{b}}_1, {\mathbf{b}}_{2}, \dots, {\mathbf{b}}_{j}$ projected onto the orthogonal complement of the span of ${\mathbf{b}}_1, {\mathbf{b}}_{2}, \dots, {\mathbf{b}}_{i-1}$. Notice that the first vector of ${\mathbf{B}}_{[i,j]}$ is exactly ${\mathbf{b}}^*_i$ – the $i$-th GSO vector. Another way to view this is to consider the QR-factorization ${\mathbf{B}} = {\mathbf{Q}} {\mathbf{R}}$, where ${\mathbf{B}}$ is the matrix whose columns are the basis vectors ${\mathbf{b}}_i$. Since ${\mathbf{Q}}$ is orthogonal, it represents a rotation of the lattice and we can consider the lattice generated by the columns of ${\mathbf{R}}$ instead, which is an upper triangular matrix. For an upper triangular basis, the projection of a basis vector orthogonal to the previous basis vectors simply results in dropping the first entries from the vector. So considering a projected block ${\mathbf{R}}_{[i,j]}$ simply means considering the square submatrix of ${\mathbf{R}}$ consisting of the rows and columns with index $\ell$ satisfying $i \leq \ell \leq j$.
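To make the definition concrete, here is a minimal Python sketch, for illustration only. It computes GSO vectors with exact rational arithmetic and forms a projected block. It stores basis vectors as rows rather than columns, the helper names `gram_schmidt` and `projected_block` are hypothetical, and the example basis is arbitrary.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the GSO vectors b*_1, ..., b*_n of a basis given as rows."""
    gso = []
    for b in basis:
        b = [Fraction(x) for x in b]
        v = list(b)
        for g in gso:
            mu = dot(b, g) / dot(g, g)  # Gram-Schmidt coefficient
            v = [x - mu * y for x, y in zip(v, g)]
        gso.append(v)
    return gso


def projected_block(basis, i, j):
    """Return B_[i,j]: b_i, ..., b_j projected orthogonally to b_1, ..., b_{i-1}.

    Indices are 1-based, as in the text.
    """
    prefix_gso = gram_schmidt(basis[: i - 1])
    block = []
    for b in basis[i - 1 : j]:
        v = [Fraction(x) for x in b]
        for g in prefix_gso:
            mu = dot(b, g) / dot(g, g)
            v = [x - mu * y for x, y in zip(v, g)]
        block.append(v)
    return block


# Example basis, chosen arbitrarily for illustration.
B = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]
gso = gram_schmidt(B)
block = projected_block(B, 2, 3)

# The first vector of B_[2,3] is the second GSO vector b*_2.
print(block[0] == gso[1])
# The GSO vectors of the projected block are b*_2 and b*_3.
print(gram_schmidt(block) == gso[1:3])
```

Exact fractions avoid the precision issues that floating-point Gram-Schmidt would introduce; a practical implementation would handle precision differently.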

Now we need a tool that allows us to control these GSO vectors, which we view as the first basis vectors in projected sublattices. For this, we will fall back to algorithms that solve SVP. Recall that this is very expensive, so we will not call this on the basis ${\mathbf{B}}$ but rather on the projected blocks ${\mathbf{B}}_{[i,j]}$, where we ensure that the dimension $k = j-i+1$ of the lattice generated by this projected block is not too large. In fact, the maximum dimension $k$ that we call the SVP algorithm on will control the time/quality trade-off achieved by our block reduction algorithms and is usually called the block size. So we will assume that we have access to such an SVP algorithm. Actually, we will assume something slightly stronger: we will assume access to a subroutine that takes as input the basis ${\mathbf{B}}$ and indices $i,j$ and outputs a basis ${\mathbf{C}}$ such that

the lattice generated by the basis remains the same
the first $i-1$ vectors and the vectors from index $j+1$ onward remain unchanged
the projected block ${\mathbf{C}}_{[i,j]}$ is SVP reduced, meaning that ${\mathbf{c}}^*_i$ is the shortest vector in the lattice generated by ${\mathbf{C}}_{[i,j]}$. Additionally, if ${\mathbf{B}}_{[i,j]}$ is already SVP reduced, we assume that the basis ${\mathbf{B}}$ is left unchanged.

We will call an algorithm that achieves this an SVP oracle. Such an oracle can be implemented given any algorithm that solves SVP (for arbitrary lattices). The technical detail of filling in the gap is left as homework to the reader.

Effect of a call to the SVP oracle. GSO log norms of the input in black, of the output in red. Note that the sum of the GSO log norms is a constant, so reducing the first vector increases the (average of the) remaining vectors.

For the analysis we need to know what such an SVP oracle buys us. This is where Minkowski’s theorem comes in: we know that for any $n$-dimensional lattice $\Lambda$ we have $\lambda_1(\Lambda) \leq \sqrt{\gamma_n} \det(\Lambda)^{1/n}$ (where $\lambda_1(\Lambda)$ is the length of the shortest vector in $\Lambda$ and $\gamma_n = \Theta(n)$ is Hermite’s constant). This tells us that after we’ve applied the SVP oracle to a projected block ${\mathbf{B}}_{[i,i+k-1]}$, we have \[\|{\mathbf{b}}^*_i \| \leq \sqrt{\gamma_{k}} \left(\prod_{j = i}^{i+k-1} \|{\mathbf{b}}_j^* \| \right)^{1/k}.\] Almost all of the analyses of block reduction algorithms, at least in terms of their output quality, rely on this single inequality.
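As a small sanity check of this inequality, consider block size $k=2$, where the oracle can be instantiated by the classical Lagrange (Gauss) reduction of a two-dimensional basis. The sketch below is an illustration under that assumption, not part of the analysis; the function name `lagrange_reduce` and the input basis are hypothetical choices, and it uses the known value $\gamma_2 = 2/\sqrt{3}$.

```python
import math


def dot(u, v):
    """Inner product of two 2-dimensional vectors."""
    return u[0] * v[0] + u[1] * v[1]


def lagrange_reduce(u, v):
    """Lagrange (Gauss) reduction of a 2-dimensional integer basis.

    Returns a basis (u, v) of the same lattice in which u is a shortest
    nonzero lattice vector.
    """
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        # Nearest integer to <u, v> / <u, u>, computed exactly.
        m = (2 * dot(u, v) + dot(u, u)) // (2 * dot(u, u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u


b1, b2 = lagrange_reduce((31, 59), (37, 70))
# In dimension 2, |det| equals the product of the two GSO norms.
det = abs(b1[0] * b2[1] - b1[1] * b2[0])
gamma_2 = 2 / math.sqrt(3)  # Hermite's constant in dimension 2

# Squared form of the inequality for k = 2: ||b1||^2 <= gamma_2 * det.
print(dot(b1, b1) <= gamma_2 * det)
```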

Disclaimer

Before we finally get to talk about BKZ, I want to remark that throughout this series I will punt on a technical (but very important) topic: the number of arithmetic operations (outside of the oracle calls) and the size of the numbers. The number of arithmetic operations is usually not a problem, since it will be dominated by the calls to the SVP oracle. We will only compute projections of sublattices corresponding to projected blocks as described above to pass them to the oracle, which can be done efficiently using the Gram-Schmidt orthogonalization. The size of the numbers is a more delicate issue. We need to ensure that the required precision for these projections does not explode somehow. This is usually addressed by interleaving the calls to the SVP oracle with calls to LLL. If you are familiar with the LLL algorithm, it should be intuitive that this allows us to control the size of the numbers. For a clean example of how this can be handled, we refer to e.g. [GN08a]. So, in summary, we will measure the running time of our algorithms throughout simply in the number of calls to the SVP oracle.

BKZ

Schnorr [S87] introduced the concept of BKZ reduction in the 80’s as a generalization of LLL. The first version of the BKZ algorithm as we consider it today was proposed by Schnorr and Euchner [SE94] a few years later. With our setup above, the algorithm can be described in a very simple way. Let ${\mathbf{B}}$ be a lattice basis of an $n$-dimensional lattice and $k$ be the block size. Recall that this is a parameter that will determine the time/quality trade-off as we shall see in the analysis. We start by calling the SVP oracle on the first block ${\mathbf{B}}_{[1,k]}$ of size $k$. Once this block is SVP reduced, we shift our attention to the next block ${\mathbf{B}}_{[2,k+1]}$ and call the oracle on that. Notice that SVP reduction of ${\mathbf{B}}_{[2,k+1]}$ may change the lattice generated by ${\mathbf{B}}_{[1,k]}$ and ${\mathbf{b}}_1$ may not be the shortest vector in the first block anymore, i.e. it can potentially be reduced even further. However, instead of going back and fixing that, we will simply leave this as a problem to “future us”. For now, we continue in this fashion until we reach the end of the basis, i.e. until we have called the oracle on ${\mathbf{B}}_{[n-k+1,n]}$. Note that so far this can be viewed as considering a constant sized window moving from the start of the basis to the end and reducing the first vector of the projected block in this window as much as possible using the oracle. Once we have reached the end of the basis, we start reducing the window size, i.e. we call the oracle on ${\mathbf{B}}_{[n-k+2,n]}$, then on ${\mathbf{B}}_{[n-k+3,n]}$, etc. This whole process is called a BKZ tour.

Now that we have finished a tour, it is time to go back and fix the blocks that are not SVP reduced anymore. We do this simply by running another tour. Again, if the second tour modified the basis, there is no guarantee that all the blocks are SVP reduced. So we simply repeat, and repeat, and … you get the idea. We run as many tours as required until the basis does not change anymore. That’s it. If this looks familiar to you, that’s not a coincidence: if we plug in $k=2$ as our block size, we obtain (a version of) LLL! So BKZ is a proper generalization of LLL.

BKZ in one picture: apply the SVP oracle to the projected blocks from start to finish and when you reach the end, repeat.
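The control flow described above can be sketched in a few lines of Python. This is only a skeleton under stated assumptions: `svp_oracle` is a caller-supplied placeholder with the three properties listed earlier (in particular, it returns its input unchanged if the block is already SVP reduced), the names `bkz_tour_blocks` and `bkz` are hypothetical, and the sketch assumes a tour ends with the two-dimensional block $[n-1, n]$. The interleaved LLL calls are omitted.

```python
def bkz_tour_blocks(n, k):
    """Return the 1-indexed blocks [i, j] visited by one BKZ tour.

    The window has size k until it hits the end of the basis, then shrinks.
    """
    return [(i, min(i + k - 1, n)) for i in range(1, n)]


def bkz(basis, k, svp_oracle):
    """Run BKZ tours until a full tour leaves the basis unchanged.

    svp_oracle(basis, i, j) is a placeholder for the SVP oracle from the
    text; it is an assumption of this sketch, not a real library call.
    """
    n = len(basis)
    while True:
        changed = False
        for i, j in bkz_tour_blocks(n, k):
            new_basis = svp_oracle(basis, i, j)
            if new_basis != basis:
                changed = True
            basis = new_basis
        if not changed:
            return basis


print(bkz_tour_blocks(6, 3))
# [(1, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
```

Note that the termination test relies on the oracle leaving an already reduced block untouched; without that property the loop could run forever.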

The obvious questions now are: what can we expect from the output? And how long does it take?

The Good

We will now take a closer look at the approximation factor achieved by BKZ. If you want to follow this analysis along, you might want to get out pen and paper. Otherwise, feel free to trust me on the calculations (I wouldn’t!) and/or jump ahead to the end of this section for the result (no spoilers!). Let’s assume for now that the BKZ algorithm terminates. If it does, we know that the projected block ${\mathbf{B}}_{[i, i+k-1]}$ is SVP reduced for every $i \in [1,\dots,n-k+1]$. This means that we have \[\|{\mathbf{b}}^*_i \|^k \leq \gamma_{k}^{k/2} \prod_{j = i}^{i+k-1} \|{\mathbf{b}}_j^* \|\] for all these $n-k+1$ values of $i$. Multiplying all of these inequalities and canceling terms gives the inequality \[\|{\mathbf{b}}^*_1 \|^{k-1}\|{\mathbf{b}}^*_2 \|^{k-2} \dots \|{\mathbf{b}}^*_{k-1} \| \leq \gamma_{k}^{\frac{(n-k+1)k}{2}} \|{\mathbf{b}}_{n-k+2}^* \|^{k-1} \|{\mathbf{b}}_{n-k+3}^* \|^{k-2} \dots \|{\mathbf{b}}_{n}^* \|.\] Now we make two more observations: 1) not only is ${\mathbf{B}}_{[1, k]}$ SVP reduced, but so is ${\mathbf{B}}_{[1, i]}$ for every $i < k$. (Why? Think about it for 2 seconds!) This means we can multiply the inequalities \[\|{\mathbf{b}}^*_1 \|^i \leq \gamma_{i}^{i/2} \prod_{j = 1}^{i} \|{\mathbf{b}}_j^* \|\] for all $i \in [2,k-1]$ together with the trivial inequality $\|{\mathbf{b}}^*_1 \| \leq \|{\mathbf{b}}^*_1 \|$, which gives \[\|{\mathbf{b}}^*_1 \|^{\frac{k(k-1)}{2}} \leq \left(\prod_{i = 2}^{k-1} \gamma_{i}^{i/2} \right) \prod_{i = 1}^{k-1} \|{\mathbf{b}}_i^* \|^{k-i}.\] 2) We use the fact that $\gamma_k^k \geq \gamma_i^i$ for all $i \leq k$ (Why? Homework!) and combine with our long inequality above to get \[\|{\mathbf{b}}^*_1 \|^{\frac{k(k-1)}{2}} \leq \gamma_k^{\frac{k(n-1)}{2}} \|{\mathbf{b}}_{n-k+2}^* \|^{k-1} \|{\mathbf{b}}_{n-k+3}^* \|^{k-2} \dots \|{\mathbf{b}}_{n}^* \|.\] (I’m aware that this is a lengthy calculation for a blog post, but we’re almost there, so bear with me. It’s worth it!)
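For readers checking the exponents, here is a sketch of the bookkeeping in the last combination step, using only the inequalities above. The product of GSO norms obtained from the inequalities on ${\mathbf{B}}_{[1,i]}$ is $\prod_{i=1}^{k-1} \|{\mathbf{b}}_i^*\|^{k-i}$, which is exactly the left-hand side of the long inequality, so it can be replaced by that inequality's right-hand side. The powers of Hermite's constant then combine as \[\prod_{i=2}^{k-1} \gamma_i^{i/2} \leq \gamma_k^{\frac{k(k-2)}{2}}, \qquad \frac{k(k-2)}{2} + \frac{(n-k+1)k}{2} = \frac{k(n-1)}{2},\] where the first bound applies $\gamma_i^{i} \leq \gamma_k^{k}$ to each of the $k-2$ factors.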

We now use one final observation, which is a pretty common trick in lattice algorithms: w.l.o.g. assume that for some shortest vector ${\mathbf{v}}$ in our lattice its projection orthogonal to the first $n-1$ basis vectors is non-zero (if it is zero for all of the shortest vectors, simply drop the last vector from the basis, the result is still BKZ reduced, so use induction). Then we must have that $\lambda_1 = \| {\mathbf{v}} \| \geq \|{\mathbf{b}}_i^* \|$ for all $i \in [n-k+2, \dots, n]$, since otherwise the projected block ${\mathbf{B}}_{[i,n]}$ would not be SVP reduced. This means we have $\lambda_1 \geq \max_{i \in [n-k+2, \dots, n]} \|{\mathbf{b}}_i^* \|$. This is the final puzzle piece to get our approximation bound: \[\|{\mathbf{b}}^*_1 \| \leq \gamma_{k}^{\frac{n-1}{k-1}} \lambda_1.\] Note that this analysis (dating back to Schnorr [S94]) is reminiscent of the analysis of LLL and if we plug in $k=2$, we get exactly what we’d expect from LLL. Though we do note a gap in the other extreme: if we plug in $k=n$, we know that the approximation factor is $1$ (we are solving SVP in the entire lattice), but the bound above yields a factor $\gamma_n = \Theta(n)$.
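Spelled out, the last step is the following (a sketch using only the inequalities stated above). Bounding each of the last $k-1$ GSO norms by $\lambda_1$ gives \[\|{\mathbf{b}}^*_1 \|^{\frac{k(k-1)}{2}} \leq \gamma_k^{\frac{k(n-1)}{2}} \prod_{i=n-k+2}^{n} \|{\mathbf{b}}_i^* \|^{n-i+1} \leq \gamma_k^{\frac{k(n-1)}{2}} \lambda_1^{(k-1) + (k-2) + \dots + 1} = \gamma_k^{\frac{k(n-1)}{2}} \lambda_1^{\frac{k(k-1)}{2}},\] and taking the $\frac{k(k-1)}{2}$-th root on both sides yields the bound. For $k=2$, using the known value $\gamma_2 = 2/\sqrt{3}$, the factor becomes \[\gamma_2^{\,n-1} = \left(\tfrac{4}{3}\right)^{\frac{n-1}{2}},\] which is the familiar LLL approximation factor in the limiting case $\delta = 1$ of the LLL parameter.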

The Bad

Now that we’ve looked at the output quality of the basis, let’s see what we can say about the running time (recall that our focus is on the number of calls to the SVP oracle). The short answer is: not much and that’s very unfortunate. Ideally, we’d want a bound on the number of SVP calls that is polynomial in $n$ and $k$. This would mean that the overall running time for large $k$ is dominated by the running time of the SVP oracle in dimension $k$ and the block size would give us exactly the expected trade-off. However, an LLL style analysis has so far only yielded a bound on the number of tours which is $O(k^n)$ [HPS11, Appendix]. This is quite bad – for large $k$ the number of calls will be the dominating factor in the running time.

The Ugly

Recall that the analysis of LLL does not only provide a bound on the approximation factor, but also on the Hermite factor, i.e. on the ratio $\| {\mathbf{b}}_1\|/\det(\Lambda)^{1/n}$. Since an LLL-style analysis worked out nicely for the approximation factor of BKZ, it stands to reason that a similar analysis should yield a similar bound for the Hermite factor of BKZ. By extrapolating from LLL, one could expect a bound along the lines of $\| {\mathbf{b}}_1\|/\det(\Lambda)^{1/n} \leq \gamma_{k}^{n/(2k)}$ (note the square root improvement w.r.t. the trivial bound obtained from the approximation factor). And, in fact, a bound of $\gamma_{k}^{\frac{n-1}{2(k-1)} + 1}$ has been claimed in [GN08b] but without proof (as pointed out in [HPS11]) and it is not clear how one would prove this. ([GN08b] claims that one can use a similar argument as we did for the approximation factor, but I don’t see it.)

The Rescue

So it seems different techniques are necessary to complete the analysis of BKZ. The work of [HPS11] introduced such a new technique based on the analysis of dynamical systems. This work applied the technique successfully to BKZ, but the analysis is quite involved. What it shows is that one can terminate BKZ after a polynomial number of tours and still get a guarantee on the output quality, which is very close to the conjectured bound on the Hermite factor above. (Caveat: Technically, [HPS11] only showed this result for a slight variant of BKZ, but the difference to the standard BKZ algorithm only lies in the scope of the interleaving LLL applications, which is something that we glossed over above.) This is in line with experimental studies [SE94,GN08b,MW16], which show that BKZ produces high quality bases after a few tours already.

We will revisit this approach when considering a different block reduction variant, SDBKZ, where the analysis is much cleaner. As a teaser for the next post though, recall that BKZ can be viewed as a generalization of LLL (which corresponds to BKZ with block size $k=2$). Since the analysis of LLL did not carry over entirely to BKZ, one could wonder if there is a different generalization of LLL such that an LLL-style analysis also generalizes naturally. The answer to this is yes, and we will consider such an algorithm in the next post.

[CDPR16] Cramer, Ducas, Peikert, Regev. Recovering short generators of principal ideals in cyclotomic rings. EUROCRYPT 2016
[GN08a] Gama, Nguyen. Finding short lattice vectors within Mordell’s inequality. STOC 2008
[GN08b] Gama, Nguyen. Predicting lattice reduction. EUROCRYPT 2008
[HPS11] Hanrot, Pujol, Stehlé. Analyzing blockwise lattice algorithms using dynamical systems. CRYPTO 2011
[MW16] Micciancio, Walter. Practical, predictable lattice basis reduction. EUROCRYPT 2016
[SE94] Schnorr, Euchner. Lattice basis reduction: Improved practical algorithms and solving subset sum problems. Mathematical Programming 1994
[S87] Schnorr. A hierarchy of polynomial time lattice basis reduction algorithms. Theoretical Computer Science 1987
[S94] Schnorr. Block reduced lattice bases and successive minima. Combinatorics, Probability and Computing 1994

Workshop “Lattices: New Cryptographic Capabilities”

Posted on March 20, 2020 by Hoeteck Wee

On behalf of the organizers, I am excited to announce that the next Simons workshop Lattices: New Cryptographic Capabilities will take place next week, Mar 23-27, 2020, over Zoom!

Schedule: 8:20 am – noon PDT (4:20 – 8 pm CET)
Zoom: berkeley.zoom.us/j/912850168

The workshop will cover advanced lattice-based cryptographic constructions, while also highlighting some of the recurring themes and techniques, reiterated through a game of Bingo! The rest of this post provides a sneak preview along with the Bingo puzzle.

Looking forward to seeing everyone at the workshop!

Hoeteck, together with Shweta, Zvika and Vinod

Zoom Guidelines/Tips

To ask a question, use the “raise hand” feature.
If the speaker’s slide is not displaying in its entirety, try “side-by-side mode” under “view options”.
Please log in to Zoom with your full name.

A Sneak Preview

Let A₁, A₂ be square matrices and t a row vector such that

tA₁ = x₁t, tA₂ = x₂t
Using high-school algebra lingo, we would refer to t as the eigenvector of A₁, A₂. It is easy to see that

t ⋅ (A₁ + A₂) = (x₁ + x₂)t, t ⋅ A₁A₂ = x₁x₂t
This extends readily to any polynomial f(x₁, …, x_n), namely: if tA_i = x_it, then

t ⋅ f(A₁, …, A_n) = f(x₁, …, x_n)t
As it turns out, much of advanced lattice-based crypto boils down to a generalization of this statement! The generalization is along two orthogonal dimensions:

arbitrary matrices A₁, …, A_n that may not share the same eigenvector t, and
a relaxation to “approximate” equality, namely tA_i ≈ x_it.

The generalization underlies fully homomorphic encryption, homomorphic signatures, attribute-based encryption schemes and many more!
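The exact (non-approximate) identity t ⋅ f(A₁, A₂) = f(x₁, x₂)t is easy to check on a toy example. The following Python sketch is purely illustrative: the matrices, the vector t = (1, 1) and the polynomial f(X, Y) = XY + 2X + Y² are arbitrary choices made here, not objects from the workshop.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]


def vec_mat(t, A):
    """Row vector times matrix."""
    return [sum(t[i] * A[i][j] for i in range(len(t)))
            for j in range(len(A[0]))]


# Both matrices have t = (1, 1) as a left eigenvector: every column of A1
# sums to x1 = 3 and every column of A2 sums to x2 = 5.
t = [1, 1]
A1, x1 = [[1, 4], [2, -1]], 3
A2, x2 = [[0, 7], [5, -2]], 5


def f_matrix(X, Y):
    """f(X, Y) = X*Y + 2*X + Y^2 evaluated on matrices."""
    return mat_add(mat_add(mat_mul(X, Y), mat_scale(2, X)), mat_mul(Y, Y))


def f_scalar(x, y):
    """The same polynomial evaluated on scalars."""
    return x * y + 2 * x + y * y


lhs = vec_mat(t, f_matrix(A1, A2))
rhs = [f_scalar(x1, x2) * ti for ti in t]
print(lhs == rhs)
```

The two matrices do not commute, which shows that the identity only needs the shared eigenvector, not any further relation between A₁ and A₂.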

Bingo!

Here’s the 4×4 bingo puzzle:

GGH15	Bonsai	AR + G	noise growth
G⁻¹	LWE	Vinod	LHL
Gaussian	A_f	FHE Dec ≈ linear	noise flooding
homomorphic	trapdoor	smoothing parameter	H_f, x

Research Vignette: Foundations of Data Science

Posted on September 30, 2019 by Simons Institute Editor

by Ilias Diakonikolas (University of Southern California), Santosh Vempala (Georgia Institute of Technology), and David P. Woodruff (Carnegie Mellon University)

Algorithmic High-Dimensional Robust Statistics
Fitting a model to a collection of observations is one of the quintessential goals of statistics and machine learning. A major recent advance in theoretical machine learning is the development of efficient learning algorithms for various high-dimensional models, including Gaussian mixture models, independent component analysis, and topic models. The Achilles’ heel of these algorithms is the assumption that data is precisely generated from a model of the given type.

This assumption is crucial for the performance of these algorithms: even a very small fraction of outliers can completely compromise the algorithm’s behavior. For example, k adversarially placed points can completely alter the k-dimensional principal component analysis, a commonly used primitive for many algorithms. However, this assumption is only approximately valid, as real data sets are typically exposed to some source of contamination. Moreover, the data corruption is often systematic, and random models do not accurately capture the nature of the corruption. Hence, it is desirable that any estimator that is to be used in practice is stable in the presence of arbitrarily and adversarially corrupted data.

Indeed, the problem of designing outlier-robust estimators is natural enough that it is studied by a classical body of work, namely robust statistics, whose prototypical question is the design of estimators that perform well in the presence of corrupted data. This area of statistics was initiated by the pioneering works of Tukey and Huber in the 1960s and addresses a conceptual gap in Fisher’s theory of exact parametric inference — since parametric models are typically only approximately valid, robust statistics is essential to complete the theory. From a practical perspective, the question of how to make good inferences from data sets in which pieces of information are corrupted has become a pressing challenge. Specifically, the need for robust statistics is motivated by data poisoning attacks, in the context of adversarial machine learning, as well as by automatic outlier removal for high-dimensional data sets from a variety of applications.

Classical work in robust statistics pinned down the fundamental information-theoretic aspects of high-dimensional robust estimation, establishing the existence of computable and information-theoretically optimal robust estimators for fundamental problems. In contrast, until very recently, the computational complexity aspects of robust estimators were poorly understood. In particular, even for the basic problem of robustly estimating the mean of a high-dimensional data set, all known robust estimators were hard to compute (i.e., computationally intractable). In addition, the accuracy of the known efficient heuristics degrades quickly as the dimension increases. This state of affairs prompted the following natural question: can we reconcile robustness and computational efficiency in high-dimensional estimation?

Research Vignette: Lower Bounds in Computational Complexity

Posted on September 30, 2019 by Simons Institute Editor

by Rahul Santhanam (University of Oxford)

Computational complexity theory studies the possibilities and limitations of algorithms. Over the past several decades, we have learned a lot about the possibilities of algorithms. We live today in an algorithmic world, where the ability to reliably and quickly process vast amounts of data is crucial. Along with this social and technological transformation, there have been significant theoretical advances, including the discovery of efficient algorithms for fundamental problems such as linear programming and primality.

We know far less about the limitations of algorithms. From an empirical point of view, certain important problems seem inherently hard to solve. These include the satisfiability problem in logic, the Traveling Salesman Problem in graph theory, the integer linear programming problem in optimization, the equilibrium computation problem in game theory, and the protein folding problem in computational biology. What all these problems have in common is that it is easy to verify if a given solution is correct, but it seems hard to compute solutions. This phenomenon is encapsulated in the celebrated NP vs. P question, which asks if all problems with solutions verifiable in polynomial time (NP) can also be solved in polynomial time (P). The NP vs. P problem is important both mathematically, as evidenced by its inclusion in the list of Millennium Prize Problems by the Clay Mathematics Institute, and scientifically, since natural problems that arise in a variety of scientific contexts are in NP but not known to be in P. To make progress on NP vs. P and related questions, we need to show complexity lower bounds (i.e., prove that a given computational problem cannot be solved efficiently).

Complexity lower bounds are interesting for many reasons. First, from a pragmatic point of view, they map the boundaries of what is possible and, hence, save us from wasting our efforts on finding efficient solutions to problems beyond those boundaries. Second, lower bounds can be exploited algorithmically to design secure cryptographic protocols and efficiently convert randomized algorithms into deterministic ones. Thus, intriguingly, proving new limits on the power of computation also opens up new possibilities for computation! Third, and perhaps most importantly, lower bounds provide a deeper understanding of the nature of computation. An efficient solution to an algorithmic problem tells us something new about that specific problem, while a lower bound often tells us something new about the computational model and hence about algorithms in general.

The Fall 2018 Simons Institute program on Lower Bounds in Computational Complexity gathered together researchers working on different aspects of complexity lower bounds, in Boolean, algebraic, and interactive settings, with the goal of making progress on the major open problems concerning lower bounds. In some of these cases, such as the setting of interactive computation, good lower bounds are known for various problems. In other cases, such as the case of general Boolean circuits, very little is known. In the remainder of this article, I will briefly survey what is known about Boolean circuit lower bounds and describe a phenomenon called hardness magnification that provides some new insights. This line of work was developed partly during the course of the Lower Bounds program with my collaborators Igor Oliveira and Ján Pich.

Research Vignette: Real-Time Decision Making in Energy (RTDM-E)

Posted on April 18, 2019 by Simons Institute Editor

by Xinbo Geng (Texas A&M University), Swati Gupta (Massachusetts Institute of Technology), Tong Huang (Texas A&M University), and Le Xie (Texas A&M University)

The electric grid serves as one of the backbone infrastructure systems that support the well-being of billions of citizens in modern society. There has been a significant transformation of energy systems over the past decade, even though these changes have largely gone unnoticed due to affordable and reliable grid services for end-users in most developed nations. The transformation on the supply side mainly manifests itself through the decarbonization of the power supply, i.e. greater use of low-carbon energy sources. As an example, a recent National Bureau of Economic Research (NBER) study released in December 2018 suggested that US power-sector emissions have decreased by 45% since 2008. This dramatic change is fundamentally driven by two factors: (1) a sharp increase in the renewable-energy portfolio; and (2) the substantial replacement of coal-fired power plants with natural gas and other resources.

Due to the supply-side transformation, in order to maintain or improve the quality of electricity services to end-users, it is necessary to make use of real-time information that is becoming readily available thanks to the proliferation of information and communication technology (ICT) infrastructures.

Decision making in the electric grid enabled by real-time information occurs at multiple time scales, which makes the problem even more interesting. At the time scale of 15 minutes or longer, a central challenge lies in how to best allocate different resources that will meet the balance of the fluctuating demand at the lowest cost. The underlying mathematical problem is typically formulated as a nonlinear multi-stage optimization problem. Another big area of investment in real-time decision making (RTDM) is in the proliferation of globally synchronized sensors called synchrophasors. Compared with conventional SCADA systems, synchrophasors offer a two-orders-of-magnitude faster sampling rate, and synchronized time stamps for electrical variables. Therefore, they offer new opportunities to detect, locate, and classify anomalies that would have not been possible to capture by conventional means, as well as opportunities to avoid blackouts.

In what follows, we elaborate on the real-time decision-making challenges and opportunities at two different time scales and two different spatial scales. The first one deals with day-ahead decisions on phase balancing in the distribution grid. The second one studies the possibility of real-time localization of anomalies such as forced oscillation in large transmission systems.

A focal area of the Spring 2018 Simons Institute program on Real-Time Decision Making was formalizing and studying various issues in the modern energy grid pertaining to efficiency, reliability and security. Among the research themes, a good amount of effort was devoted to modernizing the optimization core of grid operations [4]. In addition, by tapping into expertise in the program in combinatorial algorithms, concrete collaboration occurred at the interface between combinatorial algorithms and grid optimization. For example, electricity is typically distributed radially to address security concerns, and this translates to selecting a spanning tree in a network with desirable properties. The project below on the distribution grid grew out of frequent conversations that the RTDM program facilitated for researchers with backgrounds in optimization and in power-systems engineering.

Research Vignette: Optimization Against Adversarial Uncertainty

Posted on June 28, 2018 by Simons Institute Editor

by James R. Lee, University of Washington

The rise of machine learning in recent decades has generated a renewed interest in online decision-making, where algorithms are equipped only with partial information about the world, and must take actions in the face of uncertainty about the future. There are various ways to analyze how much the presence of uncertainty degrades optimization. Two of the most general models (competitive analysis and multi-armed bandits) are “probability free” (aka “worst-case,” or “adversarial”), in the sense that one can measure the performance of algorithms without making distributional assumptions on the input.

Competitive analysis

One such model goes by the name of competitive analysis, and has been studied in theoretical CS since the 1970s. Imagine, for instance, that one maintains a binary search tree (BST) with key-value pairs at the nodes. At every point in time, a key is requested, and the corresponding value is looked up via binary search. Along the search path, the algorithm is allowed to modify the tree using standard tree rotation operations.

The cost of such an algorithm is the average number of binary search steps per key-lookup over a long sequence of requests. To minimize the cost, it is beneficial for the algorithm to dynamically rebalance the tree so that frequently-requested keys are near the root. Sleator and Tarjan had this model in mind when they invented the Splay Tree data structure. Their (still unproven) Dynamic Optimality Conjecture asserts that, for every sequence of key requests, the number of operations used by the Splay Tree is within a constant factor of the number of operations used by any binary search tree, even an offline algorithm that is allowed to see the entire sequence of key requests in advance. (This is a very strong requirement! We make no assumption on the structure of the request sequence, and have to compare our performance to an algorithm that sees the entire request sequence up front.)

In the language of competitive analysis, the conjecture states that the Splay Tree algorithm is “competitive.” In general, an $\alpha$-competitive online algorithm is one whose total cost on every input sequence is within an $\alpha$ factor of the cost incurred by the optimal offline algorithm that sees the whole input sequence in advance.

Multi-arm bandits

Another compelling model arising in online learning is the multi-armed bandit problem. Here, an agent has a set $\mathcal{A}$ of feasible actions. At time $t=1,2,3,\ldots$, the agent plays an action $a_t \in \mathcal{A}$, and only then the cost of that action is revealed. Imagine, for instance, choosing an advertisement $a_t \in \mathcal{A}$ to present to a user, and then learning afterward the probability it led to a sale. The goal of an algorithm in this model is to achieve a total cost (often called the loss) that is not too much worse than the best fixed action in hindsight. The gap between the two is called the regret.
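As a minimal sketch of the definition above, regret can be computed in hindsight from a table of costs. The function name `regret` and the layout `losses[t][a]` are illustrative assumptions, not notation from the text. In the bandit model the agent sees only the cost of the action it played, so the full table is used here purely for evaluation after the fact.

```python
def regret(losses, actions):
    """Regret of a played sequence against the best fixed action in hindsight.

    losses[t][a] is the cost of action a at time t (hypothetical layout);
    actions[t] is the action actually played at time t.
    """
    incurred = sum(losses[t][a] for t, a in enumerate(actions))
    n_actions = len(losses[0])
    # Total cost of each fixed action, had it been played at every step.
    best_fixed = min(sum(row[a] for row in losses) for a in range(n_actions))
    return incurred - best_fixed


# Toy instance with two actions and three rounds.
toy_losses = [[1, 0], [0, 1], [1, 0]]
toy_actions = [0, 1, 0]          # always picks the costly action
toy_regret = regret(toy_losses, toy_actions)
```

Here the agent pays 3 in total, while always playing the second action would have cost 1, so the regret is 2.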

Research Vignette: Ramsey Graphs and the Error of Explicit 2-Source Extractors

Posted on December 22, 2017 by Simons Institute Editor

by Amnon Ta-Shma, Tel Aviv University

About a century ago, Ramsey proved that in any graph of size N, there exists a monochromatic subset of size ½ log(N), i.e., it is either a clique or an independent set.¹ In 1947, Erdős proved there exist graphs with no monochromatic set of size 2log(N).² Erdős’ proof is one of the first applications of the probabilistic method. It is a simple, straightforward counting argument, and like many other counting arguments, it shows almost any graph is good without giving any clue as to any specific such graph. Erdős offered $100 for finding an explicit K-Ramsey graph (a graph where all sets of size K are not monochromatic) for K=O(log N). The best explicit construction until a few years ago was log(K)= for some constant α<1.³
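The counting argument can be sketched as a union bound over all K-subsets of a uniformly random graph on N vertices. This is a standard textbook rendering rather than Erdős’ original presentation, and logarithms are assumed to be base 2:

$$\Pr[\text{some } K\text{-set is monochromatic}] \;\le\; \binom{N}{K}\, 2^{\,1-\binom{K}{2}} \;\le\; \frac{N^{K}}{K!}\, 2^{\,1-K(K-1)/2} \;=\; \frac{2^{\,1+K/2}}{K!}\,\bigl(N\, 2^{-K/2}\bigr)^{K}.$$

For $K \ge 2\log_2 N$ we have $N\,2^{-K/2} \le 1$, so the bound is at most $2^{1+K/2}/K!$, which is below 1 for every $K \ge 3$ and tends to 0 as $K$ grows. Hence almost every graph has no monochromatic set of that size.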

All of this dramatically changed last year. Cohen,⁴ and independently Chattopadhyay and Zuckerman,⁵ constructed K-Ramsey graphs with log K=polyloglog(N). Remarkably, Chattopadhyay and Zuckerman explicitly construct the stronger object of a two-source extractor. A function E is a (K,ε) 2-source extractor if for every two cardinality K subsets A, B, the distribution E(A,B) (obtained by picking a ∈ A and b ∈ B uniformly and outputting E(a,b)) is ε-close to uniform in the variational distance. Roughly speaking, the Ramsey graph problem is to find a (K,ε) 2-source extractor with any error ε=ε(K) smaller than 1, even allowing error that is exponentially close to 1. In contrast, the CZ construction gives a 2-source extractor with constant error. A 2-source extractor is a basic object of fundamental importance, and the previous best 2-source construction was Bourgain’s extractor, requiring log(K) = (½ − α) log(N) for some small constant α>0. The CZ construction is an exponential improvement over that, requiring only log(K)=polyloglog(N).

How does the CZ construction work? Roughly speaking, the first step in the CZ construction is to encode a sample from one source (say a ∈ A) with a t-non-malleable extractor. I do not want to formally define non-malleability here, but this essentially means that the bits of the encoded string are “almost” t-wise independent, in the sense that except for few “bad” bits, when we look at t bits, they are close to being uniform. The next step in the CZ construction is to use the sample from the second source (b ∈ B) to sample a substring of the encoding of a. The sampling is done using an extractor and uses the known relationship between extractors and samplers.⁶ Finally, a deterministic function, conceptually similar to the Majority function, is applied on the bits of the substring.

The CZ construction achieves log K=polyloglog(N), where the non-explicit argument of Erdős shows log K=loglog(N)+1 is sufficient. The first bottleneck in the CZ construction, pointed out by Cohen and Schulman,⁷ is the use of extractors as samplers in the construction. In a recent work, Ben-Aroya, Doron and I showed how to solve this problem using samplers with multiplicative error.⁸ Furthermore, Dodis et al. showed such samplers are related to low entropy-gap condensers, and Yevgeniy gave a series of talks on such condensers and their applications, mainly in cryptography, during the Spring 2017 Simons Institute program on Pseudorandomness.⁹ With this, the currently best explicit construction has log K=loglog(N) polylogloglog(N). The extra polylogloglog(N) factor is because current explicit t-non-malleable constructions, even for a constant t, have a suboptimal dependence on ε.

Yet, in my opinion, the most pressing bottleneck is of a completely different nature. The CZ result efficiently constructs a two-source (K,ε) extractor for small values of K, but a large error ε. Specifically, the algorithm computing E(a,b) has running time poly(1/ε), and if we take explicitness to mean running time polynomial in the input length log(N), we can only hope for error which is 1/polylog(N). In contrast, a straightforward probabilistic method argument shows we can hope for an ε which is polynomially small in K. The currently best low-error constructions are Bourgain’s 2-source extractor, requiring log(K) = (½ − α) log(N), and Raz’ extractor, which allows one source to be (almost) arbitrarily weak but requires the other source to have min-entropy rate above half (the entropy rate is the min-entropy divided by the length of the string). At the Simons Institute program on Pseudorandomness, we were wondering whether the CZ approach that allows both sources to be weak can be somehow modified to give a low-error construction.

Research Vignette: Promise and Limitations of Generative Adversarial Nets (GANs)

Posted on December 22, 2017 by Simons Institute Editor

by Sanjeev Arora, Princeton University and Institute for Advanced Study

If we are asked to close our eyes and describe an imaginary beach scene, we can usually do so in great detail. Can a machine learn to do something analogous, namely, generate realistic and novel images different from those it has seen before? One expects that some day, machines will be creative and be able to generate even new essays, songs, etc., but for this article, let’s discuss only images. In machine learning, this goal of generating novel images has been formalized as follows. We hypothesize that realistic images are drawn from a probability distribution on the (vast) space of all possible images. Humans appear to be able to imagine novel samples from this vast distribution after having seen a reasonably small number of examples that arose in their personal experience. We would like machines to do the same: sample from the distribution of all realistic images. While this framing of the problem appears anticlimactic – reducing creativity/imagination to the more prosaic act of learning to generate samples from a vast distribution – it is nevertheless a powerful framework that already presents difficult computational hurdles. Many past approaches for solving this distribution learning problem used explicit statistical models, and tended to end in failure (or at best, tepid success) because real life data is just too complicated to capture using simple models.

This article is about Generative Adversarial Nets (GANs), a proposal by Goodfellow et al. in 2014¹ to solve this task by harnessing the power of large-scale deep learning (sometimes also called neural net training). Specifically, in the last 5-6 years, deep learning has become very successful at teaching machines to recognize familiar objects in images, in the sense that they can give labels to scenes such as street, tree, person, bicyclist, parked cars with precision approaching or exceeding that of humans. Perhaps this ability can lead naturally to a solution of the sampling problem?

Research Vignette: Setting Posted Prices Under Uncertainty

Posted on July 11, 2017 by Simons Institute Editor

by Amos Fiat, Tel Aviv University

Selfish Behaviour and Uncertainty
The author had the undeserved great fortune to be invited to two semesters at the Simons Institute for the Theory of Computing. The author had a truly wondrous time and is immensely grateful for this fantastic opportunity. The opportunity to hear and interact with the amazing people attending was priceless. Thus, when asked to write a short Research Vignette,¹ the author felt that it would be churlish to object strenuously. One can but hope that this is not yet another example of an error in judgment.²

Aspects of uncertainty have been studied in many disciplines, including philosophy, psychology, physics, the life sciences, economics and computer science, among others. In this Vignette, we address some aspects and models of uncertainty in statistics, computer science, and economics.

Temporal uncertainty occurs where the future or certain aspects of the future are unknown. Decisions made today (e.g., agreeing to write this article) may have unforeseen consequences tomorrow (when the disastrous NYT critique is published). Optimal stopping theory,³ prophet inequality settings,⁴ secretary settings,⁵ and competitive analysis of online algorithms⁶ all deal with aspects of and models for temporal uncertainty. This vast body of work includes both Bayesian and worst-case models.

Another element of uncertainty arises when selfish agents interact. Clearly, it is useful to know how much a customer is willing to pay before trying to sell her something. Given that the techniques used by Torquemada are often condemned,⁷ one needs to be more subtle when considering the impact of private information.

One approach is to consider rational behavior and study resulting equilibria.⁸ Pricing equilibria (competitive equilibria) have been studied in many settings.⁹ Social choice theory¹⁰ and mechanism design¹¹ seek to engineer algorithms so as to achieve certain desired goals. A wide variety of equilibrium notions have been defined, including dominant strategy, Bayes-Nash, and many others.

The computer science outlook is to quantify things – in this case, how good an outcome arises in equilibria when compared to optimal outcomes,¹² and what mechanisms can be implemented in poly time.¹³

The setting of interest
Consider a model of online decision-making mechanisms in which selfish agents arrive sequentially, and the mechanism decides upon an outcome and payment for each arriving agent, where payments are used to align the incentives of the agent with that of the mechanism. There is temporal uncertainty because the preferences of the agents (their types) are unknown. The agent may have an opportunity to specify her type, but the decisions made by the mechanism might not be in the best interest of the agent. This may result in the agent strategically misrepresenting her preferences so as to achieve a better outcome for herself. A mechanism is truthful if it is always in the best interest of the agent to report her type truthfully.

For some time I have been obsessed with a specific class of truthful online mechanisms that take the form of dynamic posted prices.¹⁴ The assumption here is that the future is entirely unknown, and can be determined adversarially with no guarantees on future behavior. Dynamic pricing schemes are truthful online mechanisms that post prices for every possible outcome, before the next agent arrives. Then, the agent chooses the preferred outcome — minimizing the cost for the outcome plus the price tag associated with the outcome.

Online problems are either minimization problems (e.g., minimize the sum of costs) or maximization problems (e.g., maximize the number of agents served). Although optimal solutions to maximization/minimization objectives can be cast as the other, the competitive ratio is quite different in the two settings. Online algorithms have been devised with both maximization and minimization objectives. The technique of “classify and randomly select” often gives simple randomized algorithms for maximization objectives which also naturally translate into truthful mechanisms. In contrast, minimization objectives (e.g., k-server and makespan) require entirely different techniques. Converting online algorithms into mechanisms without performance degradation opens up an entire new class of problems for which incentive compatible mechanism design is applicable.

We note there are many other problems where posted prices play a major role, within widely differing models, information states, and various stochastic assumptions.¹⁵

A dynamic pricing scheme is inherently truthful, since prices are determined irrespective of the type of the next agent. Posted price mechanisms have many additional advantages over arbitrary truthful online mechanisms. In particular, such mechanisms are simple: agents need not trust or understand the logic underlying the truthful mechanism, agents are not required to reveal their type, and there is no need to verify that the agents follow the decision made by the truthful online mechanism.

A posted price mechanism is a truthful online algorithm, and as such, can perform no better than the best online algorithm. The main goal of this approach is to study the performance of dynamic posted price mechanisms (quantified by the competitive ratio measure) and compare them with the performance of the best online algorithm. One may think of this problem as analogous to one of the central questions in algorithmic mechanism design in offline settings: compare the performance of the best truthful mechanism (quantified by the approximation ratio measure) with the performance of the best non-truthful algorithm.

In a paper at EC 2017, Feldman, Roytman and I presented constant competitive dynamic pricing schemes for makespan minimization in job scheduling. Events represent jobs; the job type contains the job’s processing times on various machines. Agents seek to complete their job as soon as possible, and therefore prefer to be assigned to a machine whose load (including the new job) is minimized.¹⁶ One can consider so-called related machines (where machines have speeds, and jobs have some quantity of work to be done) and unrelated machines (where the time associated with a job arbitrarily depends on the machine). Previous online algorithms for the problem¹⁷ are not truthful — in that a job may misrepresent its size so as to get a preferential assignment to a machine.

An online truthful mechanism for this setting determines an allocation and payment for each arriving agent upon arrival. That is, upon the arrival of a job, based on the job’s processing times, the mechanism assigns the job to some machine and determines the payment the agent should make. The cost of an agent is the sum of the machine’s load (including her own processing time) and the payment. Each agent seeks to minimize her cost.

A dynamic posted price mechanism for this setting sets prices on each machine, before the next agent arrives (prices may change over time). The next agent to arrive seeks to minimize her cost, i.e., the load on the chosen machine (including her own load) plus the posted price on the machine. The agent breaks ties arbitrarily.

An example of dynamic pricing: Makespan minimization, related machines
We now motivate how dynamic pricing is useful, via a small example. We do so by comparing the schedule produced without any pricing to the schedule produced via a dynamic pricing scheme. Schedules obtained without pricing are equivalent to schedules produced by the greedy algorithm that assigns each job j to a machine i that minimizes ℓ_i(j − 1) + p_j/s_i (the load on machine i prior to the arrival of job j, plus the additional time required for job j itself).

Figure 1: Related machines: the optimal makespan, the greedy algorithm, and dynamic pricing. The algorithm for setting dynamic prices is missing.

In our example (given as Figure 1) there are m = 3 machines, with speeds s₁ = ½, s₂ = ½(1 + ϵ), and s₃ = 1 + 2ϵ; and n = 3 jobs, with sizes p₁ = ½(1 + ϵ), p₂ = ½, and p₃ = 1 + 2ϵ. The left, middle, and right columns show the optimal assignment, the greedy assignment, and the assignment obtained by our dynamic pricing scheme, respectively. In the middle and right columns, the arrival order is from bottom to top.

Optimal makespan: The optimal makespan is L^∗ = 1, achieved by assigning job 1 to machine 2, job 2 to machine 1, and job 3 to machine 3.

Greedy: The greedy algorithm assigns job 1 to machine 3, since the machine i that minimizes p₁/s_i is the fastest machine (initially, all loads are 0). Job 2 is also assigned to machine 3, since ℓ₃(1) + p₂/s₃ = (1 + ϵ/2)/(1 + 2ϵ) < 1/(1 + ϵ) = p₂/s₂ < p₂/s₁ (for sufficiently small ϵ > 0). Lastly, job 3 is also assigned to machine 3, since ℓ₃(2) + p₃/s₃ = (2 + 5ϵ/2)/(1 + 2ϵ) < 2(1 + 2ϵ)/(1 + ϵ) = p₃/s₂ < p₃/s₁. Hence, the greedy algorithm assigns all jobs to machine 3, resulting in a makespan of (2 + 5ϵ/2)/(1 + 2ϵ) ≈ 2.

Pricing scheme: Our dynamic pricing scheme sets prices before the arrival of each job, which are independent of the type of the incoming job. We omit the explanation as to how to set these prices.¹⁸ The important aspect is that these prices are set before knowing what the next job will be. The performance guarantee must hold irrespective of the future. Let c_ij be the cost of job j on machine i; this is the sum of the completion time and the additional posted price currently on the machine.

Job 1 chooses machine 1, since c₁₁ = p₁/s₁ + π₁₁ = 1 + ϵ < 1 + π₂₁ = p₁/s₂ + π₂₁ = c₂₁ < p₁/s₃ + π₃₁ = c₃₁.

Prior to the arrival of job 2, the dynamic pricing scheme sets new prices (see Figure 1), and job 2 chooses machine 2, since c₂₂ = p₂/s₂ + π₂₂ < 2 + ϵ = ℓ₁(1) + p₂/s₁ = c₁₂ < p₂/s₃ + π₃₂ = c₃₂.

Finally, prior to the arrival of job 3, the dynamic pricing scheme sets new prices yet again and job 3 chooses machine 3, since c₃₃ = p₃/s₃ + π₃₃ = 1 + π₃₃, while c₁₃ = ℓ₁(2) + p₃/s₁ = 1 + ϵ + 2p₃ ≈ 3 and c₂₃ = ℓ₂(2) + p₃/s₂ + π₂₃ = 1/(1 + ϵ) + 2(1 + 2ϵ)/(1 + ϵ) + π₂₃ ≈ 3.

Since machine 1 has the highest load, the schedule produced by our dynamic pricing scheme achieves a makespan of ℓ₁(3) = 1 + ϵ. This example can be extended to show that greedy can be as bad as Ω(log m)-competitive, while in contrast our dynamic pricing scheme is O(1)-competitive.
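The example can be replayed with a short simulation. This is a sketch under stated assumptions: the speeds and sizes are taken to be s = (½, ½(1 + ϵ), 1 + 2ϵ) and p = (½(1 + ϵ), ½, 1 + 2ϵ), which is the reading consistent with the optimal makespan of 1 and the costs quoted above. The prices in `hypothetical_prices` are made up for illustration only. They are not the prices of Figure 1 and are not produced by the authors' pricing algorithm, which is omitted here as in the text. The helper name `run` is likewise hypothetical.

```python
from fractions import Fraction


def run(speeds, sizes, price_schedule=None):
    """Each arriving job picks the machine minimizing load + p/s + posted price.

    price_schedule[j][i] is the price on machine i when job j arrives;
    with no schedule, all prices are 0 and the rule is plain greedy.
    Returns (assignment as 0-based machine indices, makespan).
    """
    m = len(speeds)
    loads = [Fraction(0)] * m
    assignment = []
    for j, p in enumerate(sizes):
        prices = price_schedule[j] if price_schedule else [0] * m
        costs = [loads[i] + p / speeds[i] + prices[i] for i in range(m)]
        i = min(range(m), key=costs.__getitem__)  # ties go to the lowest index
        loads[i] += p / speeds[i]
        assignment.append(i)
    return assignment, max(loads)


eps = Fraction(1, 100)
speeds = [Fraction(1, 2), (1 + eps) / 2, 1 + 2 * eps]
sizes = [(1 + eps) / 2, Fraction(1, 2), 1 + 2 * eps]

greedy = run(speeds, sizes)

# Hypothetical prices, fixed before each arrival and independent of the job.
hypothetical_prices = [[0, 1, 2], [0, 0, 2], [0, 0, 0]]
priced = run(speeds, sizes, hypothetical_prices)
```

With ϵ = 1/100, greedy sends every job to machine 3 (index 2) for a makespan of 135/68 ≈ 1.985, whereas the priced run places jobs 1, 2, 3 on machines 1, 2, 3 for a makespan of 1 + ϵ.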

During a talk in the Fall 2016 Simons Institute program on Algorithms and Uncertainty, I proposed that one study dynamic pricing schemes for flow time; and indeed, one definite outcome of this semester is a paper by Im, Moseley, Pruhs and Stein that comes up with a constant competitive dynamic pricing scheme to minimize the maximum flow time (the maximum time an agent is in the system). This improves on the previous online algorithm for the problem in terms of the competitive ratio, and has the added advantage of being truthful and simple.

Online algorithms, online mechanisms, and dynamic pricing
One critical question is the following: where do the boundaries lie between online algorithms, online mechanisms (not dynamic pricing) and dynamic pricing? In our EC paper, we show that there exist problems – makespan minimization on unrelated machines, where the pair (job, machine) determines the processing time – where good online mechanisms exist but there are no good dynamic pricing schemes.

The connection between online mechanisms and dynamic pricing schemes is not entirely trivial.¹⁹ Dynamic pricing schemes are obviously truthful mechanisms, but converting an arbitrary online mechanism into a dynamic pricing scheme may not be possible.

We can show that online mechanisms with certain performance guarantees that also have properties a to c below can be converted into dynamic pricing schemes with the same guarantee: a) the online mechanism must be prompt (where the payments are determined immediately); b) ties in agent utilities do not arise, or can be resolved arbitrarily without harming performance guarantees — this is not an issue in the mechanism-design setting because the mechanism may decide how to break ties, but it is an issue with pricing schemes; and c) the mechanism does not require “voluntary revelation” of agent types – in pricing schemes, the only information available to the dynamic pricing scheme is what the previous agents actually chose; pricing schemes are not direct revelation schemes.

Afterword
It has been a pleasure discussing these and other problems with researchers visiting the Simons Institute, and at my home site, Tel Aviv University. I am grateful for discussions with and gracious instruction from Nikhil Bansal, Avrim Blum, Alon Eden, Michal Feldman, Anupam Gupta, Ilia Gorlick, Haim Kaplan, Anna Karlin, Dick Karp, Thomas Kesselheim, Bobby Kleinberg, Elias Koutsoupias, Stefano Leonardi, Christos Papadimitriou, Kirk Pruhs, Tim Roughgarden, and Matt Weinberg. In various combinations, we have tried to attack many other online problems via dynamic pricing. Many of these are wide open; some (minor) progress has been made on others. Many an idea arose in the numerous talks and discussions at the Simons Institute.

Notes

¹Merriam-Webster online dictionary: 2a) a short descriptive literary sketch; 2b) a brief incident or scene (as in a play or movie).

²According to Aristotle (Poetics, 4th century BCE), an ideal tragedy is caused by error in judgment by the protagonist, and not by unavoidable error.

³Wald (1945), Arrow, Blackwell and Girshick (1948); Snell (1952).

⁴Cayley (1875); Krengel, Sucheston and Garling (1977).

⁵Kepler (1613); Martin Gardner (1960); Gilbert and Mosteller (1966).

⁶Sleator and Tarjan (1985).

⁷Recent political developments in North America notwithstanding.

⁸von Neumann (1928).

⁹Cournot (1838); Walras (1874); Arrow and Debreu (1954).

¹⁰Condorcet (1785); Arrow (1951); Gibbard and Satterthwaite (1973,1975).

¹¹Hurwicz, Maskin and Myerson (2007).

¹²Koutsoupias and Papadimitriou (1999).

¹³Nisan and Ronen (1999).

¹⁴Fiat, Mansour and Nadav (2008) — Packet routing; Cohen-Addad, Eden, Fiat and Jeż (2015) — Task systems, k-server, and metrical matching; Feldman, Fiat and Roytman (2017) — Makespan minimization; Im, Moseley, Pruhs and Stein (2017), the latter two discussed herein.

¹⁵E.g., Myerson’s virtual values, prophet inequalities, secretary problems, combinatorial markets – Feldman, Gravin and Lucier (2013, 2015 and 2016); Envy-free pricings – Guruswami, Hartline, Karlin, Kempe, Kenyon and McSherry (2005); and many others.

¹⁶In this interpretation, “load” is the time required by the server to deal with all current jobs in the server queue, and jobs are processed in a first-in, first-out manner, i.e., jobs enter a server queue.

¹⁷Awerbuch, Azar, Fiat, Plotkin and Waarts (1993).

¹⁸EC17 and on the archive.

¹⁹Many thanks to Moshe Babaioff, Liad Blumrosen, Yannai A. Gonczarowski and Noam Nisan for discussions helpful in clarifying this point.

Research Vignette: Entangled Solutions to Logic Puzzles

Posted on July 11, 2017 by Simons Institute Editor

by Albert Atserias, Universitat Politècnica de Catalunya

The Fall 2016 Simons Institute program on Logical Structures and Computation focused on four themes: a) finite and algorithmic model theory; b) logic and probability; c) logic and quantum mechanics; and d) logic and databases. In this Research Vignette, I want to highlight one of the outcomes of this fruitful program in the form of an emerging direction of research that, quite surprisingly, touches on all four themes at once.

Two-player refereed games
A game is played by Alice and Bob, who confront a verifier. After an initial gathering to agree on a common strategy, Alice and Bob will not be allowed to communicate during the game. The game starts with the verifier privately choosing a pair of queries (p,q) ∈ P × Q according to some probability distribution π known to everyone; the verifier sends p to Alice and q to Bob. Alice and Bob reply with a pair of answers (r,s) ∈ R × S according to their previously agreed strategy; Alice replies r and Bob replies s. Upon receipt of the answers, the verifier accepts or rejects depending on whether a predicate V (p,q,r,s) also known to everyone holds or not. Alice’s and Bob’s goal is to maximize the probability of winning, i.e., making the verifier accept. Let this maximum be called the value of the game.

How could we model the value of this game? One perfectly natural approach would be to model it as a constraint-optimization problem. The possible queries are the variables, the possible answers are the values for these variables, and the verifier’s predicate V (p,q,r,s) defines the constraints, weighted by π(p,q). The optimal strategy would then be an assignment of values to variables that maximizes the total weight of satisfied constraints.

However, this model does not seem to capture all the different ways Alice and Bob could interact in their initial gathering. For example, Alice and Bob could have agreed to use the outcomes of a number of coin-flips, which they could have recorded in their respective memory sticks, to determine their answers during the game. As long as they don’t communicate during the game, this form of interaction is allowed! A better model for this situation would then seem to be that Alice and Bob choose functions of the form A : P × T → R and B : Q × T → S, where T denotes the set of potential coin-flip results, which has an associated probability distribution σ. Their goal would now be to maximize the average weight:

∑_{t∈T} σ(t) ∑_{(p,q)∈P×Q} π(p,q) V(p,q,A(p,t),B(q,t)),

where V(p,q,r,s) is read as 1 if the predicate holds and 0 otherwise.

It turns out that in this particular case, there is no difference in the optimal value of the two models; but one could easily imagine that if we missed this one, we could well be missing other ways that Alice and Bob could interact in defining their strategy. And indeed, we are missing some such forms of interaction.
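The constraint-optimization view can be sketched as a brute-force search over deterministic strategies. The function names below are illustrative, and the CHSH game used as the test instance is a standard example that is not discussed in the text: queries and answers are bits, π is uniform, and the verifier accepts when r XOR s equals p AND q. Since a strategy with shared coin-flips is an average of deterministic strategies weighted by σ, its value can never exceed the maximum computed here, which is why the two models agree.

```python
from fractions import Fraction
from itertools import product


def strategy_value(P, Q, pi, V, A, B):
    """Total weight of query pairs on which the answers A[p], B[q] are accepted."""
    return sum(pi[(p, q)] for p in P for q in Q if V(p, q, A[p], B[q]))


def classical_value(P, Q, R, S, pi, V):
    """Maximum of strategy_value over all deterministic strategies (brute force)."""
    return max(
        strategy_value(P, Q, pi, V, dict(zip(P, a)), dict(zip(Q, b)))
        for a in product(R, repeat=len(P))
        for b in product(S, repeat=len(Q))
    )


# Illustrative instance: the CHSH game with a uniform query distribution.
bits = (0, 1)
uniform = {(p, q): Fraction(1, 4) for p in bits for q in bits}


def chsh(p, q, r, s):
    return (r ^ s) == (p & q)


chsh_value = classical_value(bits, bits, bits, bits, uniform, chsh)
```

The search enumerates |R|^|P| · |S|^|Q| strategies, so it is only practical for tiny games; for CHSH it returns 3/4.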

Suppose that Alice and Bob, when they design their strategy, agree to prepare a pair of elementary particles in a suitable entangled state, in the way they were taught was possible in their quantum mechanics class. When they separate, Alice takes one of the particles with her and Bob takes the other. When they play, Alice and Bob perform certain measurements on their respective particles depending on the query each one gets, and then they use the outcomes to determine their answers. Notice that Alice and Bob still stay separated from each other; they just perform local measurements on their local particles! The question is: could the fact that the particles were entangled in the past give Alice and Bob a strategy that has greater probability of winning?

Quantum vs. hidden-variable theories
The peculiar properties of quantum entanglement have fascinated physicists, mathematicians, logicians, and philosophers since the 1920’s. It is well-known that such giants as Einstein were initially reluctant to accept the uncertainty principle of quantum mechanics in its full strength. Others, such as von Neumann, were profoundly inspired by the topic to develop their deepest work. In more recent times, theoretical computer scientists have brought forth a computational and information-theoretic perspective that, hopefully, sheds some more light. The origins of this approach go back to the famous 1964 article of J.S. Bell, “On the Einstein-Podolsky-Rosen Paradox.”

Following Bell, let’s consider a Gedankenexperiment. Before the experiment starts, a large number N of pairs of particles of so-called spin ½ are independently prepared in an entangled state, the singlet state. The experiment starts when Alice and Bob travel to distant planets with one particle each from each pair. Once in place, they measure the spins of their particles along different axes. Alice uses axis a and Bob uses axis b, where a and b are 3-dimensional unit vectors chosen by the verifier. As they do their measurements, they collect outcomes r₁,…,r_N ∈ {+1,−1} for Alice, and s₁,…,s_N ∈ {+1,−1} for Bob, which they send back to the verifier. At this point Bell asked: what are we to expect for the value of (1/N) ∑_{i=1}^N r_i s_i? In other words, if the verifier counts the number of times that r_i = s_i, subtracts the number of times that r_i ≠ s_i, and normalizes by N, what are we to expect?

Bell computed what quantum theory predicts for these questions, and he concluded that one is to expect that (1/N) ∑_{i=1}^N r_i s_i approaches −cos θ as N grows, where θ ∈ [0,π] is the angle between the axis vectors a and b. Put differently, if we ask the verifier to accept any single pair of data (r_i,s_i) only if r_i = s_i, one is to expect that the verifier would accept with probability approaching ½ − ½ cos θ. The point of Bell’s study was to contrast this calculation with what would be expected from a local hidden-variable theory of the type Einstein and his co-authors had hoped would explain the uncertainty in the experiment. In such a model, Bell argued, the statistics of (1/N) ∑_{i=1}^N r_i s_i would have to take the form of an average ∑_{t∈T} σ(t) (1/N) ∑_{i=1}^N r_i(t) s_i(t), and he went on to prove that in such a case it would have to approach the quantity −1 + 2θ/π. In other words, a verifier that accepts only if r_i = s_i would have to accept with probability approaching θ/π. The discrepancy between θ/π and the earlier ½ − ½ cos θ is particularly noticeable at θ ≈ 133.56° where the ratio is minimized and is as small as 0.878567.
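The quoted angle and ratio can be checked numerically. The sketch below assumes the two acceptance probabilities are (1 − cos θ)/2 for the quantum prediction and θ/π for the hidden-variable model, which is the reading consistent with the figures in the text. A plain grid search over (0, π] is used, so no assumption about the shape of the ratio is needed; the function names are illustrative.

```python
import math


def quantum_accept(theta):
    """Acceptance probability predicted by quantum theory (assumed form)."""
    return (1 - math.cos(theta)) / 2


def hidden_variable_accept(theta):
    """Acceptance probability in the local hidden-variable model (assumed form)."""
    return theta / math.pi


def min_ratio(steps=200_000):
    """Grid search for the angle minimizing hidden-variable / quantum acceptance."""
    best_theta, best = None, float("inf")
    for k in range(1, steps + 1):          # skip theta = 0, where both vanish
        theta = math.pi * k / steps
        ratio = hidden_variable_accept(theta) / quantum_accept(theta)
        if ratio < best:
            best_theta, best = theta, ratio
    return math.degrees(best_theta), best


theta_deg, ratio = min_ratio()
```

On this grid the minimum lands at roughly 133.56 degrees with a ratio of about 0.8786, in agreement with the values stated above.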

The analysis in Bell’s Theorem can be turned into a game in which entangled strategies give higher probability of winning than unentangled ones. The question arises, however, of whether this difference can be witnessed only statistically, i.e., quantitatively, or if it manifests itself in more absolute terms, i.e., qualitatively. In other words, could there be a game in which Alice and Bob can make the verifier accept with certainty if they use entanglement but not if they don’t? Such questions had also been considered by physicists, including Bell himself, with the conclusion that, also at this level, entangled strategies are superior to unentangled ones. This type of refinement of the sense in which entanglement is more powerful, which led Mermin to coin the term Bell-Kochen-Specker-type theorems in his beautiful paper, “Hidden variables and the two theorems of John Bell,” ties better to the problems of interest to logicians, and will take our attention for the rest of this Vignette.

Quantum relaxations of constraint satisfaction problems
Take an instance of the constraint satisfaction problem with Boolean variables X₁,…,X_n ranging over {+1,−1}, and constraints given by, say, polynomial equations of the form P(X_i₁,…,X_ik) = 0. There is a well-known game that models this problem: the verifier chooses both a constraint and a variable that appears in this constraint, sends the constraint to Alice and the variable to Bob, and expects an assignment that satisfies the constraint from Alice, and an assignment that agrees with Alice’s on the pointed variable from Bob. It is easy to see that the solutions to the constraint-satisfaction instance give rise to (unentangled) strategies that win with certainty in the game, and vice-versa. What about entangled strategies? What do they correspond to at the level of the constraint-satisfaction instance? In their ICALP 2014 paper, theoretical computer scientists Richard Cleve and Rajat Mittal addressed this question and, to answer it, introduced the following quantum relaxation of the constraint-satisfaction problem:

Let the variables X₁,…,X_n range over Hermitian matrices in ℂ^{d×d} for some arbitrary but unspecified dimension d ≥ 1. Impose the conditions that X_i² = I, where I is the identity matrix, and that X_i and X_j commute whenever they appear together in at least one of the given polynomial equations. Note that the case d = 1 is no relaxation at all, since 1-dimensional matrices are just scalars, whose product obviously commutes, and the equations X_i² = I translate to X_i ∈ {+1,−1}. Cleve and Mittal proved that entangled winning strategies in the game give rise to solutions to this relaxed version of the problem, and vice-versa.
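For the classical case d = 1, the correspondence between solutions and unentangled winning strategies in the game described earlier can be sketched in a few lines. This is only an illustration: the two-constraint instance and all names (`CONSTRAINTS`, `verifier_accepts`, `strategy_from_assignment`, and so on) are hypothetical and do not come from the Cleve-Mittal paper.

```python
from itertools import product

# Hypothetical toy instance over variables X0, X1, X2 in {+1, -1}.
# Each constraint is (variables, polynomial); it holds when the polynomial is 0.
CONSTRAINTS = [
    ((0, 1), lambda x, y: x * y - 1),  # X0 * X1 = 1
    ((1, 2), lambda y, z: y * z + 1),  # X1 * X2 = -1
]

def verifier_accepts(constraint, variable, alice_answer, bob_answer):
    """Alice answers with values for the constraint's variables, Bob with one value."""
    variables, poly = constraint
    satisfied = poly(*alice_answer) == 0
    consistent = alice_answer[variables.index(variable)] == bob_answer
    return satisfied and consistent

def strategy_from_assignment(assignment):
    """Turn a global assignment into deterministic (unentangled) strategies."""
    alice = lambda constraint: tuple(assignment[v] for v in constraint[0])
    bob = lambda variable: assignment[variable]
    return alice, bob

def wins_with_certainty(alice, bob):
    """Check every question pair the verifier could send."""
    return all(
        verifier_accepts(c, v, alice(c), bob(v))
        for c in CONSTRAINTS
        for v in c[0]
    )

def solutions(n):
    """Enumerate all classical solutions of the toy instance by brute force."""
    for assignment in product((+1, -1), repeat=n):
        if all(poly(*(assignment[v] for v in vs)) == 0 for vs, poly in CONSTRAINTS):
            yield assignment
```

Every assignment produced by `solutions(3)` yields a pair of strategies that passes `wins_with_certainty`, while a non-solution such as `(1, 1, 1)` fails on the second constraint.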

Now we can ask the type of questions that Bell asked. Could there be a system of polynomial equations that is quantum-satisfiable in the sense above but not classically satisfiable? Some well-known examples of this kind come from the physicists’ proofs of the Bell-Kochen-Specker Theorems; the best known is the Mermin-Peres Magic Square made of six polynomial equations (the three rows and the three columns in the square below) over nine variables:

Seen as a logic puzzle of the kind that one used to find in newspapers, this has no solution: every variable appears in exactly two equations, so the left-hand sides of the equations multiplied together give 1, while the right-hand sides give −1. In contrast, the system is quantum-satisfiable by means of a 4-dimensional solution made of suitable Kronecker products of the classical Pauli matrices (Fig. 3 in Mermin’s paper; see Notes for further reading, below).
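Both claims can be checked mechanically. The sketch below assumes one common presentation of the magic square, in which the three rows and the first two columns multiply to +1 and the last column multiplies to −1; the sign placement, the particular arrangement of Pauli products in `SQUARE`, and all function names are assumptions made for illustration and need not match the figure in Mermin’s paper.

```python
from itertools import product

# Pauli matrices and the 2x2 identity, as nested lists with exact entries.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def dagger(A):
    # Conjugate transpose of a square matrix.
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A))]

def scaled_identity(s, n=4):
    return [[s if i == j else 0 for j in range(n)] for i in range(n)]

ROWS = [[(r, c) for c in range(3)] for r in range(3)]
COLS = [[(r, c) for r in range(3)] for c in range(3)]
# ASSUMPTION: rows and the first two columns multiply to +1, the last column to -1.
EQUATIONS = [(cells, +1) for cells in ROWS] + list(zip(COLS, (+1, +1, -1)))

def classically_satisfiable():
    """Brute force over all 2^9 assignments of +1/-1 to the nine cells."""
    for values in product((+1, -1), repeat=9):
        grid = [values[0:3], values[3:6], values[6:9]]
        if all(grid[a[0]][a[1]] * grid[b[0]][b[1]] * grid[c[0]][c[1]] == rhs
               for (a, b, c), rhs in EQUATIONS):
            return True
    return False

# ASSUMPTION: one standard 4-dimensional operator assignment built from Paulis.
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Y), kron(Y, I2), kron(Y, Y)],
    [kron(X, Y), kron(Y, X), kron(Z, Z)],
]

def is_quantum_solution(square):
    """Check the relaxed conditions: Hermitian, squares to I, commuting, equations hold."""
    for row in square:
        for A in row:
            if dagger(A) != A or matmul(A, A) != scaled_identity(1):
                return False
    for cells, rhs in EQUATIONS:
        ops = [square[r][c] for r, c in cells]
        if any(matmul(A, B) != matmul(B, A) for A in ops for B in ops):
            return False
        if matmul(matmul(ops[0], ops[1]), ops[2]) != scaled_identity(rhs):
            return False
    return True
```

Under these assumptions, `classically_satisfiable()` finds no ±1 assignment, whereas `is_quantum_solution(SQUARE)` confirms that the operators are Hermitian, square to the identity, commute within each row and column, and satisfy all six equations.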

Emerging directions
In his series of lectures in the Logic and Computation Boot Camp that took place at the beginning of the program, Samson Abramsky gave an introduction to the topic of Bell-type results from a logician’s viewpoint. A recurrent theme within the lectures was the extent to which local constraints on a system, such as the outcomes of physical measurements, may or may not influence its global properties. Put so generally, this is indeed a common theme of most subareas of logic in computer science. It appeared in Panangaden’s lectures on the analysis of probabilistic systems while constructing continuous probability spaces that meet the local constraints that arise from conditioning. It also appeared in Kolaitis’ lectures on database theory while discussing how the local structure of a conjunctive query can influence the size of its output and the computational complexity of computing it. And it appeared in Dawar’s course on finite and algorithmic model theory as the fundamental question of whether the local structure of a finite model could determine it up to isomorphism. I will close this Vignette with a discussion of a research question that was raised at the program, which links this last topic with Bell-Kochen-Specker-type theorems, and which was successfully answered in a team effort during the program.

The work of Cleve and Mittal referred to above is the continuation of a recent trend that takes classical combinatorial problems and relaxes them into quantum analogues: quantum chromatic numbers, quantum homomorphisms, quantum Lovász theta, etc. Let us take quantum graph isomorphism. We are given two graphs G and H. The verifier chooses two vertices from either graph and sends them to Alice and Bob, one vertex each. If Alice gets a vertex u from G, she must reply with a vertex v from H, and vice-versa. If Bob gets a vertex u′ from G, he must reply with a vertex v′ from H, and vice-versa. The verifier will accept if and only if the two pointed vertices from G satisfy the same equalities and edge relationships as the two pointed vertices from H; i.e., he will accept precisely when u = u′ if and only if v = v′, and u ∼_G u′ if and only if v ∼_H v′. It is not hard to see that Alice and Bob have an unentangled strategy that wins this game with certainty if and only if G and H are isomorphic. Are there pairs of graphs on which Alice and Bob win with entanglement but not without?
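The verifier’s acceptance rule can be written down directly. The following is a minimal sketch for the unentangled case only, assuming graphs are given as dictionaries mapping each vertex to its set of neighbours and questions are tagged with the graph they come from; the names `play`, `strategy_from_bijection` and `wins_with_certainty` are hypothetical. It illustrates one direction of the claim above: a strategy derived from an isomorphism wins on every pair of questions.

```python
def play(G, H, strategy_a, strategy_b, question_a, question_b):
    """Return True if the verifier accepts the answers to one pair of questions.

    A question is ("G", vertex) or ("H", vertex); the answer is a vertex
    of the other graph.
    """
    def to_pair(question, answer):
        side, vertex = question
        # Normalise to (vertex of G, vertex of H) regardless of the question's side.
        return (vertex, answer) if side == "G" else (answer, vertex)

    u, v = to_pair(question_a, strategy_a(question_a))
    u2, v2 = to_pair(question_b, strategy_b(question_b))
    same_equalities = (u == u2) == (v == v2)
    same_edges = (u2 in G[u]) == (v2 in H[v])
    return same_equalities and same_edges

def strategy_from_bijection(f):
    """Deterministic strategy: apply f on questions from G, its inverse on H."""
    inverse = {h: g for g, h in f.items()}
    return lambda q: f[q[1]] if q[0] == "G" else inverse[q[1]]

def wins_with_certainty(G, H, f):
    """Both players use the strategy derived from f; check all question pairs."""
    s = strategy_from_bijection(f)
    questions = [("G", g) for g in G] + [("H", h) for h in H]
    return all(play(G, H, s, s, qa, qb) for qa in questions for qb in questions)

# Two paths on three vertices (hypothetical example data).
G = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
H = {1: {2}, 2: {1, 3}, 3: {2}}
```

With these two paths, `wins_with_certainty(G, H, {"a": 1, "b": 2, "c": 3})` holds because the map is an isomorphism, while the bijection `{"a": 2, "b": 1, "c": 3}` loses on the question pair `b`, `c`.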

The graph isomorphism problem is of course of fundamental importance in theoretical computer science. For one thing, it is one of the few problems in the complexity class NP that is not known to be polynomial-time solvable or NP-complete. Thanks to the recent breakthrough of Babai, we now know that the problem is solvable in quasipolynomial time, and thanks to the earlier work in probabilistic proof systems, we already had strong evidence that the problem cannot be NP-complete. These two facts could explain why it has always been so hard to find hard-to-solve instances of the graph isomorphism problem. To put it graphically, if you give me a reasonably smart heuristic to solve the graph isomorphism problem, chances are that I will have a hard time finding an example on which your heuristic fails. For this very same reason, finding two graphs on which Alice and Bob can win the isomorphism game with entanglement but not without does not look like a very easy question.

It turns out that similar questions had been asked previously in descriptive complexity theory in the 1980s. The fundamental problem of that area, which has its origins in the theory of relational databases, is whether there is a logic that is able to express all and only the isomorphism-invariant graph properties that can be decided in polynomial time. In the 1980s some candidate logics were put forward, until one was found to resist all attempts at a counterexample. This was the logic now called Fixed-Point Logic with Counting (FPC). Eventually, in a breakthrough that became known as the CFI-construction, J. Y. Cai, M. Fürer and N. Immerman found a counterexample: they proved that the isomorphism problem for graphs with color-classes of bounded size, which is a special case that is solvable in polynomial time, cannot be expressed in FPC.

Besides its original purpose, the CFI-construction also became the canonical “hard-to-solve” instance for combinatorial heuristics for the graph isomorphism problem. For one thing, it was the first construction to show that there exist pairs of non-isomorphic graphs for which the classical vertex-refinement heuristic, when extended to tuple-refinement in the straightforward way, would need to use tuples of unbounded length before it detects that the graphs are not isomorphic. Could the same construction be used to solve our problem of finding non-isomorphic graphs that are nonetheless quantum isomorphic? One difficulty in going this way is that nobody knows of a systematic method to decide if two given graphs are quantum isomorphic; as far as we know, the problem could even be undecidable. For this reason, it would have been more natural to start by constructing two graphs that are quantum isomorphic by design, and only then show that they are not classically isomorphic. But CFI was already there, and we had to give it a try. So we tried it, and… it worked!
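For readers who have not seen it, the basic vertex-refinement heuristic mentioned above can be sketched as follows. This is a minimal version of plain vertex refinement only (not the tuple-refinement extension, and not the CFI graphs themselves), with graphs assumed to be dictionaries from vertices to neighbour sets; the function names and the example graphs are illustrative choices. The example uses a much simpler and well-known pair on which the heuristic already fails: a hexagon and two disjoint triangles.

```python
from collections import Counter

def refine(adj):
    """Refine vertex colors until the partition into color classes is stable."""
    colors = {v: 0 for v in adj}
    while True:
        # New color = old color plus the sorted multiset of neighbours' colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[w] for w in adj[v]))) for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Classes can only split, so an unchanged count means a stable partition.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors

def refinement_distinguishes(G, H):
    """Run refinement on the disjoint union and compare color histograms."""
    union = {("G", v): {("G", w) for w in nbrs} for v, nbrs in G.items()}
    union.update({("H", v): {("H", w) for w in nbrs} for v, nbrs in H.items()})
    colors = refine(union)
    histogram = lambda side: Counter(c for (s, _), c in colors.items() if s == side)
    return histogram("G") != histogram("H")

# Hexagon versus two disjoint triangles: non-isomorphic, but both 2-regular.
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}

# Path on three vertices versus a triangle: distinguished by degrees.
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

Here `refinement_distinguishes(hexagon, two_triangles)` is `False` even though the graphs are not isomorphic, whereas `refinement_distinguishes(path, triangle)` is `True`. Refining tuples of vertices instead of single vertices separates the hexagon pair; the point of the CFI-construction is that no fixed tuple length suffices for all pairs.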

The key to finding the solution was the previously noted fact that, in the context of the logical and universal-algebraic approaches to the Feder-Vardi Dichotomy Conjecture for constraint satisfaction problems, the CFI-construction can be interpreted as a system of parity equations in which each variable appears in exactly two equations. Coincidentally (or not), these are precisely the type of systems of equations that underlie the Mermin-Peres Magic Square and similar constructions. From there, a few calculations sufficed to realize that Mermin’s solution to the magic square could be reused, and the problem was solved.

Notes for further reading
Bell’s paper appears in Physics 1, 1964: 195-200. The ratio in our exposition of Bell’s analysis is exactly the same as Goemans-Williamson’s approximation ratio for MAX-CUT and is related to Grothendieck’s constant; this is hardly a coincidence. See for example the paper by Regev and Vidick, “Quantum XOR Games,” ACM TOCT, 7(4), 2015, n. 15, and references therein. For the Bell-Kochen-Specker Theorem and the magic square see Mermin’s Rev. Mod. Phys. 65(3), 1993. Cleve and Mittal’s paper appeared in the Proceedings of ICALP 2014 and also QIP 2014 and arXiv:1209.2729 [quant-ph]. The Cai-Fürer-Immerman paper appears in Combinatorica 12(4), 1992: 389-410, with an earlier version in FOCS 1989. The new results on quantum graph isomorphism appear in Atserias, Mančinska, Roberson, Šámal, Severini, and Varvitsiotis, “Quantum and non-signalling graph isomorphisms,” arXiv:1611.09837 [quant-ph], with a shorter version in ICALP 2017. Two other recent papers that report work along these lines are Atserias, Kolaitis and Severini, “Generalized Satisfiability Problems via Operator Assignments,” arXiv:1704.01736 [cs.LO]; and Abramsky, Dawar and Wang, “The pebbling comonad in finite model theory,” arXiv:1704.05124 [cs.LO]. This last will appear in shorter form in LICS 2017.

Calvin Café: The Simons Institute Blog

What's New at the Simons Institute for the Theory of Computing.
