by Prasad Raghavendra (Simons Institute)
It has been only a few months since the last time this column appeared. Yet there is so much to write about that it feels like a year has passed. For one area in particular, a decade’s worth of developments seems to have emerged in a few months’ time. I am talking of course about the phenomenal developments on error-correcting codes that we witnessed in the past few months.
Error-correcting codes
Error-correcting codes encode a message into a longer codeword so that the original message can be recovered even if many of the bits of the codeword are corrupted. Clearly, there is a trade-off between the amount of redundancy introduced by the error-correcting code and the number of errors it can recover from. A gold standard for error-correcting codes is being “constant rate, constant distance,” also referred to as being “asymptotically good.” An asymptotically good code can recover from a constant fraction of bit errors in the codeword, and the codeword is only a constant factor longer than the original message.
Error-correcting codes can be described using a family of “parity checks,” where a parity check on a subset \(S\) of bits is the constraint that the number of \(1\)s within \(S\) is even. A codeword is then a sequence of bits that satisfies all of the parity check constraints. In a low-density parity check (LDPC) code, the code is defined by a family of parity checks, each involving only constantly many bits.
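To make the parity-check view concrete, here is a minimal sketch; the helper name and the particular checks are illustrative choices, not taken from any of the papers discussed.

```python
def satisfies_checks(word, checks):
    """Return True iff every parity check sees an even number of 1s."""
    return all(sum(word[i] for i in check) % 2 == 0 for check in checks)

# Toy example: three parity checks on 7 bits (this happens to be the [7,4]
# Hamming code); each check touches only 4 bits, a "low-density" description.
checks = [{0, 1, 2, 4}, {0, 1, 3, 5}, {0, 2, 3, 6}]

print(satisfies_checks([1, 1, 1, 0, 1, 0, 0], checks))  # True: a valid codeword
print(satisfies_checks([1, 1, 1, 0, 1, 0, 1], checks))  # False: one bit corrupted
```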
Introduced by Gallager in 1960, LDPC codes are central objects in both the theory and the practice of error correction. In 1996, Sipser and Spielman showed how to use expander graphs to construct LDPC codes with constant distance and constant rate. Their construction is a thing of beauty. Pick an expander graph and place one bit on each edge. For each vertex, impose the parity check that the bits on its incident edges sum to an even number. This construction also admits linear-time encoding and decoding algorithms.
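Here is a toy rendering of that description (the function name is illustrative, and the small complete graph below is only a stand-in for an actual expander): one bit per edge, and one parity check per vertex over its incident edges.

```python
from itertools import product

def vertex_parity_checks(edges):
    """One parity check per vertex: the set of indices of edges incident to it."""
    checks = {}
    for idx, (u, v) in enumerate(edges):
        checks.setdefault(u, set()).add(idx)
        checks.setdefault(v, set()).add(idx)
    return list(checks.values())

# Tiny stand-in for an expander: the complete graph K4, one code bit per edge.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
checks = vertex_parity_checks(edges)

# Brute-force enumeration of the code (only feasible at toy sizes).
codewords = [w for w in product([0, 1], repeat=len(edges))
             if all(sum(w[i] for i in c) % 2 == 0 for c in checks)]
print(len(codewords), "codewords on", len(edges), "edge-bits")
```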
Asymptotically good locally testable codes and quantum LDPCs
A locally testable code is one that admits a highly efficient procedure to detect the presence of errors. More precisely, it admits a testing procedure that queries a small number of randomly selected bits of the word and rejects, with constant probability, any word with even a small constant fraction (say, 1%) of errors. The gold standard would be a constant-query locally testable code, where the testing procedure queries only a constant number of bits (say, 100) of the corrupted codeword.
Locally testable codes have been central to many of the developments in complexity theory for close to three decades now. Linearity testing, aka locally testing the Hadamard code, is still the prime example for property testing and lies at the gateway to the world of probabilistically checkable proofs.
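As an illustration of what a constant-query tester looks like, here is a sketch of the three-query BLR linearity test for the Hadamard code, whose codewords are truth tables of linear functions \(f(x) = \langle a, x \rangle \bmod 2\); the parameter choices and names below are illustrative only.

```python
import random

def blr_test(f, n, trials=100):
    """Accept f if f(x) + f(y) = f(x XOR y) (mod 2) on randomly chosen x, y.
    Each trial makes only 3 queries to f."""
    for _ in range(trials):
        x = random.getrandbits(n)
        y = random.getrandbits(n)
        if (f(x) + f(y) + f(x ^ y)) % 2 != 0:
            return False  # an inconsistency was caught with just 3 queries
    return True

n = 10
linear = lambda x: bin(x & 0b1011).count("1") % 2   # a genuinely linear function
corrupted = lambda x: linear(x) ^ (x % 97 < 5)      # roughly 5% of positions flipped
print(blr_test(linear, n))      # True: linear functions always pass
print(blr_test(corrupted, n))   # almost surely False after 100 trials
```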
Intuitively, for a code to be locally testable, it needs to admit a large number of constant-size parity checks. In other words, local testability seemingly necessitates a lot of redundancy in the code. In fact, the classic examples of locally testable codes, such as Hadamard codes or Reed-Muller codes, have codewords that are superpolynomially larger than the message.
Irit Dinur’s ingenious proof of the PCP theorem yielded as a by-product the first locally testable codes where the codewords were only slightly superlinear in size. It was then that one would dare to ask: can local testability be achieved for “free”? That is, can one have the best of all worlds: a constant-rate, constant-distance code that also admits a constant-query local tester?
This long-standing open question always seemed out of reach until late last year, when it was resolved affirmatively by two independent groups of researchers simultaneously!
Irit Dinur, Shai Evra, Ron Livne, Alexander Lubotzky, and Shahar Mozes start their journey with the idea that analogous to how Sipser and Spielman use expander graphs to construct codes, one could potentially use high-dimensional expanders to construct codes that are also locally testable. Starting with this idea, they undertake a long and intensive journey through a terrain of Ramanujan complexes, Bruhat-Tits buildings, p-adic uniformization, and other advanced mathematics to eventually land on an elementary and elegant construction (see Irit Dinur’s talk on this work in the Breakthroughs talk series).
In an independent work, Pavel Panteleev and Gleb Kalachev not only construct asymptotically good locally testable codes, but also resolve yet another long-standing open question! They exhibit a construction of asymptotically good quantum LDPCs. Regular readers of this column might recall that we earlier featured a breakthrough by Hastings, Haah, and O’Donnell, who constructed quantum LDPCs on \(n\) qubits with distance \(n^{3/5}\), significantly beating \(\sqrt{n}\) for the first time. Panteleev and Kalachev essentially achieve the holy grail in this context — quantum LDPCs on \(n\) qubits with constant rate and distance \(\Omega(n)\).
It is uncanny how often a question that has remained open for years gets resolved by independent sets of researchers simultaneously. At first, one might remark that these results were really in the air. Yet on a closer look, one can only wonder how the authors masterfully pulled these results out of thin air.
These works are a treasure trove of ideas that are elementary, elegant, and ingenious. To leave you with a taste of things that lie inside, let me briefly describe a new combinatorial object, the “left-right Cayley complex,” that seems to underlie both these works. We are all familiar with Cayley graphs — objects studied for more than a hundred years. These are graphs whose vertices are the elements of a group, and whose edges are of the form \((g, ag)\), corresponding to left multiplication by elements \(a\) from a fixed set of generators.
A left-right Cayley complex is a higher-dimensional object that has vertices, edges, and squares (\(4\)-tuples). The vertices are still elements of a group, and the squares are of the form \(\{g, ag, gb, agb\}\), corresponding to left and right multiplication of a group element \(g\) by generators \(a\) and \(b\). This simple and easily constructed combinatorial object plays a central role in both code constructions, yet it seems to have eluded study in more than a century of work on Cayley graphs.
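To see just how simple the object is, here is a small sketch that enumerates the squares \(\{g, ag, gb, agb\}\) for one toy choice of group and generators; the choice of \(S_3\) with a single left and a single right generator is purely illustrative and far from the parameters used in the actual constructions.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p * q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

group = list(permutations(range(3)))   # the six elements of S3
A = [(1, 0, 2)]                        # left generators: a transposition
B = [(1, 2, 0)]                        # right generators: a 3-cycle

squares = set()
for g in group:
    for a in A:
        for b in B:
            gb = compose(g, b)
            squares.add(frozenset({g, compose(a, g), gb, compose(a, gb)}))

print(len(squares), "squares on", len(group), "vertices")  # 3 squares here
```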
Reed-Muller codes achieve capacity
Reed-Muller codes are among the oldest of error-correcting codes and are probably the simplest to describe. Reed-Muller codewords of degree \(d\) are just the truth tables of degree-\(d\) polynomials over the finite field \(\{0,1\}\). Yet almost 70 years after their discovery, there are still fundamental questions about Reed-Muller codes that remain open.
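For concreteness, here is a small sketch of that description (the particular polynomial and parameters are an arbitrary illustrative choice): a codeword of the degree-\(d\) Reed-Muller code on \(m\) variables is the evaluation of a degree-\(d\) polynomial over \(\{0,1\}\) at all \(2^m\) points.

```python
from itertools import product

def truth_table(monomials, m):
    """Evaluate the polynomial over {0,1} given by a list of monomials (each a
    tuple of variable indices; the empty tuple is the constant 1) at all 2^m points."""
    table = []
    for x in product([0, 1], repeat=m):
        value = sum(all(x[i] for i in mono) for mono in monomials) % 2
        table.append(value)
    return table

# The degree-2 polynomial f(x) = x0*x1 + x2 + 1 on m = 4 variables; its truth
# table is one codeword (of length 2^4 = 16) of the degree-2 Reed-Muller code.
codeword = truth_table([(0, 1), (2,), ()], m=4)
print(codeword)
```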
Given the noise characteristics of a channel, Shannon’s theory establishes the fundamental upper bound on the rate of communication possible. An error-correcting code is said to “achieve capacity” if the rate of communication (aka, the number of message bits per codeword bit) matches Shannon’s upper bound.
A fundamental question on Reed-Muller codes is whether they do achieve capacity for a broad family of channels. In 2016, a beautiful work by Shrinivas Kudekar, Santhosh Kumar, Marco Mondelli, Henry D. Pfister, Eren Şaşoğlu, and Rüdiger Urbanke showed that indeed, Reed-Muller codes do achieve capacity for erasure channels, namely channels that only erase bits instead of corrupting bits. Using one of the most ingenious applications of Boolean function analysis, they show a very general result that any code with a certain family of symmetries achieves capacity against erasures.
In a remarkable breakthrough, Galen Reeves and Henry D. Pfister show that Reed-Muller codes achieve capacity even against the broad family of binary memoryless symmetric channels. In particular, they achieve capacity for the most classic channel, the binary symmetric channel (BSC), in which each bit is flipped independently with a fixed probability \(\epsilon\).
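To make “achieving capacity” concrete for the BSC, recall the standard Shannon formula (this is textbook material, not specific to the new paper): the capacity of the BSC with flip probability \(\epsilon\) is \(1 - H(\epsilon)\), where \(H\) is the binary entropy function.

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(eps):
    """Shannon capacity of the binary symmetric channel with flip probability eps."""
    return 1 - binary_entropy(eps)

# A code achieves capacity if its rate approaches this bound while still
# correcting the channel's errors; e.g., at ~11% bit flips the limit is rate ~1/2.
print(bsc_capacity(0.11))  # approximately 0.50
```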
An algorithmic surprise
Given a graph with weights on the edges, the best known algorithms to compute shortest paths between all pairs of vertices run in essentially \(O(n^3)\) time. In fact, it is hypothesized that the all-pairs shortest paths (APSP) problem requires \(n^{3-o(1)}\) time, and this hypothesis is the starting point for a web of reductions in fine-grained complexity.
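The cubic bound mentioned above is achieved, for instance, by the classic Floyd-Warshall algorithm; here is a minimal sketch on an adjacency matrix (the small example graph is arbitrary).

```python
INF = float("inf")

def all_pairs_shortest_paths(w):
    """Floyd-Warshall: three nested loops over the n vertices, hence O(n^3) time."""
    n = len(w)
    dist = [row[:] for row in w]
    for k in range(n):              # allow vertex k as an intermediate point
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

w = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
print(all_pairs_shortest_paths(w))
```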
Now consider the problem of computing max-flow values between every pair of vertices in the graph, also known as the all-pairs max-flow (APMF) problem. It would seem that the APMF problem must be harder than APSP — after all, shortest paths are arguably much simpler to compute than maximum flows.
Yet, a phenomenal new result by Amir Abboud, Robert Krauthgamer, Jason Li, Debmalya Panigrahi, Thatchaphol Saranurak, and Ohad Trabelsi shatters this intuition. In particular, they exhibit a nearly quadratic-time algorithm for APMF!
Perhaps the first hint that max-flow values are simpler objects than shortest paths lies in Gomory-Hu trees. In 1961, Gomory and Hu showed that the max-flow values between every pair of vertices in a graph can be succinctly represented using a weighted tree. More precisely, the Gomory-Hu tree of a graph is a weighted tree on the same vertex set such that, for every pair of vertices, the maximum flow between them in the graph equals the maximum flow between them in the tree, which is simply the minimum edge weight along the path connecting them.
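Here is a sketch of how such a tree answers max-flow queries (the tree below is a made-up toy, not derived from any particular graph): the max-flow between \(u\) and \(v\) is read off as the minimum edge weight on the unique \(u\)-\(v\) path.

```python
def maxflow_from_gomory_hu(tree, u, v):
    """tree: adjacency dict {node: [(neighbor, weight), ...]} of a weighted tree.
    Returns the minimum edge weight on the unique u-v path, found by DFS."""
    def dfs(node, target, parent, bottleneck):
        if node == target:
            return bottleneck
        for nxt, w in tree[node]:
            if nxt != parent:
                found = dfs(nxt, target, node, min(bottleneck, w))
                if found is not None:
                    return found
        return None
    return dfs(u, v, None, float("inf"))

# A made-up Gomory-Hu tree on 4 vertices.
tree = {0: [(1, 5)],
        1: [(0, 5), (2, 3), (3, 7)],
        2: [(1, 3)],
        3: [(1, 7)]}
print(maxflow_from_gomory_hu(tree, 0, 2))  # min(5, 3) = 3
```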
Six decades after its introduction, this work produces the first near-quadratic-time algorithm to compute the Gomory-Hu tree. Moreover, this work also shows that one can compute Gomory-Hu trees in unweighted graphs by using polylogarithmically many max-flow computations.