Theory at the Institute and Beyond, May 2021

by Prasad Raghavendra (Simons Institute)

Another semester of remote activities at the Simons Institute draws to a close. In our second semester of remote operations, many of the quirks of running programs online had been ironed out. But pesky challenges such as participants being in different time zones remained. To cope with these, the programs spread their workshop activities throughout the semester. Not only did this help with the time zones, but it helped keep up the intensity all through the semester.

One hopes that the Institute will never need to organize a remote semester again. At the end of June, we are excited to host the Summer Cluster in Quantum Computation, our first in-person activity since COVID-19 hit. By all expectations, the programs in the fall will likely be fairly close to the normal that we have all been longing for.

In fact, it looks like the Institute will be buzzing with activities in the fall, even more than it used to be, as if that were possible. In a few months, the Institute will welcome more than 40 talented young researchers as postdoctoral researchers and program fellows. They cover the whole gamut of areas, ranging from complexity theory and algorithms to quantum computation and machine learning.

As we settle into a different normal again, one wonders what this new normal will look like a few years from now. In particular, what aspects of this COVID era will continue to stay with us theorists, say, five years from now? Will we have online conferences and workshops? Will meeting anyone from anywhere seem as easy as it does now? How often will a speaker be physically present at our seminars? Only time will tell.

Solving linear systems faster
The ACM SODA conference held virtually in January featured a breakthrough result in algorithms for one of the oldest computational problems — namely, solving a system of linear equations. Richard Peng and Santosh Vempala exhibited a new algorithm to solve sparse linear systems that beats matrix multiplication, in certain regimes of parameters.


Research Vignette: Cryptography and Game Theory for Blockchains

by Vassilis Zikas (University of Edinburgh)

Blockchains and decentralized ledgers are creating a new reality for modern society. A major scientific impact of this new reality is that it creates an interdisciplinary arena where traditionally independent research areas, such as cryptography and game theory/economics, need to work together to address the relevant questions. The Simons Institute’s Fall 2019 program on Proofs, Consensus, and Decentralizing Society fostered much-needed cross-disciplinary collaboration by bringing together researchers from these disciplines.

An interdisciplinary research arena

The study of the interplay of cryptography and game theory/economics within the blockchain consists of several subareas. Here we discuss three such areas, with a focus on Bitcoin, as one of the most widely studied and adopted to date blockchains and cryptocurrencies. We stress that the relevant literature is huge and that discussing it all here is beyond the scope of this research vignette.

  1. Cryptographic security and economic robustnessThe works of Garay et al. [12] and Pass et al. [17] initiated the rigorous cryptographic study of the Bitcoin blockchain protocol and proved that under common assumptions about the underlying hash function and assuming the majority of the hashing power in the system is invested in executing the Bitcoin protocol (rather than attacking it), the protocol achieves a number of basic properties, such as common prefix (corresponding to the traditional notion of safety), chain growth (corresponding to liveness), and chain quality (ensuring that honest parties get to contribute blocks at an acceptable frequency). Follow-up work by Badertscher et al. [4] defined the functionality offered by the Bitcoin ledger and proved the protocol’s security under the above honest-majority assumption in a composable framework. This enables using the ledger functionality directly — without worrying about implementation details — within higher-level protocols. These initial works have ignited a number of follow-ups aiming at relaxing the underlying assumptions and/or tuning the protocol abstraction to better capture reality.

    Parallel to the cryptographic security of blockchain protocols, their economic robustness — i.e., their resilience to incentive-driven attacks/misbehavior — has been extensively investigated. A classical example here is the work of Eyal and Sirer [9], which abstracted the protocol execution as a simple mining game and demonstrated that by withholding mined blocks and strategically releasing them, attackers with favorable network conditions might be incentivized to deviate from the Bitcoin protocol, even when they control only a minority of the hashing power. Note that such deviations cannot affect the worst-case security guarantees established in the cryptographic security proofs, but they can push them to their limits. For instance, although a selfish miner cannot break common prefix and chain quality, they can temporarily create the longest-allowed forks and/or minimize the number of blocks contributed by honest miners to the worst-case value allowed by chain quality.

  2. Economics on blockchains. Independently of the questions about their economic robustness, blockchains and their associated cryptocurrencies have created a new scientific playground for economists and game theorists to develop and test new theories and confirm old ones. Indicatively, Huberman et al. [15] investigated how an ideal abstraction of Bitcoin yields a new market design paradigm where market forces do not control the functionality of the underlying payment system. They showed how analyzing this paradigm can explain behavioral aspects of Bitcoin and pointed to interesting modifications that can affect the efficiency of the protocol itself. Similarly, Benigno et al. [5] demonstrated how under common economics assumptions, equipping a two-country economy with a global cryptocurrency can create market forces driving the nominal interest rates to be equal in the two countries and yielding a rate relation between the two national currencies; and Prat and Walter [18] proposed a model using the Bitcoin-U.S. dollar exchange rate to forecast the computing power of the Bitcoin network.

  3. Blockchain-induced incentives on cryptographic protocols. A third type of blockchain-related problem where cryptography and game theory meet is the design of more efficient and resilient cryptographic protocols using incentives induced by blockchains and cryptocurrencies. The classical example here is fair multiparty computation (in short, fair MPC). In MPC, n parties wish to run a protocol to jointly compute a function on their private data. Fairness in this context requires that if a worst-case adversary — controlling and coordinating the (malicious) parties attacking the protocol — learns (any information on) the output, then the honest parties should also learn it. Cleve’s well-known impossibility result [8] mandates that if the adversary controls a majority of parties, then fairness is impossible. Intuitively, the reason is that as the protocol has to tolerate any adversarial coalition, the actual corrupted parties might be the ones to first jointly learn such information; once they do, they can stop playing, thereby preventing other parties from also learning it. In [2, 6], it was shown that using the Bitcoin blockchain as an automated escrow mechanism, one can enforce a version of fairness based on collaterals where either nobody learns any information or if someone does and prevents others from also learning it, then he loses his collateral to them. Assuming the adversary values his collateral higher than breaking fairness, this mechanism induces a fair evaluation. These results were subsequently extended to ensure robustness [16], i.e., ensure that the protocol will not fail and either it will fairly conclude or the adversary will lose his collateral.


Research Vignette: Statistical Learning-Aided Design of a Blockchain Payment System

by Xiaowu Dai (UC Berkeley)

A blockchain-based payment system uses a decentralized network of computers (miners) to verify transactions and maintain the ledger containing transaction history. The blockchain design is governed by a computer protocol and does not require trust in any centralized organization. The system supports only transactions of its native coins, such as bitcoins, which have value because payment recipients have confidence in their future usefulness. The design of such a novel system was proposed by [11]. It is of interest to study the mechanism design of the blockchain payment system and related questions on transparency and fairness [1, 6, 7, 13].

This project is connected with what was explored during the Simons Institute’s Fall 2019 program on Proofs, Consensus, and Decentralizing Society, especially with the algorithms and economics questions that emerged from the blockchain-based systems.

The blockchain payment system can be modeled as a two-sided market that intermediates between users and miners [5, 9]. The rise in Bitcoin transaction volume, coupled with limited capacity on the number of blocks that can be posted to the blockchain, raises transaction delays and motivates users to pay transaction fees for service priority. We want to understand how the transaction fees influence users’ willingness to participate in the payment system. Our analysis suggests a simple modification on the design by adding a statistical learning-aid recommendation of transaction fees for users. In particular, we build confidence bands for the regression function \(f(\cdot)\) in the model

\[\text{delay}\ \text{ti}\text{me} = f\left( \text{transaction}\ \text{fee} \right) + g\left( \text{control} \right) + \text{disturbance}\text{.}\]

Here, the control variable includes the number of pending transactions, fees of pending transactions, current infrastructure level, rate of block addition to the ledger, and history of random arrival of transactions, among others [12]. The \(f(\cdot)\) is the main nonparametric regression function that we want to infer, and the function \(g(\cdot)\) is the nuisance parameter. The dimension of the control vector can be large relative to the sample size. We develop a simple procedure to construct an honest confidence band for \(f(\cdot)\), a method that builds on the recently developed framework (e.g., [2, 3]). The confidence band would cover the true \(f(\cdot)\) at the nominal level (say, 95% level) uniformly over a large function space given a large sample of data. The notion of honesty is closely related to the use of the worst-case criterion that is necessary for good finite-sample performance [10].

This statistical learning-aid recommendation could provide the relationship between the transaction fees and the predicted transaction delay time, together with confidence bands. It leads to several implications. First, the recommendation enables the transparency of the blockchain payment system mechanism. It allows a direct comparison with the traditional payment system operated by a centralized firm like Visa or PayPal. The firm can reverse transactions in case of error or fraud, a property that is lacking in the blockchain payment system. On the other hand, users benefit from the transparency since they can decide which payment system is better for processing the transaction and how to balance the trade-off between transaction fees and transaction delay. Second, the blockchain payment system enjoys the Vickrey, Clarke, Groves (VCG) property in that each user pays a transaction fee equal to the externality on other users caused by delaying others’ transactions [4, 8, 9, 14]. The recommendation also adds the fairness guarantee; that way, with high probability, the transaction fee is close to the actual value of service priority and users avoid paying extra.


Research Vignette: Asynchrony/Synchrony vs. Mobile/Stationary

by Eli Gafni (UCLA)

In the Simons Institute program on Proofs, Consensus, and Decentralizing Society, systems like Stellar reminded us of the anomaly that results from introducing cryptography into distributed computing.

The research I embarked on was to find conditions under which the anomaly is eliminated.

In synchronous systems with processors using signatures that may behave in a Byzantine malicious manner, consensus can be reached as long as at most half of the processors are engaged in the malicious behavior. In contrast, when the system is asynchronous but eventually settles on synchronous behavior, consensus can be reached only under the more stringent condition that at most a third of processors rather than at most half are malicious.

This anomaly that synchronous systems and eventually synchronous systems do not have the same capability in reaching consensus is exhibited only with the introduction of cryptography signatures.

We resolve this anomaly by proposing that the definition of asynchrony, while practically pleasing, should be substituted by the more elegant, appealing mathematical definition of viewing asynchrony as the traditional synchronous system, but with the faulty behavior exhibited by different sets from round to round, giving rise to the pair mobile/stationary rather than asynchronous/synchronous.

This shift raises two questions: one about the meaning of mobile Byzantine and the other about the meaning of possibly all signatures forged over the rounds.

We resolve the first question by attributing the faulty behavior solely to the communication subsystem, considering processors to always behave correctly. We resolve the second question by abstracting signatures as just a constraint on which forwarded messages can be forged and which cannot. (We leave open the question of implementing the constraint.)

With these resolutions, a stationary system with signature constraint can reach consensus in the face of malicious faults as long as no more than half of processors fail in a round. The notion of eventual stationarity is now the elegant notion that the mobility of the system freezes. The system was hot and is now cooled. This supports the observation that, mathematically, the correct pair to consider is mobile/stationary rather than asynchrony/synchrony.

Research Vignette: The Unlikely Friendship of Lattices and Elliptic Curves

by Shweta Agrawal (IIT Madras)

One of the most fascinating things about cryptography is its deeply paradoxical nature: among its early successes is the ability for two strangers to meet, generate a shared secret key, and thereon communicate privately — all of these performed entirely within a crowd. Over the last several decades, we have witnessed ever more surprising cryptographic protocols, achieving increasingly paradoxical properties, ranging from zero knowledge to fully homomorphic encryption. This article will focus on one such paradox, a new technical approach that has been unraveling surprising power, opening up avenues for hitherto unrealized cryptography. This is the unlikely friendship of two seemingly disparate mathematical structures and conjectured hard problems on these, namely, the hardness of finding short independent vectors in high-dimensional lattices and hardness assumptions on elliptic curves over finite fields. This topic is connected with what was explored during the Simons Institute’s Spring 2020 program on Lattices: Algorithms, Complexity, and Cryptography — the interface of lattices with other mathematical structures and new cryptography that can emerge from it.

Our assumptions

In more detail, our first assumption on high-dimensional lattices is the so-called learning with errors (LWE) proposed by Oded Regev in 2005 [21]. LWE conjectures that it is hard to recover a secret \(\mathbf{s}\in \mathbb{Z}_q^n\) from a sequence of noisy linear equations on \(\mathbf{s}\). In more detail, given a large prime \(q\) and polynomially many pairs \((\mathbf{a}_i, \langle \mathbf{a}_i,\;\mathbf{s}\rangle + e_i)\), where \(\mathbf{a}_i \in \mathbb{Z}_q^n\) are random and \(e_i \in \mathbb{Z}_q\) are “small” perturbations chosen from some appropriate distribution, the LWE problem asks to find \(\mathbf{s}\). For our second assumption, we require three cyclic groups of large prime order (\(q\), say), \(\mathbb{G}_1\), \(\mathbb{G}_2\), and \(\mathbb{G}_T\). Let \(e:\mathbb{G}_1\times \mathbb{G}_2 \to \mathbb{G}_T\) be a nondegenerate bilinear map or “pairing,” and \(g_1\) and \(g_2\) be the generators of \(\mathbb{G}_1\) and \(\mathbb{G}_2\), respectively. Then, by the bilinearity of the pairing, we have that \(e (\;g_1^a, \; g_2^b \;) = e(\;g_1,\; g_2\;)^{ab}\) for any \(a, b \in \mathbb{Z}_q\). We require that the group operations in \(\mathbb{G}_1\), \(\mathbb{G}_2\), and \(\mathbb{G}_T\) as well as the bilinear map \(e\) can be efficiently computed. In terms of hardness, we will roughly assume that the only information the adversary can learn is via legitimate group and pairing computations on the elements she receives. This strong security guarantee can be formalized in the so-called generic group model [22, 20]. We may think of \(a\) as a secret “message” that is encoded in the exponent of a group element \(g_1^a\). Both LWE and bilinear map–based assumptions have been studied extensively, and we have significant confidence in their hardness. These are also two of the most versatile tools in cryptography and have been used to construct breakthrough applications such as identity-based encryption [5], fully homomorphic encryption [14], and such others. Traditionally, these assumptions were used in mutual exclusion for any given construction, but recently we are learning to combine them in novel non-black-box ways to yield surprising results. In this article, I will describe a recent construction of the primitive of broadcast encryption in a joint work with Shota Yamada [2] where we leveraged a serendipitous interplay of these structures to obtain optimal parameters.


Theory at the Institute and Beyond

by Prasad Raghavendra

Amid the biggest pandemic in a century, one that has disrupted lives and livelihoods the world over, it is gratifying to see how people in different walks of life have found ways to cope and carry on. Within the realm of theory research, the pursuit to better understand the foundations of computation and their implications doesn’t seem to have slowed down even a bit.

Make no mistake, the pandemic has disrupted our traditional modes of operation. A typical theorist might spend several hours each day brainstorming in a group, often over a beverage. For most theorists, this is the single most productive and enjoyable activity each day. As these meetings move online, they remain a shadow of what they used to be. Normally, surprising results often arise out of chance encounters between researchers from very different areas. As conferences and workshops shift online, however, these chance encounters become very rare. Finally, on most days, we are struggling along an unforgiving trail in an attempt to scale a seemingly insurmountable peak. Doubts — such as, Are we on the right trail? Even if we are on the right trail, are we strong enough to get through it? — often linger and can easily set one up for failure. Sharing these “theoretical” struggles with other researchers over lunch or in the corridors can be critical to keep us going. Sadly, these opportunities are rare these days.

Yet theory research does not seem to have missed a beat. The ACM STOC conference was held online for the first time. Despite the limitations of the medium, there are many silver linings to an online conference. Participation at the conference nearly doubled from last year, with 606 participants from 33 countries, many of which had not been represented at typical STOC conferences before. The videos of the conference talks are all available online, a fantastic resource for researchers going forward. Finally, as Ronen Eldan’s talk at the conference beautifully demonstrated, video is a really effective medium to communicate a research work in broad strokes in a very short time. In fact, this year’s online conference seemed to have so many advantages that the PC chair, Julia Chuzhoy, suggested holding the STOC conference twice each year, once online and once offline. 

As weekly seminars at most universities move online, they have begun to attract participants from across the world. CS Theory Online Talks maintains a list of theory talks that are available online (also see here and here). The PIMS-CRM summer school on probability has morphed into a great set of online courses that I have really enjoyed. The Oxford-Warwick Complexity Meetings are an online lecture series dedicated to complexity theory, while we at the Simons Institute are also hosting a lecture series, on Boolean function analysis (more on this later). This flurry of online activity catalyzed by the pandemic is promising to make theory research broadly accessible to graduate students and undergraduates across the globe.

Meanwhile, fantastic new results keep pouring in. The biggest breakthrough this summer is the work of Karlin, Klein, and Oveis Gharan on the metric traveling salesman problem (metric TSP). They have posted a 1.5 – ε approximation algorithm for metric TSP for some constant ε > 10 -36. Metric TSP is a fundamental combinatorial optimization problem wherein the inputs consist of a network of cities and the distances between them. The goal is to find the shortest-length route that visits each city exactly once and returns to the starting point. This problem is called metric TSP if the distances between cities are assumed to satisfy the triangle inequality, namely the distance from City A to City C is at most the sum of the distances from City A to City B and from City B to City C. 


Research Vignette: Generalization and Interpolation

by Daniel Hsu

The practice of deep learning has attracted new attention to several basic questions in statistical machine learning. One such question is how fitting machine learning models to relatively small “training” data sets can lead to accurate predictions on new data. In machine learning jargon, this is the question of generalization.

The conventional wisdom in machine learning offers the following about generalization:

  1. A model that is too simple will underfit the true patterns in the training data, and thus, it will predict poorly on new data.
  2. A model that is too complicated will overfit spurious patterns in the training data; such a model will also predict poorly on new data.

Consequently, one should choose a model that balances these concerns of underfitting and overfitting [30]. A textbook corollary is that a model that exactly fits — i.e., interpolates — the training data “is overfit to the training data and will typically generalize poorly” [17].

Recent machine learning practice appears to eschew this conventional wisdom in a dramatic fashion. For instance, a common starting point for training a neural network is to find a model that exactly fits the training data [27]. (Typically, the model is subsequently fine-tuned using different criteria, but the starting point is already nontrivial.) While this may be difficult or impossible with small neural networks, it becomes substantially easier after making the network very large. It is remarkable that this practice is compatible with generalization, despite the concerns about overfitting.

This apparent contradiction of conventional wisdom is not new. In the last century, machine learning practitioners were already aware of the efficacy (and advantages) of using “oversized” neural networks where the number of model parameters exceeds the number of training data [9, 20]. Similar observations were made about large ensembles of decision trees: it was found that increasing the number of decision trees in an ensemble could improve accuracy, even beyond the point at which the ensemble exactly fit the training data [28]. These empirical observations motivated the development of a new theory that greatly sharpens the existing body of knowledge in statistical machine learning [1, 28], and this theory has been recently extended to deep neural networks [2, 15, 26].

Figure 1: Leo Breiman’s list of important questions regarding neural networks (reproduced from [9])

However, the new theory does not fully explain recent observations about interpolating neural networks. First, in more recent experiments, neural networks are trained to interpolate noisy data [34]. By noisy data, we mean data in which the prediction targets (labels) may be erroneous. In fact, experiments have been conducted in which noise is deliberately injected in the data by assigning random labels to subsets of the training data. Neural networks that interpolate these data were nevertheless found to have nontrivial accuracy on new data. Second, similar phenomena have been observed in the context of predicting real numerical targets. (The aforementioned theoretical and empirical studies mostly focused on classification targets.) When predicting real numbers, it is almost always a given that training data will have noisy labels, so the thought of interpolating training data seems even more preposterous. But again, in recent experimental studies, interpolating (or nearly interpolating) models were found to have nontrivially accurate predictions on new data [4, 7].


Research Vignette: The Many Dimensions of High-Dimensional Expanders

by Prahladh Harsha

An emerging theme in several disciplines of science over the last few decades is the study of local-to-global phenomena. Let me elucidate with a few examples. In biology, one tries to understand the global properties of an organism by studying local interactions at a cellular level. The Internet graph is impossible to predict or control at a global level; what one can at best do is understand its behavior by making changes at a very local level. Moving to examples closer to theoretical computer science, the pioneering works in computation theory due to Turing and others define the computation of any global function as one that can be broken down into a sequence of local computations. The seminal work on NP-completeness due to Cook, Levin, and Karp demonstrates that any verification task can be in fact reduced to a conjunction of verifying very local objects.

Given these examples, an intriguing question that arises in multiple disciplines is to identify which objects support such local-to-global phenomena. This question as stated is an ill-formed one, but one can attempt to make it well-defined in a variety of contexts. Let me illustrate by giving three specific instances.

Mixing time in graphs: Which graphs have the property of having a global mixing property that can be inferred by understanding the mixing properties of several related smaller graphs?

Local testing and decoding (in coding theory): Which error-correcting codes have the property of being able to be tested or decoded by looking at the code word or received word very locally (i.e., at very few locations)?

Topologically expanding graphs: Which graphs or hypergraphs have the property that their global topological property can be inferred by their local topological behavior?

These questions, seemingly disparate and arising in very different contexts and communities (in particular, approximate sampling, coding theory, and topology), surprisingly all led to the investigation of a similar expanding object: the high-dimensional expander. The Simons Institute 2019 summer cluster on Error-Correcting Codes and High-Dimensional Expansion brought together researchers from these diverse communities to study high-dimensional expanders and their potential applications to mathematics and theoretical computer science.

What is a high-dimensional expander?

High-dimensional expanders (HDXs) are a high-dimensional analogue of expander graphs. An expander graph, loosely speaking, is an extremely well-connected graph. Analytically, this is best captured via the second-largest eigenvalue (in absolute value) of the normalized adjacency matrix of the graph. More precisely, a graph G is said to be a λ-expander (for λ ∈ [0,1]) if the second-largest eigenvalue (in absolute value) of the normalized adjacency matrix AG of G is bounded above by λ. The smaller the λ, the better the expander. An equivalent definition is in terms of how well the random walk on the vertices induced by the graph mixes: G is a λ-expander if the spectral norm of the difference of the normalized adjacency matrix AG and the normalized all-ones matrix J (the normalized adjacency matrix of the complete graph) is bounded above by λ (i.e., \[\left\| A_G – \frac1n J \right\| \leq \lambda,\]

where n is the number of vertices of the graph G).


Research Vignette: Geometry of Polynomials

by Shayan Oveis Gharan

Classical theorems in theoretical computer science typically heavily exploit combinatorial or probabilistic tools. Later on, linear algebraic and geometric tools proved to be fundamental elements in numerous algorithmic or hardness results. In this research vignette, I will explain a new tool known as the “polynomial paradigm,” which can be seen as a bridge between linear algebraic and probabilistic methods; and I will discuss some unexpected applications in computer science. 

Given a polynomial \(p \in R\lbrack z_{1},\ldots,z_{n}\rbrack\), we can think of it in three ways: (i) coefficients of p, (ii) zeros of p, or (iii) a function that maps n-dimensional complex vectors to complex numbers. Having these three different representations, one can study what happens to one representation when we change another one, or what happens when we apply a mathematical operation such as differentiation, integration, or specialization. For many years, mathematicians studied these questions for many classes of polynomials.

Over the last two decades, a new theme that we call the “polynomial paradigm” emerged. It was observed that if the roots of the (multivariate) polynomial \(p\) have a “nice” structure in the n-dimensional complex plane, then one can study the interaction of zeros, coefficients, and function values rigorously. In several pioneering works, Borcea, Brändén, Choe, Liggett, Oxley, Sokal, and Wagner studied the class of real stable polynomials, which can be seen as the right multivariate generalization of real-rooted polynomials. A polynomial \(p\) is real stable if it has no root in the upper half of the n-dimensional complex plane. They observed that this class is closed under many operations, such as specialization, differentiation, inversion, etc.

Over the last decade, the polynomial paradigm proved to be a fruitful approach in many areas of science and mathematics. Several applications of this paradigm in theoretical computer science, combinatorics, and linear algebra were discovered, such as improved approximation algorithms for the traveling salesman problem, a purely algebraic proof of the van der Waerden conjecture, and a resolution of the notorious Kadison-Singer conjecture. The general recipe in almost all these applications is as follows: Given a discrete phenomenon, encode it in a complex multivariate polynomial with a nice zero-free region, such as a real stable polynomial. Then study this polynomial via the interplay of coefficients, zeros, and function values. Finally, translate your findings to the domain of the original problem.

In Spring 2019, with Nikhil Srivastava, we organized a semester-long program on Geometry of Polynomials at the Simons Institute, where we invited researchers across several areas of science, mathematics, and computer science to explore this rapidly evolving area. A large group tried to better understand these polynomials and their properties, while many others tried to explore further applications. In the rest of this research vignette, I will explain the journey toward the discovery of a new class of multivariate polynomials that we call completely log-concave polynomials. This discovery and their application motivated several researchers within the program to start collaborating to better understand this class of polynomials and its properties.


Research Vignette: Foundations of Data Science

by Ilias Diakonikolas (University of Southern California), Santosh Vempala (Georgia Institute of Technology), and David P. Woodruff (Carnegie Mellon University)

Algorithmic High-Dimensional Robust Statistics
Fitting a model to a collection of observations is one of the quintessential goals of statistics and machine learning. A major recent advance in theoretical machine learning is the development of efficient learning algorithms for various high-dimensional models, including Gaussian mixture models, independent component analysis, and topic models. The Achilles’ heel of these algorithms is the assumption that data is precisely generated from a model of the given type.

This assumption is crucial for the performance of these algorithms: even a very small fraction of outliers can completely compromise the algorithm’s behavior. For example, k adversarially placed points can completely alter the k-dimensional principal component analysis, a commonly used primitive for many algorithms. However, this assumption is only approximately valid, as real data sets are typically exposed to some source of contamination. Moreover, the data corruption is often systematic, and random models do not accurately capture the nature of the corruption. Hence, it is desirable that any estimator that is to be used in practice is stable in the presence of arbitrarily and adversarially corrupted data.

Indeed, the problem of designing outlier-robust estimators is natural enough that it is studied by a classical body of work, namely robust statistics, whose prototypical question is the design of estimators that perform well in the presence of corrupted data. This area of statistics was initiated by the pioneering works of Tukey and Huber in the 1960s and addresses a conceptual gap in Fisher’s theory of exact parametric inference — since parametric models are typically only approximately valid, robust statistics is essential to complete the theory. From a practical perspective, the question of how to make good inferences from data sets in which pieces of information are corrupted has become a pressing challenge. Specifically, the need for robust statistics is motivated by data poisoning attacks, in the context of adversarial machine learning, as well as by automatic outlier removal for high-dimensional data sets from a variety of applications.

Classical work in robust statistics pinned down the fundamental information-theoretic aspects of high-dimensional robust estimation, establishing the existence of computable and information-theoretically optimal robust estimators for fundamental problems. In contrast, until very recently, the computational complexity aspects of robust estimators were poorly understood. In particular, even for the basic problem of robustly estimating the mean of a high-dimensional data set, all known robust estimators were hard to compute (i.e., computationally intractable). In addition, the accuracy of the known efficient heuristics degrades quickly as the dimension increases. This state of affairs prompted the following natural question: can we reconcile robustness and computational efficiency in high-dimensional estimation?