Research Vignette: Mathematical Software Obfuscation

By Amit Sahai

From revolutionaries and spies to ordinary citizens living under repressive regimes, for centuries people have had the need to hide operational secrets. By an operational secret, we mean a secret that must be kept safe, where the secret is also used to perform ordinary tasks. For example, someone may want to keep the amount of cash she has on-hand secret, but she also needs to refer to this information when making everyday purchases. Even a secret like the location of hidden refugees under one’s protection could be an operational secret, since the holder of the secret would want to plan her ordinary movements to avoid the secret location that she is protecting. Through many clever means, people throughout history have managed to protect such secrets. Essential to such protection was the refuge of the human mind, the ultimate sanctuary where secrets can be pondered and processed without fear of loss.

But what if this most basic assumption proves false? What if an adversary can read someone’s mind while she is thinking about the secret she needs to protect? Indeed, it is hard to imagine how someone can protect secrets when the inner dialogue of her mind can betray her. Fortunately, this scenario remains science fiction when applied to humanity. However, if we replace humans with computer programs, this situation is all too common. Computer programs, with inbuilt databases of sensitive information, are routinely captured by adversarial entities. These adversaries can reverse-engineer these programs and see every detail of how the programs “think” about their secrets as they perform their calculations. Furthermore, adversaries can even modify the programs to alter their behavior, in order to coax secrets out of the original computer code. Because computer programs with inbuilt operational secrets are so useful, cryptographic researchers have pondered this challenge since as far back as the classic work of Diffie and Hellman in 1976. For decades, however, our ability to suitably “obfuscate” programs in order to defend against these kinds of attacks was based on unsatisfying approaches that failed to provide security against even moderately skilled adversaries.

This changed in 2013, due to the work of Garg, Gentry, Halevi, Raykova, Sahai and Waters, which gave the first sound mathematical approach to this problem. This work enabling a mathematical approach to securing software has been hailed as “a watershed moment for cryptography” by the Simons Foundation’s Quanta Magazine, and its ramifications were a major theme of the Simons Institute program on Cryptography in Summer 2015. To illustrate why this recent advance in obfuscation has caused such a stir in the cryptographic community, let us consider an analogy with the ancient problem of sending encrypted messages.

CONTINUE READING

Research Vignette: Hard Problems All The Way Up

By Benjamin Rossman and Li-Yang Tan

Computational complexity theory is the branch of computer science that studies the nature and limits of efficient computation; an overarching goal in this area is the classification of computational problems according to their inherent difficulty. One of the most intensively studied such classifications is provided by the Polynomial Hierarchy, which was introduced by Albert Meyer and Larry Stockmeyer in 1972. This hierarchy classifies problems according to a natural notion of logical complexity, and is defined with an infinite number of levels: problems at the zeroth level are the “easiest”, and for every integer k, problems at the (k+1)-st level have logical complexity “one notch higher” than those at level k. In this vignette we describe some recent work done at the Simons Institute, in collaboration with Rocco Servedio of Columbia University, which sheds new light on the structure of this hierarchy.

The polynomial hierarchy begins at the zeroth level with the class P of problems solvable in polynomial time — the easiest problems. In theoretical computer science P captures an elegant and robust notion of computational efficiency: polynomial-time algorithms are considered efficient, and problems in P are considered tractable. Examples of fundamental problems in P include Linear Programming, Matching, and Primality. At the first level above P in the hierarchy is NP, the class of problems solvable in nondeterministic polynomial time, meaning that a “yes” answer can be verified efficiently. Well-known examples of problems in NP include the Boolean Satisfiability and Traveling Salesman problems, among countless others that arise in surprisingly diverse contexts.

Clearly P ⊆ NP, since the answer to a computational problem can be verified efficiently if it can be determined efficiently. But is the converse true? No efficient algorithms are known for the hardest problems in NP, the so-called NP-complete problems: while a “yes” answer to these problems can be verified in polynomial time, to date our best algorithms for solving them (determining whether the answer is “yes” or “no”) all run in exponential time. Indeed, the famous P ≠ NP conjecture asserts that there do not exist efficient algorithms for solving NP-complete problems: any algorithm for these problems must run in super-polynomial time. In other words, the P ≠ NP conjecture asserts that the first two levels of the Polynomial Hierarchy are distinct.
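To make the contrast between verifying and deciding concrete, here is a minimal Python sketch (the formula and the function names are invented for illustration): checking a proposed assignment against a CNF formula takes time linear in the size of the formula, while the naive solver below enumerates all 2^n assignments.

from itertools import product

# A CNF formula as a list of clauses; each clause is a list of signed
# variable indices, e.g. 3 stands for x3 and -3 for NOT x3.
FORMULA = [[1, -2], [-1, 2, 3], [-3]]

def satisfies(formula, assignment):
    """Polynomial-time verification: check that a candidate assignment
    (a dict mapping variable index -> bool) satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def brute_force_sat(formula, num_vars):
    """Exponential-time search: try all 2^num_vars assignments."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if satisfies(formula, assignment):
            return assignment
    return None

print(brute_force_sat(FORMULA, 3))  # prints a satisfying assignment, if one exists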

To infinity and beyond
The Polynomial Hierarchy extends beyond P and NP to include an infinite number of classes of increasing logical complexity: just as NP generalizes P, the second level of the hierarchy generalizes NP, the third generalizes the second, and so on ad infinitum. To understand this hierarchy it is convenient to think of problems in NP as asking questions with a single existential quantifier. For example, the Boolean Satisfiability problem asks if there exists a satisfying assignment to a Boolean formula, and the Traveling Salesman problem asks if there exists a short tour visiting every city exactly once. Problems at the k-th level of the hierarchy allow for questions with not just one existential quantifier, but k alternating quantifiers:

Does there exist X, such that for all Y, there exists Z such that…?
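In standard notation (a routine formal restatement, added here for concreteness), a language L lies in the k-th level Σ_k^p of the hierarchy if there is a polynomial-time verifier R, with all witnesses y_i bounded in length by a polynomial in |x|, such that

x \in L \iff \exists y_1 \, \forall y_2 \, \exists y_3 \cdots Q_k y_k \; R(x, y_1, \ldots, y_k) = 1,

where the quantifiers alternate and Q_k is ∃ for odd k and ∀ for even k; in particular, Σ_0^p = P and Σ_1^p = NP.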

CONTINUE READING

Reed, Muller, and Costa: Together at the Simons Institute

By Henry Pfister, Yury Polyanskiy, Rüdiger Urbanke and Yihong Wu

1 Information Theory

The publication of Shannon’s 1948 paper [Sha48] is widely recognized as the inception of information theory as a field of study. A key result in Shannon’s paper is that messages can be transmitted reliably over a noisy channel if and only if a sufficient amount of redundancy is added to correct errors. In particular, the error probability after decoding can be made to vanish as the message length increases if the rate R (i.e., the number of information bits per channel use) is less than the Shannon capacity C of the channel. Shannon’s proof relies on a randomly chosen codebook and does not lend itself to the practical construction of good codes. Thus, the 1948 paper also marks the beginning of the search for deterministic code constructions that achieve the Shannon capacity.

First, we consider a simple model of point-to-point communication. The binary erasure channel is a noisy channel with input alphabet {0, 1} and output alphabet {0, 1, ?}. During each time step, the transmitter sends an arbitrary element of the input alphabet, X, and the receiver observes a random channel output, Y, whose distribution is given by

\Pr(Y=y \mid X=x) = \begin{cases} 1-\epsilon & \text{if } y = x \\ \epsilon & \text{if } y = \;? \end{cases}

The Shannon capacity of the channel is C = 1 − ϵ bits per channel use.
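For intuition (an illustrative simulation added here, not part of the vignette), one can simulate the channel and check that about a 1 − ϵ fraction of the transmitted bits arrive unerased; for the erasure channel, this expected fraction coincides numerically with the capacity C = 1 − ϵ.

import random

def simulate_bec(num_uses, eps, seed=0):
    """Send random bits through a binary erasure channel with erasure
    probability eps; return the fraction of outputs that are not erased."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(num_uses):
        x = rng.randint(0, 1)                  # transmitted bit X
        y = '?' if rng.random() < eps else x   # received symbol Y
        if y != '?':
            survived += 1
    return survived / num_uses

# With eps = 0.3, roughly 0.7 of the bits survive, matching C = 1 - eps.
print(simulate_bec(100_000, eps=0.3))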

In wireless networks, the point-to-point model of communication does not capture the true complexity of the problem. One may also consider how much redundancy is required for pairs of transmitters and receivers to communicate reliably when their transmissions interfere with one another. The mathematical model of this problem is known as the interference channel [Car75]. In this case, the challenge is to determine the minimal amount of redundancy that each transmitter must add to their message. Characterizing the capacity region of the interference channel, namely, the set of rate pairs that allow reliable communication, is among the most significant open problems in information theory.

In the two sections below, we will describe recent advances in the understanding of these two problems.

CONTINUE READING

Cynthia Dwork on Algorithmic Fairness

Today the New York Times has an interview with Cynthia Dwork, who is participating in the ongoing Cryptography program.

Cynthia is well known, among other things, for her work on non-malleability (about which she spoke in the historical papers seminar series) and for her work on differential privacy.

The interview is about concerns regarding algorithmic fairness, and you can read about the concept of fairness through awareness here.

Summer School on Theoretical Neuroscience

Every summer, Berkeley’s Redwood Center for Theoretical Neuroscience organizes a ten-day summer course on neuroscience, bringing to Berkeley several dozen young researchers (graduate students and postdocs) from all walks of science with a serious interest in learning about neuroscience, and especially about techniques for mining and modeling neuroscience data. This year, the Simons Institute and the Mathematical Sciences Research Institute were, for the first time, co-organizers of this course, in order to attract more computer scientists and mathematicians to this important field. The program included one day of lectures by Vitaly Feldman and myself on the theory of computation, learning theory, and the computational models proposed by Valiant and, more recently, by Vempala and myself.

The Computing Community Consortium, which has recently been very active in promoting the emerging research interface between Computation and Brain Science (see cra.org/ccc/events/brain-workshop), generously agreed to fund this CS component of the summer course.

We advertised the course to CS departments, and from a field of about a dozen applications we selected four CS graduate students, who ended up attending the course: Chihua Ma (UI Chicago), Yunjie Liu (UC Davis and Lawrence Berkeley Labs), Yu Liu (UC Davis), and Antonio Moretti (Columbia). Below are their contributions about highlights of the summer course.

Continue reading

Don Knuth on SAT

Next semester, the Simons Institute will host a program on Fine-Grained Complexity and Algorithm Design, one of whose core topics is the exact complexity of satisfiability problems.

Coincidentally, Don Knuth has just posted online a draft of Section 7.2.2.2 of The Art of Computer Programming, which will be part of Volume 4B.
Chapter 7 deals with combinatorial searching; Section 7.2, titled “Generating all possibilities” is about enumeration and exhaustive search; Subsection 7.2.2 is about basic backtracking algorithms; and Sub-subsection 7.2.2.2 is about SAT.

Even people who are familiar with Don’s famously comprehensive approach to exposition might be surprised to see that this sub-subsection runs to 317 pages, 106 of which are devoted to solutions of exercises. Unfortunately, there was no space to provide the solution to Exercise 516.

Indistinguishability Obfuscation and Multi-linear Maps: A Brave New World (Guest Post by Ran Canetti)

The following post is written by Ran Canetti

A bunch of us hapless cryptographers got the following boilerplate comment from the FOCS’15 PC:

Overall, submissions related to multi-linear maps and indistinguishability obfuscation were held to a somewhat higher standard. The PC expressed some concern with the recent flurry of activities pertaining to multi-linear maps and indistinguishability obfuscation, given how little we understand and can say and *prove* about the underlying hardness assumptions.

This comment was clearly written with the best of intentions, to explain views expressed at the PC deliberations. And I’m thankful for it – mainly since it made the underlying misconceptions so explicit that it mandated a response. So, after discussing and commiserating with colleagues here at Simons, and after amusing ourselves with some analogues of the above statement (e.g., “results on NP-completeness are held to a higher standard given how little we understand and can say and *prove* about the hardness of solving SAT in polynomial time”), I decided to try to write an – obviously subjective – account of the recent developments in multilinear maps and indistinguishability obfuscation (IO), and of why this exciting research should be embraced and highlighted rather than “held to a somewhat higher standard” – in spite of how little we understand about the underlying assumptions. The account is aimed at the general CS theorist.

Let me start by giving rough definitions of the concepts involved. An Indistinguishability Obfuscator (IO) is a randomized algorithm O that takes as input a circuit C and outputs a (distribution over) circuits O(C) with the properties that:

  • C and O(C) have the same functionality,
  • O(C) is only polynomially larger than C,
  • for any two same-size, functionally equivalent circuits C and C’ we have that O(C) ~ O(C’) (i.e., the distributions over strings representing O(C) and O(C’) are computationally indistinguishable).
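To build some intuition for this definition, here is a toy illustration (added here, not part of the original post, and emphatically not a real obfuscator: computing a truth table takes time exponential in the number of inputs, so it badly violates the polynomial-size requirement). It shows why any canonical form trivially satisfies the indistinguishability property: functionally equivalent circuits are mapped to identical outputs.

from itertools import product

def toy_canonical_obfuscate(circuit, num_inputs):
    """'Obfuscate' a Boolean function by outputting its truth table.
    This runs in time 2^num_inputs, so it is NOT an efficient IO;
    it only illustrates the indistinguishability requirement."""
    return tuple(bool(circuit(*bits)) for bits in product([0, 1], repeat=num_inputs))

# Two different-looking but functionally equivalent "circuits" for XOR...
c1 = lambda x, y: (x and not y) or (not x and y)
c2 = lambda x, y: (x + y) % 2 == 1

# ...are mapped to identical strings, so no distinguisher can tell them apart.
print(toy_canonical_obfuscate(c1, 2) == toy_canonical_obfuscate(c2, 2))  # True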

IO was proposed as a notion of obfuscation around 2000 (Hada, Barak-Goldreich-Impagliazzo-Sahai-Vadhan-Yang). Indeed, it is arguably a clean and appealing notion – in some sense the natural extension of semantic security of standard encryption to “functionality-preserving encryption of programs”. However, it has been largely viewed as too weak to be of real applicability or interest. (There were also no candidate polynomial-time IO schemes, but this, in my eyes, is a secondary point; see below.)

Things changed dramatically in 2013, when Sahai and Waters demonstrated how IO schemes can be ingeniously combined with other rather “mundane” cryptographic constructs to do some amazing things. Since then, dozens of papers have appeared that extend the SW techniques and apply them to obtain even more amazing things – results that by now have transcended crypto and spilled over to other areas (e.g., deniable encryption, succinct delegation, succinct multi-party computation with hardly any interaction, one-message succinct witness-hiding and witness-indistinguishable proofs, hash functions with random-oracle-like properties, hardness results for PPAD, and many more). In fact, think about a result in your area that assumes that some computation is done inside a black box – most probably IO can replace that assumption in one way or another…

Still, my (subjective but distinct) feeling is that we are far from understanding the limits and full power of IO. Furthermore, the study of IO has brought with it a whole new toolbox of techniques that are intriguing in their own right, and teach us about the power and limitations of working with “encrypted computations”.

So far I have not mentioned any candidate constructions of IO – and indeed the above study is arguably valuable as a pure study of this amazing concept, even without any candidate constructions. (Paraphrasing Levin on quantum computers, one can take the viewpoint that the above is the study of impossibility results for IO…)

However, unlike quantum computers, here we also have candidate constructions. This is where multilinear maps come to play.

Multi-linear maps are this cool new technical tool (or set of tools) that was recently put forth. (The general concept was proposed by Boneh and Silverberg around 2000, and the first candidate construction of one of the current variants was presented in 2012 by Garg, Gentry and Halevi.) Essentially, a multilinear map scheme is a fully homomorphic encryption scheme where the public key provides, in addition to the ability to encrypt elements and perform homomorphic operations on ciphertexts, also the ability to partially decrypt ciphertexts under certain restrictions. There are many incomparable variants of this general paradigm, which differ both in the functionality provided and in the security guarantees. Indeed, variants appear to be closely tied to candidate constructions. Furthermore, our understanding of what’s possible here has been evolving considerably, with multiple new constructions, attacks, and fixes reported.
Still, the number and variety of applications of multi-linear maps make it clear that this “family of primitives” is extremely powerful and well worth studying – at the level of candidate constructions, at the level of finding the “right” computational abstractions, and at the level of applications. In a sense, we are back in the ’70s: we are faced with this new set of algebraic and number-theoretic tools, and are struggling to find good ways to use them and abstract them.
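For concreteness (a formalization added here; it is not from the post itself): in the original Boneh and Silverberg formulation, a k-linear map consists of groups G and G_T of the same prime order, a generator g of G, and an efficiently computable map e : G^k → G_T that is linear in each coordinate,

e(g^{a_1}, g^{a_2}, \ldots, g^{a_k}) = e(g, g, \ldots, g)^{a_1 a_2 \cdots a_k},

together with the assumption that suitable analogues of the discrete logarithm and Diffie-Hellman problems remain hard in these groups. The candidate constructions mentioned above provide noisy, “graded” approximations of this ideal functionality, which is part of why the right abstraction is still being worked out.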

Indeed, some of the most powerful applications of multilinear maps are candidate constructions of IO schemes. The first such candidate construction (by Garg, Gentry, Halevi, Raykova, Sahai and Waters in 2013) came with only heuristic arguments for security; however, more rigorous analyses of this and other constructions, based on well-defined formulations of multi-linear map variants, soon followed suit. Some of these analyses have since been “broken”, in the sense that we currently don’t have candidate constructions that satisfy the properties they assume. Still, other analyses do remain valid. Indeed, there are no attacks against the actual basic IO scheme of Garg et al.

The fact that the only current candidate constructions of IO need to assume the existence of some variant of multi-linear maps at some point or another may make it seem as if the two concepts are somehow tied together. However, there is no reason to believe that this is the case. For all we know, multi-linear maps are just the first path uncovered to IO, and other paths may well be found. Similarly, even if IO turns out to be unobtainable for some reason, the study of multilinear maps and their power will still remain very relevant.

So, to sum up this long-winded account:

  • IO is a natural and fascinating computational concept. Studying its consequences (both within and outside cryptography) is an endeavor well worth pursuing.
  • Studying new candidate constructions of IO and/or new analyses of their security is another endeavor well worth pursuing.
  • Multilinear maps are an intriguing and powerful set of techniques and tools. Finding better candidate constructions and abstractions is of central importance to cryptography. Finding new cool uses of these maps is another intriguing challenge.
  • The three should be treated as separate (although touching and potentially interleaving) research efforts.

———–
I’d like to thank Guy Rothblum and Vinod Vaikuntanathan for great comments that significantly improved this post.

Historical talks

During the summer program on cryptography, every Monday afternoon there is a talk on the history of a classic paper or series of papers. Last week, Russell Impagliazzo spoke on his work with Steven Rudich on the impossibility of basing one-way permutations or key agreement on the existence of one-way functions. The week before, Umesh Vazirani spoke on quantum computing, and earlier Ron Rivest spoke about the origins of modern cryptography.

All talks are recorded, and the recordings are available here.

Tomorrow afternoon, Daniele Micciancio will speak at 3pm on lattice-based cryptography.

Tomorrow is also the first day of the Workshop on the Mathematics of Modern Cryptography. The program starts at 9:30am Pacific time, and all talks are broadcast live, as usual.

Tune in at 9:30am (12:30pm Eastern)

This Summer, most of the theory of cryptography community is in Berkeley to participate in the Simons Institute program on cryptography.

The program started with a week-long series of lectures, all available here, which covered tools such as lattice problems, multilinear maps, oblivious RAMs, garbled circuits, and differential privacy, and their applications to homomorphic encryption, obfuscation, delegated computations, and multiparty computations.

This week there is a workshop on secure computations, all of whose talks are livestreamed; it starts today at 9:30 Pacific Time with a talk by Amit Sahai on obfuscation.

Research Vignette: Semialgebraic Geometry of Ranks

By Kaie Kubjas

The rank of a matrix M is a fundamental notion in linear algebra. While usually defined as the dimension of the span of the rows or the span of the columns, it could also be defined as the minimal number r such that the matrix M admits a factorization as M=AB, where A has r columns and B has r rows. If all entries of the matrix M are nonnegative, then such a factorization can be pictured as a pair of nested polytopes: the outer one is given by the inequalities Ax≥0, and the inner one by the convex hull of the columns of B. Up to an affine transformation, this picture is independent of the specific factorization AB.

Figure 1: Two nested polygons corresponding to a rank 3 matrix with 6 rows and 5 columns
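As a small numerical illustration of the factorization definition of rank (a sketch added here, not part of the vignette), a rank r matrix can be factored as M = AB with inner dimension r using the singular value decomposition:

import numpy as np

# A 4 x 5 matrix of rank 2, built deliberately as a product.
A0 = np.random.rand(4, 2)
B0 = np.random.rand(2, 5)
M = A0 @ B0

r = np.linalg.matrix_rank(M)                      # r == 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :r] * s[:r]                              # 4 x r factor
B = Vt[:r, :]                                     # r x 5 factor

print(r, np.allclose(M, A @ B))                   # 2 True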

In many applications, we are interested in factorizations of a certain particular form. For example, we can study factorizations M=AB where A,B have nonnegative entries. The minimal r as above for which such a factorization exists is called the nonnegative rank. Cohen and Rothblum formulated this definition geometrically: a nonnegative matrix M has nonnegative rank at most r if and only if there exists a polytope with r vertices that can be fitted between two nested polytopes associated with the matrix.

Figure 2: Two quadrangles and a triangle between them
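In practice, one can search for such a restricted factorization numerically, for instance with the nonnegative matrix factorization routine in scikit-learn. The sketch below is illustrative only: a small reconstruction error suggests, but does not certify, that the nonnegative rank is at most r, which is precisely the certification issue this vignette returns to below.

import numpy as np
from sklearn.decomposition import NMF

# A nonnegative matrix built as a product of nonnegative factors,
# so its nonnegative rank is at most 3.
W0 = np.random.rand(6, 3)
H0 = np.random.rand(3, 5)
M = W0 @ H0

model = NMF(n_components=3, init='random', random_state=0, max_iter=2000)
W = model.fit_transform(M)    # 6 x 3, entrywise nonnegative
H = model.components_         # 3 x 5, entrywise nonnegative

# Small reconstruction error is evidence, not a proof, that the
# nonnegative rank of M is at most 3.
print(np.linalg.norm(M - W @ H))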

The interest in the nonnegative rank started in the combinatorial optimization community with the work of Yannakakis at the end of the 1980s. A linear program aims to minimize or maximize a linear function over a region that is given by linear constraints, i.e., a polyhedron or a polytope. Yannakakis showed that the minimal number of variables and constraints needed to express a linear program over a polytope is closely related to the nonnegative rank of a matrix associated to this polytope. 
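As a reminder of what such a program looks like computationally (a toy example added here, using scipy), below is a two-variable linear program over the unit square:

from scipy.optimize import linprog

# Maximize x + y subject to 0 <= x, y <= 1.
# linprog minimizes, so we negate the objective.
result = linprog(c=[-1, -1], bounds=[(0, 1), (0, 1)], method='highs')
print(-result.fun, result.x)   # optimum 2.0, attained at the vertex (1, 1)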

One line of research that started with the seminal paper of Yannakakis studies lower bounds on the nonnegative rank, and later also on the positive semidefinite rank, for different polytopes that appear in combinatorial optimization. For example, the Traveling Salesman Problem (TSP) asks: Given a list of cities, what is the shortest cycle that visits every city exactly once? It can be formulated as a linear program over a region called the TSP polytope. TSP is NP-hard, and some of the attempts to prove P=NP aimed to give polynomial-size linear programming formulations of the TSP polytope. This motivated Yannakakis to look for superpolynomial lower bounds on the nonnegative rank of the TSP polytope, a goal that was finally achieved by Fiorini et al. in 2012.

The notion of nonnegative rank also appears in statistics: the set of stochastic matrices of nonnegative rank at most r is called the r-th mixture model. It represents the joint probabilities of two random variables that are independent given a third random variable with r possible values. Given a data matrix obtained from an opinion poll or a measurement, one would like to estimate the parameters of the true probability distribution that the data come from. Specifically, the maximum likelihood estimate of a data matrix is a matrix in the r-th mixture model that maximizes a specific function, called the likelihood function. There are several methods for numerically solving the maximum likelihood estimation problem in practice. These methods, however, do not provide a certificate of having found the global optimum.
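Concretely (in standard notation, added here for illustration): if the data are summarized by a matrix of counts U = (u_ij), then the maximum likelihood estimate is a matrix P = (p_ij) in the r-th mixture model maximizing the log-likelihood

\ell(P) = \sum_{i,j} u_{ij} \log p_{ij}, \qquad \text{subject to } p_{ij} \ge 0, \;\; \sum_{i,j} p_{ij} = 1, \;\; \operatorname{rank}_+(P) \le r,

where rank_+ denotes the nonnegative rank.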

In my recent research together with Eggermont, Horobeţ, Robeva, and Sturmfels (Robeva and Sturmfels attended the Algebraic Geometry program, and Horobeţ participated in one of the workshops at the Simons Institute), we have been interested in exact descriptions of the sets of matrices of nonnegative rank at most r. These are semialgebraic sets, which means that they can be characterized by Boolean combinations of polynomial equations and inequalities. Knowing quantifier-free semialgebraic descriptions of these sets would give an exact method for checking whether a matrix lies in them. It would also be an essential step towards computing maximum likelihood estimates with a certificate.

CONTINUE READING