# Where Do Q-Functions Come From?

by Sean Meyn (University of Florida) and Gergely Neu (Pompeu Fabra University)

One theoretical foundation of reinforcement learning is optimal control, usually rooted in the Markovian variety known as Markov decision processes (MDPs). The MDP model consists of a state process, an action (or input) process, and a one-step reward that is a function of state and action. The goal is to obtain a policy (function from states to actions) that is optimal in some predefined sense. Chris Watkins introduced the Q-function in the 1980s as part of a methodology for reinforcement learning. Given its importance for over three decades, it is not surprising that the question of the true meaning of Q was a hot topic for discussion during the Simons Institute’s Fall 2020 program on Theory of Reinforcement Learning.

This short note focuses on interactions at the start of the program, and research directions inspired in part by these interactions. To start with, who is Q? Was this code for one of Watkins’ friends at Cambridge? The question was posed early on, which led to an online investigation. The mystery was shattered through a response from Chris: we now know that the letter Q stands for quality, not Quinlyn or Quincy. To discuss further questions and potential answers requires some technicalities.

The discounted-cost optimality criterion is a favorite metric for performance in computer science and operations research, and is the setting of the original Q-function formulation. The definition requires a state process $$\{X_k : k\ge 0\}$$ and an action (or input) process $$\{A_k : k\ge 0\}$$, evolving on respective spaces (which are assumed discrete in this note). There is a controlled transition matrix $$P$$ that describes dynamics: $$X_{k+1}$$ is distributed according to $$P(\cdot|x,a)$$ when $$X_k=x$$ and $$A_k=a$$, for any action sequence that is adapted to the state sequence.

With $$\gamma$$ denoting the discount factor, the Q-function is the solution to a nonlinear fixed-point equation $$T^*Q = Q$$ in which $$T^*$$ is the Bellman operator: $\left(T^*Q\right)(x,a) = r(x,a) + \gamma \mathbb{E}_{X’\sim P(\cdot|x,a)}\left[\max_{a’} Q(X’,a’)\right]$ This must hold for each state-action pair $$(x,a)$$, with the maximum over all possible actions. This is a version of the dynamic programming (DP) equation that has been with us for about seven decades.

The magic of Q-learning, which is based on this DP equation, is that the maximum appears within an expectation. This makes possible the application of Monte Carlo methods to obtain an approximate solution based solely on observations of the actual system to be controlled, or through simulations.

One core idea of modern reinforcement learning (RL) is to find approximate solutions of the DP equation within a function class (e.g., neural networks, as popularized by the deep Q-learning approach of Mnih et al., 2015). While success stories are well-known, useful theory is scarce: we don’t know if a solution exists to an approximate DP equation except in very special settings, and we don’t know if a good approximation will lead to good performance for the resulting policy. We don’t even know if the recursive algorithms that define Q-learning will be stable — estimates may diverge to infinity.

There are many ways to read these negative results, and indeed many articles have been written around this subject. Our own reading is probably among the most radical: without understanding the issues around the existence of solutions to these DP equation approximations or their interpretation, we should search for alternative approximations of dynamic programming suitable for application in RL.

These concerns were raised at Sean Meyn’s boot camp lecture, where he called on listeners to revisit an alternate foundation of optimal control: the linear programming (LP) approach introduced by Manne (1960) and further developed by Denardo (1970) and d’Epenoux (1963). The message was greeted with enthusiasm from some attendees, including Gergely Neu, who responded, “You have blown my mind!” He had been working on his own formulation of this idea, which became logistic Q-learning (more on this below).

# Theory at the Institute and Beyond, February 2022 It has been only a few months since the last time this column appeared. Yet there is so much to write about that it feels like a year has passed. For one area in particular, a decade’s worth of developments seems to have emerged in a few months’ time. I am talking of course about the phenomenal developments on error-correcting codes that we witnessed in the past few months.

Error-correcting codes
Error-correcting codes encode a message into a longer codeword so that the original message can be recovered even if many of the bits of the codeword are corrupted. Clearly, there is a trade-off between the amount of redundancy introduced by the error-correcting code and the number of errors it can recover from. A gold standard for error-correcting codes is being “constant rate, constant distance,” also referred to as being “asymptotically good.” An asymptotically good code can recover from a constant fraction of bit errors in the codeword, and the codeword is only a constant factor longer than the original message.

Error-correcting codes can be described using a family of “parity checks,” where a parity check on a subset $$S$$ of bits is a constraint that there are an even number of $$1$$s within the subset $$S$$. A codeword is then a sequence of bits that satisfy all of the parity check constraints. In a low-density parity check (LDPC) code, there is a family of parity checks on constantly many bits each, that together define the code.

Introduced by Gallager in 1960, LDPC codes are central objects in both the theory and the practice of error correction. In 1996, Sipser and Spielman showed how to use expander graphs to construct LDPCs with constant distance and constant rate. Their construction is a thing of beauty. Pick an expander graph and associate one bit on each edge of the graph. For each vertex of the expander graph, the parity of the sum of bits on its edges is even. This construction also admits linear time encoding and decoding algorithms for the same.

Asymptotically good locally testable codes and quantum LDPCs
A locally testable code is one that admits a highly efficient procedure to detect the presence of errors. More precisely, it admits a testing procedure that queries a small number of randomly selected bits in the codeword, and will reject a corrupted codeword with, say, 1% of errors with constant probability. The gold standard would be a constant-query locally testable code, where the testing procedure queries only a constantly many (say, 100) bits of the corrupted codeword.

Locally testable codes have been central to many of the developments in complexity theory for close to three decades now. Linearity testing, aka locally testing the Hadamard code, is still the prime example for property testing and lies at the gateway to the world of probabilistically checkable proofs.

Intuitively for a code to be locally testable, it needs to admit a large number of constant-size parity checks. In other words, locally testability seemingly necessitates the code to have a lot of redundancy. In fact, the classic examples of locally testable codes, such as Hadamard codes or Reed-Muller codes, have codewords that are superpolynomially larger than the message.

Irit Dinur’s ingenious proof of the PCP theorem yielded as a by-product the first locally testable codes where the codewords were only slightly superlinear in size. It was then that one would dare to ask: can locally testability be achieved for “free”? That is, can one have the best of all worlds: a constant-rate, constant-distance code that also admits a constant-query local-tester?

This long-standing open question always seemed out of reach until late last year, when it was resolved affirmatively by two independent groups of researchers simultaneously!

# Theory at the Institute and Beyond, September 2021 Being in one of the talks in the Simons Institute auditorium, witnessing live and lively interaction with the speaker, feels like the closest thing to normal since the start of the pandemic. There is a sense of tangible joy among the participants just to be sharing the same physical space, let alone the fantastic environs of the Institute. The renewed energy is all there to witness in the programs this semester on Computational Complexity of Statistical Inference (CCSI) and Geometric Methods in Optimization and Sampling (GMOS), both of which are now in full swing. Although masking is maintained, it doesn’t seem to change the quintessential Simons program experience even a little bit. I am referring, of course, to the constant feeling of missing out on all the incredibly interesting activities going on, much of which one is unable to fit into their schedule.

At least some of the palpable energy can be attributed to over 40 postdocs and research fellows who have arrived at the Institute this semester, many of whom will stay on for a year or two. This extraordinary group of young researchers covers the whole gamut of topics, ranging from cryptography, quantum computing, and fairness to machine learning, data structures, algorithms, and complexity theory. Each of these postdocs and fellows gave a 10-minute presentation at the “Meet the Fellows’’ welcome event that the Institute held on September 8 and 9. Check out their talks for glimpses of the cutting edge in all these subfields of theory.

An advance in algebraic circuit complexity
This time around, there is some good news from the front lines on circuit complexity, one of the most challenging arenas within theoretical computer science.

An algebraic circuit consists of gates, each of which carries out either addition or multiplication over some field, say real numbers. The depth of the circuit is the length of the longest path from the output to one of its inputs. Naturally, an algebraic circuit computes a polynomial over its inputs.

In the world of Boolean circuits with AND/OR/NOT gates, lower bounds against constant depth circuits, aka AC0 circuit lower bounds, have been known since the 1980s and are one of the most influential results in complexity theory. For general algebraic circuits over a large field (say reals), even superpolynomial lower bounds for depth three circuits had remained elusive. In a remarkable paper, Nutan Limaye, Srikanth Srinivasan, and Sébastien Tavenas have obtained the first superpolynomial lower bounds against general algebraic circuits of all constant depths over fields of characteristic zero (say reals). Furthermore, the lower-bound result is shown for a simple polynomial known as “iterated matrix multiplication” whose input consists of $$d$$ matrices $$X_1,\ldots,X_d$$ of dimension $$n \times n$$, and the goal is to compute a fixed entry of their product $$X_1 \cdot X_2 \cdots X_d$$. The same work also obtains a depth hierarchy theorem for algebraic circuits showing that for every depth D, there is an explicit polynomial that can be computed by a depth D circuit of size s, but requires circuits of size superpolynomial in s if the depth is D-1.

Remarkable work of Matthew Brennan
The theory community suffered a terrible loss this year with the tragic and untimely passing of one of our rising stars, Matthew Brennan. While still a graduate student at MIT, Matthew almost single-handedly pushed forward an ambitious research program at the intersection of computational complexity and statistics. Here we will try to give a glimpse of Matthew’s extensive body of research.

# Trends in Machine Learning Theory

Welcome to ALT Highlights, a series of blog posts spotlighting various happenings at the recent conference ALT 2021, including plenary talks, tutorials, trends in learning theory, and more! To reach a broad audience, the series is disseminated as guest posts on different blogs in machine learning and theoretical computer science. This initiative is organized by the Learning Theory Alliance and is overseen by Gautam Kamath. All posts in ALT Highlights are indexed on the official Learning Theory Alliance blog.

This is the sixth and final post in the series, on trends in machine learning theory, written by Margalit GlasgowMichal Moshkovitz, and Cyrus Rashtchian.

Introduction
Throughout the last few decades, we have witnessed unprecedented growth of machine learning. Originally a topic formalized by a small group of computer scientists, machine learning now impacts many areas: the physical sciences, medicine, commerce, finance, urban planning, and more. The rapid growth of machine learning can be partially attributed to the availability of large amounts of data and the development of powerful computing devices. Another important factor is that machine learning has foundations in many other fields, such as theoretical computer science, algorithms, applied mathematics, statistics, and optimization.

If machine learning is already mathematically rooted in many existing research areas, why do we need a field solely dedicated to learning theory? According to Daniel Hsu, “Learning theory serves (at least) two purposes: to help make sense of machine learning, and also to explore the capabilities and limitations of learning algorithms.” Besides finding innovative applications for existing tools, learning theorists also provide answers to long-standing problems and ask new fundamental questions.

Modern learning theory goes beyond classical statistical and computer science paradigms by:

• developing insights about specific computational models (e.g., neural networks)
• analyzing popular learning algorithms (e.g., stochastic gradient descent)
• taking into account data distributions (e.g., margin bounds or manifold assumptions)
• adding auxiliary goals (e.g., robustness or privacy), and
• rethinking how algorithms interact with and access data (e.g., online or reinforcement learning).

By digging deep into the basic questions, researchers generate new concepts and models that change the way we solve problems and help us understand emerging phenomena.

This article provides a brief overview of three key areas in machine learning theory: new learning paradigms, trustworthy machine learning, and reinforcement learning. We describe the main thrust of each of these areas, as well as point to a few papers from ALT 2021 (the 32nd International Conference on Algorithmic Learning Theory) that touch each of these topics. To share a broader view, we also asked experts in the areas to comment on the field and on their recent papers. Needless to say, this article only scratches the surface. At the end, we point to places to learn more about learning theory.

# Theory at the Institute and Beyond, May 2021 Another semester of remote activities at the Simons Institute draws to a close. In our second semester of remote operations, many of the quirks of running programs online had been ironed out. But pesky challenges such as participants being in different time zones remained. To cope with these, the programs spread their workshop activities throughout the semester. Not only did this help with the time zones, but it helped keep up the intensity all through the semester.

One hopes that the Institute will never need to organize a remote semester again. At the end of June, we are excited to host the Summer Cluster in Quantum Computation, our first in-person activity since COVID-19 hit. By all expectations, the programs in the fall will likely be fairly close to the normal that we have all been longing for.

In fact, it looks like the Institute will be buzzing with activities in the fall, even more than it used to be, as if that were possible. In a few months, the Institute will welcome more than 40 talented young researchers as postdoctoral researchers and program fellows. They cover the whole gamut of areas, ranging from complexity theory and algorithms to quantum computation and machine learning.

As we settle into a different normal again, one wonders what this new normal will look like a few years from now. In particular, what aspects of this COVID era will continue to stay with us theorists, say, five years from now? Will we have online conferences and workshops? Will meeting anyone from anywhere seem as easy as it does now? How often will a speaker be physically present at our seminars? Only time will tell.

Solving linear systems faster
The ACM SODA conference held virtually in January featured a breakthrough result in algorithms for one of the oldest computational problems — namely, solving a system of linear equations. Richard Peng and Santosh Vempala exhibited a new algorithm to solve sparse linear systems that beats matrix multiplication, in certain regimes of parameters.

# Research Vignette: Cryptography and Game Theory for Blockchains

by Vassilis Zikas (University of Edinburgh) Blockchains and decentralized ledgers are creating a new reality for modern society. A major scientific impact of this new reality is that it creates an interdisciplinary arena where traditionally independent research areas, such as cryptography and game theory/economics, need to work together to address the relevant questions. The Simons Institute’s Fall 2019 program on Proofs, Consensus, and Decentralizing Society fostered much-needed cross-disciplinary collaboration by bringing together researchers from these disciplines.

## An interdisciplinary research arena

The study of the interplay of cryptography and game theory/economics within the blockchain consists of several subareas. Here we discuss three such areas, with a focus on Bitcoin, as one of the most widely studied and adopted to date blockchains and cryptocurrencies. We stress that the relevant literature is huge and that discussing it all here is beyond the scope of this research vignette.

1. Cryptographic security and economic robustnessThe works of Garay et al.  and Pass et al.  initiated the rigorous cryptographic study of the Bitcoin blockchain protocol and proved that under common assumptions about the underlying hash function and assuming the majority of the hashing power in the system is invested in executing the Bitcoin protocol (rather than attacking it), the protocol achieves a number of basic properties, such as common prefix (corresponding to the traditional notion of safety), chain growth (corresponding to liveness), and chain quality (ensuring that honest parties get to contribute blocks at an acceptable frequency). Follow-up work by Badertscher et al.  defined the functionality offered by the Bitcoin ledger and proved the protocol’s security under the above honest-majority assumption in a composable framework. This enables using the ledger functionality directly — without worrying about implementation details — within higher-level protocols. These initial works have ignited a number of follow-ups aiming at relaxing the underlying assumptions and/or tuning the protocol abstraction to better capture reality.

Parallel to the cryptographic security of blockchain protocols, their economic robustness — i.e., their resilience to incentive-driven attacks/misbehavior — has been extensively investigated. A classical example here is the work of Eyal and Sirer , which abstracted the protocol execution as a simple mining game and demonstrated that by withholding mined blocks and strategically releasing them, attackers with favorable network conditions might be incentivized to deviate from the Bitcoin protocol, even when they control only a minority of the hashing power. Note that such deviations cannot affect the worst-case security guarantees established in the cryptographic security proofs, but they can push them to their limits. For instance, although a selfish miner cannot break common prefix and chain quality, they can temporarily create the longest-allowed forks and/or minimize the number of blocks contributed by honest miners to the worst-case value allowed by chain quality.

2. Economics on blockchains. Independently of the questions about their economic robustness, blockchains and their associated cryptocurrencies have created a new scientific playground for economists and game theorists to develop and test new theories and confirm old ones. Indicatively, Huberman et al.  investigated how an ideal abstraction of Bitcoin yields a new market design paradigm where market forces do not control the functionality of the underlying payment system. They showed how analyzing this paradigm can explain behavioral aspects of Bitcoin and pointed to interesting modifications that can affect the efficiency of the protocol itself. Similarly, Benigno et al.  demonstrated how under common economics assumptions, equipping a two-country economy with a global cryptocurrency can create market forces driving the nominal interest rates to be equal in the two countries and yielding a rate relation between the two national currencies; and Prat and Walter  proposed a model using the Bitcoin-U.S. dollar exchange rate to forecast the computing power of the Bitcoin network.

3. Blockchain-induced incentives on cryptographic protocols. A third type of blockchain-related problem where cryptography and game theory meet is the design of more efficient and resilient cryptographic protocols using incentives induced by blockchains and cryptocurrencies. The classical example here is fair multiparty computation (in short, fair MPC). In MPC, n parties wish to run a protocol to jointly compute a function on their private data. Fairness in this context requires that if a worst-case adversary — controlling and coordinating the (malicious) parties attacking the protocol — learns (any information on) the output, then the honest parties should also learn it. Cleve’s well-known impossibility result  mandates that if the adversary controls a majority of parties, then fairness is impossible. Intuitively, the reason is that as the protocol has to tolerate any adversarial coalition, the actual corrupted parties might be the ones to first jointly learn such information; once they do, they can stop playing, thereby preventing other parties from also learning it. In [2, 6], it was shown that using the Bitcoin blockchain as an automated escrow mechanism, one can enforce a version of fairness based on collaterals where either nobody learns any information or if someone does and prevents others from also learning it, then he loses his collateral to them. Assuming the adversary values his collateral higher than breaking fairness, this mechanism induces a fair evaluation. These results were subsequently extended to ensure robustness , i.e., ensure that the protocol will not fail and either it will fairly conclude or the adversary will lose his collateral.

# Research Vignette: Statistical Learning-Aided Design of a Blockchain Payment System

by Xiaowu Dai (UC Berkeley) A blockchain-based payment system uses a decentralized network of computers (miners) to verify transactions and maintain the ledger containing transaction history. The blockchain design is governed by a computer protocol and does not require trust in any centralized organization. The system supports only transactions of its native coins, such as bitcoins, which have value because payment recipients have confidence in their future usefulness. The design of such a novel system was proposed by . It is of interest to study the mechanism design of the blockchain payment system and related questions on transparency and fairness [1, 6, 7, 13].

This project is connected with what was explored during the Simons Institute’s Fall 2019 program on Proofs, Consensus, and Decentralizing Society, especially with the algorithms and economics questions that emerged from the blockchain-based systems.

The blockchain payment system can be modeled as a two-sided market that intermediates between users and miners [5, 9]. The rise in Bitcoin transaction volume, coupled with limited capacity on the number of blocks that can be posted to the blockchain, raises transaction delays and motivates users to pay transaction fees for service priority. We want to understand how the transaction fees influence users’ willingness to participate in the payment system. Our analysis suggests a simple modification on the design by adding a statistical learning-aid recommendation of transaction fees for users. In particular, we build confidence bands for the regression function $$f(\cdot)$$ in the model

$\text{delay}\ \text{ti}\text{me} = f\left( \text{transaction}\ \text{fee} \right) + g\left( \text{control} \right) + \text{disturbance}\text{.}$

Here, the control variable includes the number of pending transactions, fees of pending transactions, current infrastructure level, rate of block addition to the ledger, and history of random arrival of transactions, among others . The $$f(\cdot)$$ is the main nonparametric regression function that we want to infer, and the function $$g(\cdot)$$ is the nuisance parameter. The dimension of the control vector can be large relative to the sample size. We develop a simple procedure to construct an honest confidence band for $$f(\cdot)$$, a method that builds on the recently developed framework (e.g., [2, 3]). The confidence band would cover the true $$f(\cdot)$$ at the nominal level (say, 95% level) uniformly over a large function space given a large sample of data. The notion of honesty is closely related to the use of the worst-case criterion that is necessary for good finite-sample performance .

This statistical learning-aid recommendation could provide the relationship between the transaction fees and the predicted transaction delay time, together with confidence bands. It leads to several implications. First, the recommendation enables the transparency of the blockchain payment system mechanism. It allows a direct comparison with the traditional payment system operated by a centralized firm like Visa or PayPal. The firm can reverse transactions in case of error or fraud, a property that is lacking in the blockchain payment system. On the other hand, users benefit from the transparency since they can decide which payment system is better for processing the transaction and how to balance the trade-off between transaction fees and transaction delay. Second, the blockchain payment system enjoys the Vickrey, Clarke, Groves (VCG) property in that each user pays a transaction fee equal to the externality on other users caused by delaying others’ transactions [4, 8, 9, 14]. The recommendation also adds the fairness guarantee; that way, with high probability, the transaction fee is close to the actual value of service priority and users avoid paying extra.

# Research Vignette: Asynchrony/Synchrony vs. Mobile/Stationary

by Eli Gafni (UCLA) In the Simons Institute program on Proofs, Consensus, and Decentralizing Society, systems like Stellar reminded us of the anomaly that results from introducing cryptography into distributed computing.

The research I embarked on was to find conditions under which the anomaly is eliminated.

In synchronous systems with processors using signatures that may behave in a Byzantine malicious manner, consensus can be reached as long as at most half of the processors are engaged in the malicious behavior. In contrast, when the system is asynchronous but eventually settles on synchronous behavior, consensus can be reached only under the more stringent condition that at most a third of processors rather than at most half are malicious.

This anomaly that synchronous systems and eventually synchronous systems do not have the same capability in reaching consensus is exhibited only with the introduction of cryptography signatures.

We resolve this anomaly by proposing that the definition of asynchrony, while practically pleasing, should be substituted by the more elegant, appealing mathematical definition of viewing asynchrony as the traditional synchronous system, but with the faulty behavior exhibited by different sets from round to round, giving rise to the pair mobile/stationary rather than asynchronous/synchronous.

This shift raises two questions: one about the meaning of mobile Byzantine and the other about the meaning of possibly all signatures forged over the rounds.

We resolve the first question by attributing the faulty behavior solely to the communication subsystem, considering processors to always behave correctly. We resolve the second question by abstracting signatures as just a constraint on which forwarded messages can be forged and which cannot. (We leave open the question of implementing the constraint.)

With these resolutions, a stationary system with signature constraint can reach consensus in the face of malicious faults as long as no more than half of processors fail in a round. The notion of eventual stationarity is now the elegant notion that the mobility of the system freezes. The system was hot and is now cooled. This supports the observation that, mathematically, the correct pair to consider is mobile/stationary rather than asynchrony/synchrony.

# Research Vignette: The Unlikely Friendship of Lattices and Elliptic Curves One of the most fascinating things about cryptography is its deeply paradoxical nature: among its early successes is the ability for two strangers to meet, generate a shared secret key, and thereon communicate privately — all of these performed entirely within a crowd. Over the last several decades, we have witnessed ever more surprising cryptographic protocols, achieving increasingly paradoxical properties, ranging from zero knowledge to fully homomorphic encryption. This article will focus on one such paradox, a new technical approach that has been unraveling surprising power, opening up avenues for hitherto unrealized cryptography. This is the unlikely friendship of two seemingly disparate mathematical structures and conjectured hard problems on these, namely, the hardness of finding short independent vectors in high-dimensional lattices and hardness assumptions on elliptic curves over finite fields. This topic is connected with what was explored during the Simons Institute’s Spring 2020 program on Lattices: Algorithms, Complexity, and Cryptography — the interface of lattices with other mathematical structures and new cryptography that can emerge from it.

Our assumptions

In more detail, our first assumption on high-dimensional lattices is the so-called learning with errors (LWE) proposed by Oded Regev in 2005 . LWE conjectures that it is hard to recover a secret $$\mathbf{s}\in \mathbb{Z}_q^n$$ from a sequence of noisy linear equations on $$\mathbf{s}$$. In more detail, given a large prime $$q$$ and polynomially many pairs $$(\mathbf{a}_i, \langle \mathbf{a}_i,\;\mathbf{s}\rangle + e_i)$$, where $$\mathbf{a}_i \in \mathbb{Z}_q^n$$ are random and $$e_i \in \mathbb{Z}_q$$ are “small” perturbations chosen from some appropriate distribution, the LWE problem asks to find $$\mathbf{s}$$. For our second assumption, we require three cyclic groups of large prime order ($$q$$, say), $$\mathbb{G}_1$$, $$\mathbb{G}_2$$, and $$\mathbb{G}_T$$. Let $$e:\mathbb{G}_1\times \mathbb{G}_2 \to \mathbb{G}_T$$ be a nondegenerate bilinear map or “pairing,” and $$g_1$$ and $$g_2$$ be the generators of $$\mathbb{G}_1$$ and $$\mathbb{G}_2$$, respectively. Then, by the bilinearity of the pairing, we have that $$e (\;g_1^a, \; g_2^b \;) = e(\;g_1,\; g_2\;)^{ab}$$ for any $$a, b \in \mathbb{Z}_q$$. We require that the group operations in $$\mathbb{G}_1$$, $$\mathbb{G}_2$$, and $$\mathbb{G}_T$$ as well as the bilinear map $$e$$ can be efficiently computed. In terms of hardness, we will roughly assume that the only information the adversary can learn is via legitimate group and pairing computations on the elements she receives. This strong security guarantee can be formalized in the so-called generic group model [22, 20]. We may think of $$a$$ as a secret “message” that is encoded in the exponent of a group element $$g_1^a$$. Both LWE and bilinear map–based assumptions have been studied extensively, and we have significant confidence in their hardness. These are also two of the most versatile tools in cryptography and have been used to construct breakthrough applications such as identity-based encryption , fully homomorphic encryption , and such others. Traditionally, these assumptions were used in mutual exclusion for any given construction, but recently we are learning to combine them in novel non-black-box ways to yield surprising results. In this article, I will describe a recent construction of the primitive of broadcast encryption in a joint work with Shota Yamada  where we leveraged a serendipitous interplay of these structures to obtain optimal parameters.

# Lattice Blog Reduction – Part III: Self-Dual BKZ

This is the third and last entry in a series of posts about lattice block reduction. See here and here for the first and second parts, resp. In this post I will assume you have read the other parts.

In the first two parts we looked at BKZ and Slide reduction, the former being the oldest and most useful in practice, while the latter achieves the best provable bounds and has the cleaner analysis. While BKZ is a natural generalization of LLL, we have seen that the analysis of LLL does not generalize well to BKZ. One can view Slide reduction as a different generalization of LLL with the goal of also naturally generalizing its analysis. As we mentioned in the first part, there is another analysis technique based on dynamical systems, introduced in [HPS11]. Unfortunately, as applied to BKZ, there are some cumbersome technicalities and the resulting bounds on the output quality are not as tight as we would like them to be (i.e. as for Slide reduction). One can view the algorithm we are considering today – SDBKZ [MW16] – as a generalization of LLL that lends itself much easier to this dynamical systems analysis: it is simpler, cleaner and yields better results. Since part of the goal of today’s post is to demonstrate this very useful analysis technique, SDBKZ is a natural candidate.

# SDBKZ

Recall the two tools we’ve been relying on in the first two algorithms, SVP and DSVP reduction of projected subblocks: Effect of a call to the DSVP oracle. GSO log norms of the input in black, of the output in blue. Note that the sum of the GSO log norms is a constant, so increasing the length of the last vector, decreases the (average of the) remaining vectors. Effect of a call to the DSVP oracle. GSO log norms of the input in black, of the output in blue. Note that the sum of the GSO log norms is a constant, so increasing the length of the last vector, decreases the (average of the) remaining vectors.

We will use both of them again today. Like BKZ, a tour of SDBKZ starts by calling the SVP oracle on successive blocks of our basis. However, when we reach the end of the basis, we will not decrease the size of the window, since this is actually quite inconvenient for the analysis. Instead, we will keep the size of the window constant but switch to DSVP reduction, i.e. at the end of the BKZ tour we DSVP reduce the last block. This will locally maximize the last GSO vector in the basis, just as the first SVP call locally minimized the first vector of the basis. Then we will move the window successively backwards, mirroring a BKZ tour, but using DSVP reduction, until we reach the beginning of the basis again. At this point, we switch back to SVP reduction and move the window forward, etc. So SDBKZ runs in forward and backward tours. SDBKZ in one picture: apply the SVP oracle to the projected blocks from start to finish and when you reach the end, apply the DSVP oracle to from finish to start. Repeat.

A nice observation here is that the backward tour can be viewed equivalently as: 1) compute the reversed dual basis (i.e. the dual basis with reversed columns), 2) run a forward tour, 3) compute the primal basis again. The first of these two steps is self-inverse: computing the reversed dual basis of the reversed dual basis yields the original primal basis. This means step 3) is actually the same as step 1). So in effect, one can view SDBKZ as simply repeating the following two steps: 1) run a forward tour, 2) compute the reversed dual basis. So it doesn’t matter if we use the primal or the dual basis as input, the operations of the algorithm are the same. This is why it is called Self-Dual BKZ.

There is one caveat with this algorithm: it is not clear, when one should terminate. In BKZ and Slide reduction one can formulate clear criteria, when the algorithm makes no more progress anymore. In SDBKZ this is not the case, but the analysis will show that we can bound the number of required tours ahead of time.

#### The Analysis

We will start by analyzing the effect of a forward tour. Let $${\mathbf{B}}$$ be our input basis. The first call to the SVP oracle in a forward tour replaces $${\mathbf{b}}_1$$ with the shortest vector in $${\mathbf{B}}_{[1,k]}$$. This means that the new basis $${\mathbf{B}}’$$ satifies $$\| {\mathbf{b}}_1′ \| \leq \sqrt{\gamma_k} (\prod_{i=1}^k \|{\mathbf{b}}_i^* \|)^{1/k}$$ by Minkowski’s bound. Equivalently, this can be written as $\log \| {\mathbf{b}}_1′ \| \leq \log \sqrt{\gamma_k} + \frac1k (\sum_{i=1}^k \log \|{\mathbf{b}}_i^* \|).$ So if we consider the $$\log \|{\mathbf{b}}_i^*\|$$ as variables, it seems like linear algebra could be useful here. So far, so good. The second step is more tricky though. We know that the next basis $${\mathbf{B}}”$$, i.e. after the call to the SVP oracle on $${\mathbf{B}}’_{[2,k+1]}$$, satisfies $${\mathbf{b}}_1” = {\mathbf{b}}_1’$$ and $$\| ({\mathbf{b}}_2”)^* \| \leq \sqrt{\gamma_k} (\prod_{i=2}^{k+1} \|({\mathbf{b}}’_i)^* \|)^{1/k}$$. Unfortunately, we have no control over $$\|({\mathbf{b}}’_i)^* \|$$ for $$i \in {2,\dots,k}$$, since we do not know how the SVP oracle in the first call changed these vector. However, we do know that the lattice $${\mathbf{B}}_{[1,k+1]}$$ did not change in that call. So we can write $\prod_{i=2}^{k+1} \|({\mathbf{b}}’_i)^* \| = \frac{\prod_{i=1}^{k+1} \|{\mathbf{b}}_i^* \|}{\| {\mathbf{b}}’_1 \|}$ and thus we obtain $\log \| ({\mathbf{b}}_2′)^* \| \leq \log \sqrt{\gamma_k} + \frac1k (\sum_{i=1}^{k+1} \log \|{\mathbf{b}}_i^* \| – \log \|{\mathbf{b}}’_1 \|).$ Again, this looks fairly “linear algebraicy”, so it could be useful. But there is another issue now: in order to get an inequality purely in the input basis $${\mathbf{B}}$$, we would like to use our inequality for $$\log \|{\mathbf{b}}_1′ \|$$ in the one for $$\log \| ({\mathbf{b}}_2′)^* \|$$. But the coefficient of $$\log \|{\mathbf{b}}_1′ \|$$ is negative, so we would need a lower bound for $$\log \|{\mathbf{b}}_1′ \|$$. Furthermore, we would like to use upper bounds for our variables later, since the analysis of a tour will result in upper bounds and we would like to apply it iteratively. For this, negative coefficients are a problem. So, we need one more modification: we will use a change of variable to fix this. Instead of considering the variables $$\log \| {\mathbf{b}}_i^* \|$$, we let the input variables to our forward tour be $$x_i = \sum_{j < k+i} \log \|{\mathbf{b}}^*_i \|$$ and the output variables $$y_i = \sum_{j \leq i} \log \|({\mathbf{b}}’_i)^* \|$$ for $$i \in [1,\dots,n-k]$$. Clearly, we can now write our upper bound on $$\log \|({\mathbf{b}}’_1)^*\|$$ as $y_1 \leq \log \sqrt{\gamma_k} + \frac{x_1}{k}.$ More generally, we have $\|({\mathbf{b}}’_i)^* \| \leq \sqrt{\gamma_k} \left(\frac{\prod_{j=1}^{i+k-1} \|{\mathbf{b}}_j^* \|}{\prod_{j=1}^{i-1} \|({\mathbf{b}}’_j)^* \|} \right)^{\frac1k}$ which means for our variables $$x_i$$ and $$y_i$$ that $y_i = y_{i-1} + \log \| ({\mathbf{b}}’_i)^* \| \leq y_{i-1} + \log \sqrt{\gamma_k} + \frac{x_i – y_{i-1}}{k} = (1-\frac1k) y_{i-1} + \frac1k x_i + \log \sqrt{\gamma_k}.$

Note that we can write each $$y_i$$ in terms of $$x_i$$ and the previous $$y_i$$ with only positive coefficients. So now we can apply induction to write each $$y_i$$ only in terms of the $$x_i$$’s, which shows that $y_i = \frac1k \sum_{j=1}^i \omega^{i-j} x_j + (1-\omega)^i k \alpha$ where we simplified notation a little by defining $$\alpha = \log \sqrt{\gamma_k}$$ and $$\omega = 1-\frac1k$$. By collecting the $$x_i$$’s and $$y_i$$’s in a vector each, we have the vectorial inequality ${\mathbf{y}} \leq {\mathbf{A}} {\mathbf{x}} + {\mathbf{b}}$ where ${\mathbf{b}} = \alpha k \left[ \begin{array}{c} 1 – \omega \\ \vdots \\ 1 – \omega^{n-k} \end{array}\right] \qquad\qquad {\mathbf{A}} = \frac1k \left[ \begin{array}{cccc} 1 & & & \\ \omega & 1 & & \\ \vdots & \ddots & \ddots & \\ \omega^{n-k-1} & \cdots & \omega & 1 \end{array} \right].$

Now recall that after a forward tour, SDBKZ computes the reversed dual basis. Given the close relationship between the primal and the dual basis and their GSO, one can show that simply reversing the vector $${\mathbf{y}}$$ will yield the right variables $${\mathbf{x}}’_i$$ to start the next “forward tour” (which is actually a backward tour, but on the dual). I.e. after reversing $${\mathbf{y}}$$, the variables represent the logarithm of the corresponding subdeterminants of the dual basis. (For this we assume for convenience and w.l.o.g. that the lattice has determinant 1; otherwise, there would be a scaling factor involved in this transformation.)

In summary, the effect on the vector $${\mathbf{x}}$$ of executing once the two steps, 1) forward tour and 2) computing the reversed dual basis, can be described as ${\mathbf{x}}’ \leq {\mathbf{R}} {\mathbf{A}} {\mathbf{x}} + {\mathbf{R}} {\mathbf{b}}$ where $${\mathbf{R}}$$ is the reversed identity matrix (i.e. the identity matrix with reversed columns). Iterating the two steps simply means we will be iterating the vectorial inequality above. So analyzing the affine dynamical system ${\mathbf{x}} \mapsto {\mathbf{R}} {\mathbf{A}} {\mathbf{x}} + {\mathbf{R}} {\mathbf{b}}$ will allow us to deduce information about the basis after a certain number of iterations.

#### Small Digression: Affine Dynamical Systems

Consider some dynamical system $${\mathbf{x}} \mapsto {\mathbf{A}} {\mathbf{x}} + {\mathbf{b}}$$ and assume it has exactly one fixed point, i.e. $${\mathbf{x}}^*$$ such that $${\mathbf{A}} {\mathbf{x}}^* + {\mathbf{b}} = {\mathbf{x}}^*$$. We can write any input $${\mathbf{x}}’$$ as $${\mathbf{x}}’ = {\mathbf{x}}^* + {\mathbf{e}}$$ for some “error vector” $${\mathbf{e}}$$. When applying the system to it, we get $${\mathbf{x}}’ \mapsto {\mathbf{A}} {\mathbf{x}}’ + {\mathbf{b}} = {\mathbf{x}}^* + {\mathbf{A}} {\mathbf{e}}$$. So the error vector $${\mathbf{e}}$$ is mapped to $${\mathbf{A}} {\mathbf{e}}$$. Applying this $$t$$ times maps $${\mathbf{e}}$$ to $${\mathbf{A}}^t {\mathbf{e}}$$, which means after $$t$$ iterations the error vector has norm $$\|{\mathbf{A}}^t {\mathbf{e}} \|_{p} \leq \|{\mathbf{A}}^t \|_{p} \| {\mathbf{e}} \|_{p}$$ (where $$\| \cdot \|_{p}$$ is the matrix norm induced by the vector $$p$$-norm). If we can show that $$\|{\mathbf{A}} \|_p \leq 1 – \epsilon$$, then $$\|{\mathbf{A}}^t \|_p \leq \|A \|^t \leq (1-\epsilon)^t \leq e^{-\epsilon t}$$, so the error vector will decay exponentially in $$t$$ with base $$e^{-\epsilon}$$ and the algorithm converges to the fixed point $${\mathbf{x}}^*$$.

Back to our concrete system above. As we just saw, we can analyze its output quality by computing its fixed point and its running time by computing $$\|{\mathbf{R}} {\mathbf{A}} \|_p$$ for some induced matrix $$p$$-norm. Since this has been a lenghty post already, I hope you’ll trust me that our system above has a fixed point $${\mathbf{x}}^*$$, which can be written out explicitely in closed form. As a teaser, its first coordinate is $x^*_1 = \frac{(n-k)k}{k-1} \alpha.$ This means that if the algorithm converges, it will converge to a basis such that $$\sum_{j \leq k}\log \| {\mathbf{b}}_j^*\| \leq \frac{(n-k)k}{k-1} \log \sqrt{\gamma_k}$$. Applying Minkowski’s Theorem to the first block $${\mathbf{B}}_{[1,k]}$$ now shows that the shortest vector in this block satisfies $$\lambda_1({\mathbf{B}}_{[1,k]}) \leq \sqrt{\gamma_k}^{\frac{n-1}{k-1}}$$. Note that the next forward tour will find a vector of such length. Recall that we assumed that our lattice has determinant 1, so this is exactly the Hermite factor achieved by Slide reduction, but for arbitrary block size (we do not need to assume that $$k$$ divides $$n$$) and better than what we can achieve for BKZ (even using the same technique). Moreover, the fixed point actually gives us more information: the other coordinates (that I have ommited here) allow us control over all but $$k$$ GSO vectors and by terminating the algorithm at different positions, it allows us to choose which vectors we want control over.

It remains to show that the algorithm actually converges and figure out how fast. It is fairly straight-forward to show that $\|{\mathbf{R}} {\mathbf{A}}\|_{\infty} = \|{\mathbf{A}}\|_{\infty} = 1 – \omega^{n-k} \approx e^{-\frac{n-k}{k}}.$ (Consider the last row of $${\mathbf{A}}$$.) This is always smaller than 1, so the algorithm does indeed converge. For $$k = \Omega(n)$$ this is bounded far enough from 1 such that the system will converge to the fixed point up to an arbitrary constant in a number of SVP calls that is polynomial in $$n$$. Using another change of variable [N16] or considering the relative error instead of the absolute error [MW15], one can show that this also holds for smaller $$k$$.

As mentioned before, this type of analysis was introduced in [HPS11] and has inspired new ideas even in the heuristic analysis of BKZ. In particular, one can predict the behavior of BKZ by simply running such a dynamical system on typical inputs (and making some heuristic assumptions). This idea has been and is being used extensively in cryptanalysis and in optimizing parameters of state-of-the-art algorithms.

Finally, a few last words on SDBKZ: we have seen that it achieves a good Hermite factor, but what can we say about the approximation factor? I actually do not know if the algorithm achieves a good approximation factor and also do not see a good way to analyze it. However, there is a reduction [L86] from achieving approximation factor $$\alpha$$ to achieving Hermite factor $$\sqrt{\alpha}$$. So SDBKZ can be used to achieve approximation factor $$\gamma_k^{\frac{n-1}{k-1}}$$. This is a little unsatisfactory in two ways: 1) the reduction results in a different algorithm, and 2) the bound is a little worse than the factor achieved by slide reduction, which is $$\gamma_k^{\frac{n-k}{k-1}}$$. On a positive note, a recent work [ALNS20] has shown that, due to the strong bound on the Hermite factor, SDBKZ can be used to generalize Slide reduction to arbitrary block size $$k$$ in a way to achieve the approximation factor $$\gamma_k^{\frac{n-k}{k-1}}$$. Another recent work [ABFKSW20] exploited the fact that SDBKZ allows to heuristically predict large parts of the basis to achieve better bounds on the running time of the SVP oracle.

• Lovász. An Algorithmic Theory of Numbers, Graphs and Convexity. 1986

• Hanrot, Pujol, Stehlé. Analyzing blockwise lattice algorithms using dynamical systems. CRYPTO 2011

• Micciancio, Walter. Practical, predictable lattice basis reduction – Full Version. http://eprint.iacr.org/2015/1123

• Micciancio, Walter. Practical, predictable lattice basis reduction. EUROCRYPT 2016

• Neumaier. Bounding basis reduction properties. Designs, Codes and Cryptography 2016

• Aggarwal, Li, Nguyen, Stephens-Davidowitz. Slide Reduction, Revisited—Filling the Gaps in SVP Approximation. CRYPTO 2020

• Albrecht, Bai, Fouque, Kirchner, Stehlé, Wen. Faster Enumeration-based Lattice Reduction: Root Hermite Factor $$k^{(1/(2k))}$$ in Time $$k^{(k/8 + o(k))}$$. CRYPTO 2020