by James R. Lee, University of Washington

The rise of machine learning in recent decades has generated a renewed interest in online decision-making, where algorithms are equipped only with partial information about the world, and must take actions in the face of uncertainty about the future. There are various ways to analyze how much the presence of uncertainty degrades optimization. Two of the most general models (*competitive analysis* and *multi-arm bandits*) are “probability free” (aka “worst-case,” or “adversarial”), in the sense that one can measure the performance of algorithms without making distributional assumptions on the input.

## Competitive analysis

One such model goes by the name of competitive analysis, and has been studied in theoretical CS since the 1970s. Imagine, for instance, that one maintains a binary search tree (BST) with key-value pairs at the nodes. At every point in time, a key is requested, and the corresponding value is looked up via binary search. Along the search path, the algorithm is allowed to modify the tree using standard tree rotation operations.

The cost of such an algorithm is the average number of binary search steps per key-lookup over a long sequence of requests. To minimize the cost, it is beneficial for the algorithm to dynamically rebalance the tree so that frequently-requested keys are near the root. Sleator and Tarjan had this model in mind when they invented the Splay Tree data structure. Their (still unproven) *Dynamic Optimality Conjecture* asserts that, for every sequence of key requests, the number of operations used by the Splay Tree is within a constant factor of the number of operations used by *any binary search tree,* even an *offline algorithm* that is allowed to see the entire sequence of key requests in advance. (This is a very strong requirement! We make no assumption on the structure of the request sequence, and have to compare our performance to an algorithm that sees the entire request sequence up front.)

In the language of competitive analysis, the conjecture states that the Splay Tree algorithm is “competitive.” In general, an *\(\alpha\)-competitive* online algorithm is one whose total cost on every input sequence is within an \(\alpha\) factor of the cost incurred by the optimal *offline* algorithm that sees the whole input sequence in advance.

## Multi-arm bandits

Another compelling model arising in online learning is the *multi-armed bandit problem.* Here, an agent has a set \(\mathcal{A}\) of feasible actions. At time \(t=1,2,3,\ldots\), the agent plays an action \(a_t \in \mathcal{A}\), and only then the cost of that action is revealed. Imagine, for instance, choosing an advertisement \(a_t \in \mathcal{A}\) to present to a user, and then learning afterward the probability it led to a sale. The goal of an algorithm in this model is to achieve a total cost (often called the *loss*) that is not too much worse than the best *fixed* action in hindsight. The gap between the two is called the *regret.*