## June 22

Time | Speaker | Title | Slides | Video |

8:00-8:30am | Breakfast | |||

8:30-9:30am | Jelani Nelson | Chaining introduction | ||

9:35-10:35am | Radosław Adamczak | On the role of chaining methods in concentration inequalities for polynomials | ||

Break | ||||

11:00am-12:00pm | Tomasz Tkocz | Upper and lower bounds for suprema of canonical processes | ||

Lunch (on your own) | ||||

2:00-3:00pm | Witold Bednorz | How to compute an approximation to b(T) deterministically and compare b(T) with b(S) | ||

3:05-4:05pm | Oded Regev | The Restricted Isometry Property of Subsampled Fourier Matrices | ||

Break | ||||

4:30-5:30pm | Sjoerd Dirksen | Chaining methods in random dimensionality reduction |

## June 23

Time | Speaker | Title | Slides | Video |

8:00-8:30am | Breakfast | |||

8:30-9:30am | Kyle Luh | Chaining, Random Matrices, and Dictionary Learning | ||

9:35-10:35am | Stephen Chestnut | Finding heavy hitters by chaining with a few random bits | ||

Break | ||||

11:00am-12:00pm | Mary Wootters | Chaining and list-decoding | ||

Lunch (on your own) | ||||

2:00-3:00pm | Alex (Lin) Zhai | Exponential concentration of cover times | ||

3:05-4:05pm | Jian Ding | Bandits with Switching Costs: T^{2/3} Regret |
||

Break | ||||

4:30-5:30pm | Mahdi Soltanolkotabi | Generic chaining meets (non)convex optimization |

## Abstracts

**Jelani Nelson. Title:** Chaining introduction.**Abstract:** An overview of some basics of chaining.

**Radosław Adamczak. Title:** On the role of chaining methods in concentration inequalities for polynomials.**Abstract:** I will survey some concentration inequalities for general polynomials in independent random variables and random variables satisfying some geometric conditions (P. Wolff, A. 2015) . The inequalities in question are rooted in a result by R. Latała (2006) providing two-sided estimates for moments of tetrahedral Gaussian polynomials. I will indicate briefly how to reduce the general case to the problem investigated by Latała and then to the estimation of suprema of certain stochastic processes. Then I will sketch the simplest case of a chaining argument due to Latała, providing this estimation. If time permits I will also mention some related work (R. Latała, A. 2012) concerning two-sided estimates of moments of tetrahedral polynomials in independent random variables with log-concave tails.

**Tomasz Tkocz. Title:** Upper and lower bounds for suprema of canonical processes.**Abstract:** We shall present two-sided bounds for expected values of suprema of canonical processes based on random variables with moments growing regularly. We shall also discuss a Sudakov-type minoration principle for canonical processes. Joint work with R. Latała.

**Witold Bednorz**. Title: How to compute an approximation to b(T) deterministically and compare b(T) with b(S).**Abstract:** The talk concerns the Bernoulli Theorem which states that there exists a geometrical quantity up to a universal constant comparable with b(T) - the expectation of a supremum of a canonical process of random signs given on the index space T. The aim is to present a basic view of the proof formulated in the language of an existence of a deterministic algorithm for computing an approximation to the quantity comparable with b(T). As a consequence of the result there will be presented a new version of the so called Bernoulli comparison, namely that under certain conditions on S and T one can easily compare b(T) with b(S).

**Oded Regev. Title:** The Restricted Isometry Property of Subsampled Fourier Matrices.**Abstract:** A matrix *A* in ℂ^{q x N} satisfies the *restricted isometry property of order k with constant ε* if it preserves the ℓ_{2} norm of all *k*-sparse vectors up to a factor of 1+ε. We prove that a matrix *A* obtained by randomly *q* = O(*k* log^{2}*k* log *N*) rows from an *N* x *N* Fourier matrix satisfies the restricted isometry property of order *k* with a fixed ε with high probability. This improves on Rudelson and Vershynin (Comm. Pure Appl. Math.,2008), its subsequent improvements, and Bourgain (GAFA Seminar Notes,2014). Joint work with Ishay Haviv.

**Sjoerd Dirksen. Title:** Chaining methods in random dimensionality reduction.**Abstract:** In modern data analysis, one frequently needs to deal with high-dimensional data sets. To alleviate the computational and storage issues that arise in handling this type of data, it is of interest to pre-process the data set to reduce its dimensionality, while preserving the information in the data that is vital for the computational task at hand.

I will consider dimensionality reduction with Johnson-Lindenstrauss type embeddings, which are random matrix constructions aimed at reducing the dimension while approximately preserving Euclidean inter-point distances in the data set. In particular I will consider "fast" and "sparse" Johnson-Lindenstrauss embeddings, which allow for a fast implementation. The goal of the talk is to show how chaining methods play an important role in quantifying the achievable embedding dimension in terms of the "intrinsic dimension" of the original data set. Part of the talk is based on joint work with Jean Bourgain (IAS Princeton) and Jelani Nelson (Harvard).

**Kyle Luh. Title:** Chaining, Random Matrices, and Dictionary Learning.**Abstract:** Let *X* be a sparse random matrix of size *n* x *p* (*p* >> *n*). We prove that if *p > C n* log^{3}*n*, then with probability 1-o(1), ||*X*^{T}*v*||_{1} is close to its expectation for all vectors *v* in ℝ^{n} (simultaneously). The bound on *p* is sharp up to the polylogarithmic factor. The proof of this concentration result crucially uses a chaining-type argument along with a refined version of Bernstein's inequality. One of the implications of this result is related to the work of Spielman, Wang, and Wright on dictionary learning. Letting *A* be an *n* x *n* matrix, *X* be an *n* x *p* matrix and *Y* = *AX*, the problem is to recover *A* and *X*, given *Y* and sparsity restrictions on *X*. This is joint work with Van Vu.

Time permitting, we will survey other problems in random matrices in which chaining has led to breakthroughs.

**Stephen Chestnut. Title:** Finding heavy hitters by chaining with a few random bits.**Abstract:** The task of finding frequent items, also known as heavy hitters, is one of the fundamental problems in the area of data streams. The goal is to read a long list once and output every item appearing in the list at least a certain number of times. More precisely, we seek items appearing at least eps*||f||_2 times, where the ith coordinate of the vector f is the number of times i appears in the list.

In this talk I will discuss recent improvements to our understanding of the AMS sketch (Alon, Matias, and Szegedy STOC'96) and some similar sketches that allow us to find heavy hitters in O(1) memory and O(1) update time for constant eps. We will view the sketches as stochastic processes evolving with the updates of the stream and then use the chaining method to bound their values. It is also crucial to control the number of random bits needed to execute the stochastic processes because the algorithm must store all of those bits in memory. We will see how this can be accomplished in two different ways. The talk is based on joint work with Vladimir Braverman, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff in arxiv:1511.00661 and arxiv:1603.00759.

**Mary Wootters. Title:** Chaining and list-decoding.**Abstract:** In the theory of error-correcting codes, the following problem—called *list-decoding*—arises: given a subset C of F_{q}^{n}, and an integer ρ in [0,n], how many elements of C are contained in the most-populated Hamming ball of radius ρ in F_{q}^{n}? This question comes up when C is a good *error correcting code*: codes of particular interest include favorites like (most) linear codes (that is, when C is a linear subspace of F_{q}^{n}), and Reed-Solomon Codes (when C corresponds to a set of low-degree polynomials over F_{q}). In this talk, we'll show how to use a chaining argument to bound the list-decodability of these and other families of codes, establishes near-optimal bounds in certain parameter regimes. This is based on joint work with Atri Rudra.

**Alex (Lin) Zhai. Title:** Exponential concentration of cover times.**Abstract:** For random walk on a graph, the *cover time* is the time it takes for all the vertices to be visited at least once. Ding, Lee, and Peres (2010) proved a connection between the cover time and the Gaussian free field, a certain multivariate Gaussian with covariances related to the graph. The connection is obtained via a miraculous formula known as the generalized second Ray-Knight theorem, which remains somewhat mysterious despite many years of study.

We will prove a stochastic domination in the generalized second Ray-Knight theorem, which was observed by Ding (2011) to imply that the cover time is exponentially concentrated around its expectation. It was later pointed out to us that this stochastic domination result appeared earlier in a preprint of Lupu, but the connection to cover times was not mentioned. We do not assume familiarity with Gaussian free fields and Ray-Knight theorems.

**Jian Ding. Title:** Bandits with Switching Costs: T^{2/3} Regret.**Abstract:** Consider a player selecting, repeatedly, one of two possible actions. In each round the player learns the loss arising from his action, and his regret after T rounds is his cumulative loss, minus the loss incurred by consistently selecting the better of the two actions. It is well known that in this setting (known as the adversarial two-armed bandit problem) several algorithms (including a variant of multiplicative weights) can ensure the player a regret of at most O(T^{1/2}) and this is sharp. However, if the player incurs a unit cost each time he switches actions then the best known upper bound on regret was O(T^{2/3}) while the best lower bound known was still of order T^{1/2}. We resolve this gap, by proving the upper bound is sharp (up to a log term). In the corresponding full-information problem, the minimax regret is known to grow at a slower rate of T^{1/2}. The difference between these two rates indicates that learning with bandit feedback (i.e. just knowing the loss from the player's action, not the alternative) can be significantly harder than learning with full-information feedback. It also shows that without switching costs, any regret-minimizing algorithm for the bandit problem must sometimes switch actions very frequently. The proof is based on an information-theoretic analysis of a loss process arising from a multi-scale random walk. Joint work with Ofer Dekel, Tomer Koren and Yuval Peres..

**Mahdi Soltanolkotabi. Title:** Generic chaining meets (non)convex optimization.**Abstract:** (Non)convex iterative shrinkage schemes are the working-horse of signal processing and machine learning, yet our mathematical understanding of such algorithms is still in its infancy. In this talk I will discuss some recent results demonstrating that the global optimum of a variety of (non)convex problems can be found efficiently via local search heuristics. These results hold as long as the coefficients of the objective functions are sufficiently randomized e.g. are functions of Gaussian random variables. At the heart of this new analysis are powerful results for concentration of Gaussian stochastic processes. Surprisingly, these stochastic processes seem to exhibit universality behavior: it seems that essentially the same concentration results hold for many non-Gaussian and non-i.i.d. random matrices. I will end by discussing our recent efforts towards settling this conjecture for a wide variety of multiplication friendly matrices (e.g. Fourier matrices) via generic chaining methods. Time permitting, I will discuss why (at least in my view) generic chaining type arguments will have to play a crucial role in furthering our understanding of many optimization heuristics that arise in concrete applications. Part of this talk is based on joint work with Samet Oymak and Ben Recht.