In this talk, I will present a simple reduction that demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. The proposed "hard" family of functions, which is well-approximated by one-layer neural networks, takes the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works that demonstrate their hardness against gradient-based (GD) methods (Shamir'18) and Statistical Query (SQ) algorithms (Song et al.'17). Our result shows that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms, not just gradient-based and SQ algorithms, under cryptographic assumptions on the worst-case hardness of lattice problems. Furthermore, we show that for exponentially small noise, a polynomial-time algorithm based on lattice basis reduction can learn these functions, bypassing the SQ and GD hardness. This is based on joint work with Min Jae Song and Joan Bruna.
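As an illustration (the cosine, the frequency parameter \gamma, and the Gaussian label noise below are assumptions made only for this example; the reduction covers general univariate periodic functions), one member of the family can be written as
\[
  f_w(x) \;=\; \cos\!\bigl(2\pi\gamma\,\langle w, x\rangle\bigr),
  \qquad x \sim \mathcal{N}(0, I_d),
  \qquad y \;=\; f_w(x) + \xi,\;\; \xi \sim \mathcal{N}(0, \tau^2),
\]
where the learner observes noisy labeled samples (x, y) and the hardness regime depends on how the noise level \tau scales with the dimension d.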
Current machine learning (ML) systems are remarkably brittle, raising serious concerns about their deployment in safety-critical applications like self-driving cars and predictive healthcare. In such applications, models can encounter test distributions that differ wildly from their training distributions. Trustworthy ML thus requires strong robustness guarantees, including robustness to worst-case distribution shifts, which raises several computational and statistical challenges beyond ‘standard’ machine learning. In this talk, I will present two formal settings of worst-case distribution shifts, motivated by adversarial attacks on test inputs and by the presence of spurious correlations such as image backgrounds. Empirical observations demonstrate (i) an arms race between attacks and existing heuristic defenses, necessitating provable guarantees much like in cryptography; (ii) increased sample complexity of robust learning; and (iii) a resurgence of the need for regularization in robust learning. We capture each of these observations in simple theoretical models that nevertheless yield principled and scalable approaches to overcome the hurdles in robust learning, particularly via the use of unlabeled data.
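For concreteness, one standard way to formalize the first setting (test-time adversarial perturbations) is the following robust risk; the perturbation set, here an \ell_p ball of radius \epsilon, is an assumption made only for illustration:
\[
  \min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\Bigl[\, \max_{\|\delta\|_p \le \epsilon} \ell\bigl(f_\theta(x+\delta),\, y\bigr) \Bigr],
\]
in contrast to the standard risk \min_\theta \mathbb{E}_{(x,y)\sim D}[\ell(f_\theta(x), y)].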
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in the basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with max_i Ai, where Ai is the number of actions of the ith player. This is in sharp contrast to the size of the joint action space, which is \prod_i Ai. V-learning (in its basic form) is a new class of single-agent RL algorithms that converts any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Like the classical Q-learning algorithm, it performs incremental updates to the value functions; unlike Q-learning, it maintains only estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
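A minimal sketch of the V-learning-style loop is below, assuming a generic episodic environment interface (reset/step), a tabular state space, and an EXP3-style exponential-weights bandit; it omits the optimism bonuses and the certified-policy output of the actual algorithm, so it should be read as an illustration of the "adversarial bandit + incremental V-update" structure rather than as the paper's method. In the multiagent setting, each agent would run this loop independently on its own action set.

```python
import numpy as np

class Exp3Bandit:
    """Exponential-weights adversarial bandit; one instance per (step, state)."""
    def __init__(self, num_actions, lr=0.1):
        self.cum_loss = np.zeros(num_actions)  # importance-weighted loss estimates
        self.lr = lr

    def policy(self):
        w = -self.lr * (self.cum_loss - self.cum_loss.min())
        p = np.exp(w)
        return p / p.sum()

    def sample(self, rng):
        return rng.choice(len(self.cum_loss), p=self.policy())

    def update(self, action, loss):
        p = self.policy()
        self.cum_loss[action] += loss / p[action]   # unbiased loss estimate


def v_learning(env, H, S, A, num_episodes, seed=0):
    """Simplified V-learning: per-(step, state) bandits + incremental V-updates."""
    rng = np.random.default_rng(seed)
    V = np.zeros((H + 1, S))                # only V-value estimates, no Q-table
    visits = np.zeros((H, S), dtype=int)
    bandits = [[Exp3Bandit(A) for _ in range(S)] for _ in range(H)]

    for _ in range(num_episodes):
        s = env.reset()                     # assumed interface
        for h in range(H):
            a = bandits[h][s].sample(rng)
            s_next, r = env.step(h, s, a)   # assumed interface; reward r in [0, 1]

            visits[h, s] += 1
            t = visits[h, s]
            alpha = (H + 1) / (H + t)       # step size familiar from Q-learning analyses

            # Incremental update of the V-value estimate only.
            V[h, s] = (1 - alpha) * V[h, s] + alpha * (r + V[h + 1, s_next])

            # Feed the realized (normalized) loss back to the adversarial bandit.
            loss = 1.0 - (r + V[h + 1, s_next]) / (H - h)
            bandits[h][s].update(a, float(np.clip(loss, 0.0, 1.0)))
            s = s_next
    return V, bandits
```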
Many applications involve non-Euclidean data, where exploiting Riemannian geometry can deliver algorithms that are computationally superior to standard nonlinear programming approaches. This observation has resulted in an increasing interest in Riemannian methods in the optimization and machine learning community. In this talk, we consider the problem of optimizing a function on a Riemannian manifold subject to convex constraints. We introduce Riemannian Frank-Wolfe (RFW) methods, a class of projection-free algorithms for constrained geodesically convex optimization. To understand the algorithm’s efficiency, we discuss (1) its iteration complexity, (2) the complexity of computing the Riemannian gradient, and (3) the complexity of the Riemannian “linear” oracle (RLO), a crucial subroutine at the heart of the algorithm. We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods. Joint work with Suvrit Sra (MIT).
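As a sketch of one iteration (the step-size schedule below is the standard Frank-Wolfe choice, stated here only for illustration): given the current iterate x_k in the geodesically convex feasible set \Omega \subseteq M, RFW calls the Riemannian "linear" oracle and then takes a geodesic step,
\[
  z_k \;\in\; \arg\min_{z \in \Omega}\, \bigl\langle \operatorname{grad} f(x_k),\, \operatorname{Exp}_{x_k}^{-1}(z) \bigr\rangle_{x_k},
  \qquad
  x_{k+1} \;=\; \operatorname{Exp}_{x_k}\!\bigl(\gamma_k\, \operatorname{Exp}_{x_k}^{-1}(z_k)\bigr),
  \qquad \gamma_k = \tfrac{2}{k+2},
\]
so no projection onto \Omega is ever required; items (2) and (3) above concern the cost of computing \operatorname{grad} f(x_k) and of solving the minimization that defines z_k.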
The state of a system of several quantum particles is an element of the tensor product of Hilbert spaces. In certain situations, the inner product requirement can be relaxed. Quantum states then turn into tensors, and tools and intuition from quantum information, algebraic geometry, and algebraic complexity can be brought together. I will explain the two main lines of application, each leading to an optimization problem. The first is to quantum entanglement, the strong correlations among quantum particles, leading to an optimization over moment polytopes. The second is to many-body physics, where tensor networks are used as an ansatz class to describe the intricate correlations in quantum materials.
The simplest homogeneous polynomials with nonnegative coefficients are products of linear forms Prod_{A}(X) associated with nonnegative matrices A. We prove that for any H-stable (homogeneous and stable) polynomial p with p(E) = 1, where E is the vector of all ones, its value satisfies p(X) = Prod_{A(X)}(X), where A(X) is a nonnegative matrix with unit row sums and with vector of column sums equal to the gradient of p at E. I will first explain some of the intuition and history behind the result, sketch the proof, and present a few applications and generalizations of this "productization" property. (Joint work with Jonathan Leake.)
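In symbols (writing the statement for a polynomial of degree n in n variables; this block only restates the claim from the abstract, with the product polynomial used as a sanity check):
\[
  \mathrm{Prod}_A(X) \;=\; \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j,
  \qquad
  p(X) \;=\; \mathrm{Prod}_{A(X)}(X)
  \quad\text{with}\quad
  A(X) \ge 0,\;\; A(X)E = E,\;\; E^{\top} A(X) = \nabla p(E)^{\top}.
\]
For example, p(X) = x_1 x_2 \cdots x_n is H-stable with p(E) = 1, and taking A(X) to be the identity matrix recovers it, with unit row sums and column sums \nabla p(E) = E.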
Maximum entropy distributions on discrete domains have been studied in a number of contexts over the past few decades. In particular, they have been used in deterministic algorithms for approximating the permanent, in scaling problems for torus actions, in spanning tree sampling problems, and beyond. These applications are intimately linked to distributions that are supported on a finite set, have a given expectation, and maximize entropy. In this talk, we will discuss the generalization of maximum entropy distributions from finite discrete support sets to continuous manifolds. Such distributions arise in connection with a number of topics: for example, quantum entropy, interior point methods, matrix distributions studied in statistics, the isotropic constant, and low-rank matrix approximation. We will discuss a few of these applications, along with some of the computability issues that arise when moving from discrete to continuous support. We will also briefly discuss the connection between this generalization and the norm minimization techniques used for general scaling problems. This is joint work with Nisheeth Vishnoi.
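As background, the finite-support problem referred to here can be written as follows (the notation, a support set {v_1, …, v_m} \subset \mathbb{R}^d and target expectation \theta, is chosen only for illustration):
\[
  \max_{p \ge 0}\; -\sum_{i=1}^{m} p_i \log p_i
  \quad \text{s.t.} \quad \sum_{i=1}^{m} p_i v_i = \theta, \qquad \sum_{i=1}^{m} p_i = 1,
\]
whose optimizer, when the constraints can be satisfied strictly, has the exponential-family form p_i \propto \exp(\langle \lambda, v_i \rangle) for some dual vector \lambda; the talk concerns what replaces this picture when the finite set {v_i} becomes a continuous manifold.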
In this paper, we address the computation of the noncommutative rank (nc-rank) of a linear symbolic matrix A = A1 x1 + A2 x2 + ⋯ + Am xm, where each Ai is an n x n matrix over a field K and the xi (i = 1, 2, …, m) are noncommutative variables. For this problem, polynomial-time algorithms were given by Garg, Gurvits, Oliveira, and Wigderson for K = Q, and by Ivanyos, Qiao, and Subrahmanyam for an arbitrary field K. We present a significantly different polynomial-time algorithm that works over an arbitrary field K. Our algorithm is based on a combination of submodular optimization on modular lattices and convex optimization on CAT(0) spaces.
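As background (this characterization, due to Fortin and Reutenauer, is the one I understand these algorithms to compute; it is recalled here only for orientation):
\[
  \operatorname{nc\text{-}rank}(A) \;=\; \min_{U \subseteq K^n} \Bigl( n - \dim U + \dim\bigl(A_1 U + A_2 U + \cdots + A_m U\bigr) \Bigr),
\]
where the minimum ranges over linear subspaces U of K^n; in particular, A is nc-singular exactly when some subspace is "shrunk" by all the Ai simultaneously.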
The Kronecker coefficients of the symmetric group are the multiplicities of irreducible representations in the decomposition of the tensor product of two other irreducible representations. Ever since their definition by Murnaghan more than 80 years ago, they have presented a major mystery and open problem in Algebraic Combinatorics. Recently, they have come to play a crucial role in Geometric Complexity Theory in the quest to separate VP and VNP. In this talk, I will give a broad overview of this topic with respect to combinatorics, asymptotics, and computational complexity.
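In symbols (with S^\lambda denoting the irreducible representation of the symmetric group S_n indexed by a partition \lambda of n; the notation is standard but chosen here only for illustration):
\[
  S^{\lambda} \otimes S^{\mu} \;\cong\; \bigoplus_{\nu \vdash n} g(\lambda, \mu, \nu)\, S^{\nu},
\]
so the Kronecker coefficient g(\lambda, \mu, \nu) is the multiplicity of S^{\nu} in the tensor product of S^{\lambda} and S^{\mu} under the diagonal S_n-action.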
Tensor network states form a variational ansatz class widely used in the study of quantum many-body systems. Geometrically, these states form an algebraic variety of tensors with rich representation theoretic structure. It is known that tensors on the "boundary" of this variety can provide more efficient representations for states of physical interest, but the pathological geometric properties of the boundary make it difficult to extend the classical optimization methods. In recent work, we introduced a new ansatz class which includes states at the boundary of the tensor network variety. I will present some of the geometric features of this class and explain how it can be used in the variational approach. This is based on joint work with M. Christandl, D. Stilck-Franca and A. Werner.
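For concreteness, the simplest instance of this ansatz class is the matrix product state with periodic boundary conditions (the choice of a one-dimensional geometry and the bond-dimension notation D are assumptions made for this example):
\[
  T_{i_1 i_2 \cdots i_n} \;=\; \operatorname{Tr}\bigl(A^{(1)}_{i_1} A^{(2)}_{i_2} \cdots A^{(n)}_{i_n}\bigr),
  \qquad A^{(k)}_{i} \in \mathbb{C}^{D \times D},
\]
and the variety in question is the closure of the set of tensors expressible this way for a fixed bond dimension D; its boundary consists of limits of such tensors that need not admit a representation of the same form.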
Exponential densities on unitary conjugation orbits of Hermitian matrices, known as Harish-Chandra-Itzykson-Zuber (HCIZ) densities, are important in various settings in physics and random matrix theory. However, the basic question of efficient sampling from these densities has remained open. We present two efficient algorithms to sample matrices from distributions that are close to the HCIZ distribution. Both algorithms exploit a natural self-reducible structure that arises from continuous symmetries of the underlying unitary orbit. We will also discuss applications to quantum inference and differentially private rank-k approximation.
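Concretely (with A a fixed Hermitian matrix and \Lambda a fixed Hermitian matrix whose unitary orbit carries the density; the notation is chosen only for illustration), the HCIZ density can be written as
\[
  d\mu(U) \;\propto\; \exp\!\bigl(\operatorname{Tr}(A\, U \Lambda U^{*})\bigr)\, dU,
\]
where dU is the Haar measure on the unitary group; equivalently, it is the density proportional to \exp(\operatorname{Tr}(AX)) on the orbit \{U \Lambda U^{*}\}, and its normalization constant is the HCIZ integral.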