Robust Distribution Learning with Local and Global Adversarial Corruptions (via Zoom)

Abstract: As the scope of data-driven decision-making grows, so does the risk posed by data poisoning attacks. Existing models for data corruption are primarily either global, allowing an adversary to arbitrarily modify a bounded fraction of samples, or local, allowing all samples to be slightly perturbed. This work examines non-parametric distribution learning under a unified framework supporting both local and global adversarial corruptions. The combined model presents new computational and statistical challenges that cannot be properly addressed by naive combinations of existing methods. To overcome these challenges, we build upon recent techniques in algorithmic robust statistics and take a perspective rooted in optimal transport.
Formally, given corrupted samples from a distribution P, we seek to compute an estimate P̂ that minimizes the 1-Wasserstein distance W(P̂,P). In fact, we attack the finer-grained task of minimizing W(Π♯P̂, Π♯P) for all orthogonal projections Π, with performance scaling with rank(Π) = k. This allows us to account simultaneously for mean estimation (k=1), distribution estimation under the Wasserstein distance (k=d), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm. Our procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
Based on joint work with Ziv Goldfeld and Soroosh Shafiee.
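
To make the setup in the abstract concrete, the following is a minimal illustrative sketch in Python, not the algorithm presented in the talk. It simulates a combined corruption (every sample shifted by at most rho, then an eps-fraction of samples overwritten) and measures the 1-Wasserstein distance between clean and corrupted samples after projection onto a single direction (the k = 1 case). The function names, the per-sample bound rho, the outlier value 50.0, and all parameter choices are assumptions made for illustration only.

```python
import numpy as np


def corrupt(samples, eps, rho, rng):
    """Hypothetical combined corruption (an assumption for illustration).

    Local: every point is shifted by a vector of Euclidean norm at most rho.
    Global: floor(eps * n) points are then overwritten with a far-away outlier.
    """
    n, d = samples.shape
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rho * rng.uniform(size=(n, 1))  # shift lengths in [0, rho)
    out = samples + radii * directions
    m = int(eps * n)
    idx = rng.choice(n, size=m, replace=False)
    out[idx] = 50.0  # arbitrary outlier location, chosen for illustration
    return out


def projected_w1(x, y, u):
    """1-Wasserstein distance between two equal-size samples projected onto u.

    In one dimension the optimal coupling matches sorted values, so the
    distance is the mean absolute difference of the sorted projections.
    """
    u = u / np.linalg.norm(u)
    return float(np.mean(np.abs(np.sort(x @ u) - np.sort(y @ u))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(1000, 3))
    u = np.array([1.0, 0.0, 0.0])
    local_only = corrupt(clean, eps=0.0, rho=0.1, rng=rng)
    combined = corrupt(clean, eps=0.1, rho=0.1, rng=rng)
    print("local only:", projected_w1(clean, local_only, u))
    print("local + global:", projected_w1(clean, combined, u))
```

With local corruption alone, the projected distance cannot exceed rho, whereas even a small fraction of global outliers can move it arbitrarily far. This gap is the reason the raw corrupted sample is a poor estimate under the combined model and a robust procedure is needed.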

Bio: Sloan Nietert is a fifth-year PhD student in Computer Science at Cornell University, where he is advised by Ziv Goldfeld. His research interests include machine learning theory, online algorithms, and high-dimensional statistics, with a particular focus on optimal transport. His honors include the NSF Graduate Research Fellowship, a Fulbright U.S. Student Grant with the Alfréd Rényi Institute of Mathematics, and the Outstanding Senior in Science award from Clemson University.