Robust Distribution Learning with Local and Global Adversarial Corruptions (via Zoom)
Abstract: As the scope of data-driven decision-making grows, so does the risk posed by data poisoning attacks. Existing models for data corruption are primarily global, allowing an adversary to arbitrarily modify a bounded fraction of samples, or local, allowing for all samples to be slightly perturbed. This work examines non-parametric distribution learning under a unified framework supporting both local and global adversarial corruptions. The combined model presents new computational and statistical challenges which cannot be properly addressed via naive combinations of existing methods. To overcome this, we build upon recent techniques in algorithmic robust statistics and take a perspective rooted in optimal transport.
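The two corruption models combined here can be sketched concretely. The function below is an illustrative simulation, not the paper's formal model: the global adversary overwrites an ε-fraction of samples with arbitrary points, while the local adversary shifts every sample by a vector of norm at most ρ. The specific noise and outlier distributions are assumptions chosen for illustration.

```python
import numpy as np

def corrupt(samples, eps, rho, rng):
    """Simulate combined local + global corruption (illustrative sketch).

    Local adversary: every sample is perturbed by a vector of norm <= rho.
    Global adversary: an eps-fraction of samples is replaced arbitrarily
    (here, by gross outliers far from the data).
    """
    n, d = samples.shape
    out = samples.copy()

    # Local corruption: shift each point by a random direction scaled to norm rho.
    noise = rng.normal(size=(n, d))
    noise *= rho / np.maximum(np.linalg.norm(noise, axis=1, keepdims=True), 1e-12)
    out += noise

    # Global corruption: overwrite a random eps-fraction with arbitrary outliers.
    m = int(eps * n)
    idx = rng.choice(n, size=m, replace=False)
    out[idx] = rng.normal(loc=50.0, scale=1.0, size=(m, d))  # gross outliers
    return out

rng = np.random.default_rng(0)
clean = rng.normal(size=(1000, 5))
dirty = corrupt(clean, eps=0.1, rho=0.5, rng=rng)
```

A naive estimator must now contend with both effects at once: trimming handles the replaced fraction but not the small shifts, while smoothing handles the shifts but not the outliers.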
Formally, given corrupted samples from a distribution P, we seek to compute an estimate P̂ that minimizes the 1-Wasserstein distance W(P̂,P). In fact, we attack the finer-grained task of minimizing W(Π♯P̂, Π♯P) for all orthogonal projections Π, with performance scaling with rank(Π) = k. This allows us to account simultaneously for mean estimation (k=1), distribution estimation under the Wasserstein distance (k=d), as well as the settings interpolating between these two extremes. We characterize the optimal population-limit risk for this task and then develop an efficient finite-sample algorithm. Our procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator. We apply this algorithm to robust stochastic optimization, and, in the process, uncover a new method for overcoming the curse of dimensionality in Wasserstein distributionally robust optimization.
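For the rank-1 case (k = 1), the projected distance W(Π♯P̂, Π♯P) is easy to compute empirically: projecting both samples onto a unit vector reduces the problem to one dimension, where the 1-Wasserstein distance between equal-size empirical measures is the mean absolute difference of the sorted projections. The sketch below illustrates this reduction; it is a minimal empirical computation, not the estimator from the talk.

```python
import numpy as np

def w1_projected(X, Y, direction):
    """Empirical 1-Wasserstein distance between rank-1 projections of two
    equal-size samples X, Y (each an n x d array).

    In one dimension, W1 between two n-point empirical measures equals the
    mean absolute difference of their sorted values.
    """
    u = direction / np.linalg.norm(direction)  # unit vector spanning Pi
    x_proj = np.sort(X @ u)
    y_proj = np.sort(Y @ u)
    return np.abs(x_proj - y_proj).mean()

rng = np.random.default_rng(0)
d, n = 5, 2000
P = rng.normal(size=(n, d))                  # clean sample
Q = P + np.array([1.0] + [0.0] * (d - 1))    # mean shift along e1
print(w1_projected(P, Q, np.eye(d)[0]))      # 1.0: the shift direction
print(w1_projected(P, Q, np.eye(d)[1]))      # 0.0: an unaffected direction
```

For k = 1 this recovers mean-type discrepancies direction by direction; larger k requires solving a genuine k-dimensional optimal transport problem, which is where the computational difficulty scaling with rank(Π) enters.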
Based on joint work with Ziv Goldfeld and Soroosh Shafiee.
Bio: Sloan Nietert is a fifth-year PhD student in Computer Science at Cornell University, where he is advised by Ziv Goldfeld. His research interests include machine learning theory, online algorithms, and high-dimensional statistics, with a particular focus on optimal transport. His honors include the NSF Graduate Research Fellowship, a Fulbright U.S. Student Grant with the Alfréd Rényi Institute of Mathematics, and the Outstanding Senior in Science award from Clemson University.