- About
- Events
- Calendar
- Graduation Information
- Cornell Learning Machines Seminar
- Student Colloquium
- BOOM
- Fall 2024 Colloquium
- Conway-Walker Lecture Series
- Salton 2024 Lecture Series
- Seminars / Lectures
- Big Red Hacks
- Cornell University - High School Programming Contests 2024
- Game Design Initiative
- CSMore: The Rising Sophomore Summer Program in Computer Science
- Explore CS Research
- ACSU Research Night
- Cornell Junior Theorists' Workshop 2024
- People
- Courses
- Research
- Undergraduate
- M Eng
- MS
- PhD
- Admissions
- Current Students
- Computer Science Graduate Office Hours
- Advising Guide for Research Students
- Business Card Policy
- Cornell Tech
- Curricular Practical Training
- A & B Exam Scheduling Guidelines
- Fellowship Opportunities
- Field of Computer Science Ph.D. Student Handbook
- Graduate TA Handbook
- Field A Exam Summary Form
- Graduate School Forms
- Instructor / TA Application
- Ph.D. Requirements
- Ph.D. Student Financial Support
- Special Committee Selection
- Travel Funding Opportunities
- Travel Reimbursement Guide
- The Outside Minor Requirement
- Diversity and Inclusion
- Graduation Information
- CS Graduate Minor
- Outreach Opportunities
- Parental Accommodation Policy
- Special Masters
- Student Spotlights
- Contact PhD Office
Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services and data science problems across fields. However, the use of ML on sensitive data sets involving medical, financial and behavioral data are greatly limited due to privacy concern. In this talk, we consider the problem of statistical learning with privacy constraints. Under Vapnik's general learning setting and the formalism of differential privacy (DP), we establish simple conditions that characterizes the private learnability, which reveals a mixture of positive and negative insight. We then identify generic methods that reuse existing randomness to effectively solve private learning in practice; and discuss a weaker notion of privacy — on-avg KL-privacy — that allows for orders-of-magnitude more favorable privacy-utility tradeoff, while preserving key properties of differential privacy. Moreover, we show that On-Average KL-Privacy is **equivalent** to generalization for a large class of commonly-used tools in statistics and machine learning that sample from Gibbs distributions---a class of distributions that arises naturally from the maximum entropy principle. Finally, I will describe a few exciting future directions that use statistics/machine learning tools to advance he state-of-the-art for privacy, and use privacy (and privacy inspired techniques) to formally address the problem of p-hacking (or selective bias) in scientific discovery.
Bio:
Yu-Xiang Wang is a fourth year PhD candidate in the Machine Learning Department, Carnegie Mellon University, expecting to complete in summer 2017.
He works with Stephen Fienberg, Alex Smola, Ryan Tibshirani and Jing Lei on a variety of topics at the intersection of machine learning, optimization and statistics. Yu-Xiang's recent research interests include nonparametric regression over graphs, differential privacy, subspace clustering, large-scale optimization and interactive/adaptive learning (e.g., adaptive data analysis, contextual bandits). He received paper awards from KDD and WSDM, outstanding reviewer award at NIPS and the Baidu Scholarship among other honors.