CS Colloquium: Towards practical machine learning with differential privacy and beyond

Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services and data science problems across fields. However, the use of ML on sensitive data sets involving medical, financial and behavioral data is greatly limited by privacy concerns. In this talk, we consider the problem of statistical learning with privacy constraints. Under Vapnik's general learning setting and the formalism of differential privacy (DP), we establish simple conditions that characterize private learnability, revealing a mixture of positive and negative insights. We then identify generic methods that reuse existing randomness to effectively solve private learning in practice, and discuss a weaker notion of privacy, On-Average KL-Privacy, that allows for an orders-of-magnitude more favorable privacy-utility tradeoff while preserving key properties of differential privacy. Moreover, we show that On-Average KL-Privacy is **equivalent** to generalization for a large class of commonly used tools in statistics and machine learning that sample from Gibbs distributions, a class of distributions that arises naturally from the maximum entropy principle. Finally, I will describe a few exciting future directions that use statistics and machine learning tools to advance the state of the art for privacy, and use privacy (and privacy-inspired techniques) to formally address the problem of p-hacking (or selective bias) in scientific discovery.

Bio:
Yu-Xiang Wang is a fourth-year PhD candidate in the Machine Learning Department at Carnegie Mellon University, expecting to complete his degree in summer 2017.

He works with Stephen Fienberg, Alex Smola, Ryan Tibshirani and Jing Lei on a variety of topics at the intersection of machine learning, optimization and statistics. Yu-Xiang's recent research interests include nonparametric regression over graphs, differential privacy, subspace clustering, large-scale optimization and interactive/adaptive learning (e.g., adaptive data analysis, contextual bandits). He has received paper awards from KDD and WSDM, an outstanding reviewer award at NIPS, and the Baidu Scholarship, among other honors.