Dylan Foster, a Ph.D. alumnus of the Cornell Department of Computer Science (2019; advised by Karthik Sridharan) and currently a postdoctoral researcher at the MIT Institute for Foundations of Data Science, received two distinctions at the latest Conference on Learning Theory (COLT 2019). Foster received the Best Paper Award for “Statistical Learning with a Nuisance Component” (shared with Cornell CS Ph.D. alumnus Vasilis Syrgkanis) and the Best Student Paper Award for “The Complexity of Making the Gradient Small in Stochastic Convex Optimization” (shared with current Cornell CS doctoral candidate Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, and Blake Woodworth). COLT is the premier conference on theoretical aspects of machine learning.
Foster and Syrgkanis provide this abstract of their Best Paper, which also appears under the title “Orthogonal Statistical Learning”:
We provide excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target model depends on an unknown model that must be estimated from data (a “nuisance model”). We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target model and one for the nuisance model. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from the statistical learning and machine learning literature to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class. We characterize conditions on the metric entropy such that oracle rates—rates of the same order as if we knew the nuisance model—are achieved. We also analyze the rates achieved by specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance in the literature: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data.
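For context, here is a paraphrase of the orthogonality condition (in our own notation, which may differ from the paper's): writing L(θ, g) for the population risk over a target model θ and nuisance model g, Neyman orthogonality asks that perturbing the nuisance have no first-order effect on the risk gradient at the true pair (θ₀, g₀):

```latex
% Neyman orthogonality, paraphrased (notation is ours, not the paper's):
% the cross derivative of the population risk in the nuisance direction
% vanishes at the truth, so nuisance estimation error enters the excess
% risk only at second order.
D_g \nabla_\theta L(\theta_0, g_0)\,[\theta - \theta_0,\; g - g_0] = 0
\qquad \text{for all } \theta \in \Theta,\ g \in \mathcal{G}.
```

This is the mechanism behind the abstract's central claim: a first-stage nuisance estimate that is only moderately accurate costs the second stage nothing at first order.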
Read the full paper, “Orthogonal Statistical Learning.”
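To make the two-stage recipe concrete, here is a minimal sketch specialized to heterogeneous treatment effect estimation, the first application the authors highlight. It is an illustration under our own modeling choices, not the paper's algorithm: the random-forest nuisance models, the residual-on-residual squared loss, and the linear target class are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def two_stage_sample_splitting(X, treatment, outcome, rng):
    """Illustrative two-stage sample-splitting estimator (not the paper's
    exact procedure): fit nuisance models on one half of the data, then
    fit the target model on the other half via an orthogonalized loss."""
    n = len(outcome)
    idx = rng.permutation(n)
    nuisance_half, target_half = idx[: n // 2], idx[n // 2:]

    # Stage 1: estimate the nuisance component on the first half.
    # Here the nuisances are the outcome and treatment regressions
    # (hypothetical choices; any consistent learners would do).
    outcome_model = RandomForestRegressor(random_state=0).fit(
        X[nuisance_half], outcome[nuisance_half])
    treatment_model = RandomForestRegressor(random_state=0).fit(
        X[nuisance_half], treatment[nuisance_half])

    # Stage 2: fit the target model on the held-out half by minimizing
    # a residual-on-residual squared loss; this loss satisfies Neyman
    # orthogonality, so first-order nuisance errors cancel.
    y_res = outcome[target_half] - outcome_model.predict(X[target_half])
    t_res = treatment[target_half] - treatment_model.predict(X[target_half])
    # For a linear effect model tau(x) = x @ beta, minimizing
    # sum((y_res - tau(x) * t_res)^2) is a least-squares fit on X * t_res.
    return LinearRegression(fit_intercept=False).fit(
        X[target_half] * t_res[:, None], y_res)
```

The orthogonalized loss in stage 2 is exactly what the abstract's guarantee speaks to: errors in the two nuisance fits affect the target's excess risk only at second order, so slower nonparametric nuisance rates can still yield oracle rates for the target.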
For the Best Student Paper, Foster et al. describe their project this way:
We give nearly matching upper and lower bounds on the oracle complexity of finding ϵ-stationary points (‖∇F(x)‖≤ϵ) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimization complexity and sample complexity, and reveals some surprising differences between the complexity of stochastic optimization versus learning. Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model. In other words, the separation in complexity between the two models can be exponential, and the folklore understanding that smoothness is required to find stationary points is only weakly true for statistical learning. Our upper bounds are based on extensions of a recent "recursive regularization" technique proposed by Allen-Zhu (2018). We show how to extend the technique to achieve near-optimal rates, and in particular show how to leverage the extra information available in the global oracle model. Our algorithm for the global model can be implemented efficiently through finite sum methods, and suggests an interesting new computational-statistical tradeoff.
Read "The Complexity of Making the Gradient Small in Stochastic Convex Optimization."
As Foster notes, he is “interested in all aspects of generalization, sample complexity, and related algorithmic problems, especially for real-world problems such as interactive learning, deep learning, and non-convex learning.” More information and additional publications are available on his website.