CS7792 Counterfactual Machine Learning , T. Joachims, Cornell University

	CS7792 - Counterfactual Machine Learning Special Topics in Machine Learning Fall 2016 Prof. Thorsten Joachims Cornell University, Department of Computer Science & Department of Information Science

	Time and Place First meeting: August 26, 2016 Last meeting: December 2, 2016 Time: Fridays, 10:10am - 11:10am Room: 122 Gates Hall
	Course Description How many clicks will a new ad-placement system get? Will a different news-ranking algorithm increase the dwell times of the users? What ranking function will minimize abandonment in my search engine? Answering such evaluation and learning questions is at the core of improving many of the online systems we use every day. This seminar addresses the problem of using past human-interaction data (e.g. click logs) to learn to improve the performance of the system. This requires integrating causal inference models into the design of the learning algorithm, since we need to make predictions about the system�s performance after an intervention (e.g. fielding a new ranking function). This seminar discusses the emerging research area of counterfactual machine learning in the intersection of machine learning, causal inference, economics, and information retrieval. Topics include causal inference in the counterfactual model, observational vs. experimental data, full-information vs. partial information data, batch learning from bandit feedback, handling selection bias in data, policy learning vs. reward prediction. Concepts will be illustrated with applications in search engines, recommender systems, and computational advertising. The prerequisites for the class are: knowledge of machine learning algorithms and its theory, basic probability, basic statistics, and general mathematical maturity. Enrollment is limited to PhD students.
	Syllabus 08/26: Introduction [slides] [slides 6up] Examples of machine learning problems the require counterfactual reasoning. Overview of course. Administrative issues and course policies. 09/02: Canceled 09/09: Online Learning from User Interactions through Interventions [slides] [slides 6up] Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT, 2009. (paper) [TJ] P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper) [TJ] 09/16: Counterfactual Model for Online Systems [slides] [slides 6up] Imbens, Rubin, Causal Inference for Statistical Social Science, 2015. Chapters 1,3,12. 09/23: System Evaluation via Counterfactual Estimation [slides] [slides 6up] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297--306, 2011. (paper) [Davis] L. Li, J.-Y. Kim, I. Zitouni. Toward Predicting the Outcome of an A/B Experiment for Search Relevance. In WSDM, 2015. (paper) [Moontae] 09/30: Causal Reasoning for Online Systems[slides] [slides 6up] L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207--3260, 2013. (paper) [Adith] 10/07: Batch Learning from Bandit Feedback 1 [slides] [slides 6up] [ A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper) [TJ] A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213--3221, 2015. (paper) [TJ] 10/14: Batch Learning from Bandit Feedback 2 [slides] [slides 6up] A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. In KDD, pages 129--138, 2009. (paper) [Michael L] S.~Athey and G.~Imbens. Recursive Partitioning for Heterogeneous Causal Effects. ArXiv e-prints, 2015. (paper) [Aman] 10/21: Online Contextual Bandits and Variance Reduction [slides] [slides 6up] M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097--1104, 2011. (paper) [Ashudeep] J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008. (paper) [Ziteng] 10/28: Observational Data[slides] [slides 6up] J. Langford, A. Strehl, and J. Wortman. Exploration scavenging. In ICML, pages 528--535, 2008. (paper) [Pantelis] Alex Strehl, John Langford, Sham Kakade, Lihong Li. Learning from Logged Implicit Exploration Data. NIPS, pages 2217--2225, 2010. (paper) (paper) [Angela] 11/04: Missing Data and Selection Bias [slides] [slides 6up] T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims. Recommendations as treatments: Debiasing learning and evaluation. In ICML, 2016. (paper) [Tobias] B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In RecSys, pages 5--12, 2009. (paper) [Wouter] 11/11: Click Models [slides] [slides 6up] A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool, 2015. (paper) [Canceled] 11/18: ERM Learning with Behavior Propensity Model [slides] [slides 6up] T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, In WSDM, 2017. (paper) [TJ] 12/02: Wrap-up [slides] [slides 6up]
	Contact Please use the CS7792 Piazza Forum for questions and discussions. Otherwise, contact Thorsten Joachims (homepage) [Office hours: Fridays, 11:10am-12:10pm (Gates 236)]. For peer feedback, we are using this CMT Instance for this course. For grades, we are using CMS.
	Grading This is a 1-credit seminar. S/U only (no letter grade, no audit). Grades will be determined based on quizzes, paper presentations, peer reviewing, and class participation. For the paper presentations, we will use peer review. This means that you will comment on other students presentations, giving constructive feedback. The quality of your reviewing also becomes a component of your own grade. To eliminate outlier grades for quizzes and peer reviews, the lowest grade is replaced by the second lowest grade when grades are cumulated at the end of the semester. So, missing one week is no big deal. To pass the course, you need to get at least half of the cumulative quiz points, half of the presentation points, half of the peer reviewing points, and half of the class participation points.
	Reference Material We will mostly read original research papers, but the following books provide entry points for the main topics of the class: Imbens, Rubin, "Causal Inference for Statistics, Social, and Biomedical Sciences", Cambridge University Press, 2015. (online via Cornell Library) Morgan, Winship "Counterfactuals and Causal Inference", Cambridge University Press, 2007. Other sources for general background on machine learning are: Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library) Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library) Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online) Bishop, "Pattern Recognition and Machine Learning", Springer, 2006. Tom Mitchell, "Machine Learning", McGraw Hill, 1997. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004. Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997. Duda, Hart, Stork, "Pattern Classification", Wiley, 2000. Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001. Vapnik, "Statistical Learning Theory", Wiley, 1998. Bias in Human Feedback Ruben Sipos, Arpita Ghosh, Thorsten Joachims. Was This Review Helpful to You? It Depends! Context and Voting Patterns in Online Content. WWW, 2014. (paper) O. Chapelle, T. Joachims, F. Radlinski, Yisong Yue, Large-Scale Validation and Analysis of Interleaved Search Evaluation, ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. (paper) T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper) O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper) A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool, 2015. (paper) S. Wager, N. Chamandy, O. Muralidharan, A. Najmi. Feedback Detection for Live Predictors. In NIPS, 2015. (paper) Online Learning with Interactive Control Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT, 2009. (paper) P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper) K. Hofmann, A. Schuth, S. Whiteson, and M. de Rijke. Reusing historical interaction data for faster online learning to rank for {IR}. In WSDM, pages 183--192, 2013. (paper) R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, pages 140--181, 2009. (paper) J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008. (paper) F. Lattimore, T. Lattimore, M. Reid. Causal Bandits: Learning Good Interventions via Causal Inference, NIPS, 2016. (paper) Batch Learning from Controlled Interventions L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297--306, 2011. (paper) A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. In KDD, pages 129--138, 2009. (paper) S.~Athey and G.~Imbens. Recursive Partitioning for Heterogeneous Causal Effects. ArXiv e-prints, 2015. (paper) A. Swaminathan and T. Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In ICML, 2015. (paper) A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper) A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213--3221, 2015. (paper) A. Swaminathan, A. Krishnamurthy, A. Agarwal, M. Dudik, and J. Langford. Off-policy evaluation and optimization for slate recommendation. Arxiv Preprint, 2016. (paper) M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097--1104, 2011. (paper) L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207--3260, 2013. (paper) C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In NIPS, pages 442--450, 2010. (paper) L. Li, S. Chen, J. Kleban, and A. Gupta. Counterfactual estimation and optimization of click metrics in search engines: {A} case study. In WWW Companion, pages 929--934, 2015. (paper) J. Mary, P. Preux, and O. Nicol. Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques. In ICML, pages 172--180, 2014. (paper) J. Schulman, S. Levine, P. Moritz, M. Jordan, P. Abbeel. Trust Region Policy Optimization. In ICML, 2015. (paper) Batch Learning from Observational Feedback T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims. Recommendations as treatments: Debiasing learning and evaluation. In ICML, 2016. (paper) B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In RecSys, pages 5--12, 2009. (paper) J. M. Hern�ndez-Lobato, N. Houlsby, and Z. Ghahramani. Probabilistic matrix factorization with non-random missing data. In ICML, pages 1512--1520, 2014. (paper) T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, Arxiv Preprint, 2016. (paper) L. Li, J.-Y. Kim, I. Zitouni. Toward Predicting the Outcome of an A/B Experiment for Search Relevance. In WSDM, 2015. (paper) D. Liang, L. Charlin, J. McInerney, D. Blei. Modeling User Exposure in Recommendation. In WWW, 2016. (paper)
	Academic Integrity This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.