Syllabus
- 08/24: Introduction [slides]
- Examples of machine learning problems that require counterfactual reasoning.
- Overview of course.
- Administrative issues and course policies.
- 08/31: The Counterfactual Model for Learning Systems. [slides]
- Background: Imbens, Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, 2015. Chapters 1, 3, 12. (online via Cornell Library)
- 09/07: Basics of online and offline estimation.
- The Counterfactual Model for Learning Systems (continued). [Thorsten Joachims]
- R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, pages 140--181, 2009. (paper) [Briana Vecchione]
- 09/14: Doubly-robust estimator.
- M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097--1104, 2011. (paper) [Lequn Wang]
- M. Farajtabar, Yinlam Chow, M. Ghavamzadeh. More Robust Doubly Robust Off-policy Evaluation. In ICML, 2018. (paper) [Xiaojie Mao]
- 09/21: Combination estimators.
- Yu-Xiang Wang, Alekh Agarwal and Miro Dudik. Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits. In ICML, 2017. (paper) [Yi Su]
- Philip Thomas, Emma Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In ICML, 2016. (paper) [Yi Su]
- 09/28: Recommender evaluation.
- A. Gilotte, C. Calauzenes, T. Nedelec, A. Abraham and S. Dolle. Offline A/B testing for recommender systems. In WSDM, 2018. (paper) [Shachi Deshpande]
- L. Yang, Y. Cui, Y. Xuan, C. Wang, S. Belongie, D. Estrin. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. In RecSys, 2018. (paper) [Longqi Yang]
- D. Liang, L. Charlin, D. Blei. Causal Inference for Recommendation. In UAI Workshop, 2016. (paper) [Longqi Yang]
- 10/05: Extensions to offline evaluation.
- Alex Strehl, John Langford, Sham Kakade, Lihong Li. Learning from Logged Implicit Exploration Data. NIPS, pages 2217--2225, 2010. (paper) [Chengrun Yang]
- L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207--3260, 2013. (paper) [Katherine van Koevering]
- 10/12: Batch learning from bandit feedback (BLBF). [slides]
- A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper) [Thorsten Joachims]
- T. Joachims, A. Swaminathan, M. de Rijke. Deep Learning with Logged Bandit Feedback. In ICLR, 2018. (paper) [Thorsten Joachims]
- 10/19: Propensity overfitting and dealing with large action spaces.
- A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213--3221, 2015. (paper) [Thorsten Joachims]
- N. Kallus and A. Zhou. Policy Evaluation and Optimization with Continuous Treatments. In AISTATS, 2018. (paper) [Angela Zhou]
- 10/26: Error bounds and learning to rank with partial feedback. [slides]
- C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In NIPS, pages 442--450, 2010. (paper) [Kate Donahue]
- T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, In WSDM, 2017. (paper) [Thorsten Joachims]
- 11/02: Propensity estimation for learning to rank.
- X. Wang, N. Golbandi, M. Bendersky, D. Metzler, M. Najork. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In WSDM, 2018. (paper) [Aman Agarwal]
- A. Agarwal, I. Zaitsev, Xuanhui Wang, Cheng Li, M. Najork, T. Joachims. Estimating Position Bias without Intrusive Interventions. To appear in WSDM, 2019. (paper) [Aman Agarwal]
- 11/09: Embeddings and observational data.
- S. Bonner, F. Vasile. Causal Embeddings for Recommendation. Arxiv, 2018. (paper) [Ashudeep Singh]
- N. Kallus. Balanced Policy Evaluation and Learning. Arxiv, 2017. (paper) [Angela Zhou]
- 11/16: Tree-based policy learning.
- S. Athey and G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. PNAS, 112(27):7353-7360, 2015. (paper) [Cheng Perng Phoo]
- 11/23: Thanksgiving
- 11/30: Wrapup
Reference Material
We will mostly read original research papers, but the following books and tutorials provide entry points for the main topics of the class:
- Imbens, Rubin, "Causal Inference for Statistics, Social, and Biomedical Sciences", Cambridge University Press, 2015. (online via Cornell Library)
- Morgan, Winship, "Counterfactuals and Causal Inference", Cambridge University Press, 2007.
- T. Joachims, A. Swaminathan. SIGIR Tutorial on Counterfactual Evaluation and Learning for Search, Recommendation and Ad Placement, 2016. (homepage)
Other sources for general background on machine learning are:
- Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
- Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
- Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
- Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
- Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
- Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
- Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
- Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
- Vapnik, "Statistical Learning Theory", Wiley, 1998.
Bias in Human Feedback
- T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper)
- A. Agarwal, I. Zaitsev, T. Joachims. Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank, ICML Workshop on CausalML, 2018. (paper)
- X. Wang, M. Bendersky, D. Metzler, M. Najork. Learning to Rank with Selection Bias in Personal Search. In SIGIR, 2016. (paper)
- X. Wang, N. Golbandi, M. Bendersky, D. Metzler, M. Najork. Position Bias Estimation for Unbiased Learning to Rank in Personal Search. In WSDM, 2018. (paper)
- O. Chapelle, T. Joachims, F. Radlinski, Yisong Yue, Large-Scale Validation and Analysis of Interleaved Search Evaluation, ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. (paper)
- Ruben Sipos, Arpita Ghosh, Thorsten Joachims. Was This Review Helpful to You? It Depends! Context and Voting Patterns in Online Content. WWW, 2014. (paper)
- O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper)
- A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool, 2015. (paper)
- S. Wager, N. Chamandy, O. Muralidharan, A. Najmi. Feedback Detection for Live Predictors. In NIPS, 2015. (paper)
Online Learning with Interactive Control
- J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008. (paper)
- D. Foster, A. Agarwal, M. Dudik, H. Luo, R. Schapire. Practical Contextual Bandits with Regression Oracles. Arxiv. (paper)
- V. Syrgkanis, A. Krishnamurthy, R. Schapire. Efficient algorithms for adversarial contextual learning. In ICML, 2016. (paper)
- Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT, 2009. (paper)
- Yisong Yue, T. Joachims. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. In ICML, 2009. (paper)
- P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper)
- K. Hofmann, A. Schuth, S. Whiteson, and M. de Rijke. Reusing historical interaction data for faster online learning to rank for IR. In WSDM, pages 183--192, 2013. (paper)
- F. Lattimore, T. Lattimore, M. Reid. Causal Bandits: Learning Good Interventions via Causal Inference, NIPS, 2016. (paper)
- R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, pages 140--181, 2009. (paper)
Counterfactual Estimators for Policies
- Miroslav Dudik, Dumitru Erhan, John Langford, Lihong Li. Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits. In UAI, 247-254, 2012. (paper)
- M. Farajtabar, Yinlam Chow, M. Ghavamzadeh. More Robust Doubly Robust Off-policy Evaluation. In ICML, 2018. (paper)
- M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097--1104, 2011. (paper)
- J. Langford, A. Strehl, and J. Wortman. Exploration scavenging. In ICML, pages 528--535, 2008. (paper)
- Alex Strehl, John Langford, Sham Kakade, Lihong Li. Learning from Logged Implicit Exploration Data. NIPS, pages 2217--2225, 2010. (paper)
- Yu-Xiang Wang, Alekh Agarwal and Miro Dudik. Optimal and Adaptive Off-Policy Evaluation in Contextual Bandits. In ICML, 2017. (paper)
- Philip Thomas, Emma Brunskill. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. In ICML, 2016. (paper)
- A. Gilotte, C. Calauzenes, T. Nedelec, A. Abraham and S. Dolle. Offline A/B testing for recommender systems. In WSDM, 2018. (paper)
- A. Agarwal, S. Basu, T. Schnabel, T. Joachims. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. In KDD, 2017. (paper)
- L. Li, J.-Y. Kim, I. Zitouni. Toward Predicting the Outcome of an A/B Experiment for Search Relevance. In WSDM, 2015. (paper)
- L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297--306, 2011. (paper)
- B. Carterette, P. Chandar. Offline Comparative Evaluation with Incremental, Minimally-Invasive Online Feedback. In SIGIR, 2018. (paper)
- L. Yang, Y. Cui, Y. Xuan, C. Wang, S. Belongie, D. Estrin. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. In RecSys, 2018. (paper)
- N. Hassanpour, R. Greiner. A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits. In Canadian Conference on Artificial Intelligence, 2018. (paper)
- F. Johansson, U. Shalit, D. Sontag. Learning representations for counterfactual inference. In ICML, 2016. (paper)
Batch Learning from Controlled Interventions
- A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. In KDD, pages 129--138, 2009. (paper)
- S. Athey and G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. PNAS, 112(27):7353-7360, 2015. (paper)
- A. Swaminathan and T. Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In ICML, 2015. (paper)
- A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper)
- A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213--3221, 2015. (paper)
- A. Swaminathan, A. Krishnamurthy, A. Agarwal, M. Dudik, and J. Langford. Off-policy evaluation and optimization for slate recommendation. NIPS, 2017. (paper)
- L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207--3260, 2013. (paper)
- C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In NIPS, pages 442--450, 2010. (paper)
- L. Li, S. Chen, J. Kleban, and A. Gupta. Counterfactual estimation and optimization of click metrics in search engines: A case study. In WWW Companion, pages 929--934, 2015. (paper)
- J. Mary, P. Preux, and O. Nicol. Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques. In ICML, pages 172--180, 2014. (paper)
- J. Schulman, S. Levine, P. Moritz, M. Jordan, P. Abbeel. Trust Region Policy Optimization. In ICML, 2015. (paper)
- B. London and T. Sandler. Bayesian Counterfactual Risk Minimization. ICML Workshop on CausalML, 2018. (paper)
- N. Kallus and A. Zhou. Policy Evaluation and Optimization with Continuous Treatments. In AISTATS, 2018. (paper)
- Stefan Wager and Susan Athey. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association. (paper)
- Susan Athey and Stefan Wager. Efficient Policy Learning. Arxiv, 2017. (paper)
- Yi Su, A. Agarwal, T. Joachims, Learning from Logged Bandit Feedback of Multiple Loggers, ICML Workshop on CausalML, 2018. (paper)
- S. Bonner, F. Vasile. Causal Embeddings for Recommendation. Arxiv, 2018. (paper)
- T. Joachims, A. Swaminathan, M. de Rijke. Deep Learning with Logged Bandit Feedback. In ICLR, 2018. (paper)
Learning with Observational Feedback
- T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims. Recommendations as treatments: Debiasing learning and evaluation. In ICML, 2016. (paper)
- D. Liang, L. Charlin, D. Blei. Causal Inference for Recommendation. In UAI Workshop, 2016. (paper)
- B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In RecSys, pages 5--12, 2009. (paper)
- J. M. Hernandez-Lobato, N. Houlsby, and Z. Ghahramani. Probabilistic matrix factorization with non-random missing data. In ICML, pages 1512--1520, 2014. (paper)
- T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, In WSDM, 2017. (paper)
- D. Liang, L. Charlin, J. McInerney, D. Blei. Modeling User Exposure in Recommendation. In WWW, 2016. (paper)
- N. Kallus. Balanced Policy Evaluation and Learning. Arxiv, 2017. (paper)
- N. Kallus, A. Zhou. Confounding-Robust Policy Improvement. Arxiv, 2018. (paper)
- A. Agarwal, I. Zaitsev, T. Joachims. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models. Arxiv, 2018. (paper)