CS 778: Topics in Machine Learning

Topics in Machine Learning Learning to Predict Structured Objects COM S 778 - FALL 2006 Cornell University Department of Computer Science

Time and Place
	First lecture: August 29th, 2006 Last lecture: November 28th, 2006 Tuesday, 1:25pm - 2:40pm in Upson 5126
Instructor
	Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.
Syllabus
	Over the last decade, much of the research on discriminative learning has focused on problems like classification and regression, where the prediction is a single univariate variable. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise, for example, when a natural language parser needs to predict the correct parse tree for a given sentence, when predicting which ranking to return in response to a query, when predicting the alignment between two proteins, or when one needs to optimize a performance measure like the F1-score. Over the recent years, the first discriminative learning algorithms have been developed for these structured prediction problems, leading to an emerging area of research. The course will review the state of the art, tie existing approaches together, and identify new applications and open questions.
Preparing the Presentation
	Your presentations should be a self-contained summary of the paper, adding background and examples as necessary. Plan to speak for about 20 minutes. Please schedule a meeting with me so that I can give you feedback on your presentation on or before the Friday preceding your presentation.
Prerequisites
	The course assumes basic knowledge of machine learning as covered in either COM S 478 or COM S 578. If you did not take either of these classes, please come talk to me.
Grading
	This is a 3-credit course which can either be taken Pass/Fail or for letter grade (no audit). The course will consist of some faculty lectures providing background and paper presentations by students. Every student has to present a paper, and everybody is expected to have read the assigned paper before the lecture. There will be short quizzes about the paper to motivate careful reading. Pass/Fail will be determined based on the presentation, class participation, and the quizzes. Students taking the class for letter grade will do a class project in addition.
Academic Integrity
	This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.
References
	Background References on Machine Learning Ų Author: Tom Mitchell Title: "Machine Learning" Publisher: McGraw Hill, 1997 Ų Author: Ethem Alpaydin Title: "Introduction to Machine Learning" Publisher: MIT Press, 2004 Ų Authors: Christopher Manning and Hinrich Schutze Title: "Foundations of Statistical NLP" Publisher: MIT Press, 1999 Ų Authors: R. Duda, P. Hart, D. Stork Title: "Pattern Classification" Publisher: Wiley, 2001 Ų Authors: T. Hastie, R. Tishirani, and J. Friedman Title: "The Elements of Statistical Learning" Publisher: Springer, 2001 Ų Author: B. Schoelkopf and A. Smola Title: "Learning with Kernels" Publisher: MIT Press, 2001 Overviews and Tutorials Ų Author: Thomas G. Dietterich Title: Machine Learning for Sequential Data: A Review Journal:: In Structural, Syntactic, and Statistical Pattern Recognition; Lecture Notes in Computer Science, Vol. 2396, T. Caelli (Ed.), pp. 15–30 Publisher: Springer-Verlag, 2002 Ų Author: Ben Taskar Title: Max-Margin Methods for NLP: Estimation, Structure, and Applications Proceedings: Tutorial at the Conference of the Association for Computational Linguistics (ACL05), Ann Arbor, MI, June 2005 Ų Author: Thorsten Joachims Title: Learning to Predict Trees, Sequences, and other Structured Outputs Proceedings: Tutorial at NESCAI, 2006 Ų Author: H. Wallach Title: Conditional Random Fields: An Introduction. Predicting Sequences Ų *Authors: Andrew McCallum, Dayne Freitag, and Fernando Pereira Title: Maximum entropy Markov models for information extraction and segmentation Proceedings: International Conference on Machine Learning (ICML 2000), pages 591-598, Stanford, California, 2000 Ų Authors: Yasemin Altun, Thomas Hofmann, Mark Johnson Title: Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences** Proceedings: Empirical Methods in Natural Language Processing (EMNLP), 2003 Ų Authors: Yasemin Altun, Thomas Hofmann & Mark Johnson Title: Discriminative Learning for Label Sequences via Boosting Proceedings: Advances in Neural Information Processing Systems (NIPS15), 2003 Ų Authors: Jospeh Bockhorst and Mark Craven Title: Markov Networks for Detecting Overlapping Elements in Sequence Data. Proceedings: Advances in Neural Information Processing Systems 17* (NIPS 2004), 2005. Neural Networks Ų **Authors: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner Title: Gradient-based learning applied to document recognition. Journal: IEEE, 86(11):2278{2324} 1998 Ų *Authors: LeCun and Huang Title: Loss Functions for Discriminative Training of Energy-Based Models* Proceedings: AI-Stats, 2005 Structural SVMs and Maximum Margin Markov Networks Ų Authors: Yasemin Altun and Thomas Hofmann Title: Large Margin Methods for Label Sequence Learning. Proceedings: 8th European Conference on Speech Communication and Technology (EuroSpeech), 2003. Ų Authors: Yasemin Altun, Ioannis Tsochantaridis and Thomas Hofmann Title: Hidden Markov Support Vector Machines Proceedings: 20th International Conference on Machine Learning (ICML), 2003 Ų Author: T. Joachims Title: Learning to Align Sequences: A Maximum-Margin Approach Technical Report, 2003 Ų **Author: Ben Taskar, Carlos Guestrin and Daphne Koller Title: Max-Margin Markov Networks. Proceedings: In Advances in Neural Information Processing Systems 16* (NIPS 2003), 2004. Ų Authors: Simon Lacoste-Julien Title: Combining SVM with graphical models for supervised classification: an introduction to Max-Margin Markov Networks CS281A Project Report, UC Berkeley, 2003 Ų Authors: Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun Title: Support Vector Machine Learning for Interdependent & Structured Output Spaces, Proceedings: International Conference on Machine Learning (ICML), 2004. Ų *** Authors: Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning Title: Max-Margin Parsing. Proceedings: EMNLP 2004. Ų Author: Ben Taskar Title: Learning Structured Prediction Models: A Large Margin Approach. Publisher: Stanford University, CA, December 2004. Ų Author: Peter Bartlett, Michael Collins, Ben Taskar, and David McAllester Title: Exponentiated gradient algorithms for large-margin structured classification. Proceedings: NIPS04, 2005. Ų Authors: T. Finley and T. Joachims Title: Supervised Clustering with Support Vector Machines, Proceedings: International Conference on Machine Learning (ICML), 2005. Ų Authors: T. Joachims, T. Galor, and R. Elber Titles: Learning to Align Sequences: A Maximum-Margin Approach, Book: New Algorithms for Macromolecular Simulation, B. Leimkuhler, LNCS Vol. 49, Publisher: Springer, 2005. Ų Authors: I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun Title: Large Margin Methods for Structured and Interdependent Output Variables, Journal: Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005. Ų Author: B. Taskar, S. Lacoste-Julien, and D. Klein, Title: A Discriminative Matching Approach to Word Alignment, Proceedings: Empirical Methods in Natural Language Processing (EMNLP05), Ų Author: B. Taskar, V. Chatalbashev, D. Koller and C. Guestrin. Title: Learning Structured Prediction Models: A Large Margin Approach. Proceedings: International Conference on Machine Learning (ICML05), 2005. Ų *Author: D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng. Title: Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. Proceedings: International Conference on Computer Vision and Pattern Recognition (CVPR05), San Diego, CA, June 2005. Ų Authors: Ryan McDonald, Koby Crammer, and Fernando Pereira Title: Online large-margin training of dependency parsers. Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2005), July 2005. Ų Author: B. Taskar, S. Lacoste-Julien, and M. Jordan, Title: Structured Prediction via the Extragradient Method, Proceedings: Neural Information Processing Systems Conference (NIPS05), Vancouver, British Columbia, 2006. [Longer version] Perceptron and Reranking Ų Authors: Brian Roark, Murat Saraclar, Michael Collins, and Mark Johnson Title: Discriminative language modeling with conditional random fields and the perceptron algorithm. Proceedings: ACL 2004. Ų * Author: Michael Collins Title: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Proceedings: EMNLP 2002. (This paper includes theorems and proofs which apply to the algorithms in the ACL 2002 papers.) Ų Authors: Michael Collins and Terry Koo Title: Discriminative Reranking for Natural Language Parsing. (gzipped version) Journal: Computational Linguistics 31(1):25-69. Ų Authors: Michael Collins and Nigel Duffy Title: New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron Proceedings: ACL 2002. Ų Author: Michael Collins Title: Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron. Proceedings: ACL 2002. Ų Author: Michael Collins Title: Discriminative Reranking for Natural Language Parsing. Proceedings: ICML 2000. Gaussian Process Ų Authors: Yasemin Altun, Thomas Hofmann and Alexander J. Smola Title: Gaussian process classification for segmenting and annotating sequences. Proceedings: International Conference on Machine Learning (ICML 2004), 2004. Conditional Random Fields Ų **Authors: John Lafferty, Andrew McCallum, Fernando Pereira Title: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings: International Conference on Machine Learning* (ICML-2001), 2001. Ų Authors: Fei Sha and Fernando Pereira Title: Shallow Parsing with Conditional Random Fields. Proceedings: Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-03), 2003. Ų Author: Andrew McCallum Title: Efficiently Inducing Features of Conditional Random Fields. Proceedings: Conference in Uncertainty in Articifical Intelligence (UAI-2003), 2003. Ų Authors: David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft Title: Table Extraction Using Conditional Random Fields. Proceedings: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003. Ų Authors: Andrew McCallum and Wei Li Title: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Proceedings: Conference on Natural Language Learning (CoNLL), 2003. Ų Authors: Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton Title: Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Proceedings: Workshop on Syntax, Semantics, Statistics; 16th Annual Conference on Neural Information Processing Systems (NIPS 2003), 2004. Ų Authors: Kevin Murphy, Antonio Torralba and William T.F. Freeman Titles: Using the forest to see the trees: a graphical model relating features, objects and scenes. Proceedings: Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004. Ų Authors: Sanjiv Kumar and Martial Hebert Title: Discriminative Fields for Modeling Spatial Dependencies in Natural Images. Proceedings: Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004. Ų Author: Burr Settles Title: Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets. Proceedings: International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), 2004. Ų Authors: Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum Title: Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. Proceedings: International Conference on Machine Learning (ICML 2004), 2004. Ų Authors: John Lafferty, Xiaojin Zhu and Yan Liu Title: Kernel conditional random fields: representation and clique selection. Proceedings: International Conference on Machine Learning (ICML 2004), 2004. Ų Authors: Xuming He, Richard Zemel, and Miguel Į. Carreira-Perpińįn Title: Multiscale conditional random fields for image labelling. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004. Ų Author: Yasemin Altun, Alex J. Smola, Thomas Hofmann Title: Exponential Families for Conditional Random Fields. Proceedings: Conference on Uncertainty in Artificial Intelligence (UAI-2004), 2004. Ų Authors: Michelle L. Gregory and Yasemin Altun Title: Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech. Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004. Ų Author: Brian Roark, Murat Saraclar, Michael Collins and Mark Johnson Title: Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm. Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004. Ų Author: Trausti T. Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum Title: Interactive Information Extraction with Constrained Conditional Random Fields. Proceedings: National Conference on Artificial Intelligence (AAAI 2004), 2004. Ų Author: Thomas G. Dietterich, Adam Ashenfelter and Yaroslav Bulatov Title: Training Conditional Random Fields via Gradient Tree Boosting. Proceedings: International Conference on Machine Learning (ICML 2004), 2004. Ų **Author: Fuchun Peng and Andrew McCallum Title: Accurate Information Extraction from Research Papers using Conditional Random Fields. Proceedings: Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics* (HLT/NAACL-04), 2004. Ų Author: Ariadna Quattoni, Michael Collins and Trevor Darrel Title: Conditional Random Fields for Object Recognition. Proceedings: Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005. Ų Authors: Yuan Qi, Martin Szummer and Thomas P. Minka Title: Bayesian Conditional Random Fields. Proceedings: International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), 2005. Ų Authors: Aron Culotta, David Kulp and Andrew McCallum Title: Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028. University of Massachusetts, Amherst, 2005. Ų Authors: Yang Wang and Qiang Ji Title: A Dynamic Conditional Random Field Model for Object Segmentation in Image Sequences. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Volume 1, 2005. Ų Author: Ryan McDonald and Fernando Pereira Title: Identifying gene and protein mentions in text using conditional random fields Journal: BMC Bioinformatics, 6(Suppl 1):S6, May 2005. Ų Author: Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan Title: Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns. Proceedings: HLT-EMNLP, 2005. Ų Authors: S.V.N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, Kevin P. Murphy, Title: Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods, Proceedings: ICML, 2006. Post-Processing to Enforce Dependencies Ų *Authors: Punyakanok, V., Roth, D., Yih, W., & Zimak, D. (2004). Title: Semantic role labeling via integer linear programming inference. Proceedings: COLING-2004. Ų Authors: D. Roth and W. Yih Title: Integer Linear Programming Inference for Conditional Random Fields. Proceedings: International Conference on Machine Learning (ICML) (2005) pp. 737--744 Ų Authors: V. Punyakanok, D. Roth, W. Yih, and D. Zimak Title: Learning and Inference over Constrained Output. Proceedings: International Joint Conference on Artificial Intelligence (IJCAI) (2005) pp. 1124--1129 Learning a Search Heuristic Ų Authors: Hal Daume and Daniel Marcu Title: Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction Proceedings: International Conference on Machine Learning (ICML), 2005. Ų Authors: Hal Daume, John Langford, and Daniel Marcu Title: Search-based Structured Prediction Unpublished. Kernel Dependency Estimation* Ų *Authors: J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik Title: "Kernel Dependency Estimation" Proceedings: NIPS 2002. Ų Authors: G. Bakir, J. Weston and B. Schölkopf Title: "Learning to find Pre-Images" Proceedings: NIPS, 2003 Ų Authors: J. Kwok, I. Tsang Title: "The Pre-Image Problem in Kernel Methods" Journal: IEEE Transactions on Neural Networks, Vol. 15, No. 6, 1517 - 1525, 2004. Ų Authors: J. Weston, B. Schoelkopf and O. Bousquet Title: "Joint Kernel Maps" Proceedings: From the 8th International Work-Conference on Artificial Neural Networks, LNCS 3512, 176-191. Ų Authors: Corinna Cortes, Mehryar Mohri and Jason Weston Title: "A General Regression Technique for Learning Transductions" Proceedings: ICML, 2005 Ų Author: G. Bakir Title: "Extensions to Kernel Dependency Estimation" Thesis, 2006. Semi-Supervised Learning and Transduction Ų Authors: John Lafferty, Yan Liu and Xiaojin Zhu. Title: Kernel Conditional Random Fields: Representation, Clique Selection, and Semi-Supervised Learning. Journal: Technical Report CMU-CS-04-115, Carnegie Mellon University, 2004. Ų Author: Terry Koo and Michael Collins Title: Hidden-Variable Models for Discriminative Reranking. Proceedings: EMNLP 2005. Ų Authors: Ulf Brefeld, Tobias Scheffer. Title: Semi-Supervised Learning for Structured Output Variables, Proceedings: ICML, 2006. Ų Authors: Linli Xu, Dana Wilkinson, Finnegan Southey, Dale Schuurmans Title: Discriminative Unsupervised Learning of Structured Predictors Proceedings: ICML, 2006. Reinforcement Learning Ų Authors: Pieter Abbeel and Andrew Y. Ng. Title: Apprenticeship learning via inverse reinforcement learning Proceedings: International Conference on Machine Learning, 2004. Ų Authors: Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich, Title: Maximum Margin Planning, Proceedings: ICML, 2006. Loss Functions* Ų Authors: Lijuan Cai, Thomas Hofmann, Title: Hierarchical Document Categorization with Support Vector Machines, Proceedings: ACM 13th Conference on Information and Knowledge Management, 2004. Ų Author: T. Joachims Title: A Support Vector Method for Multivariate Performance Measures, Proceedings: International Conference on Machine Learning (ICML), 2005. Ranking Ų Authors: William W. Cohen, Rob Schapire, Yoram Singer Title: Learning to Order Things in J. Artif. Intell. Res. (JAIR) 10: 243-270 (1999). (Originally published as: William W. Cohen, Robert E. Schapire, Yoram Singer (1997): Learning to Order Things Proceedings: NIPS 1997. Ų Authors: Ralf Herbrich, Thore Graepel, Klaus Obermayer Title: Support Vector Learning for Ordinal Regression Proceedings: International Conference on Artificial Neural Networks 97--102 Ų Authors: Ralf Herbrich, Thore Graepel, Klaus Obermayer Title: Large Margin Rank Boundaries for Ordinal Regression Book: Advances in Large Margin Classifiers 115--132, 2000. Ų Authors: Koby Crammer and Yoram Singer, Title: PRanking with Ranking Proceedings: Neural Information Processing Systems (NIPS), 2001. Ų Author: T. Joachims Title: Optimizing Search Engines Using Clickthrough Data, Proceedings: ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002. Ų Author: Yoav Freund, Raj D. Iyer, Robert E. Schapire, Yoram Singer: Title: An Efficient Boosting Algorithm for Combining Preferences Journal: Journal of Machine Learning Research 4: 933-969 (2003). Ų *Authors: Marie desJardins, Eric Eaton, Kiri L. Wagstaff, Title: Learning User Preferences for Sets of Objects, Proceedings: ICML, 2006. Alignment and Translation Ų Author: T. Joachims Title: Learning to Align Sequences: A Maximum-Margin Approach Technical Report, 2003 Ų Author: T. Joachims, T. Galor, and R. Elber Title: Learning to Align Sequences: A Maximum-Margin Approach Book: New Algorithms for Macromolecular Simulation, B. Leimkuhler, LNCS Vol. 49, Springer, 2005. Ų Authors: Luke S. Zettlemoyer and Michael Collins Title: Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. Proceedings: UAI 2005. Ų Authors: Brooke Cowan, Ivona Kucerova, and Michael Collins Title: A Discriminative Model for Tree-to-Tree Translation. Proceedings: EMNLP 2006. Ų Author: P. Liang, Alexandre Bouchard-Cote, D. Klein and B. Taskar Title: An End-to-End Discriminative Approach to Machine Translation, Proceedings: Association for Computational Linguistics (ACL06), Sydney, Australia, July 2006. Ų Author: P. Liang, B. Taskar, and D. Klein. Title: Alignment by Agreement, Proceedings: Human Language Technology conference and North American chapter of the Association for Computational Linguistics (HLT-NAACL06), New York, June 2006. Ų Authors: Andrew McCallum, Kedar Bellare, and Fernando Pereira Title: A conditional random field for discriminatively-trained finite-state string edit distance. Proceedings: Conference on Uncertainty in Artificial Intelligence (UAI 2005), July 2005. Ų *Authors: Chuong B. Do, Samuel S. Gross, and Serafim Batzoglou Title: CONTRAlign: Discriminative Training for Protein Sequence Alignment, Proceedings: RECOMB, 2006. Ų Authors: Chun-Nam Yu, T. Joachims and R. Elber Title: Training Protein Threading Models Using Structural SVMs, Proceedings: ICML Workshop on Learning in Structured Output Spaces, 2006.