CS 678 Spring 2006
COURSE: COM-S 678 (096-006) Spring 2006
TITLE: Advanced Topics in Machine Learning
INSTRUCTOR: Rich Caruana (caruana@cs.cornell.edu)
SCHEDULE: MWF 1:25-2:15 Upson 109
Announcements:
Final Project Reports are due MONDAY, MAY 15.
The schedule below has been updated using the information from the sign-up sheet in Monday's class. Please let me know ASAP if there is a problem.
Course Description:
This graduate-level course is aimed at students who already have a solid
foundation in machine learning and want to delve deeper into topics
currently under study in the research community. The course will
focus on three areas of active research in machine learning: Ensemble
Learning, Inductive Transfer, and Semi-Supervised Learning and
Clustering.
ENSEMBLE LEARNING is a large collection of methods that yield improved generalization by training more than one model on each problem (e.g., training 100 SVMs instead of just one) and then combining the predictions these models make by averaging, voting, or other methods. Ensemble learning methods include bagging, boosting, Bayesian averaging, random forests, error-correcting output codes, probing, and ensemble selection. These methods differ in how they train the different models, how they combine model predictions, and in what kinds of guarantees they provide.
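For concreteness, here is a minimal sketch of the bagging-style idea above: train many models on bootstrap samples of the training data and average their predictions. It assumes scikit-learn-style estimators; the classifiers, dataset, and ensemble size are illustrative choices, not part of the course materials.

    # Minimal bagging sketch: train several models on bootstrap samples,
    # then combine their predictions by averaging (soft voting).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    rng = np.random.RandomState(0)
    models = []
    for _ in range(25):                              # 25 models instead of just one
        idx = rng.randint(0, len(X), size=len(X))    # bootstrap sample (with replacement)
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

    # Average the predicted class probabilities across the ensemble,
    # then take the most probable class.
    avg_prob = np.mean([m.predict_proba(X) for m in models], axis=0)
    ensemble_pred = avg_prob.argmax(axis=1)
    print("training accuracy of the averaged ensemble:", (ensemble_pred == y).mean())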
INDUCTIVE TRANSFER (a.k.a. multitask learning, lifelong learning, learning-to-learn, representation learning) is a class of learning algorithms that yield improved generalization by learning groups of related problems in series or in parallel. Generalization performance improves when what is learned on some of the problems is transferred to the others. Most of the research in inductive transfer focuses on how to transfer learned structure between related problems and on how to characterize the learning problems for which inductive transfer is most beneficial.
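As a rough illustration of the shared-representation idea (multitask learning with a shared hidden layer), the sketch below trains one net on three related synthetic tasks at once and, for comparison, a separate net per task. The tasks, network sizes, and library calls are illustrative assumptions, not material from the course.

    # Minimal multitask-learning sketch: one net with a shared hidden layer is
    # trained on several related tasks at once, so hidden features learned for
    # one task can transfer to the others.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 10)
    shared = X[:, :3].sum(axis=1)                  # structure common to all tasks
    Y = np.column_stack([shared + 0.5 * X[:, 3],   # task 1
                         shared - 0.5 * X[:, 4],   # task 2
                         shared + 0.5 * X[:, 5]])  # task 3
    Y += 0.1 * rng.randn(*Y.shape)

    # Single-task learning: a separate net per task.
    stl = [MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                        random_state=0).fit(X[:500], Y[:500, t]) for t in range(3)]

    # Multitask learning: one net, shared hidden layer, three outputs.
    mtl = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                       random_state=0).fit(X[:500], Y[:500])

    stl_err = np.mean([(stl[t].predict(X[500:]) - Y[500:, t]) ** 2 for t in range(3)])
    mtl_err = np.mean((mtl.predict(X[500:]) - Y[500:]) ** 2)
    print("single-task MSE:", stl_err, " multitask MSE:", mtl_err)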
SEMI-SUPERVISED LEARNING (a.k.a. clustering with side information, meta-clustering, transduction, ...) is a collection of methods where partial supervision, in the form of labels or constraints, is provided for some training cases but is missing for most. In these problems learning often consists of clustering the data, using the side information to guide the algorithm towards clusterings that are consistent with it. Research in semi-supervised learning is important because many real datasets have only partial labels or side information, but it is somewhat handicapped by the difficulty of objectively evaluating cluster quality.
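A minimal sketch of one such method, seeded k-means in the spirit of the Basu, Banerjee & Mooney paper on the reading list: the few labeled cases initialize the cluster centers, and ordinary k-means iterations then cluster the unlabeled bulk of the data. The synthetic data and seed-selection scheme here are illustrative assumptions.

    # Seeded semi-supervised clustering sketch: labeled points seed the centers,
    # standard k-means iterations do the rest.
    import numpy as np

    def seeded_kmeans(X, seed_X, seed_y, n_iter=20):
        k = int(seed_y.max()) + 1                  # clusters = distinct seed labels (0..k-1)
        # Initialize each center from the labeled (seed) points of that class.
        centers = np.array([seed_X[seed_y == c].mean(axis=0) for c in range(k)])
        for _ in range(n_iter):
            # Assign every point to its nearest center, then recompute the centers.
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        return labels

    rng = np.random.RandomState(0)
    true = rng.randint(0, 3, size=300)
    angles = 2 * np.pi * true / 3
    X = rng.randn(300, 2) + 3 * np.column_stack([np.cos(angles), np.sin(angles)])
    seeds = np.concatenate([np.where(true == c)[0][:5] for c in range(3)])  # 15 labeled cases
    labels = seeded_kmeans(X, X[seeds], true[seeds])
    print("agreement with the true clusters:", (labels == true).mean())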
Because most ensemble, inductive transfer, and semi-supervised learning methods are too new to be included in textbooks, much of the course will focus on reading recent research papers once a general introduction to each area has been given. Grading will be based on class participation and a significant final project. Possible final projects include applying one of the methods in a novel way to a novel problem, comparing competing methods on a set of test problems, implementing/testing a potential improvement to a method, or developing/testing a new method. Successful projects may yield publishable research.
Prerequisites: grade of B+ or higher in 478 or 578, or equivalent with permission of the instructor.
If you want to take the course, please email me or just show up for the first class next Monday at 1:25 in Upson 109.
-Rich.
Jan 23: Administrivia and Introduction
Jan 25: Ensemble Selection
Jan 27: no class
Jan 30: Ensemble Selection
Feb 01: Ensemble Selection
Feb 03: Bias/Variance Decomposition
Feb 06: Bias/Variance, Bagging
Feb 08: Bagging and Boosting
Feb 10: Boosting
Feb 13: Multitask Learning
Feb 15: Multitask Learning
Feb 17: Multitask Learning
Feb 20: Bauer & Kohavi: Empirical Comparison of Ensemble Methods (Karan
Singh & Nick Hamatake)
Feb 22: Random Forests (Robert Young &
Feb 24: Meta Clustering and Semi-Supervised Learning
Feb 27: Kuncheva & Whitaker: Measures of Diversity (TJ & Justin Wick)
Mar 01: Domingos: Bayesian Averaging (Daria Sorokina & Yunpeng)
Mar 03: Valentini & Dietterich: Bias-Variance Analysis of Support Vector Machines for the Development
of SVM-Based Ensemble Methods (David Michael & Tim)
Mar 06: Stacking with MDTs (Yunsong Guo & Nam Nguyen)
Mar 08: O'Sullivan, Langford, Caruana: Feature Boosting (Caruana) & Ensemble
Selection Code and Models
Mar 10: Collins & Schapire: Logistic Regression, AdaBoost and Bregman
Distances (James Lenfestey & Yisong Yue)
Mar 13:
J Wu, JM Rehg, MD Mullin: Rare Event Detection Cascade by Direct Feature Selection
(Soo Yeon Lee & Michael Schmidt)
Mar 15: Ensemble Selection Tutorial
Mar 17:
TG Dietterich, G Bakiri: Solving Multiclass Learning Problems via Error-Correcting Output Codes
(Art Munson & Alex Niculescu)
Mar 18-26: Spring Break
Mar 27: TG Dietterich: Ensemble Methods in Machine Learning - Multiple Classifier Systems, 2000
(Muthiah Chettiar)
Mar 29: Ando & Zhang: "A Framework for Learning
Predictive Structures from Multiple Tasks and Unlabeled Data"
http://www-cs-students.stanford.edu/~tzhang/papers/jmlr05_semisup.pdf
(TJ & Justin Wick)
Mar 31: Thrun. Is learning the nth thing any easier than learning the
first? In Advances in NIPS, 640-646, 1996. (Dave Michael & Robert Jung)
Apr 03: paper review
Apr 05: Combining Labeled and Unlabeled Data with Co-Training. (Muthiah &
Yunpeng)
Apr 07: Selective Transfer of Task Knowledge Using Stochastic Noise, Silver
and McCracken, 2003, Advances in AI (Nam Nguyen & Yunsong) http://www.springerlink.com/link.asp?id=eb5jrhahbx3uq32g
Apr 10: Bakker & Heskes: Task Clustering and Gating for Bayesian Multitask
Learning. (Art Munson & Daria Sorokina) http://www.ai.mit.edu/projects/jmlr/papers/volume4/bakker03a/bakker03a.pdf
Apr 12: Rosenstein, Marx, Kaelbling, Dietterich: To Transfer or Not to Transfer
(Karan Singh & Nick Hamatake)
http://iitrl.acadiau.ca/itws05/Papers/ITWS10-RosensteinM05_ITWS.pdf
Apr 14: Ben-David, S., and Schuller, R., "Exploiting Task Relatedness for
Multitask Learning," Proceedings of COLT 2003. (Alex Niculescu & Tim Harbers)
Apr 17: Kai Yu & Volker Tresp: Learning to Learn and Collaborative Filtering
(Aaron & Yisong) http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/yu+tresp-2005.pdf
Apr 19:
Apr 21: Alex Niculescu: Multitask Bayes Net Structure Learning (Alex)
Apr 24: Caruana, Elhawary, Nguyen, Smith: Meta Clustering (Nguyen)
Apr 26: Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schroedl: Constrained
K-means Clustering with Background Knowledge (Michael Schmidt, Robert Jung,
TJ)
Apr 28: Eric P. Xing, Andrew Y. Ng, Michael I. Jordan and Stuart Russell:
Distance Metric Learning with Application to Clustering with
Side-Information in NIPS 03 (Yunsong, Yunpeng, Yisong)
May 01: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans: Maximum Margin
Clustering (NIPS 2004) http://books.nips.cc/papers/files/nips17/NIPS2004_0834.pdf
(Tim Harbers, Aaron, Soo Yeon Lee)
May 03: S Basu, A Banerjee, RJ Mooney: Semi-supervised Clustering by Seeding http://www.cs.utexas.edu/users/ml/papers/semi-icml-02.ps.gz
(Karan, Nick, Dave Michael)
May 05: Learning from Labeled and Unlabeled Data using Graph Mincuts www.cs.cmu.edu/~shuchi/papers/mincut.ps
(David Siegel, Muthiah, Daria)
Ensemble Selection Slides (ensemble.selection.slides)
Bias/Variance, Bagging, Boosting Slides (bias.variance.bagging.boosting.slides)
Performance Measures (background info for those not familiar with ROC, Precision/Recall, ...) (performance.measure.slides)
Inductive Transfer and Multitask Learning Slides (mtl.slides)
Meta Clustering and Semi-Supervised Learning (semisup.slides)
Empirical Comparison by Bauer & Kohavi (bauer.slides)
Ensemble Selection Compact 5k Libraries (ES.small.libs.tar).
Ensemble selection code (shotgun.dist.tar.gz)
X Zhu: Semi-Supervised Learning Literature Survey
http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
G Fung, OL Mangasarian: Semi-supervised support vector machines for unlabeled data classification, Optimization Methods and Software, 2001
Q Zhao, DJ Miller: Semisupervised Learning of Mixture Models with Class Constraints
Alexander Strehl, Joydeep Ghosh: Cluster Ensembles - A Knowledge Reuse Framework for Combining Partitionings
Xiaojin Zhu, Zoubin Ghahramani, John Lafferty: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
Transductive Learning via Spectral Graph Partitioning
http://www.cs.cornell.edu/People/tj/publications/joachims_03a.pdf
Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot
http://research.microsoft.com/~chelba/publications/camera_ready.EMNLP04.pdf
D Zhou, O Bousquet, TN Lal, J Weston, B Scholkopf: Learning with local and global consistency, NIPS*2004
RK Ando, T Zhang: A High-Performance Semi-Supervised Learning Method for Text Chunking
A Blum, J Lafferty, MR Rwebangira, R Reddy: Semi-Supervised Learning Using Randomized Mincuts, ICML*2004
Kristen P. Bennett: Semi-Supervised Support Vector Machines
http://www1.cs.columbia.edu/~dplewis/candidacy/bennett98semisupervised.pdf
Jing Gao: Semi-Supervised Clustering with Partial Background Information
http://www.cse.msu.edu/~chenghai/paper/Semi-supervised%20Clustering%20with%20Partial%20Background%20Information.pdf
X. Z. Fern, C. E. Brodley: Solving Cluster Ensemble Problems by Bipartite Graph Partitioning (ICML 2004)
Lafferty and Zhu: Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning
Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney: A Probabilistic Framework for Semi-Supervised Clustering (Best Research Paper Award, KDD-2004)
Integrating constraints and metric learning in semi-supervised clustering
T. Finley and T. Joachims: Supervised Clustering with Support Vector Machines, ICML05 (distinguished student paper award)
Neil D. Lawrence, John C. Platt: "Learning to Learn with the Informative Vector Machine," International Conference on Machine Learning, Paper 65, 2004. http://delivery.acm.org/10.1145/1020000/1015382/p178-lawrence.pdf?key1=1015382&key2=8406218311&coll=GUIDE&dl=GUIDE&CFID=63282663&CFTOKEN=21052308
Joshua Tenenbaum, Thomas Griffiths. (2001) "Structure learning in human causal induction." Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.
Composition of Conditional Random fields for transfer learning http://www.cs.umass.edu/~mccallum/papers/transfer-emnlp05.pdf
Semiparametric Latent Factor Models http://www.cs.berkeley.edu/~jordan/papers/teh-seeger-jordan04.pdf
Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement Schmidhuber, Zhao, and Wiering, 1997, Journal of Machine Learning http://www.springerlink.com/link.asp?id=l550222682001578
Jonathan Baxter. "A model of inductive bias learning". JAIR, 12, 149-198, 2000. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume12/baxter00a.pdf
Learning to Learn and Collaborative Filtering Kai Yu & Volker Tresp http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/yu+tresp-2005.pdf
Inductive Transfer using Kernel Multitask Latent Analysis Z. Xiang and K. P. Bennett http://iitrl.acadiau.ca/itws05/Papers/ITWS17-XiangBennett_REV.pdf
Learning Multiple Tasks with Kernel Methods Theodoros Evgeniou, Charles A. Micchelli and Massimiliano Pontil http://delivery.acm.org/10.1145/1090000/1088693/6-615-evgeniou.pdf?key1=1088693&key2=6965218311&coll=GUIDE&dl=GUIDE&CFID=63282663&CFTOKEN=21052308
Pengcheng Wu, Thomas Dietterich: Improving SVM Accuracy by Training on Auxiliary Data Sources (ICML 2004) http://www.aicml.cs.ualberta.ca/_banff04/icml/pages/papers/355.ps
"Learning Gaussian Processes from Multiple Tasks" http://www.machinelearning.org/proceedings/icml2005/papers/128_GaussianProcesses_YuEtAl.pdf
"Regularized multi-task learning." http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/evgeniou+pontil-2004.pdf
Christophe Giraud-Carrier, Ricardo Vilalta, Pavel Brazdil. (2004) "Introduction to the Special Issue on Meta-Learning". Mach. Learn. 54, 3 (Mar. 2004), 187-193.
"Transfer in Variable-Reward Hierarchical Reinforcement Learning" Neville Mehta, Sriraam Natarajan, Prasad Tadepalli, Alan Fern http://iitrl.acadiau.ca/itws05/Papers/ITWS14-Mehta-vrhrl_REF._pdf
Transfer Learning of Object Classes: From Cartoons to Photographs Geremy Heitz, Gal Elidan, Daphne Koller http://iitrl.acadiau.ca/itws05/Papers/ITWS06-Hietz_REV.pdf
Benefitting from the variables that variable selection discards Rich Caruana, Virginia R. de Sa https://portal.acm.org/poplogin.cfm?dl=ACM&coll=GUIDE&comp_id=944972&want_href=delivery%2Ecfm%3Fid%3D944972%26type%3Dpdf&CFID=63303065&CFTOKEN=46354793&td=1138150069390
Leo Breiman. "Bagging Predictors," Machine Learning, 24, 123-140, 1996.
Rich Caruana, Alex Niculescu, Geoff Crew, and Alex Ksikes, "Ensemble
Selection from Libraries of Models," The International Conference on
Machine Learning (ICML'04), 2004.
Eric Bauer and Ron Kohavi
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting,
and Variants
citeseer.ist.psu.edu/bauer99empirical.html
Random Forests by Leo Breiman, Machine Learning Vol 45 No.1 Oct 2001.
Ensemble Methods in Machine Learning TG Dietterich - Multiple
Classifier Systems, 2000
Robert E. Schapire. "The Boosting Approach to Machine Learning: An
Overview," MSRI Workshop on Nonlinear Estimation and Classification.
Friedman and Popescu, "Predictive Learning via Rule Ensembles."
<http://www-stat.stanford.edu/~jhf/ftp/RuleFit.pdf> (Feb. 2005)
O'Sullivan, Langford, Caruana, and Blum. "Feature Boosting" ICML2000.
A comparison of stacking with MDTs to bagging, boosting, and other stacking
methods. http://ai.ijs.si/bernard/mdts/pub03.pdf
In this paper, we present an integration of the algorithm MLC4.5 for
learning meta decision trees (MDTs) into the Weka data mining
suite. MDTs are a method for combining multiple classifiers. Instead
of giving a prediction, MDT leaves specify which classifier should be
used to obtain a prediction. The algorithm is based on the C4.5
algorithm for learning ordinary decision trees. An extensive
performance evaluation of stacking with MDTs on twenty-one data sets
has been performed. We combine base-level classifiers generated by
three learning algorithms: an algorithm for learning decision trees, a
nearest neighbor algorithm and a naive Bayes algorithm. We compare
MDTs to bagged and boosted decision trees, and to combined classifiers
with voting and three different stacking methods: with ordinary
decision trees, with naive Bayes algorithm and with multi-response
linear regression as a meta-level classifier. In terms of performance,
stacking with MDTs gives better results than other methods except when
compared to stacking with multi-response linear regression as a
meta-level classifier; the latter is slightly better than MDTs.
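For contrast with the MDT variant described in that abstract, here is a minimal sketch of ordinary stacking: base-level classifiers (a decision tree, nearest neighbor, and naive Bayes, mirroring the abstract) produce cross-validated class probabilities, and a meta-level classifier is trained on those probabilities. This is plain stacking with logistic regression at the meta level, not the MDT method itself; the classifiers and data are illustrative choices.

    # Ordinary stacking sketch: meta-level classifier trained on the
    # out-of-fold predicted probabilities of the base-level classifiers.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    base = [DecisionTreeClassifier(random_state=0),   # base-level learners:
            KNeighborsClassifier(),                   # decision tree, kNN,
            GaussianNB()]                             # and naive Bayes

    # Meta-level features: out-of-fold predicted probabilities from each base model.
    meta_X = np.hstack([cross_val_predict(m, X, y, cv=5, method="predict_proba")
                        for m in base])
    meta = LogisticRegression(max_iter=1000).fit(meta_X, y)   # meta-level classifier

    for m in base:                                    # refit base models on all data
        m.fit(X, y)
    print("stacked training accuracy:",
          (meta.predict(np.hstack([m.predict_proba(X) for m in base])) == y).mean())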
Ludmila I. Kuncheva, Christopher J. Whitaker: Measures of Diversity in
Classifier Ensembles and Their Relationship with the Ensemble Accuracy
(in Machine Learning Vol 51/2 (2003))
http://www.springerlink.com/link.asp?id=q16465142221t2qv
Solving Multiclass Learning Problems via Error-Correcting Output Codes
TG Dietterich, G Bakiri - Arxiv preprint cs.AI/9501101, 1995
http://arxiv.org/PS_cache/cs/pdf/9501/9501101.pdf
"Bayesian Averaging of Classifers and the Overfitting Problem"
http://www.cs.washington.edu/homes/pedrod/papers/mlc00b.pdf
"Logistic Regression, AdaBoost and Bregman Distances"
http://people.csail.mit.edu/mcollins/papers/colt2000.ps
Learning a Rare Event Detection Cascade by Direct Feature Selection
(looks like an interesting modification of Viola/Jones technique)
J Wu, JM Rehg, MD Mullin - Advances in Neural Information Processing
Systems, 2004
http://www.cc.gatech.edu/~wujx/paper/nips16_bw.pdf
Bias-Variance Analysis of Support Vector Machines for the Development
of SVM-Based Ensemble Methods
<http://portal.acm.org/citation.cfm?id=3D1016783&coll=3DPortal&dl=3D=GUIDE&CFID=3D63305426&CFTOKEN=3D19334353>by
Georgio Valentini, Thomas
Learning Ensembles from Bites: A Scalable and Accurate Approach Nitesh
V. Chawla, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer
http://delivery.acm.org.proxy.library.cornell.edu:2048/10.1145/1010000/1005347/p421-chawla.pdf?key1=1005347&key2=2547418311&coll=ACM&dl=ACM&CFID=66595959&CFTOKEN=9180342
Theoretical Views of boosting and applications
http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/Schapire99d.ps