Advanced Topics in Machine Learning
CS678 - Spring 2002
Cornell University
Department of Computer Science
|
Time and Place |
|
First lecture: January 21st, 2002
Last lecture: May 3rd, 2002
- Monday, 1:25pm - 2:15pm in Hollister Hall 110
- Wednesday, 1:25pm - 2:15pm in Hollister Hall 110
- Friday, 1:25pm - 2:15pm in Hollister Hall 110
|
Instructors |
|
|
Lecture Notes, Slides, and Handouts |
|
Lecture notes and slides are handed out in class.
Papers for Student Presentations:
- SVM Clustering:
- B. Schölkopf, J. Platt, J. Shawe-Taylor, A. J.
Smola, and R. C. Williamson. Estimating
the support of a high-dimensional distribution. Technical
Report 99-87, Microsoft Research, 1999. To appear in Neural
Computation, 2001.
- Ben-Hur et al., Support Vector
Clustering. JMLR, 2, 2001.
(1-2 students, 20 minutes, April 22/24)
- SVM Regression:
- A. J. Smola and B. Schölkopf.
A tutorial on support vector regression. NeuroCOLT Technical
Report NC-TR-98-030, Royal Holloway College, University of London,
UK, 1998. To appear in Statistics and Computing, 2001. (pages
1-14 only)
(1 student, 20 minutes, Feb. 27 or March 1)
- Kernel Principal Component Analysis:
Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller, Kernel
Principal Component Analysis, in: B. Schölkopf, C. Burges,
and A. Smola, editors, Advances in Kernel Methods:
Support Vector Learning. MIT Press, Cambridge, MA, 1999, pp. 327-352. Short
version or chapter in Support
Vector Learning for background.
(1 student, 20 minutes, April 12/15/17)
- Multi-Class SVMs:
- John Platt, Large-Margin
DAGs for Multi-Class Classification, NIPS 2000.
(1 student, 20 minutes, Feb. 27 or March 1)
- Learning Rankings:
- William W. Cohen, Robert E. Schapire, Yoram Singer, Learning
to order things, Journal of Artificial Intelligence Research,
10, 1999.
(1 student, 20 minutes, April 1/3)
- Boosting/Bagging:
- Leo Breiman, Arcing
Classifiers, Machine Learning, 1998.
- Eric Bauer, Roni Kohavi, An Empirical
Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, Machine Learning, 1999.
(2 students, 30 minutes, Feb. 22/25/27)
- Learning to Learn:
- Sebastian Thrun and Joseph O'Sullivan, Discovering
Structure in Multiple Learning Tasks: The TC Algorithm, ICML-96.
(1 student, 20 minutes, March 6/8)
- Clustering:
- P.S. Bradley, Usama Fayyad, and Cory Reina, Scaling
Clustering Algorithms to Large Databases, AAAI-98.
- P.S. Bradley and Usama Fayyad, Refining
Initial Points for K-Means Clustering, ICML-98.
(1 student, 30 minutes, April 29 or May 1)
- Graphical Models:
-
(1 student, 20 minutes, March 15/25/27)
- ROC and Related Methods:
- Foster Provost and Tom Fawcett, Analysis
and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, KDD-97 (ROC Convex Hull Method)
- Chris Drummond and Robert C. Holte, Explicitly
Representing Expected Cost: An Alternative to ROC Representation, KDD-2000.
(1 student, 25 minutes, April 19/22/24)
|
Syllabus |
|
This 4-credit course extends and complements
CS478 and CS578, giving in-depth coverage of new and advanced methods in
machine learning. In particular, we will connect to open research
questions in machine learning, giving starting points for future
work. The course will balance learning theory and practical machine
learning equally, with an emphasis on approaches of practical relevance. The
course will cover the following main topics:
- Support Vector Machines and Kernel-based Methods:
VC theory, optimal hyperplane and maximum-margin separation,
soft-margin, SVMs for regression, Mercer kernels, error bounds,
leave-one-out bounds, quadratic programming, connections to
related methods (8 lectures)
- Unsupervised Learning and Clustering:
agglomerative clustering, distributional clustering, k-means,
Bayesian clustering, principal component analysis, scaling issues
for large datasets (6 lectures)
- Bayes Nets: inference, maximum likelihood
estimation, latent variables, expectation-maximization, hidden
Markov models, learning structure, causality (5 lectures)
- Boosting and Bagging: AdaBoost,
bias/variance, margins (5 lectures)
- Error Estimation and Model Selection: no
free lunch, bias/variance, Bayesian learning, minimum description
length, leave-one-out and cross-validation, holdout testing,
bootstrap estimation (3 lectures)
- Learning to Order Data: learning
retrieval functions in information retrieval, learning for ROC
analysis (3 lectures)
- Inductive Transfer: Learning multiple
related tasks (3 lectures)
- Reinforcement Learning: Markov decision
processes, finite state models, Q-learning, dynamic programming (1-2
lectures)
We will
illustrate methods and theory with practical examples in the areas of
information retrieval, language technology, and medical decision making.
|
Reference Material |
|
We will provide reading material and hand
it out in class. It will cover all material presented in this
course. For further reading, we recommend the following books, each of
which covers part of the syllabus:
- Duda, Hart, Stork, "Pattern
Classification"
- Devroye, Györfi, Lugosi, "A Probabilistic
Theory of Pattern Recognition"
- Shawe-Taylor, Cristianini, "Introduction to
Support Vector Machines"
- Hastie, Tibshirani, Friedman, "The Elements of
Statistical Learning"
- Vapnik, "Statistical Learning Theory"
- Sutton, Barto, "Reinforcement Learning"
- Mitchell, "Machine Learning"
|
Prerequisites |
|
Any of the following:
- CS478
- CS578
- equivalent of any of the above
- permission from the instructors
|
Grading |
|
Grades will be determined based on a
take-home midterm exam, a final exam, homework assignments, a research
project, and student presentations of selected papers.
- 20%: Homework (at most 4 assignments, some programming,
some non-programming)
- 20%: Midterm Exam (take home)
- 20%: Final Exam (in class)
- 20%: Student Paper Presentations
- 20%: Final Projects
Roughly: A=90-100; B=80-90; C=70-80; D=60-70; F=below 60
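As an illustration of how the weights above combine, the following is a minimal sketch, not an official grading tool. It assumes each component is scored on a 0-100 scale, and it assumes that a score on a boundary (for example exactly 90) receives the higher letter, since the rough ranges above overlap. The names WEIGHTS, CUTOFFS, course_score, and letter_grade are hypothetical.

```python
# Weights taken from the grading breakdown above (five components, 20% each).
WEIGHTS = {
    "homework": 0.20,
    "midterm": 0.20,
    "final_exam": 0.20,
    "presentation": 0.20,
    "project": 0.20,
}

# Rough cutoffs from the scale above. ASSUMPTION: a boundary score such as
# exactly 90 gets the higher letter, because the stated ranges overlap.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def course_score(scores):
    """Weighted average of the component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 course score to an approximate letter grade."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Made-up component scores, for illustration only.
example = {
    "homework": 85,
    "midterm": 90,
    "final_exam": 80,
    "presentation": 95,
    "project": 90,
}
total = course_score(example)
print(round(total, 1), letter_grade(total))
```

Because all five weights are equal, the course score here is simply the mean of the five component scores.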
|