Term | Spring 2019 | Instructor | Christopher De Sa |
Course website | www.cs.cornell.edu/courses/cs4787/2019sp/ | Email | [email hidden] |
Schedule | MW 7:30-8:45PM | Office hours | Wednesdays 2PM |
Room | Hollister Hall B14 | Office | Bill and Melinda Gates Hall 450 |
[Piazza] [CMS] [Gradescope]
Description: CS4787 will explore the principles behind scalable machine learning systems. The course will cover the algorithmic and implementation principles that power the current generation of machine learning on big data. We will cover training and inference for both traditional ML algorithms, such as linear and logistic regression, and deep models. Topics will include: estimating statistics of data quickly with subsampling, stochastic gradient descent and other scalable optimization methods, mini-batch training, accelerated methods, adaptive learning rates, methods for scalable deep learning, hyperparameter optimization, parallel and distributed training, and quantization and model compression.
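To give a flavor of the algorithms listed above, here is a minimal sketch of mini-batch stochastic gradient descent in NumPy. This is illustrative only and not course-provided code; the logistic-regression gradient, hyperparameters, and synthetic data are assumptions made for the example.

```python
import numpy as np

def minibatch_sgd(grad, w0, X, y, alpha=0.1, batch_size=32, num_epochs=10, seed=0):
    """Mini-batch SGD: each step uses a small random sample of the training
    set to form an unbiased estimate of the empirical-risk gradient."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = X.shape[0]
    for _ in range(num_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            w -= alpha * grad(w, X[batch], y[batch])  # step along the stochastic gradient
    return w

def logistic_grad(w, Xb, yb):
    # Gradient of the average logistic loss over a mini-batch; labels are in {-1, +1}.
    sigma = 1.0 / (1.0 + np.exp(yb * (Xb @ w)))
    return -(Xb * (yb * sigma)[:, None]).mean(axis=0)

if __name__ == "__main__":
    # Synthetic linearly separable data, purely for illustration.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    w_true = rng.standard_normal(5)
    y = np.sign(X @ w_true)
    w_hat = minibatch_sgd(logistic_grad, np.zeros(5), X, y, alpha=0.5, num_epochs=20)
    cosine = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
    print(f"cosine similarity to true weights: {cosine:.3f}")
```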
Prerequisites: CS4780 or equivalent, and CS2110 or equivalent.
Format: Lectures during the scheduled lecture period will cover the course content. Problem sets will build familiarity with the content and develop competence with the more mathematical aspects of the course. Programming assignments will help build intuition and familiarity with how machine learning algorithms run. There will be one midterm exam and one final exam.
Material: The course is based on books, papers, and other texts in machine learning, scalable optimization, and systems. Texts will be provided ahead of time on the website on a per-lecture basis. You are not necessarily expected to read the texts, but they will provide useful background for the material we discuss.
Grading: Students will be evaluated on the following basis.
15% | Problem sets |
40% | Programming assignments |
15% | Midterm Exam |
30% | Final Exam |
Resources: Download the course VM: https://cornell.app.box.com/s/r32b1mnw4sl4k5kdk64ctp9phhpqnqqh
The course calendar is subject to change.
Wednesday, January 23 | Lecture 1. Introduction and course overview. [Notes] |
Monday, January 28 |
Lecture 2. Estimating large sums with samples, e.g. the empirical risk. Concentration inequalities. [Notes] [Demo]
Background reading material:
|
Wednesday, January 30 |
Lecture 3. Exponential concentration inequalities and empirical risk minimization. [Notes]
Background reading material:
|
Monday, February 4 |
Lecture 4. Learning with gradient descent, convex optimization and conditioning. [Notes]
Background reading material:
|
Wednesday, February 6 |
Lecture 5. Stochastic gradient descent. [Notes] [Demo Jupyter] [Demo HTML]
Background reading material:
|
Monday, February 11 |
Lecture 6. Minibatching and the effect of the learning rate. Our first hyperparameters. [Notes] [Demo Jupyter] [Demo HTML]
Background reading material:
|
Wednesday, February 13 |
Lecture 7. Accelerating SGD with momentum. [Notes] [Demo Jupyter] [Demo HTML]
Background reading material:
|
Monday, February 18 |
Lecture 8. Accelerating SGD with preconditioning and adaptive learning rates. [Notes]
Background reading material:
|
Wednesday, February 20 |
Lecture 9. Accelerating SGD with variance reduction and averaging. [Notes]
Background reading material:
|
Monday, February 25 | No lecture. February break. |
Wednesday, February 27 |
Lecture 10. Dimensionality reduction and sparsity. [Notes]
Background reading material:
|
Monday, March 4 |
Lecture 11. Deep neural networks. Matrix multiply as computational core of learning. [Notes]
Background reading material:
|
Wednesday, March 6 |
Lecture 12. Automatic differentiation and ML frameworks. [Notes]
Background reading material:
|
Monday, March 11 |
Lecture 13. Accelerating DNN training: early stopping and batch normalization. [Notes] [Demo Jupyter] [Demo PDF]
Background reading material:
|
Wednesday, March 13 |
In-class midterm.
|
Monday, March 18 |
Lecture 14. Hyperparameter optimization. Grid search. Random search. [Notes]
Background reading material:
|
Wednesday, March 20 |
Lecture 15. Kernels and kernel feature extraction. [Notes]
Background reading material:
|
Monday, March 25 |
Lecture 16. Bayesian optimization 1. [Notes]
Background reading material:
|
Wednesday, March 27 |
Lecture 17. Bayesian optimization 2. [Notes]
Background reading material: same as Bayesian optimization 1. |
Monday, April 1 | No lecture. Spring break. |
Wednesday, April 3 | No lecture. Spring break. |
Monday, April 8 |
Lecture 18. Parallelism 1. [Notes]
Background reading material:
|
Wednesday, April 10 |
Lecture 19. Parallelism 2. [Notes]
Background reading material: same as Parallelism 1. |
Monday, April 15 |
Lecture 20. Memory locality and memory bandwidth. [Notes]
Background reading material: same as Parallelism 1. |
Wednesday, April 17 |
Lecture 21. Machine learning on GPUs; matrix multiply returns. [Notes]
Background reading material:
|
Monday, April 22 |
Lecture 22. Distributed learning and the parameter server. [Notes]
Background reading material:
|
Wednesday, April 24 |
Lecture 23. Quantized, low-precision machine learning. [Notes]
Background reading material:
|
Monday, April 29 |
Lecture 24. Deployment and low-latency inference. Deep neural network compression and pruning. [Notes]
Background reading material:
|
Wednesday, May 1 |
Lecture 25. Machine learning accelerators. [Notes]
Background reading material:
|
Monday, May 6 |
Lecture 26. Online learning, real-time learning, and course summary. [Notes]
Background reading material:
|
Tuesday, May 14, 9:00 AM | Final Exam. |