Course mascot: Berry, the llama
(The llama and photo are from the personal collection of TG.)

Instructors: Karthik Sridharan and Tushaar Gangavarapu

Contact: Ed (for most questions), intro-ml-prof@cornell.edu (for sensitive/discreet inquiries only)

Course staff office hours: QueueMeIn (instructions will be posted on Ed)

Instructor office hours:

Karthik Sridharan: Monday 11am-12pm (booking link), Gates 424

Tushaar Gangavarapu: Thursday 5-6pm, Ives 103

Lectures: Tuesday and Thursday, 1.25-2.40pm, Baker Lab 200

Course overview: The course provides an introduction to machine learning, focusing on supervised learning and its theoretical foundations. Topics include regularized linear models, boosting, kernels, deep networks, generative models, online learning, and ethical questions arising in ML applications.

Prerequisites (not corequisites): Probability theory (e.g., BTRY 3080, CS 2800, ECON 3130, ENGRD 2700, MATH 4710), linear algebra (e.g., MATH 2210, MATH 2310, MATH 2940), single-variable calculus (e.g., MATH 1110, MATH 1920), and programming proficiency (e.g., CS 2110).

(Please see the FAQ on corequisites for more information.)

Course logistics: For enrolled students, the companion Canvas page serves as a hub for access to Ed (the course forum), Vocareum (for course projects), Gradescope (for HWs), and paper comprehension quizzes. If you are enrolled in the course, you should automatically have access to the site. Please let us know if you are unable to access it.


Homework, projects, and exams


Your grade in this course is composed of three components: homework, exams, and projects. Please also read through the given references in concert with the lectures.

Homework

There will be a number of homework assignments throughout the course, typically made available roughly one to two weeks before the due date. The homework primarily focuses on theoretical aspects of the material and is intended to provide preparation for the exams. Homework may be completed in groups of up to four. The assignments themselves will be made available via Gradescope (through Canvas).

You are allowed two slip days per homework assignment.

Projects

To provide hands-on learning with the methods discussed in class, there are a number of programming projects throughout the course. The projects may be completed individually or in a group of two. They are accessed, submitted, and graded using Vocareum.

You are allowed two slip days per project (same as for homework).

Paper comprehension (CS 5780)

Students enrolled in the graduate version of the course (i.e., in CS 5780) are required to read the assigned research paper(s) and complete the associated online quiz. Paper(s) will be assigned roughly once every two to three weeks. This is an individual component (not to be done in groups). The quizzes will be made available on Canvas.

You are allowed two slip days per paper comprehension quiz (same as for homework).

Exams

There will be two exams for this class (to be completed individually, not in a group): an evening prelim and a final exam. The location and time for the final are TBD; we will update the date below once the registrar finalizes it.

Grading

Final grades are based on homework assignments, programming projects, and the exams. For the 5780-level version of the course, the paper comprehension quizzes will also factor in.

For CS 3780 your final grade consists of:

Exams: 48%
Homework: 15%
Projects: 37%

For CS 5780 your final grade consists of:

Exams: 45%
Homework: 10%
Projects: 35%
Paper comprehension: 10%

Undergraduates enrolled in CS 3780 may choose to do the paper comprehension assignments; if completed, you will receive the higher of your two grades between the above schemes.
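
As a rough illustration of how the two schemes above combine, here is a minimal sketch in Python. It is not the official grade calculator. It assumes each component has already been reduced to a single score on a 0-100 scale (how individual exams, homeworks, or projects are aggregated within a component is not specified here), and the names `weighted_grade` and `final_grade_3780` are hypothetical.

```python
# Weights (in percent) taken from the grading breakdown above.
WEIGHTS_3780 = {"exams": 48, "homework": 15, "projects": 37}
WEIGHTS_5780 = {
    "exams": 45,
    "homework": 10,
    "projects": 35,
    "paper_comprehension": 10,
}


def weighted_grade(scores, weights):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(weights[name] * scores[name] for name in weights) / 100


def final_grade_3780(scores):
    """Grade for a CS 3780 student (hypothetical helper).

    If the optional paper comprehension score is present, return the
    higher of the CS 3780 and CS 5780 schemes; otherwise use the
    CS 3780 scheme alone.
    """
    base = weighted_grade(scores, WEIGHTS_3780)
    if "paper_comprehension" not in scores:
        return base
    return max(base, weighted_grade(scores, WEIGHTS_5780))


if __name__ == "__main__":
    scores = {"exams": 80, "homework": 90, "projects": 85}
    print(final_grade_3780(scores))
    print(final_grade_3780({**scores, "paper_comprehension": 100}))
```

Under these assumptions, a strong paper comprehension score can raise a CS 3780 grade, while a weak one cannot lower it, since the higher of the two schemes is used.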


Schedule


A tentative schedule is as follows, and includes the topics we will be covering, relevant reference material, and assignment information. It is quite possible the specific topics covered on a given day will change slightly. This is particularly true for the lectures in the latter part of the course, and this schedule will be updated as needed.

Please note that the due dates here are mostly correct, but may change. Check Canvas for any changes to assignment due dates.

Date | Topic | References | Notes, assignments, etc.
1/21/25 | Introduction | PML: 1.1; ESL: Ch. 1; and PPA: Ch. 1 | Slides (.pdf): Includes course logistics, policies, etc.
1/23/25 | ML basics | PML: 1.2, and ESL: 2.1 and 2.2 |
1/28/25 | K-nearest neighbors and the curse of dimensionality | PML: 16.1 |
1/30/25 | The perceptron | Wikipedia article |
2/4/25 | Clustering: K-means | ESL: 14.3.6 and 14.3.7, and PML: 21.3 |
2/6/25 | Principal component analysis | PML: 20.1, ESL: 14.5.1 and 14.5.2 |
2/11/25 | MLE and MAP | Nice YouTube video for MLE and MAP; Ben Taskar's lecture notes; Tom Mitchell's book chapter on MLE and MAP; ESL: 8.2.2-8.3 |
2/13/25 | Naive Bayes | ESL: 6.6.3, and Tom Mitchell's book chapter |
2/18/25 | February break | | No class
2/20/25 | Logistic regression | PML: 10.1 and 10.2 |
2/25/25 | Gradient descent, Newton's method | PML: 8.1, 8.2, and 8.3 |
2/27/25 | Stochastic gradient descent, Adagrad, Adam | PML: 8.4 |
3/4/25 | Linear regression | PML: 11.1, 11.2, and 11.3; ESL: 3.2 |
3/6/25 | Prelim review (jeopardy) | |
3/11/25 | Prelim open OH (Prelim day) | |
3/11/25 | Prelim | | Prelim location: Bailey Hall 101; Prelim time: 7.30pm
3/13/25 | Support vector machines, empirical risk minimization (ERM) | ERM: PML 4.3, 5.4 |
3/18/25 | Kernels | PML: 17.1 |
3/20/25 | Kernel SVM | PML: 17.3 |
3/25/25 | Model selection | |
3/27/25 | Bias-variance tradeoff | |
4/1/25 | Spring break (Dragon! Dragon! Dragon! Oi! Oi! Oi!) | | Woohooo!!
4/3/25 | Spring break | | Woohooo!!
4/8/25 | Classification and regression trees | |
4/10/25 | Ensemble methods: Bagging and random forest | |
4/15/25 | Ensemble methods: Boosting | |
4/17/25 | Neural networks | |
4/22/25 | Neural networks (contd.) | |
4/24/25 | Convolutional neural networks | |
4/29/25 | Neural networks: Transformers | Transformer algorithm; Transformers explained |
5/1/25 | Generative AI, diffusion models, and sampling | |
5/6/25 | AI in human society | |
TBD | Final | | Final location: TBD; Final time: TBD

References


While this course does not explicitly follow a specific textbook, there are several that are very useful references to supplement the course.

Books

The book by Murphy roughly covers the material for the course and is recommended. Most suggested readings are assigned out of the following two texts.

[PML] Probabilistic Machine Learning: An Introduction, by Murphy

We will provide section numbers to this text alongside many of the lectures. This text is available digitally through the Cornell University Library and a draft version is available directly from the author.
PML book website

[ESL] The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman

This text provides a comprehensive introduction to statistical learning and provides in-depth discussion of many of the topics in this course. The book is available directly from the authors.
ESL book website

Additional references

Three additional texts are provided that complement these texts and are useful for further study (or to gain another perspective).

Background references

Background in linear algebra, probability, and calculus, as well as some "mathematical maturity" is assumed for this course. If you feel you need a refresher, or would like to learn more about these topics, the following resources may be useful.

Other resources

This is a non-exhaustive (and an in-progress) list of additional resources (and information) that may be useful for the course.


FAQ


Q. I am currently on the waitlist [at position XX], do I need to do anything (e.g., email the instructors)? enrollment

A. No, you do not need to contact the instructors or the course staff. Given the class capacity, we expect most people to make it off the waitlist. However, even with space, you'll have to wait for CIS admins to add you to the course (which is usually done in batches).

For more information (the FAQ on position in the waitlist might be of specific interest), the following links: can assist you. If you have any more questions, please email courses@cis.cornell.edu.

That said, even if you are not enrolled, we strongly recommend you attend the first lectures (to note the specifics of the placement exam), so long as there are physical seats available.

Q. I am having trouble joining the waitlist, what should I do? enrollment

A. If you are having trouble joining the waitlist, please double check that you have carefully followed all the instructions noted in If you are still having trouble, please email courses@cis.cornell.edu. Unfortunately, the course instructors or staff won't be able to help you with this.

As noted in the previous question, even if you are not enrolled, we strongly recommend you attend the first lectures—we will detail the specifics of the placement exam (and how you can access it, irrespective of your enrollment status) in the class.

Q. Prerequisite or corequisite: I'm currently enrolled in a course listed as a prerequisite, am I still allowed to enroll? enrollment

A. We strongly recommend that you have completed the prerequisite courses prior to taking CS 3780.

However, if you are currently enrolled in a prerequisite course, we will use the placement exam scheduled during the first week of class to assess your comfort level with the necessary background knowledge for CS 3780.

Q. Can I take the class S/U? What about auditing the class? enrollment

A. We allow for S/U grading (per the university policy, students must receive a C- or higher to earn an "S" grade), but auditing the class is not allowed.

Rationale: We strongly believe that active, hands-on learning (from solving homework and projects, in addition to listening to lectures) is invaluable for gaining both an in-depth and a practical understanding of machine learning concepts.

Q. Are there other machine learning courses offered this semester? enrollment

A. See https://machinelearning.cis.cornell.edu/pages/courses.php for machine learning courses offered at Cornell (please note that the list may be outdated and/or some courses may not be offered this semester).

Here are a few CS 3780 equivalent courses that cover roughly the same core topics:

Below is a self-compiled, non-exhaustive list of other possible courses covering machine learning topics (broadly speaking) that are offered this semester:

Q. What happened to CS 4780? Or, why is it CS 3780 now? logistics

A. Nothing, really. We renamed CS 4780 to CS 3780 as part of the introduction of the AI Minor. The goal was to signal that this is one of the classes to take first as your entry into Machine Learning (ML), and that other classes can build on CS 4780—ah, we mean CS 3780—as a prerequisite.

The content of the class has not changed in a substantial way, except that we have made sure that ECE 3200 (previously ECE 4200), ORIE 3741 (previously ORIE 4741), and STSCI 3740 (previously STSCI 4740) now all cover the same core content as CS 3780. So, these classes are now largely interchangeable, and you can take any one of them.

(Q/A adapted from the Fall 2024 run of the course—the first offering of the course as CS 3780.)


Course policies


Inclusiveness

You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the Computer Science Department's Values of Inclusion.

(This statement was reproduced with permission from Dan Grossman.)

Mental health resources

Cornell University provides a comprehensive set of mental health resources and the student group Body Positive Cornell has put together a flyer (for 2022-23; resources mentioned are still relevant) outlining the resources available.

Class participation

You are encouraged to actively participate in class. This can take the form of asking questions in class, responding to questions posed to the class, and actively asking or answering questions on Ed.

Collaboration policy

Students are free to share code and ideas within their stated project/homework group for a given assignment, but should not discuss details about an assignment with individuals outside their group.

The prelim and final are individual assignments and must be completed individually (and not as a group).

Academic integrity

(In the context below, "you" refers to yourself when work is to be done individually, or to your group when work is to be done in a group. The statement below is reproduced with explicit permission from David Bindel.)

An assignment is an academic document, like a journal article. When you turn it in, you are claiming everything in it is your original work, unless you cite a source for it.

If you get an idea from a classmate, the instructors, a book or other published source, or elsewhere, please provide an appropriate citation. This is not only critical to maintaining academic integrity, but it is also an important way for you to give credit to those who have helped you out. When in doubt, cite!! Code or write-ups with appropriate citations will never be considered a violation of academic integrity in this class (though you will not receive credit for code or write-ups that were shared when you should have done them yourself).

For more information, please refer to Cornell's Code of Academic Integrity.

NOTE: This course is participating in Accepting Responsibility (AR), which is a pilot supplement to the Cornell Code of Academic Integrity (AI). For details about the AR process and how it works with the AI Code, see the AR website.

Use of generative AI tools

Given the ubiquity of generative AI tools (such as OpenAI ChatGPT, GitHub Copilot, Meta Llama, Google Gemini, etc.), and this being a machine learning course, we permit the use of such tools only under the following conditions:

It will be considered a violation of academic integrity to use generative AI tools to generate initial drafts, or to refine solutions without providing the necessary citations. That said, if you choose to use generative AI tools, we recommend SAI (use this link to join CS 3780 SAI), as it allows us, the instructors, to upload course materials to SAI for more relevant generations.

Electronic device policy

Use of electronic devices such as laptops and tablets will not be permitted during class (with the exception of specific in-class activities and for note-taking purposes). We are not plain evil; we are just following extensive research on the negative effects of in-class laptop use on learning. That said, for your convenience, printed notes (i.e., physical copies) will be made available at the start of the lecture for you to follow along.

(The statement above was adapted from the INFO/CS 4300 course syllabus, with the permission of Cristian Danescu-Niculescu-Mizil.)

SDS accommodations

In compliance with Cornell University policy and equal access laws, we are available to discuss appropriate academic accommodations that may be required for students with disabilities. Requests for academic accommodations are to be made during the first three weeks of the semester, except for unusual circumstances, so arrangements can be made. Students are encouraged to register with Student Disability Services to verify their eligibility for appropriate accommodations.

If you need immediate accommodations, please speak to us, instructors, after the class or send an email to us: intro-ml-prof@cornell.edu and SDS: sds_cu@cornell.edu.

Version history (Or, what changed?)


Any major changes (i.e., changes that require your attention) made to the website are logged below with the corresponding dates.

01/21/2025
01/16/2025
  • Added/updated FAQs on enrollment policies and corequisites.
  • Updated Tushaar's OH time to Th 5-6pm (previously, Tu 4-5pm); no change in location.
  • Added a few alternate courses offered this semester (see this FAQ).
01/15/2025
  • Uploaded/updated the initial version of the website (and course policies).