
Course mascot: Berry, the llama
(The llama and photo are from the personal collection of TG.)

Instructors: Karthik Sridharan and Tushaar Gangavarapu

Contact: Ed (for most questions), intro-ml-prof@cornell.edu (for sensitive/discreet inquiries only)

Course staff office hours: QueueMeIn (instructions will be posted on Ed)

Instructor office hours:

Karthik Sridharan: Monday 11am-12pm (booking link), Gates 424

Tushaar Gangavarapu: Thursday 5-6pm, Ives 103

Lectures: Tuesday and Thursday, 1.25-2.40pm, Baker Lab 200

Course overview: The course provides an introduction to machine learning, focusing on supervised learning and its theoretical foundations. Topics include regularized linear models, boosting, kernels, deep networks, generative models, online learning, and ethical questions arising in ML applications.

Prerequisites (not corequisites): Probability theory (e.g., BTRY 3080, CS 2800, ECON 3130, ENGRD 2700, MATH 4710), linear algebra (e.g., MATH 2210, MATH 2310, MATH 2940), single-variable calculus (e.g., MATH 1110, MATH 1920), and programming proficiency (e.g., CS 2110).

(Please see the FAQ on corequisites for more information.)

Course logistics: For enrolled students, the companion Canvas page serves as a hub for access to Ed (the course forum), Vocareum (for course projects), Gradescope (for HWs), and paper comprehension quizzes. If you are enrolled in the course, you should automatically have access to the site. Please let us know if you are unable to access it.


Homework, projects, and exams


Your grade in this course comprises three components: homework, exams, and projects. Please also read through the given references in concert with the lectures.

Homework

There will be a number of homework assignments throughout the course, typically made available roughly one to two weeks before the due date. The homework primarily focuses on theoretical aspects of the material and is intended to provide preparation for the exams. Homework may be completed in groups of up to four. The assignments themselves will be made available via Gradescope (through Canvas).

You are allowed two slip days per homework assignment.
Projects

To provide hands-on learning with the methods discussed in class, there are a number of programming projects throughout the course. The projects may be completed individually or in a group of two. They are accessed, submitted, and graded using Vocareum.

You are allowed two slip days per project (same as for homework).
Paper comprehension (CS 5780)

Students enrolled in the graduate version of the course (i.e., in CS 5780) are required to read the assigned research paper(s) and complete the associated online quiz. Paper(s) will be assigned roughly once every two to three weeks. This is an individual component (not to be done in groups). The quizzes will be made available on Canvas.

You are allowed two slip days per quiz (same as for homework).
Exams

There will be two exams for this class (to be completed individually, not in a group): an evening prelim and a final exam. The location and time for the final are TBD; we will update the schedule below once the registrar finalizes them.

Grading

Final grades are based on homework assignments, programming projects, and the exams. For the 5780-level version of the course, the paper comprehension quizzes will also factor in.

For CS 3780 your final grade consists of:

Exams: 48%
Homework: 15%
Projects: 37%
For CS 5780 your final grade consists of:

Exams: 45%
Homework: 10%
Projects: 35%
Paper comprehension: 10%

Undergraduates enrolled in CS 3780 may choose to do the paper comprehension assignments; if you complete them, you will receive the higher of the two grades computed under the CS 3780 and CS 5780 schemes above.
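As a rough illustration of how the two schemes compare, the sketch below computes a weighted score under each and keeps the higher one. Only the weights come from the lists above; the component scores, the 0-100 scale, and the function names are hypothetical, and the course does not state exactly how component scores are aggregated or mapped to letter grades.

```python
# Weights (percent of final grade) copied from the grading lists above.
WEIGHTS_3780 = {"exams": 48, "homework": 15, "projects": 37}
WEIGHTS_5780 = {"exams": 45, "homework": 10, "projects": 35,
                "paper_comprehension": 10}


def weighted_grade(scores, weights):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    total = sum(weights[name] * scores[name] for name in weights)
    return total / sum(weights.values())


def final_grade_3780(scores):
    """Score for a CS 3780 student under the 'higher of the two schemes' rule.

    The CS 5780 scheme is only considered if the (optional) paper
    comprehension component was completed, i.e., is present in `scores`.
    """
    grade = weighted_grade(scores, WEIGHTS_3780)
    if "paper_comprehension" in scores:
        grade = max(grade, weighted_grade(scores, WEIGHTS_5780))
    return grade


# Hypothetical component scores, for illustration only.
scores = {"exams": 80.0, "homework": 90.0, "projects": 85.0,
          "paper_comprehension": 95.0}
print(weighted_grade(scores, WEIGHTS_3780))
print(weighted_grade(scores, WEIGHTS_5780))
print(final_grade_3780(scores))
```

With these made-up numbers the CS 5780 scheme comes out ahead, so it would be the one used; with a low paper comprehension score, the CS 3780 scheme would apply instead.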


Schedule


A tentative schedule is as follows, and includes the topics we will be covering, relevant reference material, and assignment information. It is quite possible the specific topics covered on a given day will change slightly. This is particularly true for the lectures in the latter part of the course, and this schedule will be updated as needed.

NOTE: The due dates listed are generally accurate but may be subject to change. Tentative dates are indicated using unfilled boxes. While the release and due dates are typically marked on Tuesdays or Thursdays, the actual dates might differ (e.g., a homework release marked for Thursday may instead be released on Friday, with the due date adjusted accordingly). Think of these as approximate timeframes for when assignments will be released or due. Make sure to check Canvas for the exact assignment due dates.

Date | Topic | References | Assignments | Notes, slides, etc.
1/21/25 | Introduction | PML: 1.1; ESL: Ch. 1; and PPA: Ch. 1 | Placement exam out | Lecture slides (pdf)
1/23/25 | ML basics | PML: 1.2, and ESL: 2.1 and 2.2 | P(-1) out | References: html, pdf; Handwritten notes
1/28/25 | K-nearest neighbors and the curse of dimensionality | PML: 16.1 | Placement exam due | References: html, pdf; Handwritten notes; 5780: Cover and Hart, 1967
1/30/25 | The perceptron | Wikipedia article | HW1 out; P0 out; P1 out | References: html, pdf; Handwritten notes
2/4/25 | Clustering: K-means and the mixture of Gaussians | ESL: 14.3.6 and 14.3.7, and PML: 21.3 | Quiz-1 (opt) out | References: html; Demos: k-means, GMM; Lecture notes (draft); Handwritten notes
2/6/25 | Principal component analysis | PML: 20.1, ESL: 14.5.1 and 14.5.2 | P0 due; P2 out; Quiz-2 out | References: html; Demos: GMM, PCA; Lecture notes (draft); Handwritten notes; 5780: DBSCAN (clustering)
2/11/25 | MLE and MAP | Nice YouTube video for MLE and MAP; Ben Taskar's lecture notes; Tom Mitchell's book chapter on MLE and MAP; ESL: 8.2.2-8.3 | HW1 due | References: html, pdf; Handwritten notes; Annotated notes
2/13/25 | Naive Bayes | ESL: 6.6.3, and Tom Mitchell's book chapter | P1 due; Quiz-1 (opt) due; HW2 out; P3 out | References: html, pdf; Handwritten notes; Annotated notes
2/18/25 | February break | | | No class
2/20/25 | Logistic regression | PML: 10.1 and 10.2 | P2 due; Quiz-2 due | References: html, pdf; Handwritten notes; Annotated notes
2/25/25 | Gradient descent, Newton's method | PML: 8.1, 8.2, and 8.3 | HW2 due (02/24 5pm) | References: html; Demo: GD and Newton; Lecture notes (draft); Handwritten notes
2/27/25 | Momentum, adaptive gradients | PML: 8.4 | HW3 out; Quiz-3 out | References: html; Demo: GD variants; Lecture notes (draft); Handwritten notes; 5780: Spam email classification
3/4/25 | Linear regression | PML: 11.1, 11.2, and 11.3; ESL: 3.2 | P3 due | References: html; Lecture notes (draft); Handwritten notes
3/6/25 | Prelim review (jeopardy) | | HW3 due (03/07, 11.59pm); HW4 out (no submission) | Prelim review (jeopardy)
3/11/25 | Prelim open OH (Prelim day) | | |
3/11/25 | Prelim | | | Prelim location: Bailey Hall 101; Prelim time: 7.30pm
3/13/25 | Support vector machines | | Quiz-3 due; P4 out | References: html, pdf; Handwritten notes
3/18/25 | The kernel trick | PML: 17.1 | Quiz-4 out | References: html, pdf; Handwritten notes; 5780: Adam optimizer
3/20/25 | Kernel SVM | PML: 17.3 | HW5 out; P5 out | kernel html, kernel pdf; Handwritten notes
3/25/25 | Model selection, empirical risk minimization | ERM: PML 4.3, 5.4 | P4 due |
3/27/25 | Bias-variance tradeoff | | HW6 out; P6 out |
4/1/25 | Spring break (Dragon! Dragon! Dragon! Oi! Oi! Oi!) | | | Woohooo!!
4/3/25 | Spring break | | | Woohooo!!
4/8/25 | Classification and regression trees | | HW5 due |
4/10/25 | Ensemble methods: Bagging and random forest | | P5 due; Quiz-4 due |
4/15/25 | Ensemble methods: Boosting | | HW6 due; HW7 out; P7 out |
4/17/25 | Neural networks | | Quiz-5 out; Kaggle out |
4/22/25 | Neural networks (contd.) | | P6 due |
4/24/25 | Convolutional neural networks | | HW7 due; HW8 out; P8 out |
4/29/25 | Neural networks: Transformers | Transformer algorithm; Transformers explained | |
5/1/25 | Generative AI, diffusion models, and sampling | | P7 due |
5/6/25 | AI in human society | | HW8 due; P8 due; Quiz-5 due; Kaggle due |
5/10/25 | Final | | | Final location: TBD; Final time: 9am

References


While this course does not explicitly follow a specific textbook, there are several that are very useful references to supplement the course.

Books

The book by Murphy roughly covers the material for the course and is recommended. Most suggested readings are assigned out of the two texts listed below.

[PML] Probabilistic Machine Learning: An Introduction, by Murphy

We will provide section numbers to this text alongside many of the lectures. This text is available digitally through the Cornell University Library and a draft version is available directly from the author.
PML book website
[ESL] The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman

This text provides a comprehensive introduction to statistical learning and provides in-depth discussion of many of the topics in this course. The book is available directly from the authors.
ESL book website

Additional references

Three additional texts are provided that complement these texts and are useful for further study (or to gain another perspective).

Background references

Background in linear algebra, probability, and calculus, as well as some "mathematical maturity" is assumed for this course. If you feel you need a refresher, or would like to learn more about these topics, the following resources may be useful.

Other resources

This is a non-exhaustive (and an in-progress) list of additional resources (and information) that may be useful for the course.


FAQ


Q. I am currently on the waitlist [at position XX], do I need to do anything (e.g., email the instructors)? enrollment

A. No, you do not need to contact the instructors or the course staff. Given the class capacity, we expect most people to make it off the waitlist. However, even with space, you'll have to wait for CIS admins to add you to the course (which is usually done in batches).

For more information (the FAQ on position in the waitlist might be of specific interest), the following links: can assist you. If you have any more questions, please email courses@cis.cornell.edu.

That said, even if you are not enrolled, we strongly recommend you attend the first lectures (to note the specifics of the placement exam), so long as there are physical seats available.

Q. I am having trouble joining the waitlist, what should I do? enrollment

A. If you are having trouble joining the waitlist, please double check that you have carefully followed all the instructions noted in If you are still having trouble, please email courses@cis.cornell.edu. Unfortunately, the course instructors or staff won't be able to help you with this.

As noted in the previous question, even if you are not enrolled, we strongly recommend you attend the first lectures—we will detail the specifics of the placement exam (and how you can access it, irrespective of your enrollment status) in the class.

Q. Prerequisite or corequisite: I'm currently enrolled in a course listed as a prerequisite, am I still allowed to enroll? enrollment

A. We strongly recommend that you have completed the prerequisite courses prior to taking CS 3780.

However, if you are currently enrolled in a prerequisite course, we will use the placement exam scheduled during the first week of class to assess your comfort level with the necessary background knowledge for CS 3780.

Q. Can I take the class S/U? enrollment

A. We allow for S/U grading. From our perspective, all students are treated as though they are taking the class for a letter grade; we will then convert the letter grade to an "S/U" grade at the end of the semester using the conversion: C- or higher = S, D+ or lower = U.

(Please be sure to review the collaboration policy noted below.)
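The letter-to-S/U conversion described in this answer can be sketched as a simple lookup. The ordered letter scale and the function name below are assumptions made for illustration; only the C- cutoff comes from the answer above.

```python
# Assumed letter scale, ordered from highest to lowest (illustrative only).
LETTERS = ["A+", "A", "A-", "B+", "B", "B-",
           "C+", "C", "C-", "D+", "D", "D-", "F"]


def to_su(letter):
    """Map a letter grade to S/U: C- or higher = S, D+ or lower = U."""
    return "S" if LETTERS.index(letter) <= LETTERS.index("C-") else "U"


print(to_su("C-"))  # at the cutoff
print(to_su("D+"))  # just below the cutoff
```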

Q. What about auditing the class? enrollment

A. We do not officially (via enrollment on the Student Center) allow for auditing the class. That said, just sitting in on lectures is fine, so long as there is physical space in the room. To conserve resources, we will not be able to provide access to the course materials (e.g., Canvas, Ed, etc.).

Rationale: We strongly believe that active, hands-on learning (from solving homework and assignments, in addition to listening to lectures) is invaluable for gaining both an in-depth and a practical understanding of machine learning concepts.

Q. Are there other machine learning courses offered this semester? enrollment

A. See https://machinelearning.cis.cornell.edu/pages/courses.php for machine learning courses offered at Cornell (please note that the list may be outdated and/or some courses may not be offered this semester).

Here are a few CS 3780 equivalent courses that cover roughly the same core topics:

Below is a self-compiled, non-exhaustive list of other possible courses covering machine learning topics (broadly speaking) that are offered this semester:

Q. What happened to CS 4780? Or, why is it CS 3780 now? logistics

A. Nothing, really. We renamed CS 4780 to CS 3780 as part of the introduction of the AI Minor. The goal was to signal that this is one of the classes to take first as your entry into Machine Learning (ML), and that other classes can build on CS 4780—ah, we mean CS 3780—as a prerequisite.

The content of the class has not changed in a substantial way, except that we have made sure that ECE 3200 (previously ECE 4200), ORIE 3741 (previously ORIE 4741), and STSCI 3740 (previously STSCI 4740) now all cover the same core content as CS 3780. So, these classes are now largely interchangeable, and you can take any one of them.

(Q/A adapted from the Fall 2024 run of the course—the first offering of the course as CS 3780.)


Course policies


Inclusiveness

You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this, and subscribe to the Computer Science Department's Values of Inclusion.

(This statement was reproduced with permission from Dan Grossman.)
Mental health resources

Cornell University provides a comprehensive set of mental health resources and the student group Body Positive Cornell has put together a flyer (for 2022-23; resources mentioned are still relevant) outlining the resources available.
Class participation

You are encouraged to actively participate in class. This can take the form of asking questions in class, responding to questions posed to the class, and asking or answering questions on Ed.
Collaboration policy

Students are free to share code and ideas within their stated project/homework group for a given assignment, but should not discuss details about an assignment with individuals outside their group. If you are taking the course for S/U grading, please be sure to manage your expectations with your assignment partners about your intended level of commitment.

The prelim and final are individual assignments and must be completed individually (and not as a group).
Academic integrity

(In the context below, "you" refers to yourself when work is to be done individually, or to your group when work is to be done in a group. The statement below is reproduced with explicit permission from David Bindel.)

An assignment is an academic document, like a journal article. When you turn it in, you are claiming everything in it is your original work, unless you cite a source for it.

If you get an idea from a classmate, the instructors, a book or other published source, or elsewhere, please provide an appropriate citation. This is not only critical to maintaining academic integrity, but it is also an important way for you to give credit to those who have helped you out. When in doubt, cite!! Code or write-ups with appropriate citations will never be considered a violation of academic integrity in this class (though you will not receive credit for code or write-ups that were shared when you should have done them yourself).

For more information, please refer to Cornell's Code of Academic Integrity.
Note: This course is participating in Accepting Responsibility (AR), which is a pilot supplement to the Cornell Code of Academic Integrity (AI). For details about the AR process and how it works with the AI Code, see the AR website.
Use of generative AI tools

Given the ubiquity of generative AI tools (such as OpenAI ChatGPT, GitHub Copilot, Meta Llama, Google Gemini, etc.), and this being a machine learning course, we permit the use of such tools only under the following conditions:

It will be considered a violation of academic integrity to use generative AI tools to generate initial drafts, or to refine solutions without providing the necessary citations. That said, if you choose to use generative AI tools, we recommend SAI (use this link to join CS 3780 SAI), as it allows us, the instructors, to upload course materials to SAI for more relevant generations.
Grading and regrade requests

There is no preset forced distribution of grades in this course—we would be thrilled if everyone scored an A+, and we believe you're all capable of doing so.

For each homework or exam that is graded, we will make preliminary score-to-letter-grade landmarks (NOT cutoffs) available to you. These landmarks are homework- or exam-specific and are determined based on the difficulty of the homework or exam.

Students are not in competition with each other, i.e., your grades are not determined by how well others do. As such, we will not report means, medians, or other statistics about the class performance. As Lillian Lee once said, "[r]eporting median is guaranteed to make at least half the class feel bad," even if everyone did well.
Regrade requests: Given the large class size, only those regrade requests made via Gradescope will be considered; this facilitates a centralized record of all regrade requests. We reserve the right to disregard regrade requests made via other channels (e.g., Ed, email, office hours, etc.).

We want grades to accurately represent your understanding of the course material. If you believe that a grading error was made, please submit a regrade request on Gradescope. Be sure to explain clearly why you believe your answer deserves more credit. That said, we explicitly note that if we notice that we awarded you more points than you should have received, we are honor-bound to correct the score.
Electronic device policy

Use of electronic devices such as laptops and tablets will not be permitted during class (with the exception of specific in-class activities and for note-taking purposes). We are not plain evil; we are just following extensive research on the negative effects of in-class laptop use on learning. That said, for your convenience, printed notes (i.e., physical copies) will be made available at the start of the lecture for you to follow along.

(The statement above was adapted from INFO/CS 4300 course syllabus, with the permission of Cristian Danescu-Niculescu-Mizil.)
SDS accommodations

In compliance with Cornell University policy and equal access laws, we are available to discuss appropriate academic accommodations that may be required for students with disabilities. Requests for academic accommodations are to be made during the first three weeks of the semester, except for unusual circumstances, so arrangements can be made. Students are encouraged to register with Student Disability Services to verify their eligibility for appropriate accommodations.

If you need immediate accommodations, please speak to us, the instructors, after class or send an email to both us (intro-ml-prof@cornell.edu) and SDS (sds_cu@cornell.edu).

Version history (Or, what changed?)


Any major changes (i.e., changes that require your attention) made to the website are logged below with the corresponding dates.

01/21/2025
01/21/2025
01/16/2025
  • Added/updated FAQs on enrollment policies and corequisites.
  • Updated Tushaar's OH time to Th 5-6pm (previously, Tu 4-5pm); no change in location.
  • Added a few alternate courses offered this semester (see this FAQ).
01/15/2025
  • Uploaded/updated the initial version of the website (and course policies).