General Information | Course Description | Textbooks | Grading | Ed Discussions | Homework | Course Conduct | Accommodations
General Information
Instructor: Bobby Kleinberg, Gates Hall 317, emailOffice Hours: Tuesdays, 3-4 and Thursday, 10-11.
Lectures: MW 1:25–2:40pm, Phillips Hall 101, map
Canvas (https://canvas.cornell.edu/) will be used for homework distribution and submission, and it contains links to Ed Discussions (for Q&A and announcements) and Gradescope (for grades and regrades).
Course Description
Probability and high-dimensional geometry have become valuable tools in the analysis of algorithms. This course will explore the mathematics that lies behind designing and analyzing randomized algorithms and algorithms for high-dimensional, often random, data. Topics to be covered include randomized algorithms and data structures for hashing, data sketching, and data stream processing; random walks and Markov Chain Monte Carlo algorithms; random graphs; dimensionality reduction for high-dimensional data; and algorithms for detecting sparse or low-rank structures in data.
Textbooks
The course has no required textbook. The assigned readings will be based on typeset lecture notes distributed by the instructor on Canvas and on the Lectures tab of this website. The following are some textbooks whose subject matter overlaps significantly with this course.- A. Blum, J. Hopcroft, and R. Kannan. Foundations of Data Science
- M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
- R. Motwani and P. Raghavan. Randomized Algorithm.
- J. Kleinberg and É. Tardos. Algorithm Design.
Prerequisites
The prerequisites for the course are CS 2800 and a linear algebra course at the level of MATH 2210/2940 or above. CS 4820 is highly recommended but not required. A prior course in probability (such as ENGRD 2700, MATH 4710, ECE 3100, BTRY 3080, ECON 3130) is also recommended but not required.
Homework
There will be homework approximately every two weeks. Homework will typically be released on Tuesdays on Canvas and will be due the following week. The homework will always contain (proof-based) theory problems, and it will sometimes also contain coding problems. Interactive Python notebooks such as Jupyter or Colab will generally be a good way to solve the coding problems. (See the Resources tab of this website.) Some of the homework problems will be quite challenging; it's fine if you don't see how to solve every problem on the homework.
Quizzes
There will be weekly quizzes at the end of each Wednesday lecture, excluding the first day of class. These will be very short closed-book, closed-notes tests with short answer questions. The questions will typically be tests of knowledge, for example asking you to write down a definition or a basic formula. The four lowest quiz grades will be dropped, so it is not necessary to schedule a make-up if you happen to miss a quiz.
Final Exam
There will be a final exam during exam week. The exact date will be announced on this website when it has been scheduled.
5850 Project
Students in CS 5850 will additionally need to do a final project, which will involve writing and evaluating algorithms for data analysis, simulation, or optimization.
Grading
For CS 4850, your grade will be based on six homework assignments, a set of in-class quizzes, one final exam, and completion of a course evaluation. For CS 5850, your grade will be based on all of the above, plus a final project.
These components factor into your grade with the following weights.Component | 4850 range | 5850 range |
---|---|---|
Homework | 42% | 36% |
Quizzes | 36% | 27% |
Final Exam | 20% | 20% |
Final Project | 0% | 15% |
Completion of course evaluation: | 2% | 2% |
Ed Discussions
We will be using Ed Discussions as an online discussion forum. Ed Discussions allows for open discussions of all course-related questions. You are encouraged to post any questions you might have about the course material. The course staff monitor Ed Discussions closely and you will usually get a quick response. If you know the answer to a question, you are encouraged to post it.
By default, your posts are visible to the course staff and other students, and you should prefer this mode so that others can benefit from your question and the answer. However, you can post privately so that only the course staff can see your question, and you should do so if your post might reveal information about a solution to a homework problem. You can also post anonymously if you wish. If you post privately, we reserve the right to make your question public if we think the class will benefit.
Ed Discussions is the most effective way to communicate with course staff. Please avoid email if Ed Discussions will do. Broadcast messages from the course staff to students will be sent using Ed Discussions and all course announcements will be posted there and pinned, so check the pinned announcements often!
Typesetting
We will require problem sets to be typeset and submitted as a PDF. This requirement is for everyone's benefit. In general, we recommend that you first develop your solutions in draft form, and then write or type your solution in a concise way. Typesetting not only makes the last step essential (instead of handing in solution in draft form), it also makes it much easier for you to edit and improve your writeup, as well as easier for your TAs to read your proofs. It is up to you which tool you use; though we recommend LaTeX, tools like the Equation Editor in Microsoft Word can be surprisingly effective as an alternative. See typesetting resources for a list of typesetting software and references.
For some proofs or writeups, it may be helpful to use a figure to explain your thinking more concisely. This is encouraged! Again, it is up to you how you want to include that in your writeup, whether it is a picture of a drawing in your notebook that you took with your phone or something you made digitally, as long as the figure was produced by you personally and is clear enough to see, it's a great idea to include it.
Collaboration
In the real world of algorithms research, collaboration and conversation is an important part of how ideas get generated. So too in this course; we encourage you to discuss the homework with your instructors and peers. However, your homework solutions must be written up completely on your own. For problems that involve writing code, you may collaborate with peers to work out a high-level plan for solving the problem, but every line of your program should be written by you. For problems that involve writing paragraphs of text, you may collaborate with peers to figure out how to solve the problem, but every word of your solution should be written by you. You are not allowed to share digital or written notes or images of your work in any form with each other. Just like in research, your work must also include acknowledgements of all students with whom you collaborated. Both the physical or digital distribution of information about solutions and the failure to acknowledge collaborators are serious violations of academic integrity.
Admissible Resources
This semester's 4850/5850 lectures and handouts constitute “admissible resources”, meaning that you may use any of this material in your homework and quiz solutions without rederiving that material. Any other source is considered an external source. You're free to use external sources to generate ideas for solving the homework problems in this class, but you must cite your external sources. Furthermore, your homework solutions are not allowed to depend on any material beyond the admissible resources. Hence, any claims obtained from external sources must be fully substantiated (i.e., rederived) in the solution you turn in.
Advice for Success
Assignments in this class can often require creative insights and complex proofs beyond what previous courses have required. Here are a few tips for succeeding in your writeups:- Start your assignments early. Even if you aren't writing anything down yet, looking over the problem set well in advance of the due date can ensure you have enough time to brainstorm possible solutions and to clear up confusion about how to interpret a problem. Creativity doesn't work well on a deadline.
- Work with your peers to come up with ideas. As previously stated, while you cannot share physical or digital solutions of any kind to these problems (except those that involve writing a piece of code), we actively encourage you team up with classmates and work together on the problems.
- Visit office hours. One of the best ways to get help with the homework is to bring questions to office hours. Homework help in office hours is most effective when the student has already made a real effort to solve the problem being discussed and is prepared to describe what they tried and where they're getting stuck.
- Ask questions in class and on Ed Discussions. The material in this class moves quickly and is often cumulative. If you find yourself scratching your head after a lecture, even after consulting the textbook and course notes, you're certainly not alone, and it's better to seek help then than to wait until you are more confused.
Course Material Copyright
Course materials posted on this website, Ed Discussions, or CMS, are intellectual property belonging to the author. Students are not permitted to post course materials on the web, share them on any online platform, buy, sell, or distribute them outside of Cornell without the express permission of the instructor. Such unauthorized behavior constitutes academic misconduct.
Course Conduct
We understand that our members represent a rich variety of backgrounds and perspectives. Cornell University is committed to providing an atmosphere for learning that respects diversity. We expect students to communicate in a respectful manner with the instructors, course staff, and fellow students, in a way the honors the unique experiences, values, and beliefs represented by different members of our community.
Academic Integrity
Any violation of academic integrity will be severely penalized. You are allowed to collaborate on the homework to the extent of formulating ideas as a group. However, you are expected to write up (and understand) the homework on your own, and you should acknowledge the names of the students with whom you collaborated.
From Cornell's code of academic integrity:
Absolute integrity is expected of every Cornell student in all academic undertakings. Integrity entails a firm adherence to a set of values, and the values most essential to an academic community are grounded on the concept of honesty with respect to the intellectual efforts of oneself and others. Academic integrity is expected not only in formal coursework situations, but in all University relationships and interactions connected to the educational process, including the use of University resources. […]
A Cornell student's submission of work for academic credit indicates that the work is the student's own. All outside assistance should be acknowledged, and the student's academic position truthfully reported at all times. In addition, Cornell students have a right to expect academic integrity from each of their peers.