Numerical Methods for Data Science

Author

David Bindel

Preface

This is an incomplete draft of a text in progress on Numerical Methods for Data Science that I am working on during my sabbatical in Spring 2024. It is written using Quarto as the typesetting system and Julia as the programming language for most of the computational examples. The project began as a text to accompany Cornell CS 6241: Numerical Methods for Data Science, a graduate course aimed at beginning graduate students with a firm foundation in linear algebra, probability and statistics, and multivariable calculus, along with some background in numerical analysis. The focus is on numerical methods, with an eye to how thoughtful design of numerical methods can help us solve problems in data science.

Over several iterations of the course, I have seen students from a variety of backgrounds. They often have some grounding in computational statistics, machine learning, or data analysis in other disciplines, which is helpful. But the majority have not had even a one-semester introductory survey in numerical analysis, let alone a deeper dive. It also became clear that the list of topics was overly ambitious for a one-semester course, even for students with a strong numerical analysis background.

Hence, the new goal of this project is a textbook with three parts. The first part is an expanded version of a “background notes” handout that I have developed over several years of teaching. This material might be covered in previous courses in computing and mathematics, but many students can use a refresher on some of the details. The second part is what I think of as “Numerical Methods applied to Data Science.” It corresponds roughly to the methods covered in an undergraduate survey course on standard topics (close to what I usually cover in Cornell CS 4220, though with some changes), with an attempt to provide examples and problems that are identifiably connected to data science. The third part is “Numerical Methods for Data Science”; it covers more advanced topics and is appropriate for a graduate course. Both the second and the third parts are a little longer than what I can comfortably cover in a single semester, allowing the reader (or the instructor) some flexibility to pick and choose topics.