CS 6241

Numerical Methods for Data Science


Location: Snee 1120
Lecture: TR 2:55-4:10
Discussion: Canvas
Course material: GitHub

Prof: David Bindel
Email: bindel@cornell.edu
OH: Mon 4:30-5:30, Thu 1:30-2:30, or by appt
Scheduler link

TA: Darian Nwankwo
Email: don4@cornell.edu
OH: Wed 1-2, Fri 2-3

News

2025-02-20: Project proposal prompt posted (due Mar 11).

2025-02-06: Reaction paper prompt posted (due Feb 25).

2025-01-30: We have moved rooms to Snee 1120 to accommodate more students.

2025-01-21: Welcome to CS 6241!


Overview

In this class, we treat numerical methods underlying a variety of modern machine learning and data analysis techniques. The course consists of six units of roughly two weeks each:

  • Least squares and regression: direct and iterative linear and nonlinear least squares solvers; direct randomized approximations and preconditioning; Newton, Gauss-Newton, and IRLS methods for nonlinear problems; regularization; robust regression.
  • Matrix and tensor data decompositions: direct methods, iterations, and randomized approximations for SVD and related decomposition methods; nonlinear dimensionality reduction; non-negative matrix factorization; tensor decompositions.
  • Low-dimensional structure in function approximation: active subspace / sloppy model approaches to identifying the most relevant parameters in high-dimensional input spaces, and model reduction approaches to identifying low-dimensional structure in high-dimensional output spaces.
  • Kernel interpolation and Gaussian processes: statistical and deterministic interpretations and error analysis for kernel interpolation; methods for dealing with ill-conditioned kernel systems; and methods for scalable inference and kernel hyper-parameter learning.
  • Numerical methods for graph data: implications of different graph structures for linear solvers; graph-based coordinate embedding methods; analysis methods based on matrix functions; computation of centrality measures; and spectral methods for graph partitioning and clustering.
  • Learning models of dynamics: system identification and auto-regressive model fitting; Koopman theory; dynamic mode decomposition.

See the syllabus for more information on course logistics.