CS 6241: Numerical Methods for Data Science
Cornell University, Spring 2020.
Lectures: 2:55 PM–4:10 PM, Tuesdays and Thursdays, virtual on Zoom (originally 203 Phillips Hall).
Instructor: Austin Benson, Assistant Professor, Computer Science (arb@cs.cornell.edu).
Office hours: 4:15 PM–5:15 PM Thursdays (after class) or by appointment, virtual on Zoom (originally 413B Gates).
TA: Kangbo Li, PhD student, Computer Science (kl935@cornell.edu).
Office hours: 2:30–3:30 PM Mondays, virtual on Zoom (originally 405 Rhodes).
Resources
Course administration:
- CMS for managing course work.
- Jupyter notebooks and data for class demos.
- CS 6241 Spring 2019 web page.
- David Bindel's 2019 SJTU summer short course on numerical methods for data science.
- Numerical Linear Algebra. L. N. Trefethen and D. Bau III.
Coursework schedule
- Homework 1. Due Th Feb 13 at 11:59pm ET.
- Homework 2. Due Th Mar 5 at 11:59pm ET.
- Reaction paper. Due Th Apr 9 at 11:59pm ET.
- Project proposal. Due Th Apr 23 at 11:59pm ET.
- Project progress report. Due Th May 7 at 11:59pm ET.
- Project final report. Due Th May 21 at 11:59pm ET.
Class schedule
This schedule is tentative and subject to change.
Week 1
Lecture 1 (Tu 1/21): overview and background
Topics: course overview, vector and matrix norms, linear systems, eigenvalues, basic matrix factorizations, basic optimization
Readings:
- Background notes from David Bindel's 2019 SJTU summer course.
- Numerical Optimization (Chapters 1 and 2).
Lecture 2 (Th 1/23)
Topics: solving linear least squares with matrix factorizations, statistical interpretations
Readings:
- The Elements of Statistical Learning (sections 2.9, 3.1, and 3.2).
- Numerical Optimization (Chapters 1 and 2).
- Numerical Linear Algebra (Sections I and II).
- D. Bindel's lecture notes on linear least squares.
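Not part of the course materials, but as a small illustration of the factorization approach to least squares covered in this lecture, here is a minimal NumPy sketch on synthetic data (sizes and data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))          # tall synthetic data matrix
b = rng.standard_normal(100)               # synthetic observations

# Thin QR factorization A = QR; the least squares solution solves R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Sanity check against the library least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_qr, x_ref)
```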
Week 2
Lecture 3 (Tu 1/28): regularized linear least squares
Topics: Tikhonov regularization / ridge regression, Lasso, pivoted QR
Readings:
- Applied Numerical Linear Algebra (section 3.5).
- Rank Revealing QR Factorizations. T. F. Chan. LAA, 1987.
- The Elements of Statistical Learning (section 3.4).
- D. Bindel's lecture notes on regularized linear least squares.
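As a quick, unofficial illustration of Tikhonov regularization / ridge regression from this lecture, here is a NumPy sketch on synthetic data (the regularization parameter is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 8))
b = rng.standard_normal(100)
lam = 0.1                                  # arbitrary regularization strength
n = A.shape[1]

# Normal-equations form: (A^T A + lam I) x = A^T b.
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Equivalent augmented least squares formulation: min ||[A; sqrt(lam) I] x - [b; 0]||_2.
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
assert np.allclose(x_ridge, x_aug)
```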
Lecture 4 (Th 1/30)
Topics: (stochastic) gradient descent, (quasi-)Newton
Readings:
- Numerical Optimization (sections 3.1–3.3)
- Optimization Methods for Large-Scale Machine Learning. L. Bottou, F. E. Curtis, and J. Nocedal. SIREV, 2018.
- D. Bindel's lecture notes on regularized linear least squares.
- D. Bindel's lecture notes on optimization.
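For reference (not from the lecture notes), a minimal stochastic gradient descent loop for a linear least squares objective, with a synthetic problem and a hand-picked fixed step size:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

x = np.zeros(d)
step = 0.01                                # fixed step size (a tuning choice)
for epoch in range(50):
    for i in rng.permutation(n):           # one pass over the rows in random order
        g = (A[i] @ x - b[i]) * A[i]       # gradient of 0.5 * (a_i^T x - b_i)^2
        x -= step * g

print(np.linalg.norm(x - x_true))          # should be small
```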
Week 3
Lecture 5 (Tu 2/4): iterative optimization and start of latent factor models
Topics: gradient descent with errors, more SGD, momentum, acceleration, basic ideas for latent factor models
Readings:
- D. Bindel's lecture notes on regularized linear least squares.
- D. Bindel's lecture notes on optimization.
- D. Bindel's lecture notes on latent factor models.
- Optimization Methods for Large-Scale Machine Learning. L. Bottou, F. E. Curtis, and J. Nocedal. SIREV, 2018.
- A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. W. Su, S. Boyd, and E. J. Candès. JMLR, 2016.
- The Advanced Matrix Factorization Jungle. I. Carron.
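A small, unofficial sketch of the heavy-ball (momentum) iteration from this lecture, applied to a synthetic strongly convex quadratic; the step size and momentum parameter are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = rng.standard_normal((20, 20))
H = Q.T @ Q + np.eye(20)                   # SPD Hessian of f(x) = 0.5 x^T H x - c^T x
c = rng.standard_normal(20)
grad = lambda x: H @ x - c

x, v = np.zeros(20), np.zeros(20)
alpha = 1.0 / np.linalg.norm(H, 2)         # step size ~ 1/L
beta = 0.9                                 # momentum parameter
for _ in range(500):
    v = beta * v - alpha * grad(x)         # heavy-ball momentum update
    x = x + v

print(np.linalg.norm(grad(x)))             # near zero at the minimizer
```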
Lecture 6 (Th 2/6)
Readings:
- C. Shalizi's lecture notes on PCA.
- D. Bindel's lecture notes on collaborative filtering and other stories.
- Robust Principal Component Analysis?. E. J. Candès et al. JACM, 2011.
- Proximal Algorithms. N. Parikh and S. Boyd. Foundations and Trends in Optimization, 2014.
- S. Boyd and L. Vandenberghe's lecture notes on subgradients.
- S. Boyd's lecture notes on ADMM.
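These readings center on PCA and its robust and regularized variants; as a baseline reference (not course code), here is classical PCA computed from an SVD of centered synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 10))   # roughly rank-3 data

Xc = X - X.mean(axis=0)                    # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
scores = U[:, :k] * s[:k]                  # coordinates in the top-k principal subspace
components = Vt[:k]                        # principal directions (rows)
explained = s[:k] ** 2 / np.sum(s ** 2)    # fraction of variance explained
print(explained)
```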
Week 4 [HW1 due Th 2/13 at 11:59pm ET]
Lecture 7 (Tu 2/11): latent factor models, dimensionality reduction, and matrix factorizations
Topics: SVD-based latent factors and matrix completion
Readings:
- D. Bindel's lecture notes on collaborative filtering and other stories.
- Netflix Update: Try This at Home. S. Funk, 2006.
- Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Y. Koren. KDD, 2008.
- GloVe: Global Vectors for Word Representation. Pennington et al. EMNLP, 2014.
- Breaking the Softmax Bottleneck: A High-Rank RNN Language Model. Yang et al. ICLR, 2018.
- Exact Matrix Completion via Convex Optimization. E. J. Candès and B. Recht. FOCM, 2009.
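In the spirit of the Funk "Try This at Home" reading above, here is a hypothetical, minimal SGD matrix-factorization sketch for a partially observed ratings matrix (synthetic data; the rank, step size, and regularization are made-up choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n_users, n_items, rank = 50, 40, 5
U_true = rng.standard_normal((n_users, rank))
V_true = rng.standard_normal((n_items, rank))
obs = []                                   # observed (user, item, rating) triples
for _ in range(800):
    u, i = rng.integers(n_users), rng.integers(n_items)
    obs.append((u, i, U_true[u] @ V_true[i] + 0.1 * rng.standard_normal()))

U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))
step, reg = 0.02, 0.05                     # tuning choices
for epoch in range(30):
    for u, i, r in obs:
        err = r - U[u] @ V[i]              # residual on one observed entry
        gu = err * V[i] - reg * U[u]
        gv = err * U[u] - reg * V[i]
        U[u] += step * gu
        V[i] += step * gv

rmse = np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in obs]))
print(rmse)                                # training RMSE of the learned factors
```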
Lecture 8 (Th 2/13)
Topics: nonnegative matrix factorization (NMF)
Readings:
- D. Bindel's lecture notes on non-negative matrix factorization.
- The Why and How of Nonnegative Matrix Factorization. N. Gillis. arXiv, 2014.
- Learning the parts of objects by non-negative matrix factorization. D. D. Lee and H. S. Seung. Nature, 1999.
- Algorithms for Non-negative Matrix Factorization. D. D. Lee and H. S. Seung. NeurIPS, 2001.
- When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? D. Donoho and V. Stodden. NeurIPS, 2003.
- On the complexity of nonnegative matrix factorization. S. A. Vavasis. SIOPT, 2009.
- Computing a nonnegative matrix factorization—provably. S. Arora, R. Ge, R. Kannan, and A. Moitra. SICOMP, 2016.
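A minimal, unofficial sketch of the Lee-Seung multiplicative updates for NMF (minimizing ||X - WH||_F^2 with nonnegative factors) on random nonnegative data; the rank and iteration count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.random((60, 40))                   # nonnegative data matrix
r = 5
W = rng.random((60, r))
H = rng.random((r, 40))
eps = 1e-10                                # guard against division by zero

for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative update for H
    W *= (X @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W

print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # relative reconstruction error
```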
Week 5
Lecture 9 (Tu 2/18): interpretable latent factor models and start of tensors
Topics: interpolative decomposition, CUR, basics on tensors, problems with border rank
Readings:
- D. Bindel's lecture notes on SVD and other low rank decompositions.
- Randomized algorithms for the low-rank approximation of matrices. E. Liberty et al. PNAS, 2007.
- CUR matrix decompositions for improved data analysis. M. W. Mahoney and P. Drineas. PNAS, 2009.
- Tensor Rank and the Ill-Posedness of the Best Low-Rank Approximation Problem. V. de Silva and L.-H. Lim. SIMAX, 2008.
Lecture 10 (Th 2/20)
Readings:
- Most Tensor Problems Are NP-Hard. C. J. Hillar, L.-H. Lim. JACM, 2013.
- A Counterexample to the Possibility of an Extension of the Eckart–Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. T. G. Kolda. SIMAX, 2003.
- Tensor Decompositions and Applications. T. G. Kolda and B. W. Bader. SIREV, 2009.
- On the Best Rank-1 and Rank-(R1, R2,...,RN) Approximation of Higher-Order Tensors. L. De Lathauwer, B. De Moor, and J. Vandewalle. SIMAX, 2000.
- Unsupervised Discovery of Demixed, Low-dimensional Neural Dynamics across Multiple Timescales through Tensor Components Analysis. A. H. Williams et al. Neuron, 2018.
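Not from the course materials, but a compact sketch of alternating least squares for a rank-R CP decomposition of a synthetic third-order tensor (the einsum calls play the role of the matricized-tensor-times-Khatri-Rao products):

```python
import numpy as np

rng = np.random.default_rng(7)
I, J, K, R = 20, 15, 10, 3
A0, B0, C0 = rng.standard_normal((I, R)), rng.standard_normal((J, R)), rng.standard_normal((K, R))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)  # exact rank-R synthetic tensor

A, B, C = [rng.standard_normal(s) for s in [(I, R), (J, R), (K, R)]]
for _ in range(100):
    # Each factor solves a linear least squares problem with the other two fixed.
    A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
    B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
    C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))

T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))   # relative fit error
```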
Week 6
No lecture Tu 2/25 (February break)
Lecture 11 (Th 2/27): Nonlinear dimensionality reduction
Topics: ISOMAP, LLE, t-SNE
Readings:
- A Global Geometric Framework for Nonlinear Dimensionality Reduction. J. B. Tenenbaum, V. de Silva, and J. C. Langford. Science, 2000.
- Nonlinear Dimensionality Reduction by Locally Linear Embedding. S. T. Roweis and L. K. Saul. Science, 2000.
- Stochastic Neighbor Embedding. G. Hinton and S. Roweis. NeurIPS, 2002.
- Visualizing Data using t-SNE. L. van der Maaten and G. Hinton. JMLR, 2008.
- Accelerating t-SNE using Tree-Based Algorithms. L. van der Maaten. JMLR, 2014.
- How to Use t-SNE Effectively. M. Wattenberg, F. Viégas, and I. Johnson. Distill, 2016.
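A one-line usage example (not course code) of an off-the-shelf t-SNE implementation; it assumes scikit-learn is available and uses its built-in digits dataset with default-ish parameters:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 1797 samples of 64-dimensional digit images
emb = TSNE(n_components=2, perplexity=30, init='pca', random_state=0).fit_transform(X)
print(emb.shape)                           # (1797, 2) low-dimensional coordinates for plotting
```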
Week 7 [HW2 due Th 3/5 at 11:59pm ET]
Lecture 12 (Tu 3/3): Basic network analysis
Topics: matrices associated with graphs, common network properties
Readings:
- The Structure and Function of Complex Networks. M. E. J. Newman. SIREV, 2003.
- Mining Large Graphs. D. F. Gleich and M. W. Mahoney. Handbook of Big Data, Handbooks of modern statistical methods, 2016.
- D. Spielman's spectral graph theory course.
- The Anatomy of the Facebook Social Graph. J. Ugander et al. arXiv, 2011.
- Collective dynamics of ‘small-world’ networks. D. J. Watts and S. H. Strogatz. Nature, 1998.
- An Experimental Study of the Small World Problem. J. Travers and S. Milgram. Sociometry, 1969.
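For concreteness (not from the notes), a tiny example of the matrices associated with a graph that this lecture discusses, built from a hypothetical five-node edge list:

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # hypothetical undirected graph
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1                  # symmetric adjacency matrix

d = A.sum(axis=1)                          # degree vector
L = np.diag(d) - A                         # combinatorial Laplacian
D_inv_sqrt = np.diag(1 / np.sqrt(d))
L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian

print(np.linalg.eigvalsh(L))               # smallest eigenvalue is 0; the second is > 0 iff connected
```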
Lecture 13 (Th 3/5)
Topics: common network properties, random graph models
Readings:
- The Structure and Function of Complex Networks. M. E. J. Newman. SIREV, 2003.
- The Anatomy of the Facebook Social Graph. J. Ugander et al. arXiv, 2011.
- The Diameter of Sparse Random Graphs. F. Chung and L. Lu. Advances in Applied Mathematics, 2001.
- Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. J. Leskovec, J. Kleinberg, and C. Faloutsos. KDD, 2005.
- Configuring Random Graph Models with Fixed Degree Sequences. B. K. Fosdick et al. SIREV, 2018.
- Stochastic blockmodels and community structure in networks. B. Karrer and M. E. J. Newman. PRE, 2011.
- Community Detection and Stochastic Block Models: Recent Developments. E. Abbe. JMLR, 2017.
- A Brief History of Generative Models for Power Law and Lognormal Distributions. M. Mitzenmacher. Internet Mathematics, 2004.
- Power-Law Distributions in Empirical Data. A. Clauset, C. R. Shalizi, and M. E. J. Newman. SIREV, 2009.
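A small, unofficial sketch of the Chung-Lu random graph model from the readings: each edge (i, j) appears independently with probability proportional to the product of target expected degrees (the weights below are made up):

```python
import numpy as np

rng = np.random.default_rng(8)
w = np.array([5, 4, 4, 3, 3, 2, 2, 2, 2, 1], dtype=float)   # hypothetical expected degrees
n, S = len(w), w.sum()

A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < min(w[i] * w[j] / S, 1.0):         # Chung-Lu edge probability
            A[i, j] = A[j, i] = 1

print(A.sum(axis=1))                       # realized degrees; roughly w in expectation
```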
Week 8
Lecture 14 (Tu 3/10): Unsupervised learning on graphs
Topics: spectral methods for bipartitioning, ratio cut, normalized cut
Readings:
- A tutorial on spectral clustering. U. von Luxburg. Statistics and Computing, 2007.
- Communities in Networks. M. A. Porter, J.-P. Onnela, and P. J. Mucha. Notices of the AMS, 2009.
- Community detection in networks: A user guide. S. Fortunato and D. Hric. Physics Reports, 2016.
Lecture 15 (Th 3/12)
Topics: conductance, Cheeger's inequality, k-way spectral clustering
Readings:
- A tutorial on spectral clustering. U. von Luxburg. Statistics and Computing, 2007.
- D. A. Spielman's lecture notes on Conductance, the Normalized Laplacian, and Cheeger’s Inequality.
- Trace optimization and eigenproblems in dimension reduction methods. E. Kokiopoulou, J. Chen, and Y. Saad. Numer. Linear Algebra Appl., 2011.
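To make the spectral bipartitioning idea of these two lectures concrete, here is an unofficial sketch that plants a two-block graph, computes the Fiedler vector of the Laplacian, and splits on its sign (block sizes and probabilities are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 40
blocks = np.repeat([0, 1], n // 2)         # planted ground-truth partition
P = np.where(blocks[:, None] == blocks[None, :], 0.5, 0.05)
A = np.triu(rng.random((n, n)) < P, 1).astype(float)
A = A + A.T                                # symmetric adjacency of the sampled graph

L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)
fiedler = evecs[:, 1]                      # eigenvector of the second-smallest eigenvalue
labels = (fiedler > 0).astype(int)         # sign-based bipartition
print(np.mean(labels == blocks), np.mean(labels != blocks))   # one should be near 1 (sign ambiguity)
```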
Week 9
No lecture Tu 3/17 or Th 3/19 (Cornell classes suspended due to COVID-19)
Week 10
No lecture Tu 3/24 or Th 3/26 (Cornell classes suspended due to COVID-19)
Week 11
No lecture Tu 3/31 or Th 4/2 (spring break)
Week 12 [Reaction paper due Th 4/9 at 11:59pm ET]
Lecture 16 (Tu 4/7): Graph-based semi-supervised learning [VIRTUAL SYNCHRONOUS LECTURE]
Topics: classical graph-based semi-supervised techniques with connections to random walks and spectral clustering
Readings:
- Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. X. Zhu, Z. Ghahramani, and J. Lafferty. ICML, 2003.
- Learning with Local and Global Consistency. D. Zhou et al. NeurIPS, 2004.
- Empirical stationary correlations for semi-supervised learning on graphs. Y. Xu, J. S. Dyer, and A. B. Owen. Ann. Appl. Stat., 2010.
- Using Local Spectral Methods to Robustify Graph-Based Learning Algorithms. D. F. Gleich and M. W. Mahoney. KDD, 2015.
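A tiny, unofficial example of the Zhu-Ghahramani-Lafferty harmonic-function solution on a hypothetical path graph with the two endpoints labeled; the unlabeled values come from one linear solve with the Laplacian:

```python
import numpy as np

n = 6
A = np.zeros((n, n))
for i in range(n - 1):                     # path graph 0-1-2-3-4-5
    A[i, i + 1] = A[i + 1, i] = 1
L = np.diag(A.sum(axis=1)) - A

labeled = np.array([0, 5])
unlabeled = np.array([1, 2, 3, 4])
f_l = np.array([0.0, 1.0])                 # known labels at the endpoints

# Harmonic solution: f_u = -L_uu^{-1} L_ul f_l.
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, -L_ul @ f_l)
print(f_u)                                 # linear interpolation: [0.2, 0.4, 0.6, 0.8]
```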
Lecture 17 (Th 4/9)
Readings:
- Latent Space Approaches to Social Network Analysis. P. D. Hoff, A. E. Raftery, and M. S. Handcock. JASA, 2002.
- RolX: Structural Role Extraction & Mining in Large Graphs. K. Henderson et al. KDD, 2012.
- DeepWalk: Online Learning of Social Representations. B. Perozzi, R. Al-Rfou, and S. Skiena. KDD, 2014.
- node2vec: Scalable Feature Learning for Networks. A. Grover and J. Leskovec. KDD, 2016.
- Representation Learning on Graphs: Methods and Applications. W. L. Hamilton, R. Ying, and J. Leskovec. Bull. of the IEEE Comp. Soc. TCDE, 2017.
- Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. J. Qiu et al. WSDM, 2018.
Week 13
Lecture 18 (Tu 4/14): Graph neural networks [VIRTUAL SYNCHRONOUS LECTURE]
Topics: high-level ideas, basic derivation of graph convolutional networks
Readings:
- C. Shalizi's lecture notes on Logistic Regression.
- Learning with Local and Global Consistency. D. Zhou et al. NeurIPS, 2004.
- Representation Learning on Graphs: Methods and Applications. W. L. Hamilton, R. Ying, and J. Leskovec. Bull. of the IEEE Comp. Soc. TCDE, 2017.
- Semi-Supervised Classification with Graph Convolutional Networks. T. N. Kipf and M. Welling. ICLR, 2017.
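For orientation (not course code), one forward pass of the Kipf-Welling graph convolution rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W) on a hypothetical six-node cycle with random features and weights:

```python
import numpy as np

rng = np.random.default_rng(10)
n, f_in, f_out = 6, 4, 3
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]:
    A[i, j] = A[j, i] = 1                  # cycle graph

X = rng.standard_normal((n, f_in))         # node feature matrix
W = rng.standard_normal((f_in, f_out))     # layer weights (learned in practice)

A_hat = A + np.eye(n)                      # add self-loops
d_hat = A_hat.sum(axis=1)
S = A_hat / np.sqrt(np.outer(d_hat, d_hat))   # symmetric normalization D^{-1/2} A_hat D^{-1/2}
H = np.maximum(S @ X @ W, 0)               # one GCN layer with ReLU
print(H.shape)                             # (6, 3) updated node features
```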
Lecture 19 (Th 4/16)
Readings:
- Outcome Correlation in Graph Neural Network Regression. J. Jia and A. R. Benson. arXiv, 2020.
- Inductive Representation Learning on Large Graphs. W. L. Hamilton, R. Ying, and J. Leskovec. NeurIPS, 2017.
- Spectral networks and locally connected networks on graphs. J. Bruna et al. ICLR, 2014.
Week 14 [Project proposal due Th 4/23 at 11:59pm ET]
"Lecture" 20 (Tu 4/21): small subgraph patterns [READ ON YOUR OWN] Topics: network motifs and network structureReadings:
- Network Motifs: Simple Building Blocks of Complex Networks. R. Milo et al. Science, 2002.
- Higher-order organization of complex networks. A. R. Benson, D. F. Gleich, and J. Leskovec. Science, 2016.
- Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections. J. Ugander, L. Backstrom, and J. Kleinberg. WWW, 2013. [See also the companion page.]
Lecture 21 (Th 4/23)
Readings:
- Arboricity and Subgraph Listing Algorithms. N. Chiba and T. Nishizeki. SICOMP, 1985.
- Main-memory triangle computations for very large (sparse (power-law)) graphs. M. Latapy. Theoretical Computer Science, 2008.
- Triadic Measures on Graphs: The Power of Wedge Sampling. C. Seshadhri, A. Pinar, and T. G. Kolda. SDM, 2013.
- ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. A. Pinar, C. Seshadhri, and V. Vishal. WWW, 2017.
- The Power of Pivoting for Exact Clique Counting. S. Jain and C. Seshadhri. WSDM, 2020.
- Retrieving Top Weighted Triangles in Graphs. R. Kumar et al. WSDM, 2020.
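A toy, unofficial illustration of exact triangle and wedge counting via the adjacency matrix (fine for tiny graphs; the readings above are about doing this at scale):

```python
import numpy as np

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # hypothetical graph: a triangle plus a pendant edge
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Each triangle is counted 6 times in trace(A^3) (3 starting vertices x 2 orientations).
triangles = np.trace(A @ A @ A) / 6
wedges = sum(d * (d - 1) / 2 for d in A.sum(axis=1))    # paths of length 2
print(triangles, 3 * triangles / wedges)   # triangle count and global clustering coefficient
```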
Week 15
Lecture 22 (Tu 4/28): kernels [VIRTUAL SYNCHRONOUS LECTURE]
Topics: feature maps, kernel trick, function approximation, RKHSs
Readings:
- D. Bindel's lecture notes on Many interpretations of kernels.
- The Elements of Statistical Learning (section 14.5.4).
- scikit-learn kernel PCA example.
- Kernel Methods in Machine Learning. T. Hofmann, B. Schölkopf, and A. J. Smola. Ann. Stat., 2008.
Lecture 23 (Th 4/30)
Topics: RKHSs, native spaces, Moore–Aronszajn, Gaussian Processes, GP likelihood and hyperparameter optimization
Readings:
- D. Bindel's lecture notes on Many interpretations of kernels.
- D. Bindel's lecture notes on Approaches to kernel selection.
- D. Bindel's lecture notes on Computing with GPs.
- Gaussian Processes for Machine Learning. C. E. Rasmussen and C. K. I. Williams. MIT Press, 2006.
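A minimal, unofficial Gaussian process regression sketch in the style of Rasmussen and Williams: a squared-exponential kernel, a Cholesky solve for the posterior mean and variance, synthetic 1-D data, and made-up hyperparameters:

```python
import numpy as np

def rbf(x1, x2, ell=0.5):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)) for 1-D inputs.
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2 * ell ** 2))

rng = np.random.default_rng(11)
X = np.sort(rng.uniform(0, 5, 15))         # training inputs
y = np.sin(X) + 0.1 * rng.standard_normal(15)
Xs = np.linspace(0, 5, 100)                # test inputs
sigma2 = 0.01                              # noise variance (a modeling choice)

K = rbf(X, X) + sigma2 * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # K^{-1} y via triangular solves

Ks = rbf(Xs, X)
mean = Ks @ alpha                           # posterior mean at the test inputs
v = np.linalg.solve(L, Ks.T)
var = rbf(Xs, Xs).diagonal() - np.sum(v ** 2, axis=0)   # posterior variance
print(mean[:3], var[:3])
```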
Week 16 [Progress report due Th 5/7 at 11:59pm ET]
"Lecture" 24 (Tu 5/5): Gaussian Processes [READ ON YOUR OWN] Topics: fast computation for hyperparameter optimizationReadings:
- Scalable Log Determinants for Gaussian Process Kernel Learning. K. Dong et al. NeurIPS, 2017.
Lecture 25 (Th 5/7)
Readings:
- scikit-learn Kernels for Gaussian Processes.
- Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP). A. G. Wilson and H. Nickisch. ICML, 2015.
- Scaling Gaussian Process Regression with Derivatives. D. Eriksson et al. NeurIPS, 2018.
- Exact Gaussian Processes on a Million Data Points. K. A. Wang et al. NeurIPS, 2019.
- Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. S. Ubaru, J. Chen, and Y. Saad. SIMAX, 2017.
- D. Bindel's lecture notes on Computing with GPs.
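To illustrate the stochastic trace estimation idea behind these readings (log det K = tr(log K)), here is an unofficial toy sketch using Hutchinson probes; for simplicity it forms the dense matrix logarithm explicitly, whereas the papers above replace that step with Lanczos quadrature on matrix-vector products:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(12)
n = 200
Q = rng.standard_normal((n, n))
K = Q @ Q.T / n + np.eye(n)                # SPD matrix standing in for a kernel matrix

logK = logm(K).real                        # dense matrix logarithm (the expensive step)
num_probes = 30
est = 0.0
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=n)    # Rademacher probe vector
    est += z @ logK @ z / num_probes       # Hutchinson estimate of tr(log K)

sign, exact = np.linalg.slogdet(K)
print(est, exact)                          # estimate vs. exact log-determinant
```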
Week 17
Project feedback sessions (Mon 5/11 and Tu 5/12)
Week 18 [Final project report due Th 5/21 at 11:59pm ET]
Coursework
Coursework will be managed through CMS, where assignments will be submitted.
The required coursework consists of three components:
- Homework (20%)
There are two homeworks, each with a theoretical component and/or an implementation and data analysis component.
- Reaction paper (20%)
The second component of the course is composing a (roughly) 5-page reaction paper that summarizes and critiques at least two closely related published papers relevant to the class. The motivation for this assignment is to get everyone thinking about research issues related to the class material and to stimulate ideas for the course project (described below). You can work individually or in groups of two or three. Your group members can be the same as your project partners, but they do not have to be. There may be a group presentation involved. Additional details will be posted soon.
- Course project (60%)
Most of your grade will be determined by a course project, where the topic is of your choosing but must be related to the class. You can work individually or in groups of two or three (the scope of the project should be scaled according to the group size). We will be looking for three key pieces in the course project: (i) some theoretical or mathematical discussion of a numerical method or algorithm (broadly construed); (ii) some software implementation of a method or algorithm; and (iii) some analysis of a real-world dataset. The project will be split into three components: a proposal, a progress report, and a final report, worth 10%, 15%, and 35% of the final grade, respectively. More details will be posted soon.