CS 6241: Numerical Methods for Data Science
Cornell University, Spring 2019.
Lectures: 2:55 PM–4:10 PM, Tuesdays and Thursdays, 140 Bard Hall.
Instructor: Austin Benson, Assistant Professor, Computer Science.
Contact: arb@cs.cornell.edu.
Office hours: 1:30 PM–2:30 PM Tuesdays or by appointment, 413B Gates.
Resources
Course administration and discussion:
- CMS for managing course work.
- Piazza for discussion.
- Jupyter notebooks and data for class demos.
Reference texts:
- Numerical Linear Algebra. L. N. Trefethen and D. Bau III.
- Applied Numerical Linear Algebra. J. W. Demmel.
- Matrix Computations. G. H. Golub and C. F. Van Loan.
- The Elements of Statistical Learning. T. Hastie, R. Tibshirani, and J. Friedman.
- Numerical Optimization. J. Nocedal and S. J. Wright.
Coursework schedule
- Homework 1. Due Th 3/7 at 11:59pm ET.
- Reaction paper. Due Th 3/28 at 11:59pm ET.
- Project proposal. Due Th 4/11 at 11:59pm ET.
- Homework 2. Due Th 4/18 at 11:59pm ET.
- Project progress report. Due Th 5/2 at 11:59pm ET.
- Final project report. Due Wed 5/15 at 4:30pm ET.
Class schedule
This schedule is tentative and subject to change.
Week 1
Lectures 1 & 2 (Tu 1/22 and Th 1/24): Numerical linear algebra review, optimization problems, linear least squares
Topics: course overview and expectations, vector and matrix norms, linear systems, eigenvalues, basic matrix factorizations, basics of optimization problems, linear least squares (a short code sketch follows the readings)
Readings:
- Background notes from David Bindel's 2018 SJTU summer course.
- Michael Saunders' review notes for CME 338 at Stanford.
- The Best of the 20th Century: Editors Name Top 10 Algorithms. B. A. Cipra. SIAM News, 2000.
- Numerical Optimization (section 2.1).
- The Elements of Statistical Learning (sections 3.1 and 3.2).
- Applied Numerical Linear Algebra (sections 3.1 and 3.2).
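For concreteness, here is a minimal Python/NumPy sketch (separate from the posted class demos, whose language may differ) of solving an overdetermined least-squares problem min ||Ax - b||_2 two ways, via a thin QR factorization and via the library routine:

```python
# Hedged sketch (not the posted class demos): solve an overdetermined
# least-squares problem min ||Ax - b||_2 via thin QR and via the
# library routine, and check that the two solutions agree.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))        # tall data matrix
b = rng.standard_normal(100)

Q, R = np.linalg.qr(A)                   # thin QR: A = QR
x_qr = np.linalg.solve(R, Q.T @ b)       # solve R x = Q^T b

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based routine
print(np.allclose(x_qr, x_lstsq))
```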
Week 2
Lecture 3 (Tu 1/29): Regularized linear least squares
Topics: Tikhonov regularization / ridge regression, Lasso, pivoted QR (a short code sketch follows the readings)
Readings:
- Applied Numerical Linear Algebra (section 3.5).
- Rank Revealing QR Factorizations. T. F. Chan. LAA, 1987.
- The Elements of Statistical Learning (section 3.4).
- Numerical Optimization (section 3.3).
- Optimization Methods for Large-Scale Machine Learning. L. Bottou, F. E. Curtis, and J. Nocedal.
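A hedged sketch of Tikhonov regularization / ridge regression via the stacked least-squares formulation, checked against the normal equations (the data here is random and purely illustrative):

```python
# Hedged sketch: Tikhonov regularization / ridge regression
# min ||Ax - b||^2 + lam ||x||^2 via the stacked least-squares form,
# checked against the normal equations (A^T A + lam I) x = A^T b.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
lam = 0.1

A_aug = np.vstack([A, np.sqrt(lam) * np.eye(10)])   # [A; sqrt(lam) I]
b_aug = np.concatenate([b, np.zeros(10)])
x_ridge, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

x_ne = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)
print(np.allclose(x_ridge, x_ne))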
Lecture 4 (Th 1/31): Iterative methods for least squares
Topics: Basics of Krylov subspace methods, LSQR, LSMR (a short code sketch follows the readings)
Readings:
- Numerical Linear Algebra. Lecture 32.
- Iterative Methods for Symmetric Ax = b. M. Saunders, Notes for CME 338 at Stanford.
- Iterative Methods for Square and Rectangular Systems. M. Saunders, Notes for CME 338 at Stanford.
- LSQR: An algorithm for sparse linear equations and sparse least squares. C. C. Paige and M. A. Saunders. ACM TOMS, 1982.
- LSMR: An iterative algorithm for sparse least-squares problems. D. C.-L. Fong and M. Saunders. SIAM J. on Scientific Computing, 2011.
- CG versus MINRES: An empirical comparison. D. C.-L. Fong and M. Saunders. SOL Tech Report, 2011.
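A small example using SciPy's implementation of LSQR (the Paige–Saunders algorithm above) on a random sparse least-squares problem; sizes and tolerances are illustrative:

```python
# Sketch using SciPy's implementation of LSQR on a random sparse
# least-squares problem (illustrative sizes and tolerances).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(2)
A = sp.random(1000, 50, density=0.01, random_state=2, format="csr")
b = rng.standard_normal(1000)

x, istop, itn = lsqr(A, b, atol=1e-10, btol=1e-10)[:3]
print(f"stop flag {istop} after {itn} iterations")
```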
Week 3
Lecture 5 (Tu 2/5): Latent factor models, linear dimensionality reduction, and matrix factorization
Topics: PCA, robust PCA, CUR (a short code sketch follows the readings)
Readings:
- The Advanced Matrix Factorization Jungle. I. Carron.
- The Elements of Statistical Learning (sections 3.4.1 and 14.5).
- Robust Principal Component Analysis?. E. J. Candès, X. Li, Y. Ma, J. Wright. JACM, 2011.
- CUR matrix decompositions for improved data analysis. M. W. Mahoney. MMDS Slides, 2006.
- CUR matrix decompositions for improved data analysis. M. W. Mahoney and P. Drineas. PNAS, 2009.
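A minimal PCA sketch via the SVD of the centered data matrix (random data purely for illustration, not the course demo code):

```python
# Minimal PCA via the SVD of the centered data matrix (random data
# purely for illustration).
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 20))       # samples x features

Xc = X - X.mean(axis=0)                  # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
scores = Xc @ Vt[:k].T                   # coordinates in top-k directions
explained = s[:k]**2 / np.sum(s**2)      # variance fraction per component
print(scores.shape, explained)
```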
Lecture 6 (Th 2/7): Nonnegative matrix factorization
Topics: NMF (a short code sketch follows the readings)
Readings:
- The Why and How of Nonnegative Matrix Factorization. N. Gillis. arXiv, 2014.
- Learning the parts of objects by non-negative matrix factorization. D. D. Lee and H. S. Seung. Nature, 1999.
- Algorithms for Non-negative Matrix Factorization. D. D. Lee and H. S. Seung. NeurIPS, 2001.
- When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts? D. Donoho and V. Stodden. NeurIPS, 2003.
- Computing a nonnegative matrix factorization—provably. S. Arora, R. Ge, R. Kannan, and A. Moitra. SIAM J. on Computing, 2016.
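A sketch of the Lee–Seung multiplicative updates for min ||X - WH||_F^2 from the NeurIPS 2001 paper above; a small epsilon is added to the denominators for numerical safety:

```python
# Sketch of the Lee-Seung multiplicative updates for
# min ||X - WH||_F^2 with W, H >= 0 (eps guards against division by 0).
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((100, 40))                # nonnegative data matrix
r, eps = 5, 1e-10
W, H = rng.random((100, r)), rng.random((r, 40))

for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))   # relative error
```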
Week 4
Lecture 7 (Tu 2/12): Randomized numerical linear algebra [Guest lecture by Anil Damle]
Topics: randomized SVD, LSRN, random projections for NLA (a short code sketch follows the readings)
Readings:
- Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. N. Halko, P. G. Martinsson, and J. A. Tropp. SIAM Review, 2011.
- LSRN: A Parallel Iterative Solver for Strongly Over- or Under-Determined Systems. X. Meng, M. A. Saunders, and M. W. Mahoney. SIAM J. on Scientific Computing, 2014.
- Numerical Linear Algebra in the Streaming Model. K. L. Clarkson and D. P. Woodruff. STOC, 2009.
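A sketch of the basic randomized range-finder + SVD prototype from the Halko–Martinsson–Tropp paper above, with Gaussian sketching and oversampling (no power iterations; sizes are illustrative):

```python
# Sketch of the randomized range-finder + SVD prototype (Gaussian
# sketching with oversampling p; no power iterations).
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + p))  # test matrix
    Q, _ = np.linalg.qr(A @ Omega)                    # range basis
    B = Q.T @ A                                       # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 300))
U, s, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A))
```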
Week 5
Lectures 8 & 9 (Tu 2/19 & Th 2/21): Tensor eigenvectors and decompositions
Topics: problems with border rank, ill-posedness, and Eckart–Young analogs; best rank-1 approximation and connections to tensor eigenvectors; Z-eigenpairs and H-eigenpairs; CP and Tucker decompositions (a short code sketch follows the readings)
Readings (broad ideas):
- Tensor Decompositions and Applications. T. G. Kolda and B. W. Bader. SIAM Review, 2009.
- Tensor Decompositions, State of the Art and Applications. P. Comon. Mathematics in Signal Processing, 2002.
- Most Tensor Problems Are NP-Hard. C. J. Hillar, L.-H. Lim. JACM, 2013.
- Tensors and Hypermatrices. L.-H. Lim. Chap. 15, Handbook of Linear Algebra, 2nd Ed., 2013.
- Tensor Rank and the Ill-Posedness of the Best Low-Rank Approximation Problem. V. de Silva and L.-H. Lim. SIAM J. on Matrix Analysis and Applications, 2008.
- A Counterexample to the Possibility of an Extension of the Eckart–Young Low-Rank Approximation Theorem for the Orthogonal Rank Tensor Decomposition. T. G. Kolda. SIAM J. on Matrix Analysis and Applications, 2003.
- Orthogonal Tensor Decompositions. T. G. Kolda. SIAM J. on Matrix Analysis and Applications, 2001.
- Tensor Rank is NP-complete. J. Håstad. J. of Algorithms, 1990.
- On the Best Rank-1 and Rank-(R1, R2,...,RN) Approximation of Higher-Order Tensors. L. De Lathauwer, B. De Moor, and J. Vandewalle. SIAM J. on Matrix Analysis and Applications, 2000.
- On the Best Rank-1 Approximation of Higher-Order Supersymmetric Tensors. E. Kofidis and P. A. Regalia. SIAM J. on Matrix Analysis and Applications, 2002.
- Generalized Canonical Polyadic Tensor Decomposition. D. Hong, T. G. Kolda, J. A. Duersch. arXiv, 2018.
- On best rank one approximation of tensors. S. Friedland, V. Mehrmann, R. Pajarola, and S. K. Suter. Numerical Linear Algebra with Applications, 2013.
- Singular values and eigenvalues of tensors: a variational approach. L.-H. Lim. CAMSAP, 2005.
- Eigenvalues of a real supersymmetric tensor. L. Qi. J. of Symbolic Computation, 2005.
- Tensors and Their Eigenvectors. B. Sturmfels, Notices of the AMS, 2016.
- Three hypergraph eigenvector centralities. A. R. Benson. arXiv, 2018.
- Learning Binary Latent Variable Models: A Tensor Eigenpair Approach. A. Jaffe et al. ICML, 2018.
- Unsupervised Discovery of Demixed, Low-dimensional Neural Dynamics across Multiple Timescales through Tensor Components Analysis. A. H. Williams et al. Neuron, 2018.
- Tensor Decompositions for Learning Latent Variable Models. A. Anandkumar et al. JMLR, 2014.
- Parallel Tensor Compression for Large-Scale Scientific Data. W. Austin, G. Ballard, and T. G. Kolda. IPDPS, 2016.
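A hedged sketch of the symmetric higher-order power method (cf. the Kofidis–Regalia paper above) for a tensor Z-eigenpair T x^2 = lambda x; this unshifted variant is not guaranteed to converge in general, but behaves well on nonnegative tensors like this random example:

```python
# Hedged sketch: higher-order power iteration for a Z-eigenpair
# T x^2 = lambda x of a symmetric 3-mode tensor. This unshifted
# variant is not guaranteed to converge in general.
import numpy as np

rng = np.random.default_rng(6)
T = rng.random((10, 10, 10))
T = sum(T.transpose(p) for p in
        [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]) / 6

x = np.ones(10) / np.sqrt(10)
for _ in range(500):
    y = np.einsum('ijk,j,k->i', T, x, x)   # apply x -> T x^2
    x = y / np.linalg.norm(y)

lam = x @ np.einsum('ijk,j,k->i', T, x, x)
print(lam)
```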
Week 6
No lecture Tu 2/26 (February break)
Lecture 10 (Th 2/28): Nonlinear dimensionality reduction
Topics: ISOMAP, LLE, (t-)SNE (a short code sketch follows the readings)
Readings:
- Linear Dimensionality Reduction: Survey, Insights, and Generalizations. J. P. Cunningham and Z. Ghahramani. JMLR, 2015.
- A Global Geometric Framework for Nonlinear Dimensionality Reduction. J. B. Tenenbaum, V. de Silva, J. C. Langford. Science, 2000.
- Nonlinear Dimensionality Reduction by Locally Linear Embedding. S. T. Roweis and L. K. Saul. Science, 2000.
- Stochastic Neighbor Embedding. G. Hinton and S. Roweis. NeurIPS, 2002.
- Visualizing Data using t-SNE. L. van der Maaten and G. Hinton. JMLR, 2008.
- How to Use t-SNE Effectively. M. Wattenberg, F. Viégas, and I. Johnson. Distill, 2016.
- Autoencoders. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning (Chapter 14), 2016.
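A rough ISOMAP sketch following the Tenenbaum–de Silva–Langford paper above: build a kNN graph, estimate geodesic distances by shortest paths, then apply classical MDS (the spiral data and the choice k = 8 are purely illustrative):

```python
# Rough ISOMAP sketch: kNN graph -> shortest-path (geodesic) distances
# -> classical MDS. Data is a noiseless 2-D spiral; k is illustrative.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 3 * np.pi, 300))          # 1-D latent parameter
X = np.column_stack([t * np.cos(t), t * np.sin(t)])  # embedded spiral

D = squareform(pdist(X))
k = 8
W = np.full_like(D, np.inf)                          # inf = no edge
idx = np.argsort(D, axis=1)[:, 1:k + 1]              # k nearest neighbors
for i, nbrs in enumerate(idx):
    W[i, nbrs] = D[i, nbrs]
G = shortest_path(W, method='D', directed=False)     # geodesic estimates

n = G.shape[0]                                       # classical MDS on G
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
evals, evecs = np.linalg.eigh(B)
Y = evecs[:, -1] * np.sqrt(evals[-1])                # 1-D embedding
print(np.corrcoef(Y, t)[0, 1])                       # near +/- 1
```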
Week 7 [HW1 due, Th 3/7 at 11:59pm ET]
Lectures 11 & 12 (Tu 3/5 & Th 3/7): Basic network analysis, structure, heavy tails, random graph models
Topics: adjacency matrix, graph Laplacian, random walk matrix, basic spectral graph theory, sparsity, hop distances, clustering, heavy tails and degree distributions, Erdős–Rényi, configuration models, stochastic block models (a short code sketch follows the readings)
Readings (survey):
- The Structure and Function of Complex Networks. M. E. J. Newman. SIAM Review, 2003.
- Mining Large Graphs. D. F. Gleich and M. W. Mahoney. Handbook of Big Data, Handbooks of modern statistical methods, 2016.
- Dan Spielman's spectral graph theory course.
- Eigenvalues of graphs. L. Lovász. 2007.
- Revisiting Power-law Distributions in Spectra of Real World Networks. N. Eikmeier and D. F. Gleich. KDD, 2017.
- Mathematics of Networks. M. E. J. Newman. The New Palgrave Dictionary of Economics, 2/e, 2008.
- Random walks and diffusion on networks. N. Masuda, M. A. Porter, and R. Lambiotte. Physics Reports, 2017.
- The Anatomy of the Facebook Social Graph. J. Ugander et al. arXiv, 2011.
- What is Twitter, a Social Network or a News Media? H. Kwak et al. WWW, 2010.
- Planetary-Scale Views on a Large Instant-Messaging Network. J. Leskovec and E. Horvitz. WWW, 2008.
- Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. J. Leskovec, J. Kleinberg, and C. Faloutsos. KDD, 2005.
- Collective dynamics of ‘small-world’ networks. D. J. Watts and S. H. Strogatz. Nature, 1998.
- An Experimental Study of the Small World Problem. J. Travers and S. Milgram. Sociometry, 1969.
- Scale-free networks are rare. A. D. Broido and A. Clauset. Nature Communications, 2019.
- Tracing information flow on a global scale using Internet chain-letter data. D. Liben-Nowell and J. Kleinberg. PNAS, 2008.
- Using selection bias to explain the observed structure of Internet diffusions. B. Golub and M. O. Jackson. PNAS, 2010.
- The structure of positive interpersonal relations in small groups. J. A. Davis and S. Leinhardt. Sociological Theories in Progress, Vol. 2, 1967.
- A Brief History of Generative Models for Power Law and Lognormal Distributions. M. Mitzenmacher. Internet Mathematics, 2004.
- Power-Law Distributions in Empirical Data. A. Clauset, C. R. Shalizi, and M. E. J. Newman. SIAM Review, 2009.
- Power laws, Pareto distributions and Zipf’s law. M. E. J. Newman. Contemporary Physics, 2005.
- Configuring Random Graph Models with Fixed Degree Sequences. B. K. Fosdick, D. B. Larremore, J. Nishimura, J. Ugander. SIAM Review, 2018.
- Stochastic blockmodels and community structure in networks. B. Karrer and M. E. J. Newman. PRE, 2011.
- Mixed Membership Stochastic Blockmodels. E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. JMLR, 2008.
- Affiliation Networks. S. Lattanzi and D. Sivakumar. STOC, 2009.
- Coin-flipping, ball-dropping, and grass-hopping for generating random graphs from matrices of edge probabilities. A. S. Ramani, N. Eikmeier, and D. F. Gleich. SIAM Review, 2019 (to appear).
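A quick sketch to contrast with the heavy-tail discussion above: sampling an Erdős–Rényi graph G(n, p), whose degrees are binomial rather than heavy-tailed (sizes are illustrative):

```python
# Quick sketch: sample an Erdos-Renyi graph G(n, p); its degrees are
# binomial, in contrast to heavy-tailed empirical degree distributions.
import numpy as np

rng = np.random.default_rng(8)
n, p = 2000, 0.005
A = np.triu(rng.random((n, n)) < p, k=1)   # i.i.d. edges above diagonal
A = (A | A.T).astype(int)                  # symmetrize, no self-loops

deg = A.sum(axis=1)
print(deg.mean(), (n - 1) * p)             # empirical vs. expected degree
```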
Week 8
Lectures 13 & 14 (Tu 3/12 & Th 3/14): Learning on graphs: unsupervised network clustering and community detection
Topics: Spectral methods for bisection, ratio cut, normalized cut, and modularity (a short code sketch follows the readings and demos)
Readings (spectral methods):
- A tutorial on spectral clustering. Ulrike von Luxburg. Statistics and Computing, 2007.
- Mining Large Graphs (Section 7.3). D. F. Gleich and M. W. Mahoney. Handbook of Big Data, Handbooks of modern statistical methods, 2016.
- Conductance, the Normalized Laplacian, and Cheeger’s Inequality. D. A. Spielman. Lecture notes, 2015.
- On Clusterings: Good, Bad and Spectral. R. Kannan, S. Vempala, and A. Vetta. JACM, 2004.
- On Spectral Clustering: Analysis and an Algorithm. A. Y. Ng, M. I. Jordan, and Y. Weiss. NeurIPS, 2001.
- Kernel k-means, Spectral Clustering and Normalized Cuts. I. S. Dhillon, Y. Guan, and B. Kulis. KDD, 2004.
- Modularity and community structure in networks. M. E. J. Newman. PNAS, 2006.
- Spectral methods for community detection and graph partitioning. M. E. J. Newman. PRE, 2013.
- Simple, direct and efficient multi-way spectral clustering. A. Damle, V. Minden, and L. Ying. Information and Inference, 2018.
- Co-clustering documents and words using Bipartite Spectral Graph Partitioning. I. S. Dhillon. KDD, 2001.
- Fixing two weaknesses of the Spectral Method. K. J. Lang. NeurIPS, 2006.
- Communities in Networks. M. A. Porter, J.-P. Onnela, and P. J. Mucha. Notices of the AMS, 2009.
- Community detection in networks: A user guide. S. Fortunato and D. Hric. Physics Reports, 2016.
- The Computer Science and Physics of Community Detection: Landscapes, Phase Transitions, and Hardness. C. Moore. arXiv, 2017.
- Community Detection and Stochastic Block Models: Recent Developments. E. Abbe. JMLR, 2018.
- The ground truth about metadata and community detection in networks. L. Peel, D. B. Larremore, and A. Clauset. Science Advances, 2017.
Notes and demos:
- Lecture 13
- Jupyter Notebook on spectral clustering (part I)
- Lecture 14
- Jupyter Notebook on spectral clustering (part II)
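Separate from the demo notebooks above, a minimal sketch of spectral bisection: split a graph by the sign pattern of the Fiedler vector of the Laplacian L = D - A (a toy planted-bisection graph is used):

```python
# Sketch of spectral bisection: split by the sign of the Fiedler vector
# (eigenvector of the 2nd-smallest eigenvalue of L = D - A). The graph
# is two 10-node cliques joined by a single edge.
import numpy as np

n = 20
A = np.zeros((n, n), dtype=int)
A[:10, :10] = 1
A[10:, 10:] = 1
np.fill_diagonal(A, 0)
A[9, 10] = A[10, 9] = 1                      # bridge between the cliques

L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)
fiedler = evecs[:, 1]
print(np.where(fiedler < 0)[0], np.where(fiedler >= 0)[0])  # the two cliques
```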
Week 9
Lecture 15 (Tu 3/19): Graph-based semi-supervised learning
Topics: graph-based semi-supervised techniques and connections to random walks and spectral clustering (a short code sketch follows the readings)
Readings:
- Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. X. Zhu, Z. Ghahramani, and J. Lafferty. ICML, 2003.
- Transductive Learning via Spectral Graph Partitioning. T. Joachims. ICML, 2003.
- Learning with Local and Global Consistency. D. Zhou et al. NeurIPS, 2004.
- Using Local Spectral Methods to Robustify Graph-Based Learning Algorithms. D. F. Gleich and M. W. Mahoney. KDD, 2015.
- Monophily in social networks introduces similarity among friends-of-friends. K. M. Altenburger and J. Ugander. Nature Human Behaviour, 2018.
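A minimal sketch of the harmonic-function method from the Zhu–Ghahramani–Lafferty paper above: fix the labels f_l on the labeled nodes and solve L_uu f_u = W_ul f_l for the unlabeled nodes (a toy path graph):

```python
# Sketch of the harmonic-function method: fix labels f_l on labeled
# nodes and solve L_uu f_u = W_ul f_l for the rest (toy path graph).
import numpy as np

n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0          # path 0 - 1 - ... - 5
L = np.diag(W.sum(axis=1)) - W

labeled, f_l = np.array([0, 5]), np.array([0.0, 1.0])
unlabeled = np.array([1, 2, 3, 4])
L_uu = L[np.ix_(unlabeled, unlabeled)]
W_ul = W[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, W_ul @ f_l)
print(f_u)                                   # 0.2, 0.4, 0.6, 0.8
```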
Lecture 16 (Th 3/21): Node embeddings
Topics: node embeddings, latent space models, role discovery, node2vec (a short code sketch follows the readings)
Readings:
- Latent Space Approaches to Social Network Analysis. Hoff, Raftery, and Handcock. JASA, 2002.
- RolX: Structural Role Extraction & Mining in Large Graphs. K Henderson et al. KDD, 2012.
- DeepWalk: Online Learning of Social Representations. B. Perozzi, R. Al-Rfou, and S. Skiena. KDD, 2014.
- LINE: Large-scale Information Network Embedding. J. Tang et al. WWW, 2015.
- node2vec: Scalable Feature Learning for Networks. A. Grover and J. Leskovec. KDD, 2016.
- Representation Learning on Graphs: Methods and Applications. W. L. Hamilton, R. Ying, and J. Leskovec. IEEE Data Engineering Bulletin, 2017.
- Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. J. Qiu et al. WSDM, 2018.
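A sketch of the first stage of DeepWalk/node2vec-style embeddings: generating truncated random walks over the graph (the subsequent skip-gram training is omitted, and all parameters are illustrative):

```python
# Sketch of the first stage of DeepWalk-style embeddings: truncated
# random walks to feed a skip-gram model (training itself omitted).
import numpy as np

def random_walks(A, num_walks=10, walk_len=8, seed=0):
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(num_walks):
        for start in range(A.shape[0]):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = np.flatnonzero(A[walk[-1]])
                if nbrs.size == 0:
                    break                    # dead end: truncate the walk
                walk.append(int(rng.choice(nbrs)))
            walks.append(walk)
    return walks

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
print(random_walks(A, num_walks=1))
```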
Week 10 [Reaction paper due Th 3/28 at 11:59pm ET]
Lectures 17 & 18 (Tu 3/26 & Th 3/28): Small patterns in networks
Topics: motifs, graphlets, and small subgraphs (a short code sketch follows the readings)
Readings (foundations):
- Network Motifs: Simple Building Blocks of Complex Networks. R. Milo et al. Science, 2002.
- Superfamilies of Evolved and Designed Networks. R. Milo et al. Science, 2004.
- Motifs in temporal networks. A. Paranjape, A. R. Benson, and J. Leskovec. WSDM, 2017.
- Higher-order organization of complex networks. A. R. Benson, D. F. Gleich, and J. Leskovec. Science, 2016.
- Higher-order clustering in networks. H. Yin, A. R. Benson, and J. Leskovec. PRE, 2018.
- Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections. J. Ugander, L. Backstrom, and J. Kleinberg. WWW, 2013.
- The structure of positive interpersonal relations in small groups. J. A. Davis and S. Leinhardt. Sociological Theories in Progress, Vol. 2, 1967.
- Modeling interactome: scale-free or geometric?. N. Pržulj, D. G. Corneil, and I. Jurisica. Bioinformatics, 2004.
- Structure and function of the feed-forward loop network motif. S. Mangan and U. Alon. PNAS, 2003.
- The Coherent Feedforward Loop Serves as a Sign-sensitive Delay Element in Transcription Networks. S. Mangan, A. Zaslaver, and U. Alon. J. of Molecular Biology, 2003.
- Counting Triangles. T. Roughgarden. Stanford CS167 lecture notes, 2014.
- Arboricity and Subgraph Listing Algorithms. N. Chiba and T. Nishizeki. SIAM J. on Computing, 1985.
- Main-memory triangle computations for very large (sparse (power-law)) graphs. M. Latapy. Theoretical Computer Science, 2008.
- Triadic Measures on Graphs: The Power of Wedge Sampling. C. Seshadhri, A. Pinar, and T. G. Kolda. SDM, 2013.
- Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts. M. Jha, C. Seshadhri, and A. Pinar. WWW, 2015.
- ESCAPE: Efficiently Counting All 5-Vertex Subgraphs. A. Pinar, C. Seshadhri, and V. Vishal. WWW, 2017.
- Sampling Methods for Counting Temporal Motifs. P. Liu, A. R. Benson, and M. Charikar. WSDM, 2019.
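A tiny example of exact triangle counting via trace(A^3)/6, as in the Roughgarden notes above; the listed papers cover estimators that scale to large sparse graphs:

```python
# Tiny example: exact triangle counting via trace(A^3) / 6 (fine for
# small dense graphs; see the papers above for scalable estimators).
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])               # triangles: (0,1,2) and (0,2,3)
print(np.trace(A @ A @ A) // 6)            # prints 2
```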
Week 11
No lecture Tu 4/2 or Th 4/4 (spring break)
Week 12 [Project proposal due Th 4/11 at 11:59pm ET]
No lecture Tu 4/9 [Austin is in DC]
Lecture 19 (Th 4/11): Ranking and network centrality
Topics: Katz, eigenvector, PageRank, and HITS (a short code sketch follows the readings)
Readings (original papers):
- A new status index derived from sociometric analysis. L. Katz. Psychometrika, 1953.
- Technique for Analyzing Overlapping Memberships. P. Bonacich. Sociological Methodology, 1972.
- The PageRank Citation Ranking: Bringing Order to the Web. L. Page, S. Brin, R. Motwani and T. Winograd. Stanford Technical Report, 1998.
- Authoritative Sources in a Hyperlinked Environment. J. M. Kleinberg. SODA, 1998.
- PageRank beyond the Web. D. F. Gleich. SIAM Review, 2015.
- Link Analysis, Eigenvectors and Stability. A. Y. Ng, A. X. Zheng, and M. I. Jordan. IJCAI, 2001.
- Ranking hubs and authorities using matrix functions. M. Benzi, E. Estrada, and C. Klymko. Linear Algebra and its Applications, 2013.
- Axioms for Centrality. P. Boldi and S. Vigna. Internet Mathematics, 2013.
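A minimal PageRank sketch by power iteration, x <- alpha P^T x + (1 - alpha) v, with uniform teleportation (a toy 4-node graph; the damping value alpha = 0.85 is the conventional choice):

```python
# Minimal PageRank by power iteration: x <- alpha P^T x + (1-alpha) v,
# with a uniform teleportation vector v (toy 4-node graph).
import numpy as np

A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # A[i, j] = 1 for edge i -> j
P = A / A.sum(axis=1, keepdims=True)       # row-stochastic random walk
alpha = 0.85
v = np.ones(4) / 4
x = v.copy()
for _ in range(100):
    x = alpha * P.T @ x + (1 - alpha) * v
print(x)                                   # sums to 1
```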
Week 13 [HW2 due, Th 4/18 at 11:59pm ET]
No lecture Tu 4/16 [Austin is in DC]
Lecture 20 (Th 4/18): Ranking and network centrality in hypergraphs
Topics: Extensions of PageRank and eigenvector centrality to hypergraphs (a short code sketch follows the readings)
Main readings:
- Multilinear PageRank. D. F. Gleich, L.-H. Lim, and Y. Yu. SIAM J. on Matrix Analysis and Applications, 2015.
- Three hypergraph eigenvector centralities. A. R. Benson. arXiv, 2018.
- Multi-Dimensional, Multilayer, Nonlinear and Dynamic HITS. Francesca Arrigo, Francesco Tudisco. arXiv, 2018.
- The Spacey Random Walk: A Stochastic Process for Higher-Order Data. A. R. Benson, D. F. Gleich, and L.-H. Lim. SIAM Review, 2017.
- Some variational principles for Z-eigenvalues of nonnegative tensors. K. C. Chang, K. J. Pearson, and T. Zhang. Linear Algebra and its Applications, 2013.
- Computing tensor Z-eigenvectors with dynamical systems. A. R. Benson and D. F. Gleich. arXiv, 2018.
- Finding the largest eigenvalue of a nonnegative tensor. M. Ng, L. Qi, and G. Zhou. SIAM J. on Matrix Analysis and Applications, 2010.
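A hedged sketch of the multilinear PageRank iteration from the Gleich–Lim–Yu paper above, x <- alpha P(x, x) + (1 - alpha) v for a stochastic 3-mode tensor; the fixed-point iteration is known to converge for alpha < 1/2 in the order-3 case (the tensor here is random and illustrative):

```python
# Hedged sketch of the multilinear PageRank iteration
# x <- alpha P(x, x) + (1 - alpha) v for a stochastic 3-mode tensor;
# fixed-point iteration converges for alpha < 1/2 when m = 3.
import numpy as np

rng = np.random.default_rng(9)
n = 5
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)          # stochastic along mode 1

alpha = 0.4
v = np.ones(n) / n
x = v.copy()
for _ in range(200):
    x = alpha * np.einsum('ijk,j,k->i', P, x, x) + (1 - alpha) * v
print(x, x.sum())                          # stochastic solution vector
```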
Week 14
Lecture 21 (Tu 4/23): Special topics: Kernels and function approximation
Topics: feature maps, kernel trick, RKHS, Gaussian Processes (a short code sketch follows the readings)
Readings:
- Gaussian Processes for Machine Learning (Chapters 1 and 2). C. E. Rasmussen and C. K. I. Williams. MIT Press, 2006.
- Kernel Techniques: From Machine Learning to Meshless Methods. R. Schaback and H. Wendland. Acta Numerica, 2006.
- David Bindel's lecture notes (Many interpretations of kernels)
- David Bindel's lecture notes (Gaussian processes and kernel learning)
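A minimal Gaussian-process regression sketch with a squared-exponential kernel, following the standard posterior formulas in Rasmussen–Williams Chapter 2 (the length scale, noise variance, and data are illustrative):

```python
# Minimal GP regression with a squared-exponential kernel, following
# the standard posterior formulas (hyperparameters are illustrative).
import numpy as np

def rbf(X1, X2, ell=0.5):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :])**2 / ell**2)

rng = np.random.default_rng(10)
X = rng.uniform(0, 5, 20)                    # training inputs (1-D)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
Xs = np.linspace(0, 5, 100)                  # test inputs

K = rbf(X, X) + 0.01 * np.eye(20)            # kernel + noise variance
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)            # posterior mean
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
print(mean[:3], np.diag(cov)[:3])            # predictions + uncertainties
```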
Week 15 [Project progress report due Th 5/2 at 11:59pm ET]
Lectures 22 & 23 (Tu 4/30 & Th 5/2): Special topics: Discrete choice
Topics: random utility models, (mixed) logits, IIA, probit, network growth as discrete choice (a short code sketch follows the readings)
Readings:
- Discrete Choice Methods with Simulation. K. Train. Cambridge University Press, 2009.
- On the Relevance of Irrelevant Alternatives. A. R. Benson, R. Kumar, and A. Tomkins. WWW, 2016.
- Choosing to grow a graph: Modeling network formation as discrete choice. J. Overgoor, A. R. Benson, and J. Ugander. WWW, 2019.
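A tiny sketch of multinomial logit choice probabilities; note how the ratio of two alternatives' probabilities depends only on their own utilities, independent of the rest of the choice set (the IIA property discussed above):

```python
# Tiny sketch of multinomial logit choice probabilities:
# P(choose i) = exp(u_i) / sum_j exp(u_j). Probability ratios depend
# only on the two utilities involved (the IIA property).
import numpy as np

def logit_probs(u):
    z = np.exp(u - u.max())        # shift for numerical stability
    return z / z.sum()

u = np.array([1.0, 0.5, -0.2])
p = logit_probs(u)
print(p, p[0] / p[1], np.exp(u[0] - u[1]))   # the last two agree
```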
Week 16
Last day of class (Tu 5/7): Project feedback sessions
Week 17 [Final project report due Wed 5/15 at 4:30pm ET]
Coursework
Coursework will be managed through CMS, and assignments will be submitted there.
The required coursework consists of three components:
- Homework (10% each). There will be two homework assignments in which you implement numerical methods covered in class and use them to analyze datasets.
- Reaction paper (20%). The second component of the course is a (roughly) 5-page reaction paper that summarizes and critiques at least two closely related published papers relevant to the class. The motivation for this assignment is to get everyone thinking about research issues related to the class material and to stimulate ideas for the course project (described below). You can work individually or in groups of two or three. Your group members can be the same as your project partners, but they do not have to be. Additional details will be posted soon.
- Course project (60%). Most of your grade will be determined by a course project on a topic of your choosing that is related to the class. You can work individually or in groups of two or three (the scope of the project should be scaled according to the group size). I will be looking for three key pieces in the course project: (i) some theoretical or mathematical discussion of a numerical method or algorithm (broadly construed); (ii) some software implementation of a method or algorithm; and (iii) some analysis of a real-world dataset. The project is split into three components: a proposal, a progress report with feedback-session attendance, and a final report, worth 1/4, 1/4, and 1/2 of the project grade, respectively. More details will be posted soon.