CS 6241: Numerics for Data Science
Low dimensional structure in function approximation
Introduction
Last time, we discussed how to seek low-dimensional structure in the input space for functions of many variables. This time, we consider low-dimensional structure in the output space: functions whose values are high-dimensional vectors (for example, the solution vector of a large linear system that depends on a few parameters).
Of course, it is also possible to combine the techniques to deal with high-dimensional input and output spaces, and this is frequently done in practice.
Interpolation and vector-valued functions
If $f : \Omega \rightarrow \mathbb{R}^N$ maps a low-dimensional parameter domain to a high-dimensional output space, a natural way to approximate it is to interpolate through samples $f(x_1), \ldots, f(x_m)$:
$$\hat{f}(x) = \sum_{j=1}^m f(x_j) L_j(x).$$
The Lagrange functions, we might recall, have the property that $L_j(x_i) = \delta_{ij}$, so the interpolant reproduces the sampled values: $\hat{f}(x_i) = f(x_i)$.
The approximations produced this way always lie in the span of the snapshots $f(x_1), \ldots, f(x_m)$.
Independent of the exact space of interpolating functions we use, then, the quality of the approximation is limited by how well this span captures the values of $f$ elsewhere in the domain.
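As a small illustration (not from the original notes), here is a Python sketch of interpolating a vector-valued function of one parameter using polynomial Lagrange cardinal functions; the test function, node locations, and dimensions are all invented for the example.

```python
import numpy as np

def lagrange_weights(xs, x):
    """Evaluate the Lagrange cardinal functions L_j(x) for nodes xs at a point x."""
    L = np.ones(len(xs))
    for j, xj in enumerate(xs):
        for i, xi in enumerate(xs):
            if i != j:
                L[j] *= (x - xi) / (xj - xi)
    return L

# Hypothetical vector-valued test function f : [0, 1] -> R^N
N = 50
t = np.linspace(0, np.pi, N)
f = lambda x: np.sin((1 + x) * t)

xs = np.linspace(0, 1, 6)                           # sample (snapshot) points
snapshots = np.column_stack([f(x) for x in xs])     # N-by-m snapshot matrix

x_new = 0.37
fhat = snapshots @ lagrange_weights(xs, x_new)      # always in span of the snapshots
print(np.linalg.norm(fhat - f(x_new)))
```

Whatever interpolation scheme we use, the last line of the sketch makes the key point explicit: the approximation is a linear combination of the snapshot columns.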
Learning from snapshots
If we are going to build a good approximating subspace from evaluations of $f$, we need to decide where those evaluations should be. Some standard options:
- Choose points on some type of regular grid.
- Choose points at random according to some distribution (e.g. uniform over the domain).
- Choose points according to a low-discrepancy sequence.
- Choose points according to an experimental design method (e.g. using a Latin hypercube).
Sampling on regular grids must be done with some care to avoid aliasing issues when $f$ has oscillatory behavior at frequencies the grid spacing cannot resolve.
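For reference, here is a short sketch (not from the notes) of generating each type of sample set on the unit square using NumPy and SciPy's quasi-Monte Carlo module; the point counts and seeds are arbitrary.

```python
import numpy as np
from scipy.stats import qmc

n, rng = 64, np.random.default_rng(0)

# Regular grid (8 x 8 tensor grid on the unit square).
g = np.linspace(0.0, 1.0, 8)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

# Independent uniform random draws.
random_pts = rng.uniform(size=(n, 2))

# Low-discrepancy (Halton) sequence.
halton = qmc.Halton(d=2, scramble=True, seed=0).random(n)

# Latin hypercube design.
lhs = qmc.LatinHypercube(d=2, seed=0).random(n)
```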
Frequently, when we look at a collection of snapshots, we will discover that they have some redundancy to them. In fact, we hope they do! The POD (or proper orthogonal decomposition) approach to reducing a snapshot collection basically involves an SVD: we assemble the snapshots as the columns of a matrix, compute its singular value decomposition, and keep the leading left singular vectors as an orthonormal basis for the approximating subspace, with the singular values telling us how much each direction matters.
Taking the SVD of a snapshot matrix can be seen as a discrete approximation of computing the eigendecomposition of an associated covariance operator.
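A minimal sketch of the POD construction, assuming the snapshots are already stacked as the columns of a NumPy array (the tolerance and the synthetic snapshot family below are invented for the example):

```python
import numpy as np

def pod_basis(F, tol=1e-8):
    """Orthonormal POD basis from a snapshot matrix F (one snapshot per column).

    Keeps the left singular vectors whose singular values exceed a relative
    tolerance; the size of the discarded tail measures snapshot redundancy.
    """
    U, s, _ = np.linalg.svd(F, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k], s

# Synthetic example: 200-dimensional snapshots of a smooth one-parameter family.
t = np.linspace(0, 1, 200)
xs = np.linspace(0, 1, 30)
F = np.column_stack([np.sin(2 * np.pi * (1 + x) * t) for x in xs])
V, s = pod_basis(F)
print(V.shape, s[:5])
```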
Empirical interpolation method (EIM)
For constructing a low-dimensional space from which to draw approximations, an alternative to proper orthogonal decomposition is the empirical interpolation method (EIM). The EIM algorithm is a greedy method that simultaneously constructs a set of interpolation points (the so-called “magic points”) and a basis for a subspace spanned by associated samples of the function.
For this case, consider building the basis greedily: at each step, we form the interpolant determined by the points and basis functions chosen so far, look for the sample and the point in the domain at which the interpolation error is largest in magnitude, and then add the scaled error function to the basis and the maximizing point to the interpolation set.
Those readers who are aficionados of numerical linear algebra may recognize another algorithm lurking in the shadows. To make this more explicit, note that the interpolant at step $k$ leaves an error that vanishes at the points chosen so far; this error plays the role of a Schur complement, choosing the next point and sample to maximize its magnitude is a complete pivoting strategy, and subtracting the resulting scaled rank-one correction is an elimination step.
If the sets of candidate points and candidate samples are finite (that is, if we work with a snapshot matrix whose rows are indexed by points and whose columns are indexed by samples), then this construction is exactly Gaussian elimination with complete pivoting applied to that matrix.
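A sketch of this discrete version in Python follows; the snapshot matrix `F`, the target rank `k`, and the helper names are placeholders invented for the illustration.

```python
import numpy as np

def eim_gecp(F, k):
    """Greedy EIM basis construction on a snapshot matrix F (points by samples).

    Each step picks the largest-magnitude entry of the current residual
    (complete pivoting), adds the scaled residual column to the basis, records
    the pivot row as a magic point, and does a rank-one elimination update.
    Assumes k does not exceed the rank of F.
    """
    R = np.array(F, dtype=float, copy=True)
    basis, points = [], []
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        u = R[:, j] / R[i, j]          # scaled residual column (new basis vector)
        basis.append(u)
        points.append(i)
        R -= np.outer(u, R[i, :])      # Schur complement (elimination) update
    return np.column_stack(basis), np.array(points)

def eim_interpolate(U, points, g):
    """Interpolate a new vector g: match it at the magic points, reconstruct via U."""
    c = np.linalg.solve(U[points, :], g[points])
    return U @ c
```

Note that each elimination step zeroes out the pivot row of the residual, so the matrix of basis values at the magic points is unit lower triangular in the selection order, and the interpolation system is always solvable.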
For reasons that I don’t entirely understand, the interpretation of the EIM basis construction as Gaussian elimination with complete pivoting (GECP) doesn’t seem to appear in the literature, though a number of authors have come close. The closest I have found to making this explicit is the idea of a “continuous matrix” version of GECP described in a 2014 SIREV paper by Townsend and Trefethen, who comment that “we are not aware of explicit previous discussions of LU factorization of a cmatrix.” A 2016 SISC paper by Drmač and Gugercin suggests the use of GECP in a similar context, but applied to the POD basis, and with an eye only to selecting interpolation points (rather than selecting a subspace basis); for this purpose, the pivoted QR approach that they analyze in detail (leading to the Q-DEIM method) seems preferable.
Extracting solutions
The POD method and the EIM method give us a subspace from which to draw approximations; we still need a way to pick out a particular approximation from that subspace for each new input.
For the sake of concreteness, let’s again consider the problem of approximating the function $x \mapsto u(x)$ defined implicitly by a parameter-dependent linear system $A(x) u(x) = b(x)$, where the system dimension $N$ is large and we seek approximations of the form $\hat{u}(x) = V c(x)$ for a basis matrix $V$ with $k \ll N$ columns. There are several natural ways to choose the coefficients $c(x)$ (a short code sketch of all three follows the list below).
- Least squares: We might choose $c(x)$ to minimize the residual norm $\|A(x) V c - b(x)\|$. Provided we have a basis for our space, this requires forming $A(x) V$ (with $k$ matrix-vector products), solving a least squares problem (in $O(Nk^2)$ time), and then performing a matrix-vector product to reconstruct the solution (in $O(Nk)$ time). If $A(x)$ is dense, the dominant cost is the time to compute $A(x) V$; this may or may not be true if $A(x)$ is sparse.
- Galerkin: We might choose a space of test vectors with basis $W$ and enforce the condition $W^T (A(x) V c - b(x)) = 0$. This is known as a Bubnov-Galerkin condition when $W = V$; otherwise, it is a Petrov-Galerkin condition. Unless $A(x)$ has some special structure, the costs are similar to the least squares approach, with the greatest cost being the formation of $W^T A(x) V$ (the “projected system matrix”), involving $k$ matrix-vector products and an $O(Nk^2)$ cost for a matrix-matrix multiplication. Note that if $A(x)$ is a linear function of $x$ (or a linear combination of a few nonlinear functions of $x$), we might be able to do some pre-computation to get the online cost down to something that depends only on $k$ and not on the full system size.
- Interpolation: We might choose $k$ equations to enforce; the indices of these equations are our “interpolation points” in the discrete setting. In this case, we might be able to avoid the cost of computing $A(x) V$ in full, since we only need a few rows. Note that this is a special case of Galerkin where the test basis consists of columns of the identity associated with the indices for the enforced equations. The challenge for this problem is choosing a good set of interpolation points (in the EIM setting, these are the points selected during the subspace construction).
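Here is the promised sketch of the three options for a single parameter value, assuming a dense NumPy matrix `A`, right-hand side `b`, basis `V`, and (for the interpolation variant) a list of row indices `points`; all the names are placeholders.

```python
import numpy as np

def reduced_lstsq(A, b, V):
    """Least squares: minimize ||A V c - b|| over c, then reconstruct V c."""
    AV = A @ V                                   # k matrix-vector products
    c, *_ = np.linalg.lstsq(AV, b, rcond=None)   # N-by-k least squares solve
    return V @ c

def reduced_galerkin(A, b, V, W=None):
    """(Petrov-)Galerkin: enforce W^T (A V c - b) = 0; Bubnov-Galerkin when W = V."""
    W = V if W is None else W
    c = np.linalg.solve(W.T @ (A @ V), W.T @ b)  # k-by-k projected system
    return V @ c

def reduced_interp(A, b, V, points):
    """Interpolation: enforce only the k equations indexed by `points`."""
    c = np.linalg.solve(A[points, :] @ V, b[points])   # only k rows of A are needed
    return V @ c
```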
The error analysis of each of these methods follows the same general quasi-optimality recipe: we establish a bound on the minimum distance between the true solution and the approximating subspace, and then show that the error in the computed approximation is within a modest (method-dependent) factor of that best-possible error.
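Schematically, and with notation invented for this note, the recipe yields bounds of the form
$$\|u(x) - \hat{u}(x)\| \leq C_{\mathrm{method}} \min_{v \in \operatorname{span}(V)} \|u(x) - v\|,$$
where the constant $C_{\mathrm{method}}$ reflects the conditioning of the least squares, Galerkin, or interpolation construction.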
Addendum: Low-discrepancy sequences
Consider the problem of sampling on the unit hypercube, whether for quadrature or interpolation. One approach to distributing sample points is to take independent draws from the uniform distribution; however, random draws tend to include “clumps” (and leave some parts of the domain empty). What we would really like is a sampling scheme that ensures that points are reasonably spread out (like what happens with grids) and yet is “random-seeming” and not prone to aliasing issues (which is an advantage of random draws).
Discrepancy is a measure of the “clumpiness” of a distribution of sample points. A number of low-discrepancy sequences are available; these include Halton sequences, Sobol sequences, and van der Corput sequences, for example. These sequences can be generated rapidly (they mostly involve simple recurrences), and there is good software available for them. Sampling via low-discrepancy sequences (vs independent random draws) is the key distinction between quasi-Monte Carlo methods and Monte Carlo methods (and the reason that QMC tends to converge faster than standard MC).
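As a quick illustration (not part of the original notes), SciPy's qmc module can both generate these sequences and compute a discrepancy measure, so we can compare independent uniform draws with a scrambled Sobol sequence; the sample size and seeds below are arbitrary.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n, d = 256, 2

uniform = rng.uniform(size=(n, d))                                # independent draws
sobol = qmc.Sobol(d=d, scramble=True, seed=0).random_base2(m=8)   # 2^8 = 256 points

# Lower discrepancy means the points are more evenly spread over the unit square.
print("uniform draws:", qmc.discrepancy(uniform))
print("Sobol sequence:", qmc.discrepancy(sobol))
```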