CS 6241: Numerics for Data Science
Low dimensional structure in function approximation
Introduction
This week, we will consider the case of functions on high-dimensional input spaces.
Our main setting today will be scalar-valued functions ($f : \mathbb{R}^n \to \mathbb{R}$ with $n$ large).
Fortunately, many functions we encounter in practice have additional structure, such as a high degree of smoothness (at least in some directions, or away from some creases or singular points), an effective low-dimensional dependence on the input, or a decomposition as a sparse sum of univariate (or low-dimensional) functions. Many adaptive algorithms for function approximation work because they look for signs of these types of structure and exploit them when they seem to be present. We focus today on a specific case: functions that vary mostly along a low-dimensional subspace of the input space.
Active subspaces
We consider the case where our high-dimensional function $f : \mathbb{R}^n \to \mathbb{R}$ is well approximated as $f(x) \approx g(A^T x)$, where $A \in \mathbb{R}^{n \times k}$ has orthonormal columns, $k \ll n$, and $g$ is a function of $k$ variables. The range of $A$ is called an active subspace for $f$: directions within the subspace account for most of the variation in $f$, while directions orthogonal to it matter little.
Why should we ever expect to see this structure? As a purely empirical matter, this type of low-dimensional subspace comes up often enough that it is worth checking for, and there is some value to that perspective. In other cases, scaling principles or physical invariants suggest ways to reduce the dimensionality (e.g. by non-dimensionalizing a physical problem before applying numerical methods). But in some of the situations where active subspaces are used, there is an a priori reason to expect that such a dimensionality reduction is possible, even though it may not be easy to tease out by analytic techniques like forming dimensionless groups.
For example, one place that active subspaces have been used is in uncertainty quantification involving coefficients appearing in partial differential equations. In these cases, the vector of inputs parameterizes the PDE coefficients, and quantities of interest computed from the solution frequently turn out to depend strongly on only a few directions in parameter space.
To make this a little more concrete, consider the case of a ridge function $f(x) = g(a^T x)$ for some fixed vector $a \in \mathbb{R}^n$. Then $\nabla f(x) = g'(a^T x)\, a$ always points along $a$, so all of the variation in $f$ occurs in the one-dimensional subspace spanned by $a$, and the remaining $n-1$ directions do not matter at all.
Explicit construction
Suppose we believe that $f(x) \approx g(A^T x)$ and that we can compute gradients of $f$. The explicit construction works with the matrix $C = \mathbb{E}[\nabla f(x) \nabla f(x)^T] = \int \nabla f(x) \nabla f(x)^T \rho(x) \, dx$, where $\rho$ is a probability density over the inputs. In practice, we approximate $C$ by a Monte Carlo estimate $\hat{C} = \frac{1}{m} \sum_{j=1}^m \nabla f(x_j) \nabla f(x_j)^T$ with samples $x_j \sim \rho$. If $f$ varies mostly within a $k$-dimensional subspace, the eigenvalues of $C$ beyond the first $k$ are small, and the dominant $k$ eigenvectors of $\hat{C}$ give an estimated orthonormal basis $A$ for the active subspace.
When explicit gradients are not available, one can approximate them by computing local linear models (using finite differences or a local least squares fit).
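As a concrete illustration, here is a minimal sketch of this construction in Python. The helper names (`active_subspace`, `grad_f`, `sample_x`) and the ridge-function test problem are assumptions made for the example, not part of these notes; the point is just the pattern: sample gradients, form the empirical matrix $\hat{C}$, and keep its dominant eigenvectors.

```python
# A minimal sketch of the explicit construction, assuming we can evaluate
# the gradient of f at sampled points.  Names and the test problem below
# are illustrative assumptions.
import numpy as np

def active_subspace(grad_f, sample_x, m=200, k=2, rng=None):
    """Estimate a k-dimensional active subspace from m sampled gradients.

    grad_f   : callable returning the gradient of f at a point x
    sample_x : callable returning a random input point (drawn from rho)
    Returns (A, eigvals): A has orthonormal columns spanning the estimated
    active subspace; eigvals are the eigenvalues of the empirical matrix C.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.array([sample_x(rng) for _ in range(m)])
    G = np.array([grad_f(x) for x in xs])      # m-by-n matrix of gradients
    C = G.T @ G / m                            # C ~ E[grad f(x) grad f(x)^T]
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns ascending order
    idx = np.argsort(eigvals)[::-1]            # sort descending
    return eigvecs[:, idx[:k]], eigvals[idx]

# Example: a ridge function f(x) = sin(a^T x) in n = 10 dimensions.
n = 10
a = np.linspace(1.0, 2.0, n)
grad_f = lambda x: np.cos(a @ x) * a
sample_x = lambda rng: rng.standard_normal(n)
A, lam = active_subspace(grad_f, sample_x, k=1)
# lam should show one dominant eigenvalue; A[:, 0] is (up to sign) parallel to a.
```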
Using active subspaces
When we have an active subspace, what can we do with it? Briefly, the answer might be “anything we would be happy to do with a low-dimensional function.” Specifically, if we believe $f(x) \approx g(A^T x)$, we can work with the low-dimensional function $g$ to:
Compute a surrogate or response surface to approximate the function at new points (a small sketch follows below). These surrogates often involve kernel methods of the type we will discuss next week.
Compute sample-efficient quadratures (e.g. for expected values and variances).
Solve optimization problems, which are generally easier in lower-dimensional spaces.
The observant reader might notice that all three of these tasks (function approximation, quadrature, and optimization) are examples we gave earlier in the notes of the “curse of dimensionality,” so it is not surprising that dimension reduction on the input space is useful in each case.
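As a small illustration of the first item above (building a surrogate on the reduced coordinates), here is a sketch: given a basis $A$ for the active subspace and samples $(x_j, f(x_j))$, we fit a simple least-squares model in $y = A^T x$. The quadratic model below is an arbitrary illustrative choice, not a recommendation; the kernel methods we discuss next week are a more flexible alternative.

```python
# A sketch of fitting a surrogate on the reduced coordinates y = A^T x,
# assuming A (n-by-k, e.g. from the construction above) and samples of f.
# The quadratic least-squares model is an illustrative choice.
import numpy as np

def fit_quadratic_surrogate(A, xs, fs):
    """Fit g(y) = c0 + sum_i c_i y_i + sum_{i<=j} c_ij y_i y_j to (A^T x, f(x))."""
    Y = xs @ A                                 # m-by-k reduced coordinates
    m, k = Y.shape
    # Design matrix with columns [1, y_i, y_i * y_j for i <= j].
    cols = [np.ones(m)] + [Y[:, i] for i in range(k)]
    cols += [Y[:, i] * Y[:, j] for i in range(k) for j in range(i, k)]
    V = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(V, fs, rcond=None)

    def g(x):
        """Evaluate the surrogate at a new full-dimensional point x."""
        y = A.T @ x
        feats = [1.0] + [y[i] for i in range(k)]
        feats += [y[i] * y[j] for i in range(k) for j in range(i, k)]
        return float(np.dot(coef, feats))
    return g
```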
Implicit construction
So far, we have described how we might explicitly compute a low-dimensional subspace that describes variation in a function. But if we believe such a structure exists, we can also attempt to use it implicitly. This is the idea behind the REMBO (Random Embedding for Bayesian Optimization) and REGO (Random Embeddings for Global Optimization) algorithms. In the REGO algorithm, for example, one solves a sequence of reduced problems of the form $\min_{y \in \mathcal{Y}} f(Ay)$, where $A \in \mathbb{R}^{n \times k}$ is a random embedding matrix (e.g. with independent standard normal entries) and $\mathcal{Y} \subset \mathbb{R}^k$ is a low-dimensional search domain.
An important point about this implicit approach is that the projected subproblems do not necessarily require gradient information, and can be solved (for modest reduced dimension $k$) with derivative-free methods such as Bayesian optimization or direct search.
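Here is a minimal sketch of the random-embedding idea, in the spirit of REGO/REMBO but not a faithful reproduction of either algorithm; the function name `random_embedding_min`, the restart loop, and the choice of Nelder-Mead are assumptions made for illustration. We draw a random embedding matrix $A$, hand the reduced objective $y \mapsto f(Ay)$ to a derivative-free optimizer, and map the best reduced point back to $\mathbb{R}^n$.

```python
# A sketch of optimization through a random embedding, in the spirit of
# REGO/REMBO but not a faithful reproduction of either algorithm.
import numpy as np
from scipy.optimize import minimize

def random_embedding_min(f, n, k=2, restarts=5, rng=None):
    """Minimize f over R^n by minimizing y |-> f(A y) for random A in R^{n x k}."""
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_val = None, np.inf
    for _ in range(restarts):
        A = rng.standard_normal((n, k))        # random embedding matrix
        res = minimize(lambda y: f(A @ y), np.zeros(k), method="Nelder-Mead")
        if res.fun < best_val:
            best_val, best_x = res.fun, A @ res.x
    return best_x, best_val

# Example: a 100-dimensional function that really depends on two coordinates.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
x_best, val_best = random_embedding_min(f, n=100, k=2)
```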
Beyond subspaces
In the past couple of years, there have been a few papers that push beyond the idea of finding an active linear subspace to finding a nonlinear analogue (referred to by one group of authors2 as an active manifold, and by another3 as a kernel-based active subspace). In both cases, instead of writing $f(x) \approx g(A^T x)$ with a linear map $x \mapsto A^T x$, one writes $f(x) \approx g(h(x))$, where $h$ maps the input space nonlinearly to a small number of coordinates.
One of the distinct advantages of the active subspace approach is that it’s easy to analytically write down the (approximate) level sets. This is significantly more complicated in the nonlinear variants of the problem.
Footnotes
1. As should happen if the two operators in question are discretizations of a differential operator and a relatively compact perturbation, for those with some functional analysis background.↩︎
2. Active Manifolds: a non-linear analogue to Active Subspaces, Bridges, Gruber, Felder, Verma, Hoff, 2019.↩︎
3. Kernel-based Active Subspaces with application to CFD parametric problems using Discontinuous Galerkin method, Romor, Tezzele, Lario, Rozza, 2020.↩︎