Introduction
Our goal in the coming three weeks largely involves approximating functions $f : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ by some “simple” family of functions (particularly polynomials and combinations of translates of a radial basis function). If we want to prove something, we will usually assume the domain $\Omega$ is compact and that $f$ has some degree of smoothness or regularity.
Approximation of continuous functions plays a key role in most introductions to analysis, whether classical or numerical. On the classical side, we have the Weierstrass theorem, stating that every such function can be uniformly approximated by a polynomial; equivalently, the polynomials are dense in $C(\Omega)$, where in this setting $\Omega$ is a compact interval. Work by Jackson and Bernstein made Weierstrass’s results more precise, connecting the degree of smoothness to the polynomial degree required to achieve a given accuracy. These types of results are generally referred to as constructive function theory, and are well described in now-classic books by Akhiezer and by Rivlin.
In numerical analysis, most students are first exposed to function approximation in discussions of polynomial interpolation: given function values $f(x_0), \ldots, f(x_n)$ at nodes $x_0, \ldots, x_n$, find a polynomial $p \in \mathcal{P}_n$ (i.e. a polynomial with degree at most $n$) such that $p(x_j) = f(x_j)$ for $j = 0, \ldots, n$. The study of polynomial interpolation stands on two legs: on the computational side, one needs a representation of the interpolating polynomial that can be computed (and evaluated at new points) quickly and without numerical instability; and on the theoretical side, one needs an understanding of how well polynomial interpolation approximates a target function $f$ when $f$ has some known smoothness. The theory and computational practice come together in the design of adaptive approximation algorithms; the Chebfun system is a modern instance of tying these threads together.
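As a minimal computational sketch of this task (not from the original notes; the test function, degree, and node choice are all illustrative assumptions), the snippet below interpolates a function at Chebyshev nodes by solving a Vandermonde system and checks the error on a fine grid.

```python
import numpy as np

# Illustrative target function and nodes (assumptions, not from the notes)
f = np.cos
n = 8                                           # polynomial degree
nodes = np.cos(np.pi * np.arange(n + 1) / n)    # Chebyshev points on [-1, 1]

# Solve the Vandermonde system V c = f(nodes) for monomial coefficients.
# (Fine for small n; ill-conditioned for large n -- better bases exist.)
V = np.vander(nodes, n + 1, increasing=True)
c = np.linalg.solve(V, f(nodes))

# Evaluate the interpolant on a fine grid and check the max error.
xs = np.linspace(-1, 1, 1001)
p = np.polynomial.polynomial.polyval(xs, c)
print("max |p - f| on grid:", np.max(np.abs(p - f(xs))))
```

In practice one would favor a barycentric Lagrange or orthogonal-polynomial representation over the monomial Vandermonde solve, which becomes badly conditioned as the degree grows; the sketch only illustrates the basic computational task.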
Alas, approximating univariate functions by polynomials is not enough for our purposes, and things immediately become more complicated when we move to higher-dimensional spaces. When we go beyond one space dimension, we can no longer always uniquely define an interpolant from an $n$-dimensional linear space of functions with samples at $n$ data points; we need an additional condition in order to ensure the problem is well-posed (the famous Mairhuber-Curtis theorem). We can get around this issue by constraining the location of the sample points or by choosing a method that goes beyond taking approximants from a fixed linear space. At a computational level, high-dimensional approximation suffers from the “curse of dimensionality”: if all we have is a fixed amount of smoothness, the number of data points required to reach a target accuracy tends to grow exponentially in the dimension of the space. Therefore, we tend to seek methods that effectively lower the problem dimension, as we discuss in the coming week.
A concrete error bound
Some of this lecture will be rather abstract. In order to ground ourselves, let’s start with a very concrete problem. Consider $d+1$ points $x_0, \ldots, x_d \in \mathbb{R}^d$ with associated function values $f(x_0), \ldots, f(x_d)$. Assuming $f$ has bounded second derivatives, how can we bound $|p(x) - f(x)|$, where $p$ is the linear interpolant through the data points?
Even in this simple case, we need an extra hypothesis to make sense of the problem; the system of equations $c_0 + c^T x_j = f(x_j)$ defining the affine interpolant $p(x) = c_0 + c^T x$ should not be singular. Partial Gaussian elimination gives that this is equivalent to the condition that $\{x_1 - x_0, \ldots, x_d - x_0\}$ be a basis for $\mathbb{R}^d$.
Assuming that this system is nonsingular, we can evaluate $p(x)$ at a given target point $x$ by either solving the linear system for the coefficients $c_0$ and $c$, or solving the system
$$\sum_j w_j = 1, \qquad \sum_j w_j x_j = x$$
and then evaluating $p(x) = \sum_j w_j f(x_j)$. If $x$ is in the convex hull of the $x_j$, then the weights $w_j$ are all non-negative.
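To make the two formulations concrete, here is a small sketch (the triangle in $\mathbb{R}^2$ and the test function are assumptions for illustration): it evaluates the affine interpolant both by solving for the coefficients $c_0, c$ and by solving for the weights $w_j$, and checks that the two agree.

```python
import numpy as np

# Illustrative data: a triangle in R^2 and a test function (assumptions)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # rows are x_0, x_1, x_2
f = lambda p: np.sin(p[..., 0]) + p[..., 1] ** 2
fX = f(X)
x = np.array([0.3, 0.4])                             # target point

# Formulation 1: solve for coefficients of p(x) = c0 + c^T x
A = np.hstack([np.ones((3, 1)), X])                  # rows [1, x_j^T]
coef = np.linalg.solve(A, fX)                        # [c0, c_1, c_2]
p1 = coef[0] + coef[1:] @ x

# Formulation 2: solve for weights with sum_j w_j = 1, sum_j w_j x_j = x
B = np.vstack([np.ones((1, 3)), X.T])                # columns [1; x_j]
w = np.linalg.solve(B, np.concatenate([[1.0], x]))
p2 = w @ fX

print(p1, p2)    # identical up to roundoff
print(w)         # non-negative, since x lies inside the triangle
```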
This latter form is convenient for error analysis, together with the trick of writing each of the function evaluations in terms of a Taylor expansion about $x$; if we let $u_j = x_j - x$, then
$$f(x_j) = f(x) + f'(x) u_j + \frac{1}{2} u_j^T f''(\xi_j) u_j,$$
where $\xi_j$ is some point on the line segment connecting $x$ and $x_j$. Substituting in, we have
$$p(x) = \sum_j w_j f(x_j) = \sum_j w_j \left( f(x) + f'(x) u_j + \frac{1}{2} u_j^T f''(\xi_j) u_j \right),$$
and from the defining equations for the $w_j$, we have that $\sum_j w_j = 1$ and $\sum_j w_j u_j = 0$, so that we are left with
$$p(x) - f(x) = \frac{1}{2} \sum_j w_j \, u_j^T f''(\xi_j) u_j.$$
Therefore, if $\|f''\| \leq M$ uniformly, we have
$$|p(x) - f(x)| \leq \frac{M}{2} \sum_j |w_j| \, \|u_j\|^2.$$
If $x$ is in the convex hull of the data locations, we have that the weights are all non-negative and sum to one, and the distances $\|u_j\|$ are bounded by the maximum edge length $h$, so that
$$|p(x) - f(x)| \leq \frac{M h^2}{2}.$$
In fact, we can tighten the constant somewhat, though not enormously.
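A quick numerical sanity check of this bound, under assumed choices of test function, triangle, and Hessian bound $M$ (a sketch, not part of the original argument):

```python
import numpy as np

# Illustrative check of |p(x) - f(x)| <= M h^2 / 2 on a single triangle.
# Test function, triangle, and Hessian bound M are all assumptions.
f = lambda p: np.sin(p[..., 0]) + p[..., 1] ** 2
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # vertices x_0, x_1, x_2
M = 2.0            # here ||f''|| <= 2, since the Hessian is diag(-sin x, 2)
h = max(np.linalg.norm(X[i] - X[j]) for i in range(3) for j in range(i))

# Affine interpolant coefficients: p(x) = c0 + c^T x with p(x_j) = f(x_j)
coef = np.linalg.solve(np.hstack([np.ones((3, 1)), X]), f(X))

# Sample points in the triangle via random barycentric coordinates
rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(3), size=2000)             # rows sum to 1, non-negative
pts = W @ X
err = np.abs(coef[0] + pts @ coef[1:] - f(pts))

print("max observed error:", err.max())
print("bound M h^2 / 2:   ", M * h**2 / 2)
```

The observed error sits well below $Mh^2/2$, consistent with the closing remark that the constant can be tightened somewhat.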
There are a few things to take away from this concrete example (apart from the error bound, which is useful in its own right):
We need a hypothesis on the points at the outset to ensure that we can solve the interpolation system (this property of the points is sometimes called well-poisedness for interpolation).
There are multiple ways of formulating the interpolation problem (solve for the coefficients in an affine function, solve for weights associated with a given point).
The fact that the method exactly reconstructs linear functions played an important (if implicit) role in our error analysis.
Let’s now consider how we would tackle something like this in a more abstract setting.
Approximation from a fixed space
Consider a Banach space $\mathcal{V}$ (a complete, normed vector space) containing a target function $f$, and suppose we seek to approximate $f$ by some function $\tilde{f} \in \mathcal{V}_n$, where $\mathcal{V}_n \subset \mathcal{V}$ is a finite-dimensional subspace. We now have two distinct questions:
Does $\mathcal{V}_n$ contain a good approximation to the target function?
If a good approximation exists, how can it be found?
There are several different ways that we might choose a good approximation, including: minimizing a squared error (in the case that $\mathcal{V}$ is a Hilbert space); interpolating at a set of points; forcing certain linear functionals of the error to be zero (a Petrov-Galerkin approach); minimizing a (weighted) squared error at a larger set of sample points; or minimizing the maximum error over a large set of points (leading to a so-called Chebyshev optimization problem).
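As a hedged illustration of the first two options (the target function, polynomial space, and sample sets are arbitrary choices for the example), the sketch below selects an approximant from the same degree-10 polynomial space by interpolation and by discrete least squares at a larger sample set.

```python
import numpy as np

# Approximating the Runge function from the degree-10 polynomial space in
# two of the ways listed above (all choices here are illustrative).
f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)
deg = 10
xs = np.linspace(-1, 1, 2001)                      # grid for measuring error

# (a) Interpolation at deg+1 Chebyshev points
nodes = np.cos(np.pi * (2 * np.arange(deg + 1) + 1) / (2 * (deg + 1)))
c_int = np.polynomial.chebyshev.chebfit(nodes, f(nodes), deg)

# (b) Discrete least squares at a larger set of sample points
samples = np.linspace(-1, 1, 200)
c_lsq = np.polynomial.chebyshev.chebfit(samples, f(samples), deg)

for name, c in [("interpolation", c_int), ("least squares", c_lsq)]:
    err = np.max(np.abs(np.polynomial.chebyshev.chebval(xs, c) - f(xs)))
    print(f"{name:15s} max error = {err:.3e}")
```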
For the moment, we consider approximation schemes that are linear in $f$ and are exact on $\mathcal{V}_n$. This includes all of the schemes mentioned above except the last (Chebyshev optimization leads to a nonlinear scheme). In these methods, we approximate $f$ by $\tilde{f} = Pf$, where $P$ is a linear operator with range $\mathcal{V}_n$ and $Pv = v$ for all $v \in \mathcal{V}_n$. These two conditions imply that the approximation operator is a projection, i.e. $P^2 = P$. We also have an associated error projection $E = I - P$. Now, note that for any $v \in \mathcal{V}_n$, we have $Ev = 0$, and therefore
$$\|f - \tilde{f}\| = \|Ef\| = \|E(f - v)\| \leq \|E\| \, \|f - v\|.$$
Hence, approximation via $P$ is quasi-optimal, yielding an approximation within a factor of $\|E\|$ of the best possible approximation within the space.
Making this a little more concrete, suppose we use the max norm on $C(\Omega)$, i.e.
$$\|f\|_\infty = \max_{x \in \Omega} |f(x)|.$$
Then $\|E\| \leq 1 + \|P\|$, and for interpolation at nodes $x_1, \ldots, x_n$, $\|P\|$ is the maximum over $\Omega$ of the so-called Lebesgue function
$$\Lambda(x) = \sum_{j=1}^n |\ell_j(x)|,$$
where the $\ell_j$ are the so-called cardinal functions (or Lagrange functions) for which $\ell_j(x_i) = \delta_{ij}$.
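The Lebesgue function is easy to explore numerically. The sketch below (node counts and locations are illustrative) evaluates $\Lambda(x) = \sum_j |\ell_j(x)|$ directly from the Lagrange cardinal functions and compares equispaced and Chebyshev nodes.

```python
import numpy as np

def lebesgue_function(nodes, xs):
    """Sum_j |ell_j(x)| for the Lagrange cardinal functions on the given nodes."""
    L = np.zeros_like(xs)
    for j, xj in enumerate(nodes):
        others = np.delete(nodes, j)
        ell_j = np.prod((xs[:, None] - others) / (xj - others), axis=1)
        L += np.abs(ell_j)
    return L

xs = np.linspace(-1, 1, 5001)
n = 12                                              # illustrative number of nodes
equi = np.linspace(-1, 1, n)
cheb = np.cos(np.pi * (2 * np.arange(n) + 1) / (2 * n))

print("Lebesgue constant, equispaced:", lebesgue_function(equi, xs).max())
print("Lebesgue constant, Chebyshev: ", lebesgue_function(cheb, xs).max())
```

Since $\|E\| \leq 1 + \|P\|$, a modest Lebesgue constant (as with Chebyshev nodes) means interpolation is nearly as good as the best approximation from the same space, while the rapidly growing constant for equispaced nodes signals potential trouble.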
Kolmogorov $n$-widths and convergence rates
The quasi-optimality framework described above only helps with half of our problem. Quasi-optimality gives an approximation within a constant factor of the best thing possible; we also want the best thing possible to be good! This is an instance of the consistency-stability paradigm common in numerical analysis: stability gives a small quasi-optimality constant, while consistency means that the best possible approximation has a small error. Having briefly described the stability piece, let’s now talk about accuracy.
Let $\mathcal{K}$ be a (usually compact) subset of $\mathcal{V}$. When $\mathcal{V}$ has a norm (or quasi-norm) that measures smoothness, the set $\mathcal{K}$ may be associated with a unit ball of functions up to a prescribed smoothness. In our concrete case described before, for instance, we might consider the set of all functions where the second derivative was pointwise bounded in norm by some constant $M$. The Kolmogorov $n$-width describes the worst-case approximation error for functions from $\mathcal{K}$ under a best-case choice of $n$-dimensional approximation spaces $\mathcal{V}_n$ of functions; that is,
$$d_n(\mathcal{K}) = \inf_{\dim \mathcal{V}_n = n} \, \sup_{f \in \mathcal{K}} \, E(f; \mathcal{V}_n),$$
where $E(f; \mathcal{V}_n) = \inf_{v \in \mathcal{V}_n} \|f - v\|$ is the minimum error norm for approximating $f$ by elements of $\mathcal{V}_n$. In general, these $n$-widths decay as $O(n^{-s})$ for some $s$ associated with the smoothness of the functions in $\mathcal{K}$.
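The $n$-width itself is rarely computable in closed form, but the qualitative decay rates are easy to observe in examples. In the sketch below (the functions and the use of Chebyshev interpolation are illustrative, and interpolation error only upper-bounds the best approximation error), a function of limited smoothness shows roughly algebraic error decay while an analytic function converges far faster.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Observe how interpolation error decays with the dimension of the
# approximation space for functions of different smoothness.
xs = np.linspace(-1, 1, 4001)
funcs = {"|x|^3 (limited smoothness)": lambda x: np.abs(x) ** 3,
         "exp(x) (analytic)": np.exp}

for name, f in funcs.items():
    print(name)
    for n in [4, 8, 16, 32, 64]:
        nodes = np.cos(np.pi * (2 * np.arange(n + 1) + 1) / (2 * (n + 1)))
        c = C.chebfit(nodes, f(nodes), n)            # degree-n interpolant
        err = np.max(np.abs(C.chebval(xs, c) - f(xs)))
        print(f"  degree {n:3d}: max error {err:.2e}")
```

For $|x|^3$ the errors shrink roughly like $n^{-3}$, matching an algebraic rate $n^{-s}$ tied to smoothness; the analytic example beats any fixed algebraic rate.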