
md"""
# Proj 1: Approximation with RBFs

A classic function approximation scheme used for interpolating data at scattered points involves the use of a *radial basis function* (RBF), typically denoted by $\phi$. For a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$, we approximate $f$ by

$$s(x) = \sum_{i=1}^n \phi(\|x-x_i\|_2) c_i$$

where the points $\{x_i\}_{i=1}^n$ are known as *centers* -- we will assume for the purpose of this assignment that these are all distinct. We sometimes write this more concisely as

$$s(x) = \Phi_{xX} c$$

where $\Phi_{xX}$ is a row vector with entries $(\Phi_{xX})_j = \phi(\|x-x_i\|_2)$.
1.4 ms
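md"""
As a small, self-contained illustration of the formulas above (a minimal sketch only), the cell below interpolates $f(x) = \sin(x)$ at five 1D centers with the squared exponential RBF $\phi(r) = e^{-r^2}$: it forms $\Phi_{XX}$, solves $\Phi_{XX} c = f_X$ with a dense solve, and evaluates $s$ at a new point. The helper names `ϕ_demo` and `s_demo` are used only in this example.
"""

let
    # Centers and sampled function values
    ϕ_demo(r) = exp(-r^2)
    X = [0.0, 0.5, 1.0, 1.5, 2.0]
    fX = sin.(X)

    # Form ΦXX with entries ϕ(|xi-xj|) and solve ΦXX*c = fX for the coefficients
    ΦXX = [ϕ_demo(abs(xi-xj)) for xi in X, xj in X]
    c = ΦXX \ fX

    # Evaluate s(x) = Φ_xX*c at a new point and compare to f there
    s_demo(x) = sum(ϕ_demo(abs(x-xj))*cj for (xj, cj) in zip(X, c))
    s_demo(0.75), sin(0.75)
end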

md"""
## Logistics

You should complete tasks 1 and 2 by Friday, Feb 17; tasks 3-5 are due by Friday, Feb 24.

You are encouraged to work in pairs on this project. I particularly encourage you to try pair-thinking and pair-programming -- you learn less if you just try to partition the problems! You should produce short report addressing the analysis tasks, and a few short codes that address the computational tasks. You may use any Julia functions you might want.

406 μs


md"""
## Code setup

We will be using several built-in packages for this project. We'll use the `LinearAlgebra` and `Plots` packages in almost all of our computational homeworks and projects in this course, but here we also use the `SparseArrays` and `SuiteSparse` packages for dealing with sparse linear algebra. We also use the `SpecialFunctions` package and `QuadGK` for one of the tasks where we use kernel approximations to estimate integrals.
"""
using LinearAlgebra
using Plots
using SparseArrays
using SuiteSparse
using SpecialFunctions
using QuadGK

md"""
### A bestiary of basis functions

We consider several possible radial basis functions, outlined below:

- The *squared exponential* basis function (`ϕ_se`) is the default choice used in many machine learning applications. It is infinitely differentiable, and decays quickly away from the origin.

- The *inverse multiquadric* function (`ϕ_imq`) is also infinitely differentiable, but decays much more slowly away from the origin.

- The *Matérn* family of functions is frequently used in geospatial statistics. These functions have a smoothness parameter; the Matérn 1/2 function (`ϕ_mat12`) is continuous but not differentiable, Matérn 3/2 (`ϕ_mat32`) is continuously differentiable but not twice differentiable, and Matérn 5/2 (`ϕ_mat52`) is twice continuously differentiable but not three times differentiable.

- The *Wendland* family of functions is compactly supported: that is, these functions are exactly zero far enough away from the origin. The definition of these functions depends on the dimension $d$ of the underlying space in which we are doing interpolation. Here we define the 2D version of this function that is twice differentiable (`ϕ_w21`).
"""
begin
ϕ_se(r) = exp(-r^2)
ϕ_imq(r) = 1.0/sqrt(1.0 + r^2)
ϕ_mat12(r) = exp(-r)
ϕ_mat32(r) = (1.0+sqrt(3.0)*r)*exp(-sqrt(3.0)*r)
ϕ_mat52(r) = (1.0+sqrt(5.0)*r+5.0/3.0*r^2)*exp(-sqrt(5.0)*r)
ϕ_w21(r) = max(1.0-r, 0.0)^4*(4.0*r+1.0)
end

md"""
#### Derivatives

For some of our computations later, it will also be useful to have the derivatives of the various radial basis functions we work with.
"""
begin
dϕ_se(r) = -2.0*r*exp(-r^2)
dϕ_imq(r) = -r/(1.0 + r^2)^1.5
dϕ_mat12(r) = -exp(-r)
dϕ_mat32(r) = -3.0*r*exp(-sqrt(3.0)*r)
dϕ_mat52(r) = -5.0/3.0*r*(1 + sqrt(5.0)*r)*exp(-sqrt(5.0)*r)
dϕ_w21(r) = -20*r*max(1.0-r, 0.0)^3
end

md"""
I am as prone to calculus errors and coding typos as anyone, so when I code derivatives, I also like to include a *finite difference check*; that is, I compare my supposedly-exact derivative computation to the approximation

$$f'(x) \approx \frac{f(x+h)-f(x-h)}{2h},$$

which has an absolute error bounded by $h^2 \max_{|z| \leq h} |f'''(z)|/6$ (which we usually write as $O(h^2)$.)
"""
let
function test_dϕ(tag, ϕ, dϕ)
r = 0.123
h = 1e-4
dϕ_fd = (ϕ(r+h)-ϕ(r-h))/2/h # Finite difference estimate
dϕ_an = dϕ(r) # Analytic computation
relerr = abs(dϕ_fd-dϕ_an)/abs(dϕ_an)
"| $tag | $relerr |\n"
end

Markdown.parse(
"""
| RBF type | Relerr |
|----------|--------|
""" *
test_dϕ("Matérn 1/2", ϕ_mat12, dϕ_mat12) *
test_dϕ("Matérn 3/2", ϕ_mat32, dϕ_mat32) *
test_dϕ("Matérn 5/2", ϕ_mat52, dϕ_mat52) *
test_dϕ("Wendland 2,1", ϕ_w21, dϕ_w21))
end

md"""
There are a few other things worth noting about our test code, as we get used to Julia programming:

- We put the entire tester inside a `let` block, which defines a local scope for variable names. The function `test_dϕ` that we defined here cannot be seen outside this block.

- Our tester `test_dϕ` takes functions as input arguments.

- The last expression in the `let` block is the return value, which in this case is a Markdown cell. The Julia Markdown support includes support for rendering tables, which we have done here. We have also used the `Markdown.parse` function directly rather than using the `md` macro, making it a little easier to program the output in this case. Note that `*` on strings represents concatenation in Julia.
"""

md"""
#### Length scales

We often define radial basis functions to have a *length scale* $l$, i.e.

$$\phi(r) = \phi_0(r/l).$$

We left out the length scale in the initial definitions of our RBFs, but we add it in now.
"""
function scale_rbf(ϕ0, dϕ0, l=1.0)
    ϕ(r) = ϕ0(r/l)
    dϕ(r) = dϕ0(r/l)/l
    ϕ, dϕ
end
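md"""
For example, a quick check (in the same spirit as the finite difference tests above, and purely illustrative) that the chain rule factor $1/l$ in the scaled derivative is right:
"""

let
    l = 0.1
    ϕ, dϕ = scale_rbf(ϕ_se, dϕ_se, l)
    r, h = 0.123, 1e-4
    dϕ_fd = (ϕ(r+h)-ϕ(r-h))/2/h   # Finite difference estimate of the scaled derivative
    abs(dϕ_fd-dϕ(r))/abs(dϕ(r))   # Relative error (should be small)
end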

md"""
### RBF matrices and vectors

To compute with our interpolating functions, we need to manipulate matrices $\Phi_{XY}$ with entries $\phi(\|x_i-y_j\|)$. We represent collections of points as matrices $X = \begin{bmatrix} x_1 & \ldots & x_n \end{bmatrix} \in \mathbb{R}^{d \times n}$. As is usual in Julia, we use an exclamation mark as part of the name for functions that mutate their inputs.
"""
function form_ΦXY!(result, ϕ, X :: AbstractMatrix, Y :: AbstractMatrix)
for j = 1:size(Y)[2]
for i = 1:size(X)[2]
result[i,j] = ϕ(norm(X[:,i]-Y[:,j]))
end
end
end
result
end

function form_ΦXY(ϕ, X :: AbstractMatrix, Y :: AbstractMatrix)
form_ΦXY!(zeros(size(X)[2], size(Y)[2]), ϕ, X, Y)
end
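md"""
As a quick sanity check on `form_ΦXY` -- and on the claim that a positive definite RBF gives a positive definite matrix -- we can form $\Phi_{XX}$ at a few random points and test symmetry and positive definiteness. This is purely illustrative; the Matérn 1/2 kernel is used here because its matrices stay well conditioned at unit length scale.
"""

let
    X = rand(2, 20)
    ΦXX = form_ΦXY(ϕ_mat12, X, X)
    issymmetric(ΦXX), isposdef(ΦXX)
end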

md"""
We decorated the definitions in the previous block with `AbstractVector` and `AbstractMatrix` types in order to also have a specialized 1D version (where we use numbers to represent individual points, and vectors for collection of points).
"""
function form_ΦXY!(result, ϕ, X :: AbstractVector, Y :: AbstractVector)
form_ΦXY!(result, ϕ, reshape(X, 1, length(X)), reshape(Y, 1, length(Y)))
end
function form_ΦXY(ϕ, X :: AbstractVector, Y :: AbstractVector)
form_ΦXY(ϕ, reshape(X, 1, length(X)), reshape(Y, 1, length(Y)))
end

md"""
### RBF matrix factorization

We create an `RBFCenters` structure to keep track of the location of the centers in our approximation scheme along with the associated factorization of the RBF matrix.
We are going to want to allow the interpolation of multiple functions for the same set of centers (for example), so we keep the representation of the centers and the factorization of $\Phi_{XX}$ separate from the coefficient vector for interpolating any specific function.

The implementation is slightly complicated by us typically keeping some extra storage around so that we can add new centers without reallocating storage.
"""
183 μs
begin

# Structure representing a collection of RBF centers and an associated matrix rep
mutable struct RBFCenters
    ϕ :: Function      # RBF function
    dϕ :: Function     # RBF derivative function
    Nmax :: Integer    # Max points that we can accommodate without expansion
    n :: Integer       # Number of points we are currently maintaining
    Xstore :: Matrix   # Storage for location of centers
    Rstore :: Matrix   # Storage for upper triangular Cholesky factor
end

# Constructor for centers object
function RBFCenters(ϕ, dϕ, d :: Integer, Nmax=100)
    Xstore = zeros(d, Nmax)
    Rstore = zeros(Nmax, Nmax)
    RBFCenters(ϕ, dϕ, Nmax, 0, Xstore, Rstore)
end

# Alternate constructor (takes in initial points)
function RBFCenters(ϕ, dϕ, X :: AbstractMatrix, Nmax=100)
    d = size(X)[1]
    n = size(X)[2]
    Nmax = max(n, Nmax)
    rbfc = RBFCenters(ϕ, dϕ, d, Nmax)
    add_centers_ref!(rbfc, X)
    rbfc
end

# Convenience properties: the active centers (X), an iterator over them (Xiter),
# the spatial dimension (d), the Cholesky factor (R), and a factorization object (F)
# that can be used with backslash.
function Base.getproperty(rbfc :: RBFCenters, s :: Symbol)
    n = getfield(rbfc, :n)
    if s == :X
        view(getfield(rbfc, :Xstore), :, 1:n)
    elseif s == :Xiter
        eachcol(view(getfield(rbfc, :Xstore), :, 1:n))
    elseif s == :d
        size(getfield(rbfc, :Xstore), 1)
    elseif s == :R
        UpperTriangular(view(getfield(rbfc, :Rstore), 1:n, 1:n))
    elseif s == :F
        Cholesky(UpperTriangular(getfield(rbfc, :Rstore)[1:n, 1:n]))
    else
        getfield(rbfc, s)
    end
end

# Expand the storage so that we can accommodate at least Nmax1 centers
function grow!(rbfc :: RBFCenters, Nmax1)
    if Nmax1 > rbfc.Nmax
        Nmax1 = max(Nmax1, 2*rbfc.Nmax)
        Xstore = zeros(rbfc.d, Nmax1)
        Rstore = zeros(Nmax1, Nmax1)
        Xstore[:,1:rbfc.n] = rbfc.X
        Rstore[1:rbfc.n,1:rbfc.n] = getfield(rbfc, :Rstore)[1:rbfc.n,1:rbfc.n]
        rbfc.Xstore = Xstore
        rbfc.Rstore = Rstore
        rbfc.Nmax = Nmax1
    end
end

# Add new centers and recompute the Cholesky factorization from scratch
function add_centers_ref!(rbfc :: RBFCenters, Xnew :: AbstractMatrix)
    n0 = rbfc.n
    n1 = n0 + size(Xnew)[2]
    grow!(rbfc, n1)
    rbfc.Xstore[:,n0+1:n1] = Xnew
    rbfc.n = n1
    ΦXX = form_ΦXY(rbfc.ϕ, rbfc.X, rbfc.X)
    rbfc.Rstore[1:n1,1:n1] = cholesky(ΦXX).U
    rbfc.F
end

end


md"""
Once we have a factorization of the matrix $\Phi_{XX}$ through $n$ centers (which takes $O(n^3)$ time to compute), we can use that factorization to solve $\Phi_{XX} c = f_X$ in $O(n^2)$ time. We can then evaluate $s(x) = \phi_{xX} c$ at a new point in $O(n)$ time. The `solve` and `feval` routines compute $c$ and evaluate $s(x)$, respectively. Note that the use of iterators means that we never actually form $\phi_{xX}$ in memory -- we just compute with it element by element.

We sometimes also want to compute gradients $\nabla s(x)$, which we can do with the `dfeval` function. If the implementation of `dfeval` seems slightly cryptic, note that the chain rule gives us

$$\nabla_x \phi(\|x-x_j\|_2) = \phi'(\|x-x_j\|_2) \frac{x-x_j}{\|x-x_j\|}$$

and we have to do something special when $x = x_j$ (the answer there is zero when differentiation makes sense at all). Therefore, by linearity of differentiation,

$$\nabla_x s(x) = \sum_{j=1}^n \phi'(\|x-x_j\|_2) \frac{x-x_j}{\|x-x_j\|} c_j$$
"""
begin
# Compute a coefficient vector for a given rhs
solve(rbfc :: RBFCenters, fX :: AbstractVector) = rbfc.F \ fX
solve(rbfc :: RBFCenters, f :: Function) = rbfc.F \ [f(x) for x in rbfc.Xiter]

# Evaluate an interpolant at a given point
function feval(rbfc :: RBFCenters, c :: Vector, x)
ϕ = rbfc.ϕ
sum( ϕ(norm(x-xj))*cj for (xj, cj) in zip(rbfc.Xiter, c) )
end

# Evaluate a gradient at a given point
function dfeval(rbfc :: RBFCenters, c :: Vector, x)
dϕ = rbfc.dϕ
∇ϕ(r) = if norm(r) == 0 0*r else dϕ(norm(r))*r/norm(r) end
sum( ∇ϕ(x-xj)*cj for (xj, cj) in zip(rbfc.Xiter, c) )
end
end
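md"""
Keeping with the habit of checking derivatives, the cell below compares `dfeval` against a centered difference of `feval` on a random 2D example. This is an illustrative check using the machinery above; the test function and kernel choice are arbitrary.
"""

let
    X = rand(2, 15)
    rbfc = RBFCenters(ϕ_mat52, dϕ_mat52, X)
    c = solve(rbfc, x -> sin(x[1]) + x[2]^2)

    # Analytic gradient vs centered finite differences of the interpolant
    x0 = [0.4; 0.6]
    h = 1e-5
    g_an = dfeval(rbfc, c, x0)
    g_fd = [(feval(rbfc, c, x0+h*e) - feval(rbfc, c, x0-h*e))/2/h
            for e in ([1.0; 0.0], [0.0; 1.0])]
    norm(g_an-g_fd)/norm(g_fd)
end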

md"""
### Hello world!

Let's use our machinery now for a toy task of 1D interpolation of $\cos(x)$ on the interval from $[0, 2\pi]$. We will use an equi-spaced sampling mesh, with the squared exponential RBF.
"""
let

# Set up a sampling mesh
x = zeros(1, 8)
for j = 0:7
x[1,j+1] = 2*π*j/7
end

# Build the RBF centers object (squared exponential kernel)
rbfc = RBFCenters(ϕ_se, dϕ_se, x)

# Fit the cosine function
c = solve(rbfc, (x) -> cos(x[1]))

# Plot on a finer mesh
xx = range(0, 2*π, length=100)
sxx = [feval(rbfc, c, [x]) for x in xx]
plot(xx, cos.(xx), label="cos(x)")
plot!(xx, sxx, linestyle=:dash, label="s(x)")
scatter!(x, cos.(x), markersize=5, label=nothing)

end

md"""
## Task 1: Fast mean temperatures

So far, we have discussed approximating one function at many points. Sometimes, we would like to quickly approximate something about *many* functions. For example, in 1D, suppose we have streaming temperature measurements $\theta_j(t)$ taken at $n$ fixed points $x_j \in [0, 1]$, from which we could estimate an instantaneous temperature field

$$\hat{\theta}(x, t) = \sum_{j=1}^n \phi(\|x-x_j\|) c_j$$

by the interpolation condition $\hat{\theta}(x_i, t) = \theta_i(t)$. Assuming we have a factorization for $\Phi_{XX}$, we can compute the estimated mean temperature

$$\tilde{\theta}(t) = \int_0^1 \hat{\theta}(x, t) \, dx = \sum_{j=1}^n \left( \int_0^1 \phi(\|x-x_j\|) \, dx \right) c_j$$

for any given $t$ by solving for the $c$ vector ($O(n^2)$ time) and taking the appropriate linear combination of integrals ($O(n)$ time). Here it is worth noting that

$$\int_0^1 \phi(\|x-x_j\|) \, dx = \int_0^{x_j} \phi(s) \, ds + \int_{0}^{1-x_j} \phi(s) \, ds$$
and for $\phi(r) = \exp(-r^2/l^2)$, we have

$$\int_0^x \phi(s) \, ds = \frac{l \sqrt{\pi}}{2} \operatorname{erf}(x/l),$$

where the error function $\operatorname{erf}(x)$ is implemented as `erf(x)` in the Julia `SpecialFunctions` library. We implement this scheme in the following code, which runs in $O(n^3) + O(n^2 m)$ time with $n$ sensors and $m$ measurements.
"""
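md"""
Before relying on the closed form, it is easy to sanity check the integral identity against numerical quadrature; this is the sort of check `QuadGK` is handy for (the particular values of $l$ and $x$ below are arbitrary).
"""

let
    l, x = 0.37, 0.81
    I_closed = l*sqrt(π)/2*erf(x/l)              # Closed form via the error function
    I_quad, _ = quadgk(s -> exp(-(s/l)^2), 0, x) # Numerical quadrature of the same integral
    abs(I_closed-I_quad)
end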
# Given a set of sensor locations x and a time series of temperature measurements
# Θ[i,j] = measurement of sensor i at time j, return the corresponding time series
# of mean temperature estimates.
#
function mean_temperatures(x, Θ)

# Set up the RBF and RBF matrix for interpolation
l = 1.0/length(x)
ϕ, dϕ = scale_rbf(ϕ_se, dϕ_se, l)
X = reshape(x, 1, prod(size(x)))
rbfc = RBFCenters(ϕ, dϕ, X)

# Computation for weights
wt(x) = l*sqrt(π)/2 * (erf(x/l) + erf((1.0-x)/l))
w = [wt(xi) for xi in x]

# Compute the mean temperature for each column
θmean = zeros(size(Θ)[2])
for j = 1:size(Θ)[2]
c = solve(rbfc, Θ[:,j])
θmean[j] = dot(w, c)
end

θmean
end

md"""
An example run with about 2000 virtual sensors and 1000 samples takes a bit under 3 seconds to process this way on my laptop.
"""
let
nsensor = 2001
ntimes = 1000
x_sensor = zeros(1,nsensor)
x_sensor[:] = range(0, 1, length=nsensor)
Θ = [cos(x * 2π * j/ntimes) + 10.0*j/ntimes for x=x_sensor[:], j = 1:ntimes]

θmean = mean_temperatures(x_sensor, Θ)
plot(θmean)
end

md"""
Rewrite this code to take $O(n^3) + O(mn)$ time by reassociating the expression $\tilde{\theta}(t) = w^T \Phi_{XX}^{-1} \theta_X(t)$. Compare the timing of the example given above to the timing of your routine; do the numbers seem reasonable? At the same time, do a comparison of the results to make sure that you do not get something different! You may want to use the testing and timing harness below.
"""
# Given a set of sensor locations x and a time series of temperature measurements
# Θ[i,j] = measurement of sensor i at time j, return the corresponding time series
# of mean temperature estimates.
#
function mean_temperatures2(x, Θ)

# Set up the RBF and RBF matrix for interpolation
l = 1.0/length(x)
ϕ, dϕ = scale_rbf(ϕ_se, dϕ_se, l)
X = reshape(x, 1, prod(size(x)))
rbfc = RBFCenters(ϕ, dϕ, X)

# TODO: Write your fast implementation here (replace the line below)
mean_temperatures(x, Θ)
end
function test_sensor_demo(nsensor, ntimes)
x_sensor = zeros(1,nsensor)
x_sensor[:] = range(0, 1, length=nsensor)
Θ = [cos(x * 2π * j/ntimes) + 10.0*j/ntimes for x=x_sensor[:], j = 1:ntimes]

@time θmean1 = mean_temperatures(x_sensor, Θ)
@time θmean2 = mean_temperatures2(x_sensor, Θ)
norm(θmean1-θmean2, Inf)
end
test_sensor_demo(2001, 1000)


md"""
## Task 2: Adding data points

The `add_centers_ref!` function adds centers to an existing `RBFCenters` object, then recomputes the factorization from scratch. If we add $k$ centers to an existing object with $n$ centers, this costs $O((n+k)^3)$. However, we can compute the same thing in $O(n^2 k + k^3)$ time by extending the existing Cholesky factorization using what we know about block factorization methods.

It may be helpful to use `ldiv!(A, B)`, which overwrites the storage in `B` with `A\B` (where `A` can be a factorization object or a matrix with a simple solver, like an upper triangular or diagonal matrix).
"""
function add_centers!(rbfc :: RBFCenters, Xnew :: AbstractMatrix)
# TODO: Replace this with code that extends the factorization instead of refactoring from scratch.
# We provide a partial solution below.
add_centers_ref!(rbfc, Xnew)


# Partial code for you to fill in:
# # Extend storage if needed
# n0 = rbfc.n
# n1 = n0 + size(Xnew)[2]
# grow!(rbfc, n1)
#
# # Copy in new points
# rbfc.Xstore[:,n0+1:n1] = Xnew
#
# # TODO: Extend factorization
# # End TODO
#
# # Update the size
# rbfc.n = n1
#
# # Return the factorization object
# rbfc.F
end
function test_add_centers(nsensor, nstart, nbatch)
    l = 1.0/nsensor
    ϕ, dϕ = scale_rbf(ϕ_se, dϕ_se, l)
    x_sensor = zeros(1,nsensor)
    x_sensor[:] = range(0, 1, length=nsensor)

    # Reference: set up nstart centers, then add nbatch more by refactoring from scratch
    t1 = @time begin
        rbfc1 = RBFCenters(ϕ, dϕ, x_sensor[:,1:nstart])
        add_centers_ref!(rbfc1, x_sensor[:,nstart+1:nstart+nbatch])
    end

    # Incremental: set up nstart centers, then add nbatch more by extending the factorization
    t2 = @time begin
        rbfc2 = RBFCenters(ϕ, dϕ, x_sensor[:,1:nstart])
        add_centers!(rbfc2, x_sensor[:,nstart+1:nstart+nbatch])
    end

    relerr = norm(rbfc1.R-rbfc2.R)/norm(rbfc1.R)
    md"Relative error: $relerr"
end
test_add_centers(2001, 1500, 50)


md"""
## Task 3: Missing data

Now suppose that we are again dealing with streaming sensor data, but every so often there are entries missing. If the $k$th measurement is missing, the interpolation conditions for the remaining points can be written in terms of a smaller system of equations where we remove row and column $k$ from the original problem; *or* as

$$\begin{align*}
f(x_i) &= \sum_{j=1}^n \phi(\|x_i-x_j\|) \hat{c}_j + r_i, & r_i = 0 \mbox{ for } i \neq k \\
\hat{c}_k &= 0.
\end{align*}$$

Equivalently, we have

$$\begin{bmatrix}
\Phi_{XX} & e_k \\
e_k^T & 0
\end{bmatrix}
\begin{bmatrix} \hat{c} \\ r_k \end{bmatrix}
=
\begin{bmatrix} \tilde{f}_X \\ 0 \end{bmatrix}$$

where $\tilde{f}_X$ agrees with $f_X$ except in the $k$th element. If we set $(\tilde{f}_X)_k = 0$, then $-r_k$ is the value at $x_k$ of the interpolant through the remaining data -- potentially a pretty good guess for the true value.

Using block elimination on the system above, complete the following routine to fill in a single missing value in an indicated location. Your code should take $O(n^2)$ time (it would take $O(n^3)$ to refactor from scratch).
"""
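md"""
For reference, block elimination on a generic bordered system (a standard manipulation, recorded here for convenience) proceeds as follows: to solve

$$\begin{bmatrix} A & b \\ b^T & 0 \end{bmatrix} \begin{bmatrix} y \\ \mu \end{bmatrix} = \begin{bmatrix} g \\ 0 \end{bmatrix},$$

solve $Aw = b$ and $Az = g$, then set $\mu = (b^T z)/(b^T w)$ and $y = z - w\mu$. Note that both solves can reuse an existing factorization of $A$.
"""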
# Assuming that entry k of the measurement vector fX is missing,
# fill it with an approximate value from interpolating the remaining points.
#
function fill_missing!(rbfc :: RBFCenters, fX :: AbstractVector, k)
# TODO: Complete this routine
fX[k]
end

let

# Set up a sampling mesh
x = zeros(1, 15)
x[:] = range(0.0, 2π, 15)
fX = cos.(x[:])

# Build the RBF centers object and fit the cosine function
rbfc = RBFCenters(ϕ_se, dϕ_se, x)
c = solve(rbfc, fX)

# Try filling in a missing point 4
fXbad = copy(fX)
fXbad[4] = 0.0
fx4_interp1 = fill_missing!(rbfc, fXbad, 4)

# Compare to the obvious-but-expensive algorithm
mask = ones(Bool, 15)
mask[4] = false
rbfc2 = RBFCenters(ϕ_se, dϕ_se, x[:, mask])
c2 = solve(rbfc2, fX[mask])
fx4_interp2 = feval(rbfc2, c2, x[:,4])

relerr1 = abs(fXbad[4] - fx4_interp2)/abs(fx4_interp2)
err2 = abs(fx4_interp1 - cos(x[1,4]))

md"""
- `fill_missing!` vs reference rel diff: $relerr1
- Abs error in approximation: $err2
"""
end


md"""
## Task 4: Sparse approximation

So far, we have used dense matrix representations for everything. As you have seen, this is often basically fine for up to a couple thousand data points, provided that we are careful to re-use factorizations and organize our computations for efficiency. When we have much more data, though, the $O(n^3)$ cost of an initial factorization gets to be excessive.

There are several ways to use *data sparsity* of the RBF matrix for faster (approximate) solves. For this project, though, we will focus on a simple one: using ordinary sparsity of $\Phi_{XX}$ for compactly supported kernels like the Wendland, or approximating that sparsity for rapidly-decaying kernels like the squared exponential.

We are going to solve 2D function approximation problems here. To do this, we will want a couple of additional routines that we will provide.
"""

md"""
### Code preliminaries
"""

md"""
#### Sampling the space

When we want to sample something in more than one spatial dimension and aren't just using a regular mesh, it is tempting to choose random samples. But taking independent uniform draws is not an especially effective way of covering a space – random numbers tend to clump up. For this reason, [low discrepancy sequences](https://en.wikipedia.org/wiki/Low-discrepancy_sequence) are often a better basis for sampling than (pseudo)random draws. There are many such generators; we use a relatively simple one based on an additive recurrence with a multiplier based on the ["generalized golden ratio"](http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/).
"""
function kronecker_quasirand(d, N, start=0)
# Compute the recommended constants ("generalized golden ratio")
ϕ = 1.0+1.0/d
for k = 1:10
gϕ = ϕ^(d+1)-ϕ-1
dgϕ = (d+1)*ϕ^d-1
ϕ -= gϕ/dgϕ
end
αs = [mod(1.0/ϕ^j, 1.0) for j=1:d]
# Compute the quasi-random sequence
Z = zeros(d, N)
for j = 1:N
for i=1:d
Z[i,j] = mod(0.5 + (start+j)*αs[i], 1.0)
end
end
Z
end
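md"""
To see the difference in how well the two sampling strategies cover the square, the cell below scatters a few hundred quasirandom points next to the same number of independent uniform draws (purely illustrative).
"""

let
    Z = kronecker_quasirand(2, 400)
    U = rand(2, 400)
    p1 = scatter(Z[1,:], Z[2,:], markersize=2, label="quasirandom")
    p2 = scatter(U[1,:], U[2,:], markersize=2, label="uniform draws")
    plot(p1, p2, layout=(1,2))
end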

md"""
#### Finding neighbors

If we work with a radial basis function $\phi$ that is zero past some distance $r_{\mathrm{cutoff}}$, then $(\Phi_{XX})_{ij} \neq 0$ only if $\|x_i-x_j\| \leq r_{\mathrm{cutoff}}$. We can, of course, find all such pairs in $O(n^2)$ time by checking all possible pairs. But things can go faster with a simple data structure. In 2D, think of putting every point $x_i$ into a square "bucket" with side length $r_{\mathrm{cutoff}}$. Then all points within $r_{\mathrm{cutoff}}$ of $x_i$ must be in one of nine buckets: the one containing $x_i$, or any of its neighbors. If we give the buckets identifiers and sort them in "column major" order (and sort their contents accordingly), this can significantly reduce the number of potential neighbors to any point that we might have to check.
"""
# Given a 2-by-n matrix xy of coordinates and a cutoff threshold,
# quickly find all pairs of points within the cutoff from each other.
# Returns
# - xy_new -- a reordered version of the input matrix in "bucket order"
# - p -- the reordering to bucket order (xy_new = xy[:,p])
# - neighbors(q) -> Iq, rq -- a function that takes a query point q and finds
# indices Iq of points within the cutoff of q and rq of the distances from q
# to those points. The index is with respect to the bucket order.
# - I, J, r -- three parallel arrays such that I[k], J[k], r[k] represents the
# indices of two points within the cutoff of each other and the distance
# between them.
#
function find_neighbors2d(xy, rcutoff)

    # Logic for computing integer bucket indices
    n = size(xy)[2]
    xmin = minimum(xy[1,:])
    xmax = maximum(xy[1,:])
    ymin = minimum(xy[2,:])
    ymax = maximum(xy[2,:])
    idx1d(x) = floor(Int, x/rcutoff) + 1
    nx = idx1d(xmax-xmin)
    ny = idx1d(ymax-ymin)
    idx(x,y) = idx1d(x-xmin) + (idx1d(y-ymin)-1)*nx

    # Assign points to buckets, sort by bucket index
    buckets = [idx(xy[1,j], xy[2,j]) for j in 1:n]
    p = sortperm(buckets)
    xy_new = xy[:,p]
    buckets = buckets[p]

    # Range of each bucket in the sorted point list
    bstart = [searchsortedfirst(buckets, b) for b in 1:nx*ny]
    bend   = [searchsortedlast(buckets, b) for b in 1:nx*ny]

    # Query: indices (bucket order) and distances of all points within rcutoff of q
    function neighbors(q)
        Iq = Int[]
        rq = Float64[]
        bx = idx1d(q[1]-xmin)
        by = idx1d(q[2]-ymin)
        for jy = max(1,by-1):min(ny,by+1)
            for jx = max(1,bx-1):min(nx,bx+1)
                b = jx + (jy-1)*nx
                for k = bstart[b]:bend[b]
                    rk = norm(q-xy_new[:,k])
                    if rk <= rcutoff
                        push!(Iq, k)
                        push!(rq, rk)
                    end
                end
            end
        end
        Iq, rq
    end

    # Parallel arrays of all pairs within the cutoff (in bucket order)
    I = Int[]
    J = Int[]
    r = Float64[]
    for j = 1:n
        Iq, rq = neighbors(xy_new[:,j])
        append!(I, Iq)
        append!(J, fill(j, length(Iq)))
        append!(r, rq)
    end

    xy_new, p, neighbors, I, J, r
end
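md"""
A brute force cross-check on a small point set, in the same spirit as the other sanity checks in this notebook (illustrative only): the number of pairs within the cutoff found through the bucket data structure should match an $O(n^2)$ scan over all pairs.
"""

let
    xy = rand(2, 200)
    rcutoff = 0.1
    xy_new, p, neighbors, I, J, r = find_neighbors2d(xy, rcutoff)

    # Count pairs (including self-pairs) within the cutoff by brute force
    npairs = count(norm(xy[:,i]-xy[:,j]) <= rcutoff
                   for i = 1:size(xy,2), j = 1:size(xy,2))
    npairs == length(I)
end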


md"""
#### Sparse RBF centers

Using the data structure in `find_neighbors2d`, we can create a 2D sparse version of the `RBFCenters` data structure. We will not provide functionality to add new centers after initialization, so we will not bother with making it mutable.
"""
begin

# Structure representing a collection of RBF centers and an associated matrix rep
struct RBFCenters2Ds
ϕ :: Function # RBF function
dϕ :: Function # RBF derivative function
X :: Matrix # Coordinates
p :: Vector # Permutation from original to new index
ΦXX # Sparse RBF matrix
F # Sparse Cholesky factorization of ΦXX
neighbors :: Function # Function to query neighbors
end

function RBFCenters2Ds(ϕ, dϕ, X, rcutoff)
Xnew, p, neighbors, I, J, r = find_neighbors2d(X, rcutoff)
ΦXX = sparse(I, J, ϕ.(r))
F = cholesky(ΦXX)
RBFCenters2Ds(ϕ, dϕ, Xnew, p, ΦXX, F, neighbors)
end

end

md"""
As before, we want the ability to solve for a right hand side, evaluate a function, and evaluate a gradient.
"""
begin
# Compute a coefficient vector for a given rhs
solve(rbfc :: RBFCenters2Ds, fX :: AbstractVector) = rbfc.F \ fX
solve(rbfc :: RBFCenters2Ds, f :: Function) = rbfc.F \ [f(x) for x in eachcol(rbfc.X)]

# Evaluate an interpolant at a given point
function feval(rbfc :: RBFCenters2Ds, c :: Vector, x)
ϕ = rbfc.ϕ
sum( ϕ(rj)*c[j] for (j, rj) in zip(rbfc.neighbors(x)...) )
end

# Evaluate a gradient at a given point
function dfeval(rbfc :: RBFCenters2Ds, c :: Vector, x)
dϕ = rbfc.dϕ
∇ϕ(r) = if norm(r) == 0 0*r else dϕ(norm(r))*r/norm(r) end
sum( ∇ϕ(x-rbfc.X[:,j])*c[j] for j in rbfc.neighbors(x)[1] )
end
end

md"""
#### Hello world

As before, we do a "hello world" test to illustrate the behavior of the solver. In this case, we approximate a simple function from 10K sample points in 2D.
"""
function test_hello_world_2d(npoints=10000)
l = 5.0/sqrt(npoints)
ϕ, dϕ = scale_rbf(ϕ_w21, dϕ_w21, l)
ftest(x) = cos(x[1])*exp(x[2])

xy = kronecker_quasirand(2, npoints)
@time rbfc = RBFCenters2Ds(ϕ, dϕ, xy, l)
@time c = solve(rbfc, ftest)
@time ftest([0.5; 0.5]), feval(rbfc, c, [0.5; 0.5])
end


md"""
### The questions

For the case of the Wendland kernel, the cutoff is exact, and $\Phi_{XX}$ is truly sparse. However, while the squared exponential kernel is never exactly zero, it does get very small. Therefore, we might try to do the same thing here.
"""
function hello_world_2d_se(l=0.02, σ=4.0)
# l = length scale for kernel; σ = number of length scales to extend out
ϕ, dϕ = scale_rbf(ϕ_se, dϕ_se, l)
xy = kronecker_quasirand(2, 10000)
ftest(x) = cos(x[1])*exp(x[2])

@time rbfc = RBFCenters2Ds(ϕ, dϕ, xy, σ*l)
@time fX = [ftest(x) for x in eachcol(rbfc.X)]
@time c = solve(rbfc, fX)

rbfc, c, fX, ftest
end
let
rbfc, c, fX, ftest = hello_world_2d_se(0.02, 4.0)
smid = ftest([0.5; 0.5])
fmid = feval(rbfc, c, [0.5; 0.5])
smid, fmid
end

md"""
Even a little playing around with this code should illustrate that it's delicate.
Notice that we have changed the length scale to be even shorter. For example, try running with $l = 0.05$; what happens?
"""

md"""
*Answer*: TODO
"""

md"""
The code doesn't crash with $l = 0.02$, and our sanity check computation on a simple test function looks pretty good. But we would like to better understand the error in this approximation. Let $\hat{s}(x)$ be the sparsified approximator that we have computed above, i.e.

$$\hat{s}(x) = \sum_{\|x-x_i\| \leq r_{\mathrm{cutoff}}} \phi(\|x-x_i\|) c_{0,i}$$

Let us start by showing that

$$|\hat{s}(x)-s(x)| \leq \|c_0\|_1 \exp(-\sigma^2) + \|c-c_0\|_1.$$

*Hint/sketch*: Write $\hat{\Phi}_{xX}$ as the vector of evaluations with cutoff, so that $\hat{s}(x) = \hat{\Phi}_{xX} c_0$ and $s(x) = \Phi_{xX} c$. Add and subtract $\Phi_{xX} c_0$; apply the triangle inequality; use the fact that $|u \cdot v| \leq \|u\|_1 \|v\|_\infty$ for any $u$ and $v$; and derive simple bounds on $\|\Phi_{xX}\|_\infty$ and $\|\hat{\Phi}_{xX}-\Phi_{xX}\|_\infty$.
"""

md"""
*Answer*: TODO
"""

md"""
The bound is pessimistic, but it gives a sense of what can go wrong: if there is a large error in our coefficients, or if the coefficient norm times the magnitude of the neglected RBF evaluations is big, then we may have large differences between $s$ and $\hat{s}$.

The error in the coefficient vector $\|c-c_0\|_1$ with our initial parameters can be quite large, enough to make our error bounds terrible (even if the error might not be as bad). But even if the factorization of $\hat{\Phi}_{XX}$ is not good enough to get a very accurate $c$ alone, it is good enough to make progress. We therefore try an *iterative refinement loop*, combining our approximate solver based on $\hat{\Phi}_{XX}$ and a matrix vector product with a (still sparse) $\tilde{\Phi}_{XX}$ that provides a much more accurate approximation to $\Phi_{XX}$. Your goal: carry out the iterative refinement and give evidence of its convergence.
"""
function hello_world_2d_se_itref(l=0.02, σ=4.0, σ1=6.0)

# Set up problem for which we will do a factorization
rbfc, c0, fX, ftest = hello_world_2d_se(l, σ)

# Form matvec for reference ΦXX (with larger cutoff σ1)
Xnew, p, neighbors, I, J, r = find_neighbors2d(rbfc.X, σ1*l)
ΦXX = sparse(I, J, rbfc.ϕ.(r))
function mulΦXX(y, result)
result[p] = ΦXX*y[p]
end
mulΦXX(y) = mulΦXX(y, zeros(size(ΦXX)[1]))

c = c0
# TODO: Iterative refinement loop (take maybe 10-12 steps)

c
end

md"""
We can take a rough guess at the magnitude of the error in $c$ by the magnitude of the corrections taken during the iterative refinement loop. As for the first term in our error bound, we can compute that easily in Julia.
"""
function hello_world2d_iterf_err_term(l=0.02, σ=4.0, σ1=6.0)
norm(hello_world_2d_se_itref(l, σ, σ1), 1)*exp(-σ^2)
end


md"""
## Task 5: Fun with Fekete

The *Fekete points* for RBF interpolation on some domain $\Omega$ are the points that maximize $\det \Phi_{XX}$ -- though it will be convenient for us to instead consider $\log \det \Phi_{XX}$. In order to maximize such a quantity, we would like to be able to compute derivatives; that is, we seek to compute the matrix $G$ with entries

$$G_{ij} = \frac{\partial (\log \det \Phi_{XX})}{\partial x_{ij}}$$

where $x_{ij}$ represents component $i$ of the $j$th center. Some matrix calculus that we will not go into now lets us write this as

$$G_{ij} = \operatorname{tr}\left( \Phi_{XX}^{-1} \, \frac{\partial \Phi_{XX}}{\partial x_{ij}} \right),$$

and we can write the second matrix as

$$\frac{\partial \Phi_{XX}}{\partial x_{ij}} =
e_j (v^{ij})^T + v^{ij} e_j^T \mbox{ where }
v^{ij}_k =
\begin{cases}
\phi'(\|x_j-x_k\|) \frac{x_{ij}-x_{ik}}{\|x_j-x_k\|}, & k \neq j \\
0, & k = j.
\end{cases}$$

A standard property of traces (the cyclic property) tells us that in general for any matrix $A$ and vectors $u$ and $w$ such that the dimensions make sense,

$$\operatorname{tr}(Auw^T) = w^T A u.$$

Therefore, using the $v$ vectors introduced above, we can write

$$G_{ij} = 2 e_j^T \Phi_{XX}^{-1} v^{ij} = 2 (v^{ij})^T \Phi_{XX}^{-1} e_j.$$

For this problem, you should complete the following routines to compute the log determinant of $\Phi_{XX}$ and the derivative matrix $G$ in the indicated time bounds. A tester is provided below.
"""
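md"""
One standard fact that is useful here: if $\Phi_{XX} = R^T R$ is a Cholesky factorization with $R$ upper triangular, then

$$\log \det \Phi_{XX} = 2 \sum_{i=1}^n \log r_{ii},$$

so the log determinant can be read off a factorization that has already been computed.
"""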
# TODO: Complete this function (should be O(n) time)
function logdet_ΦXX(rbfc :: RBFCenters)
1.0
end
# TODO: Complete this function (should be O(n^3 + n^2 d) time)
function dlogdet_ΦXX(rbfc :: RBFCenters)
G = zeros(rbfc.d, rbfc.n)
G
end
# Test the analytical formula for a directional derivative versus finite differences
function test_logdet()

    X = rand(2,10)
    δX = rand(2,10)
    h = 1e-5

    # Compute using the analytical formula
    rbfc = RBFCenters(ϕ_se, dϕ_se, X)
    G = dlogdet_ΦXX(rbfc)
    δld_an = dot(G, δX)

    # Compute using finite differences on perturbed center locations
    logdet_p = logdet_ΦXX(RBFCenters(ϕ_se, dϕ_se, X + h*δX))
    logdet_m = logdet_ΦXX(RBFCenters(ϕ_se, dϕ_se, X - h*δX))
    δld_fd = (logdet_p-logdet_m)/2/h

    # Relative error in the directional derivative
    abs(δld_an-δld_fd)/abs(δld_fd)
end