HW 3 for CS 4220
You may (and probably should) talk about the problems with each other, with the TAs, and with me, providing attribution for any good ideas you might get. Your final write-up should be your own.
1. Function fitting
Let $x_1, \ldots, x_m$ be sample points in $[-1, 1]$ and let $y_1, \ldots, y_m$ be the associated real-valued responses. Consider the least squares problem over $p \in \mathcal{P}_d$
$$\mbox{minimize } \sum_{i=1}^m (p(x_i)-y_i)^2$$
where $m \geq d+1$. We consider solving this problem with two different bases for $\mathcal{P}_d$, writing
$$p(x) = \sum_{j=0}^d b_j x^j$$
or
$$p(x) = \sum_{j=0}^d c_j T_j(x)$$
where the functions $T_j(x)$ are the Chebyshev polynomials (defined by $T_0(x) = 1$, $T_1(x) = x$, and the recurrence $T_{j+1}(x) = 2x T_j(x) - T_{j-1}(x)$). This gives rise to least squares problems of minimizing $\|Ab-y\|^2$ or $\|Bc-y\|^2$, where the matrices $A$ and $B$ can be computed via the following functions:
(Notebook cells: definitions of `form_ls_power` and `form_ls_cheb`.)
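The original notebook code is not reproduced in this export. A minimal sketch of what these constructors might look like (the argument names `x` and `d` are assumptions):

```julia
# Sketch (not the original notebook code): form the least squares
# matrices for the power and Chebyshev bases.

# A[i, j+1] = x[i]^j  (Vandermonde matrix for the power basis)
function form_ls_power(x, d)
    m = length(x)
    A = zeros(m, d+1)
    for j = 0:d
        A[:, j+1] = x.^j
    end
    A
end

# B[i, j+1] = T_j(x[i]), built column by column via the recurrence
function form_ls_cheb(x, d)
    m = length(x)
    B = zeros(m, d+1)
    B[:, 1] .= 1.0                            # T_0(x) = 1
    if d >= 1
        B[:, 2] = x                           # T_1(x) = x
    end
    for j = 2:d
        B[:, j+1] = 2 .* x .* B[:, j] .- B[:, j-1]
    end
    B
end
```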
It is also useful to be able to evaluate both polynomials at a point or points. With careful use of broadcasting operations, the same Julia code can cover both cases.
(Notebook cells: definitions of `eval_poly_power` and `eval_poly_cheb`.)
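Again, the original cells are not shown; one plausible sketch uses Horner's rule for the power basis and the Clenshaw recurrence for the Chebyshev basis, with broadcasting so that `x` may be a scalar or an array of points:

```julia
# Sketch: evaluate p(x) = sum_j b[j+1] x^j by Horner's rule.
# Broadcasting (.*, .+) lets x be a scalar or an array of points.
function eval_poly_power(x, b)
    p = zero(x) .+ b[end]
    for j = length(b)-1:-1:1
        p = p .* x .+ b[j]
    end
    p
end

# Sketch: evaluate p(x) = sum_j c[j+1] T_j(x) by the Clenshaw recurrence:
# b_k = c_k + 2x b_{k+1} - b_{k+2}, with p(x) = c_0 + x b_1 - b_2.
function eval_poly_cheb(x, c)
    d = length(c)-1
    d == 0 && return zero(x) .+ c[1]   # degree-0 edge case
    bkp2 = zero(x)                     # b_{k+2}
    bkp1 = zero(x) .+ c[end]           # b_{k+1}, seeded with c_d
    for k = d-1:-1:1
        bkp2, bkp1 = bkp1, 2 .* x .* bkp1 .- bkp2 .+ c[k+1]
    end
    x .* bkp1 .- bkp2 .+ c[1]
end
```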
As an example, consider the case where $y_i = 1.0 + \eta_i$, where the $\eta_i$ are independent noise terms with mean zero and standard deviation $\sigma$ (which we will set to $10^{-4}$). Ideally, $p$ should evaluate to the same thing under both fitting processes (they represent the same polynomial, up to roundoff effects), and that value should be close to 1.
Note that if $p(x) = \sum_{j=0}^d w_j c_j = w^T c$ with $w_j = T_j(x)$, then for our example we expect $p(x)-1 = w^T B^\dagger \eta$ to be a normal random variable with mean zero and standard deviation $\|w^T B^\dagger\| \sigma$. We can compute $w^T B^\dagger$ in Julia as `w' / B`.
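Putting the pieces together, the experiment might look like the following sketch (the test point `xt` and the variable names are assumptions; `F.R` appears because the questions below refer to it):

```julia
using LinearAlgebra

# Sketch of the experiment: fit a noisy constant in both bases
m, d, σ = 6, 4, 1e-4
x = range(-1.0, 1.0, length=m)
y = 1.0 .+ σ*randn(m)            # y_i = 1 + η_i

A = form_ls_power(x, d)
B = form_ls_cheb(x, d)
b = A \ y                        # power basis coefficients
c = B \ y                        # Chebyshev basis coefficients

# Both fits should evaluate to nearly 1 at any test point in [-1, 1]
xt = 0.5
eval_poly_power(xt, b) - 1.0
eval_poly_cheb(xt, c) - 1.0

# Predicted standard deviation of p(xt)-1, i.e. ‖w'B†‖σ with w_j = T_j(xt)
w = [cos(j*acos(xt)) for j = 0:d]
norm(w'/B) * σ

# Equivalent computation via the R factor (see the questions below)
F = qr(B)
norm(w'/F.R) * σ
```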
(Notebook output: -5.74567e-5, 0.000547991, -5.74567e-5, 0.000547991, 0.0, 5.10703e-15, 9.60781e-5, 0.00178927)
Questions
1. Change $m = 6$, $d = 4$ to $m = 60$, $d = 40$. What do you observe? Try re-running a few times to vary the random parameters.
2. What are the condition numbers of $A$ and $B$ for the case of $m = 60$ and $d = 40$? You may use the Julia `cond` function for computing this. How does this explain what you see about the differences between using the two bases?
3. In our code, we used `F.R` rather than `F` in the variance computation. Explain why this ends up being equivalent, at least in terms of the computed values.
2. Normalized Legendre
The normalized Legendre polynomials $q_0, q_1, \ldots$ are a family of orthonormal polynomials of increasing degree ($q_d$ has degree $d$) with respect to the $\mathcal{L}^2$ inner product on $[-1, 1]$, i.e.
$$\int_{-1}^1 q_i(x) q_j(x) \, dx = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & \mbox{ otherwise.} \end{cases}$$
If the matrix $M \in \mathbb{R}^{(d+1) \times (d+1)}$, with rows and columns indexed from $0$ to $d$, has entries
$$M_{ij} = \int_{-1}^1 x^i x^j \, dx = \begin{cases} 2/(i+j+1), & i+j \mbox{ even} \\ 0, & i+j \mbox{ odd} \end{cases}$$
then the Cholesky factorization $R^T R = M$ gives the coefficients for $q_0, \ldots, q_d$ in the power basis: if $U = R^{-1}$, then
$$q_j(x) = \sum_{i=0}^j x^i u_{ij}.$$
Explain why this construction works – that is, why is it that these polynomials are orthonormal in $\mathcal{L}^2$ on $[-1, 1]$?
Hint: This is easiest if we write $Q(x) = P(x) R^{-1}$, where
$$\begin{align*} Q(x) &= \begin{bmatrix} q_0(x) & \ldots & q_d(x) \end{bmatrix} \\ P(x) &= \begin{bmatrix} x^0 & \ldots & x^d \end{bmatrix} \end{align*}$$
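Though not required for the explanation, a short numerical sanity check of the construction may help (a sketch; the choice `d = 4` and the variable names are arbitrary):

```julia
using LinearAlgebra

# Sketch: build M (indices 0:d), factor it, and check orthonormality
d = 4
M = [iseven(i+j) ? 2.0/(i+j+1) : 0.0 for i = 0:d, j = 0:d]
R = cholesky(M).U            # M = R'R with R upper triangular
U = inv(R)                   # column j+1 holds the power coefficients of q_j

# Since ∫ P(x)'P(x) dx = M, we have ∫ Q(x)'Q(x) dx = U'*M*U,
# which should be the identity up to roundoff
norm(U'*M*U - I)
```

Note that the check `U'*M*U ≈ I` is exactly the statement of the hint in matrix form.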
3. Chopping Cholesky
Suppose $A$ is a positive definite matrix with Cholesky factorization $A = R^T R$. Then the matrix $A_{2:n,2:n}$ in which we remove the first row and column can be written as $A_{2:n,2:n} = \tilde{U}^T \tilde{U}$ where $\tilde{U} = R_{:,2:n}$ is the $n$-by-$(n-1)$ matrix consisting of columns $2$ through $n$ of $R$. The nonzero structure of $\tilde{U}$ is
$$\begin{bmatrix} \times & \times & \times & \ldots & \times \\ \times & \times & \times & \ldots & \times \\ 0 & \times & \times & \ldots & \times \\ 0 & 0 & \times & \ldots & \times \\ 0 & 0 & 0 & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \times \end{bmatrix}$$
If we apply an appropriate Householder reflector to the first two rows of $\tilde{U}$, we zero out the subdiagonal entry in the first column:
$$\begin{bmatrix} * & * & * & \ldots & * \\ 0 & * & * & \ldots & * \\ 0 & \times & \times & \ldots & \times \\ 0 & 0 & \times & \ldots & \times \\ 0 & 0 & 0 & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \times \end{bmatrix}$$
Continuing in this fashion, we can apply a total of $n-1$ Householder reflectors, each affecting only two rows, to reduce $\tilde{U}$ to upper triangular form $U$; that is, we (implicitly) compute the QR factorization $\tilde{U} = QU$ in $O(n^2)$ time.
Complete the implementation of this method below.
(Notebook cell: definition of `house`.)
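The original cell is not shown; a minimal sketch of a standard Householder constructor consistent with the usage below (the interface is an assumption):

```julia
using LinearAlgebra

# Sketch of a Householder reflector constructor (not the original cell).
# Returns v, normalized so v[1] == 1, such that
# H = I - 2*v*v'/(v'*v) maps x to ∓norm(x)*e_1.
function house(x)
    v = float.(x)
    s = v[1] >= 0 ? 1.0 : -1.0
    v[1] += s * norm(x)      # add sign(x[1])*‖x‖ to avoid cancellation
    v / v[1]
end
```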
(Notebook output: 0.750853, -6.93889e-18, -5.55112e-17, -5.55112e-17, -5.55112e-17)
(Notebook cell: definition of `chop_chol`.)
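A sketch of what a completed routine might look like, using `house` from above (the interface, taking the full factor `R`, is a guess):

```julia
using LinearAlgebra

# Sketch: given the Cholesky factor R of A (A = R'R), compute an upper
# triangular U with U'U = A[2:n,2:n] in O(n^2) time.
function chop_chol(R)
    n = size(R, 1)
    Ut = Matrix(R[:, 2:n])       # n-by-(n-1), one nonzero below the diagonal
    for j = 1:n-1
        # Reflector on rows j, j+1 zeroes the subdiagonal entry Ut[j+1, j]
        v = house(Ut[j:j+1, j])
        β = 2 / (v'*v)
        Ut[j:j+1, j:end] .-= β * v * (v' * Ut[j:j+1, j:end])
    end
    U = Ut[1:n-1, :]             # last row is now zero (up to roundoff)
    # Flip row signs so the diagonal is positive (Cholesky convention)
    for i = 1:n-1
        if U[i, i] < 0
            U[i, :] .*= -1
        end
    end
    UpperTriangular(U)
end
```

A quick sanity check in the spirit of the sample output: `norm(chop_chol(R)'*chop_chol(R) - A[2:end, 2:end])` should be at roundoff level, and with the sign convention above the result should agree entrywise with `cholesky(A[2:end, 2:end]).U`.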
(Notebook output: 2.3254e-16, 0.0)