Notes for 2023-02-22
Gram-Schmidt
The Gram-Schmidt procedure is usually the first method people learn for converting an existing basis (the columns of $A$) into an orthonormal basis (the columns of $Q$). For each column of $A$, the procedure subtracts off any components in the directions of the previously computed columns of $Q$, then scales the remainder to unit length. In Julia, Gram-Schmidt looks something like this:
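A minimal sketch (the bodies of `orth_cgs0` and the particular checks in `test_orth_cgs0` below are one plausible reconstruction, not a tuned implementation):

```julia
using LinearAlgebra

function orth_cgs0(A)
    m, n = size(A)
    Q = zeros(m, n)
    for j = 1:n
        v = A[:,j]                       # Take the jth original basis vector
        v -= Q[:,1:j-1]*(Q[:,1:j-1]'*v)  # Subtract components along q_1, ..., q_{j-1}
        Q[:,j] = v/norm(v)               # Normalize what remains
    end
    Q
end

function test_orth_cgs0()
    A = rand(10,5)
    Q = orth_cgs0(A)
    # How orthonormal is Q, and how well does range(Q) capture range(A)?
    norm(Q'*Q-I), norm(A-Q*(Q'*A))/norm(A)
end
```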
Testing on a random $10 \times 5$ matrix, both checks come out at roundoff level (about $4.2 \times 10^{-16}$ and $2.1 \times 10^{-16}$ in one run).
Where does $R$ appear in this algorithm? As with the lower triangle in the LU factorization, the $R$ factor in QR appears implicitly in the orthogonalization and normalization steps: the projection coefficients computed during orthogonalization form the strict upper triangle of $R$, and the normalization constants are its diagonal entries.
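A sketch that accumulates $R$ as it goes; the particular checks in `test_orth_cgs` (orthonormality of $Q$, triangularity of $R$, and the relative residual $\|A - QR\|/\|A\|$) are assumptions:

```julia
function orth_cgs(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j = 1:n
        v = A[:,j]                   # Take the jth original basis vector
        R[1:j-1,j] = Q[:,1:j-1]'*v   # Projection coefficients = column j of R
        v -= Q[:,1:j-1]*R[1:j-1,j]   # Subtract off those components
        R[j,j] = norm(v)             # Normalization constant = diagonal of R
        Q[:,j] = v/R[j,j]
    end
    Q, R
end

function test_orth_cgs()
    A = rand(10,5)
    Q, R = orth_cgs(A)
    norm(Q'*Q-I), norm(R-triu(R)), norm(A-Q*R)/norm(A)
end
```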
On a random test matrix, all three checks again come out clean: in one run, roughly $9.2 \times 10^{-16}$ for orthogonality, exactly $0.0$ for the triangularity of $R$, and about $2.9 \times 10^{-17}$ for the relative residual.
Testing with random matrices is a particularly gentle way to exercise our code, and it does not reveal a lurking numerical instability in the CGS algorithm. The problem is that if $A$ is even somewhat ill-conditioned (i.e. the basis vectors in $A$ are nearly collinear), the $Q$ computed by the classical Gram-Schmidt algorithm may stop being nearly orthogonal, even if the relationship $A = QR$ is well maintained.
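To see the instability, we can feed CGS a deliberately ill-conditioned matrix. The construction below, random orthogonal factors around geometrically decaying singular values, is an assumed stand-in for a suitably nasty test case:

```julia
function test_orth_cgs_poor()
    # Assumed construction: singular values from 1 down to 1e-6
    U, _ = qr(randn(10,5))
    V, _ = qr(randn(5,5))
    A = Matrix(U)*Diagonal(10.0 .^ (0.0:-1.5:-6.0))*Matrix(V)'
    Q, R = orth_cgs(A)
    norm(Q'*Q-I), norm(R-triu(R)), norm(A-Q*R)/norm(A)
end
```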
Now the orthogonality residual $\|Q^T Q - I\|$ jumps to around $5.6 \times 10^{-5}$, a loss of more than ten digits, even though the triangularity and factorization checks still print as zero.
This is somewhat ameliorated by the modified Gram-Schmidt (MGS) algorithm, which computes the multipliers in $R$ in a slightly different way: instead of computing all the projections from the original vector, it subtracts the projection onto each previously computed column $q_k$ from the running residual as soon as that column is available. This is a little less prone to cancellation. We can see that numerical orthogonality with MGS is much better than with CGS, though there is still some loss.
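A sketch of MGS with the same interface as `orth_cgs`, together with a test on the same assumed ill-conditioned construction:

```julia
function orth_mgs(A)
    m, n = size(A)
    Q = copy(A)                 # Residual columns start as the columns of A
    R = zeros(n, n)
    for j = 1:n
        R[j,j] = norm(Q[:,j])
        Q[:,j] /= R[j,j]                    # Normalize the jth residual
        R[j,j+1:n] = Q[:,j+1:n]'*Q[:,j]     # Project the remaining residuals onto q_j
        Q[:,j+1:n] -= Q[:,j]*R[j,j+1:n]'    # ... and subtract immediately
    end
    Q, R
end

function test_orth_mgs_poor()
    # Same assumed ill-conditioned construction as in test_orth_cgs_poor
    U, _ = qr(randn(10,5))
    V, _ = qr(randn(5,5))
    A = Matrix(U)*Diagonal(10.0 .^ (0.0:-1.5:-6.0))*Matrix(V)'
    Q, R = orth_mgs(A)
    norm(Q'*Q-I), norm(R-triu(R)), norm(A-Q*R)/norm(A)
end
```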
On the same ill-conditioned test matrix, MGS gives an orthogonality residual of about $1.3 \times 10^{-7}$ in one run: much better than CGS, though still well above roundoff. The factorization residual remains at about $6.8 \times 10^{-17}$.
Householder transformations
A Householder transformation is a simple orthogonal transformation associated with reflection across a hyperplane with unit normal $v$:
$$H = I-2vv^T$$
We typically choose the transformation so that we can map a target vector (say $x$) to something parallel to the first coordinate axis (so $\pm \|x\| e_1$). In some cases, it is convenient to use a non-unit normal $w$ and introduce an associated scaling constant:
$$H = I - \tau ww^T$$
This lets us normalize $w$ in other ways (e.g. by putting a 1 in the first component).
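A sketch of such a `householder` routine, which picks the sign of the reflection to avoid cancellation when computing $x \mp \|x\| e_1$ and returns $w$ (normalized so $w_1 = 1$) together with $\tau$; the checks in `test_householder` are assumptions:

```julia
function householder(x)
    normx = norm(x)               # Assumes x is nonzero
    s = x[1] >= 0 ? 1.0 : -1.0    # Sign chosen to avoid cancellation
    u1 = x[1] + s*normx
    w = x/u1                      # Scale so that w[1] = 1
    w[1] = 1.0
    τ = s*u1/normx                # H = I - τ*w*w' maps x to -s*normx*e1
    w, τ
end

function test_householder()
    x = randn(10)
    w, τ = householder(x)
    H = I - τ*w*w'                # Form H densely (fine for a small test)
    norm(H'*H-I), norm((H*x)[2:end])/norm(x)
end
```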
In one run, both checks (orthogonality of $H$ and the relative size of the entries of $Hx$ below the first) come out around $10^{-16}$ ($5.4 \times 10^{-16}$ and $2.6 \times 10^{-16}$).
Now we can think about QR in the same way we think about LU. The only difference is that we apply a sequence of Householder transformations (rather than Gauss transformations) to introduce the subdiagonal zeros. As with the LU factorization, we can reuse the storage of $A$ by recognizing that the number of nontrivial parameters in the vector $w$ at each step is the same as the number of zeros produced by that transformation. This gives us the following.
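A sketch of `hqr!`: the upper triangle of $A$ is overwritten by $R$, the entries below the diagonal hold the trailing parts of the vectors $w$, and the scalars $\tau$ come back in a separate array:

```julia
function hqr!(A)
    m, n = size(A)
    τ = zeros(n)
    for j = 1:n
        # Householder vector for column j, normalized so w[1] = 1
        normx = norm(A[j:end,j])
        s = A[j,j] >= 0 ? 1.0 : -1.0
        u1 = A[j,j] + s*normx
        w = A[j:end,j]/u1
        w[1] = 1.0
        τ[j] = s*u1/normx
        # Overwrite: R entry on the diagonal, trailing part of w below it
        A[j,j] = -s*normx
        A[j+1:end,j] = w[2:end]
        # Apply H_j to the trailing columns
        A[j:end,j+1:end] -= τ[j]*w*(w'*A[j:end,j+1:end])
    end
    A, τ
end
```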
If we need $Q$ or $Q^T$ explicitly, we can always form it from the compressed representation. We can also multiply by $Q$ and $Q^T$ implicitly.
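Sketches of these operations, written here for a vector argument. Since each $H_j$ is symmetric and $Q = H_1 H_2 \cdots H_n$, multiplying by $Q$ applies the reflections in reverse order and multiplying by $Q^T$ applies them in forward order; `applyQ` and `applyQT` are out-of-place wrappers:

```julia
function applyQ!(QR, τ, x)
    m, n = size(QR)
    for j = n:-1:1                     # Apply H_n, ..., H_1 in turn
        w = [1.0; QR[j+1:end,j]]
        x[j:end] -= τ[j]*w*(w'*x[j:end])
    end
    x
end

function applyQT!(QR, τ, x)
    m, n = size(QR)
    for j = 1:n                        # Apply H_1, ..., H_n in turn
        w = [1.0; QR[j+1:end,j]]
        x[j:end] -= τ[j]*w*(w'*x[j:end])
    end
    x
end

applyQ(QR, τ, x) = applyQ!(QR, τ, copy(x))
applyQT(QR, τ, x) = applyQT!(QR, τ, copy(x))
```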
To compute a dense representation of $Q$, we can always apply $Q$ to the columns of the identity.
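For example (a sketch that applies `applyQ!` to each column of an identity matrix):

```julia
function formQ(QR, τ)
    m = size(QR, 1)
    Q = Matrix{Float64}(I, m, m)
    for j = 1:m
        applyQ!(QR, τ, view(Q, :, j))  # Q times the jth column of I
    end
    Q
end
```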
This gives us all the ingredients to form a dense QR decomposition of $A$.
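A sketch of `hqr` and a random-matrix test (the particular checks are assumptions):

```julia
function hqr(A)
    QR, τ = hqr!(copy(A))
    formQ(QR, τ), triu(QR)
end

function test_hqr()
    A = rand(10,5)
    Q, R = hqr(A)
    norm(Q'*Q-I), norm(Q*R-A)/norm(A)
end
```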
On a random $10 \times 5$ matrix, one run gives $\|Q^T Q - I\| \approx 1.0 \times 10^{-15}$ and $\|QR - A\|/\|A\| \approx 2.6 \times 10^{-16}$.
However, we don't need the dense $Q$ and $R$ factors to solve a least squares problem; we just need the ability to multiply by $Q^T$ and solve with $R$.
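A sketch of such a solver, with a test that compares against Julia's backslash:

```julia
function hqr_solve_ls(A, b)
    m, n = size(A)
    QR, τ = hqr!(copy(A))
    y = applyQT!(QR, τ, copy(b))            # Q'b
    UpperTriangular(QR[1:n,1:n]) \ y[1:n]   # Solve with the leading block of R
end

function test_hqr_solve_ls()
    A = rand(10,5)
    b = rand(10)
    xref = A\b
    norm(hqr_solve_ls(A, b)-xref)/norm(xref)
end
```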
Checked against the least squares solution from Julia's backslash, the relative error in one run is about $1.4 \times 10^{-15}$.