md"""
# Notes for 2023-02-22

## Gram-Schmidt

The *Gram-Schmidt* procedure is usually the first method people
learn to convert some existing basis (columns of $A$) into an
orthonormal basis (columns of $Q$). For each column of $A$, the procedure
subtracts off any components in the direction of the previous columns,
and then scales the remainder to be unit length. In Julia, Gram-Schmidt looks
something like this:
"""
using LinearAlgebra
function orth_cgs0(A)
    m, n = size(A)
    Q = zeros(m, n)
    for j = 1:n
        v = A[:,j]                        # Take the jth original basis vector
        v = v-Q[:,1:j-1]*(Q[:,1:j-1]'*v)  # Orthogonalize vs q_1, ..., q_{j-1}
        v = v/norm(v)                     # Normalize what remains
        Q[:,j] = v                        # Add result to Q basis
    end
    Q
end
function test_orth_cgs0()
    A = rand(10,5)
    Q = orth_cgs0(A)

    # Check orthonormality of Q columns, and that Q gives the right projector
    norm(Q'*Q-I), norm(A-Q*(Q'*A))/norm(A)
end

md"""
Where does $R$ appear in this algorithm? As with the lower triangle in the LU factorization, the $R$ factor in QR appears implicitly in the orthogonalization and normalization steps.
"""
function orth_cgs(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j = 1:n
        v = A[:,j]                    # Take the jth original basis vector
        R[1:j-1,j] = Q[:,1:j-1]'*v    # Project onto q_1, ..., q_{j-1}
        v = v-Q[:,1:j-1]*R[1:j-1,j]   # Orthogonalize vs q_1, ..., q_{j-1}
        R[j,j] = norm(v)              # Compute normalization constant
        v = v/R[j,j]                  # Normalize what remains
        Q[:,j] = v                    # Add result to Q basis
    end
    Q, R
end
function test_orth_cgs()
    A = rand(10,5)
    Q, R = orth_cgs(A)

    # Check orthonormality of Q columns,
    # upper triangularity of R,
    # and that A = QR
    norm(Q'*Q-I), norm(tril(R,-1)), norm(A-Q*R)/norm(A)
end

md"""
Testing with random matrices is a particularly gentle way to test our code, and doesn't reveal a lurking numerical instability in the CGS algorithm. The problem is that if $A$ is even somewhat ill conditioned (i.e. the basis vectors in $A$ are nearly co-linear) then the $Q$ computed by the classical Gram-Schmidt algorithm may stop being nearly orthogonal, even if the relationship $A = QR$ is well maintained.
"""
function test_orth_cgs_poor()
    A = rand(10,5)*Diagonal([1, 1e-2, 1e-4, 1e-6, 1e-8])*rand(5,5)
    Q, R = orth_cgs(A)

    # Check orthonormality of Q columns,
    # upper triangularity of R,
    # and that A = QR
    norm(Q'*Q-I), norm(tril(R,-1)), norm(A-Q*R)/norm(A)
end

md"""
This is somewhat ameliorated by the *modified* Gram-Schmidt algorithm, which computes the multipliers in $R$ in a slightly different way: subtracting off the projections of the residual onto each previously computed column of $k$ immediately rather than computing all the projections from the original vector. This is a little less prone to cancellation. We can see that numerical orthogonality with MGS is much better than with CGS, though there is still some loss.
"""
function orth_mgs(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j = 1:n
        v = A[:,j]                # Take the jth original basis vector
        for k = 1:j-1
            R[k,j] = Q[:,k]'*v    # Project what remains onto q_k
            v -= Q[:,k]*R[k,j]    # Subtract off the q_k component
        end
        R[j,j] = norm(v)          # Compute normalization constant
        v = v/R[j,j]              # Normalize what remains
        Q[:,j] = v                # Add result to Q basis
    end
    Q, R
end
function test_orth_mgs_poor()
    A = rand(10,5)*Diagonal([1, 1e-2, 1e-4, 1e-6, 1e-8])*rand(5,5)
    Q, R = orth_mgs(A)

    # Check orthonormality of Q columns,
    # upper triangularity of R,
    # and that A = QR
    norm(Q'*Q-I), norm(tril(R,-1)), norm(A-Q*R)/norm(A)
end
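md"""
As a quick side-by-side check (the exact numbers vary from run to run, and each test draws its own random ill-conditioned matrix), we can call both tests. The first entry of each tuple is $\|Q^TQ - I\|$, which is typically far smaller for MGS than for CGS on these ill-conditioned inputs.
"""
test_orth_cgs_poor(), test_orth_mgs_poor()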

md"""
## Householder transformations

A *Householder transformation* is a simple orthogonal transformation associated with reflection across a plane with some unit normal $v$:

$$H = I-2vv^T$$

We typically choose the transformation so that we can map a target vector (say $x$) to something parallel to the first coordinate axis (so $\pm \|x\| e_1$). In some cases, it is convenient to use a *non-unit* normal $w$ and introduce an associated scaling constant:

$$H = I - \tau ww^T$$

This lets us normalize $w$ in other ways (e.g. by putting a 1 in the first component).
"""
function householder(x)
    normx = norm(x)
    s = -sign(x[1])       # Choose sign to avoid cancellation in u1
    u1 = x[1]-s*normx     # First entry of x - s*normx*e1
    w = x/u1              # Trailing entries agree with (x - s*normx*e1)/u1
    w[1] = 1.0            # Overwrite first entry so w = (x - s*normx*e1)/u1
    τ = -s*u1/normx       # Scaling factor; equals 2/norm(w)^2
    τ, w
end
function test_householder()
    x = rand(10)
    τ, w = householder(x)

    # Check orthogonality and the mapping of x
    norm((I-τ*w*w')^2-I), norm((x-τ*w*(w'*x))[2:end])/norm(x)
end

md"""
Now we can think about QR in the same way we think about LU. The only difference is that we apply a sequence of Householder transformations (rather than Gauss transformations) to introduce the subdiagonal zeros. As with the LU factorization, we can also reuse the storage of A by recognizing that the number of nontrivial parameters in the vector $w$ at each step is the same as the number of zeros produced by that transformation. This gives us the following.
"""
function hqr!(A)
    m, n = size(A)
    τ = zeros(n)

    for j = 1:n

        # Find H = I-τ*w*w' to zero out A[j+1:end,j]
        normx = norm(A[j:end,j])
        s = -sign(A[j,j])
        u1 = A[j,j] - s*normx
        w = A[j:end,j]/u1
        w[1] = 1.0
        A[j+1:end,j] = w[2:end]   # Save trailing part of w
        A[j,j] = s*normx          # Diagonal element of R
        τ[j] = -s*u1/normx        # Save scaling factor

        # Update trailing submatrix by multiplying by H
        A[j:end,j+1:end] -= τ[j]*w*(w'*A[j:end,j+1:end])

    end

    A, τ
end

md"""
If we need $Q$ or $Q^T$ explicitly, we can always form it from the compressed representation. We can also multiply by $Q$ and $Q^T$ implicitly.
"""
function applyQ!(QR, τ, X)
    m, n = size(QR)
    for j = n:-1:1
        w = [1.0; QR[j+1:end,j]]
        X[j:end,:] -= τ[j]*w*(w'*X[j:end,:])
    end
    X
end
function applyQT!(QR, τ, X)
    m, n = size(QR)
    for j = 1:n
        w = [1.0; QR[j+1:end,j]]
        X[j:end,:] -= τ[j]*w*(w'*X[j:end,:])
    end
    X
end
applyQ(QR, τ, X) = applyQ!(QR, τ, copy(X))
applyQT(QR, τ, X) = applyQT!(QR, τ, copy(X))

md"""
To compute a dense representation of $Q$, we can always apply $Q$ to the columns of the identity.
"""
formQ(QR, τ) = applyQ!(QR, τ, Matrix(1.0I, size(QR)[1], size(QR)[1]))

md"""
This gives us all the ingredients to form a dense QR decomposition of $A$.
"""
function hqr(A)
    QR, τ = hqr!(copy(A))
    formQ(QR, τ), triu(QR)
end
function test_hqr()
    A = rand(10,5)
    Q, R = hqr(A)
    norm(Q'*Q-I), norm(A-Q*R)/norm(A)
end

md"""
However, we don't need the dense $Q$ and $R$ factors to solve a least squares problem; we just need the ability to multiply by $Q^T$ and solve with $R$.
"""
function hqr_solve_ls(QR, τ, b)
    m, n = size(QR)
    UpperTriangular(QR[1:n,1:n])\(applyQT!(QR, τ, copy(b))[1:n])
end
function test_hqr_solve_ls()
    A = rand(10,5)
    b = rand(10)

    QR, τ = hqr!(copy(A))
    x = hqr_solve_ls(QR, τ, b)

    xref = A\b
    norm(x-xref)/norm(xref)
end
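md"""
Calling the test should report a relative error on the order of machine precision.
"""
test_hqr_solve_ls()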