md"""
# Notebook for 2023-02-24

The main topic of this lecture is regularization for ill-posed systems. We will illustrate this with an *exactly* singular problem.
"""
using LinearAlgebra
begin

    # Exact and noisy factors
    A = randn(100,2) * randn(2,4)
    E1 = 1e-12 * rand(10,4)
    E2 = 1e-8 * randn(90,4)
    E = [E1; E2]
    Â = A + E
    Â1 = Â[1:10,:]

    # Reference coefficients
    c = [1.0; 2.0; 3.0; 4.0]

    # Exact and noisy rhs
    b = A*c
    η = 1e-3 * randn(100)
    b̂ = b + η
    b̂1 = b̂[1:10]

end

md"""
If we are given the noisy matrix $\hat{A}$ and the noisy right hand side $\hat{b}$, we will basically always get something terrible, because we are fitting the noise.
In particular, notice the giant coefficients if we fit with just the first part of the data
"""
ĉ
ĉ = Â1\b̂1

md"""
The noise vector $\eta$ in this case isn't precisely the optimal residual, but it is close.
"""
0.9999988224856102
norm(b̂-Â*(Â\b̂))/norm(η)

md"""
How does our least squares fit with partial data compare to the noise vector? Not well!
"""
6432.9183846541455
norm(b̂-Â*ĉ)/norm(η)

md"""
This is not great, though still better than our worst-case error bounds.
"""
4.264388527007974e8
begin
    # Worst-case error bound: 1 + norm(A) over the smallest singular value of Â
    U1, σ1s, V1 = svd(Â)
    1.0 + norm(A)/σ1s[end]
end

md"""
Let's now consider the regularization approaches from class.
"""

md"""
## Pivoted QR

The pivoted QR factorization $\hat{A}_1 \Pi = QR$ tends to pull "important factors" to the front. In this run, we can see from the structure of $R$ that the two columns pivoted to the end (columns 3 and 1) are nearly within the span of the two leading pivoted columns (columns 4 and 2), to within a residual of around $10^{-12}$. Therefore, we consider fitting a model that depends just on the two leading pivoted columns.
"""
F
QRPivoted{Float64, Matrix{Float64}, Vector{Float64}, Vector{Int64}}
Q factor:
10×10 LinearAlgebra.QRPackedQ{Float64, Matrix{Float64}, Vector{Float64}}:
 -0.51492     0.0548636  -0.673411   0.298567   …   0.16142    -0.254046   -0.0211878
  0.385646   -0.104791   -0.04248    0.131296       0.444605   -0.309804    0.470137
 -0.0780167  -0.200436   -0.256745  -0.111578      -0.427811   -0.0467769  -0.361725
  0.327539    0.185649   -0.363489  -0.184843       0.0957728  -0.453571   -0.121894
  0.434378   -0.499924   -0.264279   0.498432      -0.241753    0.237641   -0.0137181
  0.357605    0.184237   -0.198794  -0.497039   …  -0.149097   -0.0568628  -0.174298
  0.273396   -0.169482   -0.176968  -0.095118      -0.0614069  -0.0209033  -0.0236755
  0.242407    0.486848   -0.145901   0.271544       0.38444     0.494224   -0.416338
 -0.146351   -0.37615    -0.329089  -0.518193       0.373859    0.500713    0.186095
  0.035711    0.469326   -0.276413   0.0145769     -0.460505    0.276502    0.627473
R factor:
4×4 Matrix{Float64}:
 5.68181  -0.505086  -3.76776      -0.860696
 0.0       5.50948   -1.76684      -1.83836
 0.0       0.0       -2.99715e-12  -2.08307e-12
 0.0       0.0        0.0          -7.55794e-13
permutation:
4-element Vector{Int64}:
 4
 2
 3
 1
F = qr(Â1, ColumnNorm())
1.9700593960409308e-15
norm(Â1[:,F.p]-F.Q*F.R) # Sanity check that this does what I said it would!
begin
    # Fit using only the two columns selected by pivoting; the other coefficients stay zero
    c_pivqr = zeros(4)
    c_pivqr[F.p[1:2]] = F.R[1:2,1:2]\(F.Q'*b̂1)[1:2]
end

md"""
This model gets pretty close to the best we could expect!
"""
1.1430754588035392
norm(b̂-Â*c_pivqr)/norm(η)

md"""
## Tikhonov regularization

Recall from class that the Tikhonov approach can be encoded as an ordinary least squares problem: stacking $\lambda I$ below $\hat{A}_1$ and a block of zeros below $\hat{b}_1$ gives a least squares problem whose objective is $\|\hat{A}_1 c - \hat{b}_1\|^2 + \lambda^2 \|c\|^2$. For example, to minimize $\|\hat{A}_1 c - \hat{b}_1\|^2 + 10^{-12} \|c\|^2$ (i.e. $\lambda = 10^{-6}$) with respect to $c$, we can write
"""
c_tik1
c_tik1 = [Â1; 1e-6*I]\[b̂1; zeros(4)]

md"""
Indeed, this works pretty well, though you should wonder "where did $10^{-12}$ come from?"
"""
1.14307466211484
norm(b̂-Â*c_tik1)/norm(η)

md"""
The Tikhonov regularized problem can also be expressed via the SVD, where instead of using the inverse singular values $\sigma_i^{-1}$ we use the regularized version $\sigma_i/(\sigma_i^2+\lambda^2)$.
"""
fit_tikhonov_svd (generic function with 1 method)
function fit_tikhonov_svd(A, b, λ)
    U, σ, V = svd(A)
    σ̂inv = σ./(σ.^2 .+ λ^2)  # Regularized "inverse" singular values σ/(σ²+λ²)
    V*(σ̂inv.*(U'*b))
end
fit_tikhonov_svd(Â1, b̂1, 1e-6)

md"""
The SVD is more expensive to compute with than a QR factorization, and is not "sparsity friendly" in the same way that QR is. But an advantage of using the SVD is that we can quickly recompute solutions associated with different levels of regularization.
"""

md"""
## Truncated SVD

The truncated SVD approach involves using the first $k$ singular values and vectors to compute an approximate solution to the least squares problem (vs using all of them).
"""
fit_truncated_svd (generic function with 1 method)
function fit_truncated_svd(A, b, k)
    U, σ, V = svd(A)
    σ̂inv = 1.0./σ
    σ̂inv[k+1:end] .= 0.0  # Keep only the first k singular values and vectors
    V*(σ̂inv.*(U'*b))
end
c_tsvd
c_tsvd = fit_truncated_svd(Â1, b̂1, 2)

md"""
Again, this approach works pretty well for this problem.
"""
1.1430746648361945
norm(b̂-Â*c_tsvd)/norm(η)