md"""
# Notebook for 2023-02-24

The main topic of this lecture is regularization for ill-posed systems. We will illustrate this with an *exactly* singular problem.
"""
using LinearAlgebra
begin

    # Exact and noisy factors
    A = randn(100,2) * randn(2,4)
    E1 = 1e-12 * rand(10,4)
    E2 = 1e-8 * randn(90,4)
    E = [E1; E2]
    Â = A + E
    Â1 = Â[1:10,:]

    # Reference coefficients
    c = [1.0; 2.0; 3.0; 4.0]

    # Exact and noisy rhs
    b = A*c
    η = 1e-3 * randn(100)
    b̂ = b + η
    b̂1 = b̂[1:10]

end

md"""
If we are given the noisy matrix $\hat{A}$ and the noisy right hand side $\hat{b}$, we will basically always get something terrible, because we are fitting the noise.
In particular, notice the giant coefficients if we fit with just the first part of the data
"""
ĉ
ĉ = Â1\b̂1

md"""
The noise vector $\eta$ in this case isn't precisely the optimal residual, but it is close.
"""
0.9999988224856102
norm(b̂-Â*(Â\b̂))/norm(η)

md"""
How does our least squares fit with partial data compare to the noise vector? Not well!
"""
6432.9183846541455
norm(b̂-Â*ĉ)/norm(η)

md"""
This is not great, though still better than our worst-case error bounds.
"""
4.264388527007974e8
begin
    # Worst-case error bound: 1 + norm(A) over the smallest singular value of Â
    U1, σ1s, V1 = svd(Â)
    1.0 + norm(A)/σ1s[end]
end

md"""
Let's now consider the regularization approaches from class.
"""

md"""
## Pivoted QR

The pivoted QR factorization $\hat{A}_1 \Pi = QR$ tends to pull "important factors" to the front. In this run, we can see from the structure of $R$ that the two columns pivoted to the end (columns 3 and 1) are nearly within the span of the two leading pivoted columns (columns 4 and 2), to within a residual of around $10^{-12}$. Therefore, we consider fitting a model that depends just on the two leading pivoted columns.
"""
F
QRPivoted{Float64, Matrix{Float64}, Vector{Float64}, Vector{Int64}}
Q factor:
10×10 LinearAlgebra.QRPackedQ{Float64, Matrix{Float64}, Vector{Float64}}:
 -0.51492     0.0548636  -0.673411   0.298567   …   0.16142    -0.254046   -0.0211878
  0.385646   -0.104791   -0.04248    0.131296       0.444605   -0.309804    0.470137
 -0.0780167  -0.200436   -0.256745  -0.111578      -0.427811   -0.0467769  -0.361725
  0.327539    0.185649   -0.363489  -0.184843       0.0957728  -0.453571   -0.121894
  0.434378   -0.499924   -0.264279   0.498432      -0.241753    0.237641   -0.0137181
  0.357605    0.184237   -0.198794  -0.497039   …  -0.149097   -0.0568628  -0.174298
  0.273396   -0.169482   -0.176968  -0.095118      -0.0614069  -0.0209033  -0.0236755
  0.242407    0.486848   -0.145901   0.271544       0.38444     0.494224   -0.416338
 -0.146351   -0.37615    -0.329089  -0.518193       0.373859    0.500713    0.186095
  0.035711    0.469326   -0.276413   0.0145769     -0.460505    0.276502    0.627473
R factor:
4×4 Matrix{Float64}:
 5.68181  -0.505086  -3.76776      -0.860696
 0.0       5.50948   -1.76684      -1.83836
 0.0       0.0       -2.99715e-12  -2.08307e-12
 0.0       0.0        0.0          -7.55794e-13
permutation:
4-element Vector{Int64}:
 4
 2
 3
 1
F = qr(Â1, ColumnNorm())
1.9700593960409308e-15
norm(Â1[:,F.p]-F.Q*F.R) # Sanity check that this does what I said it would!
begin
    # Fit using only the two columns selected by pivoting; the other coefficients stay zero
    c_pivqr = zeros(4)
    c_pivqr[F.p[1:2]] = F.R[1:2,1:2]\(F.Q'*b̂1)[1:2]
end

md"""
This model gets pretty close to the best we could expect!
"""
1.1430754588035392
norm(b̂-Â*c_pivqr)/norm(η)

md"""
## Tikhonov regularization

Recall from class that the Tikhonov approach can be encoded as an ordinary least squares problem: stacking $\lambda I$ below $\hat{A}_1$ and a block of zeros below $\hat{b}_1$ gives a least squares problem whose objective is $\|\hat{A}_1 c - \hat{b}_1\|^2 + \lambda^2 \|c\|^2$. For example, to minimize $\|\hat{A}_1 c - \hat{b}_1\|^2 + 10^{-12} \|c\|^2$ (i.e. $\lambda = 10^{-6}$) with respect to $c$, we can write
"""
c_tik1
c_tik1 = [Â1; 1e-6*I]\[b̂1; zeros(4)]

md"""
Indeed, this works pretty well, though you should wonder "where did $10^{-12}$ come from?"
"""
1.14307466211484
norm(b̂-Â*c_tik1)/norm(η)

md"""
The Tikhonov regularized problem can also be expressed via the SVD, where instead of using the inverse singular values $\sigma_i^{-1}$ we use the regularized version $\sigma_i/(\sigma_i^2+\lambda^2)$.
"""
fit_tikhonov_svd (generic function with 1 method)
function fit_tikhonov_svd(A, b, λ)
    U, σ, V = svd(A)
    σ̂inv = σ./(σ.^2 .+ λ^2)  # Regularized "inverse" singular values σ/(σ²+λ²)
    V*(σ̂inv.*(U'*b))
end
fit_tikhonov_svd(Â1, b̂1, 1e-6)

md"""
The SVD is more expensive to compute with than a QR factorization, and is not "sparsity friendly" in the same way that QR is. But an advantage of using the SVD is that we can quickly recompute solutions associated with different levels of regularization.
"""

md"""
## Truncated SVD

The truncated SVD approach involves using the first $k$ singular values and vectors to compute an approximate solution to the least squares problem (vs using all of them).
"""
fit_truncated_svd (generic function with 1 method)
function fit_truncated_svd(A, b, k)
    U, σ, V = svd(A)
    σ̂inv = 1.0./σ
    σ̂inv[k+1:end] .= 0.0  # Keep only the first k singular values and vectors
    V*(σ̂inv.*(U'*b))
end
c_tsvd
c_tsvd = fit_truncated_svd(Â1, b̂1, 2)

md"""
Again, this approach works pretty well for this problem.
"""
1.1430746648361945
norm(b̂-Â*c_tsvd)/norm(η)