Notebook for 2023-02-24
The main topic of this lecture is regularization for ill-posed systems. We will illustrate this with an exactly singular problem.
-4.70659
3.27782
-1.52446
3.84962
2.20809
4.13047
1.95386
4.20485
-2.85816
2.16261
If we are given the noisy matrix $\hat{A}$ and the noisy right-hand side $\hat{b}$ and solve the least squares problem directly, we will basically always get something terrible, because we are fitting the noise. In particular, notice the giant coefficients when we fit with just the first part of the data:
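As a point of reference, here is a minimal sketch of this kind of naive fit; the original cell is not shown, and `Ah1`, `bh1` are assumed names for the first block of noisy data $\hat{A}_1$, $\hat{b}_1$.

```julia
# Naive least squares fit to the noisy data block (sketch; names assumed).
c_naive = Ah1 \ bh1
```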
2.68489e8
-1.01877e8
-5.97039e8
-3.64297e8
The noise vector $\eta$ in this case isn't precisely the optimal residual, but it is close.
0.9999988224856102
How does our least squares fit with partial data compare to the noise vector? Not well!
6432.9183846541455
This is not great, though still better than our worst-case error bounds.
4.264388527007974e8
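For illustration only, since the notebook's comparison cells are not shown, one might compute a relative coefficient error and a first-order worst-case bound based on the condition number of $\hat{A}_1$ roughly as follows; the names `c_naive`, `c_ref` (reference coefficients), and `η` are assumptions.

```julia
using LinearAlgebra

# Illustrative only: relative error of the naive coefficients vs. a first-order
# worst-case bound κ(Ah1)·‖η‖/(‖Ah1‖·‖c_ref‖).  All names here are assumptions.
rel_err    = norm(c_naive - c_ref) / norm(c_ref)
worst_case = cond(Ah1) * norm(η) / (opnorm(Ah1) * norm(c_ref))
```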
Let's now consider the regularization approaches from class.
Pivoted QR
The pivoted QR factorization $\hat{A}_1 \Pi = QR$ tends to pull "important factors" to the front. In this case, the permutation puts columns 4 and 2 first, and we can see from the structure of $R$ that columns 3 and 1 are nearly within the span of columns 4 and 2 (to within a residual of around $10^{-12}$). Therefore, we consider fitting a model that depends just on columns 4 and 2.
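A sketch of the factorization call that would produce the output below (`Ah1` is an assumed name for $\hat{A}_1$):

```julia
using LinearAlgebra

# Column-pivoted QR: Ah1[:, F.p] == F.Q * F.R (up to roundoff)
F = qr(Ah1, ColumnNorm())
```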
QRPivoted{Float64, Matrix{Float64}, Vector{Float64}, Vector{Int64}}
Q factor:
10×10 LinearAlgebra.QRPackedQ{Float64, Matrix{Float64}, Vector{Float64}}:
-0.51492 0.0548636 -0.673411 0.298567 … 0.16142 -0.254046 -0.0211878
0.385646 -0.104791 -0.04248 0.131296 0.444605 -0.309804 0.470137
-0.0780167 -0.200436 -0.256745 -0.111578 -0.427811 -0.0467769 -0.361725
0.327539 0.185649 -0.363489 -0.184843 0.0957728 -0.453571 -0.121894
0.434378 -0.499924 -0.264279 0.498432 -0.241753 0.237641 -0.0137181
0.357605 0.184237 -0.198794 -0.497039 … -0.149097 -0.0568628 -0.174298
0.273396 -0.169482 -0.176968 -0.095118 -0.0614069 -0.0209033 -0.0236755
0.242407 0.486848 -0.145901 0.271544 0.38444 0.494224 -0.416338
-0.146351 -0.37615 -0.329089 -0.518193 0.373859 0.500713 0.186095
0.035711 0.469326 -0.276413 0.0145769 -0.460505 0.276502 0.627473
R factor:
4×4 Matrix{Float64}:
5.68181 -0.505086 -3.76776 -0.860696
0.0 5.50948 -1.76684 -1.83836
0.0 0.0 -2.99715e-12 -2.08307e-12
0.0 0.0 0.0 -7.55794e-13
permutation:
4-element Vector{Int64}:
4
2
3
1
1.9700593960409308e-15
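A sketch of the reduced fit this suggests, keeping only columns 4 and 2 and leaving zero coefficients elsewhere (`Ah1`, `bh1` are assumed names):

```julia
# Fit using only the two leading pivot columns; the other coefficients stay zero.
c_qr = zeros(4)
c_qr[[4, 2]] = Ah1[:, [4, 2]] \ bh1
```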
0.0
0.704546
0.0
1.7439
This model gets pretty close to the best we could expect!
1.1430754588035392
Tikhonov regularization
Recall from class that the Tikhonov approach can be encoded as an ordinary least squares problem. For example, to minimize $\|\hat{A}_1 c - \hat{b}_1\|^2 + 10^{-12} \|c\|^2$ with respect to $c$, we can write
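One way this might be set up (the original cell is not shown; `Ah1` and `bh1` are assumed names for $\hat{A}_1$ and $\hat{b}_1$): augment $\hat{A}_1$ with $\sqrt{10^{-12}}\, I = 10^{-6} I$ and pad $\hat{b}_1$ with zeros.

```julia
using LinearAlgebra

# Tikhonov via an augmented ordinary least squares problem (sketch; names assumed).
λ = 1e-6                                  # square root of the 1e-12 penalty weight
c_tik = [Ah1; λ*I] \ [bh1; zeros(size(Ah1, 2))]
```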
-0.308522
0.324941
-0.862699
1.09134
Indeed, this works pretty well, though you should wonder "where did $10^{-12}$ come from?"
1.14307466211484
The Tikhonov regularized problem can also be expressed via the SVD, where instead of using the inverse singular values $\sigma_i^{-1}$ we use the regularized version $\sigma_i/(\sigma_i^2+\lambda^2)$.
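A possible implementation along these lines (the notebook's own definition is not shown, so the argument names are assumptions):

```julia
using LinearAlgebra

# Tikhonov-regularized least squares via the SVD (sketch).
function fit_tikhonov_svd(A, b, λ)
    F = svd(A)
    w = F.S ./ (F.S.^2 .+ λ^2)   # regularized weights σᵢ/(σᵢ² + λ²) in place of σᵢ⁻¹
    return F.V * (w .* (F.U' * b))
end
```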
fit_tikhonov_svd (generic function with 1 method)
-0.308522
0.324941
-0.862699
1.09134
The SVD is more expensive to compute than a QR factorization, and it is not "sparsity friendly" in the same way that QR is. But an advantage of using the SVD is that, once computed, it lets us quickly recompute solutions associated with different levels of regularization.
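For example, a single SVD of $\hat{A}_1$ can be reused across a sweep of regularization parameters (a sketch with illustrative values of $\lambda$; `Ah1`, `bh1` are assumed names):

```julia
using LinearAlgebra

F = svd(Ah1)                      # factor once
for λ in (1e-12, 1e-9, 1e-6, 1e-3)
    c = F.V * ((F.S ./ (F.S.^2 .+ λ^2)) .* (F.U' * bh1))
    # ... evaluate and compare the fit for this λ ...
end
```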
Truncated SVD
The truncated SVD approach uses only the first $k$ singular values and vectors to compute an approximate solution to the least squares problem, rather than using all of them.
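A possible implementation (again, the notebook's own definition is not shown; argument names are assumptions):

```julia
using LinearAlgebra

# Truncated SVD fit: keep only the leading k singular values/vectors (sketch).
function fit_truncated_svd(A, b, k)
    F = svd(A)
    return F.V[:, 1:k] * ((F.U[:, 1:k]' * b) ./ F.S[1:k])
end
```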
fit_truncated_svd (generic function with 1 method)
-0.30684
0.326171
-0.860615
1.09308
Again, this approach works pretty well for this problem.
1.1430746648361945