Notes for 2023-02-20
Crickets data
As we discuss in the notes, we consider least squares regression to fit the model
$$\alpha C + \beta \approx T$$
where $C$ and $T$ are cricket chirps per minute and temperatures (in degrees Fahrenheit), respectively. In matrix form, we want
$$\begin{bmatrix} C_1 & 1 \\ C_2 & 1 \\ \vdots & \vdots \\ C_n & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx \begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix}$$
where $\approx$ in this setting means "minimizing error in a least squares sense." We can compute this using the backslash operation in Julia. Throughout the rest of this notebook, we'll refer to this system as $Ac \approx b$.
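The code cell isn't reproduced here, but a minimal sketch of the setup and the backslash solve (assuming the raw data lives in vectors named chirps and temps, names I'm inventing for the sketch) looks like this:

```julia
using LinearAlgebra

# Assume `chirps` and `temps` are length-n data vectors of chirps per
# minute and temperatures in degrees Fahrenheit (hypothetical names).
A = [chirps ones(length(chirps))]   # columns [C 1]
b = temps

# Backslash with a rectangular matrix solves the least squares problem
c = A \ b                           # c = [α, β]
```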
3.21635
26.742
It is standard in situations like this to plot the model fit (the red line) on top of a scatter plot of the raw data (the blue dots).
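A sketch of that plot, assuming the Plots package (both the package choice and the variable names are my assumptions):

```julia
using Plots

# Blue dots: raw data; red line: the least squares model fit
scatter(chirps, temps, color=:blue, label="data",
        xlabel="Chirps per minute", ylabel="Temperature (F)")
Cs = range(minimum(chirps), maximum(chirps), length=100)
plot!(Cs, c[1] .* Cs .+ c[2], color=:red, label="fit")
```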
Solving via factorizations
Just as we can solve linear systems via Gaussian elimination, there are a variety of ways that we can solve the least squares problem using factorizations. If we didn't know that backslash with a rectangular matrix meant "solve the least squares problem," we might be tempted to explicitly form and solve the normal equations:
$$c = (A^T A)^{-1} A^T b$$
Note that we do not explicitly form $(A^T A)^{-1}$! Even when I'm being naive, I won't be that naive. We'll solve with a backslash instead.
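A sketch of this naive normal equations solve:

```julia
# Form and solve the normal equations (A'A) c = A'b with backslash
# (no explicit inverse).
c_normal = (A' * A) \ (A' * b)
```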
3.21635
26.742
Forming and solving the normal equations isn't actually a completely insane way to solve the least squares problem. In many cases, it's the fastest thing we can do, though not by much. In the scenario above, Julia's backslash does some type of factorization behind the scenes; but if we might want to solve the normal equations more than once, it is useful to compute the factorization explicitly. Here we compute the Cholesky factorization of the Gram matrix ($A^T A$).
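A sketch of that computation (the name F for the factorization object is my choice):

```julia
# Cholesky factorization of the Gram matrix: A'A = R'R
F = cholesky(A' * A)
```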
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
64.6142 3.85364
⋅ 0.386602
Julia lets us use backslash with a factorization object to solve the linear system or least squares problem that the factorization was set up for.
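Reusing the Cholesky object F from the sketch above:

```julia
# Backslash with the Cholesky object solves (A'A) c = A'b
c_chol = F \ (A' * b)
```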
3.21635
26.742
Alternatively, we could unpack the factorization to get directly at the upper triangular factor, often called $R$ but named U by the Julia routine ($A^T A = R^T R$).
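A sketch of the two triangular solves:

```julia
# Pull out R = F.U (so A'A = R'R) and solve R'(R c) = A'b
R = F.U
c_tri = R \ (R' \ (A' * b))
```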
3.21635
26.742
Another approach is to compute the factorization $A = QR$. When we write the QR factorization, we sometimes distinguish between the "full" factorization (where $Q$ is square) and the "economy" factorization (where $R$ is square). But on the computer, we don't necessarily need to explicitly keep track of the $Q$ factor, and indeed Julia does not. So behind the scenes, the Julia QR routine is sort of a hybrid: we keep a data structure that can be used to reconstruct the full $Q$ factor or a part of it, and we keep the top part of the $R$ factor as in an economy factorization (we don't bother storing the zero block).
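A sketch of the factorization call:

```julia
# QR factorization of A; Q is stored implicitly (compact WY form),
# and F.R holds the square upper triangular factor.
F = qr(A)
```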
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
15×15 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
-0.309529 0.498741 -0.54394 … -0.116744 -0.223543 0.0968537
-0.247623 -0.118335 0.278 -0.234719 -0.106539 -0.491079
-0.309529 0.498741 0.751815 0.0391906 -0.0326533 0.182878
-0.278576 0.190203 -0.142128 -0.0100866 -0.0430969 0.0559339
-0.2631 0.0359342 -0.0890991 -0.0347252 -0.0483187 -0.00753827
-0.247623 -0.118335 -0.0360705 … -0.0593638 -0.0535405 -0.0710104
-0.232147 -0.272604 0.0169581 -0.0840024 -0.0587623 -0.134483
⋮ ⋱
-0.247623 -0.118335 -0.0360705 -0.0593638 -0.0535405 -0.0710104
-0.232147 -0.272604 0.0169581 … -0.0840024 -0.0587623 -0.134483
-0.2631 0.0359342 -0.0890991 -0.0347252 -0.0483187 -0.00753827
-0.247623 -0.118335 -0.0360705 0.940636 -0.0535405 -0.0710104
-0.2631 0.0359342 -0.0890991 -0.0347252 0.951681 -0.00753827
-0.216671 -0.426873 0.0699867 -0.108641 -0.0639841 0.802045
R factor:
2×2 Matrix{Float64}:
-64.6142 -3.85364
0.0 -0.386602
As before, we can use backslash with the factorization object to solve the system all in one go.
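With F now holding the QR factorization:

```julia
# Least squares solve straight from the QR object
c_qr = F \ b
```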
3.21635
26.742
Or we can write (in economy QR lingo) that $A = QR$ means $c = R^{-1} Q^T b$.
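One way to spell that out (the indexing keeps only the rows of $Q^T b$ that pair with the square economy $R$):

```julia
# c = R \ (Q'b), truncated to the economy-size block
c_econ = F.R \ (F.Q' * b)[1:size(A, 2)]
```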
3.21635
26.742
Our final factorization for solving least squares problems is the SVD. In this case, we don't store a compressed representation: instead, we store the factors $U$ and $V^T$ in explicit form (with $U$ rectangular), and the singular values in a vector.
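Again as a sketch:

```julia
# Thin SVD: A = U * Diagonal(F.S) * F.Vt, with U stored explicitly
F = svd(A)
```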
SVD{Float64, Float64, Matrix{Float64}, Vector{Float64}}
U factor:
15×2 Matrix{Float64}:
-0.309352 0.498851
-0.247665 -0.118247
-0.309352 0.498851
-0.278509 0.190302
-0.263087 0.0360277
-0.247665 -0.118247
-0.232244 -0.272521
⋮
-0.247665 -0.118247
-0.232244 -0.272521
-0.263087 0.0360277
-0.247665 -0.118247
-0.263087 0.0360277
-0.216822 -0.426796
singular values:
2-element Vector{Float64}:
64.72905892017894
0.38591619297907387
Vt factor:
2×2 Matrix{Float64}:
-0.998226 -0.059537
0.059537 -0.998226
As before, we can just use backslash to solve the least squares problem directly via the factorization.
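Reusing the name F for the SVD object:

```julia
# Least squares solve straight from the SVD object
c_svd = F \ b
```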
3.21635
26.742
But what is going on behind the scenes with that backslash is that we are computing $c = V (\Sigma^{-1} (U^T b))$. It looks a little different in the Julia code only because the factorization object stores $V^T$ rather than $V$, and the vector of singular values rather than the diagonal matrix $\Sigma$.
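Spelled out, that computation looks something like:

```julia
# c = V * (Σ^{-1} * (U'b)); divide elementwise by the singular values
c_svd2 = F.Vt' * ((F.U' * b) ./ F.S)
```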
3.21635
26.742
Finally, it's worth mentioning that we can in principle construct the Moore-Penrose pseudoinverse directly. This is the matrix
$$A^\dagger = (A^T A)^{-1} A^T = R^{-1} Q^T = V \Sigma^{-1} U^T$$
However, it is worth emphasizing that you should never write code like this. Just as we almost never solve linear systems by forming explicit inverses, we also almost never solve least squares problems by forming explicit pseudoinverses.
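Purely for illustration (the warning stands), the explicit pseudoinverse route would be something like:

```julia
# For illustration only -- prefer the factorization-based solves above
c_pinv = pinv(A) * b
```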
3.21635
26.742
And really finally, what is the condition number of this problem? We can compute this via the Julia cond routine.
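As a sketch:

```julia
# Condition number of A (ratio of largest to smallest singular value)
cond(A)
```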
167.72827908698008