md"""
# Notes for 2023-02-20
"""
using LinearAlgebra
using Plots

md"""
## Crickets data

As we discuss in the notes, we consider least squares regression to fit the model

$$\alpha C + \beta \approx T$$

where $C$ and $T$ are cricket chirps per minute and temperatures (in degrees Fahrenheit), respectively. In matrix form, we want

$$\begin{bmatrix}
C_1 & 1 \\
C_2 & 1 \\
\vdots & \vdots \\
C_n & 1
\end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx
\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix}$$

where $\approx$ in this setting means "minimizing error in a least squares sense."
We can compute this using the backslash operation in Julia. Through the rest of this notebook, we'll refer to this system as $Ac \approx b$.
"""
begin
    # Each row: chirps per minute, temperature in degrees Fahrenheit
    chirps_vs_temperature =
        [ 20.000 89.000 ;
          16.000 72.000 ;
          20.000 93.000 ;
          18.000 84.000 ;
          17.000 81.000 ;
          16.000 75.000 ;
          15.000 70.000 ;
          17.000 82.000 ;
          15.000 69.000 ;
          16.000 83.000 ;
          15.000 80.000 ;
          17.000 83.000 ;
          16.000 81.000 ;
          17.000 84.000 ;
          14.000 76.000 ]

    chirps = chirps_vs_temperature[:,1]
    temperatures = chirps_vs_temperature[:,2]

    # Set up A = [C 1] and solve the least squares problem Ac ≈ b via backslash
    Acrickets = [chirps ones(length(chirps))]
    chirp_coeffs = Acrickets \ temperatures
end

md"""
It is standard in situations like this to plot the model fit (the red line) on top of a scatter plot of the raw data (the blue dots).
"""
begin
    chirp_range = range(14.0, 20.0, length=100)
    plot(chirps, temperatures, seriestype = :scatter,
         title="Chirps vs temperature",
         xlabel="Chirps (/min)", ylabel="Temperature (F)",
         legend=false)
    # Overlay the least squares fit as a red line
    plot!(chirp_range, chirp_coeffs[1]*chirp_range .+ chirp_coeffs[2],
          color=:red)
end

md"""
## Solving via factorizations

Just as we can solve linear systems via Gaussian elimination, there are a variety of ways that we can solve the least squares problem using factorizations. If we didn't know that backslash with a rectangular matrix meant "solve the least squares problem," we might be tempted to explicitly form and solve the normal equations:

$$c = (A^T A)^{-1} A^T b$$

Note that we do *not* explicitly form $(A^T A)^{-1}$! Even when I'm being naive, I won't be that naive. We'll solve with a backslash instead.
"""
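md"""
Concretely, a minimal sketch of that solve, using the `Acrickets` and `temperatures` arrays defined above:
"""
# Form A'A and A'b explicitly, then solve the normal equations with one backslash
(Acrickets'*Acrickets) \ (Acrickets'*temperatures)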

md"""
Forming and solving the normal equations isn't actually a completely insane way to solve the least squares problem. In many cases, it's the fastest thing we can do, though not by much. The backslash algorithm in the scenario above will do some type of factorization in Julia, but if we might want to solve the normal equations more than once, it is useful to do the factorization directly. Here we compute the Cholesky factorization of the Gram matrix ($A^T A$).
"""
FC
Cholesky{Float64, Matrix{Float64}}
U factor:
2×2 UpperTriangular{Float64, Matrix{Float64}}:
 64.6142  3.85364
   ⋅      0.386602
FC = cholesky(Acrickets'*Acrickets)

md"""
Julia lets us use backslash with a factorization object to solve the linear system or least squares problem that the factorization was set up for.
"""
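md"""
A sketch of such a solve, reusing the Cholesky object `FC`:
"""
# Backslash with the Cholesky factorization solves A'A c = A'b
FC \ (Acrickets'*temperatures)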

md"""
Alternatively, we could unpack the factorization to get directly at the upper triangular factor, often called $R$ but named `U` by the Julia routine ($A^T A = R^T R$).
"""
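md"""
A sketch of the unpacked version: since $A^T A = R^T R$, the normal equations solve becomes two triangular solves.
"""
# Two triangular solves with R (stored as FC.U): first R'y = A'b, then R c = y
FC.U \ (FC.U' \ (Acrickets'*temperatures))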

md"""
Another approach is to compute the factorization $A = QR$. When we write the QR factorization, we sometimes distinguish between the "full" factorization (where $Q$ is square) and the "economy" factorization (where $R$ is square). But on the computer, we don't necessarily need to explicitly keep track of the $Q$ factor, and indeed Julia does not. So behind the scenes, the Julia QR routine is sort of a hybrid: we keep a data structure that can be used to reconstruct the full $Q$ factor or a part of it, and we keep the top part of the $R$ factor as in an economy factorization (and don't bother storing the zero block).
"""
FQR
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
15×15 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
 -0.309529   0.498741   -0.54394    …  -0.116744   -0.223543    0.0968537
 -0.247623  -0.118335    0.278         -0.234719   -0.106539   -0.491079
 -0.309529   0.498741    0.751815       0.0391906  -0.0326533   0.182878
 -0.278576   0.190203   -0.142128      -0.0100866  -0.0430969   0.0559339
 -0.2631     0.0359342  -0.0890991     -0.0347252  -0.0483187  -0.00753827
 -0.247623  -0.118335   -0.0360705  …  -0.0593638  -0.0535405  -0.0710104
 -0.232147  -0.272604    0.0169581     -0.0840024  -0.0587623  -0.134483
  ⋮                                 ⋱                          
 -0.247623  -0.118335   -0.0360705     -0.0593638  -0.0535405  -0.0710104
 -0.232147  -0.272604    0.0169581  …  -0.0840024  -0.0587623  -0.134483
 -0.2631     0.0359342  -0.0890991     -0.0347252  -0.0483187  -0.00753827
 -0.247623  -0.118335   -0.0360705      0.940636   -0.0535405  -0.0710104
 -0.2631     0.0359342  -0.0890991     -0.0347252   0.951681   -0.00753827
 -0.216671  -0.426873    0.0699867     -0.108641   -0.0639841   0.802045
R factor:
2×2 Matrix{Float64}:
 -64.6142  -3.85364
   0.0     -0.386602
FQR = qr(Acrickets)

md"""
As before, we can use backslash with the factorization object to solve the system all in one go.
"""
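md"""
A sketch of that one-liner; note that we hand the factorization the right-hand side $b$ itself, not $A^T b$:
"""
# Backslash with the QR object solves the least squares problem directly
FQR \ temperatures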

md"""
Or we can write (in economy QR lingo) that $A = QR$ means $c = R^{-1} Q^T b$.
"""
# Economy-style solve: c = R⁻¹(Qᵀb); Q'*temperatures has 15 entries, keep the first 2
FQR.R\((FQR.Q'*temperatures)[1:2])

md"""
Our final factorization for solving least squares problems is the SVD. In this case, we don't store a compressed representation: instead, we store the factors $U$ and $V^T$ in explicit form (with $U$ rectangular), and the singular values are stored in a vector.
"""
FSVD
SVD{Float64, Float64, Matrix{Float64}, Vector{Float64}}
U factor:
15×2 Matrix{Float64}:
 -0.309352   0.498851
 -0.247665  -0.118247
 -0.309352   0.498851
 -0.278509   0.190302
 -0.263087   0.0360277
 -0.247665  -0.118247
 -0.232244  -0.272521
  ⋮         
 -0.247665  -0.118247
 -0.232244  -0.272521
 -0.263087   0.0360277
 -0.247665  -0.118247
 -0.263087   0.0360277
 -0.216822  -0.426796
singular values:
2-element Vector{Float64}:
 64.72905892017894
  0.38591619297907387
Vt factor:
2×2 Matrix{Float64}:
 -0.998226  -0.059537
  0.059537  -0.998226
FSVD = svd(Acrickets)

md"""
As before, we can just use backslash to solve the least squares problem directly via the factorization.
"""
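md"""
A sketch of the same one-liner with the SVD object:
"""
# Backslash with the SVD object also solves the least squares problem
FSVD \ temperatures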

md"""
But what is going on behind the scenes with that backslash is that we are computing $c = V (\Sigma^{-1} (U^T b))$. It looks a little different in the Julia code only because the factorization object stores $V^T$ rather than $V$, and the vector of singular values rather than the diagonal matrix $\Sigma$.
"""
# c = V (Σ⁻¹ (Uᵀb)); the object stores Vt and the vector S of singular values
FSVD.Vt'*( (FSVD.U'*temperatures) ./ FSVD.S )

md"""
Finally, it's worth mentioning that we can *in principle* construct the Moore-Penrose pseudoinverse directly. This is the matrix

$$A^\dagger = (A^T A)^{-1} A^T = R^{-1} Q^T = V \Sigma^{-1} U^T$$

However, it is worth emphasizing that you should *never write code like this*. Just as we almost never solve linear systems by forming explicit inverses, we also almost never solve least squares problems by forming explicit pseudoinverses.
"""
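md"""
For completeness, a sketch of the discouraged explicit version via Julia's `pinv`:
"""
# Explicit pseudoinverse solve -- fine for a demo, avoid in real code
pinv(Acrickets)*temperatures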

md"""
And really finally, what is the condition number of this problem? We can compute this via the Julia `cond` routine.
"""
167.72827908698008
cond(Acrickets)