md"""
# Notebook for 2023-03-06

## Power method

The basic power iteration is $x_{k+1} \propto A x_k$, and it converges if there is a unique eigenvalue of largest magnitude (so we get into trouble if, for example, a complex conjugate pair of eigenvalues has the largest magnitude). Fortunately, there is a theorem for non-negative matrices (the Perron-Frobenius theorem) that guarantees that for most matrices with non-negative entries (and for any matrix with all positive entries), there is a unique real eigenvalue of largest magnitude, and the associated eigenvector can be normalized to be non-negative elementwise. So we will use a random non-negative matrix as a test case.
"""
using LinearAlgebra
using Plots
A = rand(10,10)
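md"""
As a quick sanity check on the Perron-Frobenius claim (purely illustrative), the eigenvalue of largest magnitude for this draw of $A$ should be real, and its eigenvector should have entries all of one sign:
"""
let
    evals, evecs = eigen(A)
    i = argmax(abs.(evals))
    λmax, vmax = evals[i], evecs[:,i]
    # The dominant eigenvalue should be real and its eigenvector sign-constant
    (isreal(λmax), all(real.(vmax) .>= 0) || all(real.(vmax) .<= 0))
end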

md"""
When we write iterative methods, it is frequently useful to add some type of monitoring functionality that allows us to keep track of the convergence of the iteration from outside. We do that here, too, by adding a keyword argument called `monitor` that takes the current iterate as an argument.
"""
function basic_power_method(A, k, x=nothing; monitor=(x)->nothing)
    n = size(A)[1]

    # Start with a random vector (if no starting vector is given)
    if isnothing(x)
        x = randn(n)
    end
    x /= norm(x)

    # Repeatedly multiply by A and renormalize, reporting each iterate
    for j = 1:k
        x = A*x
        x /= norm(x)
        monitor(x)
    end
    x
end
md"""
There are a few ways of measuring the residual for an approximate eigenvector.

One common approach is to measure the norm of $(I-xx^T) Ax$, where $x$ is assumed to be normalized to unit length. This is equivalent to looking at $Ax - \rho_A(x) x$ where $\rho_A(x) = x^T A x / x^T x$ is the Rayleigh quotient (frequently used as an eigenvalue estimate). If we want a relative error measure, it makes sense to look at $\|(I-xx^T)Ax\|/\|Ax\|$.

The `test_power_method` function uses the monitoring functionality of `basic_power_method` to compute residuals at each step.
"""
function eigenvec_resid(A, x)
    # Relative residual ||Au - ρ(u) u|| / ||Au|| for the normalized vector u = x/||x||
    u = x/norm(x)
    y = A*u
    ρ = y'*u
    norm(y - ρ*u)/norm(y)
end
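md"""
As a small check of the equivalence claimed above (purely illustrative), we can compare the projected residual $\|(I-xx^T)Ax\|/\|Ax\|$ computed directly against `eigenvec_resid` for a random unit vector:
"""
let
    x = normalize(randn(size(A,1)))
    # Both formulas should agree to roundoff
    (norm((I - x*x')*(A*x))/norm(A*x), eigenvec_resid(A, x))
end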
function test_power_method(A, k)
    # Collect the residual history via the monitor hook
    rs = []
    basic_power_method(A, k, monitor=(x)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
The expected rate of convergence is given by $|\lambda_2|/|\lambda_1|$, where $|\lambda_1|$ is the largest eigenvalue magnitude and $|\lambda_2|$ is the second-largest eigenvalue magnitude. If we plot the residual and powers of $|\lambda_2|/|\lambda_1|$ together on a semi-logarithmic scale, we see that they are essentially parallel -- which tells us that our theory is getting the rate right.
"""
begin
    λs = eigvals(A)
    abs_λs = sort(abs.(λs), rev=true)
    plot(test_power_method(A, 20), yscale=:log10, label="Residual")
    plot!((abs_λs[2]/abs_λs[1]).^(1:20), yscale=:log10, label="Expected rate")
end
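md"""
Since the dominant eigenvalue here is real and positive (as Perron-Frobenius suggests), the Rayleigh quotient of the final power iterate should agree with the largest eigenvalue magnitude computed above. A quick illustrative comparison:
"""
let
    x = basic_power_method(A, 100)
    # Rayleigh quotient estimate vs. dominant eigenvalue magnitude from eigvals
    (x'*A*x, abs_λs[1])
end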
md"""
## Shift-invert

For this family of random matrices with uniform entries, we expect the first eigenvalue to be roughly $n/2$ (the entries have mean $1/2$, so the all-ones vector is approximately an eigenvector with eigenvalue close to $n/2$). Let's use this as a shift for the shift-invert iteration. Note that we factor the shifted matrix once and keep reusing the factorization!
As before, we use the monitoring functionality to keep track of things.
"""
function shiftinv_power_method(A, k, σ=0.0, x=nothing; monitor=(x)->nothing)
    n = size(A)[1]

    # Factor the shifted matrix once, up front
    F = lu(A-σ*I)

    # Start with a random vector (if no starting vector is given)
    if isnothing(x)
        x = randn(n)
    end
    x /= norm(x)

    # Power iteration with (A-σI)^{-1}, applied via the stored factorization
    for j = 1:k
        x = F\x
        x /= norm(x)
        monitor(x)
    end
    x
end
function test_shiftinv_power_method(A, k, σ)
    # Collect the residual history via the monitor hook
    rs = []
    shiftinv_power_method(A, k, σ, monitor=(x)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
The shift-invert iteration is just an iteration with a new matrix $(A-\sigma I)^{-1}$, so the rate of convergence uses the same technique -- but now we are looking at the eigenvalues of $(A-\sigma I)^{-1}$. Note that if $\lambda$ is an eigenvalue of $A$, then the spectral mapping theorem tells us that $(\lambda-\sigma)^{-1}$ is an eigenvalue of $(A-\sigma I)^{-1}$.
"""
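md"""
A quick numerical illustration of the spectral mapping property (forming the inverse explicitly here is fine for a small example, though we would not do it inside the iteration): the eigenvalue magnitudes of $(A-\sigma I)^{-1}$ should match $|\lambda - \sigma|^{-1}$ over the eigenvalues $\lambda$ of $A$.
"""
let
    σ = 5.0
    μs = sort(abs.(eigvals(inv(A - σ*I))), rev=true)
    predicted = sort(abs.(1.0 ./ (λs .- σ)), rev=true)
    # The difference should be small (limited by roundoff in forming the inverse)
    maximum(abs.(μs .- predicted))
end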
begin
    abs_si_λs = sort(abs.(1.0./(λs.-5.0)), rev=true)
    plot(test_shiftinv_power_method(A, 20, 5.0), yscale=:log10, label="Residual")
    plot!((abs_si_λs[2]/abs_si_λs[1]).^(1:20), yscale=:log10, label="Expected rate")
end
md"""
## Rayleigh quotient iteration

The Rayleigh quotient iteration is a variant of the shift-invert transformed power iteration in which we use the Rayleigh quotient as our shift at each step. Because the shifts change, we have to re-do the factorization every time, which seems expensive! We also now have less control over which eigenvalue we converge to, though we can try to guide the iteration by starting with a step of shift-invert with a fixed shift. But once it starts converging, the iteration converges much faster.
"""
function rq_method(A, k, σ0=0.0, x=nothing; monitor=(x, ρ)->nothing)
    n = size(A)[1]

    # Start with one step of shift-invert from a random vector
    if isnothing(x)
        x = (A-σ0*I)\randn(n)
    end
    x /= norm(x)

    # Each step: compute the Rayleigh quotient, then shift-invert with it
    # (this means a new factorization at every step)
    for j = 1:k
        ρ = x'*A*x
        x = (A-ρ*I)\x
        x /= norm(x)
        monitor(x, ρ)
    end
    x
end
function test_rq_method(A, k, σ)
    # Collect the residual history via the monitor hook
    rs = []
    rq_method(A, k, σ, monitor=(x, ρ)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
When we plot the Rayleigh quotient iteration convergence on a semi-log scale, we see the shape of a downward-facing parabola. This indicates that the residual at step $k+1$ is not proportional to the residual at step $k$, but to the *square* of the residual at step $k$. This is known as *quadratic convergence*, and we will see it frequently when we start talking about Newton's method later in the course.
"""
plot(test_rq_method(A, 20, 5.0), yscale=:log10, label="Residual")
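md"""
Another way to see the quadratic convergence (an illustrative check): since $\log r_{k+1} \approx 2 \log r_k + C$ under quadratic convergence, the ratios of successive log-residuals should approach 2 before the residual bottoms out near roundoff.
"""
let
    rs = test_rq_method(A, 6, 5.0)
    # Ratios approaching 2 indicate quadratic convergence; once the residual
    # reaches roundoff level, the ratios fall back toward 1
    [log(rs[k+1])/log(rs[k]) for k in 1:length(rs)-1]
end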
md"""
## Subspace iteration

So far, we have focused on only one vector at a time. What if we want more than one vector? We can do something very much like power iteration, but with more than one vector:

$$A Q_k = Q_{k+1} R_{k+1}$$

where $Q_k \in \mathbb{R}^{n \times s}$ and $R_k \in \mathbb{R}^{s \times s}$ for iterating toward an $s$-dimensional invariant subspace.

Because we want to ensure that we are not running into an invariant subspace associated with a complex conjugate pair, we will use a fixed matrix for this part.
"""
sympart(A) = (A+A')/2
AA = sympart(rand(10,10))
function orth_iteration(A, s, k, Q0=nothing; monitor=(Q)->nothing)
    n = size(A)[1]

    # Initial guess: orthonormal basis for a random s-dimensional subspace
    # (or for the range of Q0, if one is given)
    if isnothing(Q0)
        Q, R = qr(randn(n,s))
    else
        Q, R = qr(Q0)
    end
    Q = Q[:,1:s]

    # Multiply by A and re-orthonormalize, reporting each iterate
    for j = 1:k
        Q, R = qr(A*Q)
        Q = Q[:,1:s]
        monitor(Q)
    end
    Q
end
md"""
As before, we can monitor convergence via the relative residual norm

$$\|(I-Q_k Q_k^T) A Q_k\|/\|AQ_k\|$$
"""
function subspace_resid(A, Q)
    # Relative residual ||(I - QQ^T) A Q|| / ||A Q||
    Y = A*Q
    norm(Y - Q*(Q'*Y))/norm(Y)
end
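md"""
As a sanity check on `subspace_resid` (illustrative only), the residual should be near roundoff when $Q$ spans an exactly invariant subspace, such as the span of two eigenvectors of `AA`:
"""
let
    F = eigen(Symmetric(AA))
    # Take the two eigenvectors associated with the largest-magnitude eigenvalues
    idx = sortperm(abs.(F.values), rev=true)[1:2]
    subspace_resid(AA, F.vectors[:, idx])
end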
function test_orth_iteration(A, s, k)
    # Collect the residual history via the monitor hook
    rs = []
    orth_iteration(A, s, k, monitor=(Q)->push!(rs, subspace_resid(A, Q)))
    rs
end
md"""
And the rate of convergence is now predicted by $|\lambda_{s+1}|/|\lambda_s|$, where $s$ is the dimension of the subspace.
"""
begin
    abs_λs_AA = sort(abs.(eigvals(AA)), rev=true)
    plot(test_orth_iteration(AA, 2, 20), yscale=:log10, label="Residual")
    plot!((abs_λs_AA[3]/abs_λs_AA[2]).^(1:20), yscale=:log10, label="Predicted rate")
end
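md"""
Finally, an illustrative check on what the converged subspace represents: the eigenvalues of the projected matrix $Q^T A Q$ (the Ritz values) should match the two largest-magnitude eigenvalues of `AA` once the iteration has converged.
"""
let
    Q = orth_iteration(AA, 2, 100)
    # Ritz values from the (nearly) converged 2-dimensional subspace
    ritz = sort(eigvals(sympart(Q'*AA*Q)), by=abs, rev=true)
    # Two largest-magnitude eigenvalues of AA
    λtop = sort(eigvals(AA), by=abs, rev=true)[1:2]
    maximum(abs.(ritz .- λtop))  # should be small after enough iterations
end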