md"""
# Notebook for 2023-03-06

## Power method

The basic power iteration is $x_{k+1} \propto A x_k$, and it converges if there is a unique eigenvalue of largest magnitude (so we get into trouble if, for example, a complex conjugate pair of eigenvalues has the largest magnitude). Fortunately, there is a theorem for non-negative matrices (the Perron-Frobenius theorem) that guarantees that for most matrices with non-negative entries (and for any matrix with all positive entries), there is a unique real eigenvalue of largest magnitude, and the associated eigenvector can be normalized to be non-negative elementwise. So we will use a random non-negative matrix as a test case.
"""
using LinearAlgebra
using Plots
A = rand(10,10)
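md"""
As a quick sanity check on the Perron-Frobenius claim (purely illustrative), the eigenvalue of largest magnitude for this draw of $A$ should be real, and its eigenvector should have entries all of one sign:
"""
let
    evals, evecs = eigen(A)
    i = argmax(abs.(evals))
    λmax, vmax = evals[i], evecs[:,i]
    # The dominant eigenvalue should be real and its eigenvector sign-constant
    (isreal(λmax), all(real.(vmax) .>= 0) || all(real.(vmax) .<= 0))
end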

md"""
When we write iterative methods, it is frequently useful to add some type of monitoring functionality that allows us to keep track of the convergence of the iteration from outside. We do that here, too, by adding a keyword argument called `monitor` that takes the current iterate as an argument.
"""
function basic_power_method(A, k, x=nothing; monitor=(x)->nothing)
    n = size(A)[1]

    # Start with a random vector (if no starting vector is given)
    if isnothing(x)
        x = randn(n)
    end
    x /= norm(x)

    # Repeatedly multiply by A and renormalize, reporting each iterate
    for j = 1:k
        x = A*x
        x /= norm(x)
        monitor(x)
    end
    x
end
md"""
There are a few ways of measuring the residual for an approximate eigenvector.

One common approach is to measure the norm of $(I-xx^T) Ax$, where $x$ is assumed to be normalized to unit length. This is equivalent to looking at $Ax - \rho_A(x) x$ where $\rho_A(x) = x^T A x / x^T x$ is the Rayleigh quotient (frequently used as an eigenvalue estimate). If we want a relative error measure, it makes sense to look at $\|(I-xx^T)Ax\|/\|Ax\|$.

The `test_power_method` function uses the monitoring functionality of `basic_power_method` to compute residuals at each step.
"""
function eigenvec_resid(A, x)
    # Relative residual ||Au - ρ(u) u|| / ||Au|| for the normalized vector u = x/||x||
    u = x/norm(x)
    y = A*u
    ρ = y'*u
    norm(y - ρ*u)/norm(y)
end
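md"""
As a small check of the equivalence claimed above (purely illustrative), we can compare the projected residual $\|(I-xx^T)Ax\|/\|Ax\|$ computed directly against `eigenvec_resid` for a random unit vector:
"""
let
    x = normalize(randn(size(A,1)))
    # Both formulas should agree to roundoff
    (norm((I - x*x')*(A*x))/norm(A*x), eigenvec_resid(A, x))
end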
function test_power_method(A, k)
    # Collect the residual history via the monitor hook
    rs = []
    basic_power_method(A, k, monitor=(x)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
The expected rate of convergence is given by $|\lambda_2|/|\lambda_1|$, where $|\lambda_1|$ is the largest eigenvalue magnitude and $|\lambda_2|$ is the second-largest eigenvalue magnitude. If we plot the residual and powers of $|\lambda_2|/|\lambda_1|$ together on a semi-logarithmic scale, we see that they are essentially parallel -- which tells us that our theory is getting the rate right.
"""
begin
    λs = eigvals(A)
    abs_λs = sort(abs.(λs), rev=true)
    plot(test_power_method(A, 20), yscale=:log10, label="Residual")
    plot!((abs_λs[2]/abs_λs[1]).^(1:20), yscale=:log10, label="Expected rate")
end
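md"""
Since the dominant eigenvalue here is real and positive (as Perron-Frobenius suggests), the Rayleigh quotient of the final power iterate should agree with the largest eigenvalue magnitude computed above. A quick illustrative comparison:
"""
let
    x = basic_power_method(A, 100)
    # Rayleigh quotient estimate vs. dominant eigenvalue magnitude from eigvals
    (x'*A*x, abs_λs[1])
end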
md"""
## Shift-invert

For this family of random matrices with uniform entries, we expect the first eigenvalue to be roughly $n/2$ (the entries have mean $1/2$, so the all-ones vector is approximately an eigenvector with eigenvalue close to $n/2$). Let's use this as a shift for the shift-invert iteration. Note that we factor the shifted matrix once and keep reusing the factorization!
As before, we use the monitoring functionality to keep track of things.
"""
function shiftinv_power_method(A, k, σ=0.0, x=nothing; monitor=(x)->nothing)
    n = size(A)[1]

    # Factor the shifted matrix once, up front
    F = lu(A-σ*I)

    # Start with a random vector (if no starting vector is given)
    if isnothing(x)
        x = randn(n)
    end
    x /= norm(x)

    # Power iteration with (A-σI)^{-1}, applied via the stored factorization
    for j = 1:k
        x = F\x
        x /= norm(x)
        monitor(x)
    end
    x
end
function test_shiftinv_power_method(A, k, σ)
    # Collect the residual history via the monitor hook
    rs = []
    shiftinv_power_method(A, k, σ, monitor=(x)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
The shift-invert iteration is just an iteration with a new matrix $(A-\sigma I)^{-1}$, so the rate of convergence uses the same technique -- but now we are looking at the eigenvalues of $(A-\sigma I)^{-1}$. Note that if $\lambda$ is an eigenvalue of $A$, then the spectral mapping theorem tells us that $(\lambda-\sigma)^{-1}$ is an eigenvalue of $(A-\sigma I)^{-1}$.
"""
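md"""
A quick numerical illustration of the spectral mapping property (forming the inverse explicitly here is fine for a small example, though we would not do it inside the iteration): the eigenvalue magnitudes of $(A-\sigma I)^{-1}$ should match $|\lambda - \sigma|^{-1}$ over the eigenvalues $\lambda$ of $A$.
"""
let
    σ = 5.0
    μs = sort(abs.(eigvals(inv(A - σ*I))), rev=true)
    predicted = sort(abs.(1.0 ./ (λs .- σ)), rev=true)
    # The difference should be small (limited by roundoff in forming the inverse)
    maximum(abs.(μs .- predicted))
end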
begin
    abs_si_λs = sort(abs.(1.0./(λs.-5.0)), rev=true)
    plot(test_shiftinv_power_method(A, 20, 5.0), yscale=:log10, label="Residual")
    plot!((abs_si_λs[2]/abs_si_λs[1]).^(1:20), yscale=:log10, label="Expected rate")
end
md"""
## Rayleigh quotient iteration

The Rayleigh quotient iteration is a variant of the shift-invert transformed power iteration in which we use the Rayleigh quotient as our shift at each step. Because the shifts change, we have to re-do the factorization every time, which seems expensive! We also now have less control over which eigenvalue we converge to, though we can try to guide the iteration by starting with a step of shift-invert with a fixed shift. But once it starts converging, the iteration converges much faster.
"""
function rq_method(A, k, σ0=0.0, x=nothing; monitor=(x, ρ)->nothing)
    n = size(A)[1]

    # Start with one step of shift-invert from a random vector
    if isnothing(x)
        x = (A-σ0*I)\randn(n)
    end
    x /= norm(x)

    # Each step: compute the Rayleigh quotient, then shift-invert with it
    # (this means a new factorization at every step)
    for j = 1:k
        ρ = x'*A*x
        x = (A-ρ*I)\x
        x /= norm(x)
        monitor(x, ρ)
    end
    x
end
function test_rq_method(A, k, σ)
    # Collect the residual history via the monitor hook
    rs = []
    rq_method(A, k, σ, monitor=(x, ρ)->push!(rs, eigenvec_resid(A, x)))
    rs
end
md"""
When we plot the Rayleigh quotient iteration convergence on a semi-log scale, we see the shape of a downward-facing parabola. This indicates that the residual at step $k+1$ is not proportional to the residual at step $k$, but to the *square* of the residual at step $k$. This is known as *quadratic convergence*, and we will see it frequently when we start talking about Newton's method later in the course.
"""
plot(test_rq_method(A, 20, 5.0), yscale=:log10, label="Residual")
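md"""
Another way to see the quadratic convergence (an illustrative check): since $\log r_{k+1} \approx 2 \log r_k + C$ under quadratic convergence, the ratios of successive log-residuals should approach 2 before the residual bottoms out near roundoff.
"""
let
    rs = test_rq_method(A, 6, 5.0)
    # Ratios approaching 2 indicate quadratic convergence; once the residual
    # reaches roundoff level, the ratios fall back toward 1
    [log(rs[k+1])/log(rs[k]) for k in 1:length(rs)-1]
end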
md"""
## Subspace iteration

So far, we have focused on only one vector at a time. What if we want more than one vector? We can do something very much like power iteration, but with more than one vector:

$$A Q_k = Q_{k+1} R_{k+1}$$

where $Q_k \in \mathbb{R}^{n \times s}$ and $R_k \in \mathbb{R}^{s \times s}$ for iterating toward an $s$-dimensional invariant subspace.

Because we want to ensure that we are not running into an invariant subspace associated with a complex conjugate pair, we will use a fixed matrix for this part.
"""
sympart(A) = (A+A')/2
AA = sympart(rand(10,10))
function orth_iteration(A, s, k, Q0=nothing; monitor=(Q)->nothing)
    n = size(A)[1]

    # Initial guess: orthonormal basis for a random s-dimensional subspace
    # (or for the range of Q0, if one is given)
    if isnothing(Q0)
        Q, R = qr(randn(n,s))
    else
        Q, R = qr(Q0)
    end
    Q = Q[:,1:s]

    # Multiply by A and re-orthonormalize, reporting each iterate
    for j = 1:k
        Q, R = qr(A*Q)
        Q = Q[:,1:s]
        monitor(Q)
    end
    Q
end
md"""
As before, we can monitor convergence via the relative residual norm

$$\|(I-Q_k Q_k^T) A Q_k\|/\|AQ_k\|$$
"""
function subspace_resid(A, Q)
    # Relative residual ||(I - QQ^T) A Q|| / ||A Q||
    Y = A*Q
    norm(Y - Q*(Q'*Y))/norm(Y)
end
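md"""
As a sanity check on `subspace_resid` (illustrative only), the residual should be near roundoff when $Q$ spans an exactly invariant subspace, such as the span of two eigenvectors of `AA`:
"""
let
    F = eigen(Symmetric(AA))
    # Take the two eigenvectors associated with the largest-magnitude eigenvalues
    idx = sortperm(abs.(F.values), rev=true)[1:2]
    subspace_resid(AA, F.vectors[:, idx])
end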
function test_orth_iteration(A, s, k)
    # Collect the residual history via the monitor hook
    rs = []
    orth_iteration(A, s, k, monitor=(Q)->push!(rs, subspace_resid(A, Q)))
    rs
end
md"""
And the rate of convergence is now predicted by $|\lambda_{s+1}|/|\lambda_s|$, where $s$ is the dimension of the subspace.
"""
begin
    abs_λs_AA = sort(abs.(eigvals(AA)), rev=true)
    plot(test_orth_iteration(AA, 2, 20), yscale=:log10, label="Residual")
    plot!((abs_λs_AA[3]/abs_λs_AA[2]).^(1:20), yscale=:log10, label="Predicted rate")
end
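md"""
Finally, an illustrative check on what the converged subspace represents: the eigenvalues of the projected matrix $Q^T A Q$ (the Ritz values) should match the two largest-magnitude eigenvalues of `AA` once the iteration has converged.
"""
let
    Q = orth_iteration(AA, 2, 100)
    # Ritz values from the (nearly) converged 2-dimensional subspace
    ritz = sort(eigvals(sympart(Q'*AA*Q)), by=abs, rev=true)
    # Two largest-magnitude eigenvalues of AA
    λtop = sort(eigvals(AA), by=abs, rev=true)[1:2]
    maximum(abs.(ritz .- λtop))  # should be small after enough iterations
end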