Notebook for 2023-03-13
The main topic of this lecture is basic stationary iterations for solving linear systems. As discussed in the notes, these iterations are typically associated with a splitting $A = M-N$ where $M$ is a matrix that is easy to solve with.
Splittings and convergence
A stationary iteration for $Ax = b$ is associated with a splitting $A = M-N$ where $M$ is a matrix that is easy to solve with. In terms of the splitting, we have
$$Mx = Nx + b$$
which is the fixed point equation for the iteration
$$Mx^{(k+1)} = Nx^{(k)} + b$$
We can also rewrite this in update form as
$$x^{(k+1)} = x^{(k)} + M^{-1}\left( b-Ax^{(k)} \right)$$
The error iteration is $e^{(k+1)} = R e^{(k)}$ where $R = M^{-1} N$. We typically analyze $R$ in terms of a bound on an operator norm (computed via opnorm in Julia). But tight bounds on the rate of convergence of the iteration are given by the spectral radius $\rho(R)$, i.e. the largest magnitude of any eigenvalue of $R$.
If $v$ is an eigenvector associated with a largest-magnitude eigenvalue, then $\|Rv\| = \rho(R) \|v\|$. So $\rho(R)$ is a lower bound on any operator norm of $R$, since $\rho(R) = \|Rv\|/\|v\| \leq \|R\|$.
Going the other way, if $R$ is diagonalizable, then there exists a vector norm such that $\rho(R)$ is equal to the associated operator norm. If $R$ is not diagonalizable, then for any $\epsilon > 0$ there is a vector norm such that the induced operator norm satisfies $\rho(R) \leq \|R\| \leq \rho(R) + \epsilon$.
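The spectral_radius helper is defined in a folded code cell; a minimal sketch of one way to write it, together with a comparison against a few operator norms, is below. This is my own version under the assumption that the test matrix is a random 10×10 matrix like the one displayed after it; the four comparison values that follow presumably come from cells of this kind.

```julia
using LinearAlgebra

# Spectral radius: the largest magnitude of any eigenvalue of R.
spectral_radius(R) = maximum(abs, eigvals(R))

# Compare against a few induced operator norms; each is an upper bound on ρ(R).
A = rand(10,10)
spectral_radius(A), opnorm(A,1), opnorm(A,Inf), opnorm(A,2)
```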
spectral_radius (generic function with 1 method)
10×10 Matrix{Float64}:
0.57905 0.3999 0.950655 0.993927 … 0.655512 0.626451 0.592262
0.896937 0.638905 0.0958122 0.39459 0.424927 0.80666 0.496504
0.208288 0.696011 0.614289 0.265439 0.437093 0.319736 0.364303
0.11411 0.565658 0.601235 0.996179 0.730612 0.330163 0.440749
0.481979 0.805915 0.788288 0.773162 0.846622 0.192573 0.814053
0.438884 0.646805 0.319852 0.497737 … 0.966071 0.55545 0.360647
0.576505 0.788908 0.907664 0.284408 0.443088 0.0834277 0.493762
0.895204 0.178295 0.152112 0.984811 0.147581 0.0270541 0.80562
0.769117 0.757536 0.359267 0.716494 0.611082 0.559553 0.0101991
0.823437 0.313998 0.892947 0.970725 0.960149 0.683754 0.0212497
5.544225356637162
5.620282147857701
6.877471368765031
6.405730168285172
As a side note: if opnorm computes the operator norm, what does norm do when the argument is a matrix? The answer is that the matrix is treated as a vector, and the ordinary vector norm is computed. So norm(A) in Julia computes the ordinary Euclidean norm of a vector composed of all entries of $A$ – that is, it computes the Frobenius norm of the matrix.
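A quick check along these lines (a sketch, not the notebook's folded cell) shows the agreement; the two matching values below presumably come from a comparison like this.

```julia
using LinearAlgebra

A = rand(10,10)
# `norm` treats the matrix as a flat vector of entries, so this is the Frobenius norm.
norm(A), sqrt(sum(abs2, A))   # the two values agree
```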
6.204586897865003
6.204586897865003
Richardson's iteration
We start with an iteration that is not actually in the lecture notes: Richardson's iteration. For $A = I-N$, Richardson's iteration for $Ax = b$ is
$$x^{(k+1)} = N x^{(k)} + b$$
This corresponds to the splitting $A = M-N$ with $M = I$.
If we start with $x^{(0)} = 0$, we have actually seen this iteration before in a different guise:
$$x^{(k)} = \sum_{j=0}^{k-1} N^j b = (I-N)^{-1} (I-N^k) b$$
This is precisely the $k$th partial sum of a Neumann series for $(I-N)^{-1}$ (multiplied by $b$)! In this case, the error at step $k$ is
$$e^{(k)} = (I-N)^{-1} b - x^{(k)} = (I-N)^{-1} N^k b$$
and we therefore have the relative error bound
$$\frac{\|e^{(k)}\|}{\|(I-N)^{-1} b\|} \leq \|N^k\| \leq \|N\|^k.$$
where the last inequality assumes a consistent (submultiplicative) norm. Under this assumption, convergence is guaranteed if $\|N\| < 1$. But this condition is only sufficient: convergence for almost all $b$ requires (and is guaranteed by) $\rho(N) < 1$, and it is $\rho(N)$ that tightly controls the rate of convergence.
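The richardson and test_richardson cells are folded in the notebook; a minimal sketch of a Richardson solver in the same spirit (a fixed number of steps, starting from $x^{(0)} = 0$; the nsteps parameter and the test setup are my own assumptions) might look like:

```julia
using LinearAlgebra

# Richardson iteration x ← N x + b for solving (I-N) x = b, starting from x⁰ = 0.
function richardson(N, b; nsteps=100)
    x = zero(b)
    for k = 1:nsteps
        x = N*x + b
    end
    x
end

# Hypothetical sanity check: the iteration converges when ρ(N) < 1.
function test_richardson(n=10)
    N = 0.5*rand(n,n)/n                      # small enough that ‖N‖∞ < 1
    b = rand(n)
    norm(richardson(N, b) - (I-N)\b)         # should be tiny
end
```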
richardson (generic function with 2 methods)
test_richardson (generic function with 1 method)
PageRank
Suppose $A$ is the (weighted) adjacency matrix for some graph. PageRank is a way of ranking the importance of different nodes in a graph according to a "random surfer" model. A surfer at node $i$ at time $t$ chooses where to move at time $t+1$ in one of two ways:
With probability $\alpha \in [0,1)$, move from $i$ to a neighbor $j$. The "damping" probability $\alpha$ is frequently chosen to be about $0.85$. Conditioned on taking a neighbor step, the neighbor node $j$ is chosen with probability $a_{ji}/d_i$, where $d_i = \sum_j a_{ji}$ is the weighted degree of node $i$.
Otherwise, with probability $1-\alpha$, "teleport" to some node $j$ in the graph with probability $\pi^{\mathrm{ref}}_j$. The reference probability is often chosen to be uniform ($\pi^{\mathrm{ref}}_j = 1/n$), but choosing a non-uniform reference is useful when defining "personalized" PageRank.
Let $AD^{-1}$ be the column-normalized transition matrix, where $D$ is the diagonal matrix of weighted degrees $d_i$. Then we can write this process as
$$\pi^{(t+1)} = \alpha AD^{-1} \pi^{(t)} + (1-\alpha) \pi^{\mathrm{ref}}$$
With the setup from a moment ago, we can see that this looks like Richardson's iteration for the equation
$$(I-\alpha AD^{-1}) \pi = (1-\alpha) \pi^{\mathrm{ref}}$$
Because $AD^{-1}$ has non-negative columns that sum to 1, we know that $\|AD^{-1}\|_1 = 1$. We also know that $\rho(AD^{-1}) = 1$; this is true for any column-stochastic matrix like $AD^{-1}$. And yet the PageRank iteration decreases the error by more than a factor of $\alpha$ at each step. Why? Because starting with $\pi^{(0)} = \pi^{\mathrm{ref}}$ gives the correct sum of entries, so the initial error $\pi - \pi^{(0)}$ has entries summing to zero; there is no error component in the direction of the dominant eigenvector of $AD^{-1}$. In this case, convergence is governed by the second-largest-magnitude eigenvalue.
I encourage you to check this for yourself. What happens to the plot below if you initialize the iteration with $\pi^{(0)} = 0$ rather than with $\pi^{(0)} = \pi^{\mathrm{ref}}$?
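The pagerank and test_pagerank cells are folded; the sketch below is my own minimal version of the iteration written above (the keyword parameters alpha, pref, and nsteps are assumptions, not the notebook's exact interface).

```julia
# PageRank via the Richardson-like iteration
#   π ← α (A D⁻¹) π + (1-α) π_ref,   starting from π⁰ = π_ref.
function pagerank(A; alpha=0.85, pref=fill(1/size(A,1), size(A,1)), nsteps=100)
    P = A ./ sum(A, dims=1)       # column-normalized transition matrix A D⁻¹
    p = copy(pref)
    for k = 1:nsteps
        p = alpha*(P*p) + (1-alpha)*pref
    end
    p
end
```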
pagerank (generic function with 2 methods)
test_pagerank (generic function with 1 method)
Jacobi iteration
The Jacobi iteration takes $M$ to be the diagonal part of $A$. We say $A$ is *strictly (row) diagonally dominant* if for every row $i$,
$$|a_{ii}| > \sum_{j\neq i} |a_{ij}|$$
For Jacobi iteration, the error iteration matrix $R = M^{-1} N$ has off-diagonal entries $-a_{ij}/a_{ii}$ (and zeros on the diagonal), so strict row diagonal dominance means that $\|R\|_\infty < 1$. Therefore, Jacobi iteration is guaranteed to converge for strictly row diagonally dominant matrices, with the error strictly decreasing at each step in the $\infty$-norm.
It is convenient not to form $N = M-A$ explicitly. Instead, we work with the update form
$$x^{(k+1)} = x^{(k)} + M^{-1} (b-Ax^{(k)})$$
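The jacobi and test_jacobi cells are folded; a minimal sketch in the update form above, where diag(A) plays the role of $M$ (the nsteps and x keyword arguments are my own assumptions), is:

```julia
using LinearAlgebra

# Jacobi iteration in update form: x ← x + D⁻¹ (b - A x), where D = diag(A).
function jacobi(A, b; nsteps=100, x=zeros(length(b)))
    d = diag(A)
    for k = 1:nsteps
        x += (b - A*x) ./ d
    end
    x
end
```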
jacobi (generic function with 2 methods)
test_jacobi (generic function with 1 method)
Gauss-Seidel
The Gauss-Seidel iteration sets $M$ to be the lower triangular part of $A$, including the diagonal (assuming the usual ordering of equations and unknowns). Writing $L$ for this lower triangle, the update form is
$$x^{(k+1)} = x^{(k)} + L^{-1} (b-Ax^{(k)}).$$
If $A$ is symmetric and positive definite, then we showed in the notes that one Gauss-Seidel step can be seen as coordinate descent on
$$\phi(x) = \frac{1}{2} x^T A x - x^T b$$
where we minimize with respect to each $x_i$ in turn. This guarantees that $\phi(x^{(k)})$ is decreasing with $k$, and a little extra argument gives that Gauss-Seidel always converges for positive definite systems (though it is not guaranteed to converge very fast).
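The gauss_seidel and test_gs cells are likewise folded; a sketch that uses Julia's LowerTriangular wrapper for the triangular solve (an assumption about the implementation, not the notebook's exact code) is:

```julia
using LinearAlgebra

# Gauss-Seidel in update form: x ← x + L⁻¹ (b - A x),
# where L is the lower triangle of A (including the diagonal).
function gauss_seidel(A, b; nsteps=100, x=zeros(length(b)))
    L = LowerTriangular(A)
    for k = 1:nsteps
        x += L \ (b - A*x)
    end
    x
end
```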
gauss_seidel (generic function with 2 methods)
test_gs (generic function with 1 method)
Tridiagonal test problem
Consider a two-point ODE boundary value problem
$$-\frac{d^2 u}{dx^2}(x) = f(x), \quad u(0) = u(1) = 0.$$
We discretize the problem by meshing $[0,1]$ with evenly spaced points $x_j = jh$ for $j = 0, \ldots, N+1$, where $h=1/(N+1)$, and applying the second-order finite difference approximation
$$\frac{d^2 u}{dx^2} = \frac{u(x-h)-2u(x)+u(x+h)}{h^2} + O(h^2)$$
Let $u_j$ be our approximation for $u(x_j)$; then we have
$$-u_{j-1}+2u_j-u_{j+1} = h^2 f_j, \quad j = 1, \ldots, N$$
or, in matrix notation
$$Tu = h^2 f$$
where $T$ is the tridiagonal matrix with $2$ on the diagonal and $-1$ on the first super- and subdiagonals. The vector $u$ in this case corresponds to the unknowns $u_1, \ldots, u_N$; the values $u_0 = u_{N+1} = 0$ are given by the boundary conditions.
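The Tpoisson cell is folded, but from the display below it evidently builds a SymTridiagonal matrix; a one-line version consistent with that output is:

```julia
using LinearAlgebra

# N-by-N tridiagonal Poisson matrix: 2 on the diagonal, -1 on the off-diagonals.
Tpoisson(N) = SymTridiagonal(fill(2.0, N), fill(-1.0, N-1))
```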
Tpoisson (generic function with 1 method)
10×10 SymTridiagonal{Float64, Vector{Float64}}:
2.0 -1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
-1.0 2.0 -1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ -1.0 2.0 -1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ -1.0 2.0 -1.0 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ -1.0 2.0 -1.0 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ -1.0 2.0 -1.0 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ -1.0 2.0 -1.0 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -1.0 2.0 -1.0 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -1.0 2.0 -1.0
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -1.0 2.0
The case $f = 1$ is simple enough for us to solve by hand; the solution in this case is $u(x) = \frac{1}{2} x(1-x)$. Let's check that this discrete approximation gives us something reasonable.
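The test_poisson_solve1 cell is folded; a sketch of such a check, using the Tpoisson sketch above and a hypothetical default of $N = 10$, is:

```julia
# Compare the discrete solution for f = 1 against the exact u(x) = x(1-x)/2.
# The difference should be at roundoff level, since the second-order
# difference formula is exact for quadratics.
function test_poisson_solve1(N=10)
    h = 1.0/(N+1)
    x = (1:N)*h                        # interior mesh points
    u = Tpoisson(N) \ (h^2*ones(N))    # discrete solution
    maximum(abs.(u - x.*(1 .- x)/2))
end
```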
test_poisson_solve1 (generic function with 1 method)
Jacobi and Gauss-Seidel sweeps
We have seen the splitting perspective on Jacobi and Gauss-Seidel. Another perspective on the same iterations is that we are sweeping over the equations and unknowns, using equation $j$ to find an updated value for $u_j$ assuming all the other entries are correct. In the Jacobi case, we compute the updated values using the old values at each step, i.e.
$$-u_{j-1}^{(k)} + 2u_j^{(k+1)} - u_{j+1}^{(k)} = h^2 f_j.$$
In the Gauss-Seidel case, we compute the updated values using the most recent versions of each variable, i.e.
$$-u_{j-1}^{(k+1)} + 2u_j^{(k+1)} - u_{j+1}^{(k)} = h^2 f_j.$$
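The Tjacobi and Tgs cells are folded. The sweeps below are my own illustrative versions (the names with a `_sweep!` suffix are made up); each performs one pass over the unknowns, with the boundary values $u_0 = u_{N+1} = 0$ handled implicitly.

```julia
# One Jacobi sweep for -u[j-1] + 2u[j] - u[j+1] = h²f[j]: updates use old values only.
function Tjacobi_sweep!(u, f, h)
    N, uold = length(u), copy(u)
    for j = 1:N
        uL = j > 1 ? uold[j-1] : 0.0
        uR = j < N ? uold[j+1] : 0.0
        u[j] = (h^2*f[j] + uL + uR)/2
    end
    u
end

# One Gauss-Seidel sweep: updates use the most recent values (in place, left to right).
function Tgs_sweep!(u, f, h)
    N = length(u)
    for j = 1:N
        uL = j > 1 ? u[j-1] : 0.0
        uR = j < N ? u[j+1] : 0.0
        u[j] = (h^2*f[j] + uL + uR)/2
    end
    u
end
```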
Tjacobi (generic function with 2 methods)
Tgs (generic function with 2 methods)
test_poisson_solve2 (generic function with 1 method)
For this model problem, Gauss-Seidel converges about twice as fast as Jacobi. But neither is all that fast as the number of points gets big.