
Notebook for 2023-02-10

Block Gaussian elimination

The main topic for this lecture was thinking about Gaussian elimination in a blockwise fashion. In particular, we considered solving block 2-by-2 linear systems with matrices of the form

$$A = \begin{bmatrix} B & V \\ W^T & C \end{bmatrix}$$

A particular instance of this is solving the linear system

$$\begin{bmatrix} B & V \\ W^T & -I \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} d \\ 0 \end{bmatrix}$$

In terms of two block equations, this is

$$\begin{aligned} Bx + Vy &= d \\ W^T x - y &= 0 \end{aligned}$$

and substituting $y = W^T x$ (from the latter equation) into the former equation gives

$$(B + VW^T)x = d$$

Going the other way, we can eliminate the $x$ variables to get an equation just in $y$:

$$(-I-W^T B^{-1} V) y = -W^T B^{-1} d$$

and then substitute into $Bx = d-Vy$ to solve the system.
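Carrying this elimination through gives an explicit formula for the solution: writing $y = (I + W^T B^{-1} V)^{-1} W^T B^{-1} d$ and substituting back into $Bx = d - Vy$, we recover a form of the Sherman-Morrison-Woodbury formula:

$$x = B^{-1} d - B^{-1} V (I + W^T B^{-1} V)^{-1} W^T B^{-1} d$$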

Let's go ahead and illustrate how to use this to solve $(B + VW^T) x = d$ given a solver for systems with $B$.

using LinearAlgebra
# Solve the system (B+VW') x = d given a solver solveB(g) = B\g,
# by eliminating y from the bordered system [B V; W' -I] [x; y] = [d; 0]
function solve_bordered(solveB, V, W, d)
    invBV = solveB(V)        # B\V
    invBd = solveB(d)        # B\d
    S = I + W'*invBV         # (negated) Schur complement
    y = S\(W'*invBd)         # eliminate x, solve for y
    invBd - invBV*y          # x = B\(d - V*y)
end
function test_solve_bordered()
    B = rand(10,10)
    V = rand(10,2)
    W = rand(10,2)
    x = rand(10)
    d = B*x+V*(W'*x)
    solveB(y) = B\y
    err = x - solve_bordered(solveB, V, W, d)
    norm(err)/norm(x)
end
Running test_solve_bordered() gives a relative error near machine precision (e.g. 2.802295651434949e-15 in one run).

Iterative refinement

The trick of forming a bordered system is interesting in part because it allows us to take advantage of a fast solver for one system (from an existing factorization, for example) in order to solve a slightly modified system. But there's a cost! Even if the modified system is well-conditioned, poor conditioning in the reference problem (the $B$ matrix) can lead to quite bad results.

function test_sensitive_bordered()
    # Make B ill-conditioned (smallest singular value around 1e-12)
    s = ones(10)
    s[10] = 1e-12
    B = rand(10,10)*Diagonal(s)*rand(10,10)
    V = rand(10,2)
    W = rand(10,2)
    x = rand(10)
    d = B*x+V*(W'*x)
    solveB(y) = B\y
    err = x - solve_bordered(solveB, V, W, d)
    norm(err)/norm(x)
end
Running test_sensitive_bordered() now gives a much larger relative error (e.g. 0.15406594322491596 in one run).

A technique in the notes (but not discussed in lecture) can partly fix up this bad behavior. When we have a somewhat sloppy solver like the one here, but we are able to compute an accurate residual (and the problem we are solving is not terribly ill-conditioned), a few steps of iterative refinement can make a big difference.
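Concretely, if $\hat{A}^{-1}$ denotes the sloppy approximate solver for $A = B + VW^T$, iterative refinement repeats the update

$$x_{k+1} = x_k + \hat{A}^{-1} (d - A x_k)$$

so the error $e_k = x_k - x$ satisfies $e_{k+1} = (I - \hat{A}^{-1} A) e_k$, and shrinks geometrically as long as $\|I - \hat{A}^{-1} A\| < 1$.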

function test_itref()
    # Same ill-conditioned setup as before
    s = ones(10)
    s[10] = 1e-12
    B = rand(10,10)*Diagonal(s)*rand(10,10)
    V = rand(10,2)
    W = rand(10,2)
    x = rand(10)
    d = B*x+V*(W'*x)
    solveB(y) = B\y

    # Initial (sloppy) solve, then refine with residual corrections
    xx = solve_bordered(solveB, V, W, d)
    relerrs = zeros(5)
    relerrs[1] = norm(x-xx)/norm(x)
    for j = 2:5
        xx += solve_bordered(solveB, V, W, d-B*xx-V*(W'*xx))
        relerrs[j] = norm(x-xx)/norm(x)
    end
    relerrs
end

Sensitivity

We also spent a little time in this lecture talking about the sensitivity of linear systems to perturbations. In particular, we differentiated the equation

$$Ax = b$$

which, using variational notation, gave us

$$(\delta A) x + A (\delta x) = \delta b$$

In addition to being useful for error analysis, this formula is actually something you can use computationally, as we illustrate below.
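Solving the variational equation for $\delta x$ gives $\delta x = A^{-1}(\delta b - (\delta A) x)$, and taking norms yields the familiar condition number bound (to first order in the perturbations):

$$\frac{\|\delta x\|}{\|x\|} \leq \kappa(A) \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right), \qquad \kappa(A) = \|A\| \, \|A^{-1}\|$$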

function test_perturbed()

    # Set up a test problem and directions to move A and b
    A = rand(10,10)
    δA = rand(10,10)
    b = rand(10)
    δb = rand(10)

    # Compute solutions to a reference and perturbed system
    h = 1e-8
    x = A\b
    xp = (A+h*δA)\(b+h*δb)   # perturbed solution

    # Compute the directional deriv δx and a finite difference estimate
    δx_est = (xp-x)/h
    δx = A\(δb-δA*x)

    # Compare the analytic derivative to the finite difference estimate
    norm(δx-δx_est)/norm(δx)
end
Running test_perturbed() shows the analytic derivative and the finite difference estimate agree to about $10^{-8}$ (e.g. 5.359429109831612e-8 in one run), consistent with the accuracy expected for a one-sided difference with $h = 10^{-8}$.