md"""
# Manipulating matrices

In this notebook, we play with some of the ideas introduced in the Jan 26 lecture notes, expressed concretely in Julia. In addition to using the `Plots` package, we will want to use the `LinearAlgebra` package in this notebook (and in most of the rest of the class!).
"""
160 μs
using Plots
2.7 s
using LinearAlgebra
231 μs

md"""
## Twelve commandments

When Charlie Van Loan teaches matrix computations, he talks about twelve commandments of matrix manipulation. These are listed in the lecture notes; here, I'd like to illustrate them in code. Throughout, we'll use some small concrete matrices and vectors to illustrate the point.
"""
149 μs
A
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  5.0
A = [1.0 2.0; 3.0 5.0]
21.0 μs
B
2×2 Matrix{Float64}:
 7.0   8.0
 9.0  12.0
B = [7.0 8.0; 9.0 12.0]
12.1 μs
x
x = [1.0; 2.0]
7.5 μs

md"""
#### Matrix × vector = linear combination of matrix columns.

We discussed this some in Monday's notebook. There are a couple of ways of writing a matrix-vector product, but the column oriented approach is equivalent to taking a weighted linear combination of the columns of the matrix. So if we write a matrix as a collection of columns

$$A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$$

then

$$Ax = \sum_{j=1}^n a_j x_j$$
"""
182 μs
matvec_col (generic function with 1 method)
function matvec_col(A, x)
    m, n = size(A)
    y = zeros(m)
    for j = 1:n
        y += A[:,j]*x[j]  # accumulate x[j] times column j of A
    end
    y
end
796 μs
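md"""
As a quick sanity check, we can compare the hand-rolled `matvec_col` against Julia's built-in matrix-vector product on the small example above; the two should agree (up to roundoff).
"""
matvec_col(A, x)  # column-oriented version defined above
A*x               # built-in product, for comparison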

md"""
#### Inner product = sum of products of corresponding elements

We have to be a little careful about this -- this is the *standard* inner product for a *real* vector space. We will talk about more general inner products later in the class. But for the real standard inner product, we do have

$$x \cdot y = \sum_{j=1}^n x_j y_j$$
"""
240 μs
my_dot (generic function with 1 method)
function my_dot(x, y)
    result = 0.0
    for j = 1:length(x)
        result += x[j]*y[j]  # add the product of corresponding entries
    end
    result  # return the accumulated sum
end
581 μs
5.0
my_dot(x, x)
7.8 μs
5.0
x'*x
27.8 μs
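md"""
The `LinearAlgebra` package also exports a `dot` function that computes the same standard inner product, so we have a third way to write the same thing.
"""
dot(x, x)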

md"""
#### Order of operations matters to performance

Suppose we have three long vectors $u, v, w \in \mathbb{R}^n$ and we want to compute

$$y = uv^T w$$

The first way to do it, which is what we will get with the Julia code `u*v'*w`, effectively computes

$$y = (uv^T) w$$

That is, we first compute the outer product $uv^T$ (in $O(n^2)$ time), then do a matrix-vector product (in $O(n^2)$ time). The total time scales like $n^2$. Alternately, we can use associativity to write

$$y = u (v^T w)$$

In this version, we first compute the dot product $v^T w$ (in $O(n)$ time), then scale $u$ by that value (in $O(n)$ time).

This may seem like a contrived example, but a lot of slow matrix code is slow due to exactly this sort of goof.
"""
304 μs
ubig
ubig = rand(5000)
20.5 μs
vbig
vbig = rand(5000)
13.7 μs
wbig
wbig = rand(5000)
12.8 μs
0.069244292
@elapsed ubig*vbig'*wbig
71.0 ms
8.25e-6
@elapsed ubig*(vbig'*wbig)
117 μs
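md"""
As a sanity check, the two orderings compute the same vector; the norm of the difference should be at roundoff level.
"""
norm(ubig*vbig'*wbig - ubig*(vbig'*wbig))  # difference should be tiny (roundoff only)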

md"""
#### Matrix × diagonal = scaling of the matrix columns

If $D$ is a diagonal matrix with diagonal entries $d_1, \ldots, d_n$ and $A$ is written column-by-column as $A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$, then $AD = \begin{bmatrix} a_1 d_1 & a_2 d_2 & \ldots & a_n d_n \end{bmatrix}$. We certainly do not need to explicitly form and multiply by a dense representation of the matrix $D$ to compute $AD$; we're better off just scaling the columns directly! Alternately, we can use the `Diagonal` type in Julia, which constructs and works with an abstract diagonal matrix stored as just the diagonal entries.
"""
199 μs
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A * diagm(x) # BAD: Explicitly forms the diagonal, O(n^3) cost
25.6 μs
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A .* x' # Better: Apply scaling using Julia broadcast ops
31.9 ms
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A * Diagonal(x) # Also good: Use a special representation for diagonal matrices
30.9 μs

md"""
To see the difference between `diagm` and `Diagonal`, it is useful to look directly at the matrix representations. Notice that they don't look the same! The result of `diagm` keeps a bunch of explicit zeros; the `Diagonal` representation doesn't bother to store the off-diagonal zeros.
"""
131 μs
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  2.0
diagm(x)
8.0 μs
2×2 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅ 
  ⋅   2.0
Diagonal(x)
6.6 μs

md"""
#### Diagonal × matrix = scaling of the matrix rows

The product $DA$ scales the rows of $A$ just as $AD$ scales the columns. As with the column scaling, we shouldn't ever form a dense diagonal matrix; we can be much more efficient in storage and time by using a `Diagonal` representation or by using Julia's broadcast operations or something similar.
"""
156 μs
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
A .* x # Computes row scalings via Julia broadcast ops (could also do x .* A)
33.6 ms
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
x .* A # I said we could do x .* A -- may as well show it's true!
33.3 ms
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
Diagonal(x) * A
16.0 μs

md"""
#### Never form an explicit diagonal matrix

This should perhaps be rephrased as "never form an explicit *dense* representation of a diagonal matrix." Working with something like `Diagonal` is fine.
"""
168 μs
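md"""
To make the storage difference concrete, here is a small illustration (the vector `dbig` below is new, generated just for this comparison): a dense representation of an $n \times n$ diagonal matrix stores all $n^2$ entries, while `Diagonal` stores only the $n$ diagonal values.
"""
dbig = rand(1000)
Base.summarysize(diagm(dbig))     # dense 1000×1000 matrix: about 8 MB, mostly zeros
Base.summarysize(Diagonal(dbig))  # Diagonal wrapper: about 8 KB (just the diagonal entries)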

md"""
#### Never form an explicit rank one matrix

A rank one matrix can always be written $uv^T$ (an *outer product*) for some vectors $u$ and $v$. Such matrices arise regularly in matrix computations, and it is almost never worth writing down a dense representation. Rather than computing and storing every pairwise product of the entries of $u$ and $v$, we're much better off trying to just store $u$ and $v$, and manipulate any computations on the outer product to just involve these two vectors. We saw an example above when we were talking about multiplying a rank one matrix by a vector -- this is $O(n^2)$ when done naively by explicitly forming the rank one matrix, but is only $O(n)$ if done a bit more cleverly.
"""
227 μs
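md"""
As a small sketch of the same idea (the vectors `u1`, `v1`, and `w1` below are new, made up just for this illustration): to apply a rank-one update $A + u_1 v_1^T$ to a vector, we never need to form the matrix $u_1 v_1^T$ itself.
"""
u1 = rand(2)
v1 = rand(2)
w1 = rand(2)
(A + u1*v1')*w1     # forms the rank-one matrix explicitly -- avoid this for large n
A*w1 + u1*(v1'*w1)  # same result using only vector operations (plus one matvec with A)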

md"""
#### Matrix × matrix = collection of matrix-vector products

There are several ways to think of taking a product of two matrices. One natural one involves thinking of the second matrix column-by-column: if $B = \begin{bmatrix} b_1 & \ldots & b_n\end{bmatrix}$, then $AB = \begin{bmatrix} Ab_1 & \ldots & Ab_n \end{bmatrix}$.
"""
147 μs
matmul_col (generic function with 1 method)
function matmul_col(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for j = 1:n
        C[:,j] = A*B[:,j]  # column j of C is A times column j of B
    end
    C
end
847 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_col(A, B)
12.3 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
A*B
20.4 μs

md"""
#### Matrix × matrix = collection of dot products

This way of thinking about matrix-matrix products is the one that many learn the first time that they see matrix multiplication.

The implementation illustrates something that may not be immediately obvious in Julia: because one-dimensional arrays are interpreted as column vectors and a slicing operation like `A[i,:]` gives a one-dimensional array (assuming `i` is a scalar index), we need to force Julia to recognize that this is a row vector (adjoint vector) when we implement the dot product.
"""
182 μs
matmul_dots (generic function with 1 method)
function matmul_dots(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for i = 1:m
        for j = 1:n
            C[i,j] = A[i,:]'*B[:,j]  # A[i,:] is a column vector; ' turns it into a row
        end
    end
    C
end
1.1 ms
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_dots(A, B)
14.5 μs

md"""
#### Matrix × matrix = sum of rank one matrices

We saw this approach in the last notebook as well: matrix-matrix products can be seen as a sum of *outer products* between columns of the first matrix and rows of the second.
"""
160 μs
matmul_outer (generic function with 1 method)
function matmul_outer(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for k = 1:p
        C += A[:,k]*B[k,:]'
    end
    C
end
885 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_outer(A, B)
26.0 μs

md"""
#### $AB$ involves linear combinations of rows from $B$
"""
107 μs
matmul_rows (generic function with 1 method)
function matmul_rows(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for i = 1:m
        C[i,:] = A[i,:]'*B
    end
    C
end
898 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_rows(A, B)
20.5 μs

md"""
#### $AB$ involves linear combinations of columns from $A$

We really addressed this above when we said that a matrix-matrix product is a collection of matrix-vector products, since a matrix-vector product involves a linear combination of the columns of the matrix.
"""
142 μs
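md"""
As a concrete check of this viewpoint, the first column of $AB$ is the linear combination $a_1 b_{11} + a_2 b_{21}$ of the columns of $A$ (using the $A$ and $B$ defined at the top of the notebook).
"""
(A*B)[:,1]                     # first column of the product
A[:,1]*B[1,1] + A[:,2]*B[2,1]  # the same column as a combination of the columns of A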

md"""
## Block matrices

We often like to think of matrices not as arrays of numbers, but as arrays of submatrices. For example, if $A \in \mathbb{R}^{n \times n}$, $b, c \in \mathbb{R}^n$ and $d \in \mathbb{R}$, then I might write a block 2-by-2 matrix

$$M = \begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$

where the elements (in the case $n = 2$) look like

$$M = \begin{bmatrix}
a_{11} & a_{12} & b_1 \\
a_{21} & a_{22} & b_2 \\
c_1 & c_2 & d
\end{bmatrix}$$

Julia's horizontal and vertical concatenation operations do the "right" thing with block matrices, allowing us to easily form bigger matrices.
"""
220 μs
b
b = [7; 11]
7.5 μs
c
c = [13; 14]
9.2 μs
d
42
d = 42
6.7 μs
M
3×3 Matrix{Float64}:
  1.0   2.0   7.0
  3.0   5.0  11.0
 13.0  14.0  42.0
M = [A b; c' d]
154 ms

The "commandments" described above apply to block matrices as well as they do to ordinary matrices. For example, we can write matrix-vector products with $M$ in terms of the block columns times subvectors.

md"""
The "commandments" described above apply to block matrices as well as they do to ordinary matrices. For example, we can write matrix-vector products with $M$ in terms of the block columns times subvectors.
"""
114 μs
z
z = rand(3)
13.7 μs
[A; c']*z[1:2] + [b; d]*z[3]
55.7 μs
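md"""
For comparison, we can form the product with `M` directly; the block-column version above should give the same vector up to roundoff.
"""
M*z
norm(M*z - ([A; c']*z[1:2] + [b; d]*z[3]))  # should be at roundoff level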

md"""
Alternately, we can write a matrix-vector product in terms of rows.
"""
104 μs
[ [A b]*z ; [c' d]*z ]
31.3 μs

md"""
Julia also frequently does something reasonable if you try to build block matrices involving diagonal matrices (for example):
"""
106 μs
M2
3×3 Matrix{Float64}:
  1.0   0.0   7.0
  0.0   2.0  11.0
 13.0  14.0  42.0
M2 = [Diagonal(x) b; c' d]
105 ms

md"""
Depending on the Julia version (and whether `SparseArrays` is loaded), the result of this kind of concatenation may come back as a *sparse* matrix rather than an ordinary dense one. A sparse matrix uses a different internal data structure than the dense matrices we've seen so far. A dense matrix is a regular array of all the entries, laid out column after column (in languages like Julia). In some matrices, that would include a lot of zeros. But in a sparse matrix data structure, we store only the nonzero values (but also have to keep an indexing structure to remember where they are).

We will frequently see cases where a block matrix includes some blocks with special structure (and other blocks that are ordinary dense matrices). In this case, we may want to work with the pieces of the block matrix separately, rather than packing them into a single standard data structure.
"""
189 μs
M2*z # OK, but involves explicitly forming M2
11.7 μs
[x .* z[1:2] + b*z[3]; c'*z[1:2] + d*z[3]] # Same computation, does something special with the diagonal block.
25.8 μs
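md"""
And once more, a quick check that the structured computation agrees with the ordinary matrix-vector product:
"""
norm(M2*z - [x .* z[1:2] + b*z[3]; c'*z[1:2] + d*z[3]])  # should be at roundoff level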

md"""
## Matrix shapes

We have already talked about one type of matrix that is mostly zeros, but we will see a lot of other such matrices as well. We sometimes talk about the nonzero pattern (or sparsity pattern) in terms of the "shape" of a matrix. Two types of matrix shapes that we'll see a lot in the first part of the class are lower and upper triangular matrices. In fact, Julia has explicit functions for getting the lower and upper triangular parts out of a standard dense matrix.
"""
147 μs
Arand
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.712714  0.912812  0.736406  0.0623196  0.196461
 0.639698  0.754787  0.078397  0.936217   0.617625
 0.206513  0.705258  0.85326   0.313553   0.43446
 0.655685  0.804931  0.739925  0.288234   0.575178
Arand = rand(5,5)
12.5 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.0       0.912812  0.736406  0.0623196  0.196461
 0.0       0.0       0.078397  0.936217   0.617625
 0.0       0.0       0.0       0.313553   0.43446
 0.0       0.0       0.0       0.0        0.575178
triu(Arand)
10.5 μs
5×5 Matrix{Float64}:
 0.402757  0.0       0.0       0.0       0.0
 0.712714  0.912812  0.0       0.0       0.0
 0.639698  0.754787  0.078397  0.0       0.0
 0.206513  0.705258  0.85326   0.313553  0.0
 0.655685  0.804931  0.739925  0.288234  0.575178
tril(Arand)
12.8 μs

md"""
Alternately, we can construct upper and lower triangular *views* on the matrix `Arand`. These can be used in expressions in the same way that `triu(Arand)` or `tril(Arand)` could, but they don't involve any new storage; it's just re-interpreting the `Arand` storage by ignoring part of the matrix as all zeros.
"""
156 μs
5×5 UpperTriangular{Float64, Matrix{Float64}}:
 0.402757  0.889451  0.659938  0.706049   0.782657
  ⋅        0.912812  0.736406  0.0623196  0.196461
  ⋅         ⋅        0.078397  0.936217   0.617625
  ⋅         ⋅         ⋅        0.313553   0.43446
  ⋅         ⋅         ⋅         ⋅         0.575178
UpperTriangular(Arand)
7.7 μs
5×5 LowerTriangular{Float64, Matrix{Float64}}:
 0.402757   ⋅         ⋅         ⋅         ⋅ 
 0.712714  0.912812   ⋅         ⋅         ⋅ 
 0.639698  0.754787  0.078397   ⋅         ⋅ 
 0.206513  0.705258  0.85326   0.313553   ⋅ 
 0.655685  0.804931  0.739925  0.288234  0.575178
LowerTriangular(Arand)
7.0 μs

md"""
For these matrices, we can just look at all the numbers. For bigger matrices, it is sometimes useful to have a visualization that shows us the nonzero structure. A *spy plot* is such a visualization: we put a dot wherever there is a nonzero (colored by value, in this case) and leave blank space for zero entries.
"""
142 μs
spy(triu(rand(100,100)))
3.1 s

md"""
When we get to eigenvalue problems, we'll often see shapes that look like "triangular plus an extra diagonal." These are known as upper or lower *Hessenberg* matrices (depending whether the triangular part is upper or lower). Julia supports routines to pull out the upper/lower Hessenberg part of a matrix, too.
"""
142 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.712714  0.912812  0.736406  0.0623196  0.196461
 0.0       0.754787  0.078397  0.936217   0.617625
 0.0       0.0       0.85326   0.313553   0.43446
 0.0       0.0       0.0       0.288234   0.575178
triu(Arand,-1)
30.0 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.0       0.0       0.0
 0.712714  0.912812  0.736406  0.0       0.0
 0.639698  0.754787  0.078397  0.936217  0.0
 0.206513  0.705258  0.85326   0.313553  0.43446
 0.655685  0.804931  0.739925  0.288234  0.575178
tril(Arand, 1)
11.3 μs
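md"""
In recent versions of Julia, `LinearAlgebra` also provides an `UpperHessenberg` matrix type analogous to `UpperTriangular`; like the triangular views above, it wraps the existing storage rather than copying it.
"""
UpperHessenberg(Arand)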

md"""
The second argument to each of these calls is the number of a diagonal: diagonal zero is the main diagonal, +1 is the first superdiagonal (first diagonal above the main diagonal), -1 is the first subdiagonal (first diagonal below the main diagonal), and so on.
"""
114 μs
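md"""
The same diagonal numbering convention is used by the `diag` function, which extracts a single diagonal of a matrix as a vector; for example:
"""
diag(Arand)      # main diagonal (offset 0)
diag(Arand, 1)   # first superdiagonal
diag(Arand, -1)  # first subdiagonal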