Manipulating matrices
In this notebook, we play with some of the ideas introduced in the Jan 26 lecture notes, expressed concretely in Julia. In addition to using the Plots package, we will want to use the LinearAlgebra package in this notebook (and in most of the rest of the class!).
Twelve commandments
When Charlie Van Loan teaches matrix computations, he talks about twelve commandments of matrix manipulation. These are listed in the lecture notes; here, I'd like to illustrate them in code. Throughout, we'll use some small concrete matrices and vectors to illustrate the point.
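The displays below show those concrete test objects; a minimal sketch of definitions consistent with them (the names A, B, and x are my assumption, not necessarily the original code) would be:
using LinearAlgebra
A = [1.0 2.0; 3.0 5.0]    # small 2-by-2 test matrix
B = [7.0 8.0; 9.0 12.0]   # a second 2-by-2 test matrix
x = [1.0; 2.0]            # small test vector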
2×2 Matrix{Float64}:
1.0 2.0
3.0 5.0
2×2 Matrix{Float64}:
7.0 8.0
9.0 12.0
1.0
2.0
Matrix × vector = linear combination of matrix columns.
We discussed this some in Monday's notebook. There are a couple of ways of writing a matrix-vector product, but the column-oriented approach is equivalent to taking a weighted linear combination of the columns of the matrix. So if we write a matrix as a collection of columns
$$A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$$
then
$$Ax = \sum_{j=1}^n a_j x_j$$
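The notebook's matvec_col implements this column-oriented product; a sketch of one possible definition (not necessarily the original code) is:
function matvec_col(A, x)
    m, n = size(A)
    y = zeros(m)
    for j = 1:n
        y += A[:,j] * x[j]   # add x_j times column j of A
    end
    y
end
Applying this to the test matrix and vector above should agree with the built-in product A*x, as the repeated output below suggests.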
matvec_col (generic function with 1 method)
5.0
13.0
5.0
13.0
Inner product = sum of products of corresponding elements
We have to be a little careful about this – this is the standard inner product for a real vector space. We will talk about more general inner products later in the class. But for the real standard inner product, we do have
$$x \cdot y = \sum_{j=1}^n x_j y_j$$
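A sketch of a my_dot along these lines (assuming real vectors of equal length) is:
function my_dot(x, y)
    s = 0.0
    for j = 1:length(x)
        s += x[j] * y[j]     # accumulate products of corresponding entries
    end
    s
end
The repeated output below is consistent with comparing this hand-rolled version against a built-in computation such as dot(x, x) or x'*x.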
my_dot (generic function with 1 method)
5.0
5.0
Order of operations matters to performance
Suppose we have three long vectors $u, v, w \in \mathbb{R}^n$ and we want to compute
$$y = uv^T w$$
The first way to do it, which is what we will get with the Julia code u*v'*w, effectively computes
$$y = (uv^T) w$$
That is, we first compute the outer product $uv^T$ (in $O(n^2)$ time), then do a matrix-vector product (in $O(n^2)$ time). The total time scales like $n^2$. Alternately, we can use associativity to write
$$y = u (v^T w)$$
In this version, we first compute the dot product $v^T w$ (in $O(n)$ time), then scale $u$ by that value (in $O(n)$ time).
This may seem like a contrived example, but a lot of slow matrix code is slow due to exactly this sort of goof.
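A sketch of the timing comparison (the vector length n and the use of @time are my assumptions):
n = 2000
u, v, w = rand(n), rand(n), rand(n)
@time (u*v')*w    # forms the n-by-n outer product first: O(n^2) work and storage
@time u*(v'*w)    # dot product, then scale u: O(n) work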
(entries of the random test vectors elided)
0.069244292
8.25e-6
Matrix × diagonal = scaling of the matrix columns
If $D$ is a diagonal matrix with diagonal entries $d_1, \ldots, d_n$ and $A$ is written column-by-column as $A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$, then $AD = \begin{bmatrix} a_1 d_1 & a_2 d_2 & \ldots & a_n d_n \end{bmatrix}$. We certainly do not need to explicitly form and multiply by a dense representation of the matrix $D$ to compute $AD$; we're better off just scaling the columns directly! Alternately, we can use the Diagonal type in Julia, which constructs and works with an abstract diagonal matrix stored as just the diagonal entries.
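For the 2-by-2 example, all of the following compute $AD$, and the identical outputs below show they agree (the name d for the scaling vector is my choice):
d = [1.0; 2.0]      # diagonal entries
A * diagm(d)        # explicit dense diagonal matrix (wasteful for large n)
A * Diagonal(d)     # structured Diagonal type from LinearAlgebra
A .* d'             # broadcast: scale column j of A by d[j]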
2×2 Matrix{Float64}:
1.0 4.0
3.0 10.0
2×2 Matrix{Float64}:
1.0 4.0
3.0 10.0
2×2 Matrix{Float64}:
1.0 4.0
3.0 10.0
To see the difference between diagm and Diagonal, it is useful to look directly at the matrix representations. Notice that they don't look the same! The result of diagm keeps a bunch of explicit zeros; the Diagonal representation doesn't bother to store the off-diagonal zeros.
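A sketch of that comparison, using the same d as above:
diagm(d)       # dense Matrix{Float64} with explicit off-diagonal zeros
Diagonal(d)    # Diagonal type: stores only the diagonal entries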
2×2 Matrix{Float64}:
1.0 0.0
0.0 2.0
2×2 Diagonal{Float64, Vector{Float64}}:
1.0 ⋅
⋅ 2.0
Diagonal × matrix = scaling of the matrix rows
The product $DA$ scales the rows of $A$ just as $AD$ scales the columns. As with the column scaling, we shouldn't ever form a dense diagonal matrix; we can be much more efficient in storage and time by using a Diagonal representation or by using Julia's broadcast operations or something similar.
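Again for the small example, the following all compute $DA$ (identical outputs below):
diagm(d) * A        # dense diagonal matrix (avoid for large matrices)
Diagonal(d) * A     # structured diagonal
d .* A              # broadcast: scale row i of A by d[i]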
2×2 Matrix{Float64}:
1.0 2.0
6.0 10.0
2×2 Matrix{Float64}:
1.0 2.0
6.0 10.0
2×2 Matrix{Float64}:
1.0 2.0
6.0 10.0
Never form an explicit diagonal matrix
This should perhaps be rephrased as "never form an explicit dense representation of a diagonal matrix." Working with something like Diagonal is fine.
Never form an explicit rank one matrix
A rank one matrix can always be written $uv^T$ (an outer product) for some vectors $u$ and $v$. Such matrices arise regularly in matrix computations, and it is almost never worth writing down a dense representation. Rather than computing and storing every pairwise product of the entries of $u$ and $v$, we're much better off trying to just store $u$ and $v$, and manipulate any computations on the outer product to just involve these two vectors. We saw an example above when we were talking about multiplying a rank one matrix by a vector – this is $O(n^2)$ when done naively by explicitly forming the rank one matrix, but is only $O(n)$ if done a bit more cleverly.
Matrix × matrix = collection of matrix-vector products
There are several ways to think of taking a product of two matrices. One natural one involves thinking of the second matrix column-by-column: if $B = \begin{bmatrix} b_1 & \ldots & b_n\end{bmatrix}$, then $AB = \begin{bmatrix} Ab_1 & \ldots & Ab_n \end{bmatrix}$.
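A sketch of a matmul_col along these lines (using the A and B defined above):
function matmul_col(A, B)
    m, p = size(A, 1), size(B, 2)
    C = zeros(m, p)
    for j = 1:p
        C[:,j] = A * B[:,j]   # column j of C is A times column j of B
    end
    C
end
The repeated output below is consistent with checking this against the built-in A*B.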
matmul_col (generic function with 1 method)
2×2 Matrix{Float64}:
25.0 32.0
66.0 84.0
2×2 Matrix{Float64}:
25.0 32.0
66.0 84.0
Matrix × matrix = collection of dot products
This way of thinking about matrix-matrix products is the one that many learn the first time that they see matrix multiplication.
The implementation illustrates something that may not be immediately obvious in Julia: because one-dimensional arrays are interpreted as column vectors and a slicing operation like A[i,:] gives a one-dimensional array (assuming i is a scalar index), we need to force Julia to recognize that this is a row vector (adjoint vector) when we implement the dot product.
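A sketch of matmul_dots in that spirit, with the adjoint making the row slice explicit:
function matmul_dots(A, B)
    m, p = size(A, 1), size(B, 2)
    C = zeros(m, p)
    for i = 1:m
        for j = 1:p
            C[i,j] = A[i,:]' * B[:,j]   # adjoint turns the slice into a row vector
        end
    end
    C
end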
matmul_dots (generic function with 1 method)
2×2 Matrix{Float64}:
25.0 32.0
66.0 84.0
Matrix × matrix = sum of rank one matrices
We saw this approach in the last notebook as well: matrix-matrix products can be seen as a sum of outer products between columns of the first matrix and rows of the second.
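A sketch of matmul_outer built on that view (the accumulation order is my choice):
function matmul_outer(A, B)
    m, n = size(A)
    p = size(B, 2)
    C = zeros(m, p)
    for k = 1:n
        C += A[:,k] * B[k,:]'   # rank-one update: column k of A times row k of B
    end
    C
end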
matmul_outer (generic function with 1 method)
2×2 Matrix{Float64}:
25.0 32.0
66.0 84.0
$AB$ involves linear combinations of rows from $B$
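Row $i$ of the product is a linear combination of the rows of $B$ with weights taken from row $i$ of $A$; a sketch of a matmul_rows built on that view is:
function matmul_rows(A, B)
    m, n = size(A)
    p = size(B, 2)
    C = zeros(m, p)
    for i = 1:m
        for k = 1:n
            C[i,:] += A[i,k] * B[k,:]   # row i of C accumulates scaled rows of B
        end
    end
    C
end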
matmul_rows (generic function with 1 method)
2×2 Matrix{Float64}:
25.0 32.0
66.0 84.0
$AB$ involves linear combinations of columns from $A$
We really addressed this above when we said that a matrix-matrix product is a collection of matrix-vector products, since a matrix-vector product involves a linear combination of the columns of the matrix.
Block matrices
We often like to think of matrices not as arrays of numbers, but as arrays of submatrices. For example, if $A \in \mathbb{R}^{n \times n}$, $b, c \in \mathbb{R}^n$ and $d \in \mathbb{R}$, then I might write a block 2-by-2 matrix
$$M = \begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$
where the elements (in the case $n = 2$) look like
$$M = \begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ c_1 & c_2 & d \end{bmatrix}$$
Julia's horizontal and vertical concatenation operations do the "right" thing with block matrices, allowing us to easily form bigger matrices.
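For the 2-by-2 example, a sketch consistent with the displayed values (the names b, c, and d follow the math above and are my assumption):
b = [7.0; 11.0]
c = [13.0; 14.0]
d = 42.0
M = [A b; c' d]    # block construction by horizontal and vertical concatenation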
7
11
13
14
42
3×3 Matrix{Float64}:
1.0 2.0 7.0
3.0 5.0 11.0
13.0 14.0 42.0
The "commandments" described above apply to block matrices as well as they do to ordinary matrices. For example, we can write matrix-vector products with $M$ in terms of the block columns times subvectors.
0.327207
0.641651
0.852711
7.57949
13.5697
49.0507
7.57949
13.5697
49.0507
Alternately, we can write a matrix-vector product in terms of rows.
7.57949
13.5697
49.0507
Julia also frequently does something reasonable if you try to build block matrices involving diagonal matrices (for example):
3×3 Matrix{Float64}:
1.0 0.0 7.0
0.0 2.0 11.0
13.0 14.0 42.0
Depending on the Julia version and which packages are loaded, the result of a concatenation like this may come back as an ordinary dense matrix (as shown above) or as a sparse matrix. A sparse matrix uses a different internal data structure than the dense matrices we've seen so far. A dense matrix is a regular array of all the entries, laid out column after column (in languages like Julia); for some matrices, that means storing a lot of zeros. A sparse matrix data structure stores only the nonzero values (but also has to keep an indexing structure to remember where they are).
We will frequently see cases where a block matrix includes some blocks with special structure (and other blocks that are ordinary dense matrices). In this case, we may want to work with the pieces of the block matrix separately, rather than packing them into a single standard data structure.
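A sketch of that idea for the example above, keeping the (1,1) block as a Diagonal rather than flattening everything into one dense matrix (variable names as before):
D = Diagonal([1.0; 2.0])
[D b; c' d] * y                               # concatenate first: the structure of D is lost
[D*y[1:2] + b*y[3]; c'*y[1:2] + d*y[3]]       # work block-by-block: D stays structured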
6.29619
10.6631
49.0507
6.29619
10.6631
49.0507
Matrix shapes
We have already talked about one type of matrix that is mostly zeros, but we will see a lot of other such matrices as well. We sometimes talk about the nonzero pattern (or sparsity pattern) in terms of the "shape" of a matrix. Two types of matrix shapes that we'll see a lot in the first part of the class are lower and upper triangular matrices. In fact, Julia has explicit functions for getting the lower and upper triangular parts out of a standard dense matrix.
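A sketch of what generates the displays below (the name Arand matches its use later; the random entries will differ on each run):
Arand = rand(5, 5)   # random 5-by-5 test matrix
triu(Arand)          # upper triangular part: zeros below the main diagonal
tril(Arand)          # lower triangular part: zeros above the main diagonal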
5×5 Matrix{Float64}:
0.402757 0.889451 0.659938 0.706049 0.782657
0.712714 0.912812 0.736406 0.0623196 0.196461
0.639698 0.754787 0.078397 0.936217 0.617625
0.206513 0.705258 0.85326 0.313553 0.43446
0.655685 0.804931 0.739925 0.288234 0.575178
5×5 Matrix{Float64}:
0.402757 0.889451 0.659938 0.706049 0.782657
0.0 0.912812 0.736406 0.0623196 0.196461
0.0 0.0 0.078397 0.936217 0.617625
0.0 0.0 0.0 0.313553 0.43446
0.0 0.0 0.0 0.0 0.575178
5×5 Matrix{Float64}:
0.402757 0.0 0.0 0.0 0.0
0.712714 0.912812 0.0 0.0 0.0
0.639698 0.754787 0.078397 0.0 0.0
0.206513 0.705258 0.85326 0.313553 0.0
0.655685 0.804931 0.739925 0.288234 0.575178
Alternately, we can construct upper and lower triangular views on the matrix Arand. These can be used in expressions in the same way that triu(Arand) or tril(Arand) could, but they don't involve any new storage; it's just re-interpreting the Arand storage, treating part of the matrix as all zeros.
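A sketch of constructing the views:
UpperTriangular(Arand)   # upper triangular view of Arand, no extra storage
LowerTriangular(Arand)   # lower triangular view of the same storage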
5×5 UpperTriangular{Float64, Matrix{Float64}}:
0.402757 0.889451 0.659938 0.706049 0.782657
⋅ 0.912812 0.736406 0.0623196 0.196461
⋅ ⋅ 0.078397 0.936217 0.617625
⋅ ⋅ ⋅ 0.313553 0.43446
⋅ ⋅ ⋅ ⋅ 0.575178
5×5 LowerTriangular{Float64, Matrix{Float64}}:
0.402757 ⋅ ⋅ ⋅ ⋅
0.712714 0.912812 ⋅ ⋅ ⋅
0.639698 0.754787 0.078397 ⋅ ⋅
0.206513 0.705258 0.85326 0.313553 ⋅
0.655685 0.804931 0.739925 0.288234 0.575178
For these matrices, we can just look at all the numbers. For bigger matrices, it is sometimes useful to have a visualization that shows us the nonzero structure. A spy plot is such a visualization: we put a dot wherever there is a nonzero (colored by value, in this case) and leave blank space for zero entries.
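A sketch of producing such a plot (the spy recipe from Plots and the random sparse test matrix from SparseArrays are my assumptions, not part of the original code):
using Plots, SparseArrays
spy(sprand(40, 40, 0.1))   # dots mark the nonzero entries of a random sparse matrix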
When we get to eigenvalue problems, we'll often see shapes that look like "triangular plus an extra diagonal." These are known as upper or lower Hessenberg matrices (depending on whether the triangular part is upper or lower). Julia supports routines to pull out the upper/lower Hessenberg part of a matrix, too.
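A sketch of extracting the Hessenberg parts with triu and tril, consistent with the displays below:
triu(Arand, -1)   # upper Hessenberg part: upper triangle plus the first subdiagonal
tril(Arand, 1)    # lower Hessenberg part: lower triangle plus the first superdiagonal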
5×5 Matrix{Float64}:
0.402757 0.889451 0.659938 0.706049 0.782657
0.712714 0.912812 0.736406 0.0623196 0.196461
0.0 0.754787 0.078397 0.936217 0.617625
0.0 0.0 0.85326 0.313553 0.43446
0.0 0.0 0.0 0.288234 0.575178
5×5 Matrix{Float64}:
0.402757 0.889451 0.0 0.0 0.0
0.712714 0.912812 0.736406 0.0 0.0
0.639698 0.754787 0.078397 0.936217 0.0
0.206513 0.705258 0.85326 0.313553 0.43446
0.655685 0.804931 0.739925 0.288234 0.575178
The second argument to each of these calls is the index of a diagonal: 0 is the main diagonal, +1 is the first superdiagonal (the first diagonal above the main diagonal), -1 is the first subdiagonal (the first diagonal below the main diagonal), and so on.