md"""
# Manipulating matrices

In this notebook, we play with some of the ideas introduced in the Jan 26 lecture notes, expressed concretely in Julia. In addition to using the `Plots` package, we will want to use the `LinearAlgebra` package in this notebook (and in most of the rest of the class!).
"""
160 μs
using Plots
2.7 s
using LinearAlgebra
231 μs

md"""
## Twelve commandments

When Charlie Van Loan teaches matrix computations, he talks about twelve commandments of matrix manipulation. These are listed in the lecture notes; here, I'd like to illustrate them in code. Throughout, we'll use some small concrete matrices and vectors to illustrate the point.
"""
149 μs
A
2×2 Matrix{Float64}:
 1.0  2.0
 3.0  5.0
A = [1.0 2.0; 3.0 5.0]
21.0 μs
B
2×2 Matrix{Float64}:
 7.0   8.0
 9.0  12.0
B = [7.0 8.0; 9.0 12.0]
12.1 μs
x
x = [1.0; 2.0]
7.5 μs

md"""
#### Matrix × vector = linear combination of matrix columns.

We discussed this some in Monday's notebook. There are a couple of ways of writing a matrix-vector product, but the column oriented approach is equivalent to taking a weighted linear combination of the columns of the matrix. So if we write a matrix as a collection of columns

$$A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$$

then

$$Ax = \sum_{j=1}^n a_j x_j$$
"""
182 μs
matvec_col (generic function with 1 method)
function matvec_col(A, x)
    m, n = size(A)
    y = zeros(m)
    for j = 1:n
        y += A[:,j]*x[j]  # accumulate x[j] times column j of A
    end
    y
end
796 μs
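md"""
As a quick sanity check, we can compare the hand-rolled `matvec_col` against Julia's built-in matrix-vector product on the small example above; the two should agree (up to roundoff).
"""
matvec_col(A, x)  # column-oriented version defined above
A*x               # built-in product, for comparison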

md"""
#### Inner product = sum of products of corresponding elements

We have to be a little careful about this -- this is the *standard* inner product for a *real* vector space. We will talk about more general inner products later in the class. But for the real standard inner product, we do have

$$x \cdot y = \sum_{j=1}^n x_j y_j$$
"""
240 μs
my_dot (generic function with 1 method)
function my_dot(x, y)
    result = 0.0
    for j = 1:length(x)
        result += x[j]*y[j]  # add the product of corresponding entries
    end
    result  # return the accumulated sum
end
581 μs
5.0
my_dot(x, x)
7.8 μs
5.0
x'*x
27.8 μs
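md"""
The `LinearAlgebra` package also exports a `dot` function that computes the same standard inner product, so we have a third way to write the same thing.
"""
dot(x, x)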

md"""
#### Order of operations matters to performance

Suppose we have three long vectors $u, v, w \in \mathbb{R}^n$ and we want to compute

$$y = uv^T w$$

The first way to do it, which is what we will get with the Julia code `u*v'*w`, effectively computes

$$y = (uv^T) w$$

That is, we first compute the outer product $uv^T$ (in $O(n^2)$ time), then do a matrix-vector product (in $O(n^2)$ time). The total time scales like $n^2$. Alternately, we can use associativity to write

$$y = u (v^T w)$$

In this version, we first compute the dot product $v^T w$ (in $O(n)$ time), then scale $u$ by that value (in $O(n)$ time).

This may seem like a contrived example, but a lot of slow matrix code is slow due to exactly this sort of goof.
"""
304 μs
ubig
ubig = rand(5000)
20.5 μs
vbig
vbig = rand(5000)
13.7 μs
wbig
wbig = rand(5000)
12.8 μs
0.069244292
@elapsed ubig*vbig'*wbig
71.0 ms
8.25e-6
@elapsed ubig*(vbig'*wbig)
117 μs
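md"""
As a sanity check, the two orderings compute the same vector; the norm of the difference should be at roundoff level.
"""
norm(ubig*vbig'*wbig - ubig*(vbig'*wbig))  # difference should be tiny (roundoff only)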

md"""
#### Matrix × diagonal = scaling of the matrix columns

If $D$ is a diagonal matrix with diagonal entries $d_1, \ldots, d_n$ and $A$ is written column-by-column as $A = \begin{bmatrix} a_1 & a_2 & \ldots & a_n \end{bmatrix}$, then $AD = \begin{bmatrix} a_1 d_1 & a_2 d_2 & \ldots & a_n d_n \end{bmatrix}$. We certainly do not need to explicitly form and multiply by a dense representation of the matrix $D$ to compute $AD$; we're better off just scaling the columns directly! Alternately, we can use the `Diagonal` type in Julia, which constructs and works with an abstract diagonal matrix stored as just the diagonal entries.
"""
199 μs
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A * diagm(x) # BAD: Explicitly forms the diagonal, O(n^3) cost
25.6 μs
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A .* x' # Better: Apply scaling using Julia broadcast ops
31.9 ms
2×2 Matrix{Float64}:
 1.0   4.0
 3.0  10.0
A * Diagonal(x) # Also good: Use a special representation for diagonal matrices
30.9 μs

md"""
To see the difference between `diagm` and `Diagonal`, it is useful to look directly at the matrix representations. Notice that they don't look the same! The result of `diagm` keeps a bunch of explicit zeros; the `Diagonal` representation doesn't bother to store the off-diagonal zeros.
"""
131 μs
2×2 Matrix{Float64}:
 1.0  0.0
 0.0  2.0
diagm(x)
8.0 μs
2×2 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅ 
  ⋅   2.0
Diagonal(x)
6.6 μs

md"""
#### Diagonal × matrix = scaling of the matrix rows

The product $DA$ scales the rows of $A$ just as $AD$ scales the columns. As with the column scaling, we shouldn't ever form a dense diagonal matrix; we can be much more efficient in storage and time by using a `Diagonal` representation or by using Julia's broadcast operations or something similar.
"""
156 μs
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
A .* x # Computes row scalings via Julia broadcast ops (could also do x .* A)
33.6 ms
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
x .* A # I said we could do x .* A -- may as well show it's true!
33.3 ms
2×2 Matrix{Float64}:
 1.0   2.0
 6.0  10.0
Diagonal(x) * A
16.0 μs

md"""
#### Never form an explicit diagonal matrix

This should perhaps be rephrased as "never form an explicit *dense* representation of a diagonal matrix." Working with something like `Diagonal` is fine.
"""
168 μs
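md"""
To make the storage difference concrete, here is a small illustration (the vector `dbig` below is new, generated just for this comparison): a dense representation of an $n \times n$ diagonal matrix stores all $n^2$ entries, while `Diagonal` stores only the $n$ diagonal values.
"""
dbig = rand(1000)
Base.summarysize(diagm(dbig))     # dense 1000×1000 matrix: about 8 MB, mostly zeros
Base.summarysize(Diagonal(dbig))  # Diagonal wrapper: about 8 KB (just the diagonal entries)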

md"""
#### Never form an explicit rank one matrix

A rank one matrix can always be written $uv^T$ (an *outer product*) for some vectors $u$ and $v$. Such matrices arise regularly in matrix computations, and it is almost never worth writing down a dense representation. Rather than computing and storing every pairwise product of the entries of $u$ and $v$, we're much better off trying to just store $u$ and $v$, and manipulate any computations on the outer product to just involve these two vectors. We saw an example above when we were talking about multiplying a rank one matrix by a vector -- this is $O(n^2)$ when done naively by explicitly forming the rank one matrix, but is only $O(n)$ if done a bit more cleverly.
"""
227 μs
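md"""
As a small sketch of the same idea (the vectors `u1`, `v1`, and `w1` below are new, made up just for this illustration): to apply a rank-one update $A + u_1 v_1^T$ to a vector, we never need to form the matrix $u_1 v_1^T$ itself.
"""
u1 = rand(2)
v1 = rand(2)
w1 = rand(2)
(A + u1*v1')*w1     # forms the rank-one matrix explicitly -- avoid this for large n
A*w1 + u1*(v1'*w1)  # same result using only vector operations (plus one matvec with A)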

md"""
#### Matrix × matrix = collection of matrix-vector products

There are several ways to think of taking a product of two matrices. One natural one involves thinking of the second matrix column-by-column: if $B = \begin{bmatrix} b_1 & \ldots & b_n\end{bmatrix}$, then $AB = \begin{bmatrix} Ab_1 & \ldots & Ab_n \end{bmatrix}$.
"""
147 μs
matmul_col (generic function with 1 method)
function matmul_col(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for j = 1:n
        C[:,j] = A*B[:,j]  # column j of C is A times column j of B
    end
    C
end
847 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_col(A, B)
12.3 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
A*B
20.4 μs

md"""
#### Matrix × matrix = collection of dot products

This way of thinking about matrix-matrix products is the one that many learn the first time that they see matrix multiplication.

The implementation illustrates something that may not be immediately obvious in Julia: because one-dimensional arrays are interpreted as column vectors and a slicing operation like `A[i,:]` gives a one-dimensional array (assuming `i` is a scalar index), we need to force Julia to recognize that this is a row vector (adjoint vector) when we implement the dot product.
"""
182 μs
matmul_dots (generic function with 1 method)
function matmul_dots(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for i = 1:m
        for j = 1:n
            C[i,j] = A[i,:]'*B[:,j]  # A[i,:] is a column vector; ' turns it into a row
        end
    end
    C
end
1.1 ms
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_dots(A, B)
14.5 μs

md"""
#### Matrix × matrix = sum of rank one matrices

We saw this approach in the last notebook as well: matrix-matrix products can be seen as a sum of *outer products* between columns of the first matrix and rows of the second.
"""
160 μs
matmul_outer (generic function with 1 method)
function matmul_outer(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for k = 1:p
        C += A[:,k]*B[k,:]'
    end
    C
end
885 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_outer(A, B)
26.0 μs

md"""
#### $AB$ involves linear combinations of rows from $B$
"""
107 μs
matmul_rows (generic function with 1 method)
function matmul_rows(A, B)
    m, p = size(A)
    p, n = size(B)
    C = zeros(m, n)
    for i = 1:m
        C[i,:] = A[i,:]'*B
    end
    C
end
898 μs
2×2 Matrix{Float64}:
 25.0  32.0
 66.0  84.0
matmul_rows(A, B)
20.5 μs

md"""
#### $AB$ involves linear combinations of columns from $A$

We really addressed this above when we said that a matrix-matrix product is a collection of matrix-vector products, since a matrix-vector product involves a linear combination of the columns of the matrix.
"""
142 μs
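md"""
As a concrete check of this viewpoint, the first column of $AB$ is the linear combination $a_1 b_{11} + a_2 b_{21}$ of the columns of $A$ (using the $A$ and $B$ defined at the top of the notebook).
"""
(A*B)[:,1]                     # first column of the product
A[:,1]*B[1,1] + A[:,2]*B[2,1]  # the same column as a combination of the columns of A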

md"""
## Block matrices

We often like to think of matrices not as arrays of numbers, but as arrays of submatrices. For example, if $A \in \mathbb{R}^{n \times n}$, $b, c \in \mathbb{R}^n$ and $d \in \mathbb{R}$, then I might write a block 2-by-2 matrix

$$M = \begin{bmatrix} A & b \\ c^T & d \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$

where the elements (in the case $n = 2$) look like

$$M = \begin{bmatrix}
a_{11} & a_{12} & b_1 \\
a_{21} & a_{22} & b_2 \\
c_1 & c_2 & d
\end{bmatrix}$$

Julia's horizontal and vertical concatenation operations do the "right" thing with block matrices, allowing us to easily form bigger matrices.
"""
220 μs
b
b = [7; 11]
7.5 μs
c
c = [13; 14]
9.2 μs
d
42
d = 42
6.7 μs
M
3×3 Matrix{Float64}:
  1.0   2.0   7.0
  3.0   5.0  11.0
 13.0  14.0  42.0
M = [A b; c' d]
154 ms

The "commandments" described above apply to block matrices as well as they do to ordinary matrices. For example, we can write matrix-vector products with $M$ in terms of the block columns times subvectors.

md"""
The "commandments" described above apply to block matrices as well as they do to ordinary matrices. For example, we can write matrix-vector products with $M$ in terms of the block columns times subvectors.
"""
114 μs
z
z = rand(3)
13.7 μs
[A; c']*z[1:2] + [b; d]*z[3]
55.7 μs
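md"""
For comparison, we can form the product with `M` directly; the block-column version above should give the same vector up to roundoff.
"""
M*z
norm(M*z - ([A; c']*z[1:2] + [b; d]*z[3]))  # should be at roundoff level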

md"""
Alternately, we can write a matrix-vector product in terms of rows.
"""
104 μs
[ [A b]*z ; [c' d]*z ]
31.3 μs

md"""
Julia also frequently does something reasonable if you try to build block matrices involving diagonal matrices (for example):
"""
106 μs
M2
3×3 Matrix{Float64}:
  1.0   0.0   7.0
  0.0   2.0  11.0
 13.0  14.0  42.0
M2 = [Diagonal(x) b; c' d]
105 ms

md"""
Depending on the Julia version (and whether `SparseArrays` is loaded), the result of this kind of concatenation may come back as a *sparse* matrix rather than an ordinary dense one. A sparse matrix uses a different internal data structure than the dense matrices we've seen so far. A dense matrix is a regular array of all the entries, laid out column after column (in languages like Julia). In some matrices, that would include a lot of zeros. But in a sparse matrix data structure, we store only the nonzero values (but also have to keep an indexing structure to remember where they are).

We will frequently see cases where a block matrix includes some blocks with special structure (and other blocks that are ordinary dense matrices). In this case, we may want to work with the pieces of the block matrix separately, rather than packing them into a single standard data structure.
"""
189 μs
M2*z # OK, but involves explicitly forming M2
11.7 μs
[x .* z[1:2] + b*z[3]; c'*z[1:2] + d*z[3]] # Same computation, does something special with the diagonal block.
25.8 μs
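md"""
And once more, a quick check that the structured computation agrees with the ordinary matrix-vector product:
"""
norm(M2*z - [x .* z[1:2] + b*z[3]; c'*z[1:2] + d*z[3]])  # should be at roundoff level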

md"""
## Matrix shapes

We have already talked about one type of matrix that is mostly zeros, but we will see a lot of other such matrices as well. We sometimes talk about the nonzero pattern (or sparsity pattern) in terms of the "shape" of a matrix. Two types of matrix shapes that we'll see a lot in the first part of the class are lower and upper triangular matrices. In fact, Julia has explicit functions for getting the lower and upper triangular parts out of a standard dense matrix.
"""
147 μs
Arand
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.712714  0.912812  0.736406  0.0623196  0.196461
 0.639698  0.754787  0.078397  0.936217   0.617625
 0.206513  0.705258  0.85326   0.313553   0.43446
 0.655685  0.804931  0.739925  0.288234   0.575178
Arand = rand(5,5)
12.5 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.0       0.912812  0.736406  0.0623196  0.196461
 0.0       0.0       0.078397  0.936217   0.617625
 0.0       0.0       0.0       0.313553   0.43446
 0.0       0.0       0.0       0.0        0.575178
triu(Arand)
10.5 μs
5×5 Matrix{Float64}:
 0.402757  0.0       0.0       0.0       0.0
 0.712714  0.912812  0.0       0.0       0.0
 0.639698  0.754787  0.078397  0.0       0.0
 0.206513  0.705258  0.85326   0.313553  0.0
 0.655685  0.804931  0.739925  0.288234  0.575178
tril(Arand)
12.8 μs

md"""
Alternately, we can construct upper and lower triangular *views* on the matrix `Arand`. These can be used in expressions in the same way that `triu(Arand)` or `tril(Arand)` could, but they don't involve any new storage; it's just re-interpreting the `Arand` storage by ignoring part of the matrix as all zeros.
"""
156 μs
5×5 UpperTriangular{Float64, Matrix{Float64}}:
 0.402757  0.889451  0.659938  0.706049   0.782657
  ⋅        0.912812  0.736406  0.0623196  0.196461
  ⋅         ⋅        0.078397  0.936217   0.617625
  ⋅         ⋅         ⋅        0.313553   0.43446
  ⋅         ⋅         ⋅         ⋅         0.575178
UpperTriangular(Arand)
7.7 μs
5×5 LowerTriangular{Float64, Matrix{Float64}}:
 0.402757   ⋅         ⋅         ⋅         ⋅ 
 0.712714  0.912812   ⋅         ⋅         ⋅ 
 0.639698  0.754787  0.078397   ⋅         ⋅ 
 0.206513  0.705258  0.85326   0.313553   ⋅ 
 0.655685  0.804931  0.739925  0.288234  0.575178
LowerTriangular(Arand)
7.0 μs

md"""
For these matrices, we can just look at all the numbers. For bigger matrices, it is sometimes useful to have a visualization that shows us the nonzero structure. A *spy plot* is such a visualization: we put a dot wherever there is a nonzero (colored by value, in this case) and leave blank space for zero entries.
"""
142 μs
spy(triu(rand(100,100)))
3.1 s

md"""
When we get to eigenvalue problems, we'll often see shapes that look like "triangular plus an extra diagonal." These are known as upper or lower *Hessenberg* matrices (depending whether the triangular part is upper or lower). Julia supports routines to pull out the upper/lower Hessenberg part of a matrix, too.
"""
142 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.659938  0.706049   0.782657
 0.712714  0.912812  0.736406  0.0623196  0.196461
 0.0       0.754787  0.078397  0.936217   0.617625
 0.0       0.0       0.85326   0.313553   0.43446
 0.0       0.0       0.0       0.288234   0.575178
triu(Arand,-1)
30.0 μs
5×5 Matrix{Float64}:
 0.402757  0.889451  0.0       0.0       0.0
 0.712714  0.912812  0.736406  0.0       0.0
 0.639698  0.754787  0.078397  0.936217  0.0
 0.206513  0.705258  0.85326   0.313553  0.43446
 0.655685  0.804931  0.739925  0.288234  0.575178
tril(Arand, 1)
11.3 μs
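md"""
In recent versions of Julia, `LinearAlgebra` also provides an `UpperHessenberg` matrix type analogous to `UpperTriangular`; like the triangular views above, it wraps the existing storage rather than copying it.
"""
UpperHessenberg(Arand)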

md"""
The second argument to each of these calls is the number of a diagonal: diagonal zero is the main diagonal, +1 is the first superdiagonal (first diagonal above the main diagonal), -1 is the first subdiagonal (first diagonal below the main diagonal), and so on.
"""
114 μs
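md"""
The same diagonal numbering convention is used by the `diag` function, which extracts a single diagonal of a matrix as a vector; for example:
"""
diag(Arand)      # main diagonal (offset 0)
diag(Arand, 1)   # first superdiagonal
diag(Arand, -1)  # first subdiagonal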