using Polynomials
4 Linear Algebra
We will not get far in this book without a strong foundation in linear algebra. That means knowing how to compute with matrices, but it also means understanding the abstractions of linear algebra well enough to use them effectively. We assume prior familiarity with linear algebra at the level of an undergraduate course taught from Strang (2023) or Lay, Lay, and McDonald (2015). It will be helpful, but not necessary, to have a more advanced perspective as covered in Axler (2024) or even Lax (2007). Even for readers well-versed with linear algebra, we suggest skimming this chapter, as some ideas (quasimatrices, the framework of canonical forms) are not always standard in first (or even second) linear algebra courses.
While our treatment will be fairly abstract, we try to keep things concrete by regularly giving examples in Julia. Readers unfamiliar with Julia may want to review Chapter 2; it is also possible to simply skim past the code examples.
We do not assume a prior course in functional analysis. This will not prevent us from sneaking a little functional analysis into our discussion, or treating some linear algebra concepts from a functional analysis perspective.
4.1 Vector spaces
A vector space (or linear space) over a field
There are many types of vector spaces. Apart from the ubiquitous concrete vector spaces
4.1.1 Polynomials
We compute with vectors in
The Julia Polynomials.jl
package provides a variety of representations of and operations on polynomials.
For example, we can construct a polynomial in standard form from its coefficients in the monomial basis or from its roots, or we can keep it in factored form internally.
Polynomial([2, -3, 1])
fromroots([1, 2])
fromroots(FactoredPolynomial, [1, 2])
The fact that there are several natural representations for polynomials is part of what makes them a good example of an abstract vector space.
4.1.2 Dual spaces
A vector space in isolation is a lonely thing1. Fortunately, every vector space has an associated dual space of linear functionals2, that is
In matrix terms, we usually associate columns with (ordinary) vectors and rows with dual vectors. In Julia, we construct a dual vector by taking the conjugate transpose of a column, e.g.
typeof([1.0; 2.0; 3.0]) # Column vector in R^3
Vector{Float64} (alias for Array{Float64, 1})
typeof([0.0; 1.0; 0.0]') # Row vector in R^3
Adjoint{Float64, Vector{Float64}}
Applying a linear functional is concretely just “row times column”:
let
    v = [1.0; 2.0; 3.0]   # Column vector in R^3
    w = [0.0; 1.0; 0.0]'  # Row vector in R^3
    w*v
end
2.0
In order to try to keep the syntax uniform between concrete vectors and operations on abstract polynomial spaces, we will define a DualVector
type representing a (linear) functional
struct DualVector
    f :: Function
end
We overload the function evaluation and multiplication syntax so that we can write application of a dual vector as w(v) or as w*v.
(w :: DualVector)(v) = w.f(v)
Base.:*(w :: DualVector, v) = w(v)
Because dual vectors still make up a vector space, we would like to be able to add and scale them:
Base.:+(w1 :: DualVector, w2 :: DualVector) =
DualVector(v -> w1(v) + w2(v))
Base.:-(w1 :: DualVector, w2 :: DualVector) =
DualVector(v -> w1(v) - w2(v))
Base.:-(w :: DualVector) =
DualVector(v -> -w(v))
Base.:*(c :: Number, w :: DualVector) =
DualVector(v -> c*w(v))
Base.:*(w :: DualVector, c :: Number) = c*w
Base.:/(w :: DualVector, c :: Number) =
DualVector(v -> w(v)/c)
Evaluating a polynomial at a point is one example of a functional in
let
    p = Polynomial([2.0; -3.0; 1.0])
    w1 = DualVector(p -> p(0.0))
    w2 = DualVector(p -> integrate(p, -1.0, 1.0))
    w1*p, w2*p
end
(2.0, 4.666666666666666)
4.1.3 Subspaces
Given a vector space
There are a few standard operations involving subspaces:
- The span of a subset is the smallest subspace that contains , i.e. the set of all finite linear combinations where each and .
- Given a collection of subspaces , their sum is the subspace consisting of all sums of elements where only finitely many are nonzero. We say the sum is a direct sum if the decomposition into vectors is unique.
- The annihilator for a subspace is the set of linear functionals that are zero on . That is: In inner product spaces, the annihilator of a subspace is identified with the orthogonal complement (Section 4.1.7). A small example appears after this list.
- Given a subspace , the quotient space is a vector space whose elements are equivalence classes under the relation if . We can always find a "complementary" subspace isomorphic to such that each equivalence class in contains a unique representative from . Indeed, we can generally find many such spaces! For any complementary subspace , is a direct sum .
- In the infinite-dimensional setting, the closure of a subspace consists of all limit points in the subspace. This only makes sense in topological vector spaces where we have enough structure that we can take limits. We will always consider the even stronger structure of normed vector spaces.
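As a small illustration of the annihilator, here is a sketch using the DualVector type defined above: the evaluation functional at 0 vanishes on every combination of x and x^2, so it lies in the annihilator of that span inside the quadratics.
let
    w = DualVector(p -> p(0.0))          # evaluation functional at 0
    v1 = Polynomial([0.0, 1.0])          # x
    v2 = Polynomial([0.0, 0.0, 1.0])     # x^2
    # w vanishes on every linear combination of v1 and v2
    all(iszero(w*(c1*v1 + c2*v2)) for (c1, c2) in [(1.0, 0.0), (2.0, -3.0), (0.5, 0.5)])
end
true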
4.1.4 Quasimatrices
Matrix notation is convenient for expressing linear algebra over concrete vector spaces like
4.1.4.1 Column quasimatrices
A (column) quasimatrix is a list of vectors in some space
When
We can represent a quasimatrix with columns in Polynomial
s, e.g.
Udemo = let
    p1 = Polynomial([1.0])
    p2 = Polynomial([-1.0, 1.0])
    p3 = Polynomial([2.0, -3.0, 1.0])
    transpose([p1; p2; p3])
end
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
Polynomial(1.0) Polynomial(-1.0 + 1.0*x) Polynomial(2.0 - 3.0*x + 1.0*x^2)
We can use these together with standard matrix operations involving concrete vectors, e.g.
Udemo*[1.0 1.0; 1.0 1.0; 1.0 0.0]
1×2 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
Polynomial(2.0 - 2.0*x + 1.0*x^2) Polynomial(1.0*x)
We also want to be able to multiply a dual vector by a quasimatrix to get a concrete row vector:
Base.:*(w :: DualVector, U :: AbstractMatrix) = map(w, U)
For example,
DualVector(p->p(0.0)) * Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
1.0 -1.0 2.0
We sometimes want to refer to a subset of the columns. In mathematical writing, we will use notation similar to that in Julia or MATLAB:
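In code, this corresponds to ordinary column indexing of the quasimatrix. For example (a small illustration; the result is a 1-by-2 array of polynomials):
Udemo[:, 2:3]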
4.1.4.2 Row quasimatrices
Every vector space
In Julia, analogous to the column quasimatrices over spaces
Wdemo = DualVector.(
    [p -> p(0.0);
     p -> p(1.0);
     p -> integrate(p, -1.0, 1.0)])
3-element Vector{DualVector}:
DualVector(var"#27#30"())
DualVector(var"#28#31"())
DualVector(var"#29#32"())
4.1.4.3 Duality relations
For a column quasimatrix
# Example: map concrete [1; 1; 1] to a Polynomial
Udemo*[1; 1; 1]
# Example: map DualVector to a concrete row
DualVector(p->p(0))*Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
1.0 -1.0 2.0
Similarly, a row quasimatrix
# Example: map concrete row to an abstract dual vector
typeof([1.0; 1.0; 1.0]'*Wdemo)
DualVector
# Example: map Polynomial to a concrete vector
Wdemo.*Polynomial([2.0, -3.0, 1.0])
3-element Vector{Float64}:
2.0
0.0
4.666666666666666
Composing the actions of a row quasimatrix with a column quasimatrix gives us an ordinary matrix representing a mapping between two abstract vector spaces:
Wdemo*Udemo
3×3 Matrix{Float64}:
1.0 -1.0 2.0
1.0 0.0 0.0
2.0 -2.0 4.66667
Conversely, composing the actions of a column quasimatrix with a row quasimatrix (assuming
UWdemo(p) = Udemo*(Wdemo.*p)
UWdemo (generic function with 1 method)
These two compositions correspond respectively to the inner product and outer product interpretations of matrix-matrix multiplication.
4.1.4.4 Infinite quasimatrices
We may (rarely) discuss “infinite” quasimatrices with a countable number of columns. We will be analysts about it instead of algebraists. In the case of column quasimatrices:
- We assume the columns lie in some complete vector space.
- When we take a linear combination , we assume the sequence is restricted to make sense of the limit. Usually, this will mean is bounded ( ), absolutely summable ( ), or absolutely square summable ( ).
- We take the "range" of such a quasimatrix to be the set of allowed limits of linear combinations.
We can similarly define infinite versions of row-wise quasimatrices.
4.1.5 Bases
A basis set for a finite-dimensional vector space
Thinking of dual spaces as consisting of rows, we generally represent the basis of
If
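As a concrete sketch using the quasimatrices Udemo and Wdemo from the previous section (and assuming, as holds here, that the matrix Wdemo*Udemo is invertible), we can compute the coefficients of a polynomial with respect to the basis formed by the columns of Udemo by applying the functionals in Wdemo and solving a small linear system:
let
    p = Polynomial([2.0, -3.0, 1.0])
    c = (Wdemo*Udemo) \ (Wdemo.*p)   # coefficients of p in the Udemo basis
    Udemo*c ≈ p
end
true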
4.1.5.1 Component conventions
We will generally follow the convention common in many areas of numerical analysis: a vector
An alternate convention that addresses some of these issues is frequently used in areas of physics where tensors are particularly common. In this convention (“Einstein notation”), vectors with respect to a basis
The convention of using subscripts to denote vector coefficients (contravariant coefficients) and superscripts to denote dual coefficients (covariant coefficients) gives us a mechanism for “type checking” expressions: summation over a repeated index should involve one superscript (a covariant index) and one subscript (a contravariant index). At the same time, many authors who follow the summation convention only use subscript indices in order to save the superscript space for other purposes. Caveat lector!4
Indices are useful for matching functions (dual vectors) and arguments (vectors) without explicit reference to position in an expression. At the same time, indicial notation presumes a basis, with particularly basis-centric authors sometimes writing statements like “a vector is an array that transforms like a vector.” While we usually use bases in our computation, we prefer a basis-independent perspective for mathematical arguments when possible. With this said, we will revert to indicial notation when the situation merits it.
4.1.5.2 Polynomial bases
As an example of different choices of bases, we consider the polynomial spaces
let
    p = Polynomial([1.0; 2.0; 3.0; 4.0; 5.0])
    X = range(-1.0, 1.0, length=5)
    V = [xi^j for xi in X, j in 0:4]
    c = V\p.(X)
end
5-element Vector{Float64}:
1.0
2.0
3.0
4.0
5.0
That is, if
The power basis is not the only basis for
The related second kind Chebyshev polynomial basis
In matrix terms, we can write the relationship between the Chebyshev polynomials and the power basis as
function chebyshev_power_coeffs(d, kind=1)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1                          # p0(x) = 1
    M[2,2] = kind                       # p1(x) = kind*x
    for j = 2:d                         # p_{j+1}(x) =
        M[2:end,j+1] .= 2*M[1:end-1,j]  #   2x p_j(x)
        M[:,j+1] .-= M[:,j-1]           #   - p_{j-1}(x)
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
chebyshev_power_coeffs (generic function with 2 methods)
For example, for the quadratic case, we have
chebyshev_power_coeffs(2)
3×3 UpperTriangular{Float64, Matrix{Float64}}:
1.0 0.0 -1.0
⋅ 1.0 0.0
⋅ ⋅ 2.0
and the coefficient vectors
chebyshev_power_coeffs(2) \ [2.0; -3.0; 1.0]
3-element Vector{Float64}:
2.5
-3.0
0.5
If we would like to form a representation of the Chebyshev basis, we could use the power basis expansion
function chebyshev_basis(d, kind=1)
    M = chebyshev_power_coeffs(d, kind)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end
chebyshev_basis(2)
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
Polynomial(1.0) Polynomial(1.0*x) Polynomial(-1.0 + 2.0*x^2)
However, the Chebyshev polynomials are natively supported in the Polynomials.jl
package, and we can explicitly write polynomials in terms of Chebyshev polynomials. For example,
ChebyshevT([2.5, -3.0, 0.5])
convert(Polynomial, ChebyshevT([2.5, -3.0, 0.5]))
The coefficients satisfy exactly the equations from above.
The bases
function change_U_to_T(d)
    B = Matrix(0.5*I, d+1, d+1)
    B[1,1] = 1.0
    for j = 3:d+1
        B[j-2,j] = -0.5
    end
    B
end
change_U_to_T (generic function with 1 method)
That is, we should have
chebyshev_basis(5) ==
chebyshev_basis(5,2) * change_U_to_T(5)
true
In addition to the Chebyshev polynomials, we will sometimes want the Legendre polynomial basis
function legendre_power_coeffs(d)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1
    M[2,2] = 1
    for j = 2:d
        M[2:end,j+1] .= (2*j-1)*M[1:end-1,j]
        M[:,j+1] .-= (j-1)*M[:,j-1]
        M[:,j+1] ./= j
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
legendre_power_coeffs (generic function with 1 method)
If we want a representation in terms of a list of Polynomial
objects (representing the basis polynomials), we can simply convert the columns:
function legendre_basis(d)
    M = legendre_power_coeffs(d)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end
legendre_basis(2)
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
Polynomial(1.0) Polynomial(1.0*x) Polynomial(-0.5 + 1.5*x^2)
Each basis we have discussed has a different use case. Sometimes the “obvious” choice of basis (e.g. the standard basis in
4.1.5.3 Block bases
Consider the direct sum
4.1.5.4 Infinite bases
The notion of a basis can be generalized to the case of an infinite-dimensional space
4.1.6 Vector norms
A norm measures the length of a vector. In Julia, we can compute several standard norms with the norm function:
let
    x = [3.0; 4.0]
    norm(x),      # Euclidean norm (also written norm(x,2))
    norm(x,Inf),  # Max norm
    norm(x,1)     # 1-norm
end
(5.0, 4.0, 7.0)
Many other norms can be related to one of these three norms. In particular, a "natural" norm in an abstract vector space will often look strange in the corresponding concrete representation with respect to some basis. For example, consider the vector space of polynomials with degree at most 2 on [-1, 1]. The corresponding L2, max, and L1 norms are not provided as functions by the Polynomials package, but they are not difficult to compute for real polynomials:
real_roots(p) = sort(real(filter(isreal, roots(p))))
real_roots(p, a, b) = filter(x -> (a <= x <= b), real_roots(p))
normL2(p) = sqrt(integrate(p*p, -1, 1))
function normL∞(p)
    nodes = [-1.0; real_roots(derivative(p), -1, 1); 1.0]
    maximum(abs.(p.(nodes)))
end
function normL1(p)
    nodes = [-1.0; real_roots(p, -1, 1); 1.0]
    sum(abs(integrate(p, nodes[j], nodes[j+1]))
        for j = 1:length(nodes)-1)
end
let
    p = Polynomial([-0.5, 0.0, 1.0])
    normL2(p),
    normL∞(p),
    normL1(p)
end
(0.483045891539648, 0.5, 0.60947570824873)
The same norms can also be approximated from samples using the midpoint rule (see the chapter on quadrature and differentiation):
let
    n = 100
    p = Polynomial([-0.5, 0.0, 1.0])
    X = range(-1.0, 1.0, length=n+1)
    X = (X[1:end-1] + X[2:end])/2
    pX = p.(X)
    norm(pX)*sqrt(2/n),
    norm(pX, Inf),
    norm(pX, 1)*2/n
end
(0.48297688971626773, 0.4999, 0.6093599999999999)
Of course, when we write
In a finite-dimensional vector space, all norms are equivalent: that is, if
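As a small numerical illustration (not a proof) on the concrete space R^3: the max norm, Euclidean norm, and 1-norm bound each other with constants that depend only on the dimension (here 3).
let
    x = randn(3)
    norm(x, Inf) <= norm(x) <= norm(x, 1) <= 3*norm(x, Inf)
end
true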
A normed vector space is a metric space, and so we can use it to define things like Cauchy sequences and limits. In a finite-dimensional normed space, any Cauchy sequence
For any Banach space
4.1.7 Inner products
An inner product
4.1.7.1 The Riesz map
The inner product defines an (anti)linear map (the Riesz map) from vectors
riesz(dot) = v -> DualVector(u -> dot(u, v))
According to the Riesz representation theorem, in finite-dimensional spaces, this defines a 1-1 correspondence between vectors and dual vectors. For
In infinite-dimensional inner product spaces, we have a 1-1 correspondence between vectors and continuous dual vectors (which is all dual vectors for a complete inner product space). A (not-necessarily finite-dimensional) Euclidean space that is complete under the Euclidean norm is called a Hilbert space.
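As a quick sketch with the ordinary dot function from LinearAlgebra on R^3 (the same pattern works for any inner product passed to riesz):
let
    w = riesz(dot)([0.0, 1.0, 0.0])   # the dual vector u -> dot(u, e2)
    w*[1.0, 2.0, 3.0]
end
2.0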
4.1.7.2 The Gram matrix
Suppose
gram(dot, V) = Symmetric([dot(vj, vi) for vi in V[:], vj in V[:]])
The Gram matrix is Hermitian and (because
We can also write the Riesz map in terms of
4.1.7.3 Euclidean norms
Every inner product defines a corresponding norm
4.1.7.4 Angles and orthogonality
The inner product and the associated norm satisfy the Cauchy-Schwarz inequality
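A quick numerical sanity check of the Cauchy-Schwarz inequality for the standard dot product on R^5 (a sketch; any inner product would do):
let
    x = randn(5)
    y = randn(5)
    abs(dot(x, y)) <= norm(x)*norm(y)
end
true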
If
Vectors that are mutually orthogonal and have unit length in the Euclidean norm are said to be orthonormal. A basis
4.1.7.5 Einstein notation
For physicists working with Einstein notation on real vector spaces, the Gram matrix (aka the metric tensor) is sometimes written in terms of the components
4.1.7.6 The inner product
Returning to our example of a vector space of polynomials, the standard
dotL2(p, q) = integrate(p*conj(q), -1, 1)
The Gram matrix for this inner product with the power basis (using zero-based indexing) is
gram_L2_power(d) = [(i+j)%2 == 0 ? 2.0/(i+j+1) : 0.0
                    for i=0:d, j=0:d]
For example, for quadratics we have
gram_L2_power(2)
3×3 Matrix{Float64}:
2.0 0.0 0.666667
0.0 0.666667 0.0
0.666667 0.0 0.4
Hence we can compute
let
    p = Polynomial([1.0; 1.0; 1.0])
    q = Polynomial([2.0; -3.0; 1.0])
    dotL2(p, q)
end
4.4
or via
[1.0; 1.0; 1.0]'*gram_L2_power(2)*[2.0; -3.0; 1.0]
4.3999999999999995
For the Chebyshev basis, the Gram matrix has entries
function gram_L2_chebyshev(d)
    tint(j) = j%2 == 0 ? 2.0/(1-j^2) : 0.0
    m(i,j) = (tint(i+j) + tint(abs(i-j)))/2.0
    [m(i,j) for i=0:d, j=0:d]
end
gram_L2_chebyshev (generic function with 1 method)
For example, the Gram matrix for
gram_L2_chebyshev(4)
5×5 Matrix{Float64}:
2.0 0.0 -0.666667 0.0 -0.133333
0.0 0.666667 0.0 -0.4 0.0
-0.666667 0.0 0.933333 0.0 -0.361905
0.0 -0.4 0.0 0.971429 0.0
-0.133333 0.0 -0.361905 0.0 0.984127
We can sanity check our integration by comparing to another routine:
gram_L2_chebyshev(4) ≈ gram(dotL2, chebyshev_basis(4)[:])
true
For the Legendre basis, the Gram matrix is
gram(dotL2, legendre_basis(4)[:])
5×5 Symmetric{Float64, Matrix{Float64}}:
2.0 0.0 0.0 0.0 0.0
0.0 0.666667 0.0 0.0 0.0
0.0 0.0 0.4 0.0 0.0
0.0 0.0 0.0 0.285714 0.0
0.0 0.0 0.0 0.0 0.222222
That is, the Gram matrix for the Legendre basis is diagonal, with diagonal entries
normalized_legendre_basis(d) =
    transpose([sqrt((2j+1)/2)*Pj
               for (Pj,j) = zip(legendre_basis(d), 0:d)])
normalized_legendre_basis (generic function with 1 method)
and these form an orthonormal basis for the
let
    P̄ = normalized_legendre_basis(3)
    gram(dotL2, P̄)
end
4×4 Symmetric{Float64, Matrix{Float64}}:
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
Using an orthonormal basis
function dualL2_δx(x, d)
    P̄ = normalized_legendre_basis(d)
    c = [P̄j(x) for P̄j in P̄]'
    P̄*c[:]
end
dualL2_δx (generic function with 1 method)
For example, evaluating
let
    p = Polynomial([2.0; -3.0; 1.0])
    δhalf = dualL2_δx(0.5, 2)
    p(0.5) ≈ dotL2(p, δhalf)
end
true
4.2 Maps and forms
A central object in linear algebra is a linear map between two spaces. If
Closely associated with linear maps between two vector spaces are certain maps from two vector spaces. A bilinear form is a map
On inner product spaces, each sesquilinear form
A linear map from a space to itself is called a linear operator. Linear operators have enough additional structure that we will deal with them separately.
4.2.1 Dual and adjoint maps
Any linear map
4.2.2 Matrices
4.2.2.1 Matrices for maps
The usual representation of a linear map
More generally, if
If we consider the case where
4.2.2.2 Matrices for sesquilinear forms
Suppose
We note that the matrix representation of the sesquilinear form is not identical to the matrix representation of
4.2.2.3 Representing derivatives
As a concrete example of matrix representations, we consider the derivative operator
function derivative_power(d)
    A = zeros(d,d+1)
    for i=1:d
        A[i,i+1] = i
    end
    A
end
derivative_power (generic function with 1 method)
Alternately, we can write
A similar relationship holds for the first and second kind Chebyshev polynomials:
derivative_chebyshevT(d) =
change_U_to_T(d-1)\derivative_power(d)
derivative_chebyshevT (generic function with 1 method)
It is always useful to compare against another implementation of the same computation, e.g.
let
    Dmatrix = zeros(5,6)
    DT = derivative.(chebyshev_basis(5))
    for j = 1:length(DT)
        for ci = collect(pairs(convert(ChebyshevT, DT[j])))
            Dmatrix[ci[1]+1,j] = ci[2]
        end
    end
    Dmatrix ≈ derivative_chebyshevT(5)
end
true
Now consider a sesquilinear form on
derivative_chebyshevT_bilinear(d) =
gram_L2_chebyshev(d-1)*derivative_chebyshevT(d)
derivative_chebyshevT_bilinear (generic function with 1 method)
As before, we can sanity check against an alternate computation
let
    T = chebyshev_basis(5)
    a(p, q) = dotL2(derivative(p), q)
    derivative_chebyshevT_bilinear(5) ≈
        [a(T[j+1], T[i+1]) for i=0:4, j=0:5]
end
true
With respect to the Legendre basis, one can differentiate the Bonnet recurrence relation for
function derivative_legendre(d)
    DP = zeros(d,d+1)
    if d > 0
        DP[1,2] = 1
    end
    for j = 1:d-1
        DP[:,j+2] = DP[:,j]
        DP[j+1,j+2] = 2j+1
    end
    DP
end
derivative_legendre (generic function with 1 method)
As usual, we sanity check our computation:
derivative.(legendre_basis(3)) ==
legendre_basis(2)*derivative_legendre(3)
true
If we want the derivative for the normalized Legendre polynomials, it is convenient to write them as
derivative_nlegendre(d) =
sqrt.(2.0./(2*(0:d-1).+1)) .*
derivative_legendre(d) .*
sqrt.((2*(0:d).+1)/2)'
derivative_nlegendre (generic function with 1 method)
Unlike some of our earlier computations, there is a little difference between different approaches to computing the normalized Legendre polynomials, so we check whether the results are close rather than exactly the same:
let
    err = normalized_legendre_basis(2)*derivative_nlegendre(3) -
          derivative.(normalized_legendre_basis(3))
    all(iszero.(truncate.(err, atol=1e-12)))
end
true
4.2.3 Block matrices
Sometimes we are interested in
4.2.4 Canonical forms
A canonical form for a map
In general, we will consider two types of canonical forms: those associated with general choices of bases and those associated with orthonormal bases (when
4.2.4.1 Block decomposition
The first step to the canonical forms for a general linear map is to decompose
With respect to such a decomposition, we have the block quasimatrix
The block decomposition of a map is closely connected to the Fredholm alternative theorem. Consider the equations
- Consistent (
): There is a solution to the system. In fact, if is any solution (a particular solution), then the set of all possible solutions is (more generally, an -dimensional set of solutions).
): There is not a solution to the system ( ), but there does exist such that and .
Without the decomposition of
The decomposition of a map based on range and cokernel (for
4.2.4.2 General bases
Let
4.2.4.3 Orthonormal bases
In a general vector space, we can use any complementary subspaces to the range and null space of
Our canonical form in this setting involves a special choice for the bases
4.2.4.4 Decomposed derivatives
We return again to our example of the derivative mapping from
More interesting is the singular value decomposition in the
The singular values of the derivative mapping from
let
    dmax = 10

    # Compute the singular values for each degree d
    σs = zeros(dmax,dmax)
    for d = 1:dmax
        F = svd(derivative_nlegendre(d))
        σs[1:d,d] .= F.S[end:-1:1]
    end
    # Draw one line for the ith smallest singular value as a function of d
    p = plot(1:dmax, σs[1,:], legend=:false, marker=:star, xlabel="d")
    for i = 2:dmax
        plot!(i:dmax, σs[i,i:dmax], marker=:star)
    end
    p
end
As we increase
As an example, consider the smallest singular value and the associated basis vectors for the derivative acting on
let
    # Compute the SVD of D acting on P_10
    d = 10
    F = svd(derivative_nlegendre(d))
    U = normalized_legendre_basis(d-1)*F.U
    V = normalized_legendre_basis(d)*F.V
    σs = F.S

    # Get the singular triple associated with
    # the smallest (nonzero) singular value
    σ, u, v = σs[end], U[end], V[end]

    # Check various relations (use a mesh of 100 points)
    xx = range(-1,1,length=100)
    s = sign(u(0))
    println("""
    - |D*u-v*σ| = $(norm((derivative(v)-u*σ).(xx), Inf))
    - |sin(π/2*x)-v| = $(norm(s*sin.(π/2*xx)-v.(xx), Inf))
    - |cos(π/2*x)-u| = $(norm(s*cos.(π/2*xx)-u.(xx), Inf))
    - |σmin-π/2| = $(abs(σ-π/2))
    """)
end
- |D*u-v*σ| = 5.076326047387471e-14
- |sin(π/2*x)-v| = 4.919919749379886e-9
- |cos(π/2*x)-u| = 1.3086657329447118e-7
- |σmin-π/2| = 3.552713678800501e-15
In general, the
4.2.5 (Pseudo)inverses
Suppose
The pseudoinverse depends on the decompositions
As an example, the Moore-Penrose pseudoinverse of the derivative operator from
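As a numerical sketch, we can form the matrix of the derivative with respect to the normalized Legendre bases (derivative_nlegendre from earlier), apply the pinv routine from LinearAlgebra, and check the four Moore-Penrose conditions:
let
    D = derivative_nlegendre(5)
    Dp = pinv(D)
    D*Dp*D ≈ D, Dp*D*Dp ≈ Dp, (D*Dp)' ≈ D*Dp, (Dp*D)' ≈ Dp*D
end
(true, true, true, true)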
4.2.6 Norms
The space
Suppose
For inner product spaces, we can also define norms in terms of the singular value decomposition. Some examples include
- The spectral norm (the largest of the singular values)
- The Frobenius norm (the two-norm of the singular values)
- The nuclear norm (the sum of the singular values)
All of these are also examples of Ky Fan norms (the Ky Fan
For concrete vector spaces, we can write the Frobenius norm as
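As a small check (a sketch using the derivative matrix from earlier), the built-in opnorm and norm routines for matrices agree with the corresponding functions of the singular values; the nuclear norm has no built-in, but it is just the sum of the singular values:
let
    A = derivative_nlegendre(5)
    σ = svdvals(A)
    nuclear = sum(σ)            # nuclear norm (no built-in needed)
    opnorm(A) ≈ σ[1], norm(A) ≈ norm(σ)
end
(true, true)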
4.2.7 Isometries
A linear map that preserves norms is called an isometry. Isometries are always one-to-one, since any null vector must have zero norm. If an isometry is also onto, it is sometimes called a global isometry. Global isometries are invertible, and the inverse is also an isometry.
For general norms, there are not always many interesting isometries. For example, the only isometries from
Isometries between inner product spaces necessarily preserve inner products, since the inner product can be recovered from the associated norm. For global isometries between inner product spaces, the adjoint and the inverse are the same. A global isometry from a space to itself is called an orthogonal operator in the real case, or a unitary operator in the complex case.
A unitary operator maps an orthonormal basis to another orthonormal basis. Therefore, if
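As a sketch of an isometry into an abstract space: the map sending a coefficient vector in R^4 (with the Euclidean norm) to the corresponding combination of normalized Legendre polynomials (with the L2 norm on [-1,1]) preserves norms, since the basis is orthonormal.
let
    P̄ = normalized_legendre_basis(3)
    c = randn(4)
    normL2(P̄*c) ≈ norm(c)
end
true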
4.2.8 Volume scaling
In an inner product space, we usually say that a parallelepiped associated with an orthonormal basis (i.e. a unit cube) has unit volume. A unitary operator between two spaces maps a unit cube in one space to a unit cube in another space, i.e. it preserves volume. More generally, if
- A volume-preserving transformation from into (or ).
- A diagonal scaling operation on that changes each side length from to . The volume of the scaled parallelepiped in is then .
- A volume-preserving transformation from into .
Therefore, if
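A small numerical illustration: the volume scaling factor of a map from R^2 into R^3 is the product of its singular values, which agrees with the square root of the Gram determinant.
let
    A = randn(3,2)                   # a map from R^2 into R^3
    prod(svdvals(A)) ≈ sqrt(det(A'*A))
end
true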
4.3 Operators
An operator is a map
Most of our discussion of operators focuses on three connected ideas:
- A key feature of an operator is its invariant subspaces, i.e. subspaces such that lies within . Given an invariant subspace, we can analyze the associated restriction of to . One-dimensional invariant subspaces play a particularly notable role, as they contain the eigenvectors of ; the restrictions are the associated eigenvalues.
- We can sensibly talk about polynomials in : if , then . Such polynomials are useful for designing and analyzing numerical methods (e.g. Krylov subspace methods for linear systems and eigenvalue problems). They are also useful as a more purely theoretical tool.
In Chapter 5, we will also discuss an alternative approach to analyzing operators by applying the tools of complex analysis to the resolvent
4.3.1 Minimal polynomial
If
If
We can write the minimal polynomial in factored form (over
The
If we let
The characteristic polynomial is often also written as
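As a quick numerical sketch (using the standard fact that the characteristic polynomial annihilates the matrix, so the minimal polynomial divides it), we build the characteristic polynomial of a small matrix from its eigenvalues and check that it evaluates to zero at the matrix:
let
    A = [0.0 1.0; -2.0 3.0]          # eigenvalues 1 and 2
    χ = fromroots(eigvals(A))        # monic characteristic polynomial
    c = coeffs(χ)                    # c[1] + c[2]*λ + c[3]*λ^2
    norm(c[1]*I + c[2]*A + c[3]*A^2) < 1e-12
end
true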
4.3.2 Canonical forms
As in the case of maps between different spaces, we are interested in canonical matrix representations for operators
4.3.2.1 Jordan form
As we noted above, the space
When the minimal polynomial exponent
When the minimal polynomial has zeros with multiplicity greater than one, the matrix is not diagonalizable. In this case, we cannot form a basis of eigenvectors, but we can form a basis if we include generalized eigenvectors. In the Jordan canonical form, we find bases consisting of Jordan chains, i.e. vectors satisfying
The Jordan form is fragile: almost all operators are diagonalizable and have distinct eigenvalues, and infinitesimally small changes will usually completely destroy any nontrivial Jordan blocks. Moreover, diagonalizable operators that are close to operators with nontrivial Jordan blocks may have eigenvector bases, but those eigenvector bases are not all that nice. We therefore would like an alternate canonical form that gives up some of the goal of being “as diagonal as possible.”
4.3.2.2 Schur form
In our discussion of the Jordan form, we started with the decomposition of
The complex Schur form may in general involve complex choices of the basis
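In Julia, the schur routine from LinearAlgebra computes a Schur factorization; a quick sketch checking the complex Schur form of a random matrix:
let
    A = randn(5,5)
    F = schur(complex(A))            # complex Schur form: A = Z*T*Z'
    norm(tril(F.T, -1)) < 1e-12, F.Z*F.T*F.Z' ≈ A
end
(true, true)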
4.3.3 Similar matrices
So far, we have focused on matrices as representations of operators on abstract vector spaces, where we choose one basis
The properties we have discussed so far (eigenvalues, the minimal and characteristic polynomial, the Jordan form) are fundamentally properties of operators, and do not depend on the basis in which those operators are expressed. As much as possible, we try to make this clear by giving basis-free explanations when possible. But it is also possible to give matrix-based arguments in which we argue for invariance under similarity transformations. For example, if
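For instance, here is a small check (a sketch with a fixed invertible change of basis) that the eigenvalues do not change under a similarity transformation:
let
    A = [1.0 2.0; 0.0 3.0]           # eigenvalues 1 and 3
    X = [1.0 1.0; 0.0 1.0]           # an invertible change of basis
    B = X\(A*X)                      # a similar matrix
    sort(eigvals(B)) ≈ sort(eigvals(A))
end
true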
4.3.4 Trace
We typically describe the trace in terms of a matrix representation: the trace of a matrix is the sum of its diagonal entries, i.e.
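A quick numerical check (a sketch) that the trace is invariant under a change of basis and equals the sum of the eigenvalues:
let
    A = randn(4,4)
    X = randn(4,4)                   # invertible with probability 1
    tr(X\(A*X)) ≈ tr(A), tr(A) ≈ sum(eigvals(A))
end
(true, true)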
4.3.5 Frobenius inner product
The trace is also useful for defining other structures on spaces of linear maps in a basis-free way. Let
More generally, the trace can be used to define the Frobenius inner product on the space
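For concrete matrices, the dot function from LinearAlgebra computes exactly this entrywise inner product, and we can check the trace formula directly:
let
    A = randn(3,3)
    B = randn(3,3)
    dot(A, B) ≈ tr(A'*B)
end
true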
4.3.6 Hermitian and skew
If
The operators
Any operator
There is an even stronger connection between the Hermitian / skew-Hermitian decomposition and the complex numbers for the case of
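A sketch of the decomposition for a random complex matrix: the Hermitian part is Hermitian, the parts sum back to the original, and the associated quadratic forms are (up to roundoff) real and purely imaginary, respectively.
let
    A = randn(4,4) + im*randn(4,4)
    H = (A + A')/2                   # Hermitian part
    S = (A - A')/2                   # skew-Hermitian part
    x = randn(4) + im*randn(4)
    ishermitian(H), A ≈ H + S,
    abs(imag(x'*H*x)) < 1e-12, abs(real(x'*S*x)) < 1e-12
end
(true, true, true, true)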
4.4 Hermitian and quadratic forms
A sesquilinear form
If
If
A quadratic form is a function
4.4.1 Matrices
Given a basis
A quadratic form
Let
4.4.2 Canonical forms
In the case of a general basis, there exists a basis such that the matrix representation of a Hermitian form is
- Positive definite if
- Positive semidefinite if
- Negative definite if
- Negative semidefinite if
- Strongly indefinite if
and .
A Hermitian positive definite form is an inner product.
If
The Hermitian eigenvalue problem has an enormous amount of structure that comes from the fact that it can be interpreted either in terms of a Hermitian form (or the associated quadratic form) or in terms of an operator.
4.4.3 A derivative example
As before, it is useful to consider a specific example that does not involve a concrete vector space. We will use the real symmetric form on
let
    # Implement the L map
    d = 2
    δp1 = dualL2_δx( 1.0, d)
    δn1 = dualL2_δx(-1.0, d)
    function L(p)
        Dp = derivative(p)
        D2p = derivative(Dp)
        δp1*Dp(1) - δn1*Dp(-1) - D2p
    end

    # Test polynomials p and q
    p = Polynomial([2.0; -3.0; 1.0])
    q = Polynomial([1.0; 1.0; 1.0])

    # Test agreement with the form and self-adjointness
    dotL2(derivative(p), derivative(q)) ≈ dotL2(L(p), q),
    dotL2(L(p), q) ≈ dotL2(p, L(q))
end
(true, true)
The form is positive semidefinite, since
let
    d = 3

    # Compute the matrix in terms of the normalized Legendre polynomials
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    # Compute the eigendecomposition
    F = eigen(AP̄)
    U = P̄*F.vectors

    # Check orthonormality of the U basis,
    # and that the associated matrix is Λ
    gram(dotL2, U) ≈ I,
    gram(dotL2, derivative.(U)) ≈ Diagonal(F.values)
end
(true, true)
Similar to what we saw in our discussion of the singular value decomposition, the
let
    d = 15
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    p = plot()
    for j=2:5
        λs = [eigvals(AP̄[1:n,1:n])[j] for n=j:d+1]
        plot!(j:d+1, abs.(λs.-((j-1)*π/2)^2),
              yscale=:log10, label="λ[$j]")
    end
    p
end
4.5 Tensors and determinants
So far, we have dealt with linear, bilinear, sesquilinear, and quadratic functions on vector spaces. Going beyond this to functions that are linear in several arguments takes us into the land of tensors.
4.5.1 Takes on tensors
There are several ways to approach tensors. In much of numerical analysis, only tensors of concrete spaces are considered, and tensors are treated as multi-dimensional arrays of numbers (with matrices as a special case where two dimensions are used for indexing). For example, a tensor with three indices is represented by an element
Here, though, we will focus on tensors as basic objects of multilinear algebra. There are still several ways to approach this:
- We can consider tensors purely in terms of bases, along with change-of-basis rules. This view, common in some corners of physics, says that “a tensor is an object that transforms like a tensor,” a perspective similar to “a vector is an object that transforms like a vector” that we mentioned earlier. We will prefer a discussion that gives changes of basis as a consequence of a more fundamental definition, rather than as the definition itself.
- We can consider tensors from an algebraic perspective as a quotient space: the tensor product space is a vector space spanned by elements of the Cartesian product , modulo the equivalence relations . We write to denote the equivalence class associated with . While this definition has the advantage of being very general (it can generalize to structures other than vector spaces) and it does not depend on a basis, it can be a bit overly abstract.
- We will mostly consider tensors as multilinear forms, generalizing the notion of bilinear forms discussed above. This has the advantage of extending our discussion of bilinear and sesquilinear forms from earlier, remaining abstracted away from a specific basis while not too abstract.
4.5.2 Multilinear forms
Let
More generally, let
The vector space of all multilinear forms
In terms of the partial evaluation operation discussed above, we might consider
When a real inner product space
It is tempting to ask if the coefficients might be made simple by a special choice of the bases for the component spaces, analogous to the canonical forms we discussed before. Unfortunately, tensors beyond bilinear forms lack any nice canonical representation.
4.5.3 Polynomial products
As an example of tensor product spaces in action, we consider the space of polynomials in three variables with maximum degree per variable of
For concrete tensors, contractions can also be used to express operations like changing basis. For example, if
let
    d = 2
    C = rand(d+1,d+1,d+1)   # Power coeffs

    # Transform to the Chebyshev coefficients
    D = copy(C)
    M = chebyshev_power_coeffs(d)   # A = inv(M)

    # Contract with a_li
    for j=1:d+1
        for k = 1:d+1
            D[:,j,k] = M\D[:,j,k]
        end
    end

    # Contract with a_mj
    for i=1:d+1
        for k=1:d+1
            D[i,:,k] = M\D[i,:,k]
        end
    end

    # Contract with a_nk
    for i=1:d+1
        for j=1:d+1
            D[i,j,:] = M\D[i,j,:]
        end
    end

    # Check correctness by comparing evaluations at a random point
    T = chebyshev_basis(d)
    x,y,z = rand(3)
    p1 = 0.0
    p2 = 0.0
    for i=1:d+1
        for j=1:d+1
            for k=1:d+1
                p1 += C[i,j,k] * x^(i-1) * y^(j-1) * z^(k-1)
                p2 += D[i,j,k] * T[i](x) * T[j](y) * T[k](z)
            end
        end
    end
    p1 ≈ p2
end
true
4.5.4 Alternating forms
An alternating form is a multilinear form on several copies of the same vector space
The wedge product of two tensors over copies of the same space is
If
4.5.5 Determinants
Let
If
An alternating form on the concrete space
- (written in terms of matrices )
- Equal to one for the identity matrix.
The interpretation of the determinant as representing a scaling of (signed) volumes explains various other properties, such as the important fact that
The determinant of a singular matrix must be zero, which explains why we can write the characteristic polynomial for a matrix as
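A brief numerical check (a sketch) of two standard facts: the determinant is multiplicative, and its absolute value is the product of the singular values (the volume scaling factor).
let
    A = randn(3,3)
    B = randn(3,3)
    det(A*B) ≈ det(A)*det(B), abs(det(A)) ≈ prod(svdvals(A))
end
(true, true)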
“Vector spaces are like potato chips: you cannot have just one.” - W. Kahan↩︎
In infinite-dimensional settings, we will restrict our attention to continuous linear functionals (the continuous dual space) rather than all linear functionals (the algebraic dual space).↩︎
The set containing only the zero vector is the smallest subspace we ever talk about.↩︎
“Caveat lector”: let the reader beware!↩︎
Pafnuty Chebyshev was a nineteenth century Russian mathematician, and his name has been transliterated from the Cyrillic alphabet into the Latin alphabet in several different ways. We inherit our usual spelling from one of the French transliterations, but the symbol
for the polynomials comes from the German transliteration Tschebyscheff.↩︎Some authors write
for the set of bounded linear maps between two spaces. This distinction matters when we are dealing with infinite-dimensional normed vector spaces — since we are mostly concerned with the finite-dimensional case, we will not worry about it.↩︎In the absence of an inner product, the coimage is most naturally thought of as a subspace of
and the cokernel as a subspace of . The Riesz map allows us to move these back to subspaces of and .↩︎The notion of an induced norm makes sense in infinite-dimensional spaces as well, but there the supremum may not be achieved.↩︎
A monic polynomial is one in which the coefficient in front of the highest-degree term is 1.↩︎
There is a canonical injection
via . In finite-dimensional spaces, this is alway also surjective; in infinite-dimensional spaces, it is only surjective when the space is reflexive.↩︎