4  Linear Algebra

We will not get far in this book without a strong foundation in linear algebra. That means knowing how to compute with matrices, but it also means understanding the abstractions of linear algebra well enough to use them effectively. We assume prior familiarity with linear algebra at the level of an undergraduate course taught from Strang () or Lay, Lay, and McDonald (). It will be helpful, but not necessary, to have a more advanced perspective as covered in Axler () or even Lax (). Even for readers well-versed with linear algebra, we suggest skimming this chapter, as some ideas (quasimatrices, the framework of canonical forms) are not always standard in first (or even second) linear algebra courses.

While our treatment will be fairly abstract, we try to keep things concrete by regularly giving examples in Julia. Readers unfamiliar with Julia may want to review ; it is also possible to simply skim past the code examples.

We do not assume a prior course in functional analysis. This will not prevent us from sneaking a little functional analysis into our discussion, or treating some linear algebra concepts from a functional analysis perspective.

4.1 Vector spaces

A vector space (or linear space) over a field F is a set with elements (vectors) that can be added or scaled in a sensible way – that is, addition is associative and commutative and scaling (by elements of the field) is distributive. We will always take F to be R or C. We generally denote vector spaces by script letters (e.g. V,W), vectors by lower case Roman letters (e.g. v,w), and scalars by lower case Greek letters (e.g. α,β). But we feel free to violate these conventions according to the dictates of our conscience or in deference to other conflicting conventions.

There are many types of vector spaces. Apart from the ubiquitous concrete vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$, the most common vector spaces in applied mathematics are different types of function spaces. These include
$$\begin{aligned}
\mathcal{P}_d &= \{\text{polynomials of degree at most } d\}; \\
L(\mathcal{V},\mathcal{W}) &= \{\text{linear maps } \mathcal{V} \rightarrow \mathcal{W}\}; \\
C^k(\Omega) &= \{k\text{-times differentiable functions on a set } \Omega\}; \\
\ell^1 &= \{\text{absolutely summable sequences } x_1, x_2, \ldots\}; \\
\ell^2 &= \{\text{absolutely square-summable sequences } x_1, x_2, \ldots\}; \\
\ell^\infty &= \{\text{bounded sequences } x_1, x_2, \ldots\};
\end{aligned}$$
and many more. A finite-dimensional vector space can be put into 1-1 correspondence with some concrete vector space $\mathbb{F}^n$ (using a basis). In this case, $n$ is the dimension of the space. Vector spaces that are not finite-dimensional are infinite-dimensional.

4.1.1 Polynomials

We compute with vectors in Fn (Rn and Cn), which we represent concretely by tuples of numbers in memory, usually stored in sequence. To keep a broader perspective, we will also frequently describe examples involving the polynomial spaces Pd.

The Julia Polynomials.jl package provides a variety of representations of and operations on polynomials.

using Polynomials

For example, we can construct a polynomial in standard form from its coefficients in the monomial basis or from its roots, or we can keep it in factored form internally.

Polynomial([2, -3, 1])
2 - 3∙x + x2
fromroots([1, 2])
2 - 3∙x + x2
fromroots(FactoredPolynomial, [1, 2])
(x - 1) * (x - 2)

The fact that there are several natural representations for polynomials is part of what makes them a good example of an abstract vector space.

4.1.2 Dual spaces

A vector space in isolation is a lonely thing. Fortunately, every vector space has an associated dual space of linear functionals, that is, $\mathcal{V}^* = \{\text{linear functions } \mathcal{V} \rightarrow \mathbb{F}\}$. Dual pairs are everywhere in applied mathematics. The relation between random variables and probability densities, the theory of Lagrange multipliers, and the notion of generalized functions are all examples. It is also sometimes convenient to recast equations in terms of linear functionals, e.g. using the fact that $v = 0$ iff $wv = 0$ for all $w \in \mathcal{V}^*$.

In matrix terms, we usually associate vectors with columns (ordinary vectors) and dual vectors with rows. In Julia, we construct a dual vector by taking the conjugate transpose of a column, e.g.

typeof([1.0; 2.0; 3.0])   # Column vector in R^3
Vector{Float64} (alias for Array{Float64, 1})
typeof([0.0; 1.0; 0.0]')  # Row vector in R^3
Adjoint{Float64, Vector{Float64}}

Applying a linear functional is concretely just “row times column”:

let
    v = [1.0; 2.0; 3.0]   # Column vector in R^3
    w = [0.0; 1.0; 0.0]'  # Row vector in R^3
    w*v
end
2.0

In order to try to keep the syntax uniform between concrete vectors and operations on abstract polynomial spaces, we will define a DualVector type representing a (linear) functional

struct DualVector
    f :: Function
end

We overload the function evaluation and multiplication syntax so that we can write application of a dual vector w to a vector v as w(v) or as w*v.

(w :: DualVector)(v) = w.f(v)
Base.:*(w :: DualVector, v) = w(v)

Because dual vectors still make up a vector space, we would like to be able to add and scale them:

Base.:+(w1 :: DualVector, w2 :: DualVector) =
    DualVector(v -> w1(v) + w2(v))
Base.:-(w1 :: DualVector, w2 :: DualVector) =
    DualVector(v -> w1(v) - w2(v))
Base.:-(w :: DualVector) =
    DualVector(v -> -w(v))
Base.:*(c :: Number, w :: DualVector) =
    DualVector(v -> c*w(v))
Base.:*(w :: DualVector, c :: Number) = c*w
Base.:/(w :: DualVector, c :: Number) =
    DualVector(v -> w(v)/c)
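
As a small usage check of these operations, we can combine two point evaluation functionals by scaling and adding, then apply the result to a polynomial (here $2 p(0) - p(1)/2$):

let
    p = Polynomial([2.0, -3.0, 1.0])
    w1 = DualVector(q -> q(0.0))
    w2 = DualVector(q -> q(1.0))
    (2*w1 - w2/2)*p     # computes 2*p(0) - p(1)/2
end
4.0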

Evaluating a polynomial at a point is one example of a functional in Pd; another example is a definite integral operation:

let
    p = Polynomial([2.0; -3.0; 1.0])
    w1 = DualVector(p -> p(0.0))
    w2 = DualVector(p -> integrate(p, -1.0, 1.0))
    w1*p, w2*p
end
(2.0, 4.666666666666666)

4.1.3 Subspaces

Given a vector space V, a (nonempty) subset UV is a subspace if it is closed under the vector space operations (addition and scalar multiplication). We usually take U to inherit any applicable structure from V, such as a norm or inner product. Many concepts in numerical linear algebra and approximation theory are best thought of in terms of subspaces, whether that is nested subspaces used in approximation theory, subspaces that are invariant under some linear map, or subspaces and their orthogonal complements.

There are a few standard operations involving subspaces:

  • The span of a subset $S \subset \mathcal{V}$ is the smallest subspace that contains $S$, i.e. the set of all finite linear combinations $\sum_i c_i s_i$ where each $s_i \in S$ and $c_i \in \mathbb{F}$.

  • Given a collection of subspaces $\mathcal{U}_i$, their sum is the subspace $\mathcal{U}$ consisting of all sums $\sum_i u_i$ where only finitely many $u_i \in \mathcal{U}_i$ are nonzero. We say the sum is a direct sum if the decomposition into vectors $u_i \in \mathcal{U}_i$ is unique (see the short example after this list).

  • The annihilator for a subspace $\mathcal{U} \subset \mathcal{V}$ is the set of linear functionals that are zero on $\mathcal{U}$. That is, $\mathcal{U}^\circ = \{w \in \mathcal{V}^* : wu = 0 \text{ for all } u \in \mathcal{U}\}$. In inner product spaces, the annihilator of a subspace is identified with the orthogonal complement ().

  • Given a subspace $\mathcal{U} \subset \mathcal{V}$, the quotient space $\mathcal{V}/\mathcal{U}$ is a vector space whose elements are equivalence classes under the relation $v_1 \sim v_2$ if $v_1 - v_2 \in \mathcal{U}$. We can always find a "complementary" subspace $\mathcal{U}_c \subset \mathcal{V}$ isomorphic to $\mathcal{V}/\mathcal{U}$ such that each equivalence class in $\mathcal{V}/\mathcal{U}$ contains a unique representative from $\mathcal{U}_c$. Indeed, we can generally find many such spaces! For any complementary subspace $\mathcal{U}_c$, $\mathcal{V}$ is the direct sum $\mathcal{U} \oplus \mathcal{U}_c$.

  • In the infinite-dimensional setting, the closure of a subspace consists of all limit points of the subspace. This only makes sense in topological vector spaces, where we have enough structure to take limits. We will always consider the even stronger structure of normed vector spaces.
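
As a concrete illustration of a direct sum, the polynomials of degree at most $d$ decompose as the even polynomials plus the odd polynomials, and every polynomial splits uniquely into its two parts. A small sketch (using composition with $-x$ to extract the parts):

let
    p = Polynomial([1.0, 2.0, 3.0, 4.0])
    x = Polynomial([0.0, 1.0])
    peven = (p + p(-x))/2      # even part: (p(x) + p(-x))/2
    podd  = (p - p(-x))/2      # odd part:  (p(x) - p(-x))/2
    peven + podd == p
end
true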

4.1.4 Quasimatrices

Matrix notation is convenient for expressing linear algebra over concrete vector spaces like Rn and Cn. To get that same convenience for abstract vector spaces, we introduce the more general notion of quasimatrices, matrix-like objects that involve more than just ordinary numbers. This includes letting columns or rows belong to an abstract vector space instead of a concrete space like Rm or Rn, or using linear maps between subspaces as the “entries” of a matrix.

4.1.4.1 Column quasimatrices

A (column) quasimatrix is a list of vectors in some space $\mathcal{V}$ interpreted as columns of a matrix-like object, i.e. $U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$. We write a linear combination of the columns of $U$ as $Uc = \sum_{i=1}^n u_i c_i$ for some coefficient vector $c \in \mathbb{F}^n$. The range space $\mathcal{R}(U) = \{Uc : c \in \mathbb{F}^n\}$ (or column space) is the set of all possible linear combinations of the columns; this is the same as the span of the set of column vectors. The key advantage of column quasimatrices over sets of vectors is that we assign indices to name the vectors and we allow for the possibility of duplicates.

When V is a concrete vector space like Rm or Cm, the notion of a quasimatrix coincides with the idea of an ordinary matrix, and taking a linear combination of columns corresponds to ordinary matrix-vector multiplication. We will typically use the word “quasimatrix” when the columns correspond to vectors in an abstract vector space like Pd.

We can represent a quasimatrix with columns in Pd by an ordinary Julia array of Polynomials, e.g.

Udemo = let
    p1 = Polynomial([1.0])
    p2 = Polynomial([-1.0, 1.0])
    p3 = Polynomial([2.0, -3.0, 1.0])
    transpose([p1; p2; p3])
end
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(-1.0 + 1.0*x)  Polynomial(2.0 - 3.0*x + 1.0*x^2)

We can use these together with standard matrix operations involving concrete vectors, e.g.

Udemo*[1.0 1.0; 1.0 1.0; 1.0 0.0]
1×2 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(2.0 - 2.0*x + 1.0*x^2)  Polynomial(1.0*x)

We also want to be able to multiply a dual vector by a quasimatrix to get a concrete row vector:

Base.:*(w :: DualVector, U :: AbstractMatrix) = map(w, U)

For example,

DualVector(p->p(0.0)) * Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
 1.0  -1.0  2.0

We sometimes want to refer to a subset of the columns. In mathematical writing, we will use notation similar to that in Julia or MATLAB: $U_{:,p:q} = \begin{bmatrix} u_p & u_{p+1} & \cdots & u_q \end{bmatrix}$. We will also sometimes (horizontally) concatenate two quasimatrices with columns in the same $\mathcal{V}$, e.g. $U = \begin{bmatrix} U_{:,1:p} & U_{:,p+1:n} \end{bmatrix}$. The range of the horizontal concatenation of two quasimatrices is the same as the sum of the ranges, i.e. $\mathcal{R}(U) = \mathcal{R}(U_{:,1:p}) + \mathcal{R}(U_{:,p+1:n})$.
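
The colon notation lines up with Julia's own indexing for our concrete representation. For example, splitting Udemo into two column blocks (the block sizes here are arbitrary) and concatenating them back together recovers the original quasimatrix:

[Udemo[:,1:1] Udemo[:,2:3]] == Udemo
true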

4.1.4.2 Row quasimatrices

Every vector space $\mathcal{V}$ over a field $\mathbb{F}$ has a natural dual space $\mathcal{V}^*$ of linear functions from $\mathcal{V}$ to $\mathbb{F}$ (aka linear functionals). A row quasimatrix is a list of vectors in a dual space $\mathcal{V}^*$ interpreted as the rows of a matrix-like object, i.e. $W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix}$. We write linear combinations of rows as $c^* W = \sum_{i=1}^m \bar{c}_i w_i$; the set of all such linear combinations is the row space of the quasimatrix. We can refer to a subset of rows with colon notation as in Julia or MATLAB: $W_{p:q,:} = \begin{bmatrix} w_p \\ w_{p+1} \\ \vdots \\ w_q \end{bmatrix}$. We can also combine row quasimatrices with rows in the same $\mathcal{V}^*$ via vertical concatenation, e.g. $W = \begin{bmatrix} W_{1:p,:} \\ W_{p+1:m,:} \end{bmatrix}$.

In Julia, analogous to the column quasimatrices over spaces Pd, we can define a structure representing row quasimatrices over spaces Pd.

Wdemo = DualVector.(
    [p -> p(0.0);
     p -> p(1.0);
     p -> integrate(p, -1.0, 1.0)])
3-element Vector{DualVector}:
 DualVector(var"#27#30"())
 DualVector(var"#28#31"())
 DualVector(var"#29#32"())

4.1.4.3 Duality relations

For a column quasimatrix $U$ with columns in $\mathcal{V}$, the map $c \mapsto Uc$ takes a concrete vector $c \in \mathbb{F}^n$ to an abstract vector in $\mathcal{V}$. Alternately, if $w \in \mathcal{V}^*$ is a linear functional, we have $d = wU = \begin{bmatrix} wu_1 & wu_2 & \cdots & wu_n \end{bmatrix}$, representing a concrete row vector such that $dc = w(Uc)$. So $U$ can be associated either with a linear map from a concrete vector space $\mathbb{F}^n$ to an abstract vector space $\mathcal{V}$, or with a linear map from the abstract dual space $\mathcal{V}^*$ to a concrete (row) vector space associated with linear functionals on $\mathbb{F}^n$.

# Example: map concrete [1; 1; 1] to a Polynomial
Udemo*[1; 1; 1]
2.0 - 2.0∙x + 1.0∙x2
# Example: map DualVector to a concrete row
DualVector(p->p(0))*Udemo
1×3 transpose(::Vector{Float64}) with eltype Float64:
 1.0  -1.0  2.0

Similarly, a row quasimatrix W can be interpreted as a map from a concrete row vector to an abstract dual vector in V, or as a map from an abstract vector in V to a concrete vector in Fm.

# Example: map concrete row to an abstract dual vector
typeof([1.0; 1.0; 1.0]'*Wdemo)
DualVector
# Example: map Polynomial to a concrete vector
Wdemo.*Polynomial([2.0, -3.0, 1.0])
3-element Vector{Float64}:
 2.0
 0.0
 4.666666666666666

Composing the actions of a row quasimatrix with a column quasimatrix gives us an ordinary matrix:
$$WU = \begin{bmatrix} w_1 u_1 & \cdots & w_1 u_n \\ \vdots & \ddots & \vdots \\ w_m u_1 & \cdots & w_m u_n \end{bmatrix}.$$
For example,

Wdemo*Udemo
3×3 Matrix{Float64}:
 1.0  -1.0  2.0
 1.0   0.0  0.0
 2.0  -2.0  4.66667

Conversely, composing the actions of a column quasimatrix with a row quasimatrix (assuming $m = n$) gives us an operator mapping $\mathcal{V}$ to $\mathcal{V}$: $UW = \sum_{k=1}^n u_k w_k$. In Julia, we can write this mapping as

UWdemo(p) = Udemo*(Wdemo.*p)
UWdemo (generic function with 1 method)

These two compositions correspond respectively to the inner product and outer product interpretations of matrix-matrix multiplication.
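
We can spot check the outer product interpretation by comparing UWdemo against the explicit sum of rank-one terms $\sum_k u_k (w_k p)$ built from the quasimatrices defined above:

let
    p = Polynomial([2.0, -3.0, 1.0])
    UWdemo(p) ≈ sum(Udemo[k]*(Wdemo[k]*p) for k = 1:3)
end
true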

4.1.4.4 Infinite quasimatrices

We may (rarely) discuss “infinite” quasimatrices with a countable number of columns. We will be analysts about it instead of algebraists. In the case of column quasimatrices:

  • We assume the columns lie in some complete vector space.

  • When we take a linear combination $Uc$, we assume the sequence $c$ is restricted to make sense of $Uc = \lim_{n \rightarrow \infty} U_{:,1:n} c_{1:n}$. Usually, this will mean $c$ is bounded ($\ell^\infty$), absolutely summable ($\ell^1$), or absolutely square summable ($\ell^2$).

  • We take the “range” of such a quasimatrix to be the set of allowed limits of linear combinations.

We can similarly define infinite versions of row-wise quasimatrices.

4.1.5 Bases

A basis set for a finite-dimensional vector space $\mathcal{V}$ is a finite set $S \subset \mathcal{V}$ that spans $\mathcal{V}$ and whose elements are linearly independent — that is, no nontrivial linear combination of elements of $S$ equals zero. We can equivalently define a basis quasimatrix $U$ such that $\mathcal{R}(U) = \mathcal{V}$ and the mapping $c \mapsto Uc$ is one-to-one. That is, we think of a basis quasimatrix as an invertible linear mapping from a concrete vector space $\mathbb{F}^n$ to an abstract space $\mathcal{V}$. While we may refer to either the set or the quasimatrix as "a basis" when terminology is unambiguous, by default we use quasimatrices.

Thinking of dual spaces as consisting of rows, we generally represent a basis of $\mathcal{V}^*$ by a row quasimatrix $W$. The map $d \mapsto dW$ is then an invertible linear map from the set of concrete row vectors to $\mathcal{V}^*$. If $U$ and $W$ are bases of $\mathcal{V}$ and $\mathcal{V}^*$, the matrix $X = WU$ represents the composition of two invertible maps (from $\mathbb{F}^n$ to $\mathcal{V}$ and back via the two bases), and so is invertible. The map $U^{-1} = X^{-1} W$ is the dual basis associated with $U$, which satisfies that $U^{-1} U$ is the $n$-by-$n$ identity matrix and $U U^{-1}$ is the identity map on $\mathcal{V}$.
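
We can play with this concretely using Udemo and Wdemo from before: the matrix X = Wdemo*Udemo is invertible, and the functionals obtained from $X^{-1}W$ form the dual basis for Udemo. The sketch below (the local name Udual is just for illustration) checks that applying the dual basis functionals to the columns of Udemo gives the identity matrix:

let
    X = Wdemo*Udemo
    Udual = [DualVector(p -> (X\(Wdemo.*p))[i]) for i = 1:3]
    [wi*uj for wi in Udual, uj in Udemo[:]] ≈ Matrix(1.0I, 3, 3)
end
true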

If $U$ and $\hat{U} = UX$ are two bases for $\mathcal{V}$, a vector can be written as $v = Uc$ or $v = \hat{U}\hat{c}$ where $\hat{c} = X^{-1} c$. Because we use $X$ to change from the $U$ basis to the $\hat{U}$ basis and use $X^{-1}$ to change from the $c$ coefficients to the $\hat{c}$ coefficients, we say the coefficient vector is contravariant (it changes in contrast to the basis). In contrast, an abstract dual vector $w \in \mathcal{V}^*$ can be written in terms of the $U$ and $\hat{U}$ bases with the concrete row vectors $d = wU$ or $\hat{d} = w\hat{U} = dX$. Because the transformations from $U$ to $\hat{U}$ and from $d$ to $\hat{d}$ both involve the same operation with $X$, we say this coefficient vector is covariant (it changes together with the basis).

4.1.5.1 Component conventions

We will generally follow the convention common in many areas of numerical analysis: a vector vV is associated with a column, a dual vector wV is associated with a row, and writing a dual vector followed by a vector (wv) indicates evaluation. However, there are some shortcomings to this convention, such as situations where we want to apply a dual vector to a vector but some other notational convention prevents us from writing the two symbols adjacent to each other. This issue becomes particularly obvious in multilinear algebra (tensor algebra), as we will see in .

An alternate convention that addresses some of these issues is frequently used in areas of physics where tensors are particularly common. In this convention ("Einstein notation"), vectors are written with respect to a basis $V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$ as $v = v_i c^i$, where the summation convention is in effect, i.e. there is an implicit summation over repeated indices. Dual vectors are written with respect to the dual basis $V^{-1} = \begin{bmatrix} w^1 \\ \vdots \\ w^n \end{bmatrix}$ as $w = d_i w^i$. Therefore $wv = d_i c^i$. Note that ordering of the symbols in the product makes no difference in this notation: $d_i c^i$ and $c^i d_i$ indicate the same scalar value.

The convention of using superscripts to denote vector coefficients (contravariant coefficients) and subscripts to denote dual coefficients (covariant coefficients) gives us a mechanism for "type checking" expressions: summation over a repeated index should involve one superscript (a contravariant index) and one subscript (a covariant index). At the same time, many authors who follow the summation convention only use subscript indices in order to save the superscript space for other purposes. Caveat lector!

Indices are useful for matching functions (dual vectors) and arguments (vectors) without explicit reference to position in an expression. At the same time, indicial notation presumes a basis, with particularly basis-centric authors sometimes writing statements like “a vector is an array that transforms like a vector.” While we usually use bases in our computation, we prefer a basis-independent perspective for mathematical arguments when possible. With this said, we will revert to indicial notation when the situation merits it.

4.1.5.2 Polynomial bases

As an example of different choices of bases, we consider the polynomial spaces $\mathcal{P}_d$. A common choice of basis for $\mathcal{P}_d$ is the power basis $X_{0:d} = \begin{bmatrix} 1 & x & x^2 & \cdots & x^d \end{bmatrix}$. That is, we write polynomials $p \in \mathcal{P}_d$ in terms of the basis as $p = X_{0:d} c = \sum_{j=0}^d c_j x^j$. A common basis for $\mathcal{P}_d^*$ is a basis of point evaluation functionals for distinct points $x_0, x_1, \ldots, x_d$, i.e. the row quasimatrix $Y$ such that
$$Yp = \begin{bmatrix} p(x_0) \\ p(x_1) \\ \vdots \\ p(x_d) \end{bmatrix}.$$
The matrix $Y X_{0:d}$ is the Vandermonde matrix
$$V = \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^d \\ 1 & x_1 & x_1^2 & \cdots & x_1^d \\ 1 & x_2 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_d & x_d^2 & \cdots & x_d^d \end{bmatrix}.$$
The basis $V^{-1} Y$ is the dual basis associated with the power basis. For example, for quartic polynomials sampled at equally spaced points on $[-1,1]$, we can write

let
    p = Polynomial([1.0; 2.0; 3.0; 4.0; 5.0])
    X = range(-1.0, 1.0, length=5)
    V = [xi^j for xi in X, j in 0:4]
    c = V\p.(X)
end
5-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0
 5.0

That is, if $p(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4$ and $p_X$ denotes the vector of samples $p(-1), p(-0.5), p(0), p(0.5), p(1)$, then $c = V^{-1} p_X$. This is, alas, not a particularly good numerical approach to recovering a polynomial from point values, a topic that we will return to in .

The power basis is not the only basis for $\mathcal{P}_d$. Other common choices include Newton or Lagrange polynomials with respect to a set of points. Along with Vandermonde matrices, we will discuss these topics in . We will sometimes use the (first kind) Chebyshev polynomial basis $T_{0:d}$ with columns $T_0(x), \ldots, T_d(x)$ given by the recurrence
$$\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{j+1}(x) &= 2x T_j(x) - T_{j-1}(x), \quad j \geq 1. \end{aligned}$$
The Chebyshev polynomials can also be written on $[-1,1]$ in terms of trigonometric functions: $T_m(\cos(\theta)) = \cos(m\theta)$. This connection between Chebyshev polynomials and trigonometric functions is part of their utility, and allows us to use transform methods for working with Chebyshev polynomials ().
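
As a quick numerical sanity check of the trigonometric characterization, we can build $T_m$ directly in the Chebyshev basis using the ChebyshevT type from Polynomials.jl (which we also use later in this section) and evaluate it at $\cos(\theta)$ for an arbitrary angle:

let
    m, θ = 5, 0.3
    Tm = ChebyshevT([zeros(m); 1.0])   # coefficient vector picking out T_m
    Tm(cos(θ)) ≈ cos(m*θ)
end
true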

The related second kind Chebyshev polynomial basis $U_{0:d}$ satisfies
$$\begin{aligned} U_0(x) &= 1 \\ U_1(x) &= 2x \\ U_{j+1}(x) &= 2x U_j(x) - U_{j-1}(x), \quad j \geq 1. \end{aligned}$$
These polynomials satisfy $U_m(\cos(\theta)) \sin(\theta) = \sin((m+1)\theta)$. They are useful in part because of the derivative relationship $T_m'(x) = m U_{m-1}(x)$.

In matrix terms, we can write the relationship between the Chebyshev polynomials and the power basis as T0:d=X0:dM where the matrix M is computed via the three-term recurrence:

function chebyshev_power_coeffs(d, kind=1)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1                          # p0(x) = 1
    M[2,2] = kind                       # p1(x) = kind*x
    for j = 2:d                         # p_{j+1}(x) =
        M[2:end,j+1] .= 2*M[1:end-1,j]  #   2x p_j(x)
        M[:,j+1] .-= M[:,j-1]           #   -p_{j-1}(x)
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
chebyshev_power_coeffs (generic function with 2 methods)

For example, for the quadratic case, we have

chebyshev_power_coeffs(2)
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.0  -1.0
  ⋅   1.0   0.0
  ⋅    ⋅    2.0

and the coefficient vectors $d$ and $c$ with respect to the Chebyshev and power bases are such that $p = T_{0:d} d = X_{0:d} M d = X_{0:d} c$, so $Md = c$. For example, to write the polynomial $p(x) = 2 - 3x + x^2$ in the Chebyshev basis, we solve the triangular linear system:

chebyshev_power_coeffs(2) \ [2.0; -3.0; 1.0]
3-element Vector{Float64}:
  2.5
 -3.0
  0.5

If we would like to form a representation of the Chebyshev basis, we could use the power basis expansion

function chebyshev_basis(d, kind=1)
    M = chebyshev_power_coeffs(d, kind)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end

chebyshev_basis(2)           
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(1.0*x)  Polynomial(-1.0 + 2.0*x^2)

However, the Chebyshev polynomials are natively supported in the Polynomials.jl package, and we can explicitly write polynomials in terms of Chebyshev polynomials. For example,

ChebyshevT([2.5, -3.0, 0.5])
2.5⋅T_0(x) - 3.0⋅T_1(x) + 0.5⋅T_2(x)
convert(Polynomial, ChebyshevT([2.5, -3.0, 0.5]))
2.0 - 3.0∙x + 1.0∙x2

The coefficients satisfy exactly the equations from above.

The bases $T_{0:d}$ and $U_{0:d}$ are both bases for $\mathcal{P}_d$, so there must be a change of basis matrix that converts between the two. We can look up or derive that $T_n(x) = \frac{1}{2}(U_n(x) - U_{n-2}(x))$ for $n \geq 2$ (together with $T_0 = U_0$ and $T_1 = \frac{1}{2} U_1$), or, in terms of a matrix representation of the change of basis, $T_{0:d} = U_{0:d} B$ where $B$ is the upper triangular matrix
$$B = \begin{bmatrix} 1 & 0 & -\tfrac{1}{2} & & & \\ & \tfrac{1}{2} & 0 & -\tfrac{1}{2} & & \\ & & \tfrac{1}{2} & 0 & \ddots & \\ & & & \ddots & \ddots & -\tfrac{1}{2} \\ & & & & \ddots & 0 \\ & & & & & \tfrac{1}{2} \end{bmatrix}.$$
Alternately, in code we have

function change_U_to_T(d)
    B = Matrix(0.5*I, d+1, d+1)
    B[1,1] = 1.0
    for j = 3:d+1
        B[j-2,j] = -0.5
    end
    B
end
change_U_to_T (generic function with 1 method)

That is, we should have

chebyshev_basis(5) ==
    chebyshev_basis(5,2) * change_U_to_T(5)
true

In addition to the Chebyshev polynomials, we will sometimes want the Legendre polynomial basis $P_{0:d}(x)$ with columns $P_j(x)$ given by the recurrence
$$\begin{aligned} P_0(x) &= 1 \\ P_1(x) &= x \\ (j+1) P_{j+1}(x) &= (2j+1) x P_j(x) - j P_{j-1}(x). \end{aligned}$$
As with the Chebyshev polynomials, we can define an upper triangular matrix whose columns are coefficients for the Legendre polynomials in the power basis:

function legendre_power_coeffs(d)
    n = max(d+1, 2)
    M = zeros(n, n)
    M[1,1] = 1
    M[2,2] = 1
    for j = 2:d
        M[2:end,j+1] .= (2*j-1)*M[1:end-1,j]
        M[:,j+1] .-= (j-1)*M[:,j-1]
        M[:,j+1] ./= j
    end
    UpperTriangular(M[1:d+1,1:d+1])
end
legendre_power_coeffs (generic function with 1 method)

If we want a representation in terms of a list of Polynomial objects (representing the basis polynomials), we can simply convert the columns:

function legendre_basis(d)
    M = legendre_power_coeffs(d)
    transpose([Polynomial(M[1:j,j]) for j=1:d+1])
end

legendre_basis(2)           
1×3 transpose(::Vector{Polynomial{Float64, :x}}) with eltype Polynomial{Float64, :x}:
 Polynomial(1.0)  Polynomial(1.0*x)  Polynomial(-0.5 + 1.5*x^2)
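
As a spot check on the three-term recurrence, we can evaluate both sides at one arbitrary point (the index shift below accounts for Julia's 1-based indexing of the quasimatrix columns, so P[j+1] represents $P_j$):

let
    P = legendre_basis(4)
    j, x = 3, 0.7
    (j+1)*P[j+2](x) ≈ (2j+1)*x*P[j+1](x) - j*P[j](x)
end
true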

Each basis we have discussed has a different use case. Sometimes the “obvious” choice of basis (e.g. the standard basis in Rn or the power basis in Pd) is not the best choice for numerical computations.

4.1.5.3 Block bases

Consider the direct sum V=U1U2Up. Given basis quasimatrices Uj for the spaces Uj, there is an associated basis quasimatrix V for V defined by horizontal concatenation: V=[U1U2Up]. Conversely, we can think of partitioning the columns of a basis V for V into disjoint subsets; then each subset is itself a basis for some subspace, and V is a direct sum of those subspaces. This sort of “block thinking” will generally be useful in our treatment of numerical linear algebra.

4.1.5.4 Infinite bases

The notion of a basis can be generalized to the case of an infinite-dimensional space V in two ways. The algebraists version (the Hamel basis) is a set of linearly independent vectors such that any vector is a finite linear combination of basis vectors. Such a basis always exists if we assume the axiom of choice, but is not particularly useful for our purposes. More useful for topics such as approximation theory and probability is the analyst’s notion of a basis: an infinite quasimatrix that represents a map between some sequence space (e.g. 1, 2, or ) and an abstract vector space.

4.1.6 Vector norms

A norm measures vector lengths. It is positive definite, (absolutely) homogeneous, and sub-additive:
$$\begin{aligned} \|v\| &\geq 0 \text{ and } \|v\| = 0 \text{ iff } v = 0 \\ \|\alpha v\| &= |\alpha| \, \|v\| \\ \|u + v\| &\leq \|u\| + \|v\|. \end{aligned}$$
The three most common vector norms we work with in $\mathbb{R}^n$ are the Euclidean norm (aka the 2-norm), the $\infty$-norm (or max norm), and the 1-norm:
$$\begin{aligned} \|v\|_2 &= \sqrt{\sum_j |v_j|^2} \\ \|v\|_\infty &= \max_j |v_j| \\ \|v\|_1 &= \sum_j |v_j|. \end{aligned}$$
These are all available via Julia's norm function:

let
    x = [3.0; 4.0]
    norm(x),      # Euclidean norm (also written norm(x,2))
    norm(x,Inf),  # Max norm
    norm(x,1)     # 1-norm
end
(5.0, 4.0, 7.0)

Many other norms can be related to one of these three norms. In particular, a "natural" norm in an abstract vector space will often look strange in the corresponding concrete representation with respect to some basis. For example, consider the vector space of polynomials with degree at most 2 on $[-1,1]$. This space also has a natural Euclidean norm, max norm, and 1-norm; for a given polynomial $p(x)$ these are
$$\begin{aligned} \|p\|_2 &= \sqrt{\int_{-1}^1 |p(x)|^2 \, dx} \\ \|p\|_\infty &= \max_{x \in [-1,1]} |p(x)| \\ \|p\|_1 &= \int_{-1}^1 |p(x)| \, dx. \end{aligned}$$
These norms are not built into the Polynomials package, but they are not difficult to compute for real polynomials:

real_roots(p) = sort(real(filter(isreal, roots(p))))
real_roots(p, a, b) = filter(x -> (a <= x <= b),
                             real_roots(p))

normL2(p) = sqrt(integrate(p*p, -1, 1))

function normL∞(p)
    nodes = [-1.0; real_roots(derivative(p), -1, 1); 1.0]
    maximum(abs.(p.(nodes)))
end

function normL1(p)
    nodes = [-1.0; real_roots(p, -1, 1); 1.0]
    sum(abs(integrate(p, nodes[j], nodes[j+1]))
        for j = 1:length(nodes)-1)
end

let
    p = Polynomial([-0.5, 0.0, 1.0])
    normL2(p),
    normL∞(p),
    normL1(p)
end
(0.483045891539648, 0.5, 0.60947570824873)

The same norms can also be approximated from samples using the midpoint rule (?sec-quad-diff-ch):

let
    n = 100
    p = Polynomial([-0.5, 0.0, 1.0])
    X = range(-1.0, 1.0, length=n+1)
    X = (X[1:end-1] + X[2:end])/2
    pX = p.(X)
    norm(pX)*sqrt(2/n),
    norm(pX, Inf),
    norm(pX, 1)*2/n
end
(0.48297688971626773, 0.4999, 0.6093599999999999)

Of course, when we write p(x) in terms of the coefficient vector with respect to the power basis (for example), the max norm of the polynomial is not the same as the max norm of the coefficient vector.

In a finite-dimensional vector space, all norms are equivalent: that is, if $\|\cdot\|$ and $|\!|\!|\cdot|\!|\!|$ are two norms on the same finite-dimensional vector space, then there exist constants $c$ and $C$ such that for any $v$ in the space, $c \|v\| \leq |\!|\!|v|\!|\!| \leq C \|v\|$. Of course, there is no guarantee that the constants are small! In infinite-dimensional spaces, not all norms are equivalent.
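
For example, the max norm and the Euclidean norm on $\mathbb{R}^n$ are equivalent with constants $c = 1$ and $C = \sqrt{n}$; a quick numerical illustration (not a proof) with a random vector:

let
    v = randn(10)
    norm(v, Inf) <= norm(v) <= sqrt(10)*norm(v, Inf)
end
true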

A normed vector space is a metric space, and so we can use it to define things like Cauchy sequences and limits. In a finite-dimensional normed space $\mathcal{V}$, any Cauchy sequence $\{v_k\}_{k=1}^\infty$ converges to a limit $v \in \mathcal{V}$. A normed infinite-dimensional vector space where Cauchy sequences converge (i.e. a space that is complete under the norm) is called a Banach space.

For any Banach space $\mathcal{V}$, we can define the dual norm for continuous functionals $w$ in the dual space $\mathcal{V}^*$ by
$$\|w\| = \sup_{v \neq 0} \frac{|wv|}{\|v\|} = \sup_{\|v\|=1} |wv|.$$
In the finite-dimensional case, the supremum is always achieved, i.e. for any $w$ there exists a $v$ of unit norm ($\|v\| = 1$) such that $wv = \|w\|$. Conversely, for any $v$ there exists a $w$ of unit norm such that $wv = \|v\|$. The $\infty$-norm and the 1-norm are dual norms of each other, and the Euclidean norm is its own dual.
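
We can illustrate the duality of the 1-norm and the $\infty$-norm concretely: for a concrete dual (row) vector $w$, the choice $v = \operatorname{sign}(w)^T$ has unit max norm and achieves $wv = \|w\|_1$:

let
    w = [1.0; -2.0; 3.0]'      # a dual vector (row)
    v = sign.(vec(w))          # a unit vector in the max norm
    norm(v, Inf), w*v, norm(vec(w), 1)
end
(1.0, 6.0, 6.0)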

4.1.7 Inner products

An inner product $\langle \cdot, \cdot \rangle$ is a function from two vectors into the real numbers (or complex numbers for a complex vector space). It is positive definite, linear in the first slot, and symmetric (or Hermitian in the case of complex vectors); that is:
$$\begin{aligned} \langle v, v \rangle &\geq 0 \text{ and } \langle v, v \rangle = 0 \text{ iff } v = 0 \\ \langle \alpha u, w \rangle &= \alpha \langle u, w \rangle \text{ and } \langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle \\ \langle u, v \rangle &= \overline{\langle v, u \rangle}, \end{aligned}$$
where the overbar in the latter case corresponds to complex conjugation. The standard inner product on $\mathbb{C}^n$ is $x \cdot y = y^* x = \sum_{j=1}^n \bar{y}_j x_j$. A vector space with an inner product is sometimes called an inner product space or a Euclidean space.

4.1.7.1 The Riesz map

The inner product defines an (anti)linear map (the Riesz map) from vectors $v$ to dual vectors $v^*$ via $v^* u = \langle u, v \rangle$. In terms of the Julia code in this chapter,

riesz(dot) = v -> DualVector(u -> dot(u, v))

According to the Riesz representation theorem, in finite-dimensional spaces, this defines a 1-1 correspondence between vectors and dual vectors. For $\mathbb{R}^n$ or $\mathbb{C}^n$ under the standard inner product, the Riesz map is simply the (conjugate) transpose:
$$\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \mapsto \begin{bmatrix} \bar{c}_1 & \bar{c}_2 & \cdots & \bar{c}_n \end{bmatrix}.$$
Going beyond single vectors at a time, we can think of the Riesz map as inducing a mapping from column quasimatrices to row quasimatrices.

In infinite-dimensional inner product spaces, we have a 1-1 correspondence between vectors and continuous dual vectors (which is all dual vectors for a complete inner product space). A (not-necessarily finite-dimensional) Euclidean space that is complete under the Euclidean norm is called a Hilbert space.

4.1.7.2 The Gram matrix

Suppose $\mathcal{V}$ is an arbitrary space with an inner product and $V$ is a basis for that space. Then for vectors expressed in that basis, the inner product is
$$\langle Vc, Vd \rangle = \left\langle \sum_j v_j c_j, \sum_i v_i d_i \right\rangle = \sum_{i,j} \bar{d}_i c_j \langle v_j, v_i \rangle = d^* M c = \langle c, d \rangle_M,$$
where $M = V^* V$ is the Gram matrix whose entries are inner products of basis elements. In Julia code, we have

gram(dot, V) = Symmetric([dot(vj, vi) for vi in V[:], vj in V[:]])

The Gram matrix is Hermitian and (because V is a basis) positive definite. We call the expression dMc the M-inner product between the concrete vectors c and d.

We can also write the Riesz map in terms of $V$ and $M$. Given $w \in \mathcal{V}^*$, the Riesz map gives $w^* = V M^{-1} (wV)^*$; we can verify this based on the expression of the $\mathcal{V}$-inner product as an $M$-inner product over coefficient vectors:
$$\langle u, w^* \rangle = \langle V^{-1} u, M^{-1} (wV)^* \rangle_M = (wV) M^{-1} M V^{-1} u = wu.$$

4.1.7.3 Euclidean norms

Every inner product defines a corresponding norm $\|v\| = \sqrt{\langle v, v \rangle}$. The inner product can also be recovered from the norm. In general, we can expand
$$\|u+v\|^2 = \langle u+v, u+v \rangle = \|u\|^2 + 2 \operatorname{Re}\langle u, v \rangle + \|v\|^2.$$
Therefore
$$\begin{aligned} \operatorname{Re}\langle u, v \rangle &= \tfrac{1}{2}\left( \|u+v\|^2 - \|u\|^2 - \|v\|^2 \right) \\ \operatorname{Im}\langle u, v \rangle = \operatorname{Re}\langle u, iv \rangle &= \tfrac{1}{2}\left( \|u+iv\|^2 - \|u\|^2 - \|v\|^2 \right). \end{aligned}$$
Of course, for real vector spaces only the first equation is needed.
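
We can check the real-case recovery formula numerically for a pair of polynomials under the $L^2([-1,1])$ inner product, using the normL2 helper from above and integrate for the inner product itself:

let
    u = Polynomial([1.0, 2.0])
    v = Polynomial([0.0, 1.0, -1.0])
    uv = integrate(u*v, -1, 1)   # the L^2([-1,1]) inner product of u and v
    uv ≈ (normL2(u+v)^2 - normL2(u)^2 - normL2(v)^2)/2
end
true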

4.1.7.4 Angles and orthogonality

The inner product and the associated norm satisfy the Cauchy-Schwarz inequality $|\langle u, v \rangle| \leq \|u\| \, \|v\|$. We define the cosine of the angle between nonzero real vectors $u$ and $v$ to be
$$\cos \theta = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}.$$
Because of the Cauchy-Schwarz inequality, the cosine is always at most one in absolute value. For a real inner product space, the expansion of $\|u+v\|^2$ and the definition of the cosine give us the law of cosines: for any nonzero $u$ and $v$,
$$\|u+v\|^2 = \|u\|^2 + 2 \|u\| \|v\| \cos(\theta) + \|v\|^2.$$
We say two vectors $u$ and $v$ are orthogonal with respect to an inner product if $\langle u, v \rangle = 0$. If $u$ and $v$ are orthogonal, we have the Pythagorean theorem: $\|u+v\|^2 = \|u\|^2 + \|v\|^2$.

If $\mathcal{U}$ is any subspace of an inner product space $\mathcal{V}$, we can define the orthogonal complement $\mathcal{U}^\perp$:
$$\mathcal{U}^\perp = \{ v \in \mathcal{V} : \langle u, v \rangle = 0 \text{ for all } u \in \mathcal{U} \}.$$
We can always write $\mathcal{V}$ as the direct sum $\mathcal{U} \oplus \mathcal{U}^\perp$. In other words, the orthogonal complement is isomorphic to the quotient space $\mathcal{V}/\mathcal{U}$, and in inner product spaces these two are often identified with each other.

Vectors that are mutually orthogonal and have unit length in the Euclidean norm are said to be orthonormal. A basis $V$ for an $n$-dimensional space $\mathcal{V}$ is an orthonormal basis if all its columns are orthonormal. In this case, $V^* V$ is the $n$-by-$n$ identity matrix and $V V^*$ is the identity map on $\mathcal{V}$. That is, for an orthonormal basis quasimatrix, the Riesz map and the dual basis are the same, i.e. $V^* = V^{-1}$.

4.1.7.5 Einstein notation

For physicists working with Einstein notation on real vector spaces, the Gram matrix (aka the metric tensor) is sometimes written in terms of the components $g_{ij} = \langle v_j, v_i \rangle$; the inverse of the Gram matrix is written with components $g^{ij}$. In this convention, the Riesz map from $\mathcal{V}$ to $\mathcal{V}^*$ is associated with using the metric tensor to "lower an index": $w_i = g_{ij} v^j$. Conversely, the inverse Riesz map from $\mathcal{V}^*$ to $\mathcal{V}$ is associated with using the inverse metric to "raise an index": $v^i = g^{ij} w_j$. And the Euclidean norm is written as $\|v\|^2 = g_{ij} v^i v^j$. When an orthonormal basis is used, the Gram matrix is just the identity, and raising or lowering indices appears to be a purely formal matter.

4.1.7.6 The L2 inner product

Returning to our example of a vector space of polynomials, the standard $L^2([-1,1])$ inner product is
$$\langle p, q \rangle_{L^2([-1,1])} = \int_{-1}^1 p(x) \overline{q(x)} \, dx.$$
In Julia code, we can write this as

dotL2(p, q) = integrate(p*conj(q), -1, 1)

The Gram matrix for this inner product with the power basis (using zero-based indexing) is
$$a_{ij} = \int_{-1}^1 x^i x^j \, dx = \begin{cases} 2/(i+j+1), & i+j \text{ even} \\ 0, & \text{otherwise} \end{cases}$$
which we can implement in Julia as

gram_L2_power(d) = [(i+j)%2 == 0 ? 2.0/(i+j+1) : 0.0
                    for i=0:d, j=0:d]

For example, for quadratics we have

gram_L2_power(2)
3×3 Matrix{Float64}:
 2.0       0.0       0.666667
 0.0       0.666667  0.0
 0.666667  0.0       0.4

Hence we can compute 1+x+x2,23x+x2 either via

let
    p = Polynomial([1.0; 1.0; 1.0])
    q = Polynomial([2.0; -3.0; 1.0])
    dotL2(p, q)
end
4.4

or via

[1.0; 1.0; 1.0]'*gram_L2_power(2)*[2.0; -3.0; 1.0]
4.3999999999999995

For the Chebyshev basis, the Gram matrix has entries $\int_{-1}^1 T_i(x) T_j(x) \, dx$. This can also be computed using the relations
$$T_m(x) T_n(x) = \tfrac{1}{2}\left( T_{m+n}(x) + T_{|m-n|}(x) \right)
\quad \text{and} \quad
\int_{-1}^1 T_n(x) \, dx = \begin{cases} \frac{2}{1-n^2}, & n \text{ even} \\ 0, & n \text{ odd}. \end{cases}$$

function gram_L2_chebyshev(d)
    tint(j) = j%2 == 0 ? 2.0/(1-j^2) : 0.0
    m(i,j) = (tint(i+j) + tint(abs(i-j)))/2.0
    [m(i,j) for i=0:d, j=0:d]
end
gram_L2_chebyshev (generic function with 1 method)

For example, the Gram matrix for T0:4 is

gram_L2_chebyshev(4)
5×5 Matrix{Float64}:
  2.0        0.0       -0.666667   0.0       -0.133333
  0.0        0.666667   0.0       -0.4        0.0
 -0.666667   0.0        0.933333   0.0       -0.361905
  0.0       -0.4        0.0        0.971429   0.0
 -0.133333   0.0       -0.361905   0.0        0.984127

We can sanity check our integration by comparing to another routine:

gram_L2_chebyshev(4)  gram(dotL2, chebyshev_basis(4)[:])
true

For the Legendre basis, the Gram matrix is

gram(dotL2, legendre_basis(4)[:])
5×5 Symmetric{Float64, Matrix{Float64}}:
 2.0  0.0       0.0  0.0       0.0
 0.0  0.666667  0.0  0.0       0.0
 0.0  0.0       0.4  0.0       0.0
 0.0  0.0       0.0  0.285714  0.0
 0.0  0.0       0.0  0.0       0.222222

That is, the Gram matrix for the Legendre basis is diagonal, with diagonal entries $2/1, 2/3, 2/5, 2/7, 2/9, \ldots$. In other words, the Legendre basis vectors are orthogonal with respect to the $L^2([-1,1])$ inner product. The normalized Legendre polynomials are $\sqrt{(2j+1)/2} \, P_j$, or, in code

normalized_legendre_basis(d) =
    transpose([sqrt((2j+1)/2)*Pj
               for (Pj,j) = zip(legendre_basis(d), 0:d)])
normalized_legendre_basis (generic function with 1 method)

and these form an orthonormal basis for the Pd spaces under the L2([1,1]) inner product. Hence, the Gram matrix for a normalized Legendre basis (using the L2([1,1]) inner product) is just the identity:

let
    P̄ = normalized_legendre_basis(3)
    gram(dotL2, P̄)
end
4×4 Symmetric{Float64, Matrix{Float64}}:
 1.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0
 0.0  0.0  1.0  0.0
 0.0  0.0  0.0  1.0

Using an orthonormal basis $U$ makes it straightforward to evaluate the Riesz map. For a given functional $w \in \mathcal{V}^*$, the associated $w^* \in \mathcal{V}$ under the Riesz map is $w^* = U (wU)^*$, where $(wU)^*$ represents the ordinary conjugate transpose of the concrete row vector $wU$. Then taking the inner product with $w^*$ is equivalent to evaluating $w$, since
$$\langle Uc, w^* \rangle = \langle Uc, U(wU)^* \rangle = (wU) c = w(Uc).$$
In the case of the normalized Legendre polynomials, for example, this gives us a way of computing a vector $\delta_x^*$ associated with the evaluation functional $\delta_x$ (which satisfies $\delta_x p = p(x)$):

function dualL2_δx(x, d)
    P̄ = normalized_legendre_basis(d)
    c = [P̄j(x) for P̄j in P̄]'
    P̄*c[:]
end
dualL2_δx (generic function with 1 method)

For example, evaluating 23x+x2 at x=0.5 gives p(0.5)=p,δ0.5=0.75; in code, we have

let
    p = Polynomial([2.0; -3.0; 1.0])
    δhalf = dualL2_δx(0.5, 2)
    p(0.5)  dotL2(p, δhalf)
end
true

4.2 Maps and forms

A central object in linear algebra is a linear map between two spaces. If V and U are two vector spaces, we denote the vector space of linear maps between them by L(V,U).

Closely associated with linear maps between two vector spaces are certain scalar-valued maps on pairs of vectors. A bilinear form is a map $a : \mathcal{U} \times \mathcal{V} \rightarrow \mathbb{F}$ which is linear in both arguments. Such a form can be identified with a linear map from $\mathcal{V}$ to $\mathcal{U}^*$ by partial evaluation: $v \mapsto a(\cdot, v)$. A sesquilinear form is linear in one argument and antilinear (conjugate linear) in the second argument, and can be identified with an antilinear map from $\mathcal{V}$ to $\mathcal{U}^*$. We are generally interested in bilinear forms over real spaces and sesquilinear forms over complex spaces. When we want to talk about real bilinear forms and complex sesquilinear forms at the same time, we will usually just say "sesquilinear forms."

On inner product spaces, each sesquilinear form $a : \mathcal{V} \times \mathcal{W} \rightarrow \mathbb{F}$ is associated with a unique linear map $L \in L(\mathcal{V}, \mathcal{W})$ such that $a(v, w) = \langle Lv, w \rangle_{\mathcal{W}}$. That is, sesquilinear forms can be identified with linear maps and vice-versa, and so we will address them together.

A linear map from a space to itself is called a linear operator. Linear operators have enough additional structure that we will deal with them separately.

4.2.1 Dual and adjoint maps

Any linear map $L \in L(\mathcal{V}, \mathcal{U})$ comes with an associated dual map in $L(\mathcal{U}^*, \mathcal{V}^*)$ with the action $u^* \mapsto u^* L$. If both spaces have inner products, then we compose the dual map with the Riesz map on both sides to get the adjoint $L^* \in L(\mathcal{U}, \mathcal{V})$ with the action $u \mapsto (u^* L)^*$. Equivalently, the adjoint map has the property that $\langle Lv, u \rangle_{\mathcal{U}} = \langle v, L^* u \rangle_{\mathcal{V}}$. In concrete spaces with the standard inner product, the adjoint map is represented by the conjugate transpose of the matrix of the original map.
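
For concrete real vectors with the standard inner product, the adjoint is just the transpose; a quick check of the defining identity with an arbitrary matrix and vectors (using dot from LinearAlgebra, as elsewhere in this chapter):

let
    A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
    v = [1.0; -1.0]
    u = [2.0; 0.0; 1.0]
    dot(A*v, u) ≈ dot(v, A'*u)
end
true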

4.2.2 Matrices

4.2.2.1 Matrices for maps

The usual representation of a linear map $L \in L(\mathbb{F}^n, \mathbb{F}^m)$ is via an $m$-by-$n$ matrix $A \in \mathbb{F}^{m \times n}$. Applying the function $y = Lx$ is equivalent to the matrix multiplication $y = Ax$:
$$y_i = \sum_{j=1}^n a_{ij} x_j.$$

More generally, if LL(V,U) and V and U are bases for V and U, then there is an associated matrix representation A=U1LV. The action of the matrix A comes from converting from Fn to V (via V), applying L to get a vector in U, and mapping from U back to the concrete space Fm. Equivalently, we can write the abstract operator as L=UAV1, i.e. we map from V to a concrete vector in Fn, map that to a concrete vector in Fm by matrix multiplication with A, and map Fm to U via the U basis.

If we consider the case where $V'$ and $U'$ are alternate bases for $\mathcal{V}$ and $\mathcal{U}$, there is an associated matrix representation
$$A' = (U')^{-1} L V' = \left((U')^{-1} U\right) A \left(V^{-1} V'\right).$$
Here the (invertible) matrix $V^{-1} V'$ represents a change of basis: if $v = V'c$ is a representation in the $V'$ basis and $v = Vd$ is a representation in the $V$ basis, then $d = V^{-1} V' c$. Similarly, the matrix $(U')^{-1} U$ converts coefficients in the $U$ basis to coefficients in the $U'$ basis.

4.2.2.2 Matrices for sesquilinear forms

Suppose $a : \mathcal{V} \times \mathcal{W} \rightarrow \mathbb{F}$ is a sesquilinear form and $V$ and $W$ are bases for $\mathcal{V}$ and $\mathcal{W}$. Then
$$a(Vx, Wy) = \sum_{j,i} x_j \bar{y}_i \, a(v_j, w_i) = y^* A x$$
where $A$ is the matrix with entries $a_{ij} = a(v_j, w_i)$. In an inner product space where we can write $a$ in terms of a linear operator $L$, we have
$$a(Vx, Wy) = \langle LVx, Wy \rangle_{\mathcal{W}} = y^* W^* L V x,$$
i.e. we can also think of $A$ as $W^* L V$.

We note that the matrix representation of the sesquilinear form is not identical to the matrix representation of $L$ as a linear map, except in the special case where we constrain $V$ and $W$ to be orthonormal bases. For more general bases, the expression $W^* L V$ involves both the matrix for the linear map ($W^{-1} L V$) and the Gram matrix ($M = W^* W$); that is, we can also write
$$y^* W^* L V x = y^* (W^* W) W^{-1} L V x = \langle W^{-1} L V x, y \rangle_M.$$

4.2.2.3 Representing derivatives

As a concrete example of matrix representations, we consider the derivative operator $D : \mathcal{P}_d \rightarrow \mathcal{P}_{d-1}$. With respect to the power basis for both $\mathcal{P}_d$ and $\mathcal{P}_{d-1}$, we write the relation $\frac{d}{dx} x^n = n x^{n-1}$ as $D X_{0:d} = X_{0:d-1} A$ where
$$A = \begin{bmatrix} 0 & 1 & & & \\ & 0 & 2 & & \\ & & \ddots & \ddots & \\ & & & 0 & d \end{bmatrix}.$$
In code, we have

function derivative_power(d)
    A = zeros(d,d+1)
    for i=1:d
        A[i,i+1] = i
    end
    A
end
derivative_power (generic function with 1 method)

Alternately, we can write $D = X_{0:d-1} A X_{0:d}^{-1}$ or $A = X_{0:d-1}^{-1} D X_{0:d}$.
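
As with our other computations, it is worth sanity checking derivative_power against the derivative function from Polynomials.jl. Here we build the power basis quasimatrices explicitly (the local names Xd and Xdm1 are just for this check):

let
    d = 5
    Xd   = transpose([Polynomial([zeros(j); 1.0]) for j = 0:d])    # X_{0:d}
    Xdm1 = transpose([Polynomial([zeros(j); 1.0]) for j = 0:d-1])  # X_{0:d-1}
    derivative.(Xd) == Xdm1*derivative_power(d)
end
true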

A similar relationship holds for the first and second kind Chebyshev polynomials: since $\frac{d}{dx} T_n(x) = n U_{n-1}(x)$, if we use the first-kind polynomial basis $T_{0:d}$ for $\mathcal{P}_d$ and the second-kind polynomial basis $U_{0:d-1}$ for $\mathcal{P}_{d-1}$, we have $D T_{0:d} = U_{0:d-1} A$ where $A$ is the same matrix as before. Using the change of basis, we have $D T_{0:d} = T_{0:d-1} B^{-1} A$. In code, the matrix representation $T_{0:d-1}^{-1} D T_{0:d} = B^{-1} A$ can be computed by

derivative_chebyshevT(d) =
    change_U_to_T(d-1)\derivative_power(d)
derivative_chebyshevT (generic function with 1 method)

It is always useful to compare against another implementation of the same computation, e.g.

let
    Dmatrix = zeros(5,6)
    DT = derivative.(chebyshev_basis(5))
    for j = 1:length(DT)
        for ci = collect(pairs(convert(ChebyshevT, DT[j])))
            Dmatrix[ci[1]+1,j] = ci[2]
        end
    end
    Dmatrix  derivative_chebyshevT(5)
end
true

Now consider a sesquilinear form on $\mathcal{P}_d \times \mathcal{P}_{d-1}$ given by
$$a(u, w) = \int_{-1}^1 \overline{w(x)} \, u'(x) \, dx.$$
If we use the Chebyshev basis for both arguments, we have
$$a(T_{0:d} x, T_{0:d-1} y) = y^* T_{0:d-1}^* D T_{0:d} x = y^* T_{0:d-1}^* T_{0:d-1} B^{-1} A x,$$
that is, the matrix is $T_{0:d-1}^* T_{0:d-1} B^{-1} A$. In terms of codes we have written, we have (for $d = 5$)

derivative_chebyshevT_bilinear(d) =
    gram_L2_chebyshev(d-1)*derivative_chebyshevT(d)
derivative_chebyshevT_bilinear (generic function with 1 method)

As before, we can sanity check against an alternate computation

let
    T = chebyshev_basis(5)
    a(p, q) = dotL2(derivative(p), q)

    derivative_chebyshevT_bilinear(5) 
        [a(T[j+1], T[i+1]) for i=0:4, j=0:5]
end
true

With respect to the Legendre basis, one can differentiate the Bonnet recurrence relation for $P_n$ and rearrange to get a recurrence relation for the derivatives:
$$P_{n+1}'(x) = (2n+1) P_n(x) + P_{n-1}'(x).$$
Together with the initial conditions $P_0'(x) = 0$ and $P_1'(x) = P_0(x)$, we have the following expression of $D$ with respect to the Legendre bases for $\mathcal{P}_d$ and $\mathcal{P}_{d-1}$, i.e. $D P_{0:d} = P_{0:d-1} C$:

function derivative_legendre(d)
    DP = zeros(d,d+1)
    if d > 0
        DP[1,2] = 1
    end
    for j = 1:d-1
        DP[:,  j+2] = DP[:,j]
        DP[j+1,j+2] = 2j+1
    end
    DP
end
derivative_legendre (generic function with 1 method)

As usual, we sanity check our computation:

derivative.(legendre_basis(3)) ==
    legendre_basis(2)*derivative_legendre(3)
true

If we want the derivative for the normalized Legendre polynomials, it is convenient to write them as $P_{0:d} S_d$ where $S_d$ is the diagonal matrix with elements $\sqrt{(2j+1)/2}$ for $j = 0$ to $d$. Then
$$D P_{0:d} S_d = P_{0:d-1} S_{d-1} \left( S_{d-1}^{-1} C S_d \right).$$
In code, we have

derivative_nlegendre(d) =
    sqrt.(2.0./(2*(0:d-1).+1)) .*
    derivative_legendre(d) .*
    sqrt.((2*(0:d).+1)/2)'
derivative_nlegendre (generic function with 1 method)

Unlike some of our earlier computations, the two approaches to computing the derivatives of the normalized Legendre polynomials differ by a little rounding error, so we check whether the results are close rather than exactly the same:

let
    err = normalized_legendre_basis(2)*derivative_nlegendre(3) -
        derivative.(normalized_legendre_basis(3))
    all(iszero.(truncate.(err, atol=1e-12)))
end
true

4.2.3 Block matrices

Sometimes we are interested in $L \in L(\mathcal{V}, \mathcal{U})$ where the two spaces are direct sums:
$$\mathcal{V} = \mathcal{V}_1 \oplus \mathcal{V}_2 \oplus \cdots \oplus \mathcal{V}_n, \qquad \mathcal{U} = \mathcal{U}_1 \oplus \mathcal{U}_2 \oplus \cdots \oplus \mathcal{U}_m.$$
In this case, we can write $u = Lv$ with the subspace decompositions $u = u_1 + \cdots + u_m$ and $v = v_1 + \cdots + v_n$ as
$$u_i = \sum_{j=1}^n L_{ij} v_j$$
where $L_{ij} \in L(\mathcal{V}_j, \mathcal{U}_i)$. We can write this in block quasimatrix notation as
$$\begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix} = \begin{bmatrix} L_{11} & \cdots & L_{1n} \\ \vdots & \ddots & \vdots \\ L_{m1} & \cdots & L_{mn} \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}.$$
Now suppose we have associated partitioned bases
$$V = \begin{bmatrix} V_1 & V_2 & \cdots & V_n \end{bmatrix}, \qquad U = \begin{bmatrix} U_1 & U_2 & \cdots & U_m \end{bmatrix}.$$
Then the matrix $A = U^{-1} L V$ can be thought of as a block matrix with blocks $A_{ij} = U_i^{-1} L_{ij} V_j$ associated with each of the $L_{ij}$. Concretely, if $u_i = U_i d_i$ and $v_j = V_j c_j$ for appropriately-sized concrete vectors $d_i$ and $c_j$, we have
$$\begin{bmatrix} d_1 \\ \vdots \\ d_m \end{bmatrix} = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}.$$
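
A minimal concrete sketch, assuming $\mathcal{V} = \mathcal{U} = \mathbb{R}^5$ split as $\mathbb{R}^2 \oplus \mathbb{R}^3$ and using ordinary submatrices for the blocks:

let
    L = randn(5,5)
    v = randn(5)
    L11, L12 = L[1:2,1:2], L[1:2,3:5]
    L21, L22 = L[3:5,1:2], L[3:5,3:5]
    v1, v2 = v[1:2], v[3:5]
    [L11*v1 + L12*v2; L21*v1 + L22*v2] ≈ L*v
end
true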

4.2.4 Canonical forms

A canonical form for a map LL(V,U) is a “simple as possible” matrix representation associated with a particular choice of bases. This also makes sense when V and U are concrete vector spaces, in which case the canonical form is a sort of matrix decomposition.

In general, we will consider two types of canonical forms: those associated with general choices of bases and those associated with orthonormal bases (when V and U are inner product spaces). The canonical forms are essentially the same whether we are working with linear maps or sesquilinear forms; we will focus on the linear map case.

4.2.4.1 Block decomposition

The first step to the canonical forms for a general linear map is to decompose $\mathcal{V}$ and $\mathcal{U}$ into special direct sums. For $\mathcal{V}$, we decompose into the kernel (or null space) of $L$, i.e. $\mathcal{N}(L) = \{v \in \mathcal{V} : Lv = 0\}$, along with any complementary space:
$$\mathcal{V} = \mathcal{V}_1 \oplus \mathcal{V}_2, \qquad \mathcal{V}_2 = \mathcal{N}(L).$$
For $\mathcal{U}$, we decompose into the range space $\mathcal{R}(L) = \{Lv : v \in \mathcal{V}\}$ and any complementary space:
$$\mathcal{U} = \mathcal{U}_1 \oplus \mathcal{U}_2, \qquad \mathcal{U}_1 = \mathcal{R}(L).$$
The null space $\mathcal{N}(L)$ and the range $\mathcal{R}(L)$ are sometimes also known as the kernel and the image of $L$. The space $\mathcal{U}_2$ is isomorphic to $\mathcal{U}/\mathcal{R}(L)$, sometimes called the cokernel, while the space $\mathcal{V}_1$ is isomorphic to $\mathcal{V}/\mathcal{N}(L)$, sometimes called the coimage. These four spaces (the kernel, cokernel, range, and coimage) play a key role in linear algebra. The rank $r$ is both the dimension of the range and the dimension of the coimage.

With respect to such a decomposition, we have the block quasimatrix
$$L = \begin{bmatrix} L_{11} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix}.$$
We have indicated the dimensions of the zero blocks here for clarity. The structure of this quasimatrix comes directly from the definition of the null space (which guarantees that the second block column should be zero) and the range space (which guarantees that the second block row should be zero). The block $L_{11} \in L(\mathcal{V}_1, \mathcal{U}_1)$ is one-to-one (because all null vectors of $L$ are in the $\mathcal{V}_2$ space) and onto (because $\mathcal{U}_1$ is the range space). Therefore, $L_{11}$ is invertible.

The block decomposition of a map is closely connected to the Fredholm alternative theorem. Consider the equation $Lx = b$, and suppose we decompose our spaces in terms of kernel, cokernel, range, and coimage as above to get the quasimatrix equation
$$\begin{bmatrix} L_{11} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$
Then one of two conditions must hold; the system is either

  • Consistent ($b_2 = 0$): There is a solution to the system. In fact, if $x$ is any solution (a particular solution), then the set of all possible solutions is $\{x + v : Lv = 0\}$, an $(n-r)$-dimensional set of solutions.
  • Inconsistent ($b_2 \neq 0$): There is no solution to the system, but there does exist $y \in \mathcal{U}^*$ such that $yL = 0$ and $yb \neq 0$.

Without the decomposition of U and V, the statement of the Fredholm alternative (for finite dimensional spaces) is a little mysterious. With the decomposition, it is almost a matter of inspection.

The decomposition of a map based on range and cokernel (for $\mathcal{U}$) and kernel and coimage (for $\mathcal{V}$) works equally well in the infinite-dimensional setting. Of course, in this setting some of the block dimensions will be infinite! But in some cases when the kernel and cokernel are finite-dimensional, we can define the index of $L$ to be the difference between the dimension of the kernel and the dimension of the cokernel (or the codimension of the range space, which is the same thing). In the finite-dimensional picture, this is
$$\dim(\mathcal{V}_2) - \dim(\mathcal{U}_2) = (n-r) - (m-r) = n - m.$$
In other words, the index generalizes the notion of "how many more unknowns than equations" there are.
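
A concrete sketch with a rank-deficient 2-by-3 matrix (so m = 2 equations and n = 3 unknowns), using rank from LinearAlgebra:

let
    A = [1.0 2.0 3.0; 2.0 4.0 6.0]     # rank 1
    r = rank(A)
    dim_kernel   = size(A,2) - r
    dim_cokernel = size(A,1) - r
    dim_kernel - dim_cokernel == size(A,2) - size(A,1)
end
true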

4.2.4.2 General bases

Let $\mathcal{V} = \mathcal{V}_1 \oplus \mathcal{V}_2$ and $\mathcal{U} = \mathcal{U}_1 \oplus \mathcal{U}_2$ as above (i.e. $\mathcal{U}_1$ is the range space and $\mathcal{V}_2$ is the null space). For any basis $V_1$ for $\mathcal{V}_1$, the quasimatrix $U_1 = L V_1$ is a basis for $\mathcal{U}_1$. Concatenating with any bases $V_2$ and $U_2$ for $\mathcal{V}_2$ and $\mathcal{U}_2$, we have bases $V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}$ for $\mathcal{V}$ and $U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}$ for $\mathcal{U}$. With respect to these bases, we have the matrix representation
$$U^{-1} L V = \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{bmatrix}.$$
This matrix representation is completely characterized by the dimensions of $\mathcal{V}$ and $\mathcal{U}$ and the rank $r$.

4.2.4.3 Orthonormal bases

In a general vector space, we can use any complementary subspaces to the range and null space of $L$. But in an inner product space, we prefer to use the orthogonal complements:
$$\begin{aligned} \mathcal{V}_1 &= \mathcal{N}(L)^\perp & &\text{coimage} \\ \mathcal{V}_2 &= \mathcal{N}(L) & &\text{kernel / null space} \\ \mathcal{U}_1 &= \mathcal{R}(L) & &\text{range / image} \\ \mathcal{U}_2 &= \mathcal{R}(L)^\perp & &\text{cokernel} \end{aligned}$$
These are sometimes referred to as the "four fundamental subspaces" for a linear map (see Strang ()).

Our canonical form in this setting involves a special choice for the bases $V_1$ and $U_1$. Specifically, we choose unit-length vectors $v_1, \ldots, v_r$ by successively maximizing $\|L v_j\|$ over all unit-length vectors in $\mathcal{V}$ orthogonal to $v_1, \ldots, v_{j-1}$. As we will discuss in , the vectors $L v_1, \ldots, L v_r$ themselves are then (nonzero) orthogonal vectors. The vectors $v_j$ are the right singular vectors, the norms $\sigma_j = \|L v_j\|$ are the singular values, and the vectors $u_j = L v_j / \sigma_j$ are the left singular vectors. Concatenating the orthonormal bases $V_1 = \begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix}$ and $U_1 = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}$ with orthonormal bases $V_2$ and $U_2$ for the orthogonal complements, we have the canonical form
$$U^* L V = \Sigma = \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & 0 \end{bmatrix}$$
where $\Sigma_{11}$ is the diagonal matrix with the singular values $\sigma_1, \ldots, \sigma_r$ on the diagonal. By construction, these singular values are nonzero and are listed in descending order. Frequently, we write this canonical form as the singular value decomposition (SVD) of $L$: $L = U \Sigma V^* = U_1 \Sigma_{11} V_1^*$.

4.2.4.4 Decomposed derivatives

We return again to our example of the derivative mapping from $\mathcal{P}_d$ to $\mathcal{P}_{d-1}$. For this example, the range space is all of $\mathcal{P}_{d-1}$ and the null space is the constant polynomials. Decomposing with respect to the $L^2([-1,1])$ inner product, the cokernel is the trivial subspace $\{0\}$ and the coimage is the space of mean-zero polynomials. Therefore, we can find general bases in which the matrix is $\begin{bmatrix} I_{d \times d} & 0_{d \times 1} \end{bmatrix}$.

More interesting is the singular value decomposition in the $L^2([-1,1])$ inner product. Here it is easiest to compute using the matrix representation of the derivative with respect to orthonormal bases like the normalized Legendre polynomials $\bar{P}_{0:d}$. Taking $C = U \Sigma V^T$ to be the singular value decomposition of the matrix representing the derivative, the singular value decomposition of the derivative map is $D = (\bar{P}_{0:d-1} U) \Sigma (\bar{P}_{0:d} V)^T$. That is, the orthonormal bases that diagonalize $D$ are given by $\hat{U} = \bar{P}_{0:d-1} U$ and $\hat{V} = \bar{P}_{0:d} V$.

The singular values of the derivative mapping from Pd to Pd1 depend on the dimension d of the space. It is illuminating to look at the behavior of these values in ascending order:

let
    dmax = 10

    # Compute the singular values for each degree d
    σs = zeros(dmax,dmax)
    for d = 1:dmax
        F = svd(derivative_nlegendre(d))
        σs[1:d,d] .= F.S[end:-1:1]
    end

    # Draw one line for the ith smallest singular value as a function of d
    p = plot(1:dmax, σs[1,:], legend=:false, marker=:star, xlabel="d")
    for i = 2:dmax
        plot!(i:dmax, σs[i,i:dmax], marker=:star)
    end
    p
end

As we increase d, we get more singular values; but each singular value (ordered from smallest to biggest) appears to fairly quickly converge from above to a limiting value. Indeed, we can make sense of these limits.

As an example, consider the smallest singular value and the associated basis vectors for the derivative acting on $\mathcal{P}_{10}$ with the $L^2([-1,1])$ inner product. We expect that regardless of the degree, we should have $D v_{\min} = \sigma_{\min} u_{\min}$ where $\sigma_{\min}$ is the smallest (nonzero) singular value and $u_{\min}$ and $v_{\min}$ are the associated singular vectors (columns in the $\hat{U}$ and $\hat{V}$ bases). By $d = 10$, we also expect to be close to the limiting behavior: the smallest singular value should converge to $\pi/2$, with associated singular vectors $v_{\min}(x) \propto \sin(\pi x/2)$ and $u_{\min}(x) \propto \cos(\pi x/2)$. This anticipated behavior is borne out in numerical experiment:

let
    # Compute the SVD of D acting on P_10
    d = 10
    F = svd(derivative_nlegendre(d))
    U = normalized_legendre_basis(d-1)*F.U
    V = normalized_legendre_basis(d)*F.V
    σs = F.S

    # Get the singular triple associated with
    # the smallest (nonzero) singular value
    σ, u, v = σs[end], U[end], V[end]

    # Check various relations (use a mesh of 100 points)
    xx = range(-1,1,length=100)
    s = sign(u(0))
    println("""
- |D*v-u*σ| = $(norm((derivative(v)-u*σ).(xx), Inf))
- |sin(π/2*x)-v| = $(norm(s*sin.(π/2*xx)-v.(xx), Inf))
- |cos(π/2*x)-u| = $(norm(s*cos.(π/2*xx)-u.(xx), Inf))
- |σmin-π/2| = $(abs(σ-π/2))
""")
end
- |D*v-u*σ| = 5.076326047387471e-14
- |sin(π/2*x)-v| = 4.919919749379886e-9
- |cos(π/2*x)-u| = 1.3086657329447118e-7
- |σmin-π/2| = 3.552713678800501e-15

In general, the kth smallest singular value converges to kπ/2, and the singular vectors converge to sines and cosines.

4.2.5 (Pseudo)inverses

Suppose $L \in L(\mathcal{V}, \mathcal{U})$ has a canonical form with respect to general bases of
$$L = U \begin{bmatrix} I_{r \times r} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)} \end{bmatrix} V^{-1}.$$
If $r < m$ or $r < n$, then $L$ does not have an inverse. However, we can define an associated pseudoinverse $M \in L(\mathcal{U}, \mathcal{V})$ by
$$M = V \begin{bmatrix} I_{r \times r} & 0_{r \times (m-r)} \\ 0_{(n-r) \times r} & 0_{(n-r) \times (m-r)} \end{bmatrix} U^{-1}.$$
Let $P_{\mathcal{V}} = ML$ and $P_{\mathcal{U}} = LM$. In terms of the bases, we have
$$P_{\mathcal{V}} = V \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0_{(n-r) \times (n-r)} \end{bmatrix} V^{-1}
\quad \text{and} \quad
P_{\mathcal{U}} = U \begin{bmatrix} I_{r \times r} & 0 \\ 0 & 0_{(m-r) \times (m-r)} \end{bmatrix} U^{-1}.$$
Both $P_{\mathcal{V}}$ and $P_{\mathcal{U}}$ are projectors, i.e. $P_{\mathcal{V}}^2 = P_{\mathcal{V}}$ and $P_{\mathcal{U}}^2 = P_{\mathcal{U}}$. The $P_{\mathcal{V}}$ projector is the identity on the subspace $\mathcal{V}_1$ complementary to the kernel, and its null space is $\mathcal{V}_2$. The $P_{\mathcal{U}}$ projector is the identity on the range space $\mathcal{U}_1$, and its null space is $\mathcal{U}_2$. If $L$ is one-to-one, $P_{\mathcal{V}}$ is simply the identity; and if $L$ is onto, $P_{\mathcal{U}}$ is the identity.

The pseudoinverse depends on the decompositions $\mathcal{V} = \mathcal{V}_1 \oplus \mathcal{N}(L)$ and $\mathcal{U} = \mathcal{R}(L) \oplus \mathcal{U}_2$, though it is independent of the details of the bases chosen for the four subspaces. In an inner product space, we most often choose to decompose $\mathcal{V}$ and $\mathcal{U}$ into orthogonal complements. This gives the Moore-Penrose pseudoinverse (often just called "the pseudoinverse" when the context does not specify some other pseudoinverse). The Moore-Penrose pseudoinverse plays a central role in least squares and minimal norm problems, as we discuss in . In terms of the singular value decomposition, the Moore-Penrose pseudoinverse is
$$L^\dagger = V \begin{bmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^*.$$
For the Moore-Penrose pseudoinverse, the associated projectors are $P_{\mathcal{V}} = V_1 V_1^*$ and $P_{\mathcal{U}} = U_1 U_1^*$. These are orthogonal projectors, since their range spaces are orthogonal complements of their null spaces. Projectors that are not orthogonal are said to be oblique projectors.

As an example, the Moore-Penrose pseudoinverse of the derivative operator from $\mathcal{P}_d$ to $\mathcal{P}_{d-1}$ is the indefinite integral operator from $\mathcal{P}_{d-1}$ to $\mathcal{P}_d$ where the constant of integration is chosen to make the result have zero mean. The associated projector on $\mathcal{P}_d$ is $q(x) \mapsto q(x) - \bar{q}$ where $\bar{q} = \frac{1}{2} \int_{-1}^1 q(x) \, dx$.
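
We can check this concretely in the normalized Legendre basis, where the derivative is represented by derivative_nlegendre(d). Composing its Moore-Penrose pseudoinverse (via LinearAlgebra's pinv) with the derivative matrix should give the orthogonal projector that zeros out the constant (first) coefficient, while the composition in the other order is the identity on coefficients for P_{d-1} (a small sketch):

let
    d = 5
    C = derivative_nlegendre(d)
    pinv(C)*C ≈ Diagonal([0.0; ones(d)]), C*pinv(C) ≈ Matrix(1.0I, d, d)
end
(true, true)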

4.2.6 Norms

The space $L(\mathcal{V}, \mathcal{U})$ is a vector space; and, as with other vector spaces, it makes sense to talk about norms. In particular, we frequently want norms that are consistent with vector norms on the range and domain spaces; that is, for any $L$ and $v$, if $u = Lv$ then $\|u\| \leq \|L\| \, \|v\|$. Particularly useful are the norms induced by vector norms on $\mathcal{V}$ and $\mathcal{U}$:
$$\|L\| = \sup_{v \neq 0} \frac{\|Lv\|}{\|v\|} = \sup_{\|v\|=1} \|Lv\|.$$
In a finite-dimensional space, the supremum is achieved, i.e. there exists a maximizing $v$.

Suppose $S \in L(\mathcal{U}, \mathcal{V})$ and $T \in L(\mathcal{V}, \mathcal{W})$, and we have consistent norms on the three vector spaces $\mathcal{U}, \mathcal{V}, \mathcal{W}$ and the spaces of linear maps $L(\mathcal{U}, \mathcal{V})$ and $L(\mathcal{V}, \mathcal{W})$. Then for all $u \in \mathcal{U}$, consistency implies
$$\|TSu\| \leq \|T\| \, \|Su\| \leq \|T\| \, \|S\| \, \|u\|.$$
Taking the supremum over $\|u\| = 1$ implies that for the induced norm on $L(\mathcal{U}, \mathcal{W})$, we have $\|TS\| \leq \|T\| \, \|S\|$. This fact is most often used when the norms on $L(\mathcal{U}, \mathcal{V})$ and $L(\mathcal{V}, \mathcal{W})$ are also induced norms.
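
A quick numerical illustration of this submultiplicativity with induced (spectral) norms of concrete matrices, using opnorm from LinearAlgebra:

let
    S, T = randn(3,4), randn(5,3)
    opnorm(T*S) <= opnorm(T)*opnorm(S)
end
true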

For inner product spaces, we can also define norms in terms of the singular value decomposition. Some examples include

  • The spectral norm (the largest of the singular values)
  • The Frobenius norm (the two-norm of the singular values)
  • The nuclear norm (the sum of the singular values)

All of these are also examples of Ky Fan norms (the Ky Fan $k$-norm is the sum of the $k$ largest singular values) and Schatten norms (the Schatten $p$-norm is the $p$-norm of the vector of singular values). The spectral norm is the norm induced by the Euclidean structure on the two spaces. All the other Ky Fan and Schatten norms are consistent, though they are not induced norms.

For concrete vector spaces, we can write the Frobenius norm as
$$\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}.$$
The induced norms corresponding to the vector 1-norm and $\infty$-norm are
$$\begin{aligned} \|A\|_1 &= \max_j \sum_i |a_{ij}| & &\text{(max absolute column sum)} \\ \|A\|_\infty &= \max_i \sum_j |a_{ij}| & &\text{(max absolute row sum)}. \end{aligned}$$
The norm induced by the standard Euclidean norm (the spectral norm) is harder to compute without appealing to the singular value decomposition.
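
In Julia, the induced matrix norms are available via opnorm (while norm applied to a matrix gives the Frobenius norm); a small check on a 2-by-2 example:

let
    A = [1.0 -2.0; 3.0 4.0]
    opnorm(A,1) == 6.0, opnorm(A,Inf) == 7.0, norm(A) ≈ sqrt(30), opnorm(A) ≈ maximum(svdvals(A))
end
(true, true, true, true)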

4.2.7 Isometries

A linear map that preserves norms is called an isometry. Isometries are always one-to-one, since any null vector must have zero norm. If an isometry is also onto, it is sometimes called a global isometry. Global isometries are invertible, and the inverse is also an isometry.

For general norms, there are not always many interesting isometries. For example, the only isometries from Rn to itself preserving the 1-norm or the -norm are (signed) permutations. For inner product spaces, though, there is a much richer group of isometries, and we will focus on this case.

Isometries between inner product spaces necessarily preserve inner products, since the inner product can be recovered from the associated norm. For global isometries between inner product spaces, the adjoint and the inverse are the same. A global isometry from a space to itself is called an orthogonal operator in the real case, or a unitary operator in the complex case.

A unitary operator maps an orthonormal basis to another orthonormal basis. Therefore, if $L \in L(\mathcal{V},\mathcal{U})$ and $W$ and $Z$ are unitary operators on $\mathcal{U}$ and $\mathcal{V}$, respectively, then $L$ and $WLZ$ have the same singular values. This also means that any norms that can be defined in terms of singular values (including the spectral norm, the Frobenius norm, and the nuclear norm) are the same for $L$ and $WLZ$. This property is known as unitary invariance (or orthogonal invariance in the real case).
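
A small numerical sketch of unitary invariance: we build random orthogonal factors from QR factorizations of random matrices and check that the singular values, and hence the unitarily invariant norms, are unchanged.

let
    A = randn(5, 4)

    # Random orthogonal W and Z via QR factorizations of random matrices
    W = Matrix(qr(randn(5, 5)).Q)
    Z = Matrix(qr(randn(4, 4)).Q)
    B = W*A*Z

    # The singular values, and hence any unitarily invariant norm, match
    svdvals(B) ≈ svdvals(A),
    opnorm(B) ≈ opnorm(A),
    norm(B) ≈ norm(A)
end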

4.2.8 Volume scaling

In an inner product space, we usually say that a parallelepiped associated with an orthonormal basis (i.e. a unit cube) has unit volume. A unitary operator between two spaces maps a unit cube in one space to a unit cube in the other space, i.e. it preserves volume. More generally, if $L \in L(\mathcal{V},\mathcal{U})$ and $V$ and $U$ are orthonormal bases associated with the singular value canonical form, then we can see $L = U \Sigma V^*$ as

  • A volume-preserving transformation $V^*$ from $\mathcal{V}$ into $\mathbb{R}^n$ (or $\mathbb{C}^n$).

  • A diagonal scaling operation $\Sigma$ on $\mathbb{R}^n$ that changes each side length from 1 to $\sigma_j$. The volume of the scaled parallelepiped in $\mathbb{R}^n$ is then $\prod_{j=1}^n \sigma_j$.

  • A volume-preserving transformation $U$ from $\mathbb{R}^n$ into $\mathcal{U}$.

Therefore, if $\Omega \subset \mathcal{V}$ is a region with volume $\operatorname{vol}(\Omega)$, then $$\operatorname{vol}(L\Omega) = \left( \prod_{j=1}^n \sigma_j \right) \operatorname{vol}(\Omega).$$ In other words, the product of the singular values gives the magnitude of the determinant of $L$. We will discuss determinants and signed volumes in .
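
We have not yet formally defined the determinant, but Julia's `det` lets us check the claim numerically; the sketch below compares $|\det(A)|$ with the product of the singular values.

let
    A = randn(4, 4)
    abs(det(A)) ≈ prod(svdvals(A))
end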

4.3 Operators

An operator is a map $L \in L(\mathcal{V},\mathcal{V})$. Everything we know about general maps also works for operators, but we have a lot of additional structure coming from the fact that the space is mapped to itself.

Most of our discussion of operators focuses on three connected ideas:

  • A key feature of an operator $L$ is its invariant subspaces, i.e. subspaces $\mathcal{U} \subset \mathcal{V}$ such that $L\mathcal{U} = \{Lu : u \in \mathcal{U}\}$ lies within $\mathcal{U}$. Given an invariant subspace, we can analyze the associated restriction $L|_{\mathcal{U}}$ of $L$ to $\mathcal{U}$. One-dimensional invariant subspaces play a particularly notable role, as they contain the eigenvectors of $L$; the restrictions are the associated eigenvalues.

  • We can sensibly talk about polynomials in $L$: if $p(z) = \sum_{j=0}^d c_j z^j$, then $p(L) = \sum_{j=0}^d c_j L^j$. Such polynomials are useful for designing and analyzing numerical methods (e.g. Krylov subspace methods for linear systems and eigenvalue problems). They are also useful as a more purely theoretical tool.

In , we will also discuss an alternative approach to analyzing operators by applying the tools of complex analysis to the resolvent $R(z) = (zI - L)^{-1}$, a rational function on $\mathbb{C}$ whose poles correspond to eigenvalues. This perspective is useful both for computation and theory. It is particularly useful if we want to generalize from the finite-dimensional to the infinite-dimensional setting.

4.3.1 Minimal polynomial

If $\mathcal{V}$ is an $n$-dimensional space, then $L(\mathcal{V},\mathcal{V})$ is an $n^2$-dimensional space, and so there must be a linear dependency between the set of $n^2+1$ operators $\{L^j\}_{j=0}^{n^2}$. Consequently, there must be an annihilating polynomial of degree at most $n^2$, i.e. a polynomial $p$ such that $p(L) = 0$. Among the annihilating polynomials for $L$, there must be some monic annihilating polynomial $p_{\min}$ of minimal degree $m$. We call $p_{\min}$ the minimal polynomial.

If $p$ is any annihilating polynomial, polynomial division lets us write $p(z) = q(z) p_{\min}(z) + r(z)$ with $\deg(r) < m$. Because $p_{\min}(L) = 0$, we have $0 = p(L) = q(L) p_{\min}(L) + r(L) = r(L)$, and therefore $r = 0$; otherwise $r$ would be an annihilating polynomial of lower degree than $m$, contradicting minimality. Therefore any annihilating polynomial can be written as $q(z) p_{\min}(z)$ for some other polynomial $q(z)$. As a direct consequence, the minimal polynomial is unique. Also as a consequence, the minimal polynomial of $L|_{\mathcal{U}}$ for any invariant subspace $\mathcal{U}$ divides $p_{\min}$ for $L$, since $p_{\min}$ for $L$ is an annihilating polynomial for $L|_{\mathcal{U}}$.
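
A small illustration (our own, not from the development above): a diagonalizable matrix with the repeated eigenvalue 1 and a simple eigenvalue 2 has minimal polynomial $(z-1)(z-2)$, of degree 2 even though $n = 3$.

let
    # A diagonalizable matrix with eigenvalues 1, 1, 2 (note the repeat)
    X = randn(3, 3) + 3I              # almost surely invertible
    A = X * Diagonal([1.0, 1.0, 2.0]) / X

    # The minimal polynomial (z-1)(z-2) annihilates A, with degree 2 < n = 3,
    # while neither linear factor annihilates A on its own
    norm((A - I)*(A - 2I)) < 1e-8,
    norm(A - I) > 1e-8,
    norm(A - 2I) > 1e-8
end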

We can write the minimal polynomial in factored form (over $\mathbb{C}$) as $$p_{\min}(z) = \prod_{j=1}^s (z - \lambda_j)^{d_j}.$$ Each factor $L - \lambda_j I$ must be singular; otherwise, we would have $(L - \lambda_j I)^{-1} p_{\min}(L) = 0$, so that $p_{\min}(z)/(z-\lambda_j)$ would be an annihilating polynomial of lower degree, contradicting minimality.

The $\lambda_j$ are the eigenvalues of $L$, and the null spaces of each $L - \lambda_j I$ consist of eigenvectors. If $d_j = 1$ for each eigenvalue, then there is a basis of eigenvectors; otherwise, we have to consider generalized eigenvectors as well to get a basis. In either case, the space $\mathcal{V}$ can be decomposed into a direct sum of invariant subspaces $N((L - \lambda_j I)^{d_j})$; these are the maximal invariant subspaces associated with the eigenvalues.

If we let $m_j = \dim N((L - \lambda_j I)^{d_j})$, we can also define the characteristic polynomial $$p_{\mathrm{char}}(z) = \prod_{j=1}^s (z - \lambda_j)^{m_j}.$$ The dimension $m_j$ is the algebraic multiplicity of the eigenvalue $\lambda_j$. The geometric multiplicity is the dimension of $N(L - \lambda_j I)$, i.e. the number of independent eigenvectors for $\lambda_j$.

The characteristic polynomial is often also written as $p_{\mathrm{char}}(z) = \det(zI - L)$, but writing the characteristic polynomial in terms of determinants has the disadvantage that we need an appropriate definition of determinants first!
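
Numerically, we can at least compare the determinant form with the product form at a sample point; this sketch assumes only `det` and `eigvals` from LinearAlgebra.

let
    A = randn(4, 4)
    z = randn()

    # det(zI - A) matches the product form of the characteristic polynomial
    det(z*I - A) ≈ prod(z .- eigvals(A))
end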

4.3.2 Canonical forms

As in the case of maps between different spaces, we are interested in canonical matrix representations for operators LL(V,V), i.e. choices of bases that make the matrix representation “as simple as possible.” We consider two such forms: one involving an arbitrary basis, and the other involving an orthonormal basis.

4.3.2.1 Jordan form

As we noted above, the space $\mathcal{V}$ can be written as $\mathcal{V} = \mathcal{V}_1 \oplus \mathcal{V}_2 \oplus \cdots \oplus \mathcal{V}_s$, where $\mathcal{V}_j$ is the maximal invariant subspace associated with the eigenvalue $\lambda_j$, i.e. $\mathcal{V}_j = N((L - \lambda_j I)^{d_j})$. With respect to this partitioning of $\mathcal{V}$, we have the block diagonal quasimatrix representation $$\begin{bmatrix} L_{11} & & & \\ & L_{22} & & \\ & & \ddots & \\ & & & L_{ss} \end{bmatrix}$$ where $L_{jj}$ represents the restriction $L|_{\mathcal{V}_j}$. With a choice of bases for each of the $\mathcal{V}_j$, this leads to a block diagonal matrix representation.

When the minimal polynomial exponent $d_j$ is 1, we have $\mathcal{V}_j = N(L - \lambda_j I)$, and so $L|_{\mathcal{V}_j} = \lambda_j I$. Hence, if there are no repeated zeros of the minimal polynomial, we can choose bases $V_j$ for each subspace $\mathcal{V}_j$, and each will satisfy $L V_j = V_j (\lambda_j I)$. Putting these together, we have the concatenated basis $V = \begin{bmatrix} V_1 & V_2 & \cdots & V_s \end{bmatrix}$ satisfying $LV = V\Lambda$, where $\Lambda$ is a diagonal matrix in which each eigenvalue $\lambda_j$ appears $\dim \mathcal{V}_j$ times in a row. With respect to the eigenvector basis $V$, we therefore have the canonical matrix representation $\Lambda = V^{-1} L V$. Hence, such operators are diagonalizable.
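
A random matrix almost surely has distinct eigenvalues, so `eigen` gives a quick sketch of the diagonalizable case: the eigenvector basis $V$ satisfies $AV = V\Lambda$, and the matrix of the operator in that basis is diagonal.

let
    # A random matrix is (almost surely) diagonalizable with distinct eigenvalues
    A = randn(4, 4)
    F = eigen(A)
    V, Λ = F.vectors, Diagonal(F.values)

    # A V = V Λ, and the matrix of the operator in the eigenvector basis is Λ
    A*V ≈ V*Λ,
    V \ (A*V) ≈ Λ
end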

When the minimal polynomial has zeros with multiplicity greater than one, the matrix is not diagonalizable. In this case, we cannot form a basis of eigenvectors, but we can form a basis if we include generalized eigenvectors. In the Jordan canonical form, we find bases consisting of Jordan chains, i.e. vectors satisfying $$(L - \lambda_j I) u_1 = 0, \qquad (L - \lambda_j I) u_k = u_{k-1}, \quad 1 < k \leq b.$$ Rewriting the second expression as $L u_k = u_{k-1} + \lambda_j u_k$ and defining the quasimatrix $U = \begin{bmatrix} u_1 & \cdots & u_b \end{bmatrix}$, we can summarize with the matrix equation $LU = U J_{b \times b}(\lambda_j)$ where $$J_{b \times b}(\lambda_j) = \begin{bmatrix} \lambda_j & 1 & & \\ & \lambda_j & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_j \end{bmatrix}$$ is a $b$-dimensional Jordan block. The maximum size of any Jordan block is equal to the multiplicity $d_j$ of $\lambda_j$ in the minimal polynomial; the number of Jordan blocks for $\lambda_j$ is equal to the geometric multiplicity $\dim N(L - \lambda_j I)$; and the sum of the sizes of the Jordan blocks is equal to the algebraic multiplicity $\dim N((L - \lambda_j I)^{d_j})$. Taking a basis formed from concatenating such Jordan chains for each space, we have the Jordan canonical form, a block diagonal matrix in which each diagonal block is a Jordan block.

The Jordan form is fragile: almost all operators are diagonalizable and have distinct eigenvalues, and infinitesimally small changes will usually completely destroy any nontrivial Jordan blocks. Moreover, diagonalizable operators that are close to operators with nontrivial Jordan blocks may have eigenvector bases, but those eigenvector bases are not all that nice (they are nearly linearly dependent). We therefore would like an alternate canonical form that gives up some of the goal of being "as diagonal as possible."
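
A quick sketch of this fragility: perturbing the subdiagonal of a 2-by-2 Jordan block by $\epsilon$ moves the eigenvalue by about $\sqrt{\epsilon}$, so an $O(10^{-10})$ change to the matrix produces an $O(10^{-5})$ change in the eigenvalues (we compare with a loose relative tolerance, since the perturbed eigenvalues are themselves ill-conditioned).

let
    # A 2-by-2 Jordan block with eigenvalue 2, and an O(1e-10) perturbation
    ϵ = 1e-10
    J  = [2.0 1.0; 0.0 2.0]
    Jϵ = [2.0 1.0; ϵ   2.0]

    # The perturbed eigenvalues are 2 ± sqrt(ϵ): an O(1e-5) change
    isapprox(maximum(abs.(eigvals(Jϵ) .- 2)), sqrt(ϵ), rtol=1e-2)
end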

4.3.2.2 Schur form

In our discussion of the Jordan form, we started with the decomposition of $\mathcal{V}$ into maximal invariant subspaces associated with each eigenvalue. For the Schur form, we decompose $\mathcal{V}$ in terms of nested invariant subspaces $\mathcal{W}_k = \bigoplus_{j=1}^k \mathcal{V}_j$. Assuming $\mathcal{V}$ is an inner product space, we can also write $\mathcal{W}_k = \bigoplus_{j=1}^k \mathcal{U}_j$, where $\mathcal{U}_1 = \mathcal{V}_1$ and otherwise $\mathcal{U}_j$ is the orthogonal complement of $\mathcal{W}_{j-1}$ in $\mathcal{W}_j$. With respect to the decomposition $\mathcal{V} = \bigoplus_{j=1}^s \mathcal{U}_j$, we have the block upper triangular quasimatrix representation $$\begin{bmatrix} L_{11} & L_{12} & \cdots & L_{1s} \\ & L_{22} & \cdots & L_{2s} \\ & & \ddots & \vdots \\ & & & L_{ss} \end{bmatrix}$$ With an appropriate choice of orthonormal basis $U_j$ for each $\mathcal{U}_j$ and setting $U = \begin{bmatrix} U_1 & U_2 & \cdots & U_s \end{bmatrix}$, we have $U^* L U = T$ where $T$ is an upper triangular matrix with the eigenvalues on the diagonal. This is a (complex) Schur canonical form.

The complex Schur form may in general involve complex choices of the basis U and a complex matrix representation T, even if V is most naturally treated as a vector space over the reals. However, there is a real Schur form which involves a real orthonormal basis and a block upper triangular matrix T where the diagonal blocks are 1-by-1 (for real eigenvalues) or 2-by-2 (for conjugate pairs of eigenvalues).
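In Julia, `schur` computes these factorizations; the sketch below uses the complex Schur form (calling `schur` on the real matrix directly would give the real, quasi-triangular form). We check that the basis is unitary, that $T$ is upper triangular, and that the factors reproduce $A$.

let
    A = randn(5, 5)

    # Complex Schur form: A = U T U' with U unitary, T upper triangular
    F = schur(complex(A))
    U, T = F.Z, F.T

    U'*U ≈ I,
    norm(tril(T, -1)) ≤ 1e-12*norm(T),    # T is (numerically) upper triangular
    U*T*U' ≈ A                            # the factors reproduce A
end
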

4.3.3 Similar matrices

So far, we have focused on matrices as representations of operators on abstract vector spaces, where we choose one basis $V$ for the space $\mathcal{V}$: $A = V^{-1} L V$. If we consider an alternate basis $V' = VX$ for an invertible matrix $X$, we have the associated matrix representation $$A' = V'^{-1} L V' = (V^{-1} V')^{-1} A (V^{-1} V') = X^{-1} A X.$$ The transformation $A \mapsto X^{-1} A X$ is a similarity transformation, and the matrices $A$ and $A'$ are said to be similar.

The properties we have discussed so far (eigenvalues, the minimal and characteristic polynomials, the Jordan form) are fundamentally properties of operators, and do not depend on the basis in which those operators are expressed. As much as possible, we try to make this clear by giving basis-free explanations. But it is also possible to give matrix-based arguments in which we argue for invariance under similarity transformations. For example, if $\hat{A} = X^{-1} A X$, then $\hat{A}^j = (X^{-1} A X)^j = X^{-1} A^j X$ and, more generally, $p(\hat{A}) = X^{-1} p(A) X$. Therefore, any annihilating polynomial for $A$ is also an annihilating polynomial for $\hat{A}$; and the minimal and characteristic polynomials for $A$ will similarly be the minimal and characteristic polynomials for $\hat{A}$.
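
A short numerical sketch: the eigenvalues are unchanged under a (random, almost surely well-conditioned) similarity transformation.

let
    A = randn(4, 4)
    X = randn(4, 4) + 4I       # almost surely invertible, modest conditioning
    B = X \ (A*X)              # similar to A

    byreim = z -> (real(z), imag(z))
    sort(eigvals(B), by=byreim) ≈ sort(eigvals(A), by=byreim)
end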

4.3.4 Trace

We typically describe the trace in terms of a matrix representation: the trace of a matrix is the sum of its diagonal entries, i.e. $\operatorname{tr}(A) = \sum_{i=1}^n a_{ii}$. The trace satisfies the cyclic property that for any $B \in F^{m \times n}$ and $C \in F^{n \times m}$ we have $$\operatorname{tr}(BC) = \sum_{i=1}^m \sum_{j=1}^n b_{ij} c_{ji} = \sum_{j=1}^n \sum_{i=1}^m c_{ji} b_{ij} = \operatorname{tr}(CB).$$ Using this property, for any invertible $X \in F^{n \times n}$ we have $\operatorname{tr}(X^{-1} A X) = \operatorname{tr}(A X X^{-1}) = \operatorname{tr}(A)$. That is, the trace is invariant under similarity, and is thus a property of the underlying linear operator and not just of a particular matrix representation. In particular, if $X^{-1} A X = J$ is a Jordan canonical form, we can see that the trace is the sum of the eigenvalues (counting algebraic multiplicity).
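
The corresponding one-line checks in code (a sketch using only `tr` and `eigvals`):

let
    A = randn(4, 4)
    B = randn(4, 5)
    C = randn(5, 4)
    X = randn(4, 4) + 4I       # almost surely invertible

    tr(B*C) ≈ tr(C*B),         # cyclic property
    tr(X \ (A*X)) ≈ tr(A),     # similarity invariance
    tr(A) ≈ sum(eigvals(A))    # trace = sum of the eigenvalues
end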

4.3.5 Frobenius inner product

The trace is also useful for defining other structures on spaces of linear maps in a basis-free way. Let $L \in L(\mathcal{V},\mathcal{U})$ be a linear map between two inner product spaces, and let $L = U \Sigma V^*$ be the singular value decomposition. Then $$\operatorname{tr}(L^* L) = \operatorname{tr}(V \Sigma^2 V^*) = \operatorname{tr}(\Sigma^2 V^* V) = \operatorname{tr}(\Sigma^2) = \sum_{j=1}^n \sigma_j^2 = \|L\|_F^2,$$ where $\|L\|_F^2$ is the squared Frobenius norm. Also, if $B$ is any matrix representation of $L$ with respect to orthonormal bases, then we have $$\|L\|_F^2 = \|B\|_F^2 = \operatorname{tr}(B^* B) = \sum_{i,j} |b_{ij}|^2,$$ so it is very simple to compute the Frobenius norm of $L$ given any matrix representation with respect to orthonormal bases.

More generally, the trace can be used to define the Frobenius inner product on the space $L(\mathcal{V},\mathcal{W})$ via $\langle L, M \rangle_F = \operatorname{tr}(M^* L)$. The Frobenius norm is the standard Euclidean norm associated with the Frobenius inner product.
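
For concrete matrices, the Frobenius inner product is just the ordinary dot product on the matrix entries; a brief sketch:

let
    L = randn(5, 3)
    M = randn(5, 3)

    tr(M'*L) ≈ dot(vec(M), vec(L)),    # Frobenius inner product, entrywise
    tr(L'*L) ≈ norm(L)^2,              # squared Frobenius norm via the trace
    norm(L)^2 ≈ sum(svdvals(L).^2)     # ... and via the singular values
end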

4.3.6 Hermitian and skew

If $L \in L(\mathcal{V},\mathcal{V})$, then $$L = H + S, \qquad H = \frac{1}{2}(L + L^*), \qquad S = \frac{1}{2}(L - L^*).$$

The operators $H$ and $S$ are known as the Hermitian and skew-Hermitian parts of $L$. We note that $\langle H, S \rangle_F = 0$, so by the Pythagorean theorem, $\|L\|_F^2 = \|H\|_F^2 + \|S\|_F^2$. More generally, $L(\mathcal{V},\mathcal{V})$ is a direct sum of Hermitian and skew-Hermitian subspaces, and these subspaces are orthogonal complements of each other.
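
A sketch of the decomposition and the Pythagorean identity for a real matrix (where $\langle H, S \rangle_F$ is exactly zero):

let
    A = randn(5, 5)
    H = (A + A')/2             # Hermitian (here symmetric) part
    S = (A - A')/2             # skew part

    A ≈ H + S,
    isapprox(tr(S'*H), 0, atol=1e-12),     # ⟨H,S⟩_F = 0
    norm(A)^2 ≈ norm(H)^2 + norm(S)^2      # Pythagorean identity
end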

Any operator $L$ has a Schur canonical form $U^* L U = T$ where $T$ is an upper triangular matrix. For a Hermitian matrix $H$, we have $T = U^* H U = (U^* H U)^* = T^*$, which implies that $T$ must be diagonal (since the subdiagonal elements of $T$ are zero and the superdiagonal elements of $T^*$ are zero), and that the diagonal of $T$ must be real (since $t_{ii} = \bar{t}_{ii}$). Similarly, for a skew-Hermitian matrix $S$, we have $T = U^* S U = -(U^* S U)^* = -T^*$, which implies that in this case $T$ is diagonal and the diagonal is imaginary. Hence, the decomposition of a matrix into Hermitian and skew-Hermitian parts in many ways mirrors the decomposition of a complex number into real and imaginary parts.

There is an even stronger connection between the Hermitian / skew-Hermitian decomposition and the complex numbers in the case of $\mathbb{R}^{2 \times 2}$. Let $g : \mathbb{C} \to \mathbb{R}^{2 \times 2}$ be the map $$g(a + bi) = aI + bJ, \qquad J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$ Then $$g(0) = 0, \quad g(1) = I, \quad g(z + w) = g(z) + g(w), \quad g(zw) = g(z)g(w), \quad g(z^{-1}) = g(z)^{-1}.$$ That is, the complex numbers are isomorphic to the subset of the 2-by-2 real matrices where the symmetric part is proportional to the identity.
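
We can check the homomorphism properties numerically; the sign convention for $J$ in this sketch is one of the two possible choices (the properties hold either way):

let
    J = [0.0 -1.0; 1.0 0.0]
    g(z) = real(z)*I + imag(z)*J       # g : C -> 2-by-2 real matrices

    z, w = 1.0 + 2.0im, -0.5 + 0.3im

    g(z + w) ≈ g(z) + g(w),
    g(z * w) ≈ g(z) * g(w),
    g(1/z)   ≈ inv(g(z))
end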

4.4 Hermitian and quadratic forms

A sesquilinear form $a : \mathcal{V} \times \mathcal{V} \to F$ is Hermitian if $a(v,w) = \overline{a(w,v)}$. If $a(v,w) = -\overline{a(w,v)}$, the form is called skew-Hermitian. Any sesquilinear form on $\mathcal{V} \times \mathcal{V}$ can be decomposed into Hermitian and skew-Hermitian parts: $$a(v,w) = a_H(v,w) + a_S(v,w), \qquad a_H(v,w) = \frac{1}{2}\left(a(v,w) + \overline{a(w,v)}\right), \qquad a_S(v,w) = \frac{1}{2}\left(a(v,w) - \overline{a(w,v)}\right).$$ We use the terms Hermitian and skew-Hermitian generically to also refer to symmetric and skew-symmetric forms on real vector spaces.

If $\mathcal{V}$ is an inner product space, then for any sesquilinear form $a : \mathcal{V} \times \mathcal{V} \to F$, there is an associated operator $L : \mathcal{V} \to \mathcal{V}$ such that $a(v,w) = \langle Lv, w \rangle$. When $a$ is Hermitian, we have $\langle Lv, w \rangle = a(v,w) = \overline{a(w,v)} = \langle v, Lw \rangle$, i.e. $L = L^*$ is a self-adjoint operator. A similar argument shows that if $a$ is skew-Hermitian, the corresponding operator satisfies $L^* = -L$. Hence, the study of Hermitian forms is closely linked to the study of self-adjoint operators.

A quadratic form is a function $\phi : \mathcal{V} \to F$ that is homogeneous of degree 2 (i.e. $\phi(\alpha v) = \alpha^2 \phi(v)$) and such that $(u,v) \mapsto \phi(u+v) - \phi(u) - \phi(v)$ is bilinear. The associated bilinear form is always symmetric. Hence, on real vector spaces the study of Hermitian forms is also closely linked to the study of quadratic forms.

4.4.1 Matrices

Given a basis $V$ for $\mathcal{V}$, the standard matrix representation of a sesquilinear form on $\mathcal{V} \times \mathcal{V}$ is a square matrix with entries $a_{ij} = a(v_j, v_i)$, so that $a(Vc, Vd) = d^* A c$. Decomposing the form into Hermitian and skew-Hermitian parts corresponds to decomposing $A = H + S$ where $H = H^*$ and $S = -S^*$ are the Hermitian and skew-Hermitian parts of the matrix.

A quadratic form $\phi$ has an associated symmetric bilinear form $a(u,v) = \frac{1}{2}(\phi(u+v) - \phi(u) - \phi(v))$. The matrix representation for a quadratic form is $\phi(Vc) = c^T A c$ where the entries of $A$ are $a_{ij} = a(v_i, v_j)$. Conversely, for any real bilinear form $a$, we have that $\phi(v) := a(v,v)$ is a quadratic form, and $\phi(Vc) = c^T H c$ where $H$ is the matrix for the symmetric part of $a$.

Let $A$ be a matrix representing a sesquilinear form $a : \mathcal{V} \times \mathcal{V} \to F$ with respect to a basis $V$, i.e. $a(Vc, Vd) = d^* A c$. We can write any other basis as $\hat{V} = V X$ for some nonsingular $X$; with respect to the $\hat{V}$ basis, we have $$a(\hat{V} c, \hat{V} d) = a(VXc, VXd) = d^* (X^* A X) c.$$ For any nonsingular $X$, we say $A$ and $\hat{A} = X^* A X$ are congruent. Just as similar matrices represent the same operator under different bases, congruent matrices represent the same sesquilinear form under different bases. When we restrict our attention to orthonormal bases, the matrix $X$ will be unitary; we note that $X^* = X^{-1}$ for unitary $X$, so a unitary congruence and a unitary similarity are the same thing.
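
A small sketch of congruence as a change of basis for a form: evaluating the form on coefficient vectors in the new basis, using $\hat{A} = X^* A X$, agrees with evaluating it on the mapped coefficients in the old basis.

let
    A = randn(4, 4)            # matrix of a bilinear form in the V basis
    X = randn(4, 4) + 4I       # change of basis V̂ = V X
    Â = X'*A*X                 # congruent matrix: the form in the V̂ basis

    # Evaluating the form on coefficients in the new basis agrees with
    # evaluating it on the mapped coefficients X*ĉ, X*d̂ in the old basis
    ĉ, d̂ = randn(4), randn(4)
    d̂'*Â*ĉ ≈ (X*d̂)'*A*(X*ĉ)
end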

4.4.2 Canonical forms

In the case of a general basis, there exists a basis such that the matrix representation of a Hermitian form is $$\begin{bmatrix} I_{\nu_+} & & \\ & 0_{\nu_0} & \\ & & -I_{\nu_-} \end{bmatrix}.$$ The triple $\nu = (\nu_+, \nu_0, \nu_-)$ is the inertia of the form. Inertia, like rank, is a property of the linear algebraic object (in this case the Hermitian form) rather than its representation in any specific basis. Let $n = \dim(\mathcal{V})$; then we classify the form based on its inertia as:

  • Positive definite if $\nu = (n, 0, 0)$
  • Positive semidefinite if $\nu = (r, n-r, 0)$ for some $r$
  • Negative definite if $\nu = (0, 0, n)$
  • Negative semidefinite if $\nu = (0, n-r, r)$ for some $r$
  • Strongly indefinite if $\nu_+ > 0$ and $\nu_- > 0$

A Hermitian positive definite form is an inner product.

If $\mathcal{V}$ is an inner product space and we restrict ourselves to orthonormal bases, we have a diagonal matrix $\Lambda$ with $\nu_+$ positive values, $\nu_0$ zeros, and $\nu_-$ negative values. If we identify $a(v,w)$ with $\langle Lv, w \rangle$, then this canonical form gives $a(v_i, v_j) = \langle Lv_i, v_j \rangle = \lambda_i \delta_{ij}$. The Hermitian property implies that $a(v_i, v_i) = \overline{a(v_i, v_i)} = \lambda_i$, so $\lambda_i$ must be real even when the form is over $\mathbb{C}$. Given the orthonormality of the basis, this means $L v_j = \lambda_j v_j$, i.e. the canonical decomposition of the Hermitian form is really the same as the eigenvalue decomposition of the associated self-adjoint operator $L$.

The Hermitian eigenvalue problem has an enormous amount of structure that comes from the fact that it can be interpreted either in terms of a Hermitian form (or the associated quadratic form) or in terms of an operator.

4.4.3 A derivative example

As before, it is useful to consider a specific example that does not involve a concrete vector space. We will use the real symmetric form on $\mathcal{P}_d$ given by $a(p,q) = \left\langle \frac{dp}{dx}, \frac{dq}{dx} \right\rangle$ using the $L^2([-1,1])$ inner product between polynomials. Integration by parts implies that $$a(p,q) = \int_{-1}^1 \frac{dp}{dx} \frac{dq}{dx} \, dx = \left. \frac{dp}{dx} q \right|_{-1}^1 - \int_{-1}^1 \frac{d^2 p}{dx^2} q \, dx = \left\langle \delta_1 \delta_1^* \frac{dp}{dx} - \delta_{-1} \delta_{-1}^* \frac{dp}{dx} - \frac{d^2 p}{dx^2}, \; q \right\rangle,$$ where $\delta_1^*$ and $\delta_{-1}^*$ are the evaluation functionals at $\pm 1$ and $\delta_1$ and $\delta_{-1}$ are the associated vectors under the Riesz map. Hence, the self-adjoint linear map associated with the form is $$L = \delta_1 \delta_1^* \frac{d}{dx} - \delta_{-1} \delta_{-1}^* \frac{d}{dx} - \frac{d^2}{dx^2}.$$ We can check ourselves by implementing this relation in code

let
    # Implement the L map
    d = 2
    δp1 = dualL2_δx( 1.0, d)
    δn1 = dualL2_δx(-1.0, d)
    function L(p)
        Dp = derivative(p)
        D2p = derivative(Dp)
        δp1*Dp(1) - δn1*Dp(-1) - D2p
    end

    # Test polynomials p and q
    p = Polynomial([2.0; -3.0; 1.0])
    q = Polynomial([1.0; 1.0; 1.0])

    # Test agreement with the form and self-adjointness
    dotL2(derivative(p), derivative(q)) ≈ dotL2(L(p), q),
    dotL2(L(p), q) ≈ dotL2(p, L(q))
end
(true, true)

The form is positive semidefinite, since $a(1,1) = 0$ but $a(p,p)$ can never be negative (by positive definiteness of the inner product); the inertia is $(d, 1, 0)$. While we can compute the eigenvalues in terms of a matrix derived from $L$, it is simpler to compute the matrix directly from the form:

let
    d = 3

    # Compute the matrix in terms of the normalized Legendre polynomials
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    # Compute the eigendecomposition
    F = eigen(AP̄)
    U = P̄*F.vectors

    # Check orthonormality of the U basis,
    # and that the associated matrix is Λ
    gram(dotL2, U) ≈ I,
    gram(dotL2, derivative.(U)) ≈ Diagonal(F.values)
end
(true, true)

Similar to what we saw in our discussion of the singular value decomposition, the $k$th nonzero eigenvalue converges as a function of $d$ to a limiting value of $(k\pi/2)^2$. We plot the convergence of the first few eigenvalues below:

let
    d = 15
    P̄ = normalized_legendre_basis(d)
    AP̄ = gram(dotL2, derivative.(P̄))

    p = plot()
    for j=2:5
        λs = [eigvals(AP̄[1:n,1:n])[j] for n=j:d+1]
        plot!(j:d+1, abs.(λs.-((j-1)*π/2)^2),
              yscale=:log10, label="λ[$j]")
    end
    p
end

4.5 Tensors and determinants

So far, we have dealt with linear, bilinear, sesquilinear, and quadratic functions on vector spaces. Going beyond this to functions that are linear in several arguments takes us into the land of tensors.

4.5.1 Takes on tensors

There are several ways to approach tensors. In much of numerical analysis, only tensors over concrete spaces are considered, and tensors are treated as multi-dimensional arrays of numbers (with matrices as a special case where two dimensions are used for indexing). For example, a tensor with three indices is represented by an element $a_{ijk}$ of $F^{m \times n \times p}$. We will see this perspective in detail later, both in our discussion of numerical linear algebra and in our discussion of tensors in latent factor models.

Here, though, we will focus on tensors as basic objects of multilinear algebra. There are still several ways to approach this:

  • We can consider tensors purely in terms of bases, along with change-of-basis rules. This view, common in some corners of physics, says that “a tensor is an object that transforms like a tensor,” a perspective similar to “a vector is an object that transforms like a vector” that we mentioned earlier. We will prefer a discussion that gives changes of basis as a consequence of a more fundamental definition, rather than as the definition itself.

  • We can consider tensors from an algebraic perspective as a quotient space: the tensor product space $\mathcal{V} \otimes \mathcal{W}$ is a vector space spanned by elements of the Cartesian product $\mathcal{V} \times \mathcal{W}$, modulo the equivalence relations $$\alpha(v,w) \sim (\alpha v, w) \sim (v, \alpha w), \qquad (u+v, w) \sim (u,w) + (v,w), \qquad (u, v+w) \sim (u,v) + (u,w).$$ We write $v \otimes w$ to denote the equivalence class associated with $(v,w)$. While this definition has the advantage of being very general (it can generalize to structures other than vector spaces) and it does not depend on a basis, it can be a bit overly abstract.

  • We will mostly consider tensors as multilinear forms, generalizing the notion of bilinear forms discussed above. This has the advantage of extending our earlier discussion of bilinear and sesquilinear forms, remaining abstracted away from a specific basis while not being too abstract.

4.5.2 Multilinear forms

Let $\mathcal{V}$ and $\mathcal{U}$ be two vector spaces. For any $u \in \mathcal{U}$ and $w^* \in \mathcal{V}^*$, the outer product $u \otimes w^*$ is associated with a rank 1 map in $L(\mathcal{V},\mathcal{U})$ given by $v \mapsto u \, w^* v$. Alternately, we can see $u \otimes w^*$ as a bilinear map from $\mathcal{U}^* \times \mathcal{V} \to F$ given by $(f,v) \mapsto (fu)(w^* v)$; or we can see $u \otimes w^*$ as a dual map in $L(\mathcal{U}^*, \mathcal{V}^*)$ given by $f \mapsto (fu) w^*$. We will use the uniform notation $u \otimes w^*$ to cover all of these possible interpretations of the outer product.

More generally, let $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_d$ be vector spaces. Then for any vectors $v_1, v_2, \ldots, v_d$ in these spaces, the tensor product $v_1 \otimes v_2 \otimes \cdots \otimes v_d$ can be interpreted as the multilinear map from $\mathcal{V}_1^* \times \mathcal{V}_2^* \times \cdots \times \mathcal{V}_d^* \to F$ given by $$(v_1 \otimes v_2 \otimes \cdots \otimes v_d)(w_1^*, w_2^*, \ldots, w_d^*) = (w_1^* v_1)(w_2^* v_2) \cdots (w_d^* v_d).$$ As in the case of just two vector spaces, we can define other maps by partial evaluation. For example, we can consider this as a map from $\mathcal{V}_2^* \times \cdots \times \mathcal{V}_d^* \to \mathcal{V}_1$ by $(w_2^*, \ldots, w_d^*) \mapsto a(\cdot, w_2^*, \ldots, w_d^*)$, where "filling in" all but the first argument gives an element of $\mathcal{V}_1$ (or technically of $\mathcal{V}_1^{**}$, but this is isomorphic to $\mathcal{V}_1$).

The vector space of all multilinear forms $\mathcal{V}_1^* \times \cdots \times \mathcal{V}_d^* \to F$ is written as $\mathcal{V}_1 \otimes \cdots \otimes \mathcal{V}_d$. If $\mathcal{V}_j$ has a basis $V^{(j)}$ and associated dual basis $(W^{(j)})^*$, then an arbitrary element $a \in \mathcal{V}_1 \otimes \cdots \otimes \mathcal{V}_d$ can be written uniquely as $$a = \sum_{i_1, \ldots, i_d} a_{i_1 \cdots i_d} \, v^{(1)}_{i_1} \otimes \cdots \otimes v^{(d)}_{i_d}, \qquad \text{where } a_{i_1 \cdots i_d} = a(w^{(1)*}_{i_1}, \ldots, w^{(d)*}_{i_d}).$$ That is, the set of all tensor products of basis vectors for the spaces $\mathcal{V}_j$ forms a basis for the tensor product space $\mathcal{V}_1 \otimes \cdots \otimes \mathcal{V}_d$.

In terms of the partial evaluation operation discussed above, we might consider $a$ as a map from $\mathcal{V}_2^* \otimes \cdots \otimes \mathcal{V}_d^*$ to $\mathcal{V}_1$ by taking $$b = \sum_{i_2, \ldots, i_d} b_{i_2 \cdots i_d} \, w^{(2)*}_{i_2} \otimes \cdots \otimes w^{(d)*}_{i_d} \;\mapsto\; \sum_{i_2, \ldots, i_d} b_{i_2 \cdots i_d} \, a(\cdot, w^{(2)*}_{i_2}, \ldots, w^{(d)*}_{i_d}) = \sum_{i_1, i_2, \ldots, i_d} v^{(1)}_{i_1} a_{i_1 \cdots i_d} b_{i_2 \cdots i_d}.$$ This generalized partial evaluation operation is known as a tensor contraction. In general, we can contract two tensors on any pair of "slots" in the form that take vectors from a dual pair of spaces.

When a real inner product space $\mathcal{V}$ appears in a tensor, we can construct a new tensor in which $\mathcal{V}^*$ appears in the same slot by composing that slot with the inverse Riesz map ("lowering the index" in the physics parlance). Similarly, we can convert a $\mathcal{V}^*$ slot to a $\mathcal{V}$ slot via the Riesz map ("raising the index"). When a tensor is represented with respect to orthonormal bases, raising or lowering operations do not change any of the coefficients.

It is tempting to ask if the coefficients might be made simple by a special choice of the bases for the component spaces, analogous to the canonical forms we discussed before. Unfortunately, tensors beyond bilinear forms lack any such nice canonical representation.

4.5.3 Polynomial products

As an example of tensor product spaces in action, we consider the space of polynomials in three variables with maximum degree $d$ per variable. In the power basis, we would write this as $$\mathcal{P}_d \otimes \mathcal{P}_d \otimes \mathcal{P}_d = \left\{ \sum_{i,j,k=0}^d c_{ijk} x^i y^j z^k : c \in \mathbb{R}^{(d+1) \times (d+1) \times (d+1)} \right\}.$$ Evaluating a polynomial $p \in \mathcal{P}_d \otimes \mathcal{P}_d \otimes \mathcal{P}_d$ at a particular point $(\alpha, \beta, \gamma)$ is associated with evaluation of a trilinear form, or equivalently with contraction against a tensor $\delta_\alpha \otimes \delta_\beta \otimes \delta_\gamma \in \mathcal{P}_d^* \otimes \mathcal{P}_d^* \otimes \mathcal{P}_d^*$, i.e. $p \mapsto p(\delta_\alpha \otimes \delta_\beta \otimes \delta_\gamma)$. We can similarly think about contracting in some slots but not others, or contracting against other linear functionals (like integration).

For concrete tensors, contractions can also be used to express operations like changing basis. For example, if $T_{0:d} A = X_{0:d}$ (with $X_{0:d}$ the power basis quasimatrix and $T_{0:d}$ the Chebyshev basis quasimatrix), we can write $$p(x,y,z) = \sum_{i,j,k} c_{ijk} x^i y^j z^k = \sum_{l,m,n} d_{lmn} T_l(x) T_m(y) T_n(z), \qquad d_{lmn} = \sum_{i,j,k} c_{ijk} a_{li} a_{mj} a_{nk}.$$ That is, we change bases by applying $A$ on each "fiber" of the coefficient tensor (i.e. a vector associated with holding two indices fixed and letting the other vary). Indeed, the transformation from the $c_{ijk}$ to the $d_{lmn}$ coefficients is most easily coded in terms of such a matrix operation:

let
    d = 2
    C = rand(d+1,d+1,d+1)  # Power coeffs

    # Transform to the Chebyshev coefficients
    D = copy(C)
    M = chebyshev_power_coeffs(d)  # A = inv(M)

    # Contract with a_li
    for j=1:d+1
        for k = 1:d+1
            D[:,j,k] = M\D[:,j,k]
        end
    end

    # Contract with a_mj
    for i=1:d+1
        for k=1:d+1
            D[i,:,k] = M\D[i,:,k]
        end
    end

    # Contract with a_nk
    for i=1:d+1
        for j=1:d+1
            D[i,j,:] = M\D[i,j,:]
        end
    end

    # Check correctness by comparing evaluations a random point
    T = chebyshev_basis(d)
    x,y,z = rand(3)
    p1 = 0.0
    p2 = 0.0
    for i=1:d+1
        for j=1:d+1
            for k=1:d+1
                p1 += C[i,j,k] * x^(i-1) * y^(j-1) * z^(k-1)
                p2 += D[i,j,k] * T[i](x) * T[j](y) * T[k](z)
            end
        end
    end
    p1  p2
end
true

4.5.4 Alternating forms

An alternating form is a multilinear form on several copies of the same vector space $\mathcal{V}$ with the property that it is zero if any input is repeated. This implies that swapping any pair of arguments reverses the sign; for example, if $a : \mathcal{V} \times \mathcal{V} \times \mathcal{V} \to F$ is an alternating form, then $$a(u,v,w) + a(w,v,u) = a(u,v,w) + a(u,v,u) + a(w,v,w) + a(w,v,u) = a(u,v,w+u) + a(w,v,w+u) = a(u+w,v,w+u) = 0.$$ The set of all alternating forms on $k$ copies of $\mathcal{V}$ is written as $\mathcal{V}^* \wedge \cdots \wedge \mathcal{V}^*$, a subspace of $\mathcal{V}^* \otimes \cdots \otimes \mathcal{V}^*$.

The wedge product of two tensors over copies of the same space is $f \wedge g = f \otimes g - g \otimes f$. Alternating forms with a given number of arguments form a vector space. For example, $\mathcal{V}^* \wedge \mathcal{V}^* \wedge \mathcal{V}^*$ has a basis of vectors $w_i^* \wedge w_j^* \wedge w_k^*$ for each possible subset $\{i,j,k\}$ of three distinct indices from the range $1, \ldots, n$. Hence, it has dimension $\binom{n}{3}$.

If $\mathcal{V}$ is $n$-dimensional, the space of $n$ copies of $\mathcal{V}^*$ wedged together has $\binom{n}{n} = 1$ dimension; that is, there is only one alternating form on $n$ inputs, up to scaling. We arbitrarily choose one such form by picking a basis $V$, letting $a(v_1, v_2, \ldots, v_n) = 1$, and declaring this to be the signed volume form. In most situations, $\mathcal{V}$ is an inner product space and we choose $V$ to be an orthonormal basis.

4.5.5 Determinants

Let $f \in \mathcal{V}^* \wedge \cdots \wedge \mathcal{V}^*$ be the signed volume form for an $n$-dimensional space $\mathcal{V}$. Rather than writing evaluation of $f$ as $f(v_1, v_2, \ldots, v_n)$, we will collect the arguments into a quasimatrix and write $f(V)$ for the volume associated with a parallelepiped whose edges are the columns of $V$.

If $U$ is a basis such that $f(U) = 1$, then we can write any other set of $n$ vectors as $UA$ for some matrix $A$. The determinant of $A$ is the function such that $f(UA) = f(U) \det(A) = \det(A)$. Put differently, the determinant represents the change of volume under the mapping $L = UAU^{-1}$, i.e. the operator represented by $A$ with respect to a basis quasimatrix $U$ corresponding to a parallelepiped of unit (signed) volume. In order to satisfy this definition, the determinant must be

  • An alternating form on $n$ copies of the concrete space $F^n$ (written in terms of matrices in $F^{n \times n}$)

  • Equal to one for the identity matrix.

The interpretation of the determinant as representing a scaling of (signed) volumes explains various other properties, such as the important fact that det(AB)=det(A)det(B). It also explains why determinants arise in the change-of-variables formula for integration, which is one of the main places where we will see them.
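
Both facts in the multiplicative direction are easy to check numerically (a sketch with random matrices):

let
    A = randn(4, 4)
    B = randn(4, 4)

    det(A*B) ≈ det(A)*det(B),
    abs(det(A)) ≈ prod(svdvals(A))     # volume scaling, as in the SVD picture
end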

The determinant of a singular matrix must be zero, which explains why we can write the characteristic polynomial for a matrix as $\det(zI - A)$. However, we prefer to treat determinants and characteristic polynomials as interesting objects in their own right, with the connection between them as a derived fact.


  1. “Vector spaces are like potato chips: you cannot have just one.” - W. Kahan↩︎

  2. In infinite-dimensional settings, we will restrict our attention to continuous linear functionals (the continuous dual space) rather than all linear functionals (the algebraic dual space).↩︎

  3. The set containing only the zero vector is the smallest subspace we ever talk about.↩︎

  4. “Caveat lector”: let the reader beware!↩︎

  5. Pafnuty Chebyshev was a nineteenth century Russian mathematician, and his name has been transliterated from the Cyrillic alphabet into the Latin alphabet in several different ways. We inherit our usual spelling from one of the French transliterations, but the symbol T for the polynomials comes from the German transliteration Tschebyscheff.↩︎

  6. Some authors write L(V,U) for the set of bounded linear maps between two spaces. This distinction matters when we are dealing with infinite-dimensional normed vector spaces — since we are mostly concerned with the finite-dimensional case, we will not worry about it.↩︎

  7. In the absence of an inner product, the coimage is most naturally thought of as a subspace of $\mathcal{V}^*$ and the cokernel as a subspace of $\mathcal{W}^*$. The Riesz map allows us to move these back to subspaces of $\mathcal{V}$ and $\mathcal{W}$.↩︎

  8. The notion of an induced norm makes sense in infinite-dimensional spaces as well, but there the supremum may not be achieved.↩︎

  9. A monic polynomial is one in which the coefficient in front of the highest-degree term is 1.↩︎

  10. There is a canonical injection VV via v(wwv). In finite-dimensional spaces, this is alway also surjective; in infinite-dimensional spaces, it is only surjective when the space is reflexive.↩︎