
md"""
# Notebook for 2022-03-27

This lecture is mostly to remind you about some relevant calculus -- but it is helpful to be able to sanity check your calculus numerically, so let's do a notebook of examples and finite difference checks.
"""
using LinearAlgebra
using Plots

md"""
## Directional derivatives and finite differences

Let's consider a concrete example of $f : \mathbb{R}^2 \rightarrow \mathbb{R}^2$

$$f(x) = \begin{bmatrix} x_1 + 2x_2 - 2 \\ x_1^2 + 4 x_2^2 - 4 \end{bmatrix}$$
"""
ftest(x) = [x[1] + 2*x[2] - 2; x[1]^2 + 4*x[2]^2 - 4]

md"""
Now define $g(s) = f(x_0+su)$ for some randomly chosen $x_0$ and direction $u$.
"""
begin
    # Random base point x0 and direction u for the slice g(s) = f(x0 + s*u)
    x0 = randn(2)
    u = randn(2)
    g(s :: Float64) = ftest(x0 + s*u)
end
begin
    xx = range(-3.0, 3.0, length=100)
    ss = range(-1.0, 1.0, length=100)
    ls = 10.0/norm(u)

    # Plot the second component of f vs x, y
    p1 = plot(xx, xx, (x,y) -> ftest([x; y])[2], st=:contourf)
    plot!([x0[1]-ls*u[1], x0[1]+ls*u[1]],
          [x0[2]-ls*u[2], x0[2]+ls*u[2]], xlims=(-3, 3), ylims=(-3, 3),
          label="x0 + su")
    plot!([x0[1]], [x0[2]], marker=true, label="x0")

    # Plot cross-sectional bit
    p2 = plot(ss, [g(s)[2] for s in ss], label="g(s)")

    plot(p1, p2, layout=(2,1))
end

md"""
We compute the derivative $g'(0)$ both analytically and using a finite difference estimate. So far, we have used one-sided finite differences, but it is actually a little more accurate to use symmetric finite differences:

$$g'(s) = \frac{g(s+h)-g(s-h)}{2h} + O(h^2)$$
"""
2.5452209356438296e-11
begin
    # Compute g'(0) analytically
    dg0 = [u[1] + 2*u[2]; 2*x0[1]*u[1] + 8*x0[2]*u[2]]

    # Estimate g'(0) by a symmetric finite difference computation
    dg0_fd = (g(1e-6)-g(-1e-6))/2e-6

    # Compute the relative error between the two computations
    norm(dg0-dg0_fd)/norm(dg0)
end
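
md"""
The claim above is that the symmetric formula is a little more accurate than a one-sided difference. As a quick sketch (reusing `g` and `dg0` from the cells above; the names `hs`, `err1`, and `err2` are just illustrative), we can compare the relative errors of one-sided and symmetric estimates of $g'(0)$ for a few step sizes $h$.
"""
begin
    # One-sided error should shrink like O(h), symmetric like O(h^2)
    # (for very small h, roundoff eventually dominates both estimates)
    hs = [1e-2, 1e-4, 1e-6]
    err1 = [norm((g(h)-g(0.))/h - dg0)/norm(dg0) for h in hs]
    err2 = [norm((g(h)-g(-h))/(2*h) - dg0)/norm(dg0) for h in hs]
    [hs err1 err2]
end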

md"""
We can also compute second derivatives analytically or by computing second derivatives. We could do this by finite differencing the first derivative, or by the formula

$$g''(s) = \frac{g(s-h) -2 g(s) + g(s+h)}{h^2} + O(h^2).$$
"""
1.3830592783544882e-9
begin
    # Analytic g''(0)
    d2g0 = [0; 2*u[1]^2 + 8*u[2]^2]

    # Second difference estimate with h = 1e-4 (so h^2 = 1e-8)
    d2g0_fd = (g(-1e-4)-2*g(0.)+g(1e-4))/1e-8

    # Relative error between the two
    norm(d2g0-d2g0_fd)/norm(d2g0)
end
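
md"""
The cell above uses the second difference formula; as a sketch of the other route (finite differencing the first derivative), we can write out the analytic $g'(s)$ and apply a symmetric first difference to it. The helper name `dgs` is just illustrative.
"""
begin
    # Analytic g'(s): the first component is constant, the second depends on x0 + s*u
    dgs(s) = [u[1] + 2*u[2];
              2*(x0[1]+s*u[1])*u[1] + 8*(x0[2]+s*u[2])*u[2]]

    # Symmetric difference of g' gives an O(h^2) estimate of g''(0)
    d2g0_fd2 = (dgs(1e-4)-dgs(-1e-4))/2e-4
    norm(d2g0-d2g0_fd2)/norm(d2g0)
end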

md"""
The second-order Taylor series approximation to $g(s)$ lies directly atop the true approximation (as it should in this case -- can you see why?).
"""
begin
    # Second-order Taylor expansion of g about s = 0, plotted against g itself
    gtaylor(s) = g(0.) + (dg0 + d2g0/2*s)*s
    plot(ss, [g(s)[2] for s in ss], label="Original")
    plot!(ss, [gtaylor(s)[2] for s in ss], label="Taylor", style=:dash)
end
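
md"""
As a quick numerical check that the two curves really coincide (a sketch reusing `g`, `gtaylor`, and `ss` from above), we can look at the largest deviation between $g$ and its second-order Taylor model over the sample points; it should be on the order of roundoff.
"""
begin
    # Maximum deviation between g and its quadratic Taylor model on the plotted points
    maximum(norm(g(s)-gtaylor(s)) for s in ss)
end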

md"""
## Derivatives, approximation, and chain rule

The function $f$ is differentiable at $x$ if there is a good affine (constant plus linear) approximation

$$f(x) + f'(x)z + o(\|z\|)$$

where the *Jacobian* $f'(x)$ (also written $J(x)$ or $\frac{\partial f}{\partial x})$ is the $m \times n$ matrix whose $(i,j)$ entry is the partial derivative $f_{i,j} = \partial f_i/\partial x_j$. If $f$ is differentiable, the Jacobian matrix maps directions to directional derivatives.
"""
1.1509785999592623e-16
begin
    # Jacobian matrix for our test function
    dftest(x) = [1.0 2.0 ; 2*x[1] 8*x[2]]

    # Sanity check that we get the right behavior in the u direction
    J0u = dftest(x0)*u
    norm(J0u-dg0)/norm(dg0)
end
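
md"""
As a further sanity check (a sketch; the names `h1`, `e1`, `e2`, and `Jfd` are just illustrative), we can assemble the whole Jacobian column by column from symmetric differences in the coordinate directions and compare it against `dftest`.
"""
begin
    # Column j of Jfd is the symmetric-difference directional derivative in direction e_j
    h1 = 1e-6
    e1 = [1.0; 0.0]
    e2 = [0.0; 1.0]
    Jfd = hcat((ftest(x0+h1*e1)-ftest(x0-h1*e1))/(2*h1),
               (ftest(x0+h1*e2)-ftest(x0-h1*e2))/(2*h1))
    norm(Jfd-dftest(x0))/norm(dftest(x0))
end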

md"""
The chain rule is just about matrix multiplication: the derivative of a composition is the composition of derivatives. As an example of this, consider the behavior of $f$ on a circle: $h(\theta) = f(g(\theta))$.
"""
6.603894155300888e-9
begin
    # Parameterization of points on a circle and its Jacobian
    gcirc(θ :: Float64) = [cos(θ); sin(θ)]
    dgcirc(θ :: Float64) = [-sin(θ); cos(θ)]

    # Composition and its derivative by chain rule
    htest(θ) = ftest(gcirc(θ))
    dhtest(θ) = dftest(gcirc(θ)) * dgcirc(θ)

    # Derivative estimate by symmetric finite differences
    dhtest_fd(θ; Δθ=1e-4) = (htest(θ+Δθ)-htest(θ-Δθ))/(2*Δθ)

    # Compare vs finite differences
    norm(dhtest(1.23)-dhtest_fd(1.23))/norm(dhtest(1.23))
end