
md"""
# Notebook for 2022-03-27

This lecture is mostly to remind you about some relevant calculus -- but it is helpful to be able to sanity check your calculus numerically, so let's do a notebook of examples and finite difference checks.
"""
using LinearAlgebra
using Plots

md"""
## Directional derivatives and finite differences

Let's consider a concrete example of $f : \mathbb{R}^2 \rightarrow \mathbb{R}^2$

$$f(x) = \begin{bmatrix} x_1 + 2x_2 - 2 \\ x_1^2 + 4 x_2^2 - 4 \end{bmatrix}$$
"""
ftest(x) = [x[1] + 2*x[2] - 2; x[1]^2 + 4*x[2]^2 - 4]

md"""
Now define $g(s) = f(x_0+su)$ for some randomly chosen $x_0$ and direction $u$.
"""
begin
    # Random base point x0 and direction u for the slice g(s) = f(x0 + s*u)
    x0 = randn(2)
    u = randn(2)
    g(s :: Float64) = ftest(x0 + s*u)
end
begin
    xx = range(-3.0, 3.0, length=100)
    ss = range(-1.0, 1.0, length=100)
    ls = 10.0/norm(u)

    # Plot the second component of f vs x, y
    p1 = plot(xx, xx, (x,y) -> ftest([x; y])[2], st=:contourf)
    plot!([x0[1]-ls*u[1], x0[1]+ls*u[1]],
          [x0[2]-ls*u[2], x0[2]+ls*u[2]], xlims=(-3, 3), ylims=(-3, 3),
          label="x0 + su")
    plot!([x0[1]], [x0[2]], marker=true, label="x0")

    # Plot cross-sectional bit
    p2 = plot(ss, [g(s)[2] for s in ss], label="g(s)")

    plot(p1, p2, layout=(2,1))
end

md"""
We compute the derivative $g'(0)$ both analytically and using a finite difference estimate. So far, we have used one-sided finite differences, but it is actually a little more accurate to use symmetric finite differences:

$$g'(s) = \frac{g(s+h)-g(s-h)}{2h} + O(h^2)$$
"""
2.5452209356438296e-11
begin
    # Compute g'(0) analytically
    dg0 = [u[1] + 2*u[2]; 2*x0[1]*u[1] + 8*x0[2]*u[2]]

    # Estimate g'(0) by a symmetric finite difference computation
    dg0_fd = (g(1e-6)-g(-1e-6))/2e-6

    # Compute the relative error between the two computations
    norm(dg0-dg0_fd)/norm(dg0)
end
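
md"""
The claim above is that the symmetric formula is a little more accurate than a one-sided difference. As a quick sketch (reusing `g` and `dg0` from the cells above; the names `hs`, `err1`, and `err2` are just illustrative), we can compare the relative errors of one-sided and symmetric estimates of $g'(0)$ for a few step sizes $h$.
"""
begin
    # One-sided error should shrink like O(h), symmetric like O(h^2)
    # (for very small h, roundoff eventually dominates both estimates)
    hs = [1e-2, 1e-4, 1e-6]
    err1 = [norm((g(h)-g(0.))/h - dg0)/norm(dg0) for h in hs]
    err2 = [norm((g(h)-g(-h))/(2*h) - dg0)/norm(dg0) for h in hs]
    [hs err1 err2]
end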

md"""
We can also compute second derivatives analytically or by computing second derivatives. We could do this by finite differencing the first derivative, or by the formula

$$g''(s) = \frac{g(s-h) -2 g(s) + g(s+h)}{h^2} + O(h^2).$$
"""
1.3830592783544882e-9
begin
    # Analytic g''(0)
    d2g0 = [0; 2*u[1]^2 + 8*u[2]^2]

    # Second difference estimate with h = 1e-4 (so h^2 = 1e-8)
    d2g0_fd = (g(-1e-4)-2*g(0.)+g(1e-4))/1e-8

    # Relative error between the two
    norm(d2g0-d2g0_fd)/norm(d2g0)
end
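
md"""
The cell above uses the second difference formula; as a sketch of the other route (finite differencing the first derivative), we can write out the analytic $g'(s)$ and apply a symmetric first difference to it. The helper name `dgs` is just illustrative.
"""
begin
    # Analytic g'(s): the first component is constant, the second depends on x0 + s*u
    dgs(s) = [u[1] + 2*u[2];
              2*(x0[1]+s*u[1])*u[1] + 8*(x0[2]+s*u[2])*u[2]]

    # Symmetric difference of g' gives an O(h^2) estimate of g''(0)
    d2g0_fd2 = (dgs(1e-4)-dgs(-1e-4))/2e-4
    norm(d2g0-d2g0_fd2)/norm(d2g0)
end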

md"""
The second-order Taylor series approximation to $g(s)$ lies directly atop the true approximation (as it should in this case -- can you see why?).
"""
begin
    # Second-order Taylor expansion of g about s = 0, plotted against g itself
    gtaylor(s) = g(0.) + (dg0 + d2g0/2*s)*s
    plot(ss, [g(s)[2] for s in ss], label="Original")
    plot!(ss, [gtaylor(s)[2] for s in ss], label="Taylor", style=:dash)
end
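
md"""
As a quick numerical check that the two curves really coincide (a sketch reusing `g`, `gtaylor`, and `ss` from above), we can look at the largest deviation between $g$ and its second-order Taylor model over the sample points; it should be on the order of roundoff.
"""
begin
    # Maximum deviation between g and its quadratic Taylor model on the plotted points
    maximum(norm(g(s)-gtaylor(s)) for s in ss)
end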

md"""
## Derivatives, approximation, and chain rule

The function $f$ is differentiable at $x$ if there is a good affine (constant plus linear) approximation

$$f(x) + f'(x)z + o(\|z\|)$$

where the *Jacobian* $f'(x)$ (also written $J(x)$ or $\frac{\partial f}{\partial x})$ is the $m \times n$ matrix whose $(i,j)$ entry is the partial derivative $f_{i,j} = \partial f_i/\partial x_j$. If $f$ is differentiable, the Jacobian matrix maps directions to directional derivatives.
"""
1.1509785999592623e-16
begin
    # Jacobian matrix for our test function
    dftest(x) = [1.0 2.0 ; 2*x[1] 8*x[2]]

    # Sanity check that we get the right behavior in the u direction
    J0u = dftest(x0)*u
    norm(J0u-dg0)/norm(dg0)
end
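
md"""
As a further sanity check (a sketch; the names `h1`, `e1`, `e2`, and `Jfd` are just illustrative), we can assemble the whole Jacobian column by column from symmetric differences in the coordinate directions and compare it against `dftest`.
"""
begin
    # Column j of Jfd is the symmetric-difference directional derivative in direction e_j
    h1 = 1e-6
    e1 = [1.0; 0.0]
    e2 = [0.0; 1.0]
    Jfd = hcat((ftest(x0+h1*e1)-ftest(x0-h1*e1))/(2*h1),
               (ftest(x0+h1*e2)-ftest(x0-h1*e2))/(2*h1))
    norm(Jfd-dftest(x0))/norm(dftest(x0))
end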

md"""
The chain rule is just about matrix multiplication: the derivative of a composition is the composition of derivatives. As an example of this, consider the behavior of $f$ on a circle: $h(\theta) = f(g(\theta))$.
"""
6.603894155300888e-9
begin
    # Parameterization of points on a circle and its Jacobian
    gcirc(θ :: Float64) = [cos(θ); sin(θ)]
    dgcirc(θ :: Float64) = [-sin(θ); cos(θ)]

    # Composition and its derivative by chain rule
    htest(θ) = ftest(gcirc(θ))
    dhtest(θ) = dftest(gcirc(θ)) * dgcirc(θ)

    # Derivative estimate by symmetric finite differences
    dhtest_fd(θ; Δθ=1e-4) = (htest(θ+Δθ)-htest(θ-Δθ))/(2*Δθ)

    # Compare vs finite differences
    norm(dhtest(1.23)-dhtest_fd(1.23))/norm(dhtest(1.23))
end