md"""
# HW 5 for CS 4220

You may (and probably should) talk about problems with each other, with the TAs, and with me, providing attribution for any good ideas you might get. Your final write-up should be your own.
"""
using LinearAlgebra
using Plots

md"""
#### Some preliminary reminders

Consider three $C^2$ functions $\phi, f, g : \mathbb{R}^n \rightarrow \mathbb{R}$ given by

$$\phi(x) = f(x) g(x)$$

We will denote partial derivatives compactly by $f_{,i} = \partial f(x)/\partial x_i$ and $f_{,ij} = \partial^2 f/(\partial x_i \partial x_j)$ (and similarly with $g$ and $\phi$). In this case, the product rule gives us

$$\begin{align*}
\phi_{,i} &= f_{,i} g + f g_{,i} \\
\phi_{,ij} &= f_{,ij} g + f_{,i} g_{,j} + f_{,j} g_{,i} + f g_{,ij}
\end{align*}$$

or, in terms of gradients and Hessians,

$$\begin{align*}
\nabla \phi &= (\nabla f) g + f (\nabla g) \\
H_{\phi} &= H_f g + (\nabla f) (\nabla g)^T + (\nabla g) (\nabla f)^T + f H_g
\end{align*}$$

We also know that if $f : \mathbb{R}^n \rightarrow \mathbb{R}$ and $h : \mathbb{R} \rightarrow \mathbb{R}$ and $\psi(x) = h(f(x))$, then the chain rule gives
342 μs
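md"""
A small illustrative check of the product-rule formulas above (not part of the assignment): the cell below builds $\nabla \phi$ and $H_\phi$ from $\nabla f$, $H_f$, $\nabla g$, $H_g$ and compares them against centered finite differences along a direction $u$. The particular functions, test point, and direction are arbitrary choices made only for this illustration.
"""

# Illustrative sketch: check the product-rule gradient and Hessian against
# centered finite differences (f, g, x, and u are arbitrary choices).
let
    f(x)  = sin(x[1]) + x[2]^2
    ∇f(x) = [cos(x[1]); 2*x[2]]
    Hf(x) = [-sin(x[1]) 0.0; 0.0 2.0]

    g(x)  = x[1]*x[2]
    ∇g(x) = [x[2]; x[1]]
    Hg(x) = [0.0 1.0; 1.0 0.0]

    fg(x) = f(x)*g(x)

    x = [0.3; 0.7]
    u = [1.0; -2.0]
    h = 1e-4

    # Gradient and Hessian of fg from the product rule
    ∇ϕ = ∇f(x)*g(x) + f(x)*∇g(x)
    Hϕ = Hf(x)*g(x) + ∇f(x)*∇g(x)' + ∇g(x)*∇f(x)' + f(x)*Hg(x)

    # Centered differences approximate the first and second directional
    # derivatives along u; both differences below should be small
    dfg_fd  = (fg(x+h*u) - fg(x-h*u))/(2*h)
    d2fg_fd = (fg(x+h*u) - 2*fg(x) + fg(x-h*u))/(h*h)
    dfg_fd - ∇ϕ'*u, d2fg_fd - u'*Hϕ*u
end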

md"""
#### Questions

Consider the objective function $\phi : \mathbb{R}^n \rightarrow \mathbb{R}$ given by

$$\phi(x) = \exp(-\|x^2\|/2) (c^T x - d)$$

where $c \in \mathbb{R}^n$ and $d \in \mathbb{R}$. Assume $c \neq 0$

1. Complete the following code to compute $\nabla \phi(x)$ and $H_{\phi}(x)$. We have given a tester to sanity check your result -- what is this tester doing?
2. Complete the Newton iteration code for solving $\nabla \phi(x) = 0$, and give a semilog plot to illustrate quadratic convergence. In the test case, does this converge to a local minimum, maximum, or saddle of $\phi$?
3. Argue that any stationary point of $\phi$ must occur at $x = \alpha c$ where $\alpha$ is a root of the quadratic $\alpha^2 \|c\|^2 - d \alpha - 1 = 0$. Check that this is true for the solution computed in question 2.
"""
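md"""
As a worked instance of the chain-rule reminder above (a building block, not the full gradient and Hessian asked for in question 1): the Gaussian factor of $\phi$ can be written as $h(x^T x)$ with $h(t) = e^{-t/2}$. Since $\nabla(x^T x) = 2x$ and $h'(t) = -\tfrac{1}{2} e^{-t/2}$, the chain rule gives

$$\nabla \left( e^{-\|x\|^2/2} \right) = -x \, e^{-\|x\|^2/2}.$$

The second factor $c^T x - d$ is affine, with gradient $c$ and zero Hessian, and the product rule from the reminders combines the two pieces.
"""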
ϕ(x, c, d) = exp(-x'*x/2) * (c'*x-d)
function ϕderivs(x, c, d)
    f = exp(-x'*x/2)
    g = c'*x-d
    ϕ = f*g

    # TODO: Compute the gradient ∇ϕ and the Hessian Hϕ

    ϕ, ∇ϕ, Hϕ
end
---
# Sanity check correctness of derivatives
let
    h = 1e-4
    c = rand(2)
    d = rand()
    x = rand(2)
    u = rand(2)
    ϕ0, ∇ϕ, Hϕ = ϕderivs(x, c, d)

    ϕp = ϕ(x+h*u, c, d)
    ϕm = ϕ(x-h*u, c, d)
    ϕ0 = ϕ(x, c, d)
    ∂ϕ_∂u_fd = (ϕp - ϕm)/(2*h)
    ∂2ϕ_∂u2_fd = (ϕp - 2*ϕ0 + ϕm)/(h*h)

    # Compare the finite difference estimates to the analytic directional
    # derivatives ∇ϕ'*u and u'*Hϕ*u; both differences should be small
    ∂ϕ_∂u_fd - ∇ϕ'*u, ∂2ϕ_∂u2_fd - u'*Hϕ*u
end
---
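md"""
As a reminder for question 2 (a standard fact, not specific to this problem): Newton's method for the stationarity condition $\nabla \phi(x) = 0$ iterates

$$x_{k+1} = x_k - H_\phi(x_k)^{-1} \nabla \phi(x_k),$$

where in code the step is usually computed with a linear solve (backslash in Julia) rather than an explicit inverse.
"""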
function ϕnewton(x0, c, d; maxiter=10, ∇ϕtol=1e-8, monitor=(x, ϕ, ∇ϕ)->nothing)
    x = copy(x0)
    ϕ, ∇ϕ, Hϕ = ϕderivs(x, c, d)
    monitor(x, ϕ, ∇ϕ)
    for k = 1:maxiter

        # TODO: Compute a Newton update

        ϕ, ∇ϕ, Hϕ = ϕderivs(x, c, d)
        monitor(x, ϕ, ∇ϕ)
        if norm(∇ϕ) < ∇ϕtol
            return x
        end
    end
    error("Did not converge in $maxiter steps")
end
---
let
    x = [0.0; 0.0]
    c = [1.0; 2.0]
    d = 3.0
    resids = []
    monitor(x, ϕ, ∇ϕ) = push!(resids, norm(∇ϕ))
    x = ϕnewton(x, c, d, monitor=monitor)

    # TODO: Print x;
    # plot norm(∇ϕ) vs iteration to show quadratic convergence;
    # diagnose min/max/saddle;
    # check that the conditions in problem 3 are satisfied
    ϕ, ∇ϕ, Hϕ = ϕderivs(x, c, d)
    α = c\x
    presid = norm(c)^2*α^2 - d*α - 1

    p = plot(resids, yscale=:log10)
    md"""
    - x = ($(x[1]), $(x[2]))
    - Hϕ posdef? $(isposdef(Hϕ))
    - x = αc? relative difference $(norm(x-α*c)/norm(x))
    - α^2 |c|^2 - d α - 1 = $(presid)

    $p
    """
end
---