
md"""
# Notebook for 2023-03-22
"""
using Plots
using Roots

md"""
## A test problem
"""
begin
    testf(x) = x^3-2*x+1
    dtestf(x) = 3*x^2-2
    xx = range(0.5, 0.8, length=100)
    plot(xx, testf.(xx))
end
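md"""
As a quick sanity check (not in the original notes, but easy to verify by hand), the test function factors as

$$x^3 - 2x + 1 = (x-1)(x^2+x-1),$$

so the root in the plotted interval $[0.5, 0.8]$ is $(\sqrt{5}-1)/2 \approx 0.6180$.
"""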

md"""
## A bisection routine

Bisection is pretty bulletproof. It is also slow, and does not generalize well to systems of nonlinear equations.
"""
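md"""
Each bisection step halves the bracket, so shrinking an initial bracket $[a,b]$ below a tolerance $\delta$ takes about $\log_2((b-a)/\delta)$ iterations; for the bracket $[0.5, 0.8]$ and a tolerance of $10^{-8}$, that is roughly 25 steps.
"""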
function bisect(f, xlo, xhi; xtol=1e-8, ftol=1e-8, maxiter=1000, monitor=(x)->nothing)

    # Evaluate initial interval and sanity check
    flo = f(xlo)
    fhi = f(xhi)
    if sign(flo) == sign(fhi)
        error("Function must have different signs at interval endpoints")
    end

    # Enter bisection loop
    for k = 1:maxiter
        x = (xlo+xhi)/2
        fx = f(x)
        monitor(x)

        # Terminate if the function value is tiny or the x error is tiny
        if abs(fx) < ftol || x-xlo < xtol
            return x
        end

        # Adjust the interval
        if sign(fx) == sign(flo)
            xlo, flo = x, fx
        else
            xhi, fhi = x, fx
        end
    end

    # If we didn't meet any of our criteria, give a best guess
    (xlo+xhi)/2
end
begin
    resids_b = []
    bisect(testf, 0.5, 0.8, monitor=(x)->push!(resids_b, abs(testf(x))))
    plot(resids_b, yscale=:log10)
end


md"""
## Unguarded Newton

The unguarded Newton iteration is

$$x_{k+1} = x_k-\frac{f(x_k)}{f'(x_k)}$$
"""
function unguarded_newton(f, df, x; xtol=1e-8, ftol=1e-8, maxiter=100,
                          monitor=(x)->nothing)

    for k = 1:maxiter
        fx = f(x)
        dx = fx/df(x)
        x -= dx
        monitor(x)
        if abs(fx) < ftol || abs(dx) < xtol
            return x
        end
    end

    x
end
begin
    resids_n = []
    unguarded_newton(testf, dtestf, 0.5, monitor=(x)->push!(resids_n, abs(testf(x))))
    plot(resids_n, yscale=:log10)
end

md"""
The problem with unguarded Newton is that it only converges from points that are close enough to the solution. A nice example is what happens with using Newton to find a zero of $\tan^{-1}(x)$. Start too far from the zero at the origin, and the iteration swings wildly back and forth between large positive values and large negative values until it swings all the way out of the range of the floating point numbers and starts producing NaN results.
"""
begin
    resids_n2 = []
    unguarded_newton(atan, (x)->1/(1+x^2), 1.5,
                     monitor=(x)->push!(resids_n2, atan(x)))
    plot(resids_n2)
end

md"""
## Guarding Newton

In the arctangent example, Newton is always stepping in the right direction -- it just moves by the wrong amount. We get around this with a line search strategy. The simplest such strategy (which we will improve on later) is to cut the step size in half every time a proposed step reduces $|f(x)|$.
"""
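md"""
In other words, the guarded iteration takes a damped Newton step

$$x_{k+1} = x_k - \alpha_k \frac{f(x_k)}{f'(x_k)},$$

where $\alpha_k$ is the first value in the sequence $1, 1/2, 1/4, \ldots$ for which $|f(x_{k+1})| \leq |f(x_k)|$.
"""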
function newton_ls(f, df, x; ftol=1e-8, maxiter=100, monitor=(x)->nothing)
    fx = f(x)
    iters = 0

    # While not converged and budget remains
    while abs(fx) > ftol && iters < maxiter

        # Compute a Newton step
        dx = fx/df(x)

        # Try taking a full step, then half step, etc until improvement
        α = 1.0
        xtrial = x
        ftrial = fx
        while iters < maxiter
            xtrial = x-α*dx
            ftrial = f(xtrial)
            iters += 1
            α /= 2
            abs(ftrial) > abs(fx) || break
        end

        # Accept the step and move on
        x = xtrial
        fx = ftrial
        monitor(x)
    end

    x, abs(fx) < ftol
end
begin
    resids_n3 = []
    newton_ls(atan, (x)->1/(1+x^2), 1.5,
              monitor=(x)->push!(resids_n3, abs(atan(x))))
    plot(resids_n3, yscale=:log10)
end


md"""
## Secant iteration

Despite fast local convergence, Newton's iteration is not as robust as bisection, even with line search. It also has the annoying property that we have to remember how to compute derivatives!

*Secant* iteration uses a secant approximation to the function rather than a tangent approximation to the function. The secant approximation of $f$ through evaluations at $a$ and $b$ is

$$s(x) = f(a) + \frac{f(b)-f(a)}{b-a} (x-a)$$

and setting the secant to zero gives

$$x = \frac{a f(b) - b f(a)}{f(b)-f(a)}$$

If $f(a) f(b) < 0$, then $x$ always lies in the interval $[a,b]$. Hence, we can use the secant intersection point in place of the midpoint for a bisection-like iteration; this is the *method of false position* (or *regula falsi*). It can run slower than bisection in some cases, though typically not.

A variety of faster methods combine bisection with secant iteration (or some other polynomial interpolant) in some way.

We will just show the most straightforward version of secant iteration with no safeguarding; for comparison, a sketch of the method of false position appears after the secant demo below.
"""
function secant(f, a, b; ftol=1e-8, maxiter=100, monitor=(x)->nothing)

    # Evaluate end points; make b the point with the smaller function value
    fa = f(a)
    fb = f(b)
    if abs(fa) < abs(fb)
        a, b = b, a
        fa, fb = fb, fa
    end

    # Do secant iteration (no safeguarding)
    for k = 1:maxiter
        x = (a*fb-b*fa)/(fb-fa)
        fx = f(x)
        monitor(x)
        if abs(fx) < ftol
            return x, true
        end
        a, b = b, x
        fa, fb = fb, fx
    end

    # Return best estimate so far if tolerance not reached
    b, false
end
begin
    resids_s = []
    secant(testf, 0.5, 0.8, monitor=(x)->push!(resids_s, abs(testf(x))))[1]
    plot(resids_s, yscale=:log10)
end
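md"""
The method of false position described above is not implemented in these notes; the cell below is a rough sketch of how it might look (the name `false_position` and its details are added here for illustration and are not part of the original code). It keeps a sign-change bracket like `bisect`, but replaces the midpoint with the secant intersection point, and it follows the same calling conventions, e.g. `false_position(testf, 0.5, 0.8)`.
"""
function false_position(f, a, b; ftol=1e-8, maxiter=1000, monitor=(x)->nothing)

    # Evaluate end points and sanity check the bracket
    fa, fb = f(a), f(b)
    if sign(fa) == sign(fb)
        error("Function must have different signs at interval endpoints")
    end

    x = a
    for k = 1:maxiter

        # Secant intersection point (lies inside [a,b] when the signs differ)
        x = (a*fb-b*fa)/(fb-fa)
        fx = f(x)
        monitor(x)
        if abs(fx) < ftol
            return x, true
        end

        # Keep the part of the bracket that still contains a sign change
        if sign(fx) == sign(fa)
            a, fa = x, fx
        else
            b, fb = x, fx
        end
    end

    # Return best estimate so far if tolerance not reached
    x, false
end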


md"""
## Using `fzero`

The `Roots` package includes functions `find_zero` and `fzero` that includes a variety of root-finding algorithms. The package is quite robust, and includes a number of algorithms. Given an initial bracket, the default is slow-but-reliable bisection; but one can also specify methods like the `A42` algorithm, which combines bisection with an interpolation-based method.
"""
find_zero(testf, (0.5, 0.8), Roots.A42())  # returns 0.6180339887498948
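md"""
A couple of other common calling patterns, shown here as a sketch (see the `Roots` documentation for the full interface): given a bracket and no method argument, `find_zero` defaults to bisection; given a single starting guess, it uses a derivative-free iteration.
"""
begin
    find_zero(testf, (0.5, 0.8))  # bracketing call; defaults to bisection
    find_zero(testf, 0.7)         # single starting guess; derivative-free iteration
end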

md"""
One thing that `find_zero` does *not* include is the type of monitor functionality that we have used in our home-grown codes above. However, it is easy to add this as a wrapper around an existing function.
"""
begin

    # Add a monitor to an existing function f(x)
    function add_monitor(f, monitor)
        function monitored_f(x)
            monitor(x)
            f(x)
        end
    end

    # Set up a monitored version of our test function and run the root finder
    fz_xs = []
    monitored_testf = add_monitor(testf, (x)->push!(fz_xs, x))
    x_fz = find_zero(monitored_testf, (0.5, 0.8), Roots.A42())

    # Plot convergence (add an offset to avoid complaints about zero on a log scale)
    plot(abs.(testf.(fz_xs)) .+ 1e-16, yscale=:log10)

end