
md"""
# Notebook for 2023-03-22
"""
using Plots
using Roots

md"""
## A test problem
"""
begin
    testf(x) = x^3-2*x+1
    dtestf(x) = 3*x^2-2
    xx = range(0.5, 0.8, length=100)
    plot(xx, testf.(xx))
end
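md"""
As a quick sanity check (not in the original notes, but easy to verify by hand), the test function factors as

$$x^3 - 2x + 1 = (x-1)(x^2+x-1),$$

so the root in the plotted interval $[0.5, 0.8]$ is $(\sqrt{5}-1)/2 \approx 0.6180$.
"""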

md"""
## A bisection routine

Bisection is pretty bulletproof. It is also slow, and does not generalize well to systems of nonlinear equations.
"""
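md"""
Each bisection step halves the bracket, so shrinking an initial bracket $[a,b]$ below a tolerance $\delta$ takes about $\log_2((b-a)/\delta)$ iterations; for the bracket $[0.5, 0.8]$ and a tolerance of $10^{-8}$, that is roughly 25 steps.
"""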
function bisect(f, xlo, xhi; xtol=1e-8, ftol=1e-8, maxiter=1000, monitor=(x)->nothing)

    # Evaluate initial interval and sanity check
    flo = f(xlo)
    fhi = f(xhi)
    if sign(flo) == sign(fhi)
        error("Function must have different signs at interval endpoints")
    end

    # Enter bisection loop
    for k = 1:maxiter
        x = (xlo+xhi)/2
        fx = f(x)
        monitor(x)

        # Terminate if the function value is tiny or the x error is tiny
        if abs(fx) < ftol || x-xlo < xtol
            return x
        end

        # Adjust the interval
        if sign(fx) == sign(flo)
            xlo, flo = x, fx
        else
            xhi, fhi = x, fx
        end
    end

    # If we didn't meet any of our criteria, give a best guess
    (xlo+xhi)/2
end
begin
    resids_b = []
    bisect(testf, 0.5, 0.8, monitor=(x)->push!(resids_b, abs(testf(x))))
    plot(resids_b, yscale=:log10)
end


md"""
## Unguarded Newton

The unguarded Newton iteration is

$$x_{k+1} = x_k-\frac{f(x_k)}{f'(x_k)}$$
"""
function unguarded_newton(f, df, x; xtol=1e-8, ftol=1e-8, maxiter=100,
                          monitor=(x)->nothing)

    for k = 1:maxiter
        fx = f(x)
        dx = fx/df(x)
        x -= dx
        monitor(x)
        if abs(fx) < ftol || abs(dx) < xtol
            return x
        end
    end

    x
end
begin
    resids_n = []
    unguarded_newton(testf, dtestf, 0.5, monitor=(x)->push!(resids_n, abs(testf(x))))
    plot(resids_n, yscale=:log10)
end

md"""
The problem with unguarded Newton is that it only converges from points that are close enough to the solution. A nice example is what happens with using Newton to find a zero of $\tan^{-1}(x)$. Start too far from the zero at the origin, and the iteration swings wildly back and forth between large positive values and large negative values until it swings all the way out of the range of the floating point numbers and starts producing NaN results.
"""
begin
    resids_n2 = []
    unguarded_newton(atan, (x)->1/(1+x^2), 1.5,
                     monitor=(x)->push!(resids_n2, atan(x)))
    plot(resids_n2)
end

md"""
## Guarding Newton

In the arctangent example, Newton is always stepping in the right direction -- it just moves by the wrong amount. We get around this with a line search strategy. The simplest such strategy (which we will improve on later) is to cut the step size in half every time a proposed step reduces $|f(x)|$.
"""
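md"""
In other words, the guarded iteration takes a damped Newton step

$$x_{k+1} = x_k - \alpha_k \frac{f(x_k)}{f'(x_k)},$$

where $\alpha_k$ is the first value in the sequence $1, 1/2, 1/4, \ldots$ for which $|f(x_{k+1})| \leq |f(x_k)|$.
"""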
function newton_ls(f, df, x; ftol=1e-8, maxiter=100, monitor=(x)->nothing)
    fx = f(x)
    iters = 0

    # While not converged and budget remains
    while abs(fx) > ftol && iters < maxiter

        # Compute a Newton step
        dx = fx/df(x)

        # Try taking a full step, then half step, etc until improvement
        α = 1.0
        xtrial = x
        ftrial = fx
        while iters < maxiter
            xtrial = x-α*dx
            ftrial = f(xtrial)
            iters += 1
            α /= 2
            abs(ftrial) > abs(fx) || break
        end

        # Accept the step and move on
        x = xtrial
        fx = ftrial
        monitor(x)
    end

    x, abs(fx) < ftol
end
begin
    resids_n3 = []
    newton_ls(atan, (x)->1/(1+x^2), 1.5,
              monitor=(x)->push!(resids_n3, abs(atan(x))))
    plot(resids_n3, yscale=:log10)
end


md"""
## Secant iteration

Despite fast local convergence, Newton's iteration is not as robust as bisection, even with line search. It also has the annoying property that we have to remember how to compute derivatives!

*Secant* iteration uses a secant approximation to the function rather than a tangent approximation to the function. The secant approximation of $f$ through evaluations at $a$ and $b$ is

$$s(x) = f(a) + \frac{f(b)-f(a)}{b-a} (x-a)$$

and setting the secant to zero gives

$$x = \frac{a f(b) - b f(a)}{f(b)-f(a)}$$

If $f(a) f(b) < 0$, then $x$ always lies in the interval $[a,b]$. Hence, we can use the secant intersection point in place of the midpoint for a bisection-like iteration; this is the *method of false position* (or *regula falsi*). It can run slower than bisection in some cases, though typically not.

A variety of faster methods combine bisection with secant iteration (or some other polynomial interpolant) in some way.

We will just show the most straightforward version of secant iteration with no safeguarding; for comparison, a sketch of the method of false position appears after the secant demo below.
"""
function secant(f, a, b; ftol=1e-8, maxiter=100, monitor=(x)->nothing)

    # Evaluate end points; make b the point with the smaller function value
    fa = f(a)
    fb = f(b)
    if abs(fa) < abs(fb)
        a, b = b, a
        fa, fb = fb, fa
    end

    # Do secant iteration (no safeguarding)
    for k = 1:maxiter
        x = (a*fb-b*fa)/(fb-fa)
        fx = f(x)
        monitor(x)
        if abs(fx) < ftol
            return x, true
        end
        a, b = b, x
        fa, fb = fb, fx
    end

    # Return best estimate so far if tolerance not reached
    b, false
end
begin
    resids_s = []
    secant(testf, 0.5, 0.8, monitor=(x)->push!(resids_s, abs(testf(x))))[1]
    plot(resids_s, yscale=:log10)
end
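md"""
The method of false position described above is not implemented in these notes; the cell below is a rough sketch of how it might look (the name `false_position` and its details are added here for illustration and are not part of the original code). It keeps a sign-change bracket like `bisect`, but replaces the midpoint with the secant intersection point, and it follows the same calling conventions, e.g. `false_position(testf, 0.5, 0.8)`.
"""
function false_position(f, a, b; ftol=1e-8, maxiter=1000, monitor=(x)->nothing)

    # Evaluate end points and sanity check the bracket
    fa, fb = f(a), f(b)
    if sign(fa) == sign(fb)
        error("Function must have different signs at interval endpoints")
    end

    x = a
    for k = 1:maxiter

        # Secant intersection point (lies inside [a,b] when the signs differ)
        x = (a*fb-b*fa)/(fb-fa)
        fx = f(x)
        monitor(x)
        if abs(fx) < ftol
            return x, true
        end

        # Keep the part of the bracket that still contains a sign change
        if sign(fx) == sign(fa)
            a, fa = x, fx
        else
            b, fb = x, fx
        end
    end

    # Return best estimate so far if tolerance not reached
    x, false
end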


md"""
## Using `fzero`

The `Roots` package includes functions `find_zero` and `fzero` that includes a variety of root-finding algorithms. The package is quite robust, and includes a number of algorithms. Given an initial bracket, the default is slow-but-reliable bisection; but one can also specify methods like the `A42` algorithm, which combines bisection with an interpolation-based method.
"""
find_zero(testf, (0.5, 0.8), Roots.A42())  # returns 0.6180339887498948
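md"""
A couple of other common calling patterns, shown here as a sketch (see the `Roots` documentation for the full interface): given a bracket and no method argument, `find_zero` defaults to bisection; given a single starting guess, it uses a derivative-free iteration.
"""
begin
    find_zero(testf, (0.5, 0.8))  # bracketing call; defaults to bisection
    find_zero(testf, 0.7)         # single starting guess; derivative-free iteration
end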

md"""
One thing that `find_zero` does *not* include is the type of monitor functionality that we have used in our home-grown codes above. However, it is easy to add this as a wrapper around an existing function.
"""
begin

    # Add a monitor to an existing function f(x)
    function add_monitor(f, monitor)
        function monitored_f(x)
            monitor(x)
            f(x)
        end
    end

    # Set up a monitored version of our test function and run the root finder
    fz_xs = []
    monitored_testf = add_monitor(testf, (x)->push!(fz_xs, x))
    x_fz = find_zero(monitored_testf, (0.5, 0.8), Roots.A42())

    # Plot convergence (add an offset to avoid complaints about zero on a log scale)
    plot(abs.(testf.(fz_xs)) .+ 1e-16, yscale=:log10)

end