Frontmatter

If you are publishing this notebook on the web, you can set the parameters below to provide HTML metadata. This is useful for search engines and social media.

Project 2: Where are My Glasses?

blurred image

The image above is a blurred version of a picture that I took at the local SPCA. As we will see, a naive approach to de-blurring produces garbage, a characteristic feature of ill-posed problems. In order to reconstruct the image, we need to regularize the reconstruction, just as we would regularize an ill-posed least squares problem. We will use Tikhonov regularization, as described in class. However, this involves choosing the value of a regularization parameter. Your mission is to investigate the dependence on the parameter, and to investigate three approaches to choosing this parameter (mostly) automatically.

You are given the file blurry.png (the blurred image shown above). You are responsible for the following deliverables, which can either by submitted as a Jupyter notebook (recommended) or in standalone code:

  • Codes to compute "optimal" values of the regularization parameter $\lambda$ via the discrepancy principle, the L-curve, and generalized cross-validation.

  • A report that addresses the questions posed in the rest of this prompt.

This project is inspired by the project on image deblurring by James G. Nagy and Dianne P. O'Leary (Image Deblurring: I Can See Clearly Now) in Computing in Science and Engineering; Project: Vol. 5, No. 3, May/June 2003, pp. 82-85; Solution: Vol. 5, No. 4, July/August 2003). Other useful references include:

It should be possible to do this assignment based only on ideas presented in this prompt and in the lectures. However, you are always welcome to use any ideas or code you find in the literature, including in the references noted above, provided you give an appropriate citation.

md"""
# Project 2: Where are My Glasses?

![blurred image](https://github.com/dbindel/cs4220-s20/raw/master/hw/code/proj1/data/blurry.png)

The image above is a blurred version of a picture that
I took at the local SPCA. As we will see, a naive approach to
de-blurring produces garbage, a characteristic feature of *ill-posed*
problems. In order to reconstruct the image, we need to *regularize*
the reconstruction, just as we would regularize an ill-posed least squares
problem. We will use Tikhonov regularization, as described in class.
However, this involves choosing the value of a
regularization parameter. Your mission is to investigate the dependence
on the parameter, and to investigate three approaches to
1.1 ms

Julia setup

We use the following Julia packages for this project.

  • FileIO: The FileIO library provides extensible support for loading different types of files; we use it to work with PNG files.

  • Images: The Images package provides various tools for working with images.

  • FFTW: The FFTW package provides Fourier transform tools.

  • Colors: Manipulates different color spaces

  • Roots: Solve one-dimensional nonlinear equations

  • Plots: Create plots in Julia

You will also want to install the ImageMagick or QuartzImageIO (on Mac) to work with FileIO to load images. You do not need to include these packages directly with using, but they will be loaded by the FileIO library.

md"""
## Julia setup

We use the following Julia packages for this project.

- [FileIO](https://github.com/JuliaIO/FileIO.jl): The FileIO library provides extensible support for loading different types of files; we use it to work with PNG files.
- [Images](https://juliaimages.org/latest/): The Images package provides various tools for working with images.
- [FFTW](https://github.com/JuliaMath/FFTW.jl): The FFTW package provides Fourier transform tools.
- [Colors](https://github.com/JuliaGraphics/Colors.jl): Manipulates different color spaces
- [Roots](https://github.com/JuliaMath/Roots.jl): Solve one-dimensional nonlinear equations
- [Plots](http://docs.juliaplots.org/latest/): Create plots in Julia

733 μs
using Images
5.8 s
using FileIO
212 μs
using FFTW
179 μs
using Colors
199 μs
using Roots
137 ms
using Plots
10.6 s
using DelimitedFiles
221 μs
using LinearAlgebra
151 μs

Once we have the prerequisites, we can load the blurred image. If you have the data locally, it is fine to just use load("blurry.png") to get the image; but one can also combine the load with a download command to fetch the data from an online source.

md"""
Once we have the prerequisites, we can load the blurred image. If you have the data locally, it is fine to just use `load("blurry.png")` to get the image; but one can also combine the load with a `download` command to fetch the data from an online source.
"""
145 μs
blurry_img
blurry_img = download("https://github.com/dbindel/cs4220-s20/raw/master/hw/code/proj1/data/blurry.png") |> load
1.8 s

Image array manipulation

Images are represented in Julia as two-dimensional arrays of color values. In the default representation of the PNG file that we downloaded, each pixel color is represented by 24 bits, with 8 bits each for the red, green, and blue intensity values (RGB). Each 8 bit number is represented as a fixed-point number with values ranging from 0.0 = 0/255 to 1.0 = 255/255 (a Normed{Uint8,8} type in Julia).

md"""
## Image array manipulation

Images are represented in Julia as two-dimensional arrays of color values. In the default representation of the PNG file that we downloaded, each pixel color is represented by 24 bits, with 8 bits each for the red, green, and blue intensity values (RGB). Each 8 bit number is represented as a fixed-point number with values ranging from 0.0 = 0/255 to 1.0 = 255/255 (a `Normed{Uint8,8}` type in Julia).
"""
186 μs
Matrix{RGB{N0f8}} (alias for Array{RGB{Normed{UInt8, 8}}, 2})
typeof(blurry_img)
28.9 μs

The picture was generated by a blurring operation that works on each color plane independently. To deblur, we want to work on the arrays for one color plane at a time. Therefore, we will manipulate the data in terms of three floating-point arrays (one each for the red, green, and blue levels), which we construct via the channelview function. The colorview function maps these floating point arrays back to a viewable image. We define a convenience function arrays_to_img that does this conversion and makes sure that all the entries of the arrays are clipped back to the interval from 0 to 1.

md"""
The picture was generated by a blurring operation that works on each color plane independently. To deblur, we want to work on the arrays for one color plane at a time. Therefore, we will manipulate the data in terms of three floating-point arrays (one each for the red, green, and blue levels), which we construct via the `channelview` function. The `colorview` function maps these floating point arrays back to a viewable image. We define a convenience function `arrays_to_img` that does this conversion and makes sure that all the entries of the arrays are clipped back to the interval from 0 to 1.
"""
161 μs
img_rgb_pixels (generic function with 1 method)
# Convenience function to count pixels
img_rgb_pixels(V) = prod(size(V)[2:3])
283 μs
arrays_to_img (generic function with 1 method)
# Convert 3-array representation to ordinary colorview representation
function arrays_to_img(V)
rgb = min.(max.(V, 0.0), 1.0)
return colorview(RGB, V[1,:,:], V[2,:,:], V[3,:,:])
end
414 μs

Now we can compute the 3-array representation of the image and illustrate the helpers.

md"""
Now we can compute the 3-array representation of the image and illustrate the helpers.
"""
117 μs
blurry_rgb
3×249×320 reinterpret(reshape, Float32, ::Array{RGB{Float32},2}) with eltype Float32:
[:, :, 1] =
 0.360784  0.345098  0.34902   0.352941  …  0.376471  0.388235  0.396078  0.388235
 0.443137  0.435294  0.443137  0.447059     0.447059  0.458824  0.470588  0.466667
 0.545098  0.541176  0.54902   0.556863     0.533333  0.54902   0.560784  0.560784

[:, :, 2] =
 0.360784  0.345098  0.34902   0.352941  …  0.392157  0.403922  0.407843  0.392157
 0.439216  0.431373  0.439216  0.447059     0.458824  0.470588  0.478431  0.466667
 0.541176  0.537255  0.54902   0.556863     0.541176  0.552941  0.564706  0.560784

[:, :, 3] =
 0.356863  0.341176  0.34902   0.352941  …  0.407843  0.415686  0.415686  0.392157
 0.435294  0.431373  0.439216  0.447059     0.470588  0.482353  0.482353  0.462745
 0.537255  0.533333  0.54902   0.556863     0.545098  0.560784  0.568627  0.556863

;;; … 

[:, :, 318] =
 0.364706  0.352941  0.352941  0.352941  …  0.329412  0.341176  0.352941  0.364706
 0.447059  0.443137  0.447059  0.45098      0.407843  0.423529  0.435294  0.447059
 0.552941  0.552941  0.556863  0.560784     0.509804  0.52549   0.541176  0.552941

[:, :, 319] =
 0.364706  0.352941  0.34902   0.352941  …  0.345098  0.356863  0.368627  0.376471
 0.447059  0.439216  0.447059  0.45098      0.419608  0.435294  0.447059  0.454902
 0.552941  0.54902   0.552941  0.556863     0.517647  0.533333  0.54902   0.556863

[:, :, 320] =
 0.364706  0.34902   0.34902   0.352941  …  0.360784  0.372549  0.384314  0.384314
 0.447059  0.439216  0.443137  0.447059     0.435294  0.447059  0.458824  0.462745
 0.54902   0.545098  0.552941  0.556863     0.52549   0.541176  0.556863  0.560784
# Compute the 3-array representation of the blurry image and illustrate the above helpers
blurry_rgb = channelview(float.(blurry_img))
33.0 ms
680 μs

npixels = 79680

md"""npixels = $(img_rgb_pixels(blurry_rgb))"""
14.6 ms

Blurring, convolution, and Fourier transforms

The image was blurred by convolving with a blurring kernel and then rounding the result. That is, the map from the original pixels $x_{ij}$ to the blurred pixels $y_{ij}$ is given by

$$y_{ij} = (k*x)_{ij} = \sum_{k,l} k_{kl} x_{i-k,j-l}$$

The kernel corresponds to the type of blurring that one sees with motion blur: each pixel in the output is the average of a few pixels along a diagonal line in the image.

md"""
## Blurring, convolution, and Fourier transforms

The image was blurred by *convolving* with a blurring kernel and then rounding the result. That is, the map from the original pixels $x_{ij}$ to the blurred pixels $y_{ij}$ is given by

$$y_{ij} = (k*x)_{ij} = \sum_{k,l} k_{kl} x_{i-k,j-l}$$

The kernel corresponds to the type of blurring that one sees with motion blur: each pixel in the output is the average of a few pixels along a diagonal line in the image.
"""
241 μs
K
11×29 Matrix{Float32}:
 0.0       0.0       0.0        0.0         …  0.0135039  0.024886  0.020744
 0.0       0.0       0.0        0.0            0.0217821  0.0104    0.0
 0.0       0.0       0.0        0.0            0.0        0.0       0.0
 0.0       0.0       0.0        0.0            0.0        0.0       0.0
 0.0       0.0       0.0        0.0            0.0        0.0       0.0
 0.0       0.0       0.0        0.0         …  0.0        0.0       0.0
 0.0       0.0       0.0        0.0            0.0        0.0       0.0
 0.0       0.0       0.0        0.0            0.0        0.0       0.0
 0.0       0.0       0.0        0.00189212     0.0        0.0       0.0
 0.0       0.0104    0.0217821  0.0331641      0.0        0.0       0.0
 0.020744  0.024886  0.0135039  0.00212182  …  0.0        0.0       0.0
K = [ 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 2.1218172991546583f-03 1.3503900755254135f-02 2.4885984211353594f-02 2.0744015897822540f-02 ;
0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 0.0000000000000000f+00 1.0629667668937061f-02 2.2011751125036522f-02 3.3164136802798241f-02 2.1782053346698766f-02 1.0399969890599306f-02 0.0000000000000000f+00 ;
81.1 μs
Gray.(K/maximum(K))
26.8 ms

The function represented by the matrix $K$ is sometimes called the point spread function or PSF. It turns out that the convolution operation $k*x$ – which we will generally denote as a linear operation $H$ that maps images to images – can be done very quickly via Fourier transforms:

$$(\hat{k*x})_{ij} = \hat{k}_{ij} \hat{x}_{ij} = s_{ij} \hat{x}_{ij}.$$

Therefore, we will want to keep around a 2D Fourier transform of the (padded) kernel matrix for fast operations with $H$.

md"""
The function represented by the matrix $K$ is sometimes called the *point spread function* or PSF. It turns out that the convolution operation $k*x$ -- which we will generally denote as a linear operation $H$ that maps images to images -- can be done very quickly via Fourier transforms:

$$(\hat{k*x})_{ij} = \hat{k}_{ij} \hat{x}_{ij} = s_{ij} \hat{x}_{ij}.$$

Therefore, we will want to keep around a 2D Fourier transform of the (padded) kernel matrix for fast operations with $H$.
"""
254 μs
get_psf_fft (generic function with 1 method)
function get_psf_fft(img, K)
h, w = size(img)
center = (size(K).+1)./2
PSF = zeros(h, w)
PSF[1:size(K,1), 1:size(K,2)] = K
PSFfft = fft(circshift(PSF, 1 .- center))
return PSFfft
end
752 μs
s
249×320 Matrix{ComplexF64}:
      1.0-5.03612e-17im  0.987252-4.53739e-17im  …  0.987252-5.0676e-17im
 0.997162-3.14893e-17im    0.9963+9.05986e-18im     0.972657-7.89454e-17im
 0.988679+9.07342e-17im  0.999709+9.77815e-17im     0.952665+9.99487e-17im
 0.974637+7.58128e-17im  0.997444+7.08255e-17im     0.927483+8.30755e-17im
 0.955184+1.86158e-16im  0.989529+1.93964e-16im     0.897369+1.66384e-16im
 0.930519-1.89735e-17im  0.976049+5.42101e-19im  …  0.862631-1.33997e-17im
 0.900899-8.0231e-18im   0.957144+3.18755e-17im     0.823625-6.47094e-17im
         ⋮                                       ⋱  
 0.900899+5.71375e-17im  0.823625+7.36173e-17im     0.957144+4.11656e-17im
 0.930519+2.56162e-17im  0.862631+2.21468e-17im     0.976049-6.37524e-18im
 0.955184-4.17807e-18im  0.897369-2.23927e-17im  …  0.989529+6.00509e-18im
 0.974637+3.66528e-17im  0.927483+4.96632e-17im     0.997444+2.27267e-17im
 0.988679+2.16047e-17im  0.952665+2.94109e-17im     0.999709+2.22758e-17im
 0.997162+1.28584e-17im  0.972657+4.40165e-18im       0.9963+1.77891e-17im
3.9 ms
Gray.(abs.(s))
38.9 ms

Regularization and deblurring

In practice we do not store the exact values produced by the blurring operation $H$. Instead, if $V^{\mathrm{orig}}$ is the true input image (represented as three columns, one per plane) and $V^{\mathrm{blur}}$ is the blurred image, we actually have

$$V^{\mathrm{blur}} = \operatorname{round}(H V^{\mathrm{orig}}),$$

where the rounding operation maps to the nearest 8-bit fixed point representation. As described above, our blurring operation is a convolution, which can be diagonalized by a Fourier transform $Z$; that is,

$$H = Z^{*} S Z$$

where $Z$ is a unitary matrix ($Z^* Z = I$) whose action can be applied by the two-dimensional fast Fourier transform and $S$ is a diagonal matrix of eigenvalues. The matrix $H$ is not symmetric, so the eigenvalues are complex!

The simplest reconstruction approach is to simply solve

$$V^{\mathrm{naive}} \approx H^{-1} V^{\mathrm{blur}}.$$

which we can do programmatically by

  • Transforming $V^{\mathrm{blur}}$ into the Fourier basis: $\hat{V} = Z V^{\mathrm{blur}}$.

  • Forming $U = S^{-1} V$ (i.e. scaling each element of $V$).

  • Transforming back: $V^{\mathrm{naive}} = Z^* U$.

In fact, we can do many of the manipulations in this assignment working entirely with transformed images $\hat{V} = ZV$ at intermediate stages, and only transforming back to real space at the very end. However, if we want to work this way, it is important to realize that the fft function actually applies a scaled version of $Z$; that is,

$$\mbox{fft}(V) = \sqrt{N_\mbox{pixels}} ZV$$

and the inverse FFT similarly divides by the square root of the number of pixels.

A better approach to deblurring the image is Tikhonov regularization:

$$V^{\mathrm{tik}}(\lambda) = \operatorname{argmin}_V \|HV-V^{\mathrm{blur}}\|_F^2 + \lambda^2 \|V\|_F^2.$$

We will now investigate why this approach is better.

Questions/tasks

These tasks use the following functions to compute the Tikhonov deblurred image as three color plane arrays and as an image array, respectively.

md"""
## Regularization and deblurring

In practice we do not store the exact values produced by the blurring operation $H$.
Instead, if $V^{\mathrm{orig}}$ is the true input image (represented as three columns, one per plane) and $V^{\mathrm{blur}}$ is the blurred image, we actually have

$$V^{\mathrm{blur}} = \operatorname{round}(H V^{\mathrm{orig}}),$$

where the rounding operation maps to the nearest 8-bit fixed point representation.
As described above, our blurring operation is
a *convolution*, which can be diagonalized by a Fourier transform $Z$;
that is,

$$H = Z^{*} S Z$$

where $Z$ is a unitary matrix ($Z^* Z = I$) whose action can be applied
by the two-dimensional fast Fourier transform and $S$ is a diagonal
matrix of eigenvalues. The matrix $H$ is not symmetric, so the
eigenvalues are complex!
3.9 ms
resid_norm (generic function with 1 method)
begin
# Apply FFT or inverse FFT across pixel plans and use to apply filter
img_fft(V) = mapslices(fft, V; dims=[2, 3]) / sqrt(img_rgb_pixels(V))
img_ifft(Vt) = real.(mapslices(ifft, Vt; dims=[2, 3])) * sqrt(img_rgb_pixels(Vt))
# Compute scaling factors for deblurring and residual in Fourier space
deblur_factors(s, λ) = conj.(s) ./ (abs2.(s) .+ λ^2)
resid_factors(s, λ) = λ^2 ./ (abs2.(s) .+ λ^2)
# Blurring and regularized deblurring operations (applied in Fourier space)
blur_fft(Vt, s) = mapslices(p -> s.*p, Vt; dims=[2, 3])
deblur_fft(Vt, s, λ) = blur_fft(Vt, deblur_factors(s, λ))
resid_fft(Vt, s, λ) = blur_fft(Vt, resid_factors(s, λ))
# Blurring and regularized deblurring operations (applied in real space)
blur(V, s) = img_ifft(blur_fft(img_fft(V), s))
deblur(V, s, λ) = img_ifft(deblur_fft(img_fft(V), s, λ))
resid(V, s, λ) = img_ifft(resid_fft(img_fft(V), s, λ))
3.3 ms
  1. Generate a deblurred image with the command deblur_img(blurry_rgb, s, 0) (with $\lambda = 0$). Ideally, this naive deblurring approach should reproduce the original image. Does it? What happens when you run `deblur_img(blurry_rgb, s, 0.01) (with $\lambda = 0.01$)?

  2. Using your understanding of the normal equations and of the eigenvalue decomposition of $H$, explain why the function deblur actually does solve the Tikhonov minimization problem.

  3. Because of rounding, the blurred image actually looks like $V^{\mathrm{blur}} = HV^{\mathrm{orig}} + E$ where $E$ is a backward error. What is the maximum error per pixel (assuming no rounding error in applying $H$)? What is a natural bound on $\|E\|_F^2$?

  4. Using the decomposition $H = Z^* S Z$, what are the singular values of $H$? The eigenvalues of $H$ are in the s array –- what is the smallest singular value?

  5. Based on the error bound on $\|E\|_F^2$ and the value of the smallest singular value, provide a bound on $\|V^{\mathrm{orig}}-V^{\mathrm{naive}}\|_F$. Compare to $\|V^{\mathrm{blur}}\|_F$, and argue that the bad behavior of the naive deblurring approach is completely predictable.

md"""
1. Generate a deblurred image with the command `deblur_img(blurry_rgb, s, 0)` (with $\lambda = 0$). Ideally, this naive deblurring approach should reproduce the original image. Does it? What happens when you run ``deblur_img(blurry_rgb, s, 0.01)` (with $\lambda = 0.01$)?

2. Using your understanding of the normal equations and of the eigenvalue decomposition of $H$, explain why the function `deblur` actually does solve the Tikhonov minimization problem.

3. Because of rounding, the blurred image actually looks like $V^{\mathrm{blur}} = HV^{\mathrm{orig}} + E$ where $E$ is a backward error. What is the maximum error per pixel (assuming no rounding error in applying $H$)? What is a natural bound on $\|E\|_F^2$?

4. Using the decomposition $H = Z^* S Z$, what are the singular values of $H$? The eigenvalues of $H$ are in the `s` array --- what is the smallest singular value?

550 μs

Solutions

md"""
#### Solutions
"""
130 μs

Choosing the parameter

The unsatisfactory feature of Tikhonov regularization is that there is a free parameter $\lambda$, and so far we have no guidance as to how to find it. In the remainder of this project, we consider three different approaches to finding an "optimal" $\lambda$ according to three different criteria.

md"""
## Choosing the parameter

The unsatisfactory feature of Tikhonov regularization is that there
is a free parameter $\lambda$, and so far we have no guidance as to
how to find it. In the remainder of this project, we consider three
different approaches to finding an "optimal" $\lambda$ according
to three different criteria.
"""
195 μs

Morozov's discrepancy principle

Morozov's discrepancy principle says roughly that we should choose $\lambda$ so that the residual

$$\rho(\lambda) = \|HV^{\mathrm{tik}}(\lambda) - V^{\mathrm{blur}}\|_F$$

is about the same size as the difference between $V^{\mathrm{blur}}$ and the "correct" right hand side. Usually this distance is difficult to estimate (which often makes the discrepancy principle hard to apply in practice), but in our case we already have a good estimate.

Questions/tasks

  1. Complete the discrepancy function below to compute $\lambda$ such that $\rho(\lambda)$ is approximately $\tau = (\sqrt{n/4})/255$ where $n$ is the number of pixels (this corresponds to assuming the rounded-off quantity is uniformly distributed). The parameters λmin and λmax are user-defined lower and upper bounds on the value of the regularization parameter $\lambda$. Default values of $10^{-4}$ and $1$ span a good range. You may use the Julia fzero routine to find the zero of the equation.

  2. What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter.

  3. On a log-log scale, plot $\rho(\lambda)$ against $\lambda$ over a "reasonable" range of values around the recommended regularization parameter. Use a dashed horizontal line to indicate $\tau = (\sqrt{n/4})/255$, and mark somehow the point on the curve associated with the desired value of $\lambda$. Are there any visually distinctive features of the plot that suggest this is around the right value?

md"""
### Morozov's discrepancy principle

Morozov's discrepancy principle says roughly that we should choose $\lambda$ so that the residual

$$\rho(\lambda) = \|HV^{\mathrm{tik}}(\lambda) - V^{\mathrm{blur}}\|_F$$

is about the same size as the difference between $V^{\mathrm{blur}}$
and the "correct" right hand side. Usually this distance is difficult
to estimate (which often makes the discrepancy principle hard to apply
in practice), but in our case we already have a good estimate.

3.8 ms

Solutions

We begin by completing the discrepancy function to compute $\lambda_{\mathrm{morozov}}$ such that $\rho(\lambda_{\mathrm{morozov}}) = \tau$.

md"""
#### Solutions

We begin by completing the `discrepancy` function to compute $\lambda_{\mathrm{morozov}}$ such that $\rho(\lambda_{\mathrm{morozov}}) = \tau$.
"""
199 μs
τ_morozov (generic function with 1 method)
# Compute the threshold for Morozov
τ_morozov(V) = sqrt(img_rgb_pixels(V)/4)/255.0
274 μs
discrepancy (generic function with 3 methods)
# TODO: Find the optimal λ according to the discrepancy principle
function discrepancy(V, s, λmin=1e-4, λmax=1)
return 0.01 # Placeholder code
end
528 μs

Let's see what the result of regularizing with $\lambda_{\mathrm{morozov}}$ is for our test image.

md"""
Let's see what the result of regularizing with $\lambda_{\mathrm{morozov}}$ is for our test image.
"""
159 μs
λ_morozov
0.01
λ_morozov = discrepancy(blurry_rgb, s)
24.5 μs
734 ms

Finally, we produce a log-log plot of $\rho(\lambda)$ vs $\lambda$ with a dashed horizontal line at $\tau$ and a point marking the computed intersection of the two curves.

md"""
Finally, we produce a log-log plot of $\rho(\lambda)$ vs $\lambda$ with a dashed horizontal line at $\tau$ and a point marking the computed intersection of the two curves.
"""
139 μs
0.5534832553530149
begin
# Compute a log-spaced array of λ and corresponding residual norms ρ(λ)
blurry_rgbt = img_fft(blurry_rgb)
λs = exp.(range(log(1e-4), length=20, stop=log(1e0)));
ρs = [resid_norm_fft(blurry_rgbt, s, λ) for λ=λs]
# Compute the estimated error level for the discrepancy principle
# TODO: Create the desired plot
end
325 ms

Generalized cross-validation

The generalized cross-validation criterion involves minimizing

$$G(\lambda) = \frac{N \rho(\lambda)^2} {\left( \mathrm{tr}(I-H \hat{H}^{\dagger}(\lambda)) \right)^2},$$

where $\hat{H}^\dagger(\lambda)$ is the solution operator for the Tikhonov regularized problem ($\hat{H}^\dagger(\lambda) = (H^*H + \lambda^2I)^{-1}H^*$) and $N$ is the number of unknowns (in this case, three times the number of pixels). Minimizing $G$ is easy given the SVD of $H$. In fact, we know how to compute the SVD of $H$ in this problem, but in other settings this is not so easy. So we will seek a different approach.

The troublesome term is the trace that appears in the denominator, but it turns out that we can estimate this trace using the fact that if $z$ is any random vector with independent entries of mean zero and variance one, then

$$\mathrm{tr}(I-H \hat{H}^\dagger(\lambda)) = \mathbb{E}\left[ z^T (I-H \hat{H}^\dagger(\lambda)) z \right].$$

The Hutchinson estimator uses random probe vectors consisting of $\pm 1$ entries to estimate the trace of a large matrix, and Golub and von Matt suggested a procedure that approximates the GCV criterion with minimization of

$$\tilde{G}(\lambda) = \frac{Nm^2 \rho(\lambda)^2} {\left( \sum_{l=1}^m z_l^T(I-H \hat{H}^{\dagger}(\lambda)) z_l \right)^2},$$

where each vector $z_l$ is a $\pm 1$ vector drawn using a pseudo-random number generator.

Questions/tasks

  1. Show that $\frac{d}{d\lambda} \left( I-H \hat{H}^\dagger(\lambda) \right) = 2\lambda( \hat{H}^\dagger(\lambda) )^* \hat{H}^\dagger(\lambda).$ From this, you will be able to compute derivatives of $\rho(\lambda)^2$ and of $z_\ell^T (I-H \hat{H}^\dagger(\lambda)) z_{\ell}$.

  2. For a given set of probe vectors stored in a probes $\times$ image size array Z, complete the gcv routine below to evaluate $\tilde{G}(\lambda)$ and the derivative $\tilde{G}'(\lambda)$. You should need to employ multiple factorizations per call.

  3. Using the gcv function as a building block, complete the optimizer gcvopt to minimize the $\tilde{G}$. You may want to make a version of gcv that works with transformed data for performance in the optimizer.

  4. What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter.

  5. On a log-log scale, plot $\tilde{G}(\lambda)$ against $\lambda$ over a "reasonable" range of values around the recommended regularization parameter. Mark somehow the point on the curve associated with the desired value of $\lambda$. Is the curve "flat" in the neighborhood of the minimum, or is the minimum clearly defined?

md"""
### Generalized cross-validation

The *generalized cross-validation criterion* involves minimizing

$$G(\lambda) =
\frac{N \rho(\lambda)^2}
{\left( \mathrm{tr}(I-H \hat{H}^{\dagger}(\lambda)) \right)^2},$$

where $\hat{H}^\dagger(\lambda)$ is the solution operator for
the Tikhonov regularized problem ($\hat{H}^\dagger(\lambda) = (H^*H + \lambda^2I)^{-1}H^*$) and $N$ is the number of unknowns
(in this case, three times the number of pixels). Minimizing $G$ is
easy given the SVD of $H$. In fact, we know how to compute the SVD
of $H$ in this problem, but in other settings this is not so easy.
So we will seek a different approach.

837 μs

Solutions

You should feel free to rewrite anything in this section! The point of the code and text below is to give you some starting points.

The gcv function returns the estimated GCV function $G(\lambda)$ and its derivative $G'(\lambda)$. The gcvopt function returns the estimate minimizer $\lambda_{\mathrm{gcv}}$ of the GCV function.

md"""
#### Solutions

You should feel free to rewrite anything in this section! The point of the code and text below is to give you some starting points.

The `gcv` function returns the estimated GCV function $G(\lambda)$ and its derivative $G'(\lambda)$. The `gcvopt` function returns the estimate minimizer $\lambda_{\mathrm{gcv}}$ of the GCV function.
"""
234 μs
gcv (generic function with 1 method)
# TODO: Return (estimated) G(λ), G'(λ)
function gcv(V, s, λ, Z)
return λ, 1.0 # Placeholder
end
271 μs
gcvopt (generic function with 1 method)
# TODO: Optimize the GCV for λ between λmin and λmax
# Suggested strategy: use the fzero function on G' as returned by gcv
function gcvopt(V, s, λmin, λmax, Z)
return 0.01 # Placeholder
end
255 μs

We provide some convenience functions to generate a set of probe vectors for the method. Surprisingly few probes are needed to get reasonable estimates (at least, reasonable for our purposes). We will default to using 10.

md"""
We provide some convenience functions to generate a set of probe vectors for the method. Surprisingly few probes are needed to get reasonable estimates (at least, reasonable for our purposes). We will default to using 10.
"""
136 μs
hutchinson_probes (generic function with 2 methods)
hutchinson_probes(V, m=10) = sign.(rand(m, size(V,2), size(V,3)) .- 0.5)
444 μs

Let's run the computation and show the picture for our test image.

md"""
Let's run the computation and show the picture for our test image.
"""
162 μs
0.01
begin
λ_gcv = gcvopt(blurry_rgb, s, 1e-4, 1, Z)
end
1.5 ms
13.8 ms

Finally, we create a log-log plot of $G(\lambda)$ as a function of $\lambda$, and mark the point $(\lambda_{\mathrm{gcv}}, G(\lambda_{\mathrm{gcv}}))$. The curve is not all that sharp, but neither is it terribly flat; the parameter is reasonably well pinned down.

md"""
Finally, we create a log-log plot of $G(\lambda)$ as a function of $\lambda$, and mark the point $(\lambda_{\mathrm{gcv}}, G(\lambda_{\mathrm{gcv}}))$. The curve is not all that sharp,
but neither is it terribly flat; the parameter is reasonably well pinned down.
"""
166 μs
0.01
begin
# Compute (estimated) G(λ) for a range of λ values and for λ_gcv
# (you may want to use a function gcvt that works on transformed quantities)
Gs = [gcv(blurry_rgb, s, λ, Z)[1] for λ=λs]
Gλ_gcv = gcv(blurry_rgb, s, λ_gcv, Z)[1]
# TODO: Produce the plot
end
47.6 μs

The L-curve

The L-curve is a log-log plot of the residual norm ($x$-axis) against the solution norm ($y$-axis) for varying values of $\lambda$. It is named because it tends to look like the letter L. The vertical part of the L corresponds to the under-regularized case, where the residual norm changes negligibly with changes in $\lambda$, but the solution norm changes dramatically. The horizontal part corresponds to the over-regularized case, where small reductions to the solution norm correspond to increasingly large residuals. The corner of the L is the "sweet spot" where the two balance each other, and we seek to find this by finding $\lambda$ where the curvature is maximal. You are given a function that computes a point on the L-curve (and the curvature) associated with a given $\lambda$; and a function that sweeps out the curve, returning parallel lists of residual norms, solution norms, logarithmically-spaced values for $\lambda$, and curvatures $\kappa(\lambda)$.

Questions/tasks

  1. Draw a (log-log) plot of the L-curve, and a semi-log plot showing the curvature as a function of the (log) residual norm. Is there a well-defined corner with a large curvature?

  2. Write an optimizer to maximize the curvature (the lcurveopt function). You may use lcurve to compute $\kappa$, and you may use existing Julia routines to do the optimization (or you can just use brute force).

  3. What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter; is this a good result?

md"""
### The L-curve

The L-curve is a log-log plot of the residual norm ($x$-axis) against
the solution norm ($y$-axis) for varying values of $\lambda$. It is
named because it tends to look like the letter L. The vertical part of the L
corresponds to the under-regularized case, where the residual norm
changes negligibly with changes in $\lambda$, but the solution norm
changes dramatically. The horizontal part corresponds to the
over-regularized case, where small reductions to the solution norm
correspond to increasingly large residuals. The corner of the L is
the "sweet spot" where the two balance each other, and we seek to
find this by finding $\lambda$ where the curvature is maximal.
You are given a function that computes a point on the
L-curve (and the curvature) associated with a given $\lambda$;
and a function that sweeps out the curve, returning
465 μs
lcurvet (generic function with 1 method)
function lcurvet(V, s, λ)
abs2V = abs2.(V)
wdot(ss) = real(sum(mapslices(p -> ss .* p, abs2V; dims=[2,3])))
C = abs2.(s) .+ λ^2

# Compute the squared solution norm η(λ)^2 and residual norm ρ(λ)^2
η2 = wdot(abs2.(s) ./ C.^2)
ρ2 = wdot((abs2.(s) ./ C .- 1).^2)
# We also need the first and second derivatives with respect to λ
dη2 = -4*λ*wdot(abs2.(s) ./ C.^3)
dρ2 = -4*λ*wdot((abs2.(s) ./ C .- 1) .* abs2.(s) ./ C.^2)
d2η2 = 4*wdot(abs2.(s) ./ C.^3 .* (2*λ ./ C .- 1)) +
8*λ^2*wdot(abs2.(s) ./ C.^4)
d2ρ2 = 4*wdot((abs2.(s) ./ C .- 1).*(abs2.(s) ./ C.^2 .* (2*λ ./ C .- 1))) +
8*λ^2*wdot(abs2.(s).^2 ./ C.^4)
# The hatted versions that we'd typically plot are log(η) and log(ρ).
# For computing curvature, it's fine to use ηh = log(η^2) and ρh = log(ρ^2)
ηh = log(η2)
ρh = log(ρ2)
# d(log f) = df/f
3.2 ms
lcurve (generic function with 1 method)
lcurve(V, s, λ) = lcurvet(img_fft(V), s, λ)
257 μs

Solutions

You should feel free to rewrite anything in this section! The point of the code below is to give you some starting points.

md"""
#### Solutions

You should feel free to rewrite anything in this section! The point of the code below is to give you some starting points.
"""
199 μs
lcurveopt (generic function with 1 method)
# TODO: You should feel free to change the arguments to this function if it makes
# sense to you to do so
function lcurveopt(blurry_rgbt, s)
return 0.1 # Placeholder
end
230 μs
plot_lcurve (generic function with 1 method)
function plot_lcurve(blurry_rgbt, s, τ)

λ_lcurve = lcurveopt(blurry_rgbt, s)
# TODO: Produce the plot
p1 = plot(λs, λs) # Placeholder - plot the L curve

# TODO: Produce the curvature
p2 = plot(λs, λs) # Placeholder - plot curvature

end
357 μs
p1lc, p2lc, λ_lcurve = plot_lcurve(blurry_rgbt, s, τ)
3.5 s
5.7 μs
20.5 μs
0.1
6.3 μs
13.5 ms

Notes

  1. The type of very ill-conditioned problems seen here occur frequently not only in image reconstruction and inverse problems, but also in solving "first kind" integral equations that occur in mathematical physics. Integral equations of the second kind tend to be much better behaved.

  2. We have sadly passed on a discussion of the use of iterative solvers for treating the large linear systems that arise during the computation of the Tikhonov solutions and during the computations involved in determining the regularization parameter.

  3. The types of model selection criteria involved here are useful as well for a variety of problems in machine learning –- though we are not always so blessed with tricks to make everything efficient!

  4. As you might gather from our example, though the methods described in this project are mostly robust (hence popular) any regularization method can be fooled. There is no substitute for checking the answer to see if it looks sensible.

md"""
## Notes

1. The type of very ill-conditioned problems seen here occur frequently not only in image reconstruction and inverse problems, but also in solving "first kind" integral equations that occur in mathematical physics. Integral equations of the second kind tend to be much better behaved.

2. We have sadly passed on a discussion of the use of iterative solvers for treating the large linear systems that arise during the computation of the Tikhonov solutions and during the computations involved in determining the regularization parameter.

3. The types of model selection criteria involved here are useful as well for a variety of problems in machine learning --- though we are not always so blessed with tricks to make everything efficient!

4. As you might gather from our example, though the methods described in this project are mostly robust (hence popular) any regularization method can be fooled. There is no substitute for checking the answer to see if it looks sensible.
"""
381 μs