Project 2: Where are My Glasses?
The image above is a blurred version of a picture that I took at the local SPCA. As we will see, a naive approach to deblurring produces garbage, a characteristic feature of ill-posed problems. In order to reconstruct the image, we need to regularize the reconstruction, just as we would regularize an ill-posed least squares problem. We will use Tikhonov regularization, as described in class. However, this involves choosing the value of a regularization parameter. Your mission is to investigate how the reconstruction depends on this parameter, and to compare three approaches to choosing it (mostly) automatically.
You are given the file blurry.png (the blurred image shown above). You are responsible for the following deliverables, which can either be submitted as a Jupyter notebook (recommended) or as standalone code:
Codes to compute "optimal" values of the regularization parameter $\lambda$ via the discrepancy principle, the L-curve, and generalized cross-validation.
A report that addresses the questions posed in the rest of this prompt.
This project is inspired by the image deblurring project by James G. Nagy and Dianne P. O'Leary ("Image Deblurring: I Can See Clearly Now") in Computing in Science and Engineering (project: Vol. 5, No. 3, May/June 2003, pp. 82-85; solution: Vol. 5, No. 4, July/August 2003). Other useful references include:
Hansen, Nagy, O'Leary. Deblurring Images: Matrices, Spectra, and Filtering, SIAM 2006.
Hansen and O'Leary. "The Use of the L-Curve in the Regularization of Discrete Ill-Posed Problems" in SIAM J. Sci. Comput., Vol. 14, No. 6, November 1993.
Golub, Heath, and Wahba. "Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter" in Technometrics, Vol. 21, No. 2, May 1979.
Golub and von Matt. "Generalized Cross-Validation for Large-Scale Problems" in Journal of Computational and Graphical Statistics, Vol. 6, No. 1, March 1997.
Morozov. Methods for Solving Incorrectly Posed Problems, Springer-Verlag, 1984.
It should be possible to do this assignment based only on ideas presented in this prompt and in the lectures. However, you are always welcome to use any ideas or code you find in the literature, including in the references noted above, provided you give an appropriate citation.
Julia setup
We use the following Julia packages for this project.
FileIO: The FileIO library provides extensible support for loading different types of files; we use it to work with PNG files.
Images: The Images package provides various tools for working with images.
FFTW: The FFTW package provides Fourier transform tools.
Colors: The Colors package provides tools for manipulating different color spaces.
Roots: The Roots package provides solvers for one-dimensional nonlinear equations.
Plots: The Plots package provides plotting tools for Julia.
You will also want to install ImageMagick or QuartzImageIO (on macOS) so that FileIO can load images. You do not need to include these packages directly with using; they will be loaded by the FileIO library.
Once we have the prerequisites, we can load the blurred image. If you have the data locally, it is fine to just use load("blurry.png") to get the image; but one can also combine the load with a download command to fetch the data from an online source.
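For example, a minimal loading cell might look like the following sketch. The URL here is a placeholder, not the actual course link; substitute wherever the file is hosted.

```julia
using FileIO, Images

# Load a local copy of the blurred image if present; otherwise download it first.
# The URL below is a placeholder -- substitute the actual location of blurry.png.
if !isfile("blurry.png")
    download("https://example.com/blurry.png", "blurry.png")
end
blurry_img = load("blurry.png")
```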
Image array manipulation
Images are represented in Julia as two-dimensional arrays of color values. In the default representation of the PNG file that we downloaded, each pixel color is represented by 24 bits, with 8 bits each for the red, green, and blue intensity values (RGB). Each 8-bit number is represented as a fixed-point number with values ranging from 0.0 = 0/255 to 1.0 = 255/255 (a Normed{UInt8,8} type in Julia, abbreviated N0f8).
Matrix{RGB{N0f8}} (alias for Array{RGB{Normed{UInt8, 8}}, 2})
The picture was generated by a blurring operation that works on each color plane independently. To deblur, we want to work on the arrays for one color plane at a time. Therefore, we will manipulate the data in terms of three floating-point arrays (one each for the red, green, and blue levels), which we construct via the channelview function. The colorview function maps these floating-point arrays back to a viewable image. We define a convenience function arrays_to_img that does this conversion and makes sure that all the entries of the arrays are clipped back to the interval from 0 to 1.
img_rgb_pixels (generic function with 1 method)
arrays_to_img (generic function with 1 method)
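The hidden cells above define these helpers. A minimal sketch of what they might look like (the exact definitions and signatures in the notebook may differ) is:

```julia
using Images, Colors

# Convert an RGB image into a 3 × height × width array of Float32 channel values.
img_rgb_pixels(img) = Float32.(channelview(img))

# Convert a 3 × height × width array of channel values back into a viewable RGB
# image, clipping every entry to the valid intensity range [0, 1].
arrays_to_img(A) = colorview(RGB, clamp.(A, 0.0f0, 1.0f0))
```

With helpers like these, blurry_rgb = img_rgb_pixels(blurry_img) gives the 3-array representation used below.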
Now we can compute the 3-array representation of the image and illustrate the helpers.
3×249×320 Float32 array: the red, green, and blue channel planes of the 249×320 image (full numeric output omitted)
npixels = 79680
Blurring, convolution, and Fourier transforms
The image was blurred by convolving with a blurring kernel and then rounding the result. That is, the map from the original pixels $x_{ij}$ to the blurred pixels $y_{ij}$ is given by
$$y_{ij} = (k*x)_{ij} = \sum_{p,q} k_{pq} \, x_{i-p,j-q}.$$
The kernel corresponds to the type of blurring that one sees with motion blur: each pixel in the output is the average of a few pixels along a diagonal line in the image.
11×29 Matrix{Float32}: the blurring kernel, whose nonzero weights lie along a diagonal line (full numeric output omitted)
The kernel $k$ (shown as a matrix above) is sometimes called the point spread function, or PSF. It turns out that the convolution operation $k*x$ – which we will generally denote as a linear operator $H$ that maps images to images – can be done very quickly via Fourier transforms:
$$(\widehat{k*x})_{ij} = \hat{k}_{ij} \hat{x}_{ij} = s_{ij} \hat{x}_{ij}.$$
Therefore, we will want to keep around a 2D Fourier transform of the (padded) kernel matrix for fast operations with $H$.
get_psf_fft (generic function with 1 method)
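A sketch of how such a helper might be written, assuming periodic boundary conditions and an m×n image (details such as how the kernel is centered are assumptions, not taken from the notebook):

```julia
using FFTW

# Embed the blurring kernel k in an m×n array of zeros, circularly shift it so
# that the kernel center lands at index (1,1), and take the 2D FFT.  The result
# is the array s of eigenvalues of the (periodic) blurring operator H.
function get_psf_fft(k, m, n)
    kh, kw = size(k)
    P = zeros(m, n)
    P[1:kh, 1:kw] .= k
    P = circshift(P, (-(kh ÷ 2), -(kw ÷ 2)))
    return fft(P)
end
```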
249×320 Matrix{ComplexF64}: the eigenvalue array s, i.e. the 2D FFT of the padded PSF (full numeric output omitted)
Regularization and deblurring
In practice we do not store the exact values produced by the blurring operation $H$. Instead, if $V^{\mathrm{orig}}$ is the true input image (represented as three columns, one per plane) and $V^{\mathrm{blur}}$ is the blurred image, we actually have
$$V^{\mathrm{blur}} = \operatorname{round}(H V^{\mathrm{orig}}),$$
where the rounding operation maps to the nearest 8-bit fixed point representation. As described above, our blurring operation is a convolution, which can be diagonalized by a Fourier transform $Z$; that is,
$$H = Z^{*} S Z$$
where $Z$ is a unitary matrix ($Z^* Z = I$) whose action can be applied by the two-dimensional fast Fourier transform and $S$ is a diagonal matrix of eigenvalues. The matrix $H$ is not symmetric, so the eigenvalues are complex!
The simplest reconstruction approach is to compute
$$V^{\mathrm{naive}} = H^{-1} V^{\mathrm{blur}},$$
which we can do programmatically by
Transforming $V^{\mathrm{blur}}$ into the Fourier basis: $\hat{V} = Z V^{\mathrm{blur}}$.
Forming $U = S^{-1} \hat{V}$ (i.e. scaling each element of $\hat{V}$ by the inverse of the corresponding eigenvalue).
Transforming back: $V^{\mathrm{naive}} = Z^* U$.
In fact, we can do many of the manipulations in this assignment working entirely with transformed images $\hat{V} = ZV$ at intermediate stages, and only transforming back to real space at the very end. However, if we want to work this way, it is important to realize that the fft function actually applies a scaled version of $Z$; that is,
$$\mathrm{fft}(V) = \sqrt{N_{\mathrm{pixels}}} \, ZV,$$
and the inverse FFT similarly divides by the square root of the number of pixels.
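For instance, naive inversion of one color plane might be sketched as follows; since we transform there and back, the $\sqrt{N_{\mathrm{pixels}}}$ scale factors from fft and ifft cancel.

```julia
using FFTW

# Naive deblurring of a single color plane V, given the eigenvalue array s.
function naive_deblur(V, s)
    Vhat = fft(V)        # step 1: into the (scaled) Fourier basis
    Uhat = Vhat ./ s     # step 2: apply S^{-1} elementwise
    real(ifft(Uhat))     # step 3: transform back to pixel space
end
```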
A better approach to deblurring the image is Tikhonov regularization:
$$V^{\mathrm{tik}}(\lambda) = \operatorname{argmin}_V \|HV-V^{\mathrm{blur}}\|_F^2 + \lambda^2 \|V\|_F^2.$$
We will now investigate why this approach is better.
Questions/tasks
These tasks use the following functions to compute the Tikhonov deblurred image as three color plane arrays and as an image array, respectively.
resid_norm (generic function with 1 method)
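The hidden cells define deblur, deblur_img, and resid_norm. For orientation, here is a sketch of the kind of Fourier-domain filtering such functions can use; the actual helper signatures in the notebook may differ. Solving the normal equations $(H^*H + \lambda^2 I)V = H^* V^{\mathrm{blur}}$ in the eigenbasis of $H$ gives the filter below.

```julia
using FFTW

# Tikhonov-regularized deblurring of one color plane V: in the Fourier basis the
# normal equations decouple, giving the filter conj(s) / (|s|^2 + λ^2) per frequency.
function deblur_plane(V, s, λ)
    Vhat = fft(V)
    real(ifft(conj.(s) .* Vhat ./ (abs2.(s) .+ λ^2)))
end

# Frobenius norm of H*V_tik(λ) - V_blur over the three color planes, computed in
# the Fourier domain (||r||^2 = ||fft(r)||^2 / N because of the fft scaling).
function resid_norm_sketch(V_rgb, s, λ)
    total = 0.0
    for c in 1:3
        Vhat = fft(V_rgb[c, :, :])
        rhat = abs2.(s) .* Vhat ./ (abs2.(s) .+ λ^2) .- Vhat
        total += sum(abs2, rhat) / length(s)
    end
    return sqrt(total)
end
```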
Generate a deblurred image with the command deblur_img(blurry_rgb, s, 0) (with $\lambda = 0$). Ideally, this naive deblurring approach should reproduce the original image. Does it? What happens when you run deblur_img(blurry_rgb, s, 0.01) (with $\lambda = 0.01$)?
Using your understanding of the normal equations and of the eigenvalue decomposition of $H$, explain why the function deblur actually does solve the Tikhonov minimization problem.
Because of rounding, the blurred image actually looks like $V^{\mathrm{blur}} = HV^{\mathrm{orig}} + E$ where $E$ is a backward error. What is the maximum error per pixel (assuming no rounding error in applying $H$)? What is a natural bound on $\|E\|_F^2$?
Using the decomposition $H = Z^* S Z$, what are the singular values of $H$? The eigenvalues of $H$ are in the s array – what is the smallest singular value?
Based on the error bound on $\|E\|_F^2$ and the value of the smallest singular value, provide a bound on $\|V^{\mathrm{orig}}-V^{\mathrm{naive}}\|_F$. Compare to $\|V^{\mathrm{blur}}\|_F$, and argue that the bad behavior of the naive deblurring approach is completely predictable.
Solutions
Choosing the parameter
The unsatisfactory feature of Tikhonov regularization is that there is a free parameter $\lambda$, and so far we have no guidance as to how to find it. In the remainder of this project, we consider three different approaches to finding an "optimal" $\lambda$ according to three different criteria.
Morozov's discrepancy principle
Morozov's discrepancy principle says roughly that we should choose $\lambda$ so that the residual
$$\rho(\lambda) = \|HV^{\mathrm{tik}}(\lambda) - V^{\mathrm{blur}}\|_F$$
is about the same size as the difference between $V^{\mathrm{blur}}$ and the "correct" right hand side. Usually this distance is difficult to estimate (which often makes the discrepancy principle hard to apply in practice), but in our case we already have a good estimate.
Questions/tasks
Complete the discrepancy function below to compute $\lambda$ such that $\rho(\lambda)$ is approximately $\tau = (\sqrt{n/4})/255$, where $n$ is the number of pixels (this corresponds to assuming the rounded-off quantity is uniformly distributed). The parameters λmin and λmax are user-defined lower and upper bounds on the value of the regularization parameter $\lambda$. Default values of $10^{-4}$ and $1$ span a good range. You may use the Julia fzero routine to find the zero of the equation.
What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter.
On a log-log scale, plot $\rho(\lambda)$ against $\lambda$ over a "reasonable" range of values around the recommended regularization parameter. Use a dashed horizontal line to indicate $\tau = (\sqrt{n/4})/255$, and mark somehow the point on the curve associated with the desired value of $\lambda$. Are there any visually distinctive features of the plot that suggest this is around the right value?
Solutions
We begin by completing the discrepancy function to compute $\lambda_{\mathrm{morozov}}$ such that $\rho(\lambda_{\mathrm{morozov}}) = \tau$.
τ_morozov (generic function with 1 method)
discrepancy (generic function with 3 methods)
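One possible way to fill these in, using fzero from Roots for a bracketed solve of $\rho(\lambda) = \tau$; the resid_norm call signature is assumed, not taken from the hidden cells.

```julia
using Roots

# τ = sqrt(n/4)/255 for n pixels, the target residual from the task statement.
τ_morozov(npixels) = sqrt(npixels / 4) / 255

# Find λ in [λmin, λmax] such that the Tikhonov residual ρ(λ) matches τ.
function discrepancy_sketch(blurry_rgb, s; λmin=1e-4, λmax=1.0)
    τ = τ_morozov(size(blurry_rgb, 2) * size(blurry_rgb, 3))
    fzero(λ -> resid_norm(blurry_rgb, s, λ) - τ, λmin, λmax)
end
```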
Let's see what the result of regularizing with $\lambda_{\mathrm{morozov}}$ is for our test image.
0.01
Finally, we produce a log-log plot of $\rho(\lambda)$ vs $\lambda$ with a dashed horizontal line at $\tau$ and a point marking the computed intersection of the two curves.
0.5534832553530149
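A sketch of how such a plot might be produced with Plots; the names λ_morozov and resid_norm (and their signatures) are assumptions carried over from above.

```julia
using Plots

λs = exp10.(range(-4, 0, length=60))
ρs = [resid_norm(blurry_rgb, s, λ) for λ in λs]

plot(λs, ρs, xscale=:log10, yscale=:log10,
     xlabel="λ", ylabel="ρ(λ)", label="residual norm", legend=:bottomright)
hline!([τ_morozov(npixels)], linestyle=:dash, label="τ")
scatter!([λ_morozov], [resid_norm(blurry_rgb, s, λ_morozov)],
         label="discrepancy principle λ")
```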
Generalized cross-validation
The generalized cross-validation criterion involves minimizing
$$G(\lambda) = \frac{N \rho(\lambda)^2} {\left( \mathrm{tr}(I-H \hat{H}^{\dagger}(\lambda)) \right)^2},$$
where $\hat{H}^\dagger(\lambda)$ is the solution operator for the Tikhonov regularized problem ($\hat{H}^\dagger(\lambda) = (H^*H + \lambda^2I)^{-1}H^*$) and $N$ is the number of unknowns (in this case, three times the number of pixels). Minimizing $G$ is easy given the SVD of $H$. In fact, we know how to compute the SVD of $H$ in this problem, but in other settings this is not so easy. So we will seek a different approach.
The troublesome term is the trace that appears in the denominator, but it turns out that we can estimate this trace using the fact that if $z$ is any random vector with independent entries of mean zero and variance one, then
$$\mathrm{tr}(I-H \hat{H}^\dagger(\lambda)) = \mathbb{E}\left[ z^T (I-H \hat{H}^\dagger(\lambda)) z \right].$$
The Hutchinson estimator uses random probe vectors consisting of $\pm 1$ entries to estimate the trace of a large matrix, and Golub and von Matt suggested a procedure that approximates the GCV criterion with minimization of
$$\tilde{G}(\lambda) = \frac{Nm^2 \rho(\lambda)^2} {\left( \sum_{l=1}^m z_l^T(I-H \hat{H}^{\dagger}(\lambda)) z_l \right)^2},$$
where each vector $z_l$ is a $\pm 1$ vector drawn using a pseudo-random number generator.
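To make this concrete, here is a sketch of a stochastic estimate of $\mathrm{tr}(I - H\hat{H}^\dagger(\lambda))$ for a single color plane, using the diagonalization $H = Z^*SZ$ from before. The probes are stored here, for illustration only, as an m×n×p array; the layout used by the notebook's own helpers may differ.

```julia
using FFTW

# Hutchinson estimate of tr(I - H Ĥ†(λ)) for one color plane.  In the Fourier
# basis, I - H Ĥ†(λ) has (real) eigenvalues λ² / (|s|² + λ²), so each quadratic
# form z'(I - H Ĥ†)z is a weighted sum of |ẑ|², with a 1/N from the fft scaling.
function trace_estimate(s, Zprobes, λ)
    w = λ^2 ./ (abs2.(s) .+ λ^2)
    total = 0.0
    for l in 1:size(Zprobes, 3)
        zhat = fft(Zprobes[:, :, l])
        total += sum(w .* abs2.(zhat)) / length(s)
    end
    return total / size(Zprobes, 3)
end
```

The same filter weights, combined with the derivative identity in the first task below, supply the terms needed for $\tilde{G}'(\lambda)$.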
Questions/tasks
Show that $\frac{d}{d\lambda} \left( I-H \hat{H}^\dagger(\lambda) \right) = 2\lambda( \hat{H}^\dagger(\lambda) )^* \hat{H}^\dagger(\lambda).$ From this, you will be able to compute derivatives of $\rho(\lambda)^2$ and of $z_\ell^T (I-H \hat{H}^\dagger(\lambda)) z_{\ell}$.
For a given set of probe vectors stored in a probes $\times$ image size array Z, complete the gcv routine below to evaluate $\tilde{G}(\lambda)$ and the derivative $\tilde{G}'(\lambda)$. You should need to employ multiple factorizations per call.
Using the gcv function as a building block, complete the optimizer gcvopt to minimize $\tilde{G}$. You may want to make a version of gcv that works with transformed data for performance in the optimizer.
What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter.
On a log-log scale, plot $\tilde{G}(\lambda)$ against $\lambda$ over a "reasonable" range of values around the recommended regularization parameter. Mark somehow the point on the curve associated with the desired value of $\lambda$. Is the curve "flat" in the neighborhood of the minimum, or is the minimum clearly defined?
Solutions
You should feel free to rewrite anything in this section! The point of the code and text below is to give you some starting points.
The gcv function returns the estimated GCV function $\tilde{G}(\lambda)$ and its derivative $\tilde{G}'(\lambda)$. The gcvopt function returns the estimated minimizer $\lambda_{\mathrm{gcv}}$ of the GCV function.
gcv (generic function with 1 method)
gcvopt (generic function with 1 method)
We provide some convenience functions to generate a set of probe vectors for the method. Surprisingly few probes are needed to get reasonable estimates (at least, reasonable for our purposes). We will default to using 10.
hutchinson_probes (generic function with 2 methods)
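A probe generator consistent with the m×n×p layout used in the sketch above might be (the notebook's hutchinson_probes may arrange the probes differently):

```julia
# p random ±1 probe images of size m×n, stored as an m×n×p array.
make_probes(m, n, p=10) = rand([-1.0, 1.0], m, n, p)
```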
Let's run the computation and show the picture for our test image.
0.01
Finally, we create a log-log plot of $G(\lambda)$ as a function of $\lambda$, and mark the point $(\lambda_{\mathrm{gcv}}, G(\lambda_{\mathrm{gcv}}))$. The curve is not all that sharp, but neither is it terribly flat; the parameter is reasonably well pinned down.
0.01
The L-curve
The L-curve is a log-log plot of the residual norm ($x$-axis) against the solution norm ($y$-axis) for varying values of $\lambda$. It is named because it tends to look like the letter L. The vertical part of the L corresponds to the under-regularized case, where the residual norm changes negligibly with changes in $\lambda$, but the solution norm changes dramatically. The horizontal part corresponds to the over-regularized case, where small reductions to the solution norm correspond to increasingly large residuals. The corner of the L is the "sweet spot" where the two balance each other, and we seek to find this by finding $\lambda$ where the curvature is maximal.
You are given a function that computes a point on the L-curve (and the curvature) associated with a given $\lambda$, and a function that sweeps out the curve, returning parallel lists of residual norms, solution norms, logarithmically-spaced values for $\lambda$, and curvatures $\kappa(\lambda)$.
Questions/tasks
Draw a (log-log) plot of the L-curve, and a semi-log plot showing the curvature as a function of the (log) residual norm. Is there a well-defined corner with a large curvature?
Write an optimizer to maximize the curvature (the lcurveopt function). You may use lcurve to compute $\kappa$, and you may use existing Julia routines to do the optimization (or you can just use brute force).
What is the optimal $\lambda$ by this criterion? Show the image for the computed regularization parameter; is this a good result?
lcurvet (generic function with 1 method)
lcurve (generic function with 1 method)
Solutions
You should feel free to rewrite anything in this section! The point of the code below is to give you some starting points.
lcurveopt (generic function with 1 method)
plot_lcurve (generic function with 1 method)
0.1
0.1
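For reference, a brute-force sketch of how lcurveopt might be completed, assuming lcurve sweeps the curve and returns parallel lists of residual norms, solution norms, $\lambda$ values, and curvatures (the exact signature is an assumption):

```julia
# Brute force: sweep λ over a logarithmic grid and return the λ whose L-curve
# curvature κ(λ) is largest.
function lcurveopt_sketch(blurry_rgb, s; λmin=1e-4, λmax=1.0, npts=100)
    ρs, ηs, λs, κs = lcurve(blurry_rgb, s, λmin, λmax, npts)  # signature assumed
    return λs[argmax(κs)]
end
```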
Notes
The type of very ill-conditioned problem seen here occurs frequently not only in image reconstruction and inverse problems, but also in solving "first kind" integral equations that arise in mathematical physics. Integral equations of the second kind tend to be much better behaved.
We have, sadly, passed over a discussion of iterative solvers for the large linear systems that arise in computing the Tikhonov solutions and in determining the regularization parameter.
The types of model selection criteria involved here are useful as well for a variety of problems in machine learning – though we are not always so blessed with tricks to make everything efficient!
As you might gather from our example, though the methods described in this project are mostly robust (hence their popularity), any regularization method can be fooled. There is no substitute for checking the answer to see if it looks sensible.