Learning Outcomes:
After taking this course, you will be able to do the following:
- Basics:
- Describe intuitively and mathematically the geometry and physics of image formation.
- Explain intuitively what information is lost during image formation.
- Image processing:
- Implement convolution and explain which kinds of filtering operations can be expressed as a convolution (a minimal sketch follows this list).
- Relate physical properties of the image to "frequency" in the Fourier transform, explain how frequency content affects various image processing operations, and explain how the Fourier transform interacts with convolution.
- Use the language of Fourier transforms to explain what happens during subsampling or upsampling, and when subsampling can lead to aliasing.
- Grouping:
- Implement basic edge detection using convolution and explain why image noise can be a problem.
- Explain when edges can be hard to detect.
- Explain how edge responses can be sharpened into thin contours using non-max suppression, and implement it (sketched together with edge detection after this list).
- Explain what texture means and why texture causes a problem for boundary detection.
- Implement a version of the texture gradient proposed by Martin et al.
- Explain why local edge detection can miss edges and what global reasoning adds.
- Intuitively explain how min-cut and normalized cut can be used for segmentation.
- Reconstruction:
- Enumerate what it means to recover the 3D structure of a scene: the position and orientation of the camera and the 3D locations of all points.
- Explain why recovering 3D structure from an image is an ill-posed problem.
- Explain what additional information is needed, and use this to describe the problems of camera calibration, pose estimation, stereo, and structure from motion.
- Identify the reasons why estimating correspondences between images is hard.
- Contrast different ways of measuring patch/pixel similarity in terms of their invariance and discriminability.
- Explain why feature detection is useful, and what kind of features are useful to detect.
- Mathematically derive and implement in code the Harris corner detector, and explain the connection between the second moment matrix and the nature and orientation of the local feature (a Harris response sketch appears after this list).
- Implement the MOPS feature descriptor and explain how SIFT improves upon MOPS.
- Derive how camera parameters and 3D structure can be obtained mathematically from a set of correspondences (camera calibration, stereo, structure from motion).
- Derive the special case of two-view stereo when the two views are related by a simple translation along X, and explain the relationship between disparity and depth (worked out briefly after this list).
- Explain and implement the plane-sweep stereo algorithm.
- Explain the need for removing outliers and derive the RANSAC algorithm (a generic RANSAC loop is sketched below, after the list).
- Explain what radiance means, how it relates to pixel values, and mathematically how it depends on surface normal, surface albedo, and lighting.
- Derive and implement photometric stereo to recover normals and depth from a set of images taken by the same camera under different lights (a Lambertian photometric stereo sketch follows the list).
- Recognition:
- Write down the ERM (empirical risk minimization) principle and explain how it relates to generalization, overfitting, and underfitting.
- Derive the gradient descent and SGD updates for a general loss function and machine learning model (an SGD sketch on a simple classifier appears at the end of this list).
- Explain the rationale behind the bag-of-words feature representation.
- Explain the rationale behind convolution and subsampling as critical layers in a neural network.
- Design and implement a neural network for classification.
- Explain the semantic segmentation task.
- Design model architectures and loss functions for semantic segmentation.
- Explain the object detection task.
- Describe the R-CNN, Fast R-CNN, and Faster R-CNN object detection approaches.
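
To make the convolution outcome concrete, here is a minimal NumPy sketch of direct 2D convolution with zero padding and an explicit kernel flip (the detail that distinguishes convolution from cross-correlation). The function name `convolve2d` and the box-filter example are illustrative, not part of the course code.

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution with zero padding (illustrative, not optimized)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    flipped = kernel[::-1, ::-1]          # true convolution flips the kernel
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

# Linear, shift-invariant filters (blurs, derivatives, sharpening) are exactly
# the operations expressible this way; a 3x3 box filter is a simple example.
image = np.random.rand(64, 64)
blurred = convolve2d(image, np.ones((3, 3)) / 9.0)
```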
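
For the edge-detection and non-max suppression outcomes, a minimal sketch that computes gradient magnitude and direction with Sobel filters and keeps only local maxima along the gradient direction. It assumes a grayscale float image and quantizes the gradient direction into four bins, which is one common simplification; the helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradients(image):
    """Gradient magnitude and direction from Sobel filters."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve(image.astype(float), sx, mode="nearest")
    gy = convolve(image.astype(float), sx.T, mode="nearest")
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def non_max_suppression(magnitude, direction):
    """Keep a pixel only if it is a local maximum along the gradient direction."""
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(direction) % 180      # quantize into 4 orientation bins
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:       # gradient ~horizontal: compare left/right
                n1, n2 = magnitude[i, j - 1], magnitude[i, j + 1]
            elif a < 67.5:                   # ~45 degrees
                n1, n2 = magnitude[i - 1, j + 1], magnitude[i + 1, j - 1]
            elif a < 112.5:                  # gradient ~vertical: compare up/down
                n1, n2 = magnitude[i - 1, j], magnitude[i + 1, j]
            else:                            # ~135 degrees
                n1, n2 = magnitude[i - 1, j - 1], magnitude[i + 1, j + 1]
            if magnitude[i, j] >= n1 and magnitude[i, j] >= n2:
                out[i, j] = magnitude[i, j]
    return out
```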
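
For the Harris corner detector outcome, a sketch of the corner response computed from the Gaussian-smoothed second moment matrix, using the common response R = det(M) - k·trace(M)²; the default k and sigma below are typical values, not course-mandated ones.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.05):
    """Harris corner response from the second moment matrix at every pixel."""
    # Image gradients (axis 0 is y/rows, axis 1 is x/columns).
    gy, gx = np.gradient(image.astype(float))
    # Entries of the second moment matrix M = [[Sxx, Sxy], [Sxy, Syy]],
    # each smoothed over a Gaussian window.
    sxx = gaussian_filter(gx * gx, sigma)
    syy = gaussian_filter(gy * gy, sigma)
    sxy = gaussian_filter(gx * gy, sigma)
    det_m = sxx * syy - sxy ** 2
    trace_m = sxx + syy
    # Two large eigenvalues (a corner) give a large positive response;
    # one large eigenvalue (an edge) gives a negative response.
    return det_m - k * trace_m ** 2
```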
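
For the rectified two-view stereo outcome, the disparity-depth relation worked out briefly (assuming pinhole cameras with focal length f and a baseline B along X): a scene point at depth Z and horizontal position X projects to x_L = f·X/Z in the left image and x_R = f·(X - B)/Z in the right image, so the disparity is d = x_L - x_R = f·B/Z, and hence Z = f·B/d. Depth is inversely proportional to disparity, which is why nearby points shift more between the two views.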
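
For the RANSAC outcome, a generic RANSAC loop: fit a model to a random minimal sample, count inliers, and keep the model with the largest consensus set. The `fit`/`residuals` callables, the line-fitting example, and the parameter values are illustrative assumptions.

```python
import numpy as np

def ransac(data, fit, residuals, sample_size, threshold, num_iters=1000, rng=None):
    """Generic RANSAC: keep the model with the largest set of inliers."""
    rng = np.random.default_rng() if rng is None else rng
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(num_iters):
        sample = data[rng.choice(len(data), size=sample_size, replace=False)]
        model = fit(sample)
        inliers = residuals(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    # Optionally refit on all inliers of the best model.
    if best_model is not None and best_inliers.sum() >= sample_size:
        best_model = fit(data[best_inliers])
    return best_model, best_inliers

# Example: robust line fitting y = a*x + b with gross outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 200)
y[:60] += rng.uniform(-20, 20, 60)                      # corrupt 30% of the points
data = np.stack([x, y], axis=1)
fit = lambda pts: np.polyfit(pts[:, 0], pts[:, 1], 1)    # returns (a, b)
residuals = lambda m, pts: np.abs(np.polyval(m, pts[:, 0]) - pts[:, 1])
line, inliers = ransac(data, fit, residuals, sample_size=2, threshold=0.5)
```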
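
For the photometric stereo outcome, a sketch of the Lambertian case: with known light directions, each pixel's intensities across the image stack are linear in albedo times normal, so a per-pixel least-squares solve recovers both. The array shapes and function name are illustrative, and the final step of integrating the normal field into depth is omitted here.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Lambertian photometric stereo: images is (k, H, W), lights is (k, 3)."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                       # (k, num_pixels)
    # Solve lights @ G = I in the least-squares sense; each column of G is
    # albedo * normal for one pixel.
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, num_pixels)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-8)
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```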
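
For the recognition outcomes on gradient-based learning, a minimal sketch of minibatch SGD on a linear softmax classifier with cross-entropy loss, trained on random toy data purely so the code runs; a course project would replace this with a deep network in a framework such as PyTorch.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: N examples, D features, C classes (random, just to make it runnable).
rng = np.random.default_rng(0)
N, D, C = 512, 20, 3
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)

W = np.zeros((D, C))
b = np.zeros(C)
lr, batch_size = 0.1, 64

for step in range(200):
    # SGD: estimate the gradient of the average loss from a random minibatch.
    idx = rng.choice(N, size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    probs = softmax(xb @ W + b)                     # (batch, C)
    # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
    grad_logits = probs.copy()
    grad_logits[np.arange(batch_size), yb] -= 1.0
    grad_logits /= batch_size
    W -= lr * (xb.T @ grad_logits)                  # update: w <- w - lr * grad
    b -= lr * grad_logits.sum(axis=0)
```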