Learning Outcomes:
After taking this course, you will be able to do the following:
- Basics:
- Describe intuitively and mathematically the geometry and physics of image formation.
- Explain intuitively what information is lost during image formation.
- Image processing:
- Implement convolution and explain which kinds of filtering operations can be expressed as a convolution (a minimal sketch follows this list).
- Relate physical properties of the image to "frequency" in the Fourier transform, explain how frequency content affects various image processing operations, and explain how the Fourier transform interacts with convolution.
- Use the language of Fourier transforms to explain what happens during subsampling or upsampling, and when subsampling can lead to aliasing.
- Grouping:
- Implement basic edge detection using convolution and explain why image noise can be a problem.
- Explain when edges can be hard to detect.
- Explain how edge responses can be sharpened into thin contours using non-max suppression, and implement it (sketched together with edge detection after this list).
- Explain what texture means and why texture causes a problem for boundary detection.
- Implement a version of the texture gradient proposed by Martin et al.
- Explain why local edge detection can miss edges and what global reasoning adds.
- Intuitively explain how min-cut and normalized cut can be used for segmentation.
- Reconstruction:
- Enumerate what it means to recover the 3D structure of a scene: the position and orientation of the camera and the 3D locations of all points.
- Explain why recovering 3D structure from an image is an ill-posed problem.
- Explain what additional information is needed, and use this to describe the problems of camera calibration, pose estimation, stereo, and structure from motion.
- Identify the reasons why estimating correspondences between images is hard.
- Contrast different ways of measuring patch/pixel similarity in terms of their invariance and discriminability.
- Explain why feature detection is useful, and what kind of features are useful to detect.
- Mathematically derive and implement in code the Harris corner detector, and explain the connection between the second moment matrix and the nature and orientation of the local feature (a Harris response sketch appears after this list).
- Implement the MOPS feature descriptor and explain how SIFT improves upon MOPS.
- Derive how camera parameters and 3D structure can be obtained mathematically from a set of correspondences (camera calibration, stereo, structure from motion).
- Derive the special case of two-view stereo when the two views are related by a simple translation along X, and explain the relationship between disparity and depth (worked out briefly after this list).
- Explain and implement the plane-sweep stereo algorithm.
- Explain the need for removing outliers and derive the RANSAC algorithm (a generic RANSAC loop is sketched below, after the list).
- Explain what radiance means, how it relates to pixel values, and mathematically how it depends on surface normal, surface albedo, and lighting.
- Derive and implement photometric stereo to recover normals and depth from a set of images taken by the same camera under different lights (a Lambertian photometric stereo sketch follows the list).
- Recognition:
- Write down the ERM (empirical risk minimization) principle and explain how it relates to generalization, overfitting, and underfitting.
- Derive the gradient descent and SGD updates for a general loss function and machine learning model (an SGD sketch on a simple classifier appears at the end of this list).
- Explain the rationale behind the bag-of-words feature representation.
- Explain the rationale behind convolution and subsampling as critical layers in a neural network.
- Design and implement a neural network for classification.
- Explain the semantic segmentation task.
- Design model architectures and loss functions for semantic segmentation.
- Explain the object detection task.
- Describe the R-CNN, Fast R-CNN, and Faster R-CNN object detection approaches.
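
To make the convolution outcome concrete, here is a minimal NumPy sketch of direct 2D convolution with zero padding and an explicit kernel flip (the detail that distinguishes convolution from cross-correlation). The function name `convolve2d` and the box-filter example are illustrative, not part of the course code.

```python
import numpy as np

def convolve2d(image, kernel):
    """Direct 2D convolution with zero padding (illustrative, not optimized)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    flipped = kernel[::-1, ::-1]          # true convolution flips the kernel
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

# Linear, shift-invariant filters (blurs, derivatives, sharpening) are exactly
# the operations expressible this way; a 3x3 box filter is a simple example.
image = np.random.rand(64, 64)
blurred = convolve2d(image, np.ones((3, 3)) / 9.0)
```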
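
For the edge-detection and non-max suppression outcomes, a minimal sketch that computes gradient magnitude and direction with Sobel filters and keeps only local maxima along the gradient direction. It assumes a grayscale float image and quantizes the gradient direction into four bins, which is one common simplification; the helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradients(image):
    """Gradient magnitude and direction from Sobel filters."""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = convolve(image.astype(float), sx, mode="nearest")
    gy = convolve(image.astype(float), sx.T, mode="nearest")
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def non_max_suppression(magnitude, direction):
    """Keep a pixel only if it is a local maximum along the gradient direction."""
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(direction) % 180      # quantize into 4 orientation bins
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:       # gradient ~horizontal: compare left/right
                n1, n2 = magnitude[i, j - 1], magnitude[i, j + 1]
            elif a < 67.5:                   # ~45 degrees
                n1, n2 = magnitude[i - 1, j + 1], magnitude[i + 1, j - 1]
            elif a < 112.5:                  # gradient ~vertical: compare up/down
                n1, n2 = magnitude[i - 1, j], magnitude[i + 1, j]
            else:                            # ~135 degrees
                n1, n2 = magnitude[i - 1, j - 1], magnitude[i + 1, j + 1]
            if magnitude[i, j] >= n1 and magnitude[i, j] >= n2:
                out[i, j] = magnitude[i, j]
    return out
```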
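
For the Harris corner detector outcome, a sketch of the corner response computed from the Gaussian-smoothed second moment matrix, using the common response R = det(M) - k·trace(M)²; the default k and sigma below are typical values, not course-mandated ones.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.05):
    """Harris corner response from the second moment matrix at every pixel."""
    # Image gradients (axis 0 is y/rows, axis 1 is x/columns).
    gy, gx = np.gradient(image.astype(float))
    # Entries of the second moment matrix M = [[Sxx, Sxy], [Sxy, Syy]],
    # each smoothed over a Gaussian window.
    sxx = gaussian_filter(gx * gx, sigma)
    syy = gaussian_filter(gy * gy, sigma)
    sxy = gaussian_filter(gx * gy, sigma)
    det_m = sxx * syy - sxy ** 2
    trace_m = sxx + syy
    # Two large eigenvalues (a corner) give a large positive response;
    # one large eigenvalue (an edge) gives a negative response.
    return det_m - k * trace_m ** 2
```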
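
For the rectified two-view stereo outcome, the disparity-depth relation worked out briefly (assuming pinhole cameras with focal length f and a baseline B along X): a scene point at depth Z and horizontal position X projects to x_L = f·X/Z in the left image and x_R = f·(X - B)/Z in the right image, so the disparity is d = x_L - x_R = f·B/Z, and hence Z = f·B/d. Depth is inversely proportional to disparity, which is why nearby points shift more between the two views.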
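
For the RANSAC outcome, a generic RANSAC loop: fit a model to a random minimal sample, count inliers, and keep the model with the largest consensus set. The `fit`/`residuals` callables, the line-fitting example, and the parameter values are illustrative assumptions.

```python
import numpy as np

def ransac(data, fit, residuals, sample_size, threshold, num_iters=1000, rng=None):
    """Generic RANSAC: keep the model with the largest set of inliers."""
    rng = np.random.default_rng() if rng is None else rng
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(num_iters):
        sample = data[rng.choice(len(data), size=sample_size, replace=False)]
        model = fit(sample)
        inliers = residuals(model, data) < threshold
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    # Optionally refit on all inliers of the best model.
    if best_model is not None and best_inliers.sum() >= sample_size:
        best_model = fit(data[best_inliers])
    return best_model, best_inliers

# Example: robust line fitting y = a*x + b with gross outliers.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 200)
y[:60] += rng.uniform(-20, 20, 60)                      # corrupt 30% of the points
data = np.stack([x, y], axis=1)
fit = lambda pts: np.polyfit(pts[:, 0], pts[:, 1], 1)    # returns (a, b)
residuals = lambda m, pts: np.abs(np.polyval(m, pts[:, 0]) - pts[:, 1])
line, inliers = ransac(data, fit, residuals, sample_size=2, threshold=0.5)
```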
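
For the photometric stereo outcome, a sketch of the Lambertian case: with known light directions, each pixel's intensities across the image stack are linear in albedo times normal, so a per-pixel least-squares solve recovers both. The array shapes and function name are illustrative, and the final step of integrating the normal field into depth is omitted here.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Lambertian photometric stereo: images is (k, H, W), lights is (k, 3)."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                       # (k, num_pixels)
    # Solve lights @ G = I in the least-squares sense; each column of G is
    # albedo * normal for one pixel.
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, num_pixels)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-8)
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```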
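
For the recognition outcomes on gradient-based learning, a minimal sketch of minibatch SGD on a linear softmax classifier with cross-entropy loss, trained on random toy data purely so the code runs; a course project would replace this with a deep network in a framework such as PyTorch.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy data: N examples, D features, C classes (random, just to make it runnable).
rng = np.random.default_rng(0)
N, D, C = 512, 20, 3
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)

W = np.zeros((D, C))
b = np.zeros(C)
lr, batch_size = 0.1, 64

for step in range(200):
    # SGD: estimate the gradient of the average loss from a random minibatch.
    idx = rng.choice(N, size=batch_size, replace=False)
    xb, yb = X[idx], y[idx]
    probs = softmax(xb @ W + b)                     # (batch, C)
    # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
    grad_logits = probs.copy()
    grad_logits[np.arange(batch_size), yb] -= 1.0
    grad_logits /= batch_size
    W -= lr * (xb.T @ grad_logits)                  # update: w <- w - lr * grad
    b -= lr * grad_logits.sum(axis=0)
```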