Michael Wu (myw9)
CS 5670: Computer Vision
September 24, 2012
Project 2: Feature Detection and Matching
Feature detection was performed by computing Harris values c(H) from 2x2 weighted Harris matrices H. To provide invariance to rotation, a 5x5 Gaussian mask was used to weight the contributions of neighboring pixels to each Harris matrix. For computational efficiency, Harris values were estimated from the determinant and trace of the Harris matrix rather than from its eigenvalues. When selecting specific pixel locations to represent features, a simple dynamic thresholding method was used to choose an appropriate Harris value threshold. The mean (µ) and standard deviation (σ) of the Harris values were calculated across the image. Assuming the values follow a roughly Gaussian distribution, an initial threshold of µ + 2σ was chosen, which results in approximately 2.3% of the pixels exceeding the threshold as candidate features.
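As a rough illustration of the Harris value computation described above, the sketch below accumulates a Gaussian-weighted 2x2 Harris matrix over a 5x5 window and scores it from its determinant and trace. The report does not state the exact formula, so the ratio det(H) / trace(H) used here is an assumption, as are the function names, the sigma value, and the small `eps` guard against division by zero.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian weights (sigma is an assumed value)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

def harris_value(gx, gy, row, col, kernel, eps=1e-12):
    """Harris value at (row, col) from x/y gradient images gx and gy.

    Builds H = [[a, b], [b, c]] as a weighted sum over the window, then
    returns det(H) / trace(H) (assumed form of c(H)).
    """
    half = len(kernel) // 2
    a = b = c = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            w = kernel[dy + half][dx + half]
            ix = gx[row + dy][col + dx]
            iy = gy[row + dy][col + dx]
            a += w * ix * ix
            b += w * ix * iy
            c += w * iy * iy
    det = a * c - b * b
    trace = a + c
    return det / (trace + eps)
```

With this form, a window whose gradients all point in one direction (an edge) scores near zero, while a window containing gradients in two different directions (a corner) scores positively.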
Figure 1. Gaussian Distribution
(http://en.wikipedia.org/wiki/Normal_distribution)
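The 2σ cut-off can be checked numerically. The sketch below computes the fraction of a normal distribution lying above µ + kσ and derives the initial threshold from a list of Harris values. The function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```python
import math

def upper_tail_fraction(k):
    """Fraction of a normal distribution lying above mean + k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def initial_threshold(values, k=2.0):
    """Return mean + k * (population standard deviation) of the given values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean + k * math.sqrt(variance)
```

For k = 2 the tail fraction is a little under 2.3%, consistent with the approximate figure quoted above. Real Harris values are only roughly Gaussian, so the actual fraction of pixels above the threshold will differ from image to image.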
Using this initial threshold, local maxima were identified in a 5x5 window and used as feature locations. If the number of detected features is less than the MIN_FEATURES constant, the dynamic method incrementally decreases the threshold until at least MIN_FEATURES features are detected.
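The selection loop can be sketched as follows. This is a minimal illustration rather than the project's code: the decrement `step`, the iteration cap `max_iters`, and the function names are hypothetical, and the report does not say how ties within a window are handled (here every pixel that ties for the window maximum is kept).

```python
def local_maxima(harris, threshold, window=5):
    """Return (row, col) of pixels above threshold that are maxima of their window."""
    half = window // 2
    rows, cols = len(harris), len(harris[0])
    features = []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            v = harris[r][c]
            if v <= threshold:
                continue
            neighborhood = [harris[r + dr][c + dc]
                            for dr in range(-half, half + 1)
                            for dc in range(-half, half + 1)]
            if v >= max(neighborhood):
                features.append((r, c))
    return features

def detect_features(harris, threshold, min_features, step, max_iters=50):
    """Lower the threshold by `step` until at least min_features maxima are found."""
    features = local_maxima(harris, threshold)
    for _ in range(max_iters):
        if len(features) >= min_features:
            break
        threshold -= step
        features = local_maxima(harris, threshold)
    return features, threshold
```

The iteration cap keeps the loop from running indefinitely on images that simply do not contain enough local maxima.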
The pixel location and angle are specified for each detected feature. To determine the angle (orientation) of a feature, the image is first low-pass filtered with a 7x7 Gaussian kernel. The filter reduces noise in the image that could drastically change the gradient at a single location. After pre-filtering, 3x3 Sobel filters were applied to obtain the gradients in the x and y directions. The angle is then calculated by taking the arctangent of the y-gradient divided by the x-gradient.
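A minimal sketch of the orientation step is shown below. It assumes the input image has already been low-pass filtered and is stored as a list of rows. The function names are hypothetical, and `math.atan2` is used in place of a plain arctangent of the quotient (an assumption) because it handles a zero x-gradient and returns the full range of angles.

```python
import math

# Standard 3x3 Sobel kernels for the x and y derivatives.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def gradient_at(image, row, col):
    """Apply both Sobel kernels at (row, col) and return (gx, gy)."""
    gx = gy = 0.0
    for dr in range(-1, 2):
        for dc in range(-1, 2):
            p = image[row + dr][col + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * p
            gy += SOBEL_Y[dr + 1][dc + 1] * p
    return gx, gy

def orientation_at(image, row, col):
    """Feature angle in radians, measured from the +x axis toward +y (down the rows)."""
    gx, gy = gradient_at(image, row, col)
    return math.atan2(gy, gx)
```

For an image whose intensity increases from left to right the angle is 0, and for one that increases from top to bottom it is π/2.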
Feature Description
Two feature descriptors were implemented: (1) a simple 5x5 square window descriptor (without orientation) and (2) a simplified MOPS descriptor. In the previous feature detection stage, pixel locations within 20 pixels of the image boundary are ignored and not used as features. This prevents descriptors from having varying lengths, which would otherwise cause issues during matching.
The simplified MOPS descriptor sub-samples an 8x8 patch from a 41x41 pixel region around the feature using a low-pass filtered image (7x7 Gaussian kernel). Low-pass filtering the image prior to sub-sampling prevents aliasing in the reduced image. The 8x8 patch is oriented by using the inverse matrix of the rotation transform and applying bilinear interpolation to estimate the correct pixel value of the rotated point.
Figure 2. Bilinear Interpolation
(http://en.wikipedia.org/wiki/Bilinear_interpolation)
*Note: The rotation is applied after centering the origin on the feature location. After rotation and interpolation, the 8x8 patch is normalized to have a zero mean and unit variance. The normalization provides invariance to affine intensity changes.
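The sampling and normalization steps can be sketched as below. Each position in the 8x8 patch is mapped back into the (already low-pass filtered) image by rotating it about the feature location, the image is read there with bilinear interpolation, and the 64 samples are normalized. The sample spacing of 5 pixels (roughly 41 / 8), the clamping of coordinates at the image border, and all function names are assumptions made for illustration.

```python
import math

def bilinear(image, x, y):
    """Interpolate image at real-valued (x, y); x is the column, y the row."""
    rows, cols = len(image), len(image[0])
    x = min(max(x, 0.0), cols - 1.0)  # clamp to the image (assumed border policy)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1.0 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1.0 - fy) * top + fy * bottom

def mops_descriptor(image, row, col, angle, size=8, spacing=5.0):
    """Return size*size samples of an oriented patch, zero mean and unit variance."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half = (size - 1) / 2.0
    samples = []
    for i in range(size):
        for j in range(size):
            # Patch coordinates relative to the feature (origin at the feature).
            u = (j - half) * spacing
            v = (i - half) * spacing
            # Rotate by the feature angle to find the source point in the image.
            x = col + cos_a * u - sin_a * v
            y = row + sin_a * u + cos_a * v
            samples.append(bilinear(image, x, y))
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    std = math.sqrt(variance)
    if std < 1e-12:
        return [0.0] * len(samples)  # flat patch: avoid dividing by zero
    return [(s - mean) / std for s in samples]
```

Because the mean is subtracted and the result is divided by the standard deviation, scaling the image intensities by a positive gain and adding an offset leaves the descriptor unchanged.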
The “ratio test” was implemented as the matching score, where the score is equal to the distance from the current feature to its best match in the second image divided by the distance from the current feature to its second-best match in the second image. This ratio is between 0 and 1, where a value near 0 indicates a strong match and a value near 1 indicates an ambiguous match.
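A minimal sketch of the ratio test follows. It assumes the sum of squared differences (SSD) as the distance between descriptors, and the function names and the handling of a zero second-best distance are hypothetical choices.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ratio_test_match(descriptor, candidates):
    """Return (index of the best candidate, best distance / second-best distance)."""
    if len(candidates) < 2:
        raise ValueError("the ratio test needs at least two candidates")
    distances = sorted((ssd(descriptor, c), i) for i, c in enumerate(candidates))
    best, best_index = distances[0]
    second, _ = distances[1]
    if second == 0:
        # Two identical perfect matches: treat as fully ambiguous.
        return best_index, 1.0
    return best_index, best / second
```

For example, with the descriptor `[0, 0]` and the candidates `[[3, 4], [0, 1], [6, 8]]`, the SSD values are 25, 1, and 100, so the best match is candidate 1 with a score of 1 / 25 = 0.04.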
Figure 3. “Graf” Results
“Graf” AUC Summary
Descriptor/Matching Method | Area Under Curve
Simple + SSD | 0.673416
Simple + Ratio Test | 0.708075
MOPS + SSD | 0.900277
MOPS + Ratio Test | 0.928643
Figure 4. “Yosemite” Results
“Yosemite” AUC Summary
Descriptor/Matching Method | Area Under Curve
Simple + SSD | 0.902653
Simple + Ratio Test | 0.888056
MOPS + SSD | 0.963887
MOPS + Ratio Test | 0.976831
Figure 5. Harris Values of “Graf” img1.ppm (harris.tga)
Figure 6. Harris Values of “Yosemite” Yosemite1.jpg (harris.tga)
Average AUC (Leuven)
Matching \ Descriptor | 5x5 Window | MOPS
SSD | 0.357021 | 0.702188
Ratio Test | 0.557175 | 0.727467
Average AUC (Bikes)
Matching \ Descriptor | 5x5 Window | MOPS
SSD | 0.356457 | 0.645929
Ratio Test | 0.516427 | 0.649424
Average AUC (Wall)
Matching \ Descriptor | 5x5 Window | MOPS
SSD | 0.397541 | 0.682856
Ratio Test | 0.564342 | 0.661867
By design, the simplified MOPS descriptor with Harris corner detection is relatively invariant to translation (using small patches), rotation (using oriented patches), and affine intensity changes (normalizing pixel values). However, the implementation was not designed to be invariant to changes in image scale. Other weaknesses include (1) limited feature detection near image boundaries (due to the 20-pixel cut-off described under Feature Description) and (2) no invariance to 3D transformations.
FPGA Features - Harris Detection, Simplified MOPS Descriptor, and Ratio Test Matching