Feature Detection and Matching

CS5670 Introduction to Computer Vision

Brian Orecchio and Kevin Green


Major Design Choices

ComputeHarrisValues

ComputeHarrisValues was the first function in which design choices had to be made. To calculate the A, B, and C values for a pixel, we computed Ix and Iy for each pixel in a 5x5 neighborhood around the original pixel, applied the corresponding Gaussian weight, and accumulated the weighted products into A, B, and C. To compute Ix and Iy, we used a central (middle) difference.
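A minimal sketch of this accumulation is below. It assumes a single-band CFloatImage with a Pixel(x, y, band) accessor (as in the project skeleton) and an illustrative 25-entry Gaussian kernel named gaussian5x5; the det/trace corner measure at the end is one common choice for this assignment, shown only as an example, and image-border handling is omitted.

    // Sketch: Harris response at (x, y) from a 5x5 Gaussian-weighted window
    // of gradients. gaussian5x5 is an illustrative kernel that sums to 1.
    double HarrisAt(CFloatImage &img, int x, int y, const double gaussian5x5[25])
    {
        double A = 0.0, B = 0.0, C = 0.0;
        for (int dy = -2; dy <= 2; dy++) {
            for (int dx = -2; dx <= 2; dx++) {
                int px = x + dx, py = y + dy;
                // Central (middle) differences for the image gradient.
                double Ix = (img.Pixel(px + 1, py, 0) - img.Pixel(px - 1, py, 0)) / 2.0;
                double Iy = (img.Pixel(px, py + 1, 0) - img.Pixel(px, py - 1, 0)) / 2.0;
                double w = gaussian5x5[(dy + 2) * 5 + (dx + 2)];
                A += w * Ix * Ix;
                B += w * Ix * Iy;
                C += w * Iy * Iy;
            }
        }
        // One common corner measure for the matrix [A B; B C]: det / trace.
        return (A * C - B * B) / (A + C + 1e-10);
    }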

We also decided to modify the signature of ComputeHarrisValues, adding a third parameter: CFloatImage &orientationImage. We did this so that, while the approximated eigenvalues for each pixel were still available, we could compute and save the canonical orientation per pixel.
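One way to derive a canonical orientation from the same A, B, and C values is to take the direction of the eigenvector belonging to the larger eigenvalue of [A B; B C]. The helper below is a hedged sketch of that idea; the function name is hypothetical and this is not necessarily the exact rule our code uses.

    #include <cmath>

    // Sketch: canonical orientation (radians) from the structure-matrix
    // entries A, B, C, taken as the direction of the eigenvector for the
    // larger eigenvalue of [A B; B C]. Illustrative only.
    static double CanonicalOrientation(double A, double B, double C)
    {
        double lambdaMax = 0.5 * ((A + C) + std::sqrt((A - C) * (A - C) + 4.0 * B * B));
        return std::atan2(lambdaMax - A, B);
    }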


computeLocalMaxima: Threshold

In computeLocalMaxima, we had to choose a threshold to use as a cutoff on Harris values when selecting features. We chose 0.01, based on our testing experience; any lower threshold seemed to detect too many features.
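A minimal sketch of this test follows. The 0.01 cutoff is the value described above; the 3x3 neighborhood used for the local-maximum check and the Pixel(x, y, band) accessor are assumptions for illustration, and border handling is omitted.

    // Sketch: a pixel is a feature candidate if its Harris value passes the
    // 0.01 threshold and is a local maximum in a 3x3 neighborhood.
    const float kThreshold = 0.01f;

    bool IsLocalMax(CFloatImage &harrisImage, int x, int y)
    {
        float v = harrisImage.Pixel(x, y, 0);
        if (v < kThreshold)
            return false;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                if (harrisImage.Pixel(x + dx, y + dy, 0) > v)
                    return false;
        return true;
    }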


ComputeMopsDescriptors

As described in the project description, we took a 41x41 window around each detected feature. Before determining a pixel value for each (x, y) coordinate, we applied a translation that put the feature pixel at (0, 0), then a rotation by the feature's canonical orientation, then a translation back. This produced floating-point source coordinates for each (x, y) in the window, so we used bilinear interpolation to determine a valid pixel value.

If window pixels fell outside the range of the image, we filled them with a value of 0.0. While this is not ideal, we did not feel that approximating these pixels with nearby values was appropriate, because doing so could negatively affect the feature descriptor.
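The sketch below shows how one sample of the rotated window could be taken, combining the translate-rotate-translate mapping, bilinear interpolation, and the zero-fill for out-of-range samples. The function name, the exact rotation sign convention, and the Shape()/Pixel() accessors are assumptions for illustration.

    #include <cmath>

    // Sketch: sample one descriptor pixel at window offset (dx, dy) around a
    // feature at (fx, fy) with canonical orientation angle. Out-of-image
    // samples return 0.0, matching the zero-fill described above.
    float SampleRotatedWindow(CFloatImage &img, float fx, float fy,
                              float angle, int dx, int dy)
    {
        // Rotate the window offset by the feature orientation, then translate
        // back to the feature position in the source image.
        float xs = fx + dx * std::cos(angle) - dy * std::sin(angle);
        float ys = fy + dx * std::sin(angle) + dy * std::cos(angle);

        int x0 = (int)std::floor(xs), y0 = (int)std::floor(ys);
        float ax = xs - x0, ay = ys - y0;

        // Zero-fill when any of the four neighbors falls outside the image.
        if (x0 < 0 || y0 < 0 || x0 + 1 >= img.Shape().width || y0 + 1 >= img.Shape().height)
            return 0.0f;

        // Bilinear interpolation over the four surrounding pixels.
        return (1 - ax) * (1 - ay) * img.Pixel(x0,     y0,     0)
             +      ax  * (1 - ay) * img.Pixel(x0 + 1, y0,     0)
             + (1 - ax) *      ay  * img.Pixel(x0,     y0 + 1, 0)
             +      ax  *      ay  * img.Pixel(x0 + 1, y0 + 1, 0);
    }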

Once we had a pixel value for each index in the 41x41 window, we downsampled it. As described in class, we applied a Gaussian filter and then threw out every other row and column; for the 41x41 window we also omitted the 41st row and column. Once we had a 10x10 sample, we threw out the border pixels to get the final 8x8 window. Dropping the borders and the 41st row and column was a compromise, but we believe that applying the Gaussian filter at every step means those rows and columns are still (slightly) represented in the final window.
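A minimal sketch of one decimation step is below; the Gaussian blur that precedes each call is not shown, and Decimate is a hypothetical helper operating on a row-major buffer. Applying blur + Decimate to the 40x40 window gives 20x20, repeating gives 10x10, and dropping the outer ring of pixels then gives the 8x8 patch.

    #include <vector>

    // Sketch: keep every other row and column of a size x size window,
    // producing a (size/2) x (size/2) window. The Gaussian blur is assumed
    // to have been applied to src before this call.
    std::vector<float> Decimate(const std::vector<float> &src, int size)
    {
        int half = size / 2;
        std::vector<float> dst(half * half);
        for (int y = 0; y < half; y++)
            for (int x = 0; x < half; x++)
                dst[y * half + x] = src[(2 * y) * size + (2 * x)];
        return dst;
    }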

Another downsampling algorithm we tried was to divide the 41x41 window into 64 5x5 pixel blocks (again throwing out the 41st row and column), sample each block with a Gaussian filter, and place the resulting value into the 8x8 downsampled window. After implementing and testing this approach, we found that it did not produce results as good as our original choice. The code remains in our features.cpp file as the EfficientDownSampleMopsDescriptor function.
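A hedged sketch of that block-wise scheme is below. The row-major buffer layout and the gaussian5x5 kernel are illustrative assumptions, and this is not necessarily identical to the EfficientDownSampleMopsDescriptor function in features.cpp.

    // Sketch: treat the 40x40 window (41st row/column dropped) as an 8x8 grid
    // of 5x5 blocks and reduce each block with a Gaussian-weighted sum.
    void BlockDownsample(const float window[40 * 40], const float gaussian5x5[25],
                         float descriptor[64])
    {
        for (int by = 0; by < 8; by++) {
            for (int bx = 0; bx < 8; bx++) {
                float sum = 0.0f;
                for (int dy = 0; dy < 5; dy++)
                    for (int dx = 0; dx < 5; dx++)
                        sum += gaussian5x5[dy * 5 + dx] *
                               window[(by * 5 + dy) * 40 + (bx * 5 + dx)];
                descriptor[by * 8 + bx] = sum;
            }
        }
    }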


ROC Curves

ROC Curves from graf/img1.ppm and graf/img2.ppm

ROC Curves from yosemite/Yosemite1.jpg and yosemite/Yosemite2.jpg


Harris Operator Images

Harris Operator image from graf/img1.ppm

Harris Operator image from yosemite/Yosemite1.jpg

For the Harris operator images, we needed to multiply the pixel values by 2 in order for them to show up.


Area Under The Curve Averages


Strengths and Weaknesses

We found that our feature detector and matcher's greatest strength was matching concentrated features. For example, in the image graf/img1.ppm, we got much better results when we selected more specific features. Conversely, picking more general regions caused more miscellaneous false-positive matches. In general, our feature detector and matcher works well for basic rotations, translations, and transformations, but as the degree of these changes grows, our true positive rate falls. In particular, once perspective transformations are introduced, the accuracy of our matching drops. For example, as expected, graf/img1.ppm matched graf/img2.ppm much better than graf/img4.ppm.


Personal Images

Works pretty well with translations.

Does not work very well with perspective transformations.