Project 2: Feature Detection and Matching

[Image: feature matches on Libe Slope]

The above shows feature detection in a user image on Libe Slope, with a rotated camera giving a second, translated image. Note how most of the features are properly clumped on the tall tree in the second image, as they should be. I know what you're thinking. It's exciting.

For this project we had to decide how best to implement a Harris feature detector, a thresholding local-maximum function, and a simplified version of MOPS descriptors. We also had to write a ratio test to determine whether features were matched between two images, and a function that computed simplified descriptors from 5x5 squares around each feature. The former simply recorded the best and second-best SSD distances and used their ratio as a score, borrowing heavily from the simple SSD match, and the latter simply pulled a 5x5 square of pixels around each feature coordinate. Neither involved any major decision making on our part.
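To make that concrete, here is a minimal sketch of those two helpers in NumPy (our actual code was written against the C++ skeleton, so the function names, the grayscale array layout, and the border clamping below are illustrative assumptions, not the real interface):

    import numpy as np

    def simple_descriptor(gray, y, x):
        # Pull the 5x5 patch of grayscale values centered on (y, x), clamping
        # coordinates at the image border, and flatten it into a 25-vector.
        h, w = gray.shape
        ys = np.clip(np.arange(y - 2, y + 3), 0, h - 1)
        xs = np.clip(np.arange(x - 2, x + 3), 0, w - 1)
        return gray[np.ix_(ys, xs)].astype(float).ravel()

    def ratio_match(descs1, descs2):
        # For each descriptor in image 1, find the best and second-best SSD
        # distances to the descriptors in image 2; the score is their ratio
        # (smaller means the best match is more distinctive).
        matches = []
        for i, d in enumerate(descs1):
            ssd = np.sum((descs2 - d) ** 2, axis=1)
            order = np.argsort(ssd)
            best, second = order[0], order[1]
            matches.append((i, int(best), ssd[best] / max(ssd[second], 1e-12)))
        return matches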

Harris Feature Detection: This part of the assignment was quite straightforward. To provide an accurate derivative at each point in the image, we first computed an x-derivative version of the image and a y-derivative version, essentially using a convolution kernel where each pixel became the difference of the pixel after it and the pixel before it. This could result in negative pixel values, but since these numbers were only used for calculations, this was permissible. Here, and at all other points (except one case), we used the closest in-bounds pixel whenever a coordinate went out of bounds.
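A sketch of that derivative step (in NumPy rather than our C++, with the border clamping just described) looks roughly like this:

    import numpy as np

    def image_gradients(gray):
        # Central differences: each pixel becomes the difference of the pixel
        # after it and the pixel before it, with out-of-bounds reads clamped
        # to the closest in-bounds pixel.
        padded = np.pad(gray.astype(float), 1, mode='edge')
        ix = padded[1:-1, 2:] - padded[1:-1, :-2]   # x-derivative image
        iy = padded[2:, 1:-1] - padded[:-2, 1:-1]   # y-derivative image
        return ix, iy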

We then decided to use a 5x5 Gaussian to weight and compute the Harris values at each pixel. Here, instead of taking the closest pixel value for out-of-bounds reads, we considered any out-of-bounds derivative value to be zero, as the image is treated as no longer changing outside its boundaries. After computing the three distinct values filling the four entries of the symmetric Harris matrix, we calculated the "c" value to represent the Harris strength at that point. This strength was multiplied by 4 to give nicer harris.tga images.
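A rough sketch of that weighting step is below; the sigma of the 5x5 Gaussian and the exact corner formula are assumptions (it shows the common determinant-over-trace score, which may not match our "c" exactly):

    import numpy as np
    from scipy.ndimage import convolve

    def harris_response(ix, iy, sigma=1.0):
        # 5x5 Gaussian weighting window (sigma is an illustrative choice).
        r = np.arange(-2, 3)
        g = np.exp(-(r ** 2) / (2 * sigma ** 2))
        window = np.outer(g, g)
        window /= window.sum()

        # Weighted sums of the three distinct entries of the symmetric 2x2
        # Harris matrix [[A, B], [B, C]]; derivatives outside the image are
        # treated as zero ('constant' padding), as described above.
        A = convolve(ix * ix, window, mode='constant', cval=0.0)
        B = convolve(ix * iy, window, mode='constant', cval=0.0)
        C = convolve(iy * iy, window, mode='constant', cval=0.0)

        # One common corner strength: determinant over trace (the harmonic
        # mean of the two eigenvalues).
        return (A * C - B * B) / np.maximum(A + C, 1e-12)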

We also made the decision to calculate the angle of maximum change (the canonical orientation) at this point, since we could easily access the maximum eigenvalue. This was stored in a float array that was passed by pointer as an extra argument to the method.
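The math behind that orientation, sketched with the same A, B, C notation as above (our code stored the result in a float array passed by pointer; this is just the formula):

    import numpy as np

    def canonical_orientation(A, B, C):
        # Angle of the eigenvector belonging to the larger eigenvalue of the
        # 2x2 matrix [[A, B], [B, C]], i.e. the direction of maximum change.
        return 0.5 * np.arctan2(2.0 * B, A - C)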

Computing the Local Maximum: The primary discretion in this function was how large an area we chose to consider a local maximum over, as well as the threshold level for considering a value a true feature in an image. We chose an area of 5x5, as we found this prevented a large number of features right next to each other. For the threshold, we considered many fixed values, as well as some values that were a ratio of the maximum value in the computed Harris image. In the end we decided to use a static low value that gave some features even in very blurry images (although admittedly a lot in sharp images with many features). We did this because we realized features could be lost even in two extremely similar images if there were a particularly sharp feature in one image and not the other. The static threshold would still catch the same features, but the varying threshold would keep far fewer features on the image with the high maximum value.
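In sketch form, the local-maximum pass looks something like the following (the threshold value here is a placeholder, not the constant we actually shipped):

    import numpy as np
    from scipy.ndimage import maximum_filter

    def local_maxima(harris, threshold=0.01):
        # A pixel becomes a feature if it is the maximum of its 5x5
        # neighborhood and its Harris strength exceeds a fixed threshold.
        is_peak = harris == maximum_filter(harris, size=5)
        return np.argwhere(is_peak & (harris > threshold))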

As this material is a bit dry, let's interject here with a fun user-generated image and its features:

[Images: sunglasses reflection, and the same image with detected features]

Features are found as expected on the reflection of the image taker, but they also land on the blurry objects in the background and the edge of the glasses. No features are on the more gradually changing hand, as expected with the given feature size.

MOPS Descriptors: This method gave us the most choice, and resulted in a good deal of headache, but also more pride when the method ultimately worked. We started by creating a blurred version of the image to select descriptors from, using a simple convolution with the provided 7x7 Gaussian kernel. Then, since we developed the code before discovering globalWarp, we produced our own bilinear interpolation and a grid sweep of 41x41-pixel areas to produce properly rotated and subsampled descriptors. In a charming quadruple for-loop, each of the 64 descriptor pixels drew from the proper 5x5 area of interpolated pixels, where each sample was selected by rotating the x and y values relative to the center of the feature using the equivalent of a rotation matrix. Once all this magic was completed, we performed a simple subtraction of the mean and division by the standard deviation to make our descriptor more invariant to changes in brightness and contrast, which worked well. We did not implement any bonus sections for this or any other part of the assignment, as time was limited, particularly due to a frisbee tournament for one partner with a surprising and unfortunate lack of internet access.
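A condensed sketch of that pipeline is below. It leans on SciPy's map_coordinates for the bilinear interpolation we wrote by hand, averages each 5x5 block into one descriptor pixel (our exact pooling may have differed), and builds an illustrative 7x7 Gaussian rather than using the provided kernel:

    import numpy as np
    from scipy.ndimage import convolve, map_coordinates

    def mops_descriptor(gray, y, x, angle):
        # Blur the image first, here with a 7x7 Gaussian built on the spot.
        r = np.arange(-3, 4)
        g = np.exp(-(r ** 2) / 2.0)
        window = np.outer(g, g)
        blurred = convolve(gray.astype(float), window / window.sum(), mode='nearest')

        cos_a, sin_a = np.cos(angle), np.sin(angle)
        desc = np.zeros((8, 8))
        for row in range(8):              # the charming quadruple for-loop
            for col in range(8):
                vals = []
                for dy in range(5):
                    for dx in range(5):
                        # Offset of this sample from the window center,
                        # rotated into the feature's canonical orientation.
                        u = (col * 5 + dx) - 20
                        v = (row * 5 + dy) - 20
                        sx = x + cos_a * u - sin_a * v
                        sy = y + sin_a * u + cos_a * v
                        vals.append(map_coordinates(blurred, [[sy], [sx]], order=1)[0])
                desc[row, col] = np.mean(vals)

        # Subtract the mean and divide by the standard deviation so the
        # descriptor tolerates brightness and contrast changes.
        desc = (desc - desc.mean()) / max(desc.std(), 1e-12)
        return desc.ravel()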

So, how well did our code work, you ask? Here are some quick stats compiled by the benchmark, followed by some pretty graphs:

AUC for Bikes:
Simple Features, SSD Match: 0.324627
Simple Features, Ratio Match: 0.507218
MOPS Features, SSD Match: 0.621670
MOPS Features, Ratio Match: 0.612592

AUC for Leuven:
Simple Features, SSD Match: 0.171189
Simple Features, Ratio Match: 0.431623
MOPS Features, SSD Match: 0.601879
MOPS Features, Ratio Match: 0.637524

AUC for Wall:
Simple Features, SSD Match: 0.224154
Simple Features, Ratio Match: 0.528156
MOPS Features, SSD Match: 0.596984
MOPS Features, Ratio Match: 0.602600

As is apparent from the above, on these image sets, using simple feature descriptors and matching with plain SSD you'd actually be better off randomly selecting matches: the area under the ROC curve is well under 0.50 in all cases, and 0.50 is what random matching would be expected to give.

When simple descriptors are paired with the ratio match, enough bad matches are tossed out that on the wall (3D turning) and bikes (blurring) sets you at least break even. On the darkening Leuven images you are still better off guessing.

Much better results were seen with MOPS descriptors. With either the SSD or ratio match distance, the average AUC is well over 0.50, indicating good feature matching. Interestingly, in the case of the blurring bikes, SSD performed slightly better than the ratio match on average. This small advantage is likely because the later, more blurred images make many descriptors score close to one another, so the ratio test could toss out correct matches by accident.

Let's look at some pretty graphs. The first is from the Yosemite images:

[Graph: Yosemite six-plot ROC graph]

All the lines (particularly SIFT, of course) show a nice positive bow, with MOPS performing well, along with the ratio test. Let's look at what the Harris image of the first Yosemite mountain picture looks like:

[Image: Harris response for the first Yosemite picture]

Kind of pretty. Scaling the Harris image up (the factor of 4 mentioned above) did help the visualization significantly. Next are the graffiti graphs, a nice alliteration and a better picture:

[Graph: graffiti six-plot ROC graph]

Not quite as nice a bow, but still a bow. Even SIFT didn't perform as well here. Let's see the Harris image for the first graffiti picture:

[Image: Harris response for the first graffiti picture]

Bit hard to tell what's going on there. But let's face it, it was pretty hard to tell what was going on in the actual graffiti too.

Strengths and Weaknesses: So, as can be seen above, our feature detector and descriptor (particularly the MOPS descriptor with a ratio test for matching) were strong against several types of changes in images. Since both the Harris feature detector and the MOPS descriptor are completely invariant to translation, the plots for the simply translated Yosemite images were very high, with a strong degree of matching, indicating translation is a strength. Neither MOPS nor the Harris feature detector is invariant to blurring or slight 3D rotation, but as seen in the bikes and wall sets, the program did fairly well in these areas. While MOPS is invariant to changes in both brightness and contrast, the Harris feature detector isn't. These changes weren't a particular strength but also weren't a weakness, as seen in the numbers put out for the Leuven set and the final user images, to be dramatically revealed by scrolling down a bit.

Definite weaknesses were scale changes and any significant 3D rotations. Tests were run on such user images, but the results were quite terrible and not worthy of even being shown. As expected, as features become particularly scrunched or differently sized, both the MOPS and Harris algorithms become rather useless at matching. And since everything was simplified in this project, our solution isn't particularly good at any excessive changes, but this is only to be expected.

As a parting note, here's another nice image with some matching to look at:

[Image: crab]

[Image: features matched on the crab images]

That's a lot of features. Mind you, there are a lot of features to be seen in a crab. Even with an increase in both brightness and contrast (beyond the reasonable realm for a nice image), feature matching on the claw is great.

Have a nice day! Thanks for taking the time to read this and grade our assignment!