Computer Vision, Spring 2011
Project 1: Feature Detection and Matching
Wenlei Xie (wx49)
1. Custom Feature Descriptor and Explanation
A simplified and modified SIFT feature descriptor is used in this project as the custom descriptor; it largely follows the SIFT descriptor presented on the slides. We chose SIFT mainly because it has been demonstrated to be one of the most successful feature descriptors in computer vision. Its main idea (a combination of orientation histograms) is simple but powerful, and is intuitively robust to several kinds of change, e.g. changes of illumination.
Here is the detailed procedure for this custom descriptor.
1) Take a 16x16 square window around the detected feature, with the window's edges parallel to the dominant orientation of the feature point.
2) Rotate the window to the horizontal. This step makes the feature rotation invariant.
3) Apply a Gaussian filter to the pixels in the window. This implementation uses a 3x3 Gaussian mask. This step makes the feature more robust to noise, as the Gaussian filter softens the patch and thus reduces the effect of noisy pixels.
4) Compute the gradient for each pixel; discard pixels whose gradient magnitude is below a certain threshold (0.001 in this implementation). This step ignores pixels that carry no discriminative information.
5) Divide the 16x16 window into a 4x4 grid of cells, i.e. each cell is a small 4x4 window.
6) Compute the orientation histogram for each cell. The orientation is discretized into 8 bins, i.e. all angles falling in the same 45° bin are treated as the same. The implementation uses a 'soft histogram': for an angle in bin i, we add 2 to dimension i and 1 to each of dimensions i-1 and i+1. This softens the histogram, since adjacent bins are still related.
7) Concatenate the histograms. With 16 cells of 8 orientations each, the descriptor has 16x8 = 128 dimensions.
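Putting steps 3-7 together, the descriptor can be sketched as follows. This is a minimal sketch, not the project's actual code: the report omits the Gaussian σ, so `sigma=1.0` and the use of a generic Gaussian filter in place of the fixed 3x3 mask are assumptions, and the window is assumed to already be rotated to the dominant orientation (steps 1-2).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def custom_descriptor(window, mag_thresh=0.001, num_bins=8):
    """Sketch of the 128-d descriptor for one 16x16 window that has
    already been rotated to the dominant orientation (steps 1-2).

    Returns a (128,) vector: an 8-bin soft orientation histogram for
    each cell of a 4x4 grid of 4x4-pixel cells.
    """
    assert window.shape == (16, 16)
    # Step 3: Gaussian smoothing to suppress noise (sigma is assumed).
    smooth = gaussian_filter(window.astype(float), sigma=1.0)
    # Step 4: per-pixel gradients, magnitudes, and orientations.
    gy, gx = np.gradient(smooth)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    desc = []
    for cy in range(4):            # step 5: 4x4 grid of 4x4 cells
        for cx in range(4):
            hist = np.zeros(num_bins)
            for y in range(cy * 4, cy * 4 + 4):
                for x in range(cx * 4, cx * 4 + 4):
                    if mag[y, x] < mag_thresh:   # step 4: drop weak pixels
                        continue
                    # Step 6: soft vote, with bins i-1/i+1 wrapped
                    # circularly (the wrap-around is an assumption).
                    i = int(ang[y, x] / (2 * np.pi) * num_bins) % num_bins
                    hist[i] += 2
                    hist[(i - 1) % num_bins] += 1
                    hist[(i + 1) % num_bins] += 1
            desc.append(hist)
    # Step 7: concatenate 16 cells x 8 bins = 128 dimensions.
    return np.concatenate(desc)

d = custom_descriptor(np.random.rand(16, 16))
```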
2. Performance Report
2.1 ROC Curves and AUC Results
- Graf Dataset
![](project1_report.files/image006.png)
| Descriptor | AUC |
| --- | --- |
| Simple Window + SSD | 0.710248 |
| Simple Window + Ratio Test | 0.740930 |
| MOPS + SSD | 0.808540 |
| MOPS + Ratio Test | 0.898305 |
| Custom + SSD | 0.903622 |
| Custom + Ratio Test | 0.919961 |
- Yosemite Dataset
![](project1_report.files/image008.png)
| Descriptor | AUC |
| --- | --- |
| Simple Window + SSD | 0.846409 |
| Simple Window + Ratio Test | 0.903634 |
| MOPS + SSD | 0.877682 |
| MOPS + Ratio Test | 0.951950 |
| Custom + SSD | 0.909553 |
| Custom + Ratio Test | 0.960553 |
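The "Ratio Test" rows above refer to Lowe-style ratio matching: a match is scored by the distance to the nearest descriptor divided by the distance to the second nearest. A minimal sketch (Euclidean distance is used here for illustration; whether the project's SSD is squared or not does not change the ranking):

```python
import numpy as np

def ratio_test_score(desc, candidates):
    """Ratio-test score for one query descriptor: distance to the
    nearest candidate over distance to the second nearest.
    Lower scores indicate more distinctive (more reliable) matches."""
    d = np.linalg.norm(candidates - desc, axis=1)
    d.sort()
    return d[0] / d[1]

# Nearest candidate at distance 1, second nearest at distance 2.
score = ratio_test_score(np.zeros(4),
                         np.array([[0., 0., 0., 1.],
                                   [2., 0., 0., 0.]]))
```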
2.2 Harris Score
The Harris score on Graf:
![](project1_report.files/image010.jpg)
![](project1_report.files/image012.jpg)
The Harris score on Yosemite:
![](project1_report.files/image014.jpg)
![](project1_report.files/image016.jpg)
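The score maps above visualize the Harris corner response. A minimal sketch of the standard formula det(H) - k·trace(H)², assuming the common k = 0.04 and a Gaussian-weighted structure tensor (the report does not state the exact parameters used):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_score(img, sigma=1.0, k=0.04):
    """Per-pixel Harris corner response det(H) - k * trace(H)^2,
    where H is the Gaussian-weighted structure tensor."""
    gy, gx = np.gradient(img.astype(float))
    # Structure tensor entries, smoothed over a local window.
    Ixx = gaussian_filter(gx * gx, sigma)
    Iyy = gaussian_filter(gy * gy, sigma)
    Ixy = gaussian_filter(gx * gy, sigma)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det - k * trace ** 2

score = harris_score(np.random.rand(32, 32))
```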
2.3 Average AUC on Benchmark Sets
| Descriptor | Graf | Leuven | Bikes | Wall |
| --- | --- | --- | --- | --- |
| Simple + SSD | 0.542540 | 0.251648 | 0.284568 | 0.270532 |
| Simple + Ratio Test | 0.539316 | 0.519957 | 0.474922 | 0.538836 |
| MOPS + SSD | 0.562743 | 0.593044 | 0.630684 | 0.582338 |
| MOPS + Ratio Test | 0.587625 | 0.656019 | 0.622528 | 0.623670 |
| Custom + SSD | 0.587297 | 0.510180 | 0.579308 | 0.599822 |
| Custom + Ratio Test | 0.605696 | 0.650979 | 0.619982 | 0.629150 |
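AUC values like those reported in this section are the area under the ROC curve, obtained by ranking candidate matches by score and integrating true-positive rate over false-positive rate. A minimal sketch (the distances and ground-truth labels below are hypothetical inputs, not the project's data):

```python
import numpy as np

def auc_from_matches(distances, is_correct):
    """Area under the ROC curve for a ranked set of candidate matches.

    `distances` are match scores (lower = better); `is_correct` are
    boolean ground-truth labels. Sweeping a threshold through the
    sorted distances traces out the ROC curve, whose trapezoidal
    integral is the AUC.
    """
    order = np.argsort(distances)
    labels = np.asarray(is_correct, dtype=bool)[order]
    tp = np.cumsum(labels)           # true positives accepted so far
    fp = np.cumsum(~labels)          # false positives accepted so far
    tpr = np.r_[0.0, tp / max(labels.sum(), 1)]
    fpr = np.r_[0.0, fp / max((~labels).sum(), 1)]
    # Trapezoidal integration of TPR over FPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# A perfect ranking (all correct matches scored best) gives AUC = 1.0.
auc = auc_from_matches([0.1, 0.2, 0.9, 0.8], [True, True, False, False])
```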
3. Strengths and Weaknesses
As the ROC curves show, our custom descriptor outperforms the simple window and MOPS features implemented in this project, which demonstrates that a combination of gradient orientation histograms makes a very good feature descriptor. Since it only counts the frequency of gradient orientations, it is robust to transformations of the pictures.
However, on the benchmark sets we found that the MOPS descriptor performs as well as, or even better than, the custom descriptor. The main reason is that this implementation of SIFT only uses the 16x16 window near the feature point, so it might lose some useful information, especially when the picture is foggy or when the pictures differ greatly in scale, rotation, etc. One might expect better performance after adding a Gaussian pyramid to this descriptor, which would make it scale-invariant.
4. More Pictures and Results
![](project1_report.files/image018.jpg)
The features on the text FOOD PHARMACY are selected to perform the query. The result is good: many features on the corresponding text in the other picture are fetched, while relatively few mismatched features are found elsewhere in the picture.
![](project1_report.files/image020.jpg)
This time we select the text Wegmans instead and use the alternative picture to provide the features. The result is still reasonable but not as good as the previous one: more mismatched features are found. This might be caused by (1) the text Wegmans in the second picture having a large rotation angle, which makes the descriptors different; and (2) some of the features overlapping in one picture but not in the other. A scale-invariant feature might perform better in this situation.
![](project1_report.files/image022.jpg)
This query performs badly. In fact, all the selected features in the first photo have corresponding features in the second photo, but most of them are wrongly matched to other, random features. One reason is that some features on the building look similar. Moreover, the direction of the maximum eigenvalue sometimes fails to indicate the right direction under rotation.
![](project1_report.files/image024.jpg)
This pair of pictures of McGraw Tower was taken at different times to test the effect of different illumination. As the result shows, this descriptor can handle photos with different illumination to some extent: some of the features on the clock face are successfully matched.