Introduction to Computer Vision
Project 2: Feature Detection and Matching
Report
Name: PENG, KUAN-CHUAN; NetID: kp388
HU, RAN; NetID: rh523
Major Design:
In
the feature detection part, we identify points of interest by Harris corner
detection method, which is improved into scale invariant, apply the adaptive non-maximum
suppression method to the Harris image,
and compute local maximum. In the feature description part, we not only apply
the simple descriptor and the MOPS descriptor demanded in the basic part, but
also implement the CENTRIST (CENsus
TRansform
hISTogram) descriptor for place and scene
recognition, which is based on two
papers:[1][2]. In the feature matching part, we implement four matching methods: SSD, ratio
test, ratio consistency method,
and histogram intersection method. The ratio
consistency method uses left-right consistency check
which outperforms the ratio test
method. The histogram intersection method is designed only for CENTRIST descriptor.
Now, the
features we detect can be reasonably invariant to translation, rotation, and
illumination. We also evaluate the performance of our detection on benchmark
images. The evaluation results are
shown below.
Performance Evaluation:
1. The ROC curve and AUC of the Yosemite test images (Yosemite1.jpg and Yosemite2.jpg), and the graf test images (img1.ppm and img2.ppm): (the x-direction is false positive rate and the y-direction is true positive rate)
|
|
Yosemite test images |
graf test images |
The AUC of the two sets of test images:
Method \ test set |
Yosemite |
Graf |
Simple window + SSD |
0.825014 |
0.553022 |
Simple window + ratio |
0.872198 |
0.536485 |
MOPS + SSD |
0.706626 |
0.686756 |
MOPS + ratio |
0.862354 |
0.767794 |
SIFT + SSD |
0.994692 |
0.943780 |
SIFT + ratio |
0.995494 |
0.967525 |
From the above experimental results, we show that in terms of descriptor type, SIFT is the best one. MOPS and simple window are the second best and the worst one respectively. In terms of match type, ratio test is generally better than SSD.
1. The Harris operator of one image each in both the Yosemite and graf test pairs: (the images of Harris operator are scaled for visualization purpose)
The Yosemite test images:
The graf test images:
2. The average AUC on the three benchmark sets: (CENTRIST: our feature descriptor)
Method \ Benchmark |
leuven |
bike |
Wall |
Simple window + SSD |
0.236847 |
0.463444 |
0.319197 |
Simple window + ratio |
0.512403 |
0.540977 |
0.519911 |
MOPS + SSD |
0.644686 |
0.610108 |
0.491568 |
MOPS + ratio |
0.63882 |
0.62192 |
0.586467 |
CENTRIST + SSD |
0.419695 |
0.601626 |
0.169925 |
CENTRIST + ratio |
0.521197 |
0.505213 |
0.558635 |
Strength and Weakness:
In this project,
we implement three descriptors: simple window descriptor, MOPS (Multi-Scale
Oriented Patches) descriptor, and CENTRIST descriptor.
Besides, the results of SIFT
descriptor are also provided. The simple window descriptor is only invariant to translation. The MOPS descriptor achieves translation and
rotation invariant. SIFT features are invariant to scale and robust to
orientation changes. CENTRIST mainly encodes the structural properties within
an image and suppresses detailed textural information, and histograms of
various image properties are especially good for scene recognition.
However, the descriptors may not handle 3D
rotation or large
orientation variation very well, and this
happens when dealing with
the graf image dataset.
Test on Our Own Images:
Extra Credit Items:
1. Design and implement our own feature descriptor:
We implement the CENTRIST descriptor and its corresponding feature matching method, histogram intersection measure. The idea is based on these two papers: [1], [2]. First, for each interest point, take a 41x41 square window around that point according to its canonical direction (like MOPS). Second, apply census transform to that 41x41 patch, and each pixel in the patch will be in [0, 255]. Finally, calculate the histogram of the census transform of the patch and return a normalized 256-d vector as the CENTRIST descriptor of that point.
Testing: use descriptortype=3 and matchtype=4.
The results are shown in 2. The performance of CENTRIST descriptor is not that impressive probably because of the fact that in the reference paper, CENTRIST descriptor (or census filter) is used to search the matching point in the corresponding neighborhood in another image (application like disparity calculation), not the whole possible feature points in another image. However, CENTRIST descriptor does show comparable performance in some cases.
2. Implement adaptive non-maximum suppression:
Testing: open the file ��feature.cpp,�� make sure at top of the code ��bool UseAdaptiveLocalMax=1;�� and compile again.
The idea is based on the MOPS paper on the website of this project. There is an adjustable parameter at top of the code in the file ��feature.cpp�� to decide how many feature points we want to detect in one image (#define NUM_INTEREST_POINT 300). The following is an example with adaptive non-maximum suppression compared to that of simple non-maximum suppression:
|
|
Simple non-maximum suppression Number of points detected: 936 |
Adaptive non-maximum suppression Number of points detected (can be set by the user): 300 |
The average AUC of the benchmark will improve if the number of the points detected is set properly, and the following is the comparison: (nip=NUM_INTEREST_POINT)
Method \ Benchmark |
Leuven |
bike |
Wall |
||||||
Non-maximum suppression type |
Normal |
adaptive/nip |
normal |
adaptive/nip |
normal |
adaptive/nip |
|||
Simple window + SSD |
0.236847 |
0.346468 |
300 |
0.463444 |
0.44792 |
350 |
0.319197 |
0.447492 |
300 |
Simple window + ratio |
0.512403 |
0.569481 |
350 |
0.540977 |
0.581514 |
350 |
0.519911 |
0.570903 |
300 |
Simple window + ratio consistent |
0.526531 |
0.495897 |
400 |
0.518189 |
0.517847 |
350 |
0.489035 |
0.490301 |
300 |
MOPS + SSD |
0.644686 |
0.692519 |
550 |
0.610108 |
0.596535 |
400 |
0.491568 |
0.479467 |
350 |
MOPS + ratio |
0.63882 |
0.674502 |
600 |
0.62192 |
0.614979 |
400 |
0.586467 |
0.518576 |
350 |
MOPS + ratio consistent |
0.638081 |
0.645998 |
400 |
0.626346 |
0.621312 |
400 |
0.617729 |
0.57501 |
350 |
CENTRIST + SSD |
0.419695 |
0.433435 |
350 |
0.601626 |
0.701879 |
350 |
0.169925 |
N/A |
N/A |
CENTRIST + ratio |
0.521197 |
0.534983 |
350 |
0.505213 |
0.541413 |
400 |
0.558635 |
N/A |
N/A |
CENTRIST + ratio consistent |
0.505964 |
0.597026 |
200 |
0.440869 |
0.525085 |
400 |
0.344507 |
N/A |
N/A |
CENTRIST + histogram intersection |
0.469402 |
0.475518 |
350 |
0.62005 |
0.677093 |
450 |
0.17435 |
N/A |
N/A |
(The N/A fields indicate that the result of benchmark command will return -1.#IND00 as the average AUC, and it��s probably a bug.)
From the above experimental results, we show that the average AUC of adaptive non-maximum suppression is generally higher than that of normal non-maximum suppression.
3. Make our feature detector scale invariant:
Testing: open the file ��feature.cpp,�� make sure at top of the code ��bool UseScaleInvariantHarris=1;�� and compile again.
4. Implement a method that outperforms the ratio test for deciding if a feature is a valid match:
Testing: use matchtype=
The idea is doing left-right
consistency check: assume that p1 is an interest point in image1, and the best
match of p
From the table in 2., the average AUC of the ratio consistent method is not apparently higher than the ratio counterpart (but the numbers are comparable). However, we compare the ratio consistent method to the ratio test by Yosemite and graf test images. The following is the ROC (the x-direction is false positive rate and the y-direction is true positive rate) and AUC:
|
|
Yosemite test images |
graf test images |
The AUC:
Descriptor Type |
Simple window |
MOPS |
SIFT |
||||||
Match Type |
SSD |
ratio |
ratio consistent |
SSD |
ratio |
ratio consistent |
SSD |
ratio |
ratio consistent |
Yosemite |
0.825014 |
0.872198 |
0.904434 |
0.706626 |
0.862354 |
0.878918 |
0.994692 |
0.995494 |
0.986097 |
graf |
0.553022 |
0.536485 |
0.663754 |
0.686756 |
0.767794 |
0.808077 |
0.943780 |
0.967525 |
0.961356 |
The experimental results of Yosemite and graf dataset show that the AUC of the ratio consistent method is generally higher than (or comparable to) that of the ratio test, which supports the claim that the ratio consistent method outperforms the ratio test.
Reference:
[1] H.
Hirschm¨uller and D. Scharstein, ��Evaluation of stereo
matching costs on images with radiometric differences,�� IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 31, no. 9, pp. 1582-1599, 2009.
[2] J. Wu, C. Geyer, and J. M. Rehg, ��Real-time human detection using contour cues,�� IEEE International Conference on Robotics and Automation, 2011.