Introduction to Computer Vision

Project 2: Feature Detection and Matching

Report

Name: PENG, KUAN-CHUAN; NetID: kp388

HU, RAN; NetID: rh523

Major Design:

In the feature detection part, we identify interest points with the Harris corner detector, which we extend to be scale invariant; we then apply adaptive non-maximum suppression to the Harris response image and keep the local maxima. In the feature description part, we implement not only the simple window descriptor and the MOPS descriptor required in the basic part, but also the CENTRIST (CENsus TRansform hISTogram) descriptor for place and scene recognition, based on the two papers [1][2]. In the feature matching part, we implement four matching methods: SSD, the ratio test, the ratio consistency method, and the histogram intersection method. The ratio consistency method uses a left-right consistency check and outperforms the ratio test; the histogram intersection method is designed specifically for the CENTRIST descriptor.
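
For reference, the following is a minimal sketch of the Harris corner response computation at a single pixel, assuming the image gradients Ix and Iy have already been computed; the window size and the constant k = 0.04 are illustrative assumptions and may differ from the values used in our feature.cpp.

    #include <vector>

    // Minimal sketch of the Harris corner response at pixel (x, y).
    // Assumes Ix, Iy are precomputed image gradients stored row-major in
    // width*height vectors. Window size and k are illustrative values only.
    double harrisResponse(const std::vector<double>& Ix,
                          const std::vector<double>& Iy,
                          int width, int height, int x, int y)
    {
        const int    half = 2;     // 5x5 window (assumption)
        const double k    = 0.04;  // common Harris constant (assumption)

        double sumXX = 0.0, sumYY = 0.0, sumXY = 0.0;
        for (int dy = -half; dy <= half; ++dy) {
            for (int dx = -half; dx <= half; ++dx) {
                int xx = x + dx, yy = y + dy;
                if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
                double gx = Ix[yy * width + xx];
                double gy = Iy[yy * width + xx];
                sumXX += gx * gx;   // accumulate structure tensor entries
                sumYY += gy * gy;
                sumXY += gx * gy;
            }
        }
        // Harris response: det(H) - k * trace(H)^2
        double det   = sumXX * sumYY - sumXY * sumXY;
        double trace = sumXX + sumYY;
        return det - k * trace * trace;
    }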

The features we detect are now reasonably invariant to translation, rotation, and illumination changes. We also evaluate the performance of our detector on the benchmark images; the evaluation results are shown below.

 

Performance Evaluation:

1.      The ROC curves and AUC of the Yosemite test images (Yosemite1.jpg and Yosemite2.jpg) and the graf test images (img1.ppm and img2.ppm): (the x-axis is the false positive rate and the y-axis is the true positive rate)

Yosemite test images

graf test images

 

The AUC of the two sets of test images:

Method \ test set       | Yosemite  | graf
Simple window + SSD     | 0.825014  | 0.553022
Simple window + ratio   | 0.872198  | 0.536485
MOPS + SSD              | 0.706626  | 0.686756
MOPS + ratio            | 0.862354  | 0.767794
SIFT + SSD              | 0.994692  | 0.943780
SIFT + ratio            | 0.995494  | 0.967525

These results show that, in terms of descriptor type, SIFT performs best, MOPS second best, and the simple window descriptor worst. In terms of match type, the ratio test is generally better than SSD.

 

2.      The Harris operator of one image from each of the Yosemite and graf test pairs: (the Harris operator images are scaled for visualization purposes)

The Yosemite test images:

 

The graf test images:

  

 

3.      The average AUC on the three benchmark sets: (CENTRIST: our feature descriptor)

Method \ Benchmark      | leuven    | bike      | wall
Simple window + SSD     | 0.236847  | 0.463444  | 0.319197
Simple window + ratio   | 0.512403  | 0.540977  | 0.519911
MOPS + SSD              | 0.644686  | 0.610108  | 0.491568
MOPS + ratio            | 0.63882   | 0.62192   | 0.586467
CENTRIST + SSD          | 0.419695  | 0.601626  | 0.169925
CENTRIST + ratio        | 0.521197  | 0.505213  | 0.558635

 

Strengths and Weaknesses:

In this project, we implement three descriptors: the simple window descriptor, the MOPS (Multi-Scale Oriented Patches) descriptor, and the CENTRIST descriptor. Results using the SIFT descriptor are also provided for comparison. The simple window descriptor is only invariant to translation. The MOPS descriptor achieves translation and rotation invariance. SIFT features are invariant to scale and robust to orientation changes. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information; histograms of such image properties are especially well suited to scene recognition.

However, the descriptors do not handle 3D rotation (out-of-plane viewpoint change) or large orientation variation very well, which is apparent on the graf image dataset.

 

Test on Our Own Images:

 

Extra Credit Items:

1.      Design and implement our own feature descriptor:

We implement the CENTRIST descriptor and its corresponding feature matching method, the histogram intersection measure. The idea is based on two papers: [1], [2]. First, for each interest point, we take a 41x41 square window around that point, oriented along its canonical direction (as in MOPS). Second, we apply the census transform to that 41x41 patch, so that each pixel in the transformed patch takes a value in [0, 255]. Finally, we compute the histogram of the census-transformed patch and return a normalized 256-d vector as the CENTRIST descriptor of that point.
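
The following is a minimal sketch of the census transform and histogram steps described above, assuming the 41x41 patch has already been extracted and rotated to its canonical direction; the neighbour ordering used to build the 8-bit census code and the helper names are illustrative and may differ from our actual implementation in feature.cpp. The histogram intersection measure used for matching is included at the end.

    #include <vector>

    // Sketch of the CENTRIST descriptor for one already-extracted patch.
    // The patch is assumed to be a square grayscale window (e.g. 41x41),
    // rotated to the canonical orientation beforehand.
    std::vector<double> centristDescriptor(const std::vector<double>& patch, int size)
    {
        std::vector<double> hist(256, 0.0);
        int count = 0;
        // Skip the 1-pixel border so every pixel has all 8 neighbours.
        for (int y = 1; y < size - 1; ++y) {
            for (int x = 1; x < size - 1; ++x) {
                double c = patch[y * size + x];
                int code = 0, bit = 0;
                // Census transform: one bit per neighbour, set when the
                // neighbour is greater than or equal to the centre pixel.
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        if (dx == 0 && dy == 0) continue;
                        if (patch[(y + dy) * size + (x + dx)] >= c)
                            code |= (1 << bit);
                        ++bit;
                    }
                }
                hist[code] += 1.0;   // accumulate the 256-bin histogram
                ++count;
            }
        }
        // Normalize so the descriptor sums to 1.
        if (count > 0)
            for (int i = 0; i < 256; ++i) hist[i] /= count;
        return hist;
    }

    // Histogram intersection between two normalized CENTRIST descriptors
    // (larger means more similar).
    double histogramIntersection(const std::vector<double>& h1,
                                 const std::vector<double>& h2)
    {
        double s = 0.0;
        for (size_t i = 0; i < h1.size(); ++i)
            s += (h1[i] < h2[i]) ? h1[i] : h2[i];
        return s;
    }

Because the descriptor is a normalized histogram, the histogram intersection score lies in [0, 1], with 1 meaning identical histograms.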

Testing: use descriptortype=3 and matchtype=4.

The results are shown in the benchmark table in item 3 of the Performance Evaluation section. The performance of the CENTRIST descriptor is not that impressive, probably because in the reference papers the census transform is used to search for a matching point within a small corresponding neighborhood in the other image (e.g., for disparity calculation), rather than against all candidate feature points in the other image. However, the CENTRIST descriptor does show comparable performance in some cases.

 

2.      Implement adaptive non-maximum suppression:

Testing: open the file “feature.cpp,” make sure that “bool UseAdaptiveLocalMax=1;” is set at the top of the code, and recompile.

The idea is based on the MOPS paper linked on the project website. An adjustable parameter at the top of “feature.cpp” controls how many feature points are detected per image (#define NUM_INTEREST_POINT 300); a sketch of the suppression procedure is given after the comparison table below. The following is an example of adaptive non-maximum suppression compared with simple non-maximum suppression:

Simple non-maximum suppression: 936 points detected.

Adaptive non-maximum suppression: 300 points detected (the number can be set by the user).

The average AUC on the benchmarks improves when the number of detected points is set properly. The comparison is below (nip = NUM_INTEREST_POINT):

(Each cell lists the AUC with normal suppression, then the AUC with adaptive suppression and the nip value used, as normal / adaptive (nip).)

Method \ Benchmark                | leuven                     | bike                       | wall
Simple window + SSD               | 0.236847 / 0.346468 (300)  | 0.463444 / 0.44792 (350)   | 0.319197 / 0.447492 (300)
Simple window + ratio             | 0.512403 / 0.569481 (350)  | 0.540977 / 0.581514 (350)  | 0.519911 / 0.570903 (300)
Simple window + ratio consistent  | 0.526531 / 0.495897 (400)  | 0.518189 / 0.517847 (350)  | 0.489035 / 0.490301 (300)
MOPS + SSD                        | 0.644686 / 0.692519 (550)  | 0.610108 / 0.596535 (400)  | 0.491568 / 0.479467 (350)
MOPS + ratio                      | 0.63882 / 0.674502 (600)   | 0.62192 / 0.614979 (400)   | 0.586467 / 0.518576 (350)
MOPS + ratio consistent           | 0.638081 / 0.645998 (400)  | 0.626346 / 0.621312 (400)  | 0.617729 / 0.57501 (350)
CENTRIST + SSD                    | 0.419695 / 0.433435 (350)  | 0.601626 / 0.701879 (350)  | 0.169925 / N/A
CENTRIST + ratio                  | 0.521197 / 0.534983 (350)  | 0.505213 / 0.541413 (400)  | 0.558635 / N/A
CENTRIST + ratio consistent       | 0.505964 / 0.597026 (200)  | 0.440869 / 0.525085 (400)  | 0.344507 / N/A
CENTRIST + histogram intersection | 0.469402 / 0.475518 (350)  | 0.62005 / 0.677093 (450)   | 0.17435 / N/A

(The N/A fields indicate that the benchmark command returns -1.#IND00, i.e., an indeterminate/NaN value, as the average AUC; this is probably a bug.)

These results show that the average AUC with adaptive non-maximum suppression is generally higher than with normal non-maximum suppression.
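
For reference, the following is a minimal sketch of adaptive non-maximum suppression in the spirit of the MOPS paper: each point is assigned the radius to the nearest sufficiently stronger point, and the NUM_INTEREST_POINT points with the largest radii are kept. The brute-force O(n^2) radius computation and the 0.9 robustness factor are assumptions and may differ from our actual code in feature.cpp.

    #include <vector>
    #include <algorithm>
    #include <cmath>
    #include <limits>

    // Sketch of adaptive non-maximum suppression (ANMS).
    struct Corner { double x, y, strength, radius; };

    void adaptiveNonMaxSuppression(std::vector<Corner>& corners, int numKeep)
    {
        const double cRobust = 0.9;  // assumption: standard MOPS value
        for (size_t i = 0; i < corners.size(); ++i) {
            double r = std::numeric_limits<double>::max();
            for (size_t j = 0; j < corners.size(); ++j) {
                // Only points sufficiently stronger than corner i can suppress it.
                if (corners[j].strength * cRobust > corners[i].strength) {
                    double dx = corners[i].x - corners[j].x;
                    double dy = corners[i].y - corners[j].y;
                    r = std::min(r, std::sqrt(dx * dx + dy * dy));
                }
            }
            corners[i].radius = r;  // suppression radius of corner i
        }
        // Keep the numKeep corners with the largest suppression radii.
        std::sort(corners.begin(), corners.end(),
                  [](const Corner& a, const Corner& b) { return a.radius > b.radius; });
        if ((int)corners.size() > numKeep)
            corners.resize(numKeep);
    }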

 

3.      Make our feature detector scale invariant:

Testing: open the file “feature.cpp,” make sure that “bool UseScaleInvariantHarris=1;” is set at the top of the code, and recompile.
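
Since this item only lists the testing switch, the following is an illustrative sketch of one common way to make Harris detection scale invariant: running the detector over an image pyramid and mapping the detected points back to the original resolution. The 2x2 averaging used for downsampling and the callback interface are assumptions for the sketch, not necessarily the logic behind UseScaleInvariantHarris in our feature.cpp.

    #include <vector>
    #include <functional>
    #include <utility>

    // Sketch of multi-scale Harris detection over an image pyramid.
    // detectAtScale is any single-scale detector returning (x, y) points.
    struct ScaledPoint { double x, y, scale; };

    std::vector<ScaledPoint> detectMultiScale(
        std::vector<double> image, int width, int height, int numLevels,
        const std::function<std::vector<std::pair<int,int> >(
            const std::vector<double>&, int, int)>& detectAtScale)
    {
        std::vector<ScaledPoint> points;
        double scale = 1.0;
        for (int level = 0; level < numLevels && width > 16 && height > 16; ++level) {
            // Detect on the current level and map coordinates back to level 0.
            std::vector<std::pair<int,int> > found = detectAtScale(image, width, height);
            for (size_t i = 0; i < found.size(); ++i) {
                ScaledPoint p = { found[i].first * scale, found[i].second * scale, scale };
                points.push_back(p);
            }
            // Downsample by 2 with 2x2 averaging (stand-in for Gaussian blur + subsample).
            int w2 = width / 2, h2 = height / 2;
            std::vector<double> next(w2 * h2);
            for (int y = 0; y < h2; ++y)
                for (int x = 0; x < w2; ++x)
                    next[y * w2 + x] = 0.25 * (image[(2*y) * width + 2*x] +
                                               image[(2*y) * width + 2*x + 1] +
                                               image[(2*y+1) * width + 2*x] +
                                               image[(2*y+1) * width + 2*x + 1]);
            image = next;
            width = w2; height = h2;
            scale *= 2.0;
        }
        return points;
    }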

 

   

 

4.      Implement a method that outperforms the ratio test for deciding if a feature is a valid match:

Testing: use matchtype=3 in the command.

The idea is a left-right consistency check: suppose p1 is an interest point in image 1 and its best match in image 2 is p2. We then switch the roles of the two images and let p1’ be the best match of p2 in image 1. In addition to the ratio of the best-match score to the second-best-match score, we also compute the ratio of the distances of (p1, p2) and (p2, p1’). The matching score of p1 is the product of these two ratios (the smaller, the better). We call this method “ratio consistent.”
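
The following is a minimal sketch of this ratio consistent score for one feature, assuming descriptors are stored as std::vector<float> and distances are SSD; the helper structure and function names are illustrative and do not reproduce our project's actual classes.

    #include <vector>
    #include <limits>

    struct BestTwo { int bestIdx; double best, second; };

    static double ssdDist(const std::vector<float>& a, const std::vector<float>& b)
    {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Find the best and second-best SSD distances from d to a descriptor set.
    static BestTwo bestTwo(const std::vector<float>& d,
                           const std::vector<std::vector<float> >& set)
    {
        BestTwo r = { -1, std::numeric_limits<double>::max(),
                          std::numeric_limits<double>::max() };
        for (size_t i = 0; i < set.size(); ++i) {
            double s = ssdDist(d, set[i]);
            if (s < r.best) { r.second = r.best; r.best = s; r.bestIdx = (int)i; }
            else if (s < r.second) { r.second = s; }
        }
        return r;
    }

    // "Ratio consistent" score of feature i1 in image 1 (smaller is better).
    double ratioConsistentScore(int i1,
                                const std::vector<std::vector<float> >& feats1,
                                const std::vector<std::vector<float> >& feats2)
    {
        BestTwo fwd = bestTwo(feats1[i1], feats2);          // p1 -> p2
        if (fwd.bestIdx < 0) return std::numeric_limits<double>::max();
        BestTwo bwd = bestTwo(feats2[fwd.bestIdx], feats1); // p2 -> p1'
        double ratio       = fwd.best / fwd.second;         // standard ratio test
        double consistency = fwd.best / bwd.best;           // 1 when p1' == p1, > 1 otherwise
        return ratio * consistency;                         // combined score
    }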

From the table in extra credit item 2, the average AUC of the ratio consistent method is not clearly higher than that of the ratio test (though the numbers are comparable). We therefore also compare the ratio consistent method to the ratio test on the Yosemite and graf test images. The ROC curves (the x-axis is the false positive rate and the y-axis is the true positive rate) and AUC values follow:

Yosemite test images

graf test images

 

The AUC:

Method \ test set                 | Yosemite  | graf
Simple window + SSD               | 0.825014  | 0.553022
Simple window + ratio             | 0.872198  | 0.536485
Simple window + ratio consistent  | 0.904434  | 0.663754
MOPS + SSD                        | 0.706626  | 0.686756
MOPS + ratio                      | 0.862354  | 0.767794
MOPS + ratio consistent           | 0.878918  | 0.808077
SIFT + SSD                        | 0.994692  | 0.943780
SIFT + ratio                      | 0.995494  | 0.967525
SIFT + ratio consistent           | 0.986097  | 0.961356

The experimental results on the Yosemite and graf datasets show that the AUC of the ratio consistent method is generally higher than (or comparable to) that of the ratio test, which supports the claim that the ratio consistent method outperforms the ratio test.

 

References:

[1] H. Hirschmüller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1582-1599, 2009.

[2] J. Wu, C. Geyer, and J. M. Rehg, “Real-time human detection using contour cues,” IEEE International Conference on Robotics and Automation, 2011.