Introduction to Computer Vision

Project 2: Feature Detection and Matching

Report

Name: PENG, KUAN-CHUAN; NetID: kp388

HU, RAN; NetID: rh523

Major Design:

In the feature detection part, we identify interest points with the Harris corner detector, which we extend to be scale invariant; we then apply adaptive non-maximum suppression to the Harris response image and keep the local maxima. In the feature description part, we implement not only the simple window descriptor and the MOPS descriptor required in the basic part, but also the CENTRIST (CENsus TRansform hISTogram) descriptor for place and scene recognition, based on the two papers [1][2]. In the feature matching part, we implement four matching methods: SSD, the ratio test, the ratio consistency method, and the histogram intersection method. The ratio consistency method uses a left-right consistency check and outperforms the ratio test; the histogram intersection method is designed specifically for the CENTRIST descriptor.
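
For reference, the following is a minimal sketch of the Harris corner response computation at a single pixel, assuming the image gradients Ix and Iy have already been computed; the window size and the constant k = 0.04 are illustrative assumptions and may differ from the values used in our feature.cpp.

    #include <vector>

    // Minimal sketch of the Harris corner response at pixel (x, y).
    // Assumes Ix, Iy are precomputed image gradients stored row-major in
    // width*height vectors. Window size and k are illustrative values only.
    double harrisResponse(const std::vector<double>& Ix,
                          const std::vector<double>& Iy,
                          int width, int height, int x, int y)
    {
        const int    half = 2;     // 5x5 window (assumption)
        const double k    = 0.04;  // common Harris constant (assumption)

        double sumXX = 0.0, sumYY = 0.0, sumXY = 0.0;
        for (int dy = -half; dy <= half; ++dy) {
            for (int dx = -half; dx <= half; ++dx) {
                int xx = x + dx, yy = y + dy;
                if (xx < 0 || xx >= width || yy < 0 || yy >= height) continue;
                double gx = Ix[yy * width + xx];
                double gy = Iy[yy * width + xx];
                sumXX += gx * gx;   // accumulate structure tensor entries
                sumYY += gy * gy;
                sumXY += gx * gy;
            }
        }
        // Harris response: det(H) - k * trace(H)^2
        double det   = sumXX * sumYY - sumXY * sumXY;
        double trace = sumXX + sumYY;
        return det - k * trace * trace;
    }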

The features we detect are now reasonably invariant to translation, rotation, and illumination changes. We also evaluate the performance of our detector on the benchmark images; the evaluation results are shown below.

 

Performance Evaluation:

1.      The ROC curves and AUC of the Yosemite test images (Yosemite1.jpg and Yosemite2.jpg) and the graf test images (img1.ppm and img2.ppm): (the x-axis is the false positive rate and the y-axis is the true positive rate)

Yosemite test images

graf test images

 

The AUC of the two sets of test images:

Method \ test set       | Yosemite  | graf
Simple window + SSD     | 0.825014  | 0.553022
Simple window + ratio   | 0.872198  | 0.536485
MOPS + SSD              | 0.706626  | 0.686756
MOPS + ratio            | 0.862354  | 0.767794
SIFT + SSD              | 0.994692  | 0.943780
SIFT + ratio            | 0.995494  | 0.967525

These results show that, in terms of descriptor type, SIFT performs best, MOPS second best, and the simple window descriptor worst. In terms of match type, the ratio test is generally better than SSD.

 

2.      The Harris operator of one image from each of the Yosemite and graf test pairs: (the Harris operator images are scaled for visualization purposes)

The Yosemite test images:

 

The graf test images:

  

 

3.      The average AUC on the three benchmark sets: (CENTRIST: our feature descriptor)

Method \ Benchmark      | leuven    | bike      | wall
Simple window + SSD     | 0.236847  | 0.463444  | 0.319197
Simple window + ratio   | 0.512403  | 0.540977  | 0.519911
MOPS + SSD              | 0.644686  | 0.610108  | 0.491568
MOPS + ratio            | 0.63882   | 0.62192   | 0.586467
CENTRIST + SSD          | 0.419695  | 0.601626  | 0.169925
CENTRIST + ratio        | 0.521197  | 0.505213  | 0.558635

 

Strengths and Weaknesses:

In this project, we implement three descriptors: the simple window descriptor, the MOPS (Multi-Scale Oriented Patches) descriptor, and the CENTRIST descriptor. Results using the SIFT descriptor are also provided for comparison. The simple window descriptor is only invariant to translation. The MOPS descriptor achieves translation and rotation invariance. SIFT features are invariant to scale and robust to orientation changes. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information; histograms of such image properties are especially well suited to scene recognition.

However, the descriptors do not handle 3D rotation (out-of-plane viewpoint change) or large orientation variation very well, which is apparent on the graf image dataset.

 

Test on Our Own Images:

 

Extra Credit Items:

1.      Design and implement our own feature descriptor:

We implement the CENTRIST descriptor and its corresponding feature matching method, the histogram intersection measure. The idea is based on two papers: [1], [2]. First, for each interest point, we take a 41x41 square window around that point, oriented along its canonical direction (as in MOPS). Second, we apply the census transform to that 41x41 patch, so that each pixel in the transformed patch takes a value in [0, 255]. Finally, we compute the histogram of the census-transformed patch and return a normalized 256-d vector as the CENTRIST descriptor of that point.
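
The following is a minimal sketch of the census transform and histogram steps described above, assuming the 41x41 patch has already been extracted and rotated to its canonical direction; the neighbour ordering used to build the 8-bit census code and the helper names are illustrative and may differ from our actual implementation in feature.cpp. The histogram intersection measure used for matching is included at the end.

    #include <vector>

    // Sketch of the CENTRIST descriptor for one already-extracted patch.
    // The patch is assumed to be a square grayscale window (e.g. 41x41),
    // rotated to the canonical orientation beforehand.
    std::vector<double> centristDescriptor(const std::vector<double>& patch, int size)
    {
        std::vector<double> hist(256, 0.0);
        int count = 0;
        // Skip the 1-pixel border so every pixel has all 8 neighbours.
        for (int y = 1; y < size - 1; ++y) {
            for (int x = 1; x < size - 1; ++x) {
                double c = patch[y * size + x];
                int code = 0, bit = 0;
                // Census transform: one bit per neighbour, set when the
                // neighbour is greater than or equal to the centre pixel.
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        if (dx == 0 && dy == 0) continue;
                        if (patch[(y + dy) * size + (x + dx)] >= c)
                            code |= (1 << bit);
                        ++bit;
                    }
                }
                hist[code] += 1.0;   // accumulate the 256-bin histogram
                ++count;
            }
        }
        // Normalize so the descriptor sums to 1.
        if (count > 0)
            for (int i = 0; i < 256; ++i) hist[i] /= count;
        return hist;
    }

    // Histogram intersection between two normalized CENTRIST descriptors
    // (larger means more similar).
    double histogramIntersection(const std::vector<double>& h1,
                                 const std::vector<double>& h2)
    {
        double s = 0.0;
        for (size_t i = 0; i < h1.size(); ++i)
            s += (h1[i] < h2[i]) ? h1[i] : h2[i];
        return s;
    }

Because the descriptor is a normalized histogram, the histogram intersection score lies in [0, 1], with 1 meaning identical histograms.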

Testing: use descriptortype=3 and matchtype=4.

The results are shown in the benchmark table in item 3 of the Performance Evaluation section. The performance of the CENTRIST descriptor is not that impressive, probably because in the reference papers the census transform is used to search for a matching point within a small corresponding neighborhood in the other image (e.g., for disparity calculation), rather than against all candidate feature points in the other image. However, the CENTRIST descriptor does show comparable performance in some cases.

 

2.      Implement adaptive non-maximum suppression:

Testing: open the file “feature.cpp,” make sure that “bool UseAdaptiveLocalMax=1;” is set at the top of the code, and recompile.

The idea is based on the MOPS paper linked on the project website. An adjustable parameter at the top of “feature.cpp” controls how many feature points are detected per image (#define NUM_INTEREST_POINT 300); a sketch of the suppression procedure is given after the comparison table below. The following is an example of adaptive non-maximum suppression compared with simple non-maximum suppression:

Simple non-maximum suppression: 936 points detected.

Adaptive non-maximum suppression: 300 points detected (the number can be set by the user).

The average AUC on the benchmarks improves when the number of detected points is set properly. The comparison is below (nip = NUM_INTEREST_POINT):

(Each cell lists the AUC with normal suppression, then the AUC with adaptive suppression and the nip value used, as normal / adaptive (nip).)

Method \ Benchmark                | leuven                     | bike                       | wall
Simple window + SSD               | 0.236847 / 0.346468 (300)  | 0.463444 / 0.44792 (350)   | 0.319197 / 0.447492 (300)
Simple window + ratio             | 0.512403 / 0.569481 (350)  | 0.540977 / 0.581514 (350)  | 0.519911 / 0.570903 (300)
Simple window + ratio consistent  | 0.526531 / 0.495897 (400)  | 0.518189 / 0.517847 (350)  | 0.489035 / 0.490301 (300)
MOPS + SSD                        | 0.644686 / 0.692519 (550)  | 0.610108 / 0.596535 (400)  | 0.491568 / 0.479467 (350)
MOPS + ratio                      | 0.63882 / 0.674502 (600)   | 0.62192 / 0.614979 (400)   | 0.586467 / 0.518576 (350)
MOPS + ratio consistent           | 0.638081 / 0.645998 (400)  | 0.626346 / 0.621312 (400)  | 0.617729 / 0.57501 (350)
CENTRIST + SSD                    | 0.419695 / 0.433435 (350)  | 0.601626 / 0.701879 (350)  | 0.169925 / N/A
CENTRIST + ratio                  | 0.521197 / 0.534983 (350)  | 0.505213 / 0.541413 (400)  | 0.558635 / N/A
CENTRIST + ratio consistent       | 0.505964 / 0.597026 (200)  | 0.440869 / 0.525085 (400)  | 0.344507 / N/A
CENTRIST + histogram intersection | 0.469402 / 0.475518 (350)  | 0.62005 / 0.677093 (450)   | 0.17435 / N/A

(The N/A fields indicate that the benchmark command returns -1.#IND00, i.e., an indeterminate/NaN value, as the average AUC; this is probably a bug.)

These results show that the average AUC with adaptive non-maximum suppression is generally higher than with normal non-maximum suppression.
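
For reference, the following is a minimal sketch of adaptive non-maximum suppression in the spirit of the MOPS paper: each point is assigned the radius to the nearest sufficiently stronger point, and the NUM_INTEREST_POINT points with the largest radii are kept. The brute-force O(n^2) radius computation and the 0.9 robustness factor are assumptions and may differ from our actual code in feature.cpp.

    #include <vector>
    #include <algorithm>
    #include <cmath>
    #include <limits>

    // Sketch of adaptive non-maximum suppression (ANMS).
    struct Corner { double x, y, strength, radius; };

    void adaptiveNonMaxSuppression(std::vector<Corner>& corners, int numKeep)
    {
        const double cRobust = 0.9;  // assumption: standard MOPS value
        for (size_t i = 0; i < corners.size(); ++i) {
            double r = std::numeric_limits<double>::max();
            for (size_t j = 0; j < corners.size(); ++j) {
                // Only points sufficiently stronger than corner i can suppress it.
                if (corners[j].strength * cRobust > corners[i].strength) {
                    double dx = corners[i].x - corners[j].x;
                    double dy = corners[i].y - corners[j].y;
                    r = std::min(r, std::sqrt(dx * dx + dy * dy));
                }
            }
            corners[i].radius = r;  // suppression radius of corner i
        }
        // Keep the numKeep corners with the largest suppression radii.
        std::sort(corners.begin(), corners.end(),
                  [](const Corner& a, const Corner& b) { return a.radius > b.radius; });
        if ((int)corners.size() > numKeep)
            corners.resize(numKeep);
    }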

 

3.      Make our feature detector scale invariant:

Testing: open the file “feature.cpp,” make sure that “bool UseScaleInvariantHarris=1;” is set at the top of the code, and recompile.
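
Since this item only lists the testing switch, the following is an illustrative sketch of one common way to make Harris detection scale invariant: running the detector over an image pyramid and mapping the detected points back to the original resolution. The 2x2 averaging used for downsampling and the callback interface are assumptions for the sketch, not necessarily the logic behind UseScaleInvariantHarris in our feature.cpp.

    #include <vector>
    #include <functional>
    #include <utility>

    // Sketch of multi-scale Harris detection over an image pyramid.
    // detectAtScale is any single-scale detector returning (x, y) points.
    struct ScaledPoint { double x, y, scale; };

    std::vector<ScaledPoint> detectMultiScale(
        std::vector<double> image, int width, int height, int numLevels,
        const std::function<std::vector<std::pair<int,int> >(
            const std::vector<double>&, int, int)>& detectAtScale)
    {
        std::vector<ScaledPoint> points;
        double scale = 1.0;
        for (int level = 0; level < numLevels && width > 16 && height > 16; ++level) {
            // Detect on the current level and map coordinates back to level 0.
            std::vector<std::pair<int,int> > found = detectAtScale(image, width, height);
            for (size_t i = 0; i < found.size(); ++i) {
                ScaledPoint p = { found[i].first * scale, found[i].second * scale, scale };
                points.push_back(p);
            }
            // Downsample by 2 with 2x2 averaging (stand-in for Gaussian blur + subsample).
            int w2 = width / 2, h2 = height / 2;
            std::vector<double> next(w2 * h2);
            for (int y = 0; y < h2; ++y)
                for (int x = 0; x < w2; ++x)
                    next[y * w2 + x] = 0.25 * (image[(2*y) * width + 2*x] +
                                               image[(2*y) * width + 2*x + 1] +
                                               image[(2*y+1) * width + 2*x] +
                                               image[(2*y+1) * width + 2*x + 1]);
            image = next;
            width = w2; height = h2;
            scale *= 2.0;
        }
        return points;
    }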

 

   

 

4.      Implement a method that outperforms the ratio test for deciding if a feature is a valid match:

Testing: use matchtype=3 in the command.

The idea is a left-right consistency check: suppose p1 is an interest point in image 1 and its best match in image 2 is p2. We then switch the roles of the two images and let p1’ be the best match of p2 in image 1. In addition to the ratio of the best-match score to the second-best-match score, we also compute the ratio of the distances of (p1, p2) and (p2, p1’). The matching score of p1 is the product of these two ratios (the smaller, the better). We call this method “ratio consistent.”
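
The following is a minimal sketch of this ratio consistent score for one feature, assuming descriptors are stored as std::vector<float> and distances are SSD; the helper structure and function names are illustrative and do not reproduce our project's actual classes.

    #include <vector>
    #include <limits>

    struct BestTwo { int bestIdx; double best, second; };

    static double ssdDist(const std::vector<float>& a, const std::vector<float>& b)
    {
        double s = 0.0;
        for (size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Find the best and second-best SSD distances from d to a descriptor set.
    static BestTwo bestTwo(const std::vector<float>& d,
                           const std::vector<std::vector<float> >& set)
    {
        BestTwo r = { -1, std::numeric_limits<double>::max(),
                          std::numeric_limits<double>::max() };
        for (size_t i = 0; i < set.size(); ++i) {
            double s = ssdDist(d, set[i]);
            if (s < r.best) { r.second = r.best; r.best = s; r.bestIdx = (int)i; }
            else if (s < r.second) { r.second = s; }
        }
        return r;
    }

    // "Ratio consistent" score of feature i1 in image 1 (smaller is better).
    double ratioConsistentScore(int i1,
                                const std::vector<std::vector<float> >& feats1,
                                const std::vector<std::vector<float> >& feats2)
    {
        BestTwo fwd = bestTwo(feats1[i1], feats2);          // p1 -> p2
        if (fwd.bestIdx < 0) return std::numeric_limits<double>::max();
        BestTwo bwd = bestTwo(feats2[fwd.bestIdx], feats1); // p2 -> p1'
        double ratio       = fwd.best / fwd.second;         // standard ratio test
        double consistency = fwd.best / bwd.best;           // 1 when p1' == p1, > 1 otherwise
        return ratio * consistency;                         // combined score
    }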

From the table in extra credit item 2, the average AUC of the ratio consistent method is not clearly higher than that of the ratio test (though the numbers are comparable). We therefore also compare the ratio consistent method to the ratio test on the Yosemite and graf test images. The ROC curves (the x-axis is the false positive rate and the y-axis is the true positive rate) and AUC values follow:

Yosemite test images

graf test images

 

The AUC:

Method \ test set                 | Yosemite  | graf
Simple window + SSD               | 0.825014  | 0.553022
Simple window + ratio             | 0.872198  | 0.536485
Simple window + ratio consistent  | 0.904434  | 0.663754
MOPS + SSD                        | 0.706626  | 0.686756
MOPS + ratio                      | 0.862354  | 0.767794
MOPS + ratio consistent           | 0.878918  | 0.808077
SIFT + SSD                        | 0.994692  | 0.943780
SIFT + ratio                      | 0.995494  | 0.967525
SIFT + ratio consistent           | 0.986097  | 0.961356

The experimental results on the Yosemite and graf datasets show that the AUC of the ratio consistent method is generally higher than (or comparable to) that of the ratio test, which supports the claim that the ratio consistent method outperforms the ratio test.

 

References:

[1] H. Hirschmüller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1582-1599, 2009.

[2] J. Wu, C. Geyer, and J. M. Rehg, “Real-time human detection using contour cues,” IEEE International Conference on Robotics and Automation, 2011.