CS4670/5670: Computer Vision, Fall 2013
Project 5: Object Detection
Brief
- Assigned: Saturday, November 23, 2013
- Code Due: Thursday, December 5, 2013 (by 11:59pm)
- Artifact Due: Friday, December 6, 2013 (by 11:59pm)
- For the demos on Friday, please download this set of files: pedestrian_demo.zip
- This assignment should be done in groups of 2 students.
Introduction
The goal of this project is to implement a simple, effective method
for detecting pedestrians in an image. You will be working off of the
technique of Dalal and Triggs (PDF) from 2005. This
technique has four main components:
- A feature descriptor. We first need a way to describe an
image region with a high-dimensional descriptor. For this project,
you will be implementing two descriptors: tiny images and histogram
of oriented gradients (HOG) features.
- A learning method. Next, we need a way to learn to
classify an image region (described using one of the features above)
as a pedestrian or not. For this, we will be using support vector
machines (SVMs) and a large training dataset of image regions
containing pedestrians (positive examples) or not containing
pedestrians (negative examples).
- A sliding window detector. Using our classifier, we can
tell if an image region looks like a pedestrian or not. The next
step is to run this classifier as a sliding window detector
on an input image in order to detect all instances of pedestrians in
that image. In order to detect pedestrians at multiple scales we
run our sliding window detector at multiple scales to form a pyramid
of detector responses.
- Non-maxima suppression. Given the pyramid generated by the
sliding window detector, the final step is to find the best detections
by keeping only the strongest responses within a neighborhood,
both within an image and across scales.
Using our skeleton code as a starting point, you'll be implementing
parts of all four of these components, and evaluating your methods by
creating precision-recall (PR) curves.
Downloads
- Skeleton code
For this assignment we will
distribute the skeleton code
using git. (This should help
make distributing any updates easier.) Please
install git on your system; once it is installed, you can
download the code by typing (using the command-line interface
to git)
>> git clone http://www.cs.cornell.edu/courses/cs4670/2013fa/projects/p5/skeleton.git
This will create the directory skeleton. To get updates to the code
you can then simply run
>> git pull
This will fetch any updates and merge them into your local
copy. If we modify a file you have already
modified, git will not overwrite your
changes. Instead, it will mark the file as having a conflict
and then ask you to resolve how to integrate the changes from
these two
sources. Here's
a quick guide on how to resolve these conflicts.
For those who are already using git to work in groups, you
can still share code with your partner by adding multiple
remotes to your local repository (one being this original
repository and the other some remote service like GitHub
where you host the code you are working
on); here's
a reference with more information.
- Solution executables: Mac, Linux,
Windows
- Cropped Pedestrian Dataset (18 MB). You will use this dataset
for training and testing your detector.
- Full Image Pedestrian Dataset (10 MB). You will use this dataset
to test your sliding windows and non-maxima suppression code.
- Full negatives set (87 MB, only for extra credit)
Compiling
Dependencies
- libjpeg
- libpng
- FLTK
- CMake
Generating project files with CMake
This project uses CMake to generate compilation files from a set of
project description files (CMakeLists.txt). For those unfamiliar
with CMake, you can find out more about it in this wiki. CMake is
readily available on Linux, and can be downloaded for other platforms
(both command-line and GUI versions are available here).
CMake searches for dependencies and can automatically generate compilation
instructions in the form of Makefiles, Visual Studio project files,
Xcode project files, etc. (run cmake -h to see a full list of
supported generators). The basic procedure for generating these files with the
command-line tool is to first create a directory where the compilation files will go
>> cd path/with/source
>> mkdir build
>> cd build
and then run cmake inside the build directory. The
simplest form is
>> cmake .. # Assuming here you are inside the previously created build directory
This command will search for dependencies and generate a Makefile.
If there are no errors, you can build the project with
>> make
If you are getting compilation errors related to linking or headers
that were not found, it might be useful to run
>> VERBOSE=1 make
This will print every command that make runs (normally
it only prints out which file it is currently working on).
CMake can also generate build instructions in debug and
release modes with the following flags
>> cmake -DCMAKE_BUILD_TYPE=Debug ..
>> cmake -DCMAKE_BUILD_TYPE=Release ..
To generate project files for other IDEs you can use the flag
-G
>> cmake -G Xcode ..
Windows
CMake also has a GUI that is especially useful in Microsoft Windows
environments.
TIPS: When generating project files for Visual Studio make sure to tell CMake to generate
32-bit projects (by selecting Visual Studio 10 as the compiler when it asks,
instead of Visual Studio 10 x64). Once you generate the Visual Studio project and open it,
you might also want to manually set the startup
project to objdet (instead of ALL_BUILD) to get the debugging
to work properly (to do this, right-click on the objdet
project in Visual Studio, and select Set as StartUp Project).
Using the software
This project has a GUI and a command-line interface, which are complementary to each
other. The GUI serves as a way to visually inspect different aspects of the
pipeline and to fine-tune parameters. The command-line interface is used to
train our classifier and test it on the datasets we will provide. You can
also load the generated classifier into the GUI and run it on individual images.
To launch the GUI, simply run the command without arguments or double-click on its icon.
To get help with the command-line interface, run objdet -h.
Training a classifier
In order to train a new SVM classifier you will run the following command
>> objdet TRAIN pedestrian_train.cdataset -f hog hog.svm
This will load all images in the dataset pedestrian_train, extract HOG
descriptors, train the classifier, and save it to the file hog.svm.
The .cdataset file contains a list of filenames and the class of each image.
A +1 before the filename indicates a file that contains a pedestrian, while
-1 indicates that the file contains no pedestrians.
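For example, the first few lines of a .cdataset file might look something like
this (the filenames below are made up; open pedestrian_train.cdataset to see the
exact layout used by the provided dataset):

+1 positives/ped_00001.png
+1 positives/ped_00002.png
-1 negatives/bg_00001.png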
The HOG feature extractor created by this command uses the default parameters. If you want
to try different settings, you can choose them in the GUI, save
them to a file with "File/Save Parameters", and then run
>> objdet TRAIN pedestrian_train.cdataset -p hog.param hog.svm
Note that the -f flag used to choose the descriptor is no longer
necessary, as this information is saved in the params file.
Once you have a trained classifier you can visualize its weights in
the GUI by loading the .svm
file and clicking on the
menu item "SVM/Show SVM Weights". For the HOG descriptor the GUI will
display an image that is similar to the following one
Here the left side, in red, shows a visualization of negative weights;
these are edge orientations that should not be present in an image region
containing a pedestrian. For instance, observe the horizontal edges in the
region of the legs. On the right, in green, are the positive weights showing
edge orientations that should be present in images of pedestrians.
Testing the classifier
To test the classifier you will run the command
>> objdet PRED pedestrian_test.cdataset hog.svm hog.pr hog.cdataset
This will load the images in pedestrian_test.cdataset, extract
descriptors, and classify them using the classifier stored in
hog.svm. In the terminal you will see the average precision
of the classifier on the given dataset. The command also generates
a .pr file, which contains the precision-recall curve,
and a .cdataset file, which contains the classifier
output for each input image.
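For reference, the precision-recall curve is computed by sweeping a decision
threshold over the sorted classifier scores, and the average precision printed
in the terminal summarizes this curve (roughly, the area under it). Below is a
minimal C++ sketch of that computation; the Scored struct is made up for
illustration, and this is not the skeleton's evaluation code or the code that
writes the .pr file.

#include <algorithm>
#include <cstdio>
#include <vector>

// One classifier output: SVM score and ground-truth label (+1 or -1).
struct Scored { double score; int label; };

// Print one recall/precision pair per threshold by sweeping the sorted scores.
void printPRCurve(std::vector<Scored> results)
{
    // Sort by decreasing score, so each prefix of the list corresponds to one threshold.
    std::sort(results.begin(), results.end(),
              [](const Scored &a, const Scored &b) { return a.score > b.score; });

    int totalPos = 0;
    for (size_t i = 0; i < results.size(); i++)
        if (results[i].label > 0) totalPos++;

    int tp = 0, fp = 0;
    for (size_t i = 0; i < results.size(); i++) {
        if (results[i].label > 0) tp++; else fp++;
        double precision = double(tp) / double(tp + fp);
        double recall    = (totalPos > 0) ? double(tp) / double(totalPos) : 0.0;
        std::printf("%f %f\n", recall, precision);
    }
}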
To visualize the PR curve we provide
the MATLAB script plot_pr.m
that can plot multiple
curves at once (in case you want to compare results for different
settings or descriptors). To generate a plot for the PR curves
hog.pr
and ti.pr, you can invoke the script in MATLAB as follows
MATLAB>> plot_pr('PR curve', 'ti.pr', 'TinyImg', 'hog.pr', 'HOG', 'output', 'pr.eps')
The first argument is the plot title; this is followed by a list of
pairs containing the
.pr file followed by
the curve name (which will show up in the plot legend), and finally
you can optionally specify an output image with the
'output'
option followed by the output filename. An example of the precision-recall
curve for the solution code is shown below:
Sliding window detection
So far we have trained and tested the classifier on cropped images, where
the image either contained a pedestrian or not. A more realistic use is to
run the classifier on an uncropped image, evaluating at every possible location
and scale whether there is an instance of the object of interest or not.
The final parts of this project involve implementing the functionality that
will evaluate the classifier you train at all scales and locations of
an image and select the best detections inside the image. Once this is
done you will test your sliding window detector with the following command
>> objdet PREDSL test_predsl.dataset hog.svm hog_preds.pr hog.dataset
The command above is very similar to the one we used to evaluate the detector
on the cropped images. Note, however, that here we are using a
.dataset file, instead of the .cdataset one we used before. This dataset file format
specifies uncropped images together with the locations of possibly multiple pedestrians.
Here again you can fine-tune the parameters for the image pyramid
and non-maxima suppression in the GUI, save them, and pass them
to the command line with the -p flag. Note that
here only the image pyramid and non-maxima suppression parameters
in the file will be used; the feature extraction parameters are fixed
and contained in the .svm file.
When implementing the sliding window detection you might find it
useful to inspect the result of applying the classifier to
an image. You can visualize this in the GUI in the "SVM Response"
tab. To fine-tune parameters and visualize the results of your implementation
of non-maxima suppression, you can use the "Detections" tab in the GUI.
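To make the structure of this stage concrete, here is a rough, self-contained
C++ sketch of a sliding window detector over an image pyramid. All of the types
and helpers below (Image, Feature, Detection, resizeImage, extractFeature,
svmScore) are placeholders introduced only for illustration; the skeleton's
actual classes in Feature.cpp, SupportVectorMachine.cpp, and ObjectDetector.cpp
have different interfaces, so treat this purely as pseudocode for the control
flow. Here scaleFactor is assumed to be less than 1 (e.g., 1/1.2), so each level
is a downsampled copy of the image.

#include <vector>

// --- Placeholder types and helpers; the skeleton has its own image, feature, and SVM classes. ---
struct Image { int width, height; };
typedef std::vector<float> Feature;
struct Detection { int x, y, width, height; double response; };

Image resizeImage(const Image &img, double scale)            // stand-in for real image resampling
{
    Image out;
    out.width = int(img.width * scale);
    out.height = int(img.height * scale);
    return out;
}
Feature extractFeature(const Image &, int, int, int, int)    // stand-in for HOG / tiny-image extraction
{
    return Feature();
}
double svmScore(const Feature &) { return 0.0; }             // stand-in for w . f + b
// ------------------------------------------------------------------------------------------------

// Slide a winW x winH window over every level of an image pyramid, score each
// window with the SVM, and keep windows whose response exceeds the threshold.
std::vector<Detection> detectSlidingWindow(const Image &img,
                                           int winW, int winH, int stride,
                                           double scaleFactor, int nLevels,
                                           double threshold)
{
    std::vector<Detection> dets;
    double scale = 1.0;
    for (int level = 0; level < nLevels; level++, scale *= scaleFactor) {
        Image scaled = resizeImage(img, scale);
        for (int y = 0; y + winH <= scaled.height; y += stride) {
            for (int x = 0; x + winW <= scaled.width; x += stride) {
                Feature f = extractFeature(scaled, x, y, winW, winH);
                double response = svmScore(f);
                if (response > threshold) {
                    Detection d;
                    // Map the window back into original-image coordinates.
                    d.x = int(x / scale);
                    d.y = int(y / scale);
                    d.width  = int(winW / scale);
                    d.height = int(winH / scale);
                    d.response = response;
                    dets.push_back(d);
                }
            }
        }
    }
    return dets;   // non-maxima suppression is applied to these detections afterwards
}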
Exposing More Parameters
If you find it necessary to add extra parameters to any of the classes that are manipulated
in the GUI (e.g., your feature extractor or your non-maxima suppression code), you
can easily expose these fields by editing three methods:
- getDefaultParameters. This method returns an instance of ParametersMap
containing the parameter values that are initially exposed in the GUI.
- getParameters. This method returns the current values of the parameters for
the class.
- The class constructor that takes as input an instance of ParametersMap. In
the constructor you will retrieve the value for your new parameter.
These three methods manipulate instances of the class ParametersMap,
which is essentially a dictionary that maps parameter names (strings) to parameter values.
By editing these three methods you expose the fields in the GUI and ensure
that they are properly read and stored to file.
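As a concrete (and entirely hypothetical) illustration, suppose you wanted to
expose a new cellSize parameter for a feature extractor. The sketch below uses a
toy ToyParametersMap class so that it is self-contained; the skeleton's real
ParametersMap has its own accessors, so adapt the set/get calls (and the exact
method signatures) to what the skeleton actually declares.

#include <map>
#include <string>

// Toy stand-in for the skeleton's ParametersMap, used here only so the example
// compiles on its own. The real class has a different (and richer) interface.
struct ToyParametersMap {
    std::map<std::string, double> values;
    void set(const std::string &name, double v) { values[name] = v; }
    double get(const std::string &name) const { return values.at(name); }
};

// Hypothetical feature extractor exposing a "cellSize" parameter through the
// three methods described above.
class MyFeatureExtractor {
public:
    static ToyParametersMap getDefaultParameters()
    {
        ToyParametersMap params;
        params.set("cellSize", 8);                // default value initially shown in the GUI
        return params;
    }

    ToyParametersMap getParameters() const
    {
        ToyParametersMap params;
        params.set("cellSize", _cellSize);        // report the value currently in use
        return params;
    }

    explicit MyFeatureExtractor(const ToyParametersMap &params)
    {
        _cellSize = int(params.get("cellSize"));  // read the value chosen in the GUI or params file
    }

private:
    int _cellSize;
};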
Todo
All TODOs are part of the library subproject od.
- Feature.cpp
  - TinyImageGradFeatureExtractor::operator(): Extract a simple descriptor by
    downsampling the input image and computing the gradient magnitude. We already
    provide a more basic feature in TinyImageFeatureExtractor that should help you
    familiarize yourself with the code.
  - HOGFeatureExtractor::operator(): The HOG descriptor, as described in class,
    divides an image region into a set of k x k cells, computes a histogram of
    gradient orientations for each cell, normalizes each histogram, and then
    concatenates the histograms for all cells into a single, high-dimensional
    descriptor vector. Please see the lecture notes and the Dalal and Triggs
    paper for more information. (A simplified sketch of this computation appears
    after this list.)
- SupportVectorMachine.cpp
  - SVM Train
  - SVM Sliding Window
- Detection.cpp
  - Detection::relativeOverlap: Compute the relative overlap between two detections.
    This is used in non-maxima suppression and in the evaluation code. (See the
    sketch after this list.)
- ObjectDetector.cpp
  - Implement NMS within an image and across different levels of a response pyramid.
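To make the HOGFeatureExtractor TODO more concrete, here is a compact,
self-contained C++ sketch of the cell-histogram idea described above: per-cell
orientation histograms weighted by gradient magnitude, per-cell normalization,
and concatenation. It is deliberately simplified (no block normalization, no
interpolation between bins or cells, and a toy image type instead of the
skeleton's image class), so treat it as a starting point rather than the
expected implementation.

#include <algorithm>
#include <cmath>
#include <vector>

// Toy grayscale image, used only to keep this sketch self-contained.
struct GrayImage {
    int width, height;
    std::vector<float> pixels;                              // row-major
    float at(int x, int y) const { return pixels[y * width + x]; }
};

// Simplified HOG: split the region into cellSize x cellSize cells, accumulate a
// gradient-orientation histogram per cell (weighted by gradient magnitude),
// L2-normalize each cell histogram, and concatenate them into one descriptor.
std::vector<float> computeHOG(const GrayImage &img, int cellSize, int nBins)
{
    const float PI = 3.14159265f;
    const int cellsX = img.width / cellSize;
    const int cellsY = img.height / cellSize;
    std::vector<float> descriptor(cellsX * cellsY * nBins, 0.0f);

    for (int y = 1; y < img.height - 1; y++) {
        for (int x = 1; x < img.width - 1; x++) {
            // Central-difference gradient.
            float dx = img.at(x + 1, y) - img.at(x - 1, y);
            float dy = img.at(x, y + 1) - img.at(x, y - 1);
            float mag = std::sqrt(dx * dx + dy * dy);

            // Unsigned orientation in [0, pi), quantized into nBins bins.
            float angle = std::atan2(dy, dx);
            if (angle < 0.0f) angle += PI;
            int bin = std::min(int(angle / PI * nBins), nBins - 1);

            // Vote into the histogram of the cell containing this pixel.
            int cx = std::min(x / cellSize, cellsX - 1);
            int cy = std::min(y / cellSize, cellsY - 1);
            descriptor[(cy * cellsX + cx) * nBins + bin] += mag;
        }
    }

    // L2-normalize each cell histogram independently.
    for (int c = 0; c < cellsX * cellsY; c++) {
        float sumSq = 1e-6f;                                // small epsilon avoids division by zero
        for (int b = 0; b < nBins; b++)
            sumSq += descriptor[c * nBins + b] * descriptor[c * nBins + b];
        float norm = std::sqrt(sumSq);
        for (int b = 0; b < nBins; b++)
            descriptor[c * nBins + b] /= norm;
    }
    return descriptor;
}

For Detection::relativeOverlap, one common definition of relative overlap
between two axis-aligned boxes (the PASCAL criterion) is the area of their
intersection divided by the area of their union; check the skeleton's comments
for the exact definition it expects. A sketch using a placeholder box struct:

#include <algorithm>

// Placeholder box; the skeleton's Detection class stores similar fields.
struct Box { double x, y, width, height; };

// Relative overlap as intersection area over union area (the PASCAL criterion).
double relativeOverlap(const Box &a, const Box &b)
{
    double x0 = std::max(a.x, b.x);
    double y0 = std::max(a.y, b.y);
    double x1 = std::min(a.x + a.width,  b.x + b.width);
    double y1 = std::min(a.y + a.height, b.y + b.height);
    double inter = std::max(0.0, x1 - x0) * std::max(0.0, y1 - y0);
    double uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

A simple greedy NMS then sorts detections by response and, walking from
strongest to weakest, discards any detection whose relative overlap with an
already-kept detection exceeds a threshold; the same idea applies across
pyramid levels once detections are mapped back to original-image coordinates.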
Turnin
In addition to the code, you will need to turn in a
zipfile with .params files, along with a webpage, as the
artifact. Your zipfile should contain the following items:
- The .params files you used to generate the results in the report:
one for HOG (hog.params) and another for TIG (tig.params). We will
run your code on two separate datasets (one of cropped images to evaluate the
feature descriptor and another with full images to evaluate the NMS) containing
images that were not released, and the top-scoring groups will receive extra credit.
- A webpage containing:
- Visualizations generated by the GUI of the two features you implemented.
- The visualizations of the SVM weights for both classifiers.
- Precision-recall curves computed with the cropped image test dataset
containing results for TIG and HOG features. You can additionally show
the PR curve for other variants of the feature descriptors you implement.
- Precision-recall curves for the test image dataset test.dataset
for both descriptors.
- Please describe any extra credit items on your webpage.
Further Reading
- A recent survey on the best methods
for pedestrian detection.
- Rodrigo Benenson et al. recently presented a collection of improvements
to the detector implemented in this project that give it a significant boost in performance. You can use
this as inspiration for extra credit.
- HOGgles: a better way to visualize HOGs (with MATLAB code available).
- Rujikietgumjorn and Collins discuss a new way of handling occlusion.
- A disadvantage of the method implemented in this project is that the classifier will only recognize
a single view of an object (in our case a frontal or back view of a pedestrian). Nevertheless,
Malisiewicz et al. show
that by combining multiple classifiers into an ensemble, each one trained on a different view of an object, we
can actually detect objects in any configuration. Furthermore, they show that each of the individual
classifiers can be trained with a single image of the object/view of interest, as long as a large collection
of negative examples is supplied.
- Another approach for generalizing this classifier to objects in general pose and view is the work of
Pedro Felzenszwalb et al. At a high level,
these algorithms combine multiple classifiers similar to the one you implemented in this project, but now
each one is specialized in detecting one part of the object (e.g., a leg, arm, or head).
Extra credit
Here are some ideas of things you can implement
for extra credit (some of these are described in the Dalal and Triggs
paper):
- Have cells overlap (so that pixels contribute to more than one cell)
- The block normalization described in the original paper
- A way to mine for hard negatives and improve your classifier (see the original
paper for an explanation)
- Invent your own feature descriptor
- A principled way to handle occlusions
Last modified on December 4, 2013