CS4670/5670: Computer Vision, Fall 2013
Project 5: Object Detection
Brief
- Assigned: Saturday, November 23, 2013
- Code Due: Thursday, December 5, 2013 (by 11:59pm)
- Artifact Due: Friday, December 6, 2013 (by 11:59pm)
- For the demos on Friday, please download this set of files: pedestrian_demo.zip
- This assignment should be done in groups of 2 students.
Introduction
The goal of this project is to implement a simple, effective method
for detecting pedestrians in an image. You will be working off of the
technique of Dalal and Triggs (PDF) from 2005. This
technique has four main components:
- A feature descriptor. We first need a way to describe an
image region with a high-dimensional descriptor. For this project,
you will be implementing two descriptors: tiny images and histogram
of oriented gradients (HOG) features.
- A learning method. Next, we need a way to learn to
classify an image region (described using one of the features above)
as a pedestrian or not. For this, we will be using support vector
machines (SVMs) and a large training dataset of image regions
containing pedestrians (positive examples) or not containing
pedestrians (negative examples).
- A sliding window detector. Using our classifier, we can
tell if an image region looks like a pedestrian or not. The next
step is to run this classifier as a sliding window detector
on an input image in order to detect all instances of pedestrians in
that image. In order to detect pedestrians at multiple scales we
run our sliding window detector at multiple scales to form a pyramid
of detector responses.
- Non-maxima suppression. Given the pyramid generated by the
sliding window detector, the final step is to find the best detections
by keeping only the strongest responses within a neighborhood,
both within an image and across scales.
Using our skeleton code as a starting point, you'll be implementing
parts of all four of these components, and evaluating your methods by
creating precision-recall (PR) curves.
Downloads
- Skeleton code
For this assignment we will
distribute the skeleton code
using git. (This should help
make distributing any updates easier.) Please
install git on your system; once it is installed, you can
download the code by typing (using the command-line interface
to git)
>> git clone http://www.cs.cornell.edu/courses/cs4670/2013fa/projects/p5/skeleton.git
This will create the directory skeleton. To get updates to the code
you can then simply run
>> git pull
This will fetch any updates and merge them into your local
copy. If we modify a file you have already
modified, git will not overwrite your
changes. Instead, it will mark the file as having a conflict
and then ask you to resolve how to integrate the changes from
these two
sources. Here's
a quick guide on how to resolve these conflicts.
For those who are already using git to work in groups, you
can still share code with your partner by adding multiple
remotes to your local repository (one being this original
repository and the other some remote service like GitHub
where you host the code you are working
on); here's
a reference with more information.
- Solution executables: Mac, Linux,
Windows
- Cropped Pedestrian Dataset (18 MB). You will use this dataset
for training and testing your detector.
- Full Image Pedestrian Dataset (10 MB). You will use this dataset
to test your sliding windows and non-maxima suppression code.
- Full negatives set (87 MB, only for extra credit)
Compiling
Dependencies
- libjpeg
- libpng
- FLTK
- CMake
Generating project files with CMake
This project uses CMake to generate compilation files from a set of
project description files (CMakeLists.txt). For those unfamiliar
with CMake, you can find out more about it in this wiki. CMake is
readily available on Linux, and can be downloaded for other platforms
(both command-line and GUI versions are available here).
CMake searches for dependencies and can automatically generate compilation
instructions in the form of Makefiles, Visual Studio project files,
Xcode project files, etc. (run cmake -h to see a full list of
supported generators). The basic procedure for generating these files with the
command-line tool is to first create a directory where the compilation files will go
>> cd path/with/source
>> mkdir build
>> cd build
and then run cmake inside the build directory. The
simplest form is
>> cmake .. # Assuming here you are inside the previously created build directory
This command will search for dependencies and generate a Makefile.
If there are no errors, you can build the project with
>> make
If you are getting compilation errors related to linking or headers
that were not found, it might be useful to run
>> VERBOSE=1 make
This will print every command that make runs (normally
it only prints out which file it is currently working on).
CMake can also generate build instructions in debug and
release modes with the following flags
>> cmake -DCMAKE_BUILD_TYPE=Debug ..
>> cmake -DCMAKE_BUILD_TYPE=Release ..
To generate project files for other IDEs you can use the flag
-G
>> cmake -G Xcode ..
Windows
CMake also has a GUI that is especially useful in Microsoft Windows
environments.
TIPS: When generating project files for Visual Studio make sure to tell CMake to generate
32-bit projects (by selecting Visual Studio 10 as the compiler when it asks,
instead of Visual Studio 10 x64). Once you generate the Visual Studio project and open it,
you might also want to manually set the startup
project to objdet (instead of ALL_BUILD) to get the debugging
to work properly (to do this, right-click on the objdet
project in Visual Studio, and select Set as StartUp Project).
Using the software
This project has a GUI and a command-line interface, which are complementary to each
other. The GUI serves as a way to visually inspect different aspects of the
pipeline and to fine-tune parameters. The command-line interface is used to
train our classifier and test it on the datasets we will provide. You can
also load the generated classifier into the GUI and run it on individual images.
To launch the GUI, simply run the command without arguments or double-click on its icon.
To get help with the command-line interface, run objdet -h.
Training a classifier
In order to train a new SVM classifier you will run the following command
>> objdet TRAIN pedestrian_train.cdataset -f hog hog.svm
This will load all images in the dataset pedestrian_train, extract HOG
descriptors, train the classifier, and save it to the file hog.svm.
The .cdataset file contains a list of filenames and the class of each image.
A +1 before the filename indicates a file that contains a pedestrian, while
-1 indicates that the file contains no pedestrians.
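For example, the first few lines of a .cdataset file might look something like
this (the filenames below are made up; open pedestrian_train.cdataset to see the
exact layout used by the provided dataset):

+1 positives/ped_00001.png
+1 positives/ped_00002.png
-1 negatives/bg_00001.png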
The HOG feature extractor created by this command uses the default parameters. If you want
to try different settings, you can choose them in the GUI, save
them to a file with "File/Save Parameters", and then run
>> objdet TRAIN pedestrian_train.cdataset -p hog.param hog.svm
Note that the -f flag used to choose the descriptor is no longer
necessary, as this information is saved in the params file.
Once you have a trained classifier you can visualize its weights in
the GUI by loading the .svm
file and clicking on the
menu item "SVM/Show SVM Weights". For the HOG descriptor the GUI will
display an image that is similar to the following one
Here the left side, in red, shows a visualization of negative weights;
these are edge orientations that should not be present in an image region
containing a pedestrian. For instance, observe the horizontal edges in the
region of the legs. On the right, in green, are the positive weights showing
edge orientations that should be present in images of pedestrians.
Testing the classifier
To test the classifier you will run the command
>> objdet PRED pedestrian_test.cdataset hog.svm hog.pr hog.cdataset
This will load the images in pedestrian_test.cdataset, extract
descriptors, and classify them using the classifier stored in
hog.svm. In the terminal you will see the average precision
of the classifier on the given dataset. The command also generates
a .pr file, which contains the precision-recall curve,
and a .cdataset file, which contains the classifier
output for each input image.
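For reference, the precision-recall curve is computed by sweeping a decision
threshold over the sorted classifier scores, and the average precision printed
in the terminal summarizes this curve (roughly, the area under it). Below is a
minimal C++ sketch of that computation; the Scored struct is made up for
illustration, and this is not the skeleton's evaluation code or the code that
writes the .pr file.

#include <algorithm>
#include <cstdio>
#include <vector>

// One classifier output: SVM score and ground-truth label (+1 or -1).
struct Scored { double score; int label; };

// Print one recall/precision pair per threshold by sweeping the sorted scores.
void printPRCurve(std::vector<Scored> results)
{
    // Sort by decreasing score, so each prefix of the list corresponds to one threshold.
    std::sort(results.begin(), results.end(),
              [](const Scored &a, const Scored &b) { return a.score > b.score; });

    int totalPos = 0;
    for (size_t i = 0; i < results.size(); i++)
        if (results[i].label > 0) totalPos++;

    int tp = 0, fp = 0;
    for (size_t i = 0; i < results.size(); i++) {
        if (results[i].label > 0) tp++; else fp++;
        double precision = double(tp) / double(tp + fp);
        double recall    = (totalPos > 0) ? double(tp) / double(totalPos) : 0.0;
        std::printf("%f %f\n", recall, precision);
    }
}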
To visualize the PR curve we provide
the MATLAB script plot_pr.m
that can plot multiple
curves at once (in case you want to compare results for different
settings or descriptors). To generate a plot for the PR curves
hog.pr
and ti.pr, you can invoke the script in MATLAB as follows
MATLAB>> plot_pr('PR curve', 'ti.pr', 'TinyImg', 'hog.pr', 'HOG', 'output', 'pr.eps')
The first argument is the plot title; this is followed by a list of
pairs containing the
.pr file followed by
the curve name (which will show up in the plot legend), and finally
you can optionally specify an output image with the
'output'
option followed by the output filename. An example of the precision-recall
curve for the solution code is shown below:
Sliding window detection
So far we have trained and tested the classifier on cropped images, where
the image either contained a pedestrian or not. A more realistic use is to
run the classifier on an uncropped image, evaluating at every possible location
and scale whether there is an instance of the object of interest or not.
The final parts of this project involve implementing the functionality that
will evaluate the classifier you train at all scales and locations of
an image and select the best detections inside the image. Once this is
done you will test your sliding window detector with the following command
>> objdet PREDSL test_predsl.dataset hog.svm hog_preds.pr hog.dataset
The command above is very similar to the one we used to evaluate the detector
on the cropped images. Note, however, that here we are using a
.dataset file, instead of the .cdataset one we used before. This dataset file format
specifies uncropped images together with the locations of possibly multiple pedestrians.
Here again you can fine-tune the parameters for the image pyramid
and non-maxima suppression in the GUI, save them, and pass them
to the command line with the -p flag. Note that
here only the image pyramid and non-maxima suppression parameters
in the file will be used; the feature extraction parameters are fixed
and contained in the .svm file.
When implementing the sliding window detection you might find it
useful to inspect the result of applying the classifier to
an image. You can visualize this in the GUI in the "SVM Response"
tab. To fine-tune parameters and visualize the results of your implementation
of non-maxima suppression, you can use the "Detections" tab in the GUI.
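To make the structure of this stage concrete, here is a rough, self-contained
C++ sketch of a sliding window detector over an image pyramid. All of the types
and helpers below (Image, Feature, Detection, resizeImage, extractFeature,
svmScore) are placeholders introduced only for illustration; the skeleton's
actual classes in Feature.cpp, SupportVectorMachine.cpp, and ObjectDetector.cpp
have different interfaces, so treat this purely as pseudocode for the control
flow. Here scaleFactor is assumed to be less than 1 (e.g., 1/1.2), so each level
is a downsampled copy of the image.

#include <vector>

// --- Placeholder types and helpers; the skeleton has its own image, feature, and SVM classes. ---
struct Image { int width, height; };
typedef std::vector<float> Feature;
struct Detection { int x, y, width, height; double response; };

Image resizeImage(const Image &img, double scale)            // stand-in for real image resampling
{
    Image out;
    out.width = int(img.width * scale);
    out.height = int(img.height * scale);
    return out;
}
Feature extractFeature(const Image &, int, int, int, int)    // stand-in for HOG / tiny-image extraction
{
    return Feature();
}
double svmScore(const Feature &) { return 0.0; }             // stand-in for w . f + b
// ------------------------------------------------------------------------------------------------

// Slide a winW x winH window over every level of an image pyramid, score each
// window with the SVM, and keep windows whose response exceeds the threshold.
std::vector<Detection> detectSlidingWindow(const Image &img,
                                           int winW, int winH, int stride,
                                           double scaleFactor, int nLevels,
                                           double threshold)
{
    std::vector<Detection> dets;
    double scale = 1.0;
    for (int level = 0; level < nLevels; level++, scale *= scaleFactor) {
        Image scaled = resizeImage(img, scale);
        for (int y = 0; y + winH <= scaled.height; y += stride) {
            for (int x = 0; x + winW <= scaled.width; x += stride) {
                Feature f = extractFeature(scaled, x, y, winW, winH);
                double response = svmScore(f);
                if (response > threshold) {
                    Detection d;
                    // Map the window back into original-image coordinates.
                    d.x = int(x / scale);
                    d.y = int(y / scale);
                    d.width  = int(winW / scale);
                    d.height = int(winH / scale);
                    d.response = response;
                    dets.push_back(d);
                }
            }
        }
    }
    return dets;   // non-maxima suppression is applied to these detections afterwards
}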
Exposing More Parameters
If you find it necessary to add extra parameters to any of the classes that are manipulated
in the GUI (e.g., your feature extractor or your non-maxima suppression code), you
can easily expose these fields by editing three methods:
- getDefaultParameters. This method returns an instance of ParametersMap
containing the parameter values that are initially exposed in the GUI.
- getParameters. This method returns the current values of the parameters for
the class.
- The class constructor that takes as input an instance of ParametersMap. In
the constructor you will retrieve the value for your new parameter.
These three methods manipulate instances of the class ParametersMap,
which is essentially a dictionary that maps parameter names (strings) to parameter values.
By editing these three methods you expose the fields in the GUI and ensure
that they are properly read and stored to file.
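As a concrete (and entirely hypothetical) illustration, suppose you wanted to
expose a new cellSize parameter for a feature extractor. The sketch below uses a
toy ToyParametersMap class so that it is self-contained; the skeleton's real
ParametersMap has its own accessors, so adapt the set/get calls (and the exact
method signatures) to what the skeleton actually declares.

#include <map>
#include <string>

// Toy stand-in for the skeleton's ParametersMap, used here only so the example
// compiles on its own. The real class has a different (and richer) interface.
struct ToyParametersMap {
    std::map<std::string, double> values;
    void set(const std::string &name, double v) { values[name] = v; }
    double get(const std::string &name) const { return values.at(name); }
};

// Hypothetical feature extractor exposing a "cellSize" parameter through the
// three methods described above.
class MyFeatureExtractor {
public:
    static ToyParametersMap getDefaultParameters()
    {
        ToyParametersMap params;
        params.set("cellSize", 8);                // default value initially shown in the GUI
        return params;
    }

    ToyParametersMap getParameters() const
    {
        ToyParametersMap params;
        params.set("cellSize", _cellSize);        // report the value currently in use
        return params;
    }

    explicit MyFeatureExtractor(const ToyParametersMap &params)
    {
        _cellSize = int(params.get("cellSize"));  // read the value chosen in the GUI or params file
    }

private:
    int _cellSize;
};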
Todo
All TODOs are part of the library subproject od.
- Feature.cpp
  - TinyImageGradFeatureExtractor::operator(): Extract a simple descriptor by
    downsampling the input image and computing the gradient magnitude. We already
    provide a more basic feature in TinyImageFeatureExtractor that should help you
    familiarize yourself with the code.
  - HOGFeatureExtractor::operator(): The HOG descriptor, as described in class,
    divides an image region into a set of k x k cells, computes a histogram of
    gradient orientations for each cell, normalizes each histogram, and then
    concatenates the histograms for all cells into a single, high-dimensional
    descriptor vector. Please see the lecture notes and the Dalal and Triggs
    paper for more information. (A simplified sketch of this computation appears
    after this list.)
- SupportVectorMachine.cpp
  - SVM Train
  - SVM Sliding Window
- Detection.cpp
  - Detection::relativeOverlap: Compute the relative overlap between two detections.
    This is used in non-maxima suppression and in the evaluation code. (See the
    sketch after this list.)
- ObjectDetector.cpp
  - Implement NMS within an image and across different levels of a response pyramid.
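To make the HOGFeatureExtractor TODO more concrete, here is a compact,
self-contained C++ sketch of the cell-histogram idea described above: per-cell
orientation histograms weighted by gradient magnitude, per-cell normalization,
and concatenation. It is deliberately simplified (no block normalization, no
interpolation between bins or cells, and a toy image type instead of the
skeleton's image class), so treat it as a starting point rather than the
expected implementation.

#include <algorithm>
#include <cmath>
#include <vector>

// Toy grayscale image, used only to keep this sketch self-contained.
struct GrayImage {
    int width, height;
    std::vector<float> pixels;                              // row-major
    float at(int x, int y) const { return pixels[y * width + x]; }
};

// Simplified HOG: split the region into cellSize x cellSize cells, accumulate a
// gradient-orientation histogram per cell (weighted by gradient magnitude),
// L2-normalize each cell histogram, and concatenate them into one descriptor.
std::vector<float> computeHOG(const GrayImage &img, int cellSize, int nBins)
{
    const float PI = 3.14159265f;
    const int cellsX = img.width / cellSize;
    const int cellsY = img.height / cellSize;
    std::vector<float> descriptor(cellsX * cellsY * nBins, 0.0f);

    for (int y = 1; y < img.height - 1; y++) {
        for (int x = 1; x < img.width - 1; x++) {
            // Central-difference gradient.
            float dx = img.at(x + 1, y) - img.at(x - 1, y);
            float dy = img.at(x, y + 1) - img.at(x, y - 1);
            float mag = std::sqrt(dx * dx + dy * dy);

            // Unsigned orientation in [0, pi), quantized into nBins bins.
            float angle = std::atan2(dy, dx);
            if (angle < 0.0f) angle += PI;
            int bin = std::min(int(angle / PI * nBins), nBins - 1);

            // Vote into the histogram of the cell containing this pixel.
            int cx = std::min(x / cellSize, cellsX - 1);
            int cy = std::min(y / cellSize, cellsY - 1);
            descriptor[(cy * cellsX + cx) * nBins + bin] += mag;
        }
    }

    // L2-normalize each cell histogram independently.
    for (int c = 0; c < cellsX * cellsY; c++) {
        float sumSq = 1e-6f;                                // small epsilon avoids division by zero
        for (int b = 0; b < nBins; b++)
            sumSq += descriptor[c * nBins + b] * descriptor[c * nBins + b];
        float norm = std::sqrt(sumSq);
        for (int b = 0; b < nBins; b++)
            descriptor[c * nBins + b] /= norm;
    }
    return descriptor;
}

For Detection::relativeOverlap, one common definition of relative overlap
between two axis-aligned boxes (the PASCAL criterion) is the area of their
intersection divided by the area of their union; check the skeleton's comments
for the exact definition it expects. A sketch using a placeholder box struct:

#include <algorithm>

// Placeholder box; the skeleton's Detection class stores similar fields.
struct Box { double x, y, width, height; };

// Relative overlap as intersection area over union area (the PASCAL criterion).
double relativeOverlap(const Box &a, const Box &b)
{
    double x0 = std::max(a.x, b.x);
    double y0 = std::max(a.y, b.y);
    double x1 = std::min(a.x + a.width,  b.x + b.width);
    double y1 = std::min(a.y + a.height, b.y + b.height);
    double inter = std::max(0.0, x1 - x0) * std::max(0.0, y1 - y0);
    double uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

A simple greedy NMS then sorts detections by response and, walking from
strongest to weakest, discards any detection whose relative overlap with an
already-kept detection exceeds a threshold; the same idea applies across
pyramid levels once detections are mapped back to original-image coordinates.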
Turnin
In addition to the code, you will need to turn in a
zipfile with .params files, along with a webpage, as the
artifact. Your zipfile should contain the following items:
- The .params files you used to generate the results in the report:
one for HOG (hog.params) and another for TIG (tig.params). We will
run your code on two separate datasets (one of cropped images to evaluate the
feature descriptor and another with full images to evaluate the NMS) containing
images that were not released, and the top-scoring groups will receive extra credit.
- A webpage containing:
- Visualizations generated by the GUI of the two features you implemented.
- The visualizations of the SVM weights for both classifiers.
- Precision-recall curves computed with the cropped image test dataset
containing results for TIG and HOG features. You can additionally show
the PR curve for other variants of the feature descriptors you implement.
- Precision-recall curves for the test image dataset test.dataset
for both descriptors.
- Please describe any extra credit items on your webpage.
Further Reading
- A recent survey on the best methods
for pedestrian detection.
- Rodrigo Benenson et al. recently presented a collection of improvements
to the detector implemented in this project that give it a significant boost in performance. You can use
this as inspiration for extra credit.
- HOGgles: a better way to visualize HOGs (with MATLAB code available).
- Rujikietgumjorn and Collins discuss a new way of handling occlusion.
- A disadvantage of the method implemented in this project is that the classifier will only recognize
a single view of an object (in our case a frontal or back view of a pedestrian). Nevertheless,
Malisiewicz et al. show
that by combining multiple classifiers into an ensemble, each one trained on a different view of an object, we
can actually detect objects in any configuration. Furthermore, they show that each of the individual
classifiers can be trained with a single image of the object/view of interest, as long as a large collection
of negative examples is supplied.
- Another approach for generalizing this classifier to objects in general pose and view is the work of
Pedro Felzenszwalb et al. At a high level,
these algorithms combine multiple classifiers similar to the one you implemented in this project, but now
each one is specialized in detecting one part of the object (e.g., a leg, arm, or head).
Extra credit
Here are some ideas of things you can implement
for extra credit (some of these are described in the Dalal and Triggs
paper):
- Have cells overlap (so that pixels contribute to more than one cell)
- The block normalization described in the original paper
- A way to mine for hard negatives and improve your classifier (see the original
paper for an explanation)
- Invent your own feature descriptor
- A principled way to handle occlusions
Last modified on December 4, 2013