|
Recognizing Objects by Simultaneously Combining Appearance and Geometry
Daniel Huttenlocher, PI |
|
|
Project
|
The goal of this project is to achieve
a qualitative improvement in the robustness of object category recognition
and localization, by formulating the problem as a single overall estimation
problem rather than as successive stages of feature detection and object
detection. In the proposed approach each object class is modeled as a
collection of local patches arranged in a deformable configuration, where
certain pairs of parts are connected by spring-like connections. The object
localization problem is formulated in terms of energy minimization, where
there is a cost for placing each patch at each location in the image, and a
cost for placing pairs of patches in a manner that stretches the springs
connecting them. This kind of formulation was
proposed in the 1970s under the name Pictorial Structures, but was abandoned
due to its computational complexity. Recent algorithmic advances have made it
possible to pursue this approach as an alternative to traditional methods
based on detecting features. The central focus of the proposed work is to use
techniques for combining uncertain information to improve the efficiency and
accuracy of object recognition. The work aims to improve the ability to
detect and localize objects in images. Accurate localization of objects is
important for systems that interact with the world;
however, the focus of most current research activity is on classification
methods that are better suited to image retrieval problems. The work also
aims to develop methods that take advantage of sources of information beyond
a single object. While it is well known that scene-level context can be
helpful in improving recognition, interpreting such context often depends on
recognizing objects and vice versa. Extensions of the Pictorial Structures
approach to scene-level context offer particular promise because the methods
are designed to directly combine multiple sources of uncertain information
without the need for intermediate detection decisions. |
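The energy minimization formulation described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a star-shaped model (every part connected to a single root part), a quadratic spring cost, and a brute-force search that is practical only on tiny grids, whereas the project relies on much faster algorithms. The names `spring_cost`, `localize_star_model`, `match_cost`, and `rest_offsets` are hypothetical and chosen only for illustration.

```python
from itertools import product


def spring_cost(loc_a, loc_b, rest_offset, stiffness=1.0):
    """Quadratic cost for stretching the spring between two parts (assumed form)."""
    dx = loc_b[0] - loc_a[0] - rest_offset[0]
    dy = loc_b[1] - loc_a[1] - rest_offset[1]
    return stiffness * (dx * dx + dy * dy)


def localize_star_model(match_cost, rest_offsets, width, height):
    """Minimize the total energy of a star-shaped part model by brute force.

    match_cost[p][(x, y)] is the cost of placing part p at (x, y).
    Part 0 is the root; part p > 0 is tied to the root by a spring whose
    rest offset is rest_offsets[p].
    Returns (energy, [location of each part]).
    """
    locations = list(product(range(width), range(height)))
    best = None
    for root in locations:
        energy = match_cost[0][root]
        placement = [root]
        # With the root fixed, each remaining part can be optimized independently.
        for p in range(1, len(match_cost)):
            cost, loc = min(
                (match_cost[p][l] + spring_cost(root, l, rest_offsets[p]), l)
                for l in locations
            )
            energy += cost
            placement.append(loc)
        if best is None or energy < best[0]:
            best = (energy, placement)
    return best


# Toy example: two parts on a 4x3 grid, with the second part expected
# two pixels to the right of the root.
width, height = 4, 3
grid = list(product(range(width), range(height)))
root_cost = {loc: 1.0 for loc in grid}
root_cost[(1, 1)] = 0.0
part_cost = {loc: 1.0 for loc in grid}
part_cost[(3, 1)] = 0.0
energy, placement = localize_star_model(
    [root_cost, part_cost], {1: (2, 0)}, width, height
)
print(energy, placement)  # 0.0 [(1, 1), (3, 1)]
```

No intermediate decision is made about where any single patch "is": every candidate location for every part stays in play until the combined appearance and spring costs are minimized jointly.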
Results
|
Work in the first year has focused
on two aspects of the overall approach.
The first focus is on low-level processing, bringing machine learning
techniques and graphical representations from work in object recognition to
bear on low-level vision problems. The
goal here is to move beyond hand-tuned models with simple 4-connected grid
topologies that are common in low-level vision, and towards richer models
that are learned from examples and can be integrated into a single overall
framework for object recognition.
This work is described in [3,4,5]. The second
focus is on high-level object recognition, largely experimental work
extending our previous work on Pictorial Structure models [1] and scene
context [2] to larger numbers of object categories and the more challenging
PASCAL VOC 2007 dataset. In order to
achieve state-of-the-art precision-recall curves for this data, we had to
move beyond single-scale models to multiple spatial scales, taking an
approach motivated by the recent work of Felzenszwalb et al. [6] using
multi-scale HoG features and a multi-scale extension
of the modeling framework we had previously used. We have also been collaborating with Kodak
on applying these kinds of flexible template models to recognition for
consumer photographs. |
Plans for Next Year |
We have begun bringing the low-level
learning approaches together with the object category recognition techniques,
focusing on learning object models that couple HoG-like features,
represented as spatially uniform graphical models defined on the image grid,
with Pictorial Structure style part-based flexible template models (again
represented as graphical models). This combination of low- and high-level
recognition techniques is the main planned focus for the second year. |
Relevant
|
[1] P.F. Felzenszwalb and D.P. Huttenlocher. Pictorial
Structures for Object Recognition. Intl. Journal of Computer Vision,
61(1), pp. 55-79, January 2005. |
Cited |
[6] P.F. Felzenszwalb, D.
McAllester and D. Ramanan. A Discriminatively
Trained, Multiscale, Deformable Part Model. CVPR 2008. |
|
Last Updated: July, 2008 |