|
Recognizing Objects by Simultaneously Combining Appearance and Geometry
Daniel Huttenlocher, PI |
|
|
Project
|
This
project investigates methods that formulate the object recognition problem as
a single overall optimization rather than as successive stages of feature
detection and matching. Such feature
matching approaches have predominated throughout the history of research in
object recognition, and are particularly pr Such
an energy minimization formulation was proposed in the 1970s under the name
Pictorial Structures, but was abandoned due to its computational
complexity. Recent algorithmic
advances have made it possible to further investigate this kind of
approach. Initial results on detecting
and localizing objects have been promising, but also demonstrate how much
remains to be done for this approach to form a viable alternative to
feature-based object recognition. This
project investigates some of the key initial questions in determining whether
the energy minimization approach to object recognition could be a viable
alternative to current feature-based approaches, including how to learn such
models with minimal supervision, and how to incorporate global geometric
information such as object scale and orientation into the models. The
approach is based on computing cost maps that measure how well each part
matches at each possible location in the image. These cost maps are then combined
in the energy minimization process. In
contrast, traditional feature detection approaches find a small number of
locations where each feature or part might be present in the image. While the
sparse nature of feature locations may seem to require less computation than
working with entire cost maps, the necessity of handling spurious and missed
feature detections in fact makes such feature-based methods quite
computationally intensive. |
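As a rough illustration of how per-part cost maps can be combined in a single minimization, the sketch below scores every root location of a star-shaped model (one root part plus parts attached to it) by brute force. The function name `star_model_energy`, the quadratic deformation cost, and the `offsets` and `weight` parameters are assumptions made for this example, not details taken from the project. The exhaustive inner loop is chosen for clarity and is not the efficient algorithm the project relies on.

```python
import numpy as np


def star_model_energy(root_cost, part_costs, offsets, weight=1.0):
    """Minimum total energy of a star model for every root location.

    root_cost  : (H, W) cost map for the root part (lower is better).
    part_costs : list of (H, W) cost maps, one per non-root part.
    offsets    : list of (dy, dx) ideal displacements of each part
                 relative to the root (hypothetical parameterization).
    weight     : scale of the quadratic deformation penalty (assumed).
    """
    h, w = root_cost.shape
    ys, xs = np.mgrid[0:h, 0:w]           # coordinates of every root location
    total = root_cost.astype(float).copy()

    for cost, (dy, dx) in zip(part_costs, offsets):
        best = np.full((h, w), np.inf)
        # Try every candidate location (py, px) for this part.
        for py in range(h):
            for px in range(w):
                # Squared distance between the candidate part location and
                # the ideal location (root + offset), for all roots at once.
                deform = weight * ((py - (ys + dy)) ** 2 + (px - (xs + dx)) ** 2)
                best = np.minimum(best, cost[py, px] + deform)
        # Each part contributes its best appearance-plus-deformation cost.
        total += best

    return total
```

For instance, with a zero root cost map and a single part whose cost map is low at one pixel, the minimum of the returned map falls at that pixel shifted back by the part's offset. No feature detector is run at any point: every location of every cost map takes part in the minimization.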
Results |
The main research
focus was on developing weakly supervised learning techniques for the k-fan models
introduced in our CVPR 2005 paper. In this training paradigm the only
annotation required for learning is the category labels for the object(s) in
the training images. In the case of multiple objects per training image,
coarse location information is needed to specify which region of the image
corresponds to which object category. We have developed an approach that
builds weak initial models and then improves those models using EM
(Expectation Maximization). The initial models are formed by randomly selecting
patches from training images and then building pairwise
models, composed of two patches, that are correlated highly with a given
object category. Those pairwise models are then
combined, using a greedy search procedure, to
form the initial spatial model. This work is described in part in papers in
ECCV 2006 and CVPR 2007. The latter
paper |
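The refinement step of a build-then-improve scheme like this can be pictured with a deliberately small example. The sketch below uses hard EM (each E-step commits to the single best location per image rather than a full posterior), applied to one part with an isotropic Gaussian location prior. This is a simplification swapped in for illustration, not the k-fan learning procedure from the papers. The function name `hard_em_location` and the parameters `mu_init`, `sigma2`, and `iters` are hypothetical.

```python
import numpy as np


def hard_em_location(cost_maps, mu_init, sigma2=4.0, iters=10):
    """Refine the expected location of one part across training images.

    cost_maps : list of (H, W) appearance cost maps, one per training image.
    mu_init   : (y, x) initial guess for the part's mean location.
    sigma2    : assumed variance of the isotropic Gaussian location prior.
    iters     : maximum number of EM iterations.
    """
    mu = np.asarray(mu_init, dtype=float)
    h, w = cost_maps[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    picks = []

    for _ in range(iters):
        # Hard E-step: in each image, choose the location that minimizes
        # appearance cost plus the negative log of the Gaussian prior.
        picks = []
        for cost in cost_maps:
            energy = cost + ((ys - mu[0]) ** 2 + (xs - mu[1]) ** 2) / (2.0 * sigma2)
            picks.append(np.unravel_index(np.argmin(energy), energy.shape))

        # M-step: re-estimate the mean location from the chosen locations.
        new_mu = np.mean(picks, axis=0)
        if np.allclose(new_mu, mu):
            break
        mu = new_mu

    return mu, picks
```

Only the cost maps are needed as input, with no per-part location annotation, which mirrors the weak supervision described above. A weak initial guess for `mu_init` is improved as the chosen locations and the model parameters are updated in turn.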
Rel |
|
· Object Recognition Without Feature Detection |
|
|
Last Updated: December, 2007 |