We consider the task of 3-d depth estimation from a single still image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multi-scale Markov Random Field (MRF) that incorporates multi-scale local and global image features, and models both the depths and the relations between depths at different points in the image.
We show that, even on unstructured scenes (indoor and outdoor environments that include forests, trees, buildings, etc.), our algorithm is frequently able to recover fairly accurate depth maps.
We further propose a model that incorporates both monocular cues and stereo (triangulation) cues,
to obtain significantly more accurate depth estimates than is possible using
either monocular or stereo cues alone.
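As a rough illustration of how monocular and stereo cues can be fused in an MRF, the sketch below solves a toy 1-D chain model. It is not the model from the publications: the Gaussian (quadratic) energy, the chain structure, the function name `map_depths`, and the weights `w_mono`, `w_smooth`, and `w_stereo` are all assumptions made for this example. Each point has a monocular depth guess, neighboring depths are encouraged to agree, and a stereo measurement is used only where one exists.

```python
def map_depths(mono, stereo=None, w_mono=1.0, w_smooth=4.0, w_stereo=10.0):
    """MAP depths of a toy 1-D chain Gaussian MRF (illustrative only).

    Minimizes  sum_i w_mono   * (d_i - mono_i)^2
             + sum_i w_smooth * (d_i - d_{i+1})^2
             + sum_{i with stereo} w_stereo * (d_i - stereo_i)^2.
    `stereo` holds None wherever no stereo match is available.
    """
    n = len(mono)
    if stereo is None:
        stereo = [None] * n

    # Setting the gradient to zero gives a tridiagonal system A d = rhs.
    diag = [w_mono] * n
    rhs = [w_mono * m for m in mono]
    off = [-w_smooth] * (n - 1)  # A[i][i+1] == A[i+1][i]
    for i in range(n - 1):
        diag[i] += w_smooth
        diag[i + 1] += w_smooth

    # Stereo cues add an extra unary term where a measurement exists.
    for i, s in enumerate(stereo):
        if s is not None:
            diag[i] += w_stereo
            rhs[i] += w_stereo * s

    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        f = off[i - 1] / diag[i - 1]
        diag[i] -= f * off[i - 1]
        rhs[i] -= f * rhs[i - 1]
    d = [0.0] * n
    d[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        d[i] = (rhs[i] - off[i] * d[i + 1]) / diag[i]
    return d


# Monocular guess is 1.0 everywhere; stereo reports 3.0 at the first point only.
depths = map_depths([1.0, 1.0, 1.0], stereo=[3.0, None, None])
```

In this toy setting the stereo measurement pulls the first depth toward 3.0, and the smoothness term propagates part of that correction to neighbors that have no stereo match.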
Data, Code, More results, Convert your image to 3-d model
Publications
Learning 3-D Scene Structure from a Single Still Image,
3-D Depth Reconstruction from a Single Still Image,
Learning Depth from Single Monocular Images,
Depth Estimation using Monocular and Stereo Cues,
High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning,
Jeff Michels, Ashutosh Saxena, Andrew Y. Ng.
In ICML, 2005.
Media Coverage
Why a robot is better with one eye than two, New Scientist, Dec 17, 2005.
What the robots see, Mechanical Engineering, Apr 2006.
Going Deep, Scientific Computing, Mar 2006.
"Robot Vision Algorithm", as reported by media: Physorg, Science Daily, Stanford Report, Dec 7, 2005.
One eye on the world, Stanford Scientific, vol. 4, Issue 3, 2006.
Note: IJCV had the highest impact factor (6.085 in 2006) among all computer science and artificial intelligence journals.