Learning Depth from Single Monocular Images


We consider the task of 3-d depth estimation from a single still image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates local and global image features at multiple scales, and models both the depths at individual points and the relations between depths at different points in the image.
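To make the idea of "modelling depths and the relations between depths" concrete, here is a minimal sketch, assuming a single-scale Gaussian MRF on a regular grid of image patches. A data term ties each depth d_i to a value mu_i predicted from local features, and a pairwise term penalises depth differences between 4-connected neighbours. Because the energy is quadratic, the MAP depth map is the solution of one linear system. The function name `map_depth_gaussian_mrf`, the parameters `sigma_data` and `sigma_smooth`, and the dense solver are illustrative assumptions. The model described on this page is hierarchical and multiscale with learned parameters, and this is not the authors' code.

```python
import numpy as np


def map_depth_gaussian_mrf(mu, sigma_data=1.0, sigma_smooth=1.0):
    """MAP depth map under a single-scale Gaussian MRF on an H x W grid.

    Hypothetical simplified model. Energy (negative log-posterior, up to a constant):
        E(d) = sum_i (d_i - mu_i)^2 / (2 * sigma_data^2)
             + sum_{i~j} (d_i - d_j)^2 / (2 * sigma_smooth^2)
    where mu_i is a per-patch depth predicted from local features and
    i~j ranges over 4-connected neighbours.
    """
    mu = np.asarray(mu, dtype=float)
    h, w = mu.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)

    # Graph Laplacian of the 4-connected grid, chosen so that
    # sum_{i~j} (d_i - d_j)^2 == d @ lap @ d.
    lap = np.zeros((n, n))
    edges = [
        (idx[:, :-1].ravel(), idx[:, 1:].ravel()),  # horizontal neighbours
        (idx[:-1, :].ravel(), idx[1:, :].ravel()),  # vertical neighbours
    ]
    for a, b in edges:
        np.add.at(lap, (a, a), 1.0)
        np.add.at(lap, (b, b), 1.0)
        np.add.at(lap, (a, b), -1.0)
        np.add.at(lap, (b, a), -1.0)

    # Setting dE/dd = 0 gives the linear system
    # (I / sigma_data^2 + lap / sigma_smooth^2) d = mu / sigma_data^2
    system = np.eye(n) / sigma_data**2 + lap / sigma_smooth**2
    d = np.linalg.solve(system, mu.ravel() / sigma_data**2)
    return d.reshape(h, w)
```

A smaller `sigma_smooth` trusts the neighbourhood relations more and produces smoother depth maps. A larger one keeps the result closer to the local predictions. The dense solve is only practical for small grids. For full images, the same system is sparse and could be solved with a sparse solver such as `scipy.sparse.linalg.spsolve`.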

We show that, even on unstructured indoor and outdoor scenes (including forests, trees, buildings, etc.), our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
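One intuition for why the two kinds of cues complement each other: triangulated depth z = f·B/disparity becomes less certain roughly in proportion to z² for a fixed disparity error, whereas monocular cues remain available at any range and where stereo matching fails. The sketch below illustrates this with simple per-pixel inverse-variance fusion of two Gaussian estimates. This is a swapped-in technique for illustration, not the MRF-based combination used in the IJCAI paper. The function name `fuse_depths`, the default `disp_sigma` of 0.5 pixels, and the convention that NaN or non-positive disparity marks a failed match are all hypothetical.

```python
import numpy as np


def fuse_depths(mono, mono_sigma, disparity, focal_px, baseline_m, disp_sigma=0.5):
    """Hypothetical per-pixel inverse-variance fusion of monocular and stereo depth.

    mono, mono_sigma : monocular depth estimate and its standard deviation.
    disparity        : stereo disparity in pixels (same shape as mono); NaN or
                       <= 0 marks pixels where stereo matching failed.
    """
    mono = np.asarray(mono, dtype=float)
    disparity = np.asarray(disparity, dtype=float)
    fb = focal_px * baseline_m

    # Replace NaN with 0 first so the validity test never compares against NaN.
    safe_disp = np.where(np.isfinite(disparity), disparity, 0.0)
    valid = safe_disp > 0

    # Triangulated depth z = f * B / disparity where a match exists.
    stereo = np.zeros_like(mono)
    stereo[valid] = fb / safe_disp[valid]

    # First-order error propagation: sigma_z = z^2 * disp_sigma / (f * B).
    # Weight = 1 / sigma_z^2, and zero where stereo is unavailable.
    w_stereo = np.zeros_like(mono)
    w_stereo[valid] = (fb / (stereo[valid] ** 2 * disp_sigma)) ** 2

    w_mono = 1.0 / np.asarray(mono_sigma, dtype=float) ** 2
    return (w_mono * mono + w_stereo * stereo) / (w_mono + w_stereo)
```

With a 500-pixel focal length and a 0.1 m baseline, a nearby point (disparity 50 px, z = 1 m) is dominated by the stereo estimate, a distant one (disparity 1 px, z = 50 m) stays close to the monocular estimate, and a pixel with no stereo match falls back to the monocular value entirely.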

Data, Code, More results, Convert your image to 3-d model

Publications

Learning 3-D Scene Structure from a Single Still Image,
Ashutosh Saxena, Min Sun, Andrew Y. Ng. In ICCV Workshop on 3D Representation for Recognition (3dRR-07), 2007. (Best paper.)
(Full 3-d models from a single image.)
3-D Depth Reconstruction from a Single Still Image,
Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. International Journal of Computer Vision (IJCV), Aug 2007.
Learning Depth from Single Monocular Images,
Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 18, 2005.
(Infer a depthmap from a single still image.)
Depth Estimation using Monocular and Stereo Cues,
Ashutosh Saxena, Jamie Schulte, Andrew Y. Ng. In IJCAI, 2007.
(Monocular cues were used to improve the performance of stereo vision.)
High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning,
Jeff Michels, Ashutosh Saxena, Andrew Y. Ng. In ICML, 2005.
(A simplified version of the monocular-depth algorithm was used to drive an RC car in real time.)

Media Coverage
Why a robot is better with one eye than two, New Scientist, Dec 17, 2005.
What the robots see, Mechanical Engineering, Apr 2006.
Going Deep, Scientific Computing, Mar 2006.
"Robot Vision Algorithm", as reported by Physorg, Science Daily, and Stanford Report, Dec 7, 2005.

One eye on the world, Stanford Scientific, vol. 4, Issue 3, 2006.
Note: IJCV had the highest impact factor (6.085 in 2006) of all computer science and artificial intelligence journals.

More results on test images

Results
Other Results (Different Camera)
Images completely different from those in the training set (downloaded from the internet, taken with different cameras, and at lower resolution than the training images)
588 images downloaded from the internet