An unsupervised learning framework for depth and ego-motion estimation from monocular videos using TensorFlow.
SfMLearner is an unsupervised deep learning framework that jointly estimates depth maps from single images and camera ego-motion from video sequences. It tackles the structure-from-motion problem with self-supervision from view synthesis, eliminating the need for labeled 3D data. The system was originally presented in a CVPR 2017 paper and achieved state-of-the-art results on KITTI benchmarks at the time of publication.
The framework targets computer vision researchers and engineers working on 3D scene understanding, particularly those focused on autonomous driving, robotics, or AR/VR applications that require geometry estimation from monocular cameras.
Developers choose SfMLearner because it provides a complete, well-documented implementation of a seminal unsupervised learning approach for 3D vision, with pre-trained models and evaluation code for standard datasets. Its self-supervised paradigm reduces data annotation costs while maintaining competitive accuracy.
Provides the official code for the influential CVPR 2017 paper on unsupervised depth and ego-motion, ensuring authenticity and serving as a key reference in the field.
Includes detailed evaluation scripts for KITTI depth and pose benchmarks with pre-computed predictions, allowing direct validation against published results like the 0.183 Abs Rel on Eigen split.
Uses photometric consistency from video sequences as a supervisory signal, eliminating the need for expensive labeled 3D data and enabling scalable learning on large datasets.
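The idea behind this supervisory signal can be sketched as a simple reconstruction error: a nearby frame is inverse-warped into the target view using the predicted depth and ego-motion, and the per-pixel difference to the actual target frame drives learning. The sketch below is illustrative only; `photometric_loss` and its arguments are not the repository's API (the real pipeline performs the warping and loss in TensorFlow graph ops):

```python
import numpy as np

def photometric_loss(target, synthesized, mask=None):
    """Mean absolute photometric error between the target frame and a
    view synthesized from a nearby frame.

    `synthesized` stands in for the result of inverse-warping a source
    frame with the network's predicted depth and camera pose; `mask`
    optionally excludes pixels that fall outside the source image.
    """
    err = np.abs(target.astype(np.float64) - synthesized.astype(np.float64))
    if mask is not None:
        # Zero out invalid pixels and average only over the valid ones.
        err = err * mask[..., None]
        return err.sum() / (mask.sum() * target.shape[-1] + 1e-8)
    return err.mean()
```

Because the loss depends only on raw video frames, any monocular sequence with roughly static scenes and a moving camera can serve as training data.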
Offers downloadable models for depth and pose estimation on KITTI via Google Drive, facilitating quick testing and demonstration without training from scratch.
Built on TensorFlow 1.0 with CUDA 8.0, which are deprecated and can cause compatibility issues on modern systems, requiring legacy environment setup.
Requires running specific scripts to format KITTI and Cityscapes data with exact parameters (e.g., img_width=416, img_height=128), adding overhead for custom datasets.
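For custom footage, the core of that formatting step is resizing frames to the fixed 416x128 resolution the networks expect. A minimal NumPy sketch of such a resize is shown below; the `resize_nearest` helper is hypothetical (the repo's own preprocessing scripts handle this, along with sequence stacking and intrinsics scaling):

```python
import numpy as np

def resize_nearest(img, out_h=128, out_w=416):
    """Nearest-neighbor resize of an H x W (x C) frame to the
    128 x 416 resolution SfMLearner's data pipeline expects.

    A stand-in for the repo's preprocessing; real usage would also
    rescale the camera intrinsics by the same width/height factors.
    """
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row per output row
    cols = np.arange(out_w) * in_w // out_w  # source col per output col
    return img[rows[:, None], cols]
```

A KITTI-sized frame of 375x1242 pixels, for example, comes out as a 128x416 array ready for batching.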
Primarily optimized for KITTI and Cityscapes; adapting to other video sequences necessitates significant modifications to data loading and preprocessing code.