A PyTorch implementation of self-supervised monocular depth estimation using 3D packing for high-resolution, real-time depth prediction.
PackNet-SfM is a self-supervised monocular depth estimation framework that predicts depth maps from single images or video sequences without requiring labeled depth data. It enables accurate 3D scene understanding for applications such as autonomous driving by learning from video alone, using a novel 3D packing architecture that preserves fine detail while supporting real-time inference.
Computer vision researchers and engineers working on autonomous driving, robotics, and 3D scene understanding who need accurate, efficient depth estimation without costly ground-truth data.
Developers choose PackNet-SfM for its state-of-the-art self-supervised performance, ability to generalize across camera models (including non-pinhole), and real-time inference capabilities, all while being open-source and backed by extensive research from Toyota Research Institute.
TRI-ML Monocular Depth Estimation Repository
Uses symmetric packing and unpacking blocks with 3D convolutions to compress detail-preserving representations, enabling high-resolution depth prediction as shown in the CVPR 2020 paper.
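The packing operation can be sketched in PyTorch. The block below is an illustrative simplification, not the repository's implementation (channel counts, the downscale factor `r`, the 3D filter count `d`, and the exact layer arrangement are assumptions): space-to-depth folds spatial detail into channels instead of discarding it via pooling, and a 3D convolution then compresses the folded volume.

```python
import torch
import torch.nn as nn

class PackingBlock(nn.Module):
    """Illustrative sketch of a 3D packing block (not the repo's exact code)."""

    def __init__(self, in_channels, out_channels, r=2, d=8):
        super().__init__()
        # Space-to-depth: H x W -> H/r x W/r, channels grow by r^2 (detail is kept)
        self.unshuffle = nn.PixelUnshuffle(r)
        # 3D conv compresses the folded feature volume
        self.conv3d = nn.Conv3d(1, d, kernel_size=3, padding=1)
        # 2D conv projects back to the desired channel count
        self.conv2d = nn.Conv2d(in_channels * r * r * d, out_channels,
                                kernel_size=3, padding=1)

    def forward(self, x):
        x = self.unshuffle(x)            # (B, C*r^2, H/r, W/r)
        x = self.conv3d(x.unsqueeze(1))  # (B, d, C*r^2, H/r, W/r)
        b, d, c, h, w = x.shape
        x = x.reshape(b, d * c, h, w)    # fold 3D features back into 2D channels
        return self.conv2d(x)

x = torch.randn(1, 16, 64, 64)
block = PackingBlock(16, 32)
out = block(x)
print(out.shape)  # torch.Size([1, 32, 32, 32])
```

Unpacking in the decoder mirrors this with depth-to-space (pixel shuffle), which is what lets the network recover high-resolution depth.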
Trained purely self-supervised on monocular videos, eliminating the need for expensive depth labeling, a core advantage highlighted in the framework's description.
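The self-supervision signal comes from view synthesis: a source frame is warped into the target view using the predicted depth and camera pose, and the reconstruction is scored photometrically. Below is a minimal sketch of the standard SSIM + L1 photometric term used in this family of methods (the warping step is omitted; `alpha` and the 3x3 window follow common practice and are not necessarily this repository's settings):

```python
import torch
import torch.nn.functional as F

def photometric_loss(pred, target, alpha=0.85):
    """Sketch of an SSIM + L1 photometric loss for self-supervised depth.

    pred, target: (B, 3, H, W) synthesized and observed images.
    Returns a per-pixel loss map of shape (B, 1, H, W).
    """
    # L1 term, averaged over color channels
    l1 = (pred - target).abs().mean(1, keepdim=True)

    # Simplified SSIM using 3x3 average pooling as the local window
    mu_x = F.avg_pool2d(pred, 3, 1, 1)
    mu_y = F.avg_pool2d(target, 3, 1, 1)
    sigma_x = F.avg_pool2d(pred ** 2, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    ssim_loss = torch.clamp((1 - ssim) / 2, 0, 1).mean(1, keepdim=True)

    return alpha * ssim_loss + (1 - alpha) * l1
```

Since the loss only compares images, no depth labels are ever needed; depth and pose emerge as the quantities that make the warped frame match the observed one.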
Optimized for real-time performance using TensorRT, making it suitable for autonomous driving applications where speed is critical, as noted in the README.
Extends to non-pinhole cameras like fisheye through Neural Ray Surfaces, allowing depth estimation beyond traditional models, based on the 3DV 2020 implementation.
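The Neural Ray Surfaces idea replaces the pinhole unprojection (K^-1 applied to homogeneous pixel coordinates) with a learned per-pixel ray direction, so fisheye and other non-pinhole cameras are handled uniformly. A schematic sketch, with the function name and tensor shapes chosen for illustration rather than taken from the repository:

```python
import torch

def unproject_with_ray_surface(depth, rays):
    """Sketch of ray-surface unprojection (illustrative, not the repo's code).

    depth: (B, 1, H, W) predicted depth
    rays:  (B, 3, H, W) learned per-pixel ray directions
    Returns (B, 3, H, W) 3D points in the camera frame.
    """
    # Each pixel's 3D point lies along its own learned ray,
    # with no fixed camera model assumed.
    return depth * rays

depth = torch.rand(1, 1, 8, 8)
rays = torch.rand(1, 3, 8, 8)
points = unproject_with_ray_surface(depth, rays)
print(points.shape)  # torch.Size([1, 3, 8, 8])
```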
Requires Docker and is tested only on Ubuntu 18.04, with additional configuration needed for AWS and Weights & Biases (WANDB), making initial setup cumbersome and error-prone.
Needs at least 6GB of GPU memory, and more for larger models or higher resolutions, which can be prohibitive for resource-constrained environments.
The README notes that future development has moved to a new repository (vidar), so this version receives limited updates and support, potentially leaving users with outdated tools.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Image augmentation for machine learning experiments.
Node-based Visual Programming Toolbox