Open-Awesome

packnet-sfm

MIT · Python · v0.1.2

A PyTorch implementation of self-supervised monocular depth estimation using 3D packing for high-resolution, real-time depth prediction.

1.3k stars · 245 forks · 0 contributors

What is packnet-sfm?

PackNet-SfM is a self-supervised monocular depth estimation framework: it trains on unlabeled video sequences and, once trained, predicts a depth map from a single image. It targets 3D scene understanding for applications such as autonomous driving, using a 3D packing architecture to preserve fine detail while remaining fast enough for real-time use.

Target Audience

Computer vision researchers and engineers working on autonomous driving, robotics, and 3D scene understanding who need accurate, efficient depth estimation without costly ground-truth data.

Value Proposition

Developers choose PackNet-SfM for its self-supervised accuracy (reported as state of the art in the CVPR 2020 paper), its ability to generalize across camera models, including non-pinhole ones, and its real-time inference, all in an open-source package backed by research from Toyota Research Institute.

Overview

TRI-ML Monocular Depth Estimation Repository

Use Cases

Best For

  • Self-supervised depth estimation from monocular video for autonomous vehicles
  • Real-time depth prediction in robotics and drone navigation
  • 3D scene reconstruction without LiDAR or ground-truth depth labels
  • Depth estimation on non-pinhole cameras like fisheye or catadioptric systems
  • Academic research in computer vision and self-supervised learning
  • Benchmarking depth estimation models on datasets like DDAD and KITTI

Not Ideal For

  • Projects that only have unordered still images for training (inference runs on single images, but self-supervised training needs video sequences)
  • Applications deployed on hardware with less than 6 GB of GPU memory
  • Teams wanting a production-ready, actively maintained codebase with extensive community support

Pros & Cons

Pros

Innovative 3D Packing

Uses symmetric packing and unpacking blocks with 3D convolutions to compress detail-preserving representations, enabling high-resolution depth prediction as shown in the CVPR 2020 paper.
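
The detail-preserving part of packing can be illustrated with the space-to-depth fold that such blocks build on. The NumPy sketch below is an illustration only, not code from the repository: the function names are hypothetical, and the learned 3D and 2D convolutions that follow the fold in the actual architecture are omitted. It shows that folding 2x2 patches into channels halves the resolution without discarding any values, whereas max pooling keeps only one value in four.

```python
import numpy as np


def space_to_depth(x, r=2):
    """Fold each r x r spatial patch into channels.

    (C, H, W) -> (C * r * r, H // r, W // r), with no values discarded.
    """
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, r, r, H/r, W/r)
    return x.reshape(c * r * r, h // r, w // r)


def depth_to_space(y, r=2):
    """Exact inverse of space_to_depth."""
    cr2, h, w = y.shape
    if cr2 % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c = cr2 // (r * r)
    y = y.reshape(c, r, r, h, w)
    y = y.transpose(0, 3, 1, 4, 2)  # (C, H, r, W, r)
    return y.reshape(c, h * r, w * r)


def max_pool(x, r=2):
    """Plain r x r max pooling, shown for contrast: it is lossy."""
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).max(axis=(2, 4))


# Hypothetical feature map: 2 channels, 4 x 6 pixels.
features = np.arange(2 * 4 * 6, dtype=np.float32).reshape(2, 4, 6)
packed = space_to_depth(features)    # shape (8, 2, 3), same number of values
restored = depth_to_space(packed)    # identical to `features`
pooled = max_pool(features)          # shape (2, 2, 3), three quarters dropped
```

Because the fold is invertible, a decoder can in principle recover full-resolution detail; the learned convolutions then decide how to compress the folded channels.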

No Ground-Truth Data

Trained self-supervised only on monocular videos, eliminating the need for expensive depth labeling, which is a core advantage highlighted in the framework's description.
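
Self-supervision of this kind typically replaces depth labels with a photometric objective: a neighbouring video frame is warped into the target view using the predicted depth and camera motion, and the network is penalised when the synthesized image differs from the real one. The NumPy sketch below illustrates only that comparison step, under stated assumptions: the SSIM-plus-L1 mix, the weight alpha=0.85, the 3x3 window, and all function names are illustrative choices rather than the repository's code, and the warping itself is left out.

```python
import numpy as np


def box_filter3(x):
    """3x3 mean filter over a (H, W) array, using edge padding."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def ssim_map(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel structural similarity of two grayscale images in [0, 1]."""
    mu_a, mu_b = box_filter3(a), box_filter3(b)
    var_a = box_filter3(a * a) - mu_a * mu_a
    var_b = box_filter3(b * b) - mu_b * mu_b
    cov = box_filter3(a * b) - mu_a * mu_b
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return num / den


def photometric_loss(target, synthesized, alpha=0.85):
    """Weighted mix of structural dissimilarity and L1 error (assumed form)."""
    target = np.asarray(target, dtype=np.float64)
    synthesized = np.asarray(synthesized, dtype=np.float64)
    l1 = np.abs(target - synthesized)
    dssim = np.clip((1.0 - ssim_map(target, synthesized)) / 2.0, 0.0, 1.0)
    return float(np.mean(alpha * dssim + (1.0 - alpha) * l1))


# Hypothetical data: a random "target frame" and a misaligned synthesis of it.
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
loss_perfect = photometric_loss(frame, frame)
loss_shifted = photometric_loss(frame, np.roll(frame, 1, axis=1))
```

A perfect synthesis gives a loss of essentially zero, while a misaligned one is penalised, which is the signal that lets depth be learned from video without labels.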

Real-Time Inference

Optimized for real-time performance using TensorRT, making it suitable for autonomous driving applications where speed is critical, as noted in the README.

Camera Model Flexibility

Extends to non-pinhole cameras like fisheye through Neural Ray Surfaces, allowing depth estimation beyond traditional models, based on the 3DV 2020 implementation.
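
The difference between a pinhole model and a per-pixel ray surface can be sketched as follows. With a pinhole camera, viewing rays follow from an intrinsics matrix; with a ray-surface approach, each pixel simply carries its own unit ray, which a network can predict for lenses that no simple closed-form model fits. The NumPy code is a sketch under assumptions: the function names, the example intrinsics, and the convention that depth is measured along the ray are illustrative, not the repository's API.

```python
import numpy as np


def pinhole_rays(K, height, width):
    """Unit viewing ray per pixel for a pinhole camera with intrinsics K."""
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T  # back-project homogeneous pixels
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def lift_to_3d(depth, rays):
    """Scale each pixel's unit ray by its depth to get a 3D point.

    `rays` may come from pinhole_rays or from any other (H, W, 3) table of
    unit vectors, e.g. one predicted by a network for a fisheye lens.
    """
    return depth[..., None] * rays


# Hypothetical 3 x 5 image with the principal point at pixel (u=2, v=1).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
rays = pinhole_rays(K, height=3, width=5)
depth = np.full((3, 5), 10.0)
points = lift_to_3d(depth, rays)
```

Since lift_to_3d never looks at K, swapping in a different ray table is all it takes to change camera model, which is the flexibility the entry above describes.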

Cons

Complex Docker Setup

Requires Docker and is only tested on Ubuntu 18.04, with additional configuration for AWS and WANDB, making initial setup cumbersome and error-prone.

High Hardware Demands

Needs at least 6GB of GPU memory, and more for larger models or higher resolutions, which can be prohibitive for resource-constrained environments.

Deprecated Codebase

The README states that future development has moved to a new repository (vidar), limiting updates and support for this version, potentially leaving users with outdated tools.

Quick Stats

Stars: 1,275
Forks: 245
Contributors: 0
Open Issues: 73
Last commit: 2 years ago
Created: 2019

Tags

#autonomous-driving #3d-vision #self-supervised-learning #depth-prediction #cvpr #computer-vision #pytorch

Built With

  • AWS
  • Docker
  • TensorRT
  • PyTorch

Included in

Robotic Tooling · 3.8k

Related Projects

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Stars: 34,425
Forks: 7,925
Last commit: 21 days ago

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Stars: 29,366
Forks: 3,562
Last commit: 4 months ago

imgaug

Image augmentation for machine learning experiments.

Stars: 14,734
Forks: 2,455
Last commit: 1 year ago

meshroom

Node-based Visual Programming Toolbox

Stars: 12,695
Forks: 1,207
Last commit: 18 hours ago