An end-to-end deep learning system for reconstructing complete 3D scenes (geometry and semantics) from posed 2D images.
Atlas is an end-to-end deep learning system for 3D scene reconstruction from posed 2D images. Given a set of images with known camera poses, it outputs a complete 3D scene representation comprising both geometry (as a mesh or truncated signed distance function, TSDF) and a per-vertex semantic segmentation. It addresses the problem of producing detailed, semantically aware 3D models from image collections without relying on a traditional multi-view stereo pipeline.
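The core idea behind this style of reconstruction is to back-project 2D image features along camera rays into a shared metric voxel volume, accumulate them across frames, and let a 3D CNN regress the TSDF and semantics from the fused volume. The sketch below illustrates only the back-projection step with a pinhole camera model; the function name, nearest-neighbor sampling, and all parameters are illustrative assumptions, not the Atlas repo's actual API.

```python
import numpy as np

def backproject_features(feats, K, pose, volume_shape, voxel_size):
    """Scatter one image's 2D features into a 3D voxel volume.

    feats: (C, H, W) feature map from a 2D CNN.
    K: 3x3 camera intrinsics; pose: 4x4 camera-to-world matrix.
    Illustrative sketch only -- not the repo's real interface.
    """
    C, H, W = feats.shape
    volume = np.zeros((C,) + volume_shape, dtype=np.float32)
    weight = np.zeros(volume_shape, dtype=np.float32)

    # World coordinates of every voxel centre (metric, axis-aligned grid).
    zs, ys, xs = np.meshgrid(*[np.arange(s) for s in volume_shape],
                             indexing="ij")
    pts_w = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3) * voxel_size

    # Transform voxel centres into the camera frame and project.
    world_to_cam = np.linalg.inv(pose)
    pts_c = pts_w @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    in_front = pts_c[:, 2] > 0
    uvw = pts_c @ K.T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)

    # Keep only voxels that land inside the image.
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(valid)
    volume.reshape(C, -1)[:, idx] = feats[:, v[idx], u[idx]]
    weight.reshape(-1)[idx] = 1.0
    return volume, weight
```

In the full system, volumes from many frames are averaged (using the weights) before a 3D CNN refines the result into a TSDF plus semantic labels; the learned 3D prior is what fills in regions a single view cannot constrain.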
Computer vision researchers and engineers working on 3D reconstruction, SLAM, AR/VR, and robotics who need to generate semantically labeled 3D environments from images.
Atlas provides a fully learned pipeline that can produce more complete and semantically meaningful reconstructions than traditional methods such as COLMAP, especially on textureless surfaces and in occluded regions, by leveraging deep learning priors trained on large datasets.
Atlas: End-to-End 3D Scene Reconstruction from Posed Images
Directly generates 3D meshes with per-vertex semantic labels from posed images, as shown in the sample results and benchmark evaluations on ScanNet.
Includes scripts for quantitative evaluation against ground truth, enabling reliable performance assessment on datasets like ScanNet, as detailed in the evaluation section.
Provides a Colab notebook for easy testing without local setup, along with pretrained models for immediate use, lowering the barrier to entry.
Supports parallel processing on multiple GPUs for datasets like ScanNet, as indicated in the data prep instructions, allowing efficient handling of large-scale data.
Requires specific versions of PyTorch, NVIDIA apex, and other libraries; the README warns about compatibility issues and the need to install exact versions.
Preparing datasets such as ScanNet can take hours even on powerful GPU setups; the instructions recommend splitting the preparation across multiple GPUs for efficiency.
Pretrained models are trained with Z-up metric coordinates and do not generalize to other orientations, restricting use cases without retraining or pose adjustment.
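Because the pretrained models assume Z-up metric world coordinates, poses captured in a Y-up convention (common in graphics pipelines) need to be re-expressed before inference. One standard way to do this is to left-multiply each camera-to-world pose by a fixed 90-degree rotation about the x axis; the helper below is an illustrative sketch, not a function from the Atlas repo.

```python
import numpy as np

# 90-degree rotation about x: maps the Y-up "up" vector (0, 1, 0)
# to the Z-up "up" vector (0, 0, 1).
R_YUP_TO_ZUP = np.array([
    [1.0, 0.0,  0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0,  0.0],
])

def yup_to_zup(pose_c2w):
    """Re-express a 4x4 camera-to-world pose from a Y-up world frame
    in a Z-up world frame. Illustrative helper (hypothetical name)."""
    T = np.eye(4)
    T[:3, :3] = R_YUP_TO_ZUP
    return T @ pose_c2w
```

For example, a camera sitting 2 m above the floor, i.e. at translation (0, 2, 0) in a Y-up frame, ends up at (0, 0, 2) in the Z-up frame the pretrained models expect. Scale matters too: the models are trained in metric units, so poses in arbitrary or normalized units would also need rescaling.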