A deep learning approach that unifies global place recognition and local 6DoF pose refinement for robust relocalization in large-scale 3D point clouds.
DH3D is a deep learning framework for 6-degree-of-freedom (6DoF) relocalization in large-scale 3D point clouds. It determines a sensor's precise position and orientation within a previously mapped environment by unifying global place recognition and local pose refinement. The system learns hierarchical 3D descriptors directly from raw point data, enabling robust performance on point clouds from different sources, including LiDAR and visual SLAM systems.
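The two stages described above can be illustrated with a minimal NumPy sketch. It is not DH3D's code: the function names `retrieve_nearest_submap` and `estimate_rigid_transform` are hypothetical, the descriptors are assumed to be precomputed vectors, and the keypoint correspondences are assumed to be already matched and outlier-free. The pose step uses the standard Kabsch least-squares alignment; a practical pipeline would typically add robust outlier rejection such as RANSAC.

```python
import numpy as np


def retrieve_nearest_submap(query_desc, map_descs):
    """Global stage: index of the map descriptor closest to the query (L2 distance)."""
    dists = np.linalg.norm(map_descs - query_desc, axis=1)
    return int(np.argmin(dists))


def estimate_rigid_transform(src, dst):
    """Local stage: least-squares rotation R and translation t with dst ~ src @ R.T + t.

    src and dst are (N, 3) arrays of matched 3D keypoints (Kabsch algorithm).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Toy usage with synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
map_descs = rng.normal(size=(5, 8))            # 5 mapped submaps, 8-D descriptors
query_desc = map_descs[3] + 0.01 * rng.normal(size=8)
best = retrieve_nearest_submap(query_desc, map_descs)

angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, 2.0, 3.0])
src = rng.normal(size=(20, 3))
dst = src @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(src, dst)
```

The rotation and translation together give the six degrees of freedom of the pose: three for orientation and three for position.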
DH3D is aimed at researchers and engineers in robotics, autonomous vehicles, augmented reality, and 3D computer vision who need robust relocalization in large-scale environments, including those developing SLAM systems, navigation algorithms, and 3D scene understanding applications.
Developers choose DH3D for three reasons: its unified architecture handles both global retrieval and local refinement in a single forward pass, it generalizes across different point cloud sources without fine-tuning, and it achieves results competitive with state-of-the-art approaches while being more efficient than running separate retrieval and refinement systems.
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DOF Relocalization
Combines global place recognition and local 6DoF pose refinement in a single forward pass, reducing system complexity and improving efficiency, as stated in the README's abstract.
Performs well on point clouds from different sources like LiDAR and visual SLAM without fine-tuning, demonstrated on Oxford RobotCar and ETH datasets in the experiments.
Predicts keypoint discriminativeness without manual annotation, making it adaptable to new data, as highlighted in the key features.
Uses an effective attention mechanism to aggregate local descriptors, leading to robust global representations for place recognition, as described in the system overview.
Requires building TensorFlow from source and compiling multiple custom operators like Flex-Convolution and PointNet++, which is time-consuming and error-prone, as detailed in the build instructions.
Relies on TensorFlow 1.x and older Ubuntu versions (16.04/18.04), making integration with modern deep learning ecosystems challenging and limiting long-term support.
The documentation provides only basic instructions, with no detailed troubleshooting, API documentation, or examples for custom datasets, which can hinder implementation and debugging.
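The attention-based aggregation noted among the strengths above can be pictured with a small sketch. This listing does not describe DH3D's actual aggregation layer, so the code below swaps in a plain softmax-weighted sum as a stand-in. The function name `aggregate_with_attention` and the idea of one scalar score per local descriptor are assumptions made for illustration.

```python
import numpy as np


def aggregate_with_attention(local_descs, scores):
    """Combine (N, D) local descriptors into one unit-length global descriptor.

    scores is a length-N array of per-point attention logits (assumed given);
    higher scores give a point more influence on the result.
    """
    scores = np.asarray(scores, dtype=float)
    # Numerically stable softmax over the per-point scores.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Weighted sum of local descriptors, then L2 normalisation.
    pooled = (weights[:, None] * local_descs).sum(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)


# Toy usage: two orthogonal 2-D descriptors.
descs = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
uniform = aggregate_with_attention(descs, [0.0, 0.0])     # equal attention
focused = aggregate_with_attention(descs, [50.0, 0.0])    # first point dominates
```

With equal scores the result is the normalised mean of the descriptors, while a strongly dominant score makes the global descriptor follow that single point, which is the intuition behind down-weighting uninformative regions.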