Real-time 3D semantic mapping system using a handheld RGB-D camera, built on ROS with ORB_SLAM2 and PSPNet.
Semantic SLAM is a real-time 3D mapping system that builds semantically annotated maps from a handheld RGB-D camera. It solves the problem of environment understanding for robotics by combining simultaneous localization and mapping (SLAM) with deep learning-based semantic segmentation to label objects (e.g., walls, furniture) in the generated 3D map.
Robotics researchers, computer vision engineers, and developers working on autonomous navigation, augmented reality, or environmental modeling who need real-time semantic 3D perception.
It offers an open-source, integrated pipeline for semantic SLAM using commodity hardware, with configurable output (semantic or RGB maps) and multiple fusion methods, built on robust ROS and proven libraries like ORB_SLAM2 and OctoMap.
Real-time semantic SLAM in ROS with a handheld RGB-D camera
Integrates ORB_SLAM2, depth data, and PSPNet segmentation to generate 3D semantic octomaps at interactive rates, as demonstrated in the run-time analysis with 1 Hz map updates.
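The geometric core of this pipeline is back-projecting each segmented depth pixel into a camera-frame 3D point before inserting it into the octomap. A minimal sketch of that step, assuming a standard pinhole camera model (the function name and intrinsics here are illustrative, not taken from the project's code):

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into camera-frame 3D points.

    Returns an (H, W, 3) array of [x, y, z] points; each point can then
    carry the semantic label predicted for its pixel.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

With the camera pose from ORB_SLAM2, these points are transformed into the world frame and fused voxel-wise into the octomap.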
Allows switching between semantic octomaps with object labels and standard RGB octomaps via parameter settings, offering flexibility for different perception tasks.
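In ROS, such a switch is typically exposed as a node parameter in a launch file. A hypothetical sketch of what that might look like (the package, node, and parameter names below are illustrative, not the project's actual identifiers):

```xml
<!-- Illustrative launch snippet: names are assumptions, not the project's API. -->
<launch>
  <node pkg="semantic_slam" type="semantic_mapping_node" name="semantic_mapping">
    <!-- "semantic" colors voxels by object label; "rgb" keeps camera colors. -->
    <param name="octomap/colored_by" value="semantic" />
    <param name="octomap/resolution" value="0.05" />
  </node>
</launch>
```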
Supports max-confidence or Bayesian fusion for semantic label integration, providing options for robustness in label assignment, as detailed in the project report.
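The two fusion strategies can be sketched per voxel as follows; this is a minimal illustration of the general techniques, not the project's actual implementation:

```python
import numpy as np

def bayesian_fusion(prior, likelihood):
    """Fuse a new per-class probability vector into a voxel's current
    distribution: element-wise product followed by renormalization."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

def max_confidence_fusion(current_label, current_conf, new_label, new_conf):
    """Keep whichever (label, confidence) pair has the higher confidence."""
    if new_conf > current_conf:
        return new_label, new_conf
    return current_label, current_conf
```

Bayesian fusion accumulates evidence over many observations and so tends to be more robust to single bad segmentations, while max-confidence is cheaper and simpler to store.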
Implemented as ROS nodes with launch files, compatible with common RGB-D cameras like Asus Xtion and ROS bags, simplifying deployment in robotic systems.
Relies on PyTorch 0.4.0 and the original ORB_SLAM2 codebase, both of which are dated, may cause compatibility issues on modern systems, and no longer receive updates.
Requires manual building of ORB_SLAM2, handling multiple dependencies, and configuring parameters in YAML files, making setup non-trivial and error-prone.
Semantic segmentation runs at only 2-3 Hz, as noted in the run-time section, which could bottleneck real-time applications in fast-moving scenarios.
Primarily tested with Asus Xtion camera and specific GPU hardware, with no clear documentation for adapting to other RGB-D sensors or edge devices.