A 3D vision library for monocular and stereo 3D human detection, social distancing, and body orientation estimation from 2D keypoints.
Monoloco is a Python library for 3D human perception from 2D keypoints, supporting both monocular and stereo camera setups. It estimates 3D positions, body orientation, and social interactions, enabling applications like social distancing monitoring and activity analysis without specialized hardware.
Monoloco is aimed at computer vision researchers, developers, and practitioners who need 3D understanding from standard cameras for human-centric applications such as surveillance, robotics, and social behavior analysis.
It provides an open-source, research-backed solution that combines monocular and stereo approaches for robust 3D localization, includes uncertainty estimation, and offers ready-to-use features like social distancing visualization, making advanced 3D perception accessible.
A 3D vision library from 2D keypoints: monocular and stereo 3D detection for humans, social distancing, and body orientation.
Based on multiple peer-reviewed papers (ICRA, T-ITS, ICCV) with quantitative results on KITTI, showing competitive performance in 3D localization and social distancing analysis.
Supports both monocular and stereo inputs, selected with the --mode argument, so the same tool fits different hardware setups in applications like robotics or surveillance.
Includes built-in social distancing visualization and activity recognition (e.g., hand-raising), ready for deployment without additional coding, as shown in the webcam and prediction examples.
Provides epistemic uncertainty estimates for monocular predictions using dropout (--n_dropout 50), enhancing reliability in safety-critical applications.
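The dropout-based uncertainty mentioned above can be illustrated with a minimal sketch. This is not Monoloco's code: the toy linear model, the function names (predict_distance, mc_dropout), and the drop_prob value are all hypothetical, chosen only to show the general idea of keeping dropout active at inference, repeating the forward pass n_dropout times, and reading the spread of the outputs as an uncertainty estimate.

```python
import random
import statistics


def predict_distance(features, weights, drop_prob, rng):
    """Toy linear regressor with dropout left active at inference (hypothetical model)."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        # Keep each input with probability `keep` and rescale it (inverted dropout),
        # so the expected output matches the deterministic prediction.
        if rng.random() < keep:
            total += (x / keep) * w
    return total


def mc_dropout(features, weights, n_dropout=50, drop_prob=0.2, seed=0):
    """Return (mean, spread) over n_dropout stochastic forward passes.

    n_dropout=0 is treated here as a single deterministic pass with no
    uncertainty estimate; that convention is an assumption of this sketch.
    """
    if n_dropout == 0:
        return sum(x * w for x, w in zip(features, weights)), 0.0
    rng = random.Random(seed)
    samples = [
        predict_distance(features, weights, drop_prob, rng)
        for _ in range(n_dropout)
    ]
    # The standard deviation across passes serves as the uncertainty estimate.
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    mean, spread = mc_dropout([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], n_dropout=50)
    print(f"prediction: {mean:.2f}, uncertainty: {spread:.2f}")
```

A larger spread suggests the model is less sure about that prediction, which is why such estimates are useful when a downstream system has to decide whether to trust a distance.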
Tightly coupled with OpenPifPaf for 2D keypoint detection, which adds a dependency and limits compatibility with other pose estimation libraries.
Training requires downloading and preprocessing datasets like KITTI with multiple steps, including running OpenPifPaf on all images and handling annotations, which is time-consuming.
Real-time performance likely depends on a GPU: the README states that 'GPU is not required, yet highly recommended for real-time performances', so CPU-only deployments may be slow.
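To make the social distancing use case concrete, here is a deliberately simplified sketch of what can be done once 3D positions are available. It is not Monoloco's method, and the library's own analysis may also take body orientation into account. The function name too_close_pairs, the 2-metre threshold, and the coordinate convention (x lateral, y vertical, z depth, in metres) are assumptions made for illustration.

```python
import math
from itertools import combinations


def too_close_pairs(positions, threshold_m=2.0):
    """Return index pairs of people closer than threshold_m on the ground plane.

    positions: list of (x, y, z) tuples in metres, assuming camera coordinates
    where x is lateral, y is vertical and z is depth (hypothetical convention).
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        # Ignore the vertical axis: only lateral and depth offsets matter here.
        dist = math.hypot(a[0] - b[0], a[2] - b[2])
        if dist < threshold_m:
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    people = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.5), (4.0, 0.0, 10.0)]
    print(too_close_pairs(people))
```

In this example only the first two people fall within the threshold, since they are roughly 1.1 m apart while the third person is more than 5 m from either of them.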