A CVPR 2018 algorithm for efficient multi-person pose estimation and tracking in videos, ranking first in the ICCV 2017 PoseTrack challenge.
DetectAndTrack is a research implementation from Facebook AI Research (FAIR) that provides an efficient algorithm for detecting and tracking human poses in videos. It solves the problem of multi-person pose estimation across video frames by combining detection and tracking into a single framework, which improves both accuracy and computational efficiency compared to separate systems.
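To make the "detect, then link across frames" idea concrete, here is a minimal sketch of how per-frame person detections might be joined into tracks. It assumes greedy matching on bounding-box IoU, which is a simplification chosen for illustration. The function names (`iou`, `link_frames`) and the `min_iou` threshold are hypothetical and are not taken from the DetectAndTrack code base.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_frames(prev_tracks, boxes, next_id, min_iou=0.3):
    """Greedily assign track IDs to the boxes detected in the current frame.

    prev_tracks: dict mapping track ID -> box in the previous frame.
    boxes: list of boxes detected in the current frame.
    next_id: first unused track ID.
    min_iou: hypothetical threshold below which no match is made.
    Returns (dict mapping track ID -> box for this frame, updated next_id).
    """
    # Score every (previous track, current box) pair, best overlap first.
    pairs = sorted(
        ((iou(pb, b), tid, j)
         for tid, pb in prev_tracks.items()
         for j, b in enumerate(boxes)),
        reverse=True,
    )
    current, used_tracks, used_boxes = {}, set(), set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same person
        if tid in used_tracks or j in used_boxes:
            continue
        current[tid] = boxes[j]
        used_tracks.add(tid)
        used_boxes.add(j)
    # Unmatched detections start new tracks.
    for j, b in enumerate(boxes):
        if j not in used_boxes:
            current[next_id] = b
            next_id += 1
    return current, next_id


# Two people in frame 0, then the same two (shifted) plus a newcomer in frame 1.
tracks0, next_id = link_frames({}, [(0, 0, 10, 10), (20, 0, 30, 10)], 0)
tracks1, next_id = link_frames(
    tracks0, [(21, 0, 31, 10), (1, 0, 11, 10), (50, 50, 60, 60)], next_id
)
```

In this toy run the two shifted boxes keep IDs 0 and 1 and the newcomer receives ID 2. A real tracker would typically also use keypoint or appearance similarity and an optimal assignment method.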
It is aimed at computer vision researchers and engineers working on video analysis, human pose estimation, activity recognition, or multi-object tracking, particularly those interested in reproducing or building upon state-of-the-art methods from academic literature.
Developers choose DetectAndTrack for its proven performance (it ranked first in the PoseTrack challenge), its efficient unified approach to detection and tracking, and the availability of pre-trained models and detailed configurations for both 2D and 3D models, which facilitate research and application development.
The implementation of an algorithm presented in the CVPR18 paper: "Detect-and-Track: Efficient Pose Estimation in Videos"
Ranked first in the ICCV 2017 PoseTrack challenge, and reports 55.2% MOTA and 60.6% mAP for video pose tracking, as confirmed in the README.
Combines keypoint detection and lightweight tracking in one pipeline, reducing computational overhead compared to separate systems, which is central to the paper's contribution.
Offers both 2D and 3D Mask R-CNN models with pre-trained weights, allowing experimentation with temporal context, as detailed in the config examples for different GPU setups.
Includes official PoseTrack metrics (MOTA, mAP) and supports upper-bound analysis to debug performance limits, with scripts for automated evaluation and visualization.
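For readers unfamiliar with MOTA, the sketch below computes it from error counts using the standard CLEAR MOT definition: one minus the ratio of total errors (misses, false positives, identity switches) to the number of ground-truth objects. The function name `mota` and the example counts are made up for illustration. The repository's official evaluation scripts should be used for any reported numbers.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multi-Object Tracking Accuracy from error counts summed over all frames.

    MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the total number of
    ground-truth objects. The value can be negative when errors exceed GT.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / float(num_ground_truth)


# Illustrative counts only (not PoseTrack results):
# 30 misses + 15 false positives + 5 ID switches against 200 ground-truth objects.
example_score = mota(30, 15, 5, 200)  # 1 - 50/200 = 0.75
```

Identity switches count as errors alongside misses and false positives, so a tracker can score poorly on MOTA even when its per-frame detections are accurate.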
Built on Caffe2 and Python 2.7, both of which are no longer maintained (Python 2.7 reached end of life in January 2020, and Caffe2 was merged into PyTorch), complicating integration with modern deep learning ecosystems and toolchains.
Requires compiling custom ops, setting up specific Anaconda environments, and handling GPU dependencies, as outlined in the lengthy installation section with potential issues like NCCL bugs.
Training demands at least 4 GPUs (e.g., P100s or 1080Tis) with no CPU support, limiting accessibility for researchers or teams with fewer resources.
Primarily designed for the PoseTrack dataset; adapting to other datasets requires significant modification of data handling code and JSON formats, as noted in the dataset setup instructions.