A fully convolutional neural network for real-time instance segmentation, achieving high speed and accuracy on COCO.
YOLACT is a deep learning model designed for real-time instance segmentation, which involves detecting and precisely outlining each distinct object in an image. It solves the problem of performing this computationally intensive task quickly enough for live applications, such as autonomous driving or interactive systems, by using a single, fully convolutional network architecture.
Computer vision researchers and engineers who need to implement fast and accurate instance segmentation in real-time applications, including video processing, robotics, and augmented reality.
Developers choose YOLACT for its unique combination of high frame rates and competitive accuracy, enabled by its simple yet effective protonet-based mask generation, avoiding the complexity and latency of two-stage models like Mask R-CNN.
A simple, fully convolutional model for real-time instance segmentation.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Processes images at over 30 fps on a Titan Xp GPU, enabling live video analysis as demonstrated in the ICCV demo and benchmark tables.
Uses a fully convolutional, single-stage network that avoids complex multi-stage pipelines, reducing latency and simplifying the model design.
Supports multiple backbones like ResNet50, ResNet101, and Darknet53, allowing users to trade off between speed and accuracy based on their needs.
YOLACT++ achieves up to 34.6 mAP on COCO test-dev, offering a balanced performance for real-time applications compared to slower models.
Requires compiling DCNv2 with CUDA, which adds installation overhead and can be error-prone, especially on non-standard systems.
Optimized for the COCO dataset; custom datasets need manual conversion to COCO format, which is cumbersome and may not support all annotation types.
While fast, its mAP scores are lower than state-of-the-art models like Mask R-CNN, making it less suitable for precision-focused tasks.
Benchmarks rely on high-end GPUs like Titan Xp; performance degrades on lower-end hardware, limiting accessibility for resource-constrained environments.