A large-scale dataset of object-centric video clips with 3D bounding box annotations and AR metadata for 3D object detection research.
Objectron is a large-scale dataset of short, object-centric video clips designed for 3D object detection and understanding research. It provides annotated videos with 3D bounding boxes and AR metadata like camera poses and point clouds, capturing everyday objects from multiple angles. The dataset addresses the need for high-quality 3D perception data to train and evaluate computer vision models.
Computer vision researchers, machine learning engineers, and developers working on 3D object detection, augmented reality, or robotic perception systems. It is particularly valuable for those needing multi-view object data with precise spatial annotations.
Developers choose Objectron for its scale, rich metadata, and ready-to-use formats that streamline 3D model training. Its unique combination of video sequences, 3D annotations, and AR session data provides a comprehensive foundation for advancing real-world 3D perception beyond 2D image datasets.
Objectron is a dataset of short, object-centric video clips. The videos also contain AR session metadata, including camera poses, sparse point clouds, and surface planes. In each video, the camera moves around and above the object, capturing it from different views. Each object is annotated with a 3D bounding box that describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes.
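To make the annotation format concrete, here is a minimal sketch of one common parameterization of such a box (a center translation, a 3x3 rotation matrix, and per-axis dimensions) and how the eight corner vertices can be recovered from it. The function and variable names are illustrative, not the dataset's field names.

```python
import numpy as np

def box_corners(translation, rotation, scale):
    """Return the 8 corners of an oriented 3D bounding box.

    translation: (3,) box center
    rotation:    (3, 3) rotation matrix (box frame -> world frame)
    scale:       (3,) full box dimensions along its local x, y, z axes
    """
    # Corners of a unit cube centered at the origin, one row per corner.
    signs = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)])
    local = 0.5 * signs * scale              # stretch to the box dimensions
    return local @ rotation.T + translation  # rotate into place, then translate

# Illustrative values only: an axis-aligned 0.3 x 0.5 x 0.3 box at the origin.
print(box_corners(np.zeros(3), np.eye(3), np.array([0.3, 0.5, 0.3])))
```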
With 15,000 annotated video clips and over 4 million images across nine categories, it provides extensive coverage for training robust 3D models, as highlighted in the dataset statistics.
Includes camera poses, sparse point clouds, surface planes, and manually annotated 3D bounding boxes, enabling advanced tasks like 3D reconstruction and pose estimation, as detailed in the schema files.
Offers tf.record files compatible with TensorFlow, PyTorch, and JAX, along with tutorials for loading data, reducing integration overhead for researchers (see the loading sketch below).
Sourced from 10 countries across five continents, ensuring variety and reducing geographic bias, which supports more generalizable model development.
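As a concrete starting point for the tf.record workflow mentioned above, the sketch below streams a single downloaded shard with TensorFlow and prints the feature keys of the first record. The filename is a placeholder, and it assumes the per-frame shards are serialized tf.train.Example protos (video shards use tf.train.SequenceExample); the exact shard paths and feature schema are documented in the official tutorials.

```python
import tensorflow as tf

# Placeholder filename; real shard names and paths are listed in the Objectron tutorials.
dataset = tf.data.TFRecordDataset("objectron_chair_train.tfrecord")

for raw_record in dataset.take(1):
    # Assumption: the shuffled image shards store tf.train.Example protos.
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    # Inspect the available feature keys instead of hard-coding them.
    print(sorted(example.features.feature.keys()))
```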
At 4.4TB total size, it demands significant disk space and bandwidth, which can put the full dataset out of reach for individual researchers or teams with limited resources.
Only covers nine everyday object categories (e.g., bikes, chairs, shoes), so it is unsuitable for projects needing broader coverage or niche categories such as industrial tools or specialized equipment.
Annotations are stored as protocol buffers that require custom parsing scripts, as noted in the tutorials, which adds setup complexity and potential errors for newcomers (see the parsing sketch below).
Data is hosted on Google Cloud Storage, which may incur egress costs for heavy usage and requires stable internet access, tying access to a single hosting provider and raising accessibility concerns in some environments.
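To illustrate the parsing step behind that limitation, here is a minimal sketch that reads one annotation file after compiling the dataset's schema protos with protoc. The module path, filename, and field names are assumptions based on the published schema and should be checked against the repository's tutorials.

```python
# Assumes the schema has been compiled first, e.g.:
#   protoc --python_out=. objectron/schema/annotation_data.proto
# Module, message, and field names below follow the published schema but should be verified.
from objectron.schema import annotation_data_pb2

with open("chair_annotation.pbdata", "rb") as f:  # placeholder annotation filename
    sequence = annotation_data_pb2.Sequence()
    sequence.ParseFromString(f.read())

# Each entry in frame_annotations carries the per-frame projections of the 3D boxes.
print("objects in sequence:", len(sequence.objects))
print("annotated frames:   ", len(sequence.frame_annotations))
```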