A PyTorch framework for efficient 3D semantic and panoptic segmentation using superpoint-based transformer architectures.
Superpoint Transformer is an open-source PyTorch implementation of research models for 3D point cloud segmentation. It introduces the Superpoint Transformer for efficient semantic segmentation and SuperCluster for scalable panoptic segmentation, both based on hierarchical superpoint graphs. The framework solves the problem of processing large-scale 3D scenes with limited computational resources by using lightweight transformer architectures and graph clustering formulations.
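The key efficiency idea is that per-point features are aggregated into one feature per superpoint, so the transformer attends over a few thousand superpoints rather than millions of raw points. A minimal illustrative sketch of this pooling step (plain Python, not the repository's actual API; the function name and data layout are invented for illustration):

```python
# Hypothetical sketch of superpoint pooling: mean-pool per-point
# feature vectors by their superpoint assignment. This is the step
# that shrinks the input before any transformer layer runs.

def superpoint_pool(features, superpoint_ids):
    """features: list of feature vectors (one per point);
    superpoint_ids: superpoint index per point.
    Returns {superpoint id: mean feature vector}."""
    sums, counts = {}, {}
    for feat, sp in zip(features, superpoint_ids):
        acc = sums.setdefault(sp, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[sp] = counts.get(sp, 0) + 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}

# Example: 4 points assigned to 2 superpoints
feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
ids = [0, 0, 1, 1]
pooled = superpoint_pool(feats, ids)
# pooled -> {0: [2.0, 0.0], 1: [0.0, 3.0]}
```

In the actual framework this aggregation is applied hierarchically, producing coarser superpoint levels that form the nodes of the superpoint graph.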
The framework targets researchers and developers working on 3D computer vision, particularly point cloud segmentation for applications such as autonomous driving, robotics, and geospatial analysis. It is well suited to anyone who needs efficient models for large-scale 3D data.
Developers choose Superpoint Transformer for its exceptional efficiency—models train in hours on a single GPU with very few parameters—while achieving competitive accuracy on standard benchmarks. Its unique superpoint-based approach and scalable graph clustering formulation offer a principled alternative to dense 3D convolutions.
Official PyTorch implementation of Superpoint Transformer [ICCV'23], SuperCluster [3DV'24 Oral], and EZ-SP [ICRA'26]
Models have as few as 212k parameters, making them 200x smaller than PointNeXt and 40x smaller than Stratified Transformer, as highlighted in the README benchmarks.
SPT trains on S3DIS in 3 hours on one GPU, with preprocessing 7x faster than Superpoint Graph, and SuperCluster processes 18M points in 10.1s, enabling large-scale processing.
SuperCluster formulates panoptic segmentation as superpoint graph clustering, allowing it to handle massive scenes on a single GPU with fewer than 1M parameters, per the paper results.
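The clustering intuition can be sketched with a toy example (this is an invented illustration, not the paper's actual energy-based formulation): superpoints are graph nodes, and object instances emerge by merging edges whose predicted affinity is high enough. A union-find structure keeps the merge pass linear in the number of edges.

```python
# Toy sketch of superpoint graph clustering for panoptic segmentation.
# Function name, threshold, and affinity inputs are hypothetical;
# SuperCluster itself solves a graph clustering problem rather than
# applying a simple threshold.

def cluster_superpoints(n, edges, threshold=0.5):
    """n: number of superpoints; edges: list of (i, j, affinity).
    Returns one cluster (instance) label per superpoint."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge superpoints connected by a high-affinity edge
    for i, j, affinity in edges:
        if affinity > threshold:
            parent[find(i)] = find(j)

    # Relabel roots to compact 0..k-1 instance ids
    labels, out = {}, []
    for x in range(n):
        r = find(x)
        labels.setdefault(r, len(labels))
        out.append(labels[r])
    return out

# 4 superpoints; edges 0-1 and 2-3 have strong affinity, 1-2 weak,
# so two instances are recovered.
print(cluster_superpoints(4, [(0, 1, 0.9), (1, 2, 0.1), (2, 3, 0.8)]))
# -> [0, 0, 1, 1]
```

Because the work scales with superpoints and graph edges rather than raw points, this formulation is what lets massive scenes fit on a single GPU.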
Achieves state-of-the-art mIoU and PQ scores on datasets like S3DIS, KITTI-360, and DALES, with results backed by Papers with Code badges in the README.
Includes tools for creating shareable HTML visualizations of 3D segmentation results, enhancing interpretability and collaboration, as shown in the notebooks and media.
Requires substantial hardware (64 GB RAM and a Linux OS), and the install.sh script carries optional dependencies such as TorchSparse, making setup less accessible for casual users.
The README explicitly notes non-backward-compatible changes in updates, such as the EZ-SP release, which require users to reinstall their environment and reprocess datasets.
Parameterizing superpoint partitions and graph clustering requires deep understanding, as the tutorials themselves acknowledge, which may deter non-experts from adapting the framework to new data.
The project focuses on research benchmarks; using custom data involves significant preprocessing and hyperparameter tuning, and documentation for production pipelines is thinner than in broader frameworks.