A transformer-based foundation model pretrained on millions of single-cell profiles for generative AI tasks in single-cell multi-omics.
scGPT is a transformer-based foundation model pretrained on millions of single-cell profiles for analyzing and interpreting single-cell multi-omics data. It applies generative AI to large-scale cellular data for tasks such as batch integration, cell-type annotation, and perturbation prediction, giving computational biologists a single versatile tool.
Bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq or multi-omics data who need advanced AI-driven analysis tools.
Developers choose scGPT for its extensive pretrained model zoo, efficient zero-shot capabilities, and comprehensive support for key single-cell analysis tasks, all built on a modern transformer architecture optimized for biological data.
scGPT is a foundation model for single-cell multi-omics analysis built on generative pretraining. Its transformer architecture, trained on millions of single-cell profiles, supports a wide range of downstream biological tasks, providing a unified model for cellular data in computational biology.
scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.
Offers multiple organ-specific and whole-human models trained on millions of cells, such as the 33-million-cell whole-human model, providing a robust starting point for diverse biological contexts.
Enables tasks like cell embedding and reference mapping without task-specific training, leveraging the foundation model for quick insights as demonstrated in the zero-shot tutorials.
Integrates faiss for similarity search, allowing mapping of 10,000 query cells in under a second with low memory usage, as highlighted in the reference mapping tutorial.
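At its core, reference mapping is a nearest-neighbor search in the model's embedding space; faiss accelerates that search, but the underlying idea can be sketched in plain NumPy. The embedding dimension, cell-type labels, and random data below are illustrative, not taken from scGPT:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for scGPT-style cell embeddings (dimensions are made up).
ref_emb = rng.standard_normal((1000, 512)).astype("float32")    # annotated reference atlas
ref_labels = rng.choice(["T cell", "B cell", "monocyte"], size=1000)
query_emb = rng.standard_normal((50, 512)).astype("float32")    # unlabeled query cells

# L2-normalize so that an inner product equals cosine similarity.
ref_norm = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
query_norm = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)

# Brute-force similarity search (the work a faiss index would speed up).
sims = query_norm @ ref_norm.T                     # (50, 1000) cosine similarities
k = 10
topk = np.argpartition(-sims, k, axis=1)[:, :k]    # k nearest reference cells per query

# Transfer labels to each query cell by majority vote among its k neighbors.
predicted = []
for neighbors in topk:
    labels, counts = np.unique(ref_labels[neighbors], return_counts=True)
    predicted.append(labels[np.argmax(counts)])

print(len(predicted))  # one predicted label per query cell
```

Swapping the matrix product for a faiss index (e.g. an inner-product index over the normalized reference embeddings) is what makes this scale to tens of thousands of query cells with low memory use.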
Provides web applications via Superbio.ai for tasks like reference mapping and GRN inference, lowering the barrier to entry with cloud GPU support.
Installation can be problematic: the optional flash-attention dependency requires specific GPU architectures and CUDA versions (e.g., CUDA 11.7), leading to compatibility issues as noted in the README.
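A setup along these lines tends to sidestep the worst of the dependency friction; the version pins below are illustrative and should be checked against the current scGPT README:

```shell
# Illustrative environment setup; verify exact pins against the scGPT README.
conda create -n scgpt python=3.10 -y
conda activate scgpt
pip install scgpt
# flash-attn is optional -- skip it on GPUs or CUDA versions it does not support.
pip install "flash-attn<1.0.5"
```

Omitting flash-attention trades some speed for a far more portable install, which is often the right call on shared or heterogeneous clusters.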
Key features, including the pretraining code with generative attention masking and some fine-tuning examples, remain on the to-do list, limiting advanced customization.
Optimal performance depends on GPU hardware and pinned software versions, making scGPT less suitable for resource-constrained or heterogeneous computing environments.