A zero-shot foundation model for generating universal embeddings from single-cell gene expression data.
Universal Cell Embeddings (UCE) is a foundation model for single-cell RNA sequencing data that generates unified representations of cells across tissues, species, and conditions. It solves the problem of fragmented cell representations by providing a consistent embedding space for downstream biological analyses without requiring retraining for each new dataset.
UCE is aimed at bioinformaticians, computational biologists, and researchers working with single-cell genomics data who need to compare or integrate datasets across different experiments or species.
Developers choose UCE because it offers a pretrained, zero-shot model that eliminates the need for dataset-specific training, provides cross-species compatibility, and works directly with the widely used AnnData ecosystem for single-cell analysis.
UCE is a zero-shot foundation model for single-cell gene expression data
Generates cell representations without fine-tuning on new datasets, so the embeddings can be used immediately for diverse biological analyses.
Uses a unified gene vocabulary to handle data from multiple species, allowing direct comparison across tissues and experimental conditions without retraining.
Adds the computed embeddings to the .obsm slot of the dataset's AnnData object, as shown in the output description, so results fit into existing AnnData-based pipelines.
Offers both lightweight (4-layer) and deep (33-layer) models, providing flexibility based on computational resources and accuracy needs, as specified in the usage instructions.
Embeddings from the 33-layer model are not compatible with those from the 4-layer model, as noted in the data section, which can hinder comparative analyses and require careful version management.
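One way to handle this incompatibility is to store a model label next to every batch of embeddings and refuse to combine batches whose labels differ. The following is a minimal sketch of that idea; the `TaggedEmbeddings` class, the `combine` function, and the label strings are hypothetical names introduced for illustration and are not part of UCE.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TaggedEmbeddings:
    """Cell embeddings plus the label of the model that produced them."""
    model: str                      # hypothetical label, e.g. "4-layer" or "33-layer"
    vectors: List[Sequence[float]]  # one vector per cell


def combine(batches: List[TaggedEmbeddings]) -> TaggedEmbeddings:
    """Concatenate batches, refusing to mix embeddings from different models."""
    if not batches:
        raise ValueError("no batches to combine")
    models = {b.model for b in batches}
    if len(models) > 1:
        raise ValueError(f"cannot mix embeddings from different models: {sorted(models)}")
    merged = [v for b in batches for v in b.vectors]
    return TaggedEmbeddings(model=batches[0].model, vectors=merged)


a = TaggedEmbeddings("4-layer", [[0.1, 0.2], [0.3, 0.4]])
b = TaggedEmbeddings("4-layer", [[0.5, 0.6]])
c = TaggedEmbeddings("33-layer", [[0.7, 0.8]])

same_model = combine([a, b])   # accepted: both batches share one model label

try:
    combine([a, c])            # rejected: 4-layer and 33-layer batches mixed
    mixed_ok = True
except ValueError:
    mixed_ok = False
```

Failing loudly at combine time is a deliberate choice here: vectors from the two models could be concatenated without any error if they happened to have the same length, and the mistake would only surface later as meaningless distances.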
The README specifies batch sizes for an 80GB GPU, indicating that running the deep model requires substantial computational resources, limiting accessibility for teams without high-end hardware.
Users need to manually download model files for the 33-layer variant from external links, and scripts rely on additional automatically downloaded files, adding setup complexity and potential dependency issues.
Deep probabilistic analysis of single-cell and spatial omics data
scGPT is a foundation model designed for single-cell multi-omics data analysis using generative AI. It leverages transformer architecture pretrained on millions of single-cell profiles to enable a wide range of downstream biological tasks, advancing computational biology by providing a powerful, unified model for cellular data.

## Key Features

- **Pretrained Model Zoo**: Offers multiple organ-specific and whole-human models trained on millions of cells for various applications.
- **Zero-Shot Applications**: Supports tasks like cell embedding and reference mapping without task-specific training.
- **Reference Mapping**: Enables fast similarity search across millions of cells using efficient indexing with faiss.
- **Multi-Task Fine-Tuning**: Can be adapted for scRNA-seq integration, cell type annotation, perturbation prediction, and GRN inference.
- **Online Tools**: Provides accessible web applications for reference mapping, cell annotation, and GRN inference via cloud GPUs.

## Philosophy

scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.
Pathology Foundation Model - Nature Medicine
Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data