Open-Awesome
UCE

MIT · Python

A zero-shot foundation model for generating universal embeddings from single-cell gene expression data.

255 stars · 36 forks · 0 contributors

What is UCE?

Universal Cell Embeddings (UCE) is a foundation model for single-cell RNA sequencing data that generates unified representations of cells across tissues, species, and conditions. It solves the problem of fragmented cell representations by providing a consistent embedding space for downstream biological analyses without requiring retraining for each new dataset.

Target Audience

Bioinformaticians, computational biologists, and researchers working with single-cell genomics data who need to compare or integrate datasets across different experiments or species.

Value Proposition

Developers choose UCE because it offers a pretrained, zero-shot model that eliminates the need for dataset-specific training, provides cross-species compatibility, and integrates seamlessly with the widely-used AnnData ecosystem for single-cell analysis.

Overview

UCE is a zero-shot foundation model for single-cell gene expression data.

Use Cases

Best For

  • Integrating single-cell datasets from different experimental conditions
  • Comparing cell types across multiple species
  • Zero-shot cell type annotation without labeled training data
  • Building downstream analysis pipelines on unified cell representations
  • Researchers needing consistent embeddings for meta-analysis across studies
  • Computational biologists exploring cross-tissue cellular relationships
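One common way to use a shared embedding space for zero-shot annotation is to copy the label of the most similar reference cell. The sketch below illustrates that idea with cosine similarity on toy vectors. It is not taken from the UCE codebase: the function names, the three-dimensional vectors, and the labels are all hypothetical, and real embeddings would be much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def transfer_label(query, reference_embeddings, reference_labels):
    """Return the label of the reference cell most similar to `query`."""
    if not reference_embeddings:
        raise ValueError("reference set is empty")
    if len(reference_embeddings) != len(reference_labels):
        raise ValueError("need exactly one label per reference embedding")
    best = max(
        range(len(reference_embeddings)),
        key=lambda i: cosine_similarity(query, reference_embeddings[i]),
    )
    return reference_labels[best]


# Toy reference set (hypothetical values, not real embeddings).
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels = ["T cell", "B cell", "monocyte"]

print(transfer_label([0.9, 0.1, 0.0], reference, labels))   # T cell
print(transfer_label([0.1, 0.2, 0.95], reference, labels))  # monocyte
```

In practice a k-nearest-neighbor vote over many reference cells is usually more robust than a single nearest neighbor.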

Not Ideal For

  • Projects requiring real-time or low-latency embedding generation
  • Environments with limited GPU memory (e.g., below 80 GB for the deep model)
  • Datasets using ENSEMBL IDs instead of gene names in .var_names
  • Teams needing to mix embeddings from different UCE model versions
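Because gene names are expected in .var_names, it can help to check a dataset before embedding it. The sketch below estimates what fraction of names look like Ensembl gene IDs. The regular expression is an assumption about the usual ID shape ("ENS", an optional species prefix, "G", eleven digits, an optional version suffix), and the helper name is hypothetical.

```python
import re

# Assumed shape of an Ensembl gene ID, e.g. ENSG00000141510 (human)
# or ENSMUSG00000059552 (mouse), optionally followed by ".<version>".
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def fraction_ensembl_ids(var_names):
    """Return the fraction of names that look like Ensembl gene IDs."""
    names = [str(name) for name in var_names]
    if not names:
        return 0.0
    hits = sum(1 for name in names if ENSEMBL_GENE_ID.match(name))
    return hits / len(names)


symbols = ["TP53", "GAPDH", "ACTB", "CD3E"]
mostly_ids = ["ENSG00000141510", "ENSG00000111640.15", "ENSMUSG00000059552", "ACTB"]

print(fraction_ensembl_ids(symbols))     # 0.0
print(fraction_ensembl_ids(mostly_ids))  # 0.75
```

With an AnnData object, the same check could be run on `adata.var_names`. A high fraction suggests the IDs need to be mapped to gene names first.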

Pros & Cons

Pros

Zero-shot Embeddings

Generates cell representations without fine-tuning on new datasets, so embeddings can be used immediately in downstream biological analyses.

Cross-species Compatibility

Uses a unified gene vocabulary to handle data from multiple species, allowing direct comparison across tissues and experimental conditions without retraining.

Seamless AnnData Integration

Writes embeddings directly into the AnnData object's .obsm slot, so the output plugs into existing AnnData-based pipelines.
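A minimal sketch of reading an embedding back out of .obsm follows. The key name "X_uce" is an assumption, since this page does not state which key is used, and `get_embedding` is a hypothetical helper. To stay runnable without extra packages, the demo uses a stand-in object with a plain dict for .obsm. An AnnData object, whose .obsm is also a mapping, should work the same way.

```python
from types import SimpleNamespace


def get_embedding(adata, key="X_uce"):
    """Fetch an embedding from .obsm, with a helpful error if it is absent.

    `key` defaults to "X_uce", which is an assumed name, not one
    confirmed by this page.
    """
    obsm = adata.obsm
    if key not in obsm:
        available = ", ".join(sorted(obsm.keys())) or "none"
        raise KeyError(f"no embedding under .obsm[{key!r}]; available keys: {available}")
    return obsm[key]


# Stand-in for an AnnData object: only the .obsm mapping is modeled here.
toy = SimpleNamespace(
    obsm={"X_uce": [[0.1, 0.2], [0.3, 0.4]], "X_pca": [[1.0], [2.0]]}
)

embedding = get_embedding(toy)
print(len(embedding), "cells,", len(embedding[0]), "dimensions")  # 2 cells, 2 dimensions
```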

Scalable Model Variants

Offers both a lightweight 4-layer model and a deep 33-layer model, so users can trade accuracy against computational cost.

Cons

Model Version Incompatibility

Embeddings from the 33-layer model are not compatible with those from the 4-layer model, so the two cannot be mixed in one analysis and the model variant behind each embedding must be tracked.
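One way to guard against mixing variants is to tag every batch of embeddings with the model depth that produced it and refuse to pool mismatched batches. The sketch below shows that bookkeeping pattern. The class and function names are hypothetical and not part of UCE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedEmbeddings:
    """Embeddings plus the model variant that produced them."""
    dataset: str
    n_layers: int   # 4 or 33, per the two variants described above
    vectors: tuple  # one embedding vector per cell


def pool(*batches):
    """Concatenate embeddings, but only if all share one model variant."""
    if not batches:
        raise ValueError("nothing to pool")
    variants = {batch.n_layers for batch in batches}
    if len(variants) > 1:
        raise ValueError(
            f"cannot mix embeddings from different model variants: {sorted(variants)}"
        )
    return [vector for batch in batches for vector in batch.vectors]


# Toy values for illustration only.
a = TaggedEmbeddings("study_a", 4, ((0.1, 0.2), (0.3, 0.4)))
b = TaggedEmbeddings("study_b", 4, ((0.5, 0.6),))
c = TaggedEmbeddings("study_c", 33, ((0.7, 0.8),))

print(len(pool(a, b)))  # 3
```

Calling `pool(a, c)` raises a ValueError, because the two batches come from different model depths.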

High GPU Requirements

The README specifies batch sizes for an 80 GB GPU, which suggests that the deep model needs substantial GPU memory and may be out of reach for teams without high-end hardware.

Complex File Management

Users must manually download the 33-layer model files from external links, and the scripts depend on additional files that are downloaded automatically, which adds setup complexity and potential points of failure.

Quick Stats

Stars: 255
Forks: 36
Contributors: 0
Open Issues: 0
Last commit: 3 months ago
Created: 2023

Tags

#zero-shot-learning #single-cell-rna-seq #computational-biology #gene-expression #bioinformatics #cell-embedding #foundation-model #pytorch

Built With

PyTorch

Included in

Computational Biology (122)
Auto-fetched 1 day ago

Related Projects

totalVI

Deep probabilistic analysis of single-cell and spatial omics data

Stars: 1,634
Forks: 453
Last commit: 3 days ago

scGPT

scGPT is a foundation model designed for single-cell multi-omics data analysis using generative AI. It leverages transformer architecture pretrained on millions of single-cell profiles to enable a wide range of downstream biological tasks, advancing computational biology by providing a powerful, unified model for cellular data.

Key Features

  • Pretrained Model Zoo: Offers multiple organ-specific and whole-human models trained on millions of cells for various applications.
  • Zero-Shot Applications: Supports tasks like cell embedding and reference mapping without task-specific training.
  • Reference Mapping: Enables fast similarity search across millions of cells using efficient indexing with faiss.
  • Multi-Task Fine-Tuning: Can be adapted for scRNA-seq integration, cell type annotation, perturbation prediction, and GRN inference.
  • Online Tools: Provides accessible web applications for reference mapping, cell annotation, and GRN inference via cloud GPUs.

Philosophy

scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.

Stars: 1,568
Forks: 331
Last commit: 1 month ago

UNI

Pathology Foundation Model - Nature Medicine

Stars: 738
Forks: 84
Last commit: 1 year ago

GigaPath

Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data

Stars: 612
Forks: 104
Last commit: 1 year ago