Open-Awesome

scFoundation

Apache-2.0 · Jupyter Notebook

A 100M-parameter foundation model for single-cell transcriptomics, enabling gene expression enhancement, drug response prediction, and perturbation analysis.

418 stars · 73 forks · 0 contributors

What is scFoundation?

scFoundation is a large-scale foundation model for single-cell transcriptomics, with 100M parameters, trained on more than 50 million human single-cell transcriptomic profiles. It addresses the fragmentation of task-specific models in computational biology by providing a single pretrained model that generalizes across downstream tasks such as gene expression enhancement, drug response prediction, and perturbation analysis.

Target Audience

Bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq data who need a robust foundation model for diverse analysis tasks without training models from scratch for each application.

Value Proposition

Developers choose scFoundation because it offers state-of-the-art performance across multiple downstream tasks, reduces the need for extensive task-specific training, and provides precomputed embeddings that can be easily integrated with existing pipelines like DeepCDR and GEARS.

Overview

scFoundation is a large-scale pretrained model for single-cell transcriptomics, built on the xTrimoGene architecture and trained on more than 50 million human single-cell transcriptomic profiles. It serves as a foundation model that achieves state-of-the-art performance across diverse downstream tasks in computational biology.

Key Features

  • Large-Scale Pretraining — A 100M-parameter model trained on 50M+ human single-cell transcriptomic profiles for robust feature learning.
  • Multiple Downstream Tasks — Enables gene expression enhancement, drug response prediction, perturbation prediction, and cell type annotation.
  • Cell and Gene Embeddings — Generates embeddings that can be integrated or fine-tuned with other models for specialized analyses.
  • Comprehensive Tooling — Provides code for preprocessing, model inference, and task-specific pipelines like GEARS and DeepCDR.
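
As a rough illustration of how cell embeddings can feed a separate downstream method, the sketch below transfers cell type labels from a labelled reference set to a query cell by majority vote over its nearest neighbours. It is a minimal sketch under stated assumptions, not scFoundation's own pipeline: the function names `cosine_similarity` and `knn_annotate`, the 3-dimensional vectors, and the labels are all hypothetical, and real embeddings would be loaded from the model's output rather than typed in.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_annotate(query, reference, labels, k=3):
    """Label a query embedding by majority vote over its k most similar reference cells."""
    ranked = sorted(
        range(len(reference)),
        key=lambda i: cosine_similarity(query, reference[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy 3-dimensional embeddings (hypothetical values, not model output).
reference_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
reference_labels = ["T cell", "T cell", "B cell", "B cell"]

predicted = knn_annotate([0.85, 0.15, 0.05], reference_embeddings, reference_labels, k=3)
print(predicted)
```

For datasets with many cells, an approximate nearest-neighbour index would normally replace the exhaustive sort used here.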

Philosophy

scFoundation aims to provide a unified foundation model for single-cell transcriptomics, leveraging large-scale data and advanced architecture to generalize across diverse biological tasks and reduce the need for task-specific model training.

Use Cases

Best For

  • Enhancing read depth in single-cell RNA-seq data for better clustering
  • Predicting cancer drug response (IC50) using single-cell transcriptomics
  • Analyzing genetic perturbation effects in single-cell data
  • Inferring gene modules and regulatory networks from gene context embeddings
  • Mapping organoid data to in vivo reference datasets
  • Automating cell type annotation in single-cell transcriptomics studies
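
One simple way to think about inferring gene modules from gene embeddings is to link genes whose embeddings are highly similar and read off the connected groups. The sketch below does this with a cosine-similarity threshold and a small union-find structure. It is an illustrative assumption rather than the method used by scFoundation: the helper names, the threshold of 0.9, and the 2-dimensional vectors attached to the gene symbols are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gene_modules(embeddings, threshold=0.9):
    """Group genes into connected components of the graph that links
    every pair whose cosine similarity is at least `threshold`."""
    genes = list(embeddings)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the component root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if cosine(embeddings[g1], embeddings[g2]) >= threshold:
                parent[find(g1)] = find(g2)

    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in modules.values())


# Hypothetical 2-dimensional gene embeddings, chosen only to make two groups.
toy_gene_embeddings = {
    "CD3D": [1.0, 0.1],
    "CD3E": [0.9, 0.2],
    "MS4A1": [0.1, 1.0],
    "CD79A": [0.2, 0.9],
}

modules = gene_modules(toy_gene_embeddings, threshold=0.9)
print(modules)
```

The threshold controls how coarse the modules are; in practice it would be tuned, or replaced by a clustering method suited to the embedding space.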

Not Ideal For

  • Projects with strict computational budget constraints or needing real-time inference
  • Research focused exclusively on non-human organisms without plans for model retraining
  • Teams requiring a simple, all-in-one package without dependencies on external APIs or multiple codebases

Pros & Cons

Pros

Massive Pretraining Scale

A 100M-parameter model trained on more than 50 million human single-cell transcriptomic profiles, providing robust feature learning for diverse biological tasks, as stated in the README.

Broad Task Compatibility

Supports multiple downstream applications like gene expression enhancement and drug response prediction, with provided code for integration with tools like GEARS and DeepCDR in dedicated folders.

Scientific Credibility

The model was published in Nature Methods, and the peer-reviewed evaluation reports state-of-the-art performance across various benchmarks, which strengthens confidence in it for research use.

Embedding Flexibility

Generates cell and gene embeddings that can be fine-tuned or integrated with other models, as detailed in the model folder, allowing customization for specific analyses.

Cons

API and Platform Volatility

The old API was officially discontinued in April 2024, forcing users to migrate to a new platform. This reliance on an external service is a source of potential instability and may disrupt workflows.

Heavy Resource Demands

With 100M parameters and large embeddings, running scFoundation requires significant GPU memory and storage, making it challenging for standard lab setups, as hinted by the need for online services or CLI tools.
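
A back-of-the-envelope calculation helps put these demands in perspective. The sketch below estimates the memory taken by the weights alone and the disk space needed to store cell embeddings. Only the 100M parameter count comes from this listing; the embedding dimension of 512 and the one million cells are assumptions chosen for illustration, and the function names are hypothetical. Weights are usually the smaller part of the footprint: activations over long gene sequences, batch size, and optimizer state during fine-tuning are not counted here.

```python
def parameter_memory_mb(num_params, bytes_per_param):
    """Memory needed to hold the model weights, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6


def embedding_storage_gb(num_cells, embedding_dim, bytes_per_value=4):
    """Disk space for a dense cells-by-dimensions embedding matrix, in gigabytes (10^9 bytes)."""
    return num_cells * embedding_dim * bytes_per_value / 1e9


NUM_PARAMS = 100_000_000  # parameter count stated in this listing

weights_fp32_mb = parameter_memory_mb(NUM_PARAMS, 4)  # 32-bit floats
weights_fp16_mb = parameter_memory_mb(NUM_PARAMS, 2)  # 16-bit floats

# Assumed values for illustration only; check the model's actual output size.
ASSUMED_CELLS = 1_000_000
ASSUMED_EMBEDDING_DIM = 512

embeddings_gb = embedding_storage_gb(ASSUMED_CELLS, ASSUMED_EMBEDDING_DIM)

print(f"weights (fp32): {weights_fp32_mb:.0f} MB")
print(f"weights (fp16): {weights_fp16_mb:.0f} MB")
print(f"embeddings for {ASSUMED_CELLS:,} cells: {embeddings_gb:.3f} GB")
```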

Setup and Integration Complexity

Involves multiple dependencies, separate codebases for different tasks, and integration steps, as seen in the fragmented README structure and references to external repositories like scvi-tools.

Quick Stats

Stars: 418
Forks: 73
Contributors: 0
Open Issues: 30
Last commit: 7 months ago
Created: 2023

Tags

#deep-learning #drug-response-prediction #computational-biology #gene-expression #bioinformatics #cell-embedding #foundation-model #pytorch

Built With

  • tqdm
  • PyTorch Lightning
  • pyyaml
  • DeepSpeed
  • pandas
  • NumPy
  • Docker
  • PyTorch
  • SciPy

Included in

Computational Biology (122)

Related Projects

totalVI

Deep probabilistic analysis of single-cell and spatial omics data

Stars: 1,652
Forks: 466
Last commit: 1 day ago

scGPT

scGPT is a foundation model designed for single-cell multi-omics data analysis using generative AI. It leverages transformer architecture pretrained on millions of single-cell profiles to enable a wide range of downstream biological tasks, advancing computational biology by providing a powerful, unified model for cellular data.

Key features of scGPT:

  • Pretrained Model Zoo — Offers multiple organ-specific and whole-human models trained on millions of cells for various applications.
  • Zero-Shot Applications — Supports tasks like cell embedding and reference mapping without task-specific training.
  • Reference Mapping — Enables fast similarity search across millions of cells using efficient indexing with faiss.
  • Multi-Task Fine-Tuning — Can be adapted for scRNA-seq integration, cell type annotation, perturbation prediction, and GRN inference.
  • Online Tools — Provides accessible web applications for reference mapping, cell annotation, and GRN inference via cloud GPUs.

Philosophy: scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.

Stars: 1,592
Forks: 335
Last commit: 2 months ago

UNI

Pathology Foundation Model - Nature Medicine

Stars: 752
Forks: 87
Last commit: 1 year ago

GigaPath

Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data

Stars: 621
Forks: 104
Last commit: 1 year ago