A 100M-parameter foundation model for single-cell transcriptomics, enabling gene expression enhancement, drug response prediction, and perturbation analysis.
scFoundation is a large-scale foundation model for single-cell transcriptomics with 100M parameters, trained on over 50 million human single-cell transcriptomic profiles. It addresses the fragmentation of task-specific models in computational biology by providing a single pretrained model that generalizes across downstream tasks such as gene expression enhancement, drug response prediction, and perturbation analysis.
It is aimed at bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq data who need a robust foundation model for diverse analysis tasks without training a model from scratch for each application.
Developers choose scFoundation because it offers state-of-the-art performance across multiple downstream tasks, reduces the need for extensive task-specific training, and provides precomputed embeddings that can be easily integrated with existing pipelines like DeepCDR and GEARS.
scFoundation is a large-scale pretrained model for single-cell transcriptomics, built on the xTrimoGene architecture and trained on over 50 million human single-cell transcriptomic profiles. It serves as a foundation model that achieves state-of-the-art performance across diverse downstream tasks in computational biology.
scFoundation aims to provide a unified foundation model for single-cell transcriptomics, leveraging large-scale data and advanced architecture to generalize across diverse biological tasks and reduce the need for task-specific model training.
With 100M parameters and training on over 50 million human single-cell transcriptomic profiles, the model provides robust feature learning for diverse biological tasks, as stated in the README.
Supports multiple downstream applications like gene expression enhancement and drug response prediction, with provided code for integration with tools like GEARS and DeepCDR in dedicated folders.
Published in Nature Methods; the peer-reviewed evaluation reports state-of-the-art performance across various benchmarks, which strengthens trust for research use.
Generates cell and gene embeddings that can be fine-tuned or integrated with other models, as detailed in the model folder, allowing customization for specific analyses.
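As a rough illustration of how precomputed cell embeddings might feed a downstream model, the sketch below fits a nearest-centroid classifier on an embedding matrix. It does not use the scFoundation code or API: the embeddings are synthetic stand-ins, and the array sizes, labels, and helper names (`fit_centroids`, `predict`) are all assumptions made for the example. In practice the matrix would be loaded from whatever file the embedding step produced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for precomputed cell embeddings, shape (n_cells, embedding_dim).
# Both sizes are hypothetical; real embeddings would be loaded from disk instead.
n_cells, embedding_dim = 200, 64
labels = rng.integers(0, 2, size=n_cells)  # hypothetical binary cell labels
embeddings = rng.normal(size=(n_cells, embedding_dim)) + labels[:, None] * 2.0


def fit_centroids(X, y):
    """Return the class labels and the mean embedding of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Assign each cell to the class with the nearest centroid (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Simple hold-out split: first 150 cells for fitting, the rest for evaluation.
train, test = slice(0, 150), slice(150, None)
classes, centroids = fit_centroids(embeddings[train], labels[train])
pred = predict(embeddings[test], classes, centroids)
accuracy = float((pred == labels[test]).mean())
print(f"hold-out accuracy on synthetic data: {accuracy:.2f}")
```

The same pattern applies to any downstream model that accepts a fixed-size feature vector per cell; the classifier here is deliberately minimal and is not the method used by DeepCDR or GEARS.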
The old API was officially discontinued in April 2024, forcing migration to a new platform, which indicates potential instability and reliance on external services that may disrupt workflows.
With 100M parameters and large embeddings, running scFoundation requires significant GPU memory and storage, making it challenging for standard lab setups, as hinted by the need for online services or CLI tools.
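A back-of-envelope estimate helps put the resource claim in context. The sketch below computes only two terms: the memory needed to hold the weights and the storage needed for an embedding matrix. The 100M parameter count comes from the description above; the cell count and embedding dimension are hypothetical placeholders, and activation memory during inference, which depends on sequence length and batch size, is not modeled here.

```python
def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_param


def embedding_bytes(n_cells: int, dim: int, bytes_per_value: int = 4) -> int:
    """Storage for a dense (n_cells, dim) embedding matrix."""
    return n_cells * dim * bytes_per_value


N_PARAMS = 100_000_000  # parameter count quoted in the description

fp32_weights = weight_bytes(N_PARAMS, 4)  # 32-bit floats
fp16_weights = weight_bytes(N_PARAMS, 2)  # 16-bit floats

# Hypothetical dataset: one million cells, 512-dimensional embeddings.
stored_embeddings = embedding_bytes(1_000_000, 512)

print(f"weights (fp32): {fp32_weights / 1e9:.1f} GB")
print(f"weights (fp16): {fp16_weights / 1e9:.1f} GB")
print(f"embeddings (fp32, hypothetical size): {stored_embeddings / 1e9:.1f} GB")
```

Under these assumptions the weights themselves are a small part of the footprint, so any real sizing exercise should also measure activation memory on the target hardware.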
Involves multiple dependencies, separate codebases for different tasks, and integration steps, as seen in the fragmented README structure and references to external repositories like scvi-tools.