A BERT-based foundation model pretrained on large-scale scRNA-seq data for automated cell type annotation in single-cell analysis.
scBERT is a BERT-based foundation model pretrained on large-scale single-cell RNA sequencing (scRNA-seq) data for automated cell type annotation. It addresses common challenges in scRNA-seq analysis, such as batch effects and reliance on curated marker gene lists, by leveraging deep learning to capture gene-gene interactions. The model follows a pre-train and fine-tune approach, enabling accurate annotation on user-specific datasets.
Bioinformaticians, computational biologists, and researchers working with single-cell RNA sequencing data who need reliable, scalable cell type annotation tools. It is suited for those familiar with deep learning frameworks and Python-based bioinformatics workflows.
Developers choose scBERT because it offers a pretrained deep learning model designed specifically for scRNA-seq data. According to the project, it improves on traditional annotation methods by handling batch effects and exploiting latent gene-gene interactions, without requiring extensive manual curation.
scBERT is a deep learning model designed to address the challenges of cell type annotation in single-cell RNA sequencing (scRNA-seq) data. It leverages the pre-train and fine-tune paradigm to overcome issues like batch effects, reliance on curated marker genes, and inefficient use of gene-gene interaction information.
scBERT applies the success of large-scale pretrained language models to computational biology, aiming to provide a robust, data-driven foundation for cell type annotation that reduces reliance on manually curated knowledge.
scBERT is specifically designed to better manage batch effects compared to traditional annotation algorithms, as stated in the README, making it robust for multi-experiment datasets.
Includes built-in functionality to detect novel cell types by thresholding predicted probabilities, with a default threshold of 0.5, providing flexibility for exploratory analysis.
Can efficiently infer cell types for thousands of cells, with the README citing ~25 minutes for 10,000 cells on a desktop, enabling large-scale studies.
Reduces reliance on manually curated marker genes by leveraging pretrained models on massive unlabeled scRNA-seq data, aligning with modern AI paradigms for improved accuracy.
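The novel cell type detection mentioned above amounts to a simple rule: if the highest predicted probability for a cell falls below the threshold, the cell is treated as potentially novel rather than assigned to a known type. The following is a minimal sketch of that idea, not scBERT's actual code. The function name `flag_novel_cells`, the `"Unassigned"` label, and the choice to treat a probability exactly equal to the threshold as assigned are all assumptions made for illustration.

```python
def flag_novel_cells(probs, labels, threshold=0.5, novel_label="Unassigned"):
    """Assign each cell its most probable label, or novel_label when
    the top probability is below the threshold (hypothetical helper)."""
    predictions = []
    for row in probs:
        # Index of the class with the highest predicted probability.
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            predictions.append(labels[best])
        else:
            # Low confidence across all known types: flag as possibly novel.
            predictions.append(novel_label)
    return predictions


# Toy per-cell class probabilities (illustrative values, not real model output).
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
]
labels = ["B cell", "T cell", "NK cell"]

result = flag_novel_cells(probs, labels)
print(result)
```

Raising the threshold flags more cells as potentially novel; lowering it forces more cells into known types, so the value is worth tuning during exploratory analysis.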
Requires specific steps like gene symbol revision according to NCBI Gene database and normalization with scanpy, adding complexity and potential for errors in the workflow.
Depends on older library versions such as torch 1.8.1, which may lead to compatibility issues, security vulnerabilities, and lack of access to newer features.
Explicitly stated as not approved for clinical use in the disclaimer, which rules it out for diagnostic and other clinical settings.
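To make the normalization step listed among the limitations more concrete, the sketch below shows what library-size normalization followed by a log transform does to a raw count matrix. In scanpy this is typically done with `sc.pp.normalize_total` and `sc.pp.log1p`. The pure-Python version here only illustrates the arithmetic, and the target sum of 10,000 and the natural logarithm are assumptions, not settings confirmed for scBERT. Gene symbol revision is not shown.

```python
import math


def normalize_and_log(counts, target_sum=1e4):
    """Scale each cell's counts to target_sum, then apply log(1 + x).

    counts: list of per-cell lists of raw gene counts (hypothetical input).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with no counts are left as all zeros to avoid division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized


# Toy matrix: 3 cells x 3 genes (illustrative values only).
counts = [
    [10, 0, 30],
    [0, 0, 0],
    [5, 5, 0],
]
normalized = normalize_and_log(counts)
```

After this step, cells with different sequencing depths are on a comparable scale, which is the usual motivation for normalizing before feeding expression values to a model.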
Deep probabilistic analysis of single-cell and spatial omics data
scGPT is a foundation model designed for single-cell multi-omics data analysis using generative AI. It leverages transformer architecture pretrained on millions of single-cell profiles to enable a wide range of downstream biological tasks, advancing computational biology by providing a powerful, unified model for cellular data.

## Key Features

- **Pretrained Model Zoo**: Offers multiple organ-specific and whole-human models trained on millions of cells for various applications.
- **Zero-Shot Applications**: Supports tasks like cell embedding and reference mapping without task-specific training.
- **Reference Mapping**: Enables fast similarity search across millions of cells using efficient indexing with faiss.
- **Multi-Task Fine-Tuning**: Can be adapted for scRNA-seq integration, cell type annotation, perturbation prediction, and GRN inference.
- **Online Tools**: Provides accessible web applications for reference mapping, cell annotation, and GRN inference via cloud GPUs.

## Philosophy

scGPT aims to build a foundational AI model for single-cell biology, democratizing access to advanced computational methods and accelerating discoveries in multi-omics research through open-source collaboration.
Pathology Foundation Model - Nature Medicine
Prov-GigaPath: A whole-slide foundation model for digital pathology from real-world data