An automated cell type annotation tool for single-cell RNA-seq data using logistic regression classifiers.
CellTypist is an automated cell type annotation tool designed for single-cell RNA sequencing (scRNA-seq) data analysis. It uses logistic regression classifiers optimized by stochastic gradient descent to predict cell types and subtypes from gene expression profiles, helping researchers interpret cellular heterogeneity in complex tissues. The tool supports both built-in models (with a focus on immune cells) and custom-trained models for specialized applications.
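To make the "logistic regression optimized by stochastic gradient descent" idea concrete, here is a minimal pure-Python sketch. It is not CellTypist's implementation: it is binary rather than multi-class, and the two marker genes, the toy expression values, and the function names `train_sgd` and `predict_proba` are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(X, y, epochs=200, lr=0.1, seed=0):
    """Fit binary logistic regression with plain SGD (one cell per update)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit cells in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_proba(w, b, x):
    # Probability that a cell with expression vector x belongs to class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy log-normalized expression for two hypothetical marker genes.
# Label 1 = hypothetical "type A" cells, label 0 = everything else.
X = [[2.5, 0.1], [2.2, 0.0], [2.8, 0.3], [0.2, 2.4], [0.0, 2.9], [0.1, 2.1]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_sgd(X, y)
print("weights:", w, "bias:", b)
print("P(type A) for a new cell:", predict_proba(w, b, [2.4, 0.2]))
```

The learned weights are per-gene coefficients, which is what makes this family of models comparatively easy to interpret: a large positive weight marks a gene whose expression pushes a cell toward the class.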
CellTypist is aimed at bioinformaticians, computational biologists, and researchers who analyze single-cell RNA-seq data and need accurate, scalable cell type annotation, particularly those working with immune cell datasets or requiring custom classification models.
Developers choose CellTypist for its interpretable logistic regression approach, flexibility in model training and cross-species conversion, and seamless integration with popular single-cell analysis ecosystems like Scanpy and AnnData, enabling reproducible and customizable workflows.
A tool for semi-automatic cell type classification
Accepts count tables in CSV, TSV, or MTX format, as well as AnnData objects, and supports both gene-by-cell and cell-by-gene layouts, making it versatile for various data sources.
Directly works with AnnData and Scanpy workflows, enabling easy visualization like UMAP plots and dot plots for result inspection without extra conversion steps.
Allows training new classifiers on user-specific reference datasets with options for feature selection and SGD/mini-batch learning, enhancing adaptability to diverse biological questions.
Supports converting models between species (e.g., human to mouse) using orthologous gene mapping, facilitating comparative studies without retraining from scratch.
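The cross-species conversion described above can be pictured as re-keying a model's per-gene weights through an ortholog table. The sketch below illustrates that idea only and does not reproduce CellTypist's code: the function name `convert_weights`, the weight values, and the `GENE_X`/`GENE_Y` entries are hypothetical, and keeping only one-to-one orthologs is an assumption chosen for simplicity.

```python
from collections import Counter


def convert_weights(weights, ortholog_pairs):
    """Re-key per-gene weights from a source species to a target species.

    Only one-to-one orthologs are kept. Genes with no ortholog, or with an
    ambiguous (one-to-many or many-to-one) mapping, are dropped.
    """
    pairs = set(ortholog_pairs)
    src_counts = Counter(src for src, _ in pairs)
    dst_counts = Counter(dst for _, dst in pairs)
    one_to_one = {
        src: dst
        for src, dst in pairs
        if src_counts[src] == 1 and dst_counts[dst] == 1
    }
    return {one_to_one[g]: w for g, w in weights.items() if g in one_to_one}


# Toy weights for a hypothetical human classifier (values are made up).
human_weights = {"CD3E": 1.8, "MS4A1": -0.9, "GENE_X": 0.4, "GENE_Y": 0.2}

# Toy ortholog table: GENE_X maps to two mouse genes, GENE_Y to none.
pairs = [
    ("CD3E", "Cd3e"),
    ("MS4A1", "Ms4a1"),
    ("GENE_X", "Gene_x1"),
    ("GENE_X", "Gene_x2"),
]

mouse_weights = convert_weights(human_weights, pairs)
print(mouse_weights)
```

Dropping ambiguous mappings trades coverage for safety: the converted model has fewer features, but no weight is assigned to a gene whose correspondence is uncertain.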
The README explicitly states that there is no plan for R compatibility, so R users must convert their objects to AnnData, which adds complexity for R-centric teams.
Pre-trained models focus primarily on immune sub-populations, limiting out-of-the-box utility for other cell types without time-consuming custom training.
Training custom models requires carefully prepared input (expression normalized to 10,000 counts per cell and then log1p-transformed) as well as parameter tuning, and the expression-format checks raise warnings that beginners can find hard to resolve.
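As a rough illustration of the expected input format, the following pure-Python sketch scales one cell's counts to a total of 10,000, applies log1p, and then checks whether a vector looks like it is in that format by reversing the transform. The helper names `normalize_log1p` and `looks_log1p_normalized` and the tolerance are assumptions made for this example, not CellTypist functions. In a Scanpy workflow the same preprocessing is typically done with `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import math

TARGET_SUM = 10_000  # counts per cell after normalization


def normalize_log1p(counts):
    """Scale one cell's raw counts to TARGET_SUM, then apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * TARGET_SUM / total) for c in counts]


def looks_log1p_normalized(values, tol=1.0):
    """Heuristic check: undoing log1p should give a total near TARGET_SUM."""
    return abs(sum(math.expm1(v) for v in values) - TARGET_SUM) <= tol


raw_cell = [3, 0, 12, 5]  # toy raw counts for four genes in one cell
norm_cell = normalize_log1p(raw_cell)

print(looks_log1p_normalized(norm_cell))  # normalized input passes
print(looks_log1p_normalized(raw_cell))   # raw counts do not
```

Running a check like this before training or prediction catches the common mistake of passing raw counts or differently scaled data.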