A pre-trained language model for single-cell RNA sequencing data that encodes cell-cell relations and accelerates inference for downstream tasks.
CellPLM is a pre-trained language model designed for single-cell RNA sequencing data analysis. It encodes cell-cell relations to produce more accurate biological representations, addressing a limitation of models that treat each cell in isolation. The model achieves state-of-the-art performance on downstream tasks such as cell-type annotation, with significantly faster inference.
Bioinformaticians, computational biologists, and researchers working with single-cell RNA sequencing data who need accurate and efficient tools for cell-type annotation and other downstream analysis tasks.
Developers choose CellPLM because it models cell-cell relationships rather than treating cells independently, which leads to more biologically meaningful representations. It offers higher accuracy than alternatives such as scGPT and Geneformer together with 100x faster inference, making it practical for large-scale analyses.
Official repo for CellPLM: Pre-training of Cell Language Model Beyond Single Cells.
Encodes relationships between cells rather than treating them in isolation, leading to more biologically accurate representations; this is the central idea of the paper.
Runs inference 100x faster than existing pre-trained models, making it practical for large-scale single-cell datasets.
Consistently achieves higher accuracy than competitors such as scGPT and Geneformer across diverse tissue datasets (e.g., Pancreas, HLCA), as shown in the README's result table.
Provides readily downloadable model checkpoints for immediate fine-tuning, reducing the need for costly pre-training from scratch.
Requires a GPU with CUDA >= 11.7 and PyTorch, which limits accessibility for users with older hardware or CPU-only setups.
As a new model (accepted at ICLR 2024), it lacks the extensive community support, third-party integrations, and mature documentation of established tools.
Full installation involves conda environments and specific CUDA versions, which can be cumbersome for researchers without deep technical expertise.
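Because the CUDA >= 11.7 requirement is a common stumbling block, it can help to check the environment before installing. The following is a minimal sketch, not part of CellPLM itself: the helper names (`parse_version`, `meets_cuda_requirement`, `detect_cuda_version`) are hypothetical, and it assumes only that PyTorch, if installed, reports its CUDA build through `torch.version.cuda`.

```python
def parse_version(text):
    """Turn a version string such as '11.7' into a tuple of ints, e.g. (11, 7)."""
    return tuple(int(part) for part in text.split(".")[:2])


def meets_cuda_requirement(cuda_version, minimum="11.7"):
    """Return True if cuda_version is at least `minimum`.

    `None` means no CUDA-enabled PyTorch build was found.
    """
    if cuda_version is None:
        return False
    return parse_version(cuda_version) >= parse_version(minimum)


def detect_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.version.cuda


if __name__ == "__main__":
    version = detect_cuda_version()
    if meets_cuda_requirement(version):
        print(f"CUDA {version} detected: meets the >= 11.7 requirement")
    else:
        print("No suitable CUDA-enabled PyTorch build detected")
```

Versions are compared as integer tuples rather than as strings or floats, so that a value such as 11.10 is correctly treated as newer than 11.7.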