A knowledge-informed cross-species foundation model pre-trained on over 120 million human and mouse single-cell transcriptomes to decipher universal gene regulatory mechanisms.
GeneCompass is a knowledge-informed cross-species foundation model that uses artificial intelligence to decipher universal gene regulatory mechanisms. It is pre-trained on over 120 million human and mouse single-cell transcriptomes and integrates prior biological knowledge to model how genes are regulated across organisms and cell types. It addresses a limitation of traditional research paradigms, which study individual model organisms in isolation rather than integrating diverse cell types across species.
Computational biologists, bioinformaticians, and researchers working with single-cell transcriptomics data who need to analyze gene regulatory networks, perform cell-type annotation, or study cross-species biological mechanisms. It's particularly valuable for teams investigating cell fate transitions and drug target discovery.
Developers choose GeneCompass because it outperforms state-of-the-art models across diverse biological applications and opens up new kinds of cross-species investigation. By combining prior biological knowledge with large-scale single-cell data, it predicts gene regulatory mechanisms more accurately, and its predictions have been experimentally validated: factors it identified successfully induced cell differentiation.
GeneCompass
Pre-trained on over 120 million human and mouse single-cell transcriptomes, enabling robust analysis of universal gene regulatory mechanisms, as highlighted in the README.
Integrates four types of prior biological knowledge during self-supervised pre-training, improving accuracy on downstream tasks such as cell-type annotation and gene regulatory network (GRN) inference.
Identified candidate genes that, in experiments described in the study, successfully induced human embryonic stem cell differentiation, demonstrating real-world biological relevance.
Outperforms state-of-the-art models in diverse applications, including cell-type annotation across multiple organ datasets, with examples provided in the repository.
Requires specific PyTorch and CUDA versions (e.g., pytorch-1.13.1, cuda-11.7), manual environment configuration, and multiple data downloads, which can be a barrier to entry.
Currently pre-trained only on human and mouse data, limiting immediate applicability to other organisms without additional, resource-intensive training.
The README assumes prior expertise, with steps such as editing .bashrc and running distributed training scripts that may confuse users unfamiliar with deep learning setups.
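Because the version requirements are a common stumbling block, a quick pre-flight check can help before following the README. The sketch below is illustrative only: `check_versions`, `REQUIRED_TORCH`, and `REQUIRED_CUDA` are hypothetical names, and the pinned values are the PyTorch 1.13.1 / CUDA 11.7 versions cited in this entry. The README remains the authoritative source for the exact environment. It relies only on the standard `torch.__version__` and `torch.version.cuda` attributes and skips the live check if PyTorch is not installed.

```python
import importlib.util
from typing import List, Optional

# Hypothetical constants: versions taken from this catalog entry, not from the README.
REQUIRED_TORCH = "1.13.1"
REQUIRED_CUDA = "11.7"


def check_versions(torch_version: str, cuda_version: Optional[str]) -> List[str]:
    """Return a list of mismatch messages (empty if the versions match)."""
    problems = []
    # PyTorch wheels often carry a local suffix such as "1.13.1+cu117"; compare the base.
    base = torch_version.split("+")[0]
    if base != REQUIRED_TORCH:
        problems.append(f"PyTorch {base} found, {REQUIRED_TORCH} expected")
    # torch.version.cuda is None on CPU-only builds.
    if cuda_version != REQUIRED_CUDA:
        problems.append(f"CUDA {cuda_version} found, {REQUIRED_CUDA} expected")
    return problems


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed")
    else:
        import torch

        issues = check_versions(torch.__version__, torch.version.cuda)
        print("environment OK" if not issues else "\n".join(issues))
```

An exact-match comparison is deliberately strict. A looser check (for example, matching only major and minor versions) may be acceptable, but that is an assumption to verify against the project's own instructions.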