A collection of transformer-based foundation models for genomics and transcriptomics, enabling tasks like sequence analysis, functional prediction, and conversational DNA exploration.
InstaDeep AI for Genomics is a collection of transformer-based foundation models designed for analyzing biological sequences like DNA and RNA. It provides tools for tasks such as functional prediction, genome annotation, sequence generation, and conversational exploration of genomic data, helping researchers extract insights from complex biological datasets.
It is aimed at bioinformaticians, computational biologists, genomics researchers, and data scientists who work with DNA/RNA sequence data, transcriptomics, or single-cell analysis and need state-of-the-art deep learning models.
Developers choose this for its curated suite of high-performance, research-validated models that achieve state-of-the-art results on genomic tasks, open access to pre-trained weights, and integration with popular platforms like Hugging Face for easy adoption into workflows.
Foundation Models for Genomics & Transcriptomics
Models are described in peer-reviewed journals such as Nature Methods, and NTv3 is reported to achieve state-of-the-art accuracy for functional-track prediction and genome annotation across species.
Offers a wide range from Nucleotide Transformer for general tasks to specialized tools like ChatNT for conversational analysis and SegmentNT for segmentation, covering diverse genomic needs.
Pre-trained weights and inference notebooks are hosted on Hugging Face, simplifying model loading and experimentation for users familiar with the platform.
Includes models like AgroNT for plant genomics and sCT for single-cell analysis, enabling targeted research in various biological contexts beyond human data.
Models like NTv3 handle contexts up to 1 Mb, requiring significant GPU/TPU memory and compute power, which can be prohibitive for resource-constrained environments.
Dependencies on specific versions such as JAX 0.3.25+ and Python 3.11, together with documentation split across multiple files in ./docs, lengthen initial setup and steepen the learning curve.
The license prohibits commercial use and requires share-alike terms, limiting adoption for industry applications or projects needing more permissive licensing.
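To put the 1 Mb context noted above in perspective, here is a rough back-of-the-envelope sketch of the memory needed for a single activation tensor at that length. The hidden width, the 2-byte precision, and the one-position-per-base assumption are illustrative values chosen for this example, not NTv3's actual configuration.

```python
def activation_bytes(seq_len: int, hidden_dim: int, bytes_per_value: int) -> int:
    """Bytes needed to hold one (seq_len x hidden_dim) activation tensor."""
    return seq_len * hidden_dim * bytes_per_value


# All three values below are assumptions made for illustration only.
SEQ_LEN = 1_000_000   # 1 Mb of sequence, assuming one position per base
HIDDEN_DIM = 1024     # hypothetical model width
BYTES_PER_VALUE = 2   # 16-bit floats (e.g. bfloat16)

per_tensor = activation_bytes(SEQ_LEN, HIDDEN_DIM, BYTES_PER_VALUE)
print(f"{per_tensor / 1e9:.2f} GB per activation tensor")
```

Under these assumptions a single tensor already takes about 2 GB, and a real model holds many such tensors at once, which is why long-context inference tends to need high-memory accelerators.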