Open-Awesome



HyenaDNA

Apache-2.0 · Assembly

A long-range genomic foundation model that processes DNA sequences up to 1 million nucleotides at single nucleotide resolution.

791 stars · 107 forks · 0 contributors

What is HyenaDNA?

HyenaDNA is a genomic foundation model for long-range sequence modeling: it accepts DNA sequences of up to 1 million nucleotides and treats every nucleotide as its own token. This long context supports tasks such as sequence classification, prediction, and in-context learning on DNA data. The model is pretrained on the human reference genome and can be fine-tuned for downstream genomic applications.

Target Audience

Genomics researchers, bioinformaticians, and machine learning practitioners working with DNA sequence data who need to model long-range dependencies and fine-grained nucleotide interactions.

Value Proposition

HyenaDNA combines a very long context (up to 1M tokens) with single nucleotide resolution, and is presented as outperforming earlier models on long-range genomic tasks. Its open-source implementation and pretrained weights lower the barrier to applying deep learning to genomics.

Overview

Official implementation of HyenaDNA, a long-range genomic foundation model built with Hyena.

Use Cases

Best For

  • Analyzing ultra-long DNA sequences (up to 1 million nucleotides per input)
  • Fine-tuning on genomic classification tasks like enhancer prediction
  • Species classification from DNA sequences
  • Chromatin profile prediction
  • In-context learning experiments on genomic benchmarks
  • Pretraining custom genomic foundation models

Not Ideal For

  • Projects needing out-of-the-box genomic analysis without deep learning expertise
  • Applications focused exclusively on short DNA sequences (under 1k bases) where long-range modeling adds unnecessary complexity
  • Teams without access to high-performance GPUs (e.g., A100/V100) for training or inference on large models
  • Researchers requiring extensive, beginner-friendly documentation and pre-built pipelines for common genomic workflows

Pros & Cons

Pros

Ultra-Long Context

Handles sequences of up to 1 million tokens, so large genomic regions can be analyzed as a single input rather than being split into short windows.
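To give a sense of why a 1-million-token context is unusual, the following back-of-the-envelope sketch compares how a dense attention score matrix (L² entries) grows against an L·log₂(L) term of the kind associated with FFT-based long convolutions, which Hyena builds on. The function names and the fp16 default are illustrative assumptions, not code from the HyenaDNA repository, and real memory use depends on the implementation (for example, Flash Attention avoids materializing the full matrix).

```python
import math


def attention_score_entries(seq_len: int) -> int:
    """Entries in one dense seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


def fft_conv_term(seq_len: int) -> float:
    """Asymptotic L * log2(L) term for an FFT-based long convolution."""
    return seq_len * math.log2(seq_len)


def dense_scores_gib(seq_len: int, bytes_per_value: int = 2) -> float:
    """GiB needed to hold one dense score matrix (fp16 assumed by default)."""
    return attention_score_entries(seq_len) * bytes_per_value / 2**30


if __name__ == "__main__":
    for length in (1_000, 32_000, 1_000_000):
        ratio = attention_score_entries(length) / fft_conv_term(length)
        print(
            f"L={length:>9,}  L^2 / (L log2 L) = {ratio:,.0f}  "
            f"dense scores = {dense_scores_gib(length):,.2f} GiB"
        )
```

These figures ignore constant factors, so they indicate growth rates only and are not a benchmark of either approach.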

Single Nucleotide Resolution

Processes DNA at the level of individual bases, with one token per nucleotide, allowing fine-grained detection of genomic features.
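As an illustration of what single nucleotide resolution means at the input level, here is a minimal sketch of character-level tokenization, in which each base maps to exactly one token instead of being grouped into k-mers. The vocabulary, ID values, and function names are hypothetical and do not reproduce the tokenizer shipped with HyenaDNA.

```python
# Hypothetical vocabulary: the IDs are illustrative, not HyenaDNA's actual mapping.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide to one token ID; unknown symbols fall back to N."""
    return [VOCAB.get(base, VOCAB["N"]) for base in sequence.upper()]


def kmer_token_count(sequence: str, k: int) -> int:
    """Number of non-overlapping k-mer tokens, shown only for comparison."""
    return len(sequence) // k


seq = "ACGTNacgt"
ids = tokenize(seq)
print(ids)                                  # one ID per base
print(len(ids), kmer_token_count(seq, 6))   # token counts under each scheme
```

With one token per base, a single-nucleotide change alters exactly one input token, whereas under k-mer tokenization it changes the identity of a whole k-mer.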

Pretrained Weights Available

Offers multiple model sizes on Hugging Face, pretrained on the human reference genome (hg38), so downstream tasks can start from pretrained weights instead of training from scratch. GPU requirements are documented alongside the checkpoints.

Flexible Fine-Tuning

Supports various downstream tasks like species classification and chromatin profiling, with example configs and dataloaders provided in the README.

Cons

Complex Setup Process

Requires Docker or manual installation of dependencies such as Flash Attention, plus familiarity with PyTorch Lightning and Hydra, which makes onboarding challenging.

High Resource Demands

Large models need powerful GPUs (e.g., an A100, available on Colab's paid tier, for sequences of 1M tokens), and pretraining or fine-tuning can be computationally intensive.

Limited Documentation and Maturity

The repo is described as a 'work in progress,' with users needing to dig into code for custom dataloaders, and experimental features like bidirectional implementation are not fully supported.

Steep Learning Curve

Assumes advanced ML knowledge: setting up a downstream experiment on a new dataset requires writing custom configs and dataloaders.
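To indicate the kind of work a custom dataloader involves, the sketch below shows a minimal map-style dataset in plain Python that truncates or pads each sequence to a fixed length and returns it with a label. It follows the `__len__`/`__getitem__` protocol that PyTorch's `DataLoader` accepts for map-style datasets, but the class name, token IDs, and right-padding choice are assumptions made for illustration and are not taken from the HyenaDNA codebase.

```python
from dataclasses import dataclass

# Hypothetical token IDs for illustration only.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
PAD_ID = 5


@dataclass
class SequenceClassificationDataset:
    """Map-style dataset exposing __len__ and __getitem__."""

    sequences: list[str]
    labels: list[int]
    max_length: int

    def __post_init__(self) -> None:
        if len(self.sequences) != len(self.labels):
            raise ValueError("sequences and labels must have the same length")

    def __len__(self) -> int:
        return len(self.sequences)

    def __getitem__(self, index: int) -> tuple[list[int], int]:
        # Truncate to max_length, then encode one token per base.
        seq = self.sequences[index].upper()[: self.max_length]
        ids = [VOCAB.get(base, VOCAB["N"]) for base in seq]
        # Right-pad so every example has the same length (an assumed convention).
        ids += [PAD_ID] * (self.max_length - len(ids))
        return ids, self.labels[index]


dataset = SequenceClassificationDataset(["ACGT", "GGCATTA"], [0, 1], max_length=6)
print(len(dataset), dataset[0], dataset[1])
```

A real setup would also need a matching config entry and whatever padding and special-token conventions the chosen checkpoint expects, which should be checked against the repository.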


Quick Stats

Stars: 791
Forks: 107
Contributors: 0
Open Issues: 34
Last commit: 1 year ago
Created: 2023

Tags

#deep-learning #dna-sequencing #genomics #language-models #bioinformatics #foundation-models #huggingface #foundation-model #pytorch

Built With

  • Flash Attention
  • PyTorch Lightning
  • Hydra
  • Docker
  • PyTorch
  • Hugging Face


Included in

Computational Biology (122)

Related Projects

Evo

Biological foundation modeling from molecular to genome scale

Stars: 1,519
Forks: 177
Last commit: 3 months ago

Nucleotide Transformer

Foundation Models for Genomics & Transcriptomics

Stars: 887
Forks: 95
Last commit: 3 months ago

DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Stars: 759
Forks: 179
Last commit: 4 months ago

DNABERT-2

[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome

Stars: 495
Forks: 101
Last commit: 5 months ago