A long-range genomic foundation model that processes DNA sequences up to 1 million nucleotides at single nucleotide resolution.
HyenaDNA is a genomic foundation model built on the Hyena operator that processes DNA sequences of up to 1 million nucleotides at single-nucleotide resolution. This ultra-long context enables tasks such as classification, prediction, and in-context learning directly on DNA. The model is pretrained on the human reference genome (hg38) and can be fine-tuned for a range of downstream genomic applications.
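Single-nucleotide resolution means the model tokenizes DNA at the character level (one token per base) rather than grouping bases into k-mers. A minimal sketch of such a tokenizer, where the vocabulary and id assignments are illustrative assumptions rather than HyenaDNA's exact mapping:

```python
# Character-level DNA tokenizer sketch: one token per nucleotide.
# Vocabulary and id assignments are illustrative, not HyenaDNA's exact mapping.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}  # N = unknown/ambiguous base

def encode(seq: str) -> list[int]:
    """Map each nucleotide to an integer id, one token per base."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]

def decode(ids: list[int]) -> str:
    """Invert the mapping back to a nucleotide string."""
    inv = {i: b for b, i in VOCAB.items()}
    return "".join(inv[i] for i in ids)

tokens = encode("ACGT")
```

Because every base is its own token, a 1M-nucleotide input is a 1M-token sequence, which is what makes the model's long-context capability necessary.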
Genomics researchers, bioinformaticians, and machine learning practitioners working with DNA sequence data who need to model long-range dependencies and fine-grained nucleotide interactions.
HyenaDNA pairs extreme context length (up to 1M tokens) with single-nucleotide resolution, a combination that lets it perform strongly on long-range genomic tasks. Its open-source implementation and pretrained weights lower the barrier to applying deep learning in genomics.
Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
Handles sequences up to 1 million tokens, enabling analysis of entire chromosomes or large genomic regions.
Processes DNA at the individual-base level, allowing fine-grained detection of genomic features.
Offers multiple pretrained model sizes on Hugging Face (all trained on hg38), with GPU requirements specified, so downstream tasks can start from pretrained weights instead of training from scratch.
Supports various downstream tasks like species classification and chromatin profiling, with example configs and dataloaders provided in the README.
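Downstream classification tasks typically require slicing a long genomic sequence into fixed-length, optionally overlapping windows before feeding it to the model. A sketch of that windowing step, where the window size, stride, and helper name are assumptions for illustration rather than the repo's dataloader API:

```python
def windows(seq: str, size: int, stride: int):
    """Yield (start, window) pairs of fixed-length windows over a DNA
    sequence; size and stride are task-dependent choices."""
    for start in range(0, len(seq) - size + 1, stride):
        yield start, seq[start:start + size]

seq = "ACGT" * 8  # 32-base toy sequence standing in for a genomic region
chunks = list(windows(seq, size=16, stride=8))
```

With a stride smaller than the window size, adjacent windows overlap, so features falling on a window boundary still appear whole in a neighboring window.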
Requires Docker or a manual install of dependencies such as Flash Attention, plus familiarity with PyTorch Lightning and Hydra, which makes onboarding challenging.
Large models need powerful GPUs (e.g., an A100 on the Colab paid tier for 1M-token sequences), and pretraining or fine-tuning can be computationally intensive.
The repo is self-described as a 'work in progress': users must dig into the code to write custom dataloaders, and experimental features such as the bidirectional implementation are not fully supported.
Assumes advanced ML knowledge: new datasets require custom configs and dataloaders, as noted in the README's sections on setting up downstream experiments.
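In the repo, custom datasets plug in as PyTorch dataloaders wired up through Hydra configs. The pure-Python sketch below only illustrates the map-style dataset contract (`__len__` / `__getitem__`) such a class must satisfy; in practice it would subclass `torch.utils.data.Dataset`, and the class name, vocabulary, and toy label scheme here are hypothetical:

```python
# Sketch of the map-style dataset contract (__len__ / __getitem__) a custom
# downstream dataset must satisfy. In the repo this would subclass
# torch.utils.data.Dataset; names and the toy labels are hypothetical.
BASE_IDS = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

class DNAClassificationDataset:
    def __init__(self, records):
        # records: list of (sequence, label) pairs
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        seq, label = self.records[idx]
        # Character-level encoding: one integer id per nucleotide.
        ids = [BASE_IDS.get(b, BASE_IDS["N"]) for b in seq.upper()]
        return ids, label

ds = DNAClassificationDataset([("ACGT", 1), ("TTNA", 0)])
```

Once a class like this exists, a Hydra config entry pointing at it is what lets the existing training loop pick up the new dataset.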