A deep learning toolkit for predicting regulatory activity, 3D genome folding, and mRNA half-life from DNA/RNA sequences.
Basenji is a deep learning toolkit for functional genomics that predicts regulatory activity, 3D genome folding, and mRNA half-life from DNA and RNA sequences. It enables researchers to model gene regulation at chromosome scale, annotate regulatory elements, and score genetic variants for their functional impact. The toolkit includes specialized models like Akita for genome folding and Saluki for mRNA stability.
Bioinformatics researchers and computational biologists studying gene regulation, variant effects, and genome architecture who need scalable deep learning models for sequence-based predictions.
Developers choose Basenji for its ability to handle very long, chromosome-scale sequences, its quantitative regression approach, and its integration of multiple prediction tasks (regulatory activity, 3D folding, mRNA stability) in one toolkit. It improves on its predecessor, Basset, by adding TensorFlow support and greater flexibility.
Sequential regulatory activity predictions with deep convolutional neural networks.
Basenji handles very long DNA sequences, enabling predictions across entire chromosomes, a key improvement over Basset for genome-wide analyses.
It uses regression loss functions to predict quantitative regulatory signals, so it models the magnitude of a signal rather than only its presence or absence, as binary classification does.
Built on TensorFlow, it benefits from distributed computing support and a large community, which helps it scale and fit into modern deep learning workflows.
Includes Akita for 3D genome folding and Saluki for mRNA stability, offering a comprehensive suite for diverse genomic prediction tasks in one toolkit.
Installation requires managing multiple dependencies via conda or pip, separate TensorFlow installation, and environment variable configuration, which can be error-prone and time-consuming.
The README admits tutorials are 'a work in progress' and describes the package as somewhere between personal research code and accessible software, so new users will find gaps in the guidance.
Designed for chromosome-scale analyses, it needs significant computational resources, typically GPUs, to train efficiently, which limits accessibility for researchers with constrained infrastructure.
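To make the quantitative regression approach mentioned above more concrete, here is a minimal sketch of a Poisson negative log-likelihood, a loss commonly used for count-like signals such as sequencing coverage. It is written in plain NumPy and is not taken from the Basenji codebase. Treating Poisson as the loss in question is an assumption, and the function name `poisson_nll` and the coverage values are hypothetical.

```python
import numpy as np


def poisson_nll(y_true, y_pred, eps=1e-7):
    """Mean Poisson negative log-likelihood.

    The log(y_true!) term is dropped because it does not depend on the
    predictions and so does not affect optimization.
    """
    # Clip predictions away from zero so the logarithm stays finite.
    y_pred = np.clip(y_pred, eps, None)
    return float(np.mean(y_pred - y_true * np.log(y_pred)))


# Hypothetical coverage counts in four sequence bins for one experiment.
observed = np.array([0.0, 3.0, 12.0, 1.0])

# A prediction that tracks the signal, and a flat baseline at the mean.
tracking = np.array([0.2, 2.5, 11.0, 1.2])
flat = np.full(4, observed.mean())

print("tracking prediction loss:", poisson_nll(observed, tracking))
print("flat baseline loss:", poisson_nll(observed, flat))
```

The prediction that follows the observed counts gets a lower loss than the flat baseline. A binary peak/no-peak label would lose this information, since both the 3-count and the 12-count bin could receive the same label. In a TensorFlow workflow, the built-in `tf.keras.losses.Poisson` plays a similar role.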