A curated list of awesome open-source bioinformatics software, libraries, and resources, primarily for command-line analysis.
A fast RNA-seq aligner for mapping spliced transcript sequences to a reference genome.
A curated list of deep learning implementations and resources for biological research, with a focus on genomics.
An embeddable C++ storage engine for dense and sparse multi-dimensional arrays, dataframes, and key-value stores.
A suite of command-line tools for manipulating SAM, BAM, and CRAM files in next-generation sequencing data analysis.
An educational tutorial and working demonstration pipeline for RNA-seq analysis on cloud platforms.
A comprehensive collection of notes, tutorials, and resources for RNA-seq data analysis, covering alignment, quantification, differential expression, and more.
An open-source, Python-based data analysis tool with specialized data types and methods for genomic data at scale.
A genomics analysis platform that uses Apache Spark to parallelize genomic data processing across clusters, replacing traditional file-based workflows.
A validated, scalable, community-developed pipeline for variant calling, RNA-seq, and small RNA analysis in genomic sequencing.
A bioinformatics-native AI agent skill library for reproducible, local-first genomic analysis, built on OpenClaw.
An R package for visualizing and annotating phylogenetic trees and other tree-like structures using the grammar of graphics.
A C library for reading and writing high-throughput sequencing data formats like SAM, CRAM, and VCF.
A collection of transformer-based foundation models for genomics and transcriptomics, enabling tasks like sequence analysis, functional prediction, and conversational DNA exploration.
A comprehensive collection of notes, tools, and resources for analyzing ChIP-seq and related epigenomic data.
A long-range genomic foundation model that processes DNA sequences up to 1 million nucleotides at single nucleotide resolution.
A pre-trained BERT model designed for DNA sequence analysis, enabling genome understanding tasks like classification and motif discovery.
A C++ library and command-line toolkit for parsing, manipulating, and analyzing VCF (Variant Call Format) files in bioinformatics.
A tool for fast, sensitive, and accurate integration of single-cell RNA-seq data across multiple datasets, batches, or experimental conditions.
An R package that predicts doublets (multiple cells mistaken as one) in single-cell RNA sequencing data using artificial nearest neighbor analysis.
A curated reading list of foundational genomics papers for computational biologists and statistical genomics students.
A foundation model for multi-species genome understanding, achieving state-of-the-art performance on 28 genomic tasks.
An R package to infer gene regulatory networks and identify cell types from single-cell RNA-seq data.
A deep learning toolkit for predicting regulatory activity, 3D genome folding, and mRNA half-life from DNA/RNA sequences.
A factor analysis framework for unsupervised integration of multi-omics datasets.
A collection of genomic language models for predicting variant effects and evolutionary constraints from DNA sequences.
A deep convolutional neural network that predicts RNA-seq coverage at 32 bp resolution from DNA sequence.
A bi-directional equivariant transformer for long-range DNA sequence modeling, enabling reverse-complement aware genomic analysis.
A curated list of web-based interactive visualization tools for exploring biological data across genomics, transcriptomics, and other omics fields.
An R package to interact with NCBI's Entrez system, enabling programmatic search and retrieval of biological data.
A probabilistic cell segmentation method for spatial transcriptomics data from platforms like Xenium, CosMx, MERSCOPE, and Visium HD.
A Python library for fast, memory-efficient genomics data operations using DataFrames.
A curated collection of databases, software, and papers for computational biology research.
A functional bioinformatics library for Scala providing strongly-typed DNA/RNA/protein sequences, transcription, translation, and alignment utilities.
A suite of tools (wham and whamg) for sensitive and accurate structural variant detection and association testing from genomic sequencing data.
A Ruby on Rails web application for querying and exploring drug-gene interaction data from The Genome Institute's database.