Genomics

43 projects

Showing 36 of 43 projects

biopython · Python

Official Git repository for Biopython (originally converted from CVS).

#protein-structure #phylogenetics #protein

Stars: 5.1k

Forks: 1.9k

Last commit: 14 days ago

Bioinformatics

A curated list of awesome open-source bioinformatics software, libraries, and resources, primarily for command-line analysis.

#workflow-management #scientific-computing #command-line-tools

Stars: 4.2k

Forks: 722

Last commit: 3 months ago

STAR · C

A fast RNA-seq aligner for mapping spliced transcript sequences to a reference genome.

#transcriptomics #next-generation-sequencing #computational-biology

Stars: 2.2k

Forks: 547

Last commit: 1 year ago

deeplearning-biology

A curated list of deep learning implementations and resources for biological research, with a focus on genomics.

#deep-learning #single-cell-analysis #protein-structure-prediction

Stars: 2.2k

Forks: 487

Last commit: 4 months ago

TileDB · C++

An embeddable C++ storage engine for dense and sparse multi-dimensional arrays, dataframes, and key-value stores.

#multi-dimensional-arrays #c-plus-plus-library #scientific-computing

Stars: 2.1k

Forks: 214

Last commit: 20 days ago

samtools · C

A suite of command-line tools for manipulating SAM, BAM, and CRAM files in next-generation sequencing data analysis.

#command-line-tools #sequencing-data #genomics

Stars: 1.9k

Forks: 612

Last commit: 11 days ago
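samtools (and htslib below) operate on SAM records. Each alignment line carries 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), followed by optional TAG:TYPE:VALUE fields. The sketch below is a plain-Python illustration of that record layout, not samtools' or htslib's API; `SamRecord` and `parse_sam_line` are hypothetical names, and header lines (starting with `@`) are assumed to be filtered out beforehand.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    """One SAM alignment line: the 11 mandatory fields plus optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict = field(default_factory=dict)

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)   # FLAG bit 0x4: segment unmapped

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # FLAG bit 0x10: SEQ is reverse complemented


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single non-header SAM line (header lines start with '@')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]

    tags = {}
    for item in cols[11:]:
        name, typ, value = item.split(":", 2)  # TAG:TYPE:VALUE
        if typ == "i":
            tags[name] = int(value)
        elif typ == "f":
            tags[name] = float(value)
        else:
            tags[name] = value  # other types (A, Z, H, B) kept as raw strings
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, tags)


rec = parse_sam_line("read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(rec.is_reverse, rec.tags)  # prints: True {'NM': 0}
```

For real data, samtools or an htslib binding handles BAM/CRAM decoding, indexing, and the many edge cases this sketch ignores.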

Informatics for RNA-seq: A web resource for analysis on the cloud · R

An educational tutorial and working demonstration pipeline for RNA-seq analysis on cloud platforms.

#transcriptomics #next-generation-sequencing #educational-tutorial

Stars: 1.4k

Forks: 613

Last commit

RNA-seq Analysis · Python

A comprehensive collection of notes, tutorials, and resources for RNA-seq data analysis, covering alignment, quantification, differential expression, and more.

#transcriptomics #bioconductor #single-cell-rna-seq

An open-source, Python-based data analysis tool with specialized data types and methods for genomic data at scale.

#scientific-computing #spark #python-library

Stars: 1.1k

Forks: 265

Last commit: 3 days ago

ADAM · Scala

A genomics analysis platform that uses Apache Spark to parallelize genomic data processing across clusters, replacing traditional file-based workflows.

#genomic-data #apache-spark #parquet

Stars: 1.1k

Forks: 314

Last commit: 4 months ago

ClawBio · Python

A bioinformatics-native AI agent skill library for reproducible, local-first genomic analysis, built on OpenClaw.

#equity #reproducible-research #cli-tool

Stars: 1.0k

Forks: 226

Last commit: 2 days ago

Bcbio · Python

A validated, scalable, community-developed pipeline for variant calling, RNA-seq, and small RNA analysis in genomic sequencing.

#community-driven #high-throughput-sequencing #genomics

Stars: 1.0k

Forks: 356

Last commit: 1 year ago

htslib · C

A C library for reading and writing high-throughput sequencing data formats such as SAM, CRAM, and VCF.

#c-library #high-throughput-sequencing #bcf

Stars: 938

Forks: 470

Last commit: 12 days ago

ggtree · R

An R package for visualizing and annotating phylogenetic trees and other tree-like structures using the grammar of graphics.

#scientific-visualization #tree-annotation #r-package

Stars: 928

Forks: 183

Last commit: 2 months ago

Nucleotide Transformer · Jupyter Notebook

A collection of transformer-based foundation models for genomics and transcriptomics, enabling tasks like sequence analysis, functional prediction, and conversational DNA exploration.

#transformer #transcriptomics #jax

Stars: 897

Forks: 95

Last commit

ChIP-seq analysis notes from Tommy Tang · Python

A comprehensive collection of notes, tools, and resources for analyzing ChIP-seq and related epigenomic data.

#histone-modifications #peak-calling #chromatin

A long-range genomic foundation model that processes DNA sequences of up to 1 million nucleotides at single-nucleotide resolution.

#deep-learning #dna-sequencing #genomics

Stars: 797

Forks: 107

Last commit: 1 year ago

DNABERT · Python

A pre-trained BERT model designed for DNA sequence analysis, enabling genome understanding tasks like classification and motif discovery.

#transformer-model #kmer #deep-learning

Stars: 767

Forks: 179

Last commit: 5 months ago
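DNABERT's #kmer tag refers to how it tokenizes DNA: a sequence is split into overlapping k-mers (the original models were released for k from 3 to 6) before being passed to a BERT encoder. Below is a minimal sketch of that tokenization step, assuming a BERT-style vocabulary of five special tokens followed by all 4^k k-mers over ACGT. The function names and token ordering are hypothetical and will not match DNABERT's own vocabulary files.

```python
from itertools import product

# BERT-style special tokens (assumed layout, not DNABERT's actual vocab file).
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_tokenize(seq: str, k: int = 6) -> list:
    """Split a DNA sequence into overlapping k-mers (stride 1); empty if len(seq) < k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int) -> dict:
    """Map the special tokens plus all 4**k k-mers over ACGT to integer ids."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {tok: i for i, tok in enumerate([*SPECIAL_TOKENS, *kmers])}


def encode(seq: str, vocab: dict, k: int = 6) -> list:
    """Wrap k-mer ids in [CLS] ... [SEP]; k-mers containing non-ACGT bases map to [UNK]."""
    ids = [vocab.get(kmer, vocab["[UNK]"]) for kmer in kmer_tokenize(seq, k)]
    return [vocab["[CLS]"], *ids, vocab["[SEP]"]]


print(kmer_tokenize("ACGTAC", k=3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Overlapping k-mers keep single-base resolution at the cost of a sequence of nearly the same length as the input; non-overlapping k-mers would shorten it k-fold.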

Vcflib · C++

A C++ library and command-line toolkit for parsing, manipulating, and analyzing VCF (Variant Call Format) files in bioinformatics.

#structural-variants #vcf-manipulation #command-line-tools

Stars: 682

Forks: 222

Last commit: 4 months ago
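A VCF data line starts with eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), where INFO is a semicolon-separated list of key=value pairs and value-less flags. The sketch below parses one such line in plain Python to show the layout Vcflib's tools work with. It is not Vcflib's C++ API; `parse_vcf_line` and `parse_info` are hypothetical names, and the sketch ignores header lines, per-sample genotype columns, and type coercion of INFO values.

```python
def parse_info(info: str) -> dict:
    """Split an INFO column into key=value pairs; value-less keys are flags."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # Flag-type INFO entries carry no value
    return fields


def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line (genotype columns ignored)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(cols)}")
    chrom, pos, var_id, ref, alt, qual, filt, info = cols[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position of the first REF base
        "id": None if var_id == "." else var_id,
        "ref": ref,
        "alt": alt.split(","),                # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": parse_info(info),
    }


rec = parse_vcf_line("chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=14;DB")
print(rec["alt"], rec["info"])  # ['G', 'T'] {'DP': '14', 'DB': True}
```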

Harmony · R

Fast, sensitive, and accurate integration of single-cell RNA-seq data across multiple datasets, batches, or experimental conditions.

#scrna-seq #algorithm #r-package

Stars: 665

Forks: 110

Last commit: 1 month ago

DoubletFinder · R

An R package that predicts doublets (two or more cells captured together and sequenced as a single cell) in single-cell RNA sequencing data using artificial nearest-neighbor analysis.

#r-package #doublet-detection #single-cell-rna-seq
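The artificial nearest-neighbor idea can be sketched in a few lines: simulate artificial doublets by averaging the profiles of random pairs of real cells, merge them with the real cells, and score each real cell by the proportion of its k nearest neighbors that are artificial (pANN). Real cells that sit among many artificial doublets are likely doublets themselves. This is a simplified Python illustration, not DoubletFinder's R implementation: it uses raw vectors and Euclidean distance, whereas DoubletFinder works in a reduced (PCA) space after normalization, and the function names are hypothetical.

```python
import math
import random


def make_artificial_doublets(cells, n_doublets, seed=0):
    """Average the profiles of randomly chosen pairs of distinct real cells."""
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)
        doublets.append([(x + y) / 2 for x, y in zip(cells[a], cells[b])])
    return doublets


def pann_scores(cells, artificial, k):
    """Fraction of each real cell's k nearest neighbors that are artificial doublets."""
    merged = [(vec, False) for vec in cells] + [(vec, True) for vec in artificial]
    scores = []
    for i, cell in enumerate(cells):
        neighbors = sorted(
            (math.dist(cell, vec), is_artificial)
            for j, (vec, is_artificial) in enumerate(merged)
            if j != i  # exclude the cell itself
        )[:k]
        scores.append(sum(flag for _, flag in neighbors) / k)
    return scores


cells = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11],
         [5.5, 5.5]]  # the last cell sits between the two clusters
artificial = make_artificial_doublets(cells, n_doublets=20, seed=0)
print(pann_scores(cells, artificial, k=3))
```

In practice the number of artificial doublets and the neighborhood size (DoubletFinder's pN and pK parameters) strongly affect the scores, and the pANN cutoff is chosen from the expected doublet rate.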

A foundation model for multi-species genome understanding, achieving state-of-the-art performance on 28 genomic tasks.

#transformer #promoter #dna-processing

Stars: 507

Forks: 101

Last commit: 6 months ago

The Leek group guide to genomics papers

A curated reading list of foundational genomics papers for computational biologists and statistical genomics students.

#statistical-genomics #bioconductor #research-papers

Stars: 505

Forks: 175

Last commit: 7 years ago

SCENIC · HTML

An R package to infer gene regulatory networks and identify cell types from single-cell RNA-seq data.

#r-package #bioconductor #single-cell-rna-seq

Stars: 490

Forks: 99

Last commit: 2 years ago

Basenji · Python

A deep learning toolkit for predicting regulatory activity, 3D genome folding, and mRNA half-life from DNA/RNA sequences.

#deep-learning #dna-sequencing #variant-scoring

Stars: 473

Forks: 137

Last commit: 6 months ago

MOFA+ · R

A factor analysis framework for unsupervised integration of multi-omics datasets.

#transcriptomics #data-integration #mofa

Stars: 411

Forks: 63

Last commit: 8 days ago

GPN (Genomic Pre-trained Network) · Jupyter Notebook

A collection of genomic language models for predicting variant effects and evolutionary constraints from DNA sequences.

#variant-effect-prediction #transformer-models #deep-learning

A deep convolutional neural network that predicts RNA-seq coverage at 32 bp resolution from DNA sequence.

#deep-learning #computational-biology #genomics

Stars: 254

Forks: 33

Last commit: 10 months ago

Caduceus · Python

A bi-directional, reverse-complement-equivariant sequence model (built on Mamba blocks) for long-range DNA modeling, giving genomic analyses that are aware of both DNA strands.

#transformer #dna-sequence-modeling #masked-language-modeling

Stars: 249

Forks: 42

Last commit: 4 months ago
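Reverse-complement awareness matters because a double-stranded DNA region can be read from either strand, so a prediction should not depend on which strand the input came from. Caduceus builds this symmetry into its architecture through parameter sharing; the sketch below instead shows a simpler, different technique (post-hoc averaging of a scorer's output on a sequence and on its reverse complement) that makes any scorer RC-invariant. `rc_invariant` and the toy `count_a` scorer are hypothetical.

```python
# Complement table for upper- and lower-case bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse: the same region read from the other strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def rc_invariant(score_fn):
    """Wrap a scorer so a sequence and its reverse complement get the same value."""
    def wrapped(seq: str) -> float:
        return 0.5 * (score_fn(seq) + score_fn(reverse_complement(seq)))
    return wrapped


def count_a(seq: str) -> float:
    """Toy, strand-sensitive scorer (hypothetical): number of 'A' bases."""
    return float(seq.upper().count("A"))


score = rc_invariant(count_a)
print(count_a("AAAC"), count_a("GTTT"))  # 3.0 0.0  (strand-dependent)
print(score("AAAC"), score("GTTT"))      # 1.5 1.5  (strand-independent)
```

Averaging doubles inference cost and only gives invariance at the output, whereas an equivariant architecture enforces the symmetry in every layer without a second forward pass.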

Biological Visualizations

A curated list of web-based interactive visualization tools for exploring biological data across genomics, transcriptomics, and other omics fields.

#transcriptomics #biology #open-science

Stars: 238

Forks: 22

Last commit: 2 years ago

rentrez · R

An R package to interact with NCBI's Entrez system, enabling programmatic search and retrieval of biological data.

#r-package #reproducible-research #biological-databases

Stars: 218

Forks: 41

Last commit: 4 days ago

polars-bio · Python

A Python library for blazing-fast, memory-efficient genomics data operations using DataFrames.

#apache-arrow #high-performance #dataframe

Stars: 183

Forks: 33

Last commit: 14 days ago

Proseg · Rust

A probabilistic cell segmentation method for spatial transcriptomics data from platforms like Xenium, CosMx, MERSCOPE, and Visium HD.

#probabilistic-modeling #single-cell-analysis #computational-biology

Stars: 180

Forks: 18

Last commit: 5 days ago

Computational Biology · Python

A curated collection of databases, software, and papers for computational biology research.

#single-cell-analysis #biological-databases #research-tools

A functional bioinformatics library for Scala providing strongly-typed DNA/RNA/protein sequences, transcription, translation, and alignment utilities.

#scientific-computing #functional-programming #dna-sequences

Stars: 115

Forks: 19

Last commit: 11 months ago