Fast, sensitive, and accurate integration of single-cell RNA-seq data across multiple datasets, batches, or experimental conditions.
Harmony is an R package for integrating single-cell RNA sequencing (scRNA-seq) data. It corrects the batch effects and technical variation that arise when combining datasets from different experiments, platforms, or conditions, so the combined data can be analyzed as one. The algorithm aims to preserve biological heterogeneity while removing unwanted technical artifacts.
Bioinformaticians, computational biologists, and researchers working with single-cell RNA sequencing data who need to integrate multiple datasets or correct for batch effects in their analyses.
Developers choose Harmony for its proven accuracy in preserving biological signals, seamless integration with popular tools like Seurat, and computational efficiency that handles large-scale datasets while remaining accessible through a user-friendly R interface.
Fast, sensitive and accurate integration of single-cell data with Harmony
Harmony's algorithm is optimized for efficiency and handles large datasets quickly; the README emphasizes speed and reports performance benchmarks.
It can simultaneously integrate over multiple covariates like dataset, donor, and batch ID, providing comprehensive batch effect removal for complex experimental designs.
Offers a dedicated RunHarmony() function for easy integration into popular Seurat workflows, reducing pipeline complexity and enhancing user-friendliness.
Can be used independently with PCA embeddings and metadata, allowing integration into various analysis pipelines beyond Seurat, as shown in the standalone vignette.
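As a rough illustration of the integration points listed above, the following R sketch runs Harmony standalone on a PCA embedding plus per-cell metadata and corrects for two covariates at once. The toy data and the names pca_embeddings, meta_data, corrected, and seurat_obj are placeholders invented for this example. The matrix call assumes harmony 1.0 or later, where RunHarmony() accepts an embedding matrix directly; check the documentation of your installed version before relying on it.

```r
# Sketch only: random toy data, not a real scRNA-seq analysis.
set.seed(1)

n_cells <- 300
n_pcs <- 20

# Hypothetical PCA embedding: cells in rows, principal components in columns.
pca_embeddings <- matrix(rnorm(n_cells * n_pcs), nrow = n_cells, ncol = n_pcs)
rownames(pca_embeddings) <- paste0("cell", seq_len(n_cells))
colnames(pca_embeddings) <- paste0("PC_", seq_len(n_pcs))

# Hypothetical per-cell metadata with two covariates to integrate over.
meta_data <- data.frame(
  dataset = sample(c("A", "B", "C"), n_cells, replace = TRUE),
  donor = sample(c("d1", "d2"), n_cells, replace = TRUE),
  row.names = rownames(pca_embeddings)
)

# Only call Harmony if the package is installed (assumes harmony >= 1.0).
if (requireNamespace("harmony", quietly = TRUE)) {
  # Standalone use: embeddings, metadata, and the covariates to correct for.
  corrected <- harmony::RunHarmony(pca_embeddings, meta_data, c("dataset", "donor"))
  print(dim(corrected))
}

# With a Seurat object (seurat_obj is a placeholder), the corresponding call
# would look like this:
# seurat_obj <- harmony::RunHarmony(seurat_obj, c("dataset", "donor"))
```

The covariate names passed to RunHarmony() must match column names in the metadata; passing a single name such as "dataset" corrects for that covariate only.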
Performance depends heavily on the linear algebra backend that R is linked against (reference BLAS vs. OpenBLAS), so users may need to configure their R environment for good speed, which can be complex and non-intuitive.
Harmony is not optimized for multi-threading: it turns off parallelization by default, and large datasets require manual tuning of the ncores parameter, which can leave available CPU cores underused.
Harmony is primarily an R package; Python support comes from a separate community project (harmonypy), which fragments the tool's ecosystem and limits accessibility for non-R users.
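For the backend and threading limitations above, a quick first step is to see which BLAS library the current R session uses. The sketch below relies only on base R functions (extSoftVersion() and La_library(), available since R 3.4); the check for "openblas" in the library path is a heuristic assumed for this example, not an official detection method, and the ncores value in the trailing comment is purely illustrative.

```r
# Report which BLAS and LAPACK libraries this R session is linked against.
blas_path <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Heuristic only: guess the backend from the library file name.
# An empty or unusual path will simply report FALSE.
uses_openblas <- grepl("openblas", blas_path, ignore.case = TRUE)
cat("Looks like OpenBLAS:", uses_openblas, "\n")

# Multi-threading is off by default in Harmony. For large datasets it can be
# requested through the ncores parameter (placeholder objects, illustrative value):
# corrected <- harmony::RunHarmony(pca_embeddings, meta_data, "dataset", ncores = 4)
```

Whether more cores help depends on the dataset size and the backend, so it is worth timing a run with and without ncores before settling on a value.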