A Python package for benchmarking and evaluating single-cell genomics data integration methods.
scib is a Python package for benchmarking and evaluating data integration tools in single-cell genomics. It provides a standardized set of metrics and workflows to assess how well integration methods remove technical batch effects while preserving biologically meaningful variation. The package was developed for a large-scale benchmarking study that compared 16 integration methods on 85 batches of data grouped into 13 integration tasks.
scib is aimed at bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq or ATAC-seq data who need to integrate multiple datasets and evaluate the quality of the result.
Developers choose scib because it offers a comprehensive, reproducible, and extensible framework designed specifically for benchmarking single-cell data integration. It is backed by a peer-reviewed study and integrates with the scanpy ecosystem.
Benchmarking analysis of data integration tools
Implements over a dozen metrics for batch correction (e.g., Batch ASW, kBET) and biological conservation (e.g., ARI, cell type ASW), providing a standardized evaluation framework as detailed in the README table.
Built on the scanpy library, so preprocessing and integration workflows run inside a familiar single-cell analysis environment with little extra implementation work.
Includes a separate scib-pipeline repository that automates comparisons across preprocessing combinations and integration methods, so the analyses from the original Nature Methods study can be reproduced.
Supports optional dependencies for R-based methods and additional tools, allowing customization for specific integration needs, though installation requires manual steps as noted in the README.
Requires manual installation of optional dependencies, such as R packages like kBET, which can be cumbersome and error-prone for users not familiar with mixed Python/R environments.
Focused exclusively on single-cell RNA-seq and ATAC-seq data, making it unsuitable for benchmarking integration in other modalities like spatial transcriptomics or proteomics without significant adaptation.
The full benchmarking process, which runs multiple metrics over multiple preprocessing combinations, can be slow and resource-heavy, making it a poor fit for rapid prototyping or very large datasets.
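To make the two metric families mentioned above more concrete, here is a minimal pure-Python sketch of one metric from each: ARI (biological conservation) and a simplified batch ASW (batch correction). It is an illustration only and does not use or reproduce scib's own code. The function names `adjusted_rand_index`, `silhouette_widths`, and `batch_asw` are hypothetical, the toy coordinates are made up, and the batch ASW variant assumes the common formulation of averaging 1 - |silhouette| within each cell type and then across cell types.

```python
from collections import Counter, defaultdict
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of at least two cells (1.0 = same partition)."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in pairs.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def silhouette_widths(points, labels):
    """Per-point silhouette width (Euclidean); needs at least two labels."""
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        by_label = defaultdict(list)
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_label[lab].append(dist(p, q))
        if not by_label.get(own):  # singleton cluster: width defined as 0
            widths.append(0.0)
            continue
        a = sum(by_label[own]) / len(by_label[own])
        b = min(sum(d) / len(d) for lab, d in by_label.items() if lab != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def batch_asw(points, batches, cell_types):
    """Simplified batch ASW: mean of 1 - |silhouette on batch| per cell type.

    Assumption: cell types that contain a single batch are skipped.
    Higher values mean batches are better mixed within each cell type.
    """
    groups = defaultdict(list)
    for idx, cell_type in enumerate(cell_types):
        groups[cell_type].append(idx)
    scores = []
    for idxs in groups.values():
        group_batches = [batches[i] for i in idxs]
        if len(set(group_batches)) < 2:
            continue
        widths = silhouette_widths([points[i] for i in idxs], group_batches)
        scores.append(sum(1 - abs(w) for w in widths) / len(widths))
    if not scores:
        raise ValueError("no cell type contains more than one batch")
    return sum(scores) / len(scores)


# Toy embedding (made-up coordinates): one cell type, two batches.
batches = ["b1", "b1", "b2", "b2"]
cell_types = ["T cell"] * 4
mixed = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
separated = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

print("batch ASW, mixed:    ", batch_asw(mixed, batches, cell_types))
print("batch ASW, separated:", batch_asw(separated, batches, cell_types))
print("ARI, relabeled copy: ", adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In this sketch a higher batch ASW indicates better batch mixing, while a higher ARI indicates that clusters agree more closely with the annotated cell types. A benchmark such as the one described here reports both kinds of score because an integration method can improve one at the expense of the other.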