Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


scIB (Single-cell Integration Benchmarks)

MIT · Python · v1.2.1

A Python package for benchmarking and evaluating single-cell genomics data integration methods.

GitHub
423 stars · 76 forks · 0 contributors

What is scIB (Single-cell Integration Benchmarks)?

scib is a Python package for benchmarking and evaluating data integration tools in single-cell genomics. It provides a standardized set of metrics and workflows to assess how well integration methods remove technical batch effects while preserving biologically meaningful variation. The package was developed to support a large-scale benchmarking study that compared 16 integration methods on 85 batches of data.

Target Audience

Bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq or ATAC-seq data who need to integrate multiple datasets and evaluate integration quality.

Value Proposition

Developers choose scib because it offers a comprehensive, reproducible, and extensible framework specifically designed for benchmarking single-cell data integration, backed by a peer-reviewed study and integration with the popular scanpy ecosystem.

Overview

Benchmarking analysis of data integration tools

Use Cases

Best For

  • Benchmarking new single-cell data integration methods against existing tools
  • Evaluating batch correction performance in combined single-cell datasets
  • Assessing biological conservation after integrating multiple single-cell experiments
  • Reproducing integration benchmarks from the original Nature Methods study
  • Streamlining preprocessing and integration workflows in single-cell analysis pipelines
  • Comparing integration methods across different preprocessing combinations

Not Ideal For

  • Projects requiring quick, one-off integration without benchmarking overhead
  • Researchers working with bulk RNA-seq or other non-single-cell omics data
  • Teams exclusively using R-based workflows without Python integration

Pros & Cons

Pros

Comprehensive Metric Suite

Implements over a dozen metrics for batch correction (e.g., Batch ASW, kBET) and biological conservation (e.g., ARI, cell type ASW), providing a standardized evaluation framework as detailed in the README table.
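
To make the biological-conservation side concrete, here is a minimal sketch of the adjusted Rand index (ARI) in plain Python. It illustrates the metric itself and is not scib's implementation or API; the function name `adjusted_rand_index` and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)

    # Contingency counts: how many cells share each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in truth, and in prediction.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: annotated cell types vs. clusters found after integration.
cell_types = ["T", "T", "T", "B", "B", "B"]
clusters_good = [0, 0, 0, 1, 1, 1]
clusters_poor = [0, 1, 0, 1, 0, 1]

ari_good = adjusted_rand_index(cell_types, clusters_good)
ari_poor = adjusted_rand_index(cell_types, clusters_poor)
print(ari_good, ari_poor)
```

A clustering that recovers the annotated cell types scores 1, while one that mixes them scores near or below 0, which is why ARI is read as a measure of how well cell identity survives integration.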

Scanpy Ecosystem Integration

Built on the popular scanpy library, streamlining preprocessing and integration workflows within a familiar single-cell analysis environment, reducing implementation friction.

Reproducible Benchmarking Pipeline

Complemented by the separate scib-pipeline repository, which automates comparisons across preprocessing combinations and integration methods so that the benchmarks from the original Nature Methods study can be reproduced.
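
The idea of crossing integration methods with preprocessing choices can be sketched as a simple grid. This is only an illustration of the concept, not how scib-pipeline is configured; the method names, the preprocessing options, and the `build_runs` helper are all hypothetical.

```python
from itertools import product

# Hypothetical names for illustration; not the configuration shipped with scib-pipeline.
methods = ["method_a", "method_b", "method_c"]
feature_sets = ["hvg", "full_feature"]
scaling = ["scaled", "unscaled"]


def build_runs(methods, feature_sets, scaling):
    """Return one run description per method and preprocessing combination."""
    return [
        {"method": m, "features": f, "scaling": s}
        for m, f, s in product(methods, feature_sets, scaling)
    ]


runs = build_runs(methods, feature_sets, scaling)
for run in runs:
    print(f"{run['method']}: {run['features']}, {run['scaling']}")
```

Each entry in `runs` would correspond to one integration job whose output is then scored with the same set of metrics, which keeps the comparison consistent across methods.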

Extensible with Optional Dependencies

Supports optional dependencies for R-based methods and additional tools, allowing customization for specific integration needs, though installation requires manual steps as noted in the README.

Cons

Complex Dependency Management

Requires manual installation of optional dependencies, such as R packages like kBET, which can be cumbersome and error-prone for users not familiar with mixed Python/R environments.

Limited to Single-Cell Genomics

Focused exclusively on single-cell RNA-seq and ATAC-seq data, making it unsuitable for benchmarking integration in other modalities like spatial transcriptomics or proteomics without significant adaptation.

Computationally Intensive

The full benchmarking process involves multiple metrics and preprocessing combinations, so it can be slow and resource-heavy. This makes it a poor fit for rapid prototyping or very large datasets.

Quick Stats

Stars: 423
Forks: 76
Contributors: 0
Open Issues: 39
Last commit: 2 months ago
Created: Since 2019

Tags

#batch-correction #data-integration #single-cell-genomics #computational-biology #python #bioinformatics #benchmarking #scanpy

Built With

  • anndata
  • Scanpy
  • pre-commit
  • Python
  • pytest

Included in

Computational Biology (122)

Related Projects

MOSES

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Stars: 979
Forks: 280
Last commit: 2 years ago

TAPE (Tasks Assessing Protein Embeddings)

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.

Stars: 740
Forks: 135
Last commit: 3 years ago

GuacaMol

Benchmarks for generative chemistry

Stars: 525
Forks: 99
Last commit: 2 years ago

ProteinGym

Official repository for the ProteinGym benchmarks

Stars: 442
Forks: 58
Last commit: 3 months ago