How do I install scib with R dependencies?

Use pip with extras, e.g., pip install "scib[rpy2]" (the quotes keep shells such as zsh from interpreting the brackets), then install R packages such as kBET from within R as described in the README. The R side is not handled by pip, so it adds a manual setup step.

What's the difference between scib and scanpy for integration?

scib extends scanpy by providing standardized metrics and workflows specifically for benchmarking integration methods, whereas scanpy is a general toolkit for single-cell analysis without built-in evaluation suites.

How do I benchmark a new integration method with scib?

Use the scib.metrics module to evaluate your method's output against the standard metrics, and integrate it into the scib-pipeline for automated comparisons with existing tools, following the documentation.
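As a rough sketch of that first step, the function below scores one integrated embedding with two of scib's silhouette-based metrics. It assumes an AnnData object whose integrated embedding is stored in adata.obsm["X_emb"] and whose obs columns are called "batch" and "cell_type"; these names and the function evaluate_embedding itself are placeholders for illustration. Keyword names have changed between scib releases, so the metric calls pass their arguments positionally; check the signatures against the installed version.

```python
def evaluate_embedding(adata, batch_key="batch", label_key="cell_type", embed="X_emb"):
    """Score one integrated embedding with two silhouette-based scib metrics.

    The default key names are assumptions; adjust them to match your AnnData object.
    """
    # Imported lazily so this file can be loaded on machines without scib installed.
    import scib

    return {
        # Batch mixing within each cell type label (higher means better mixed).
        "batch_asw": scib.metrics.silhouette_batch(adata, batch_key, label_key, embed),
        # Separation of cell types in the embedding (higher means better preserved).
        "celltype_asw": scib.metrics.silhouette(adata, label_key, embed),
    }
```

Calling evaluate_embedding(adata) on the output of each method under comparison gives one row of scores per method; the scib-pipeline repository automates this across many methods and preprocessing settings.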

Does scib work with Seurat?

Yes, scib supports Seurat v3 integration, but it requires optional dependencies and may involve converting data between Python and R, which makes the workflow harder to run end to end.

Should I use scib or Harmony for integration evaluation?

They do different jobs. Harmony is an integration method, while scib is a benchmarking framework that can evaluate Harmony alongside other methods. Use scib to assess Harmony's output on your datasets with metrics such as Batch ASW.
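To make Batch ASW more concrete, here is a small pure-Python illustration of the idea as described in the benchmarking study: within each cell type, silhouette widths are computed on the batch labels, each cell is scored as 1 - |s|, and the scores are averaged per cell type and then across cell types. Values near 1 indicate well-mixed batches. This is a teaching sketch and not scib's implementation; the function names and the toy coordinates are made up for the example.

```python
from math import dist
from statistics import mean


def silhouette_samples(points, labels):
    """Per-point silhouette widths with Euclidean distance (plain O(n^2) version)."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:  # singleton cluster: silhouette is defined as 0
            widths.append(0.0)
            continue
        # a: mean distance to the point's own cluster; b: nearest other cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in groups.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def batch_asw(points, batches, cell_types):
    """Mean of 1 - |silhouette on batch labels|, averaged per cell type."""
    per_type = []
    for cell_type in dict.fromkeys(cell_types):  # unique labels, input order
        idx = [i for i, c in enumerate(cell_types) if c == cell_type]
        sub_batches = [batches[i] for i in idx]
        if len(set(sub_batches)) < 2:
            continue  # a cell type seen in one batch says nothing about mixing
        widths = silhouette_samples([points[i] for i in idx], sub_batches)
        per_type.append(mean(1 - abs(w) for w in widths))
    if not per_type:
        raise ValueError("no cell type is present in more than one batch")
    return mean(per_type)


# Toy data: one cell type, two batches, four cells in a 2-D embedding.
batches = ["A", "A", "B", "B"]
cell_types = ["T cell"] * 4
interleaved = [(0, 0), (1, 1), (0, 1), (1, 0)]   # batches overlap
separated = [(0, 0), (0, 1), (10, 0), (10, 1)]   # batches far apart

mixed_score = batch_asw(interleaved, batches, cell_types)
separated_score = batch_asw(separated, batches, cell_types)
```

On this toy data the interleaved embedding scores higher than the separated one, which is the behaviour one would look for when comparing Harmony's output with the unintegrated data.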

Open-Awesome

scIB (Single-cell Integration Benchmarks)

MIT · Python · v1.2.1

A Python package for benchmarking and evaluating single-cell genomics data integration methods.

GitHub: 423 stars · 76 forks · 0 contributors

What is scIB (Single-cell Integration Benchmarks)?

scib is a Python package for benchmarking and evaluating data integration tools in single-cell genomics. It provides a standardized set of metrics and workflows to assess how well integration methods remove technical batch effects while preserving biologically meaningful variation. The package was developed to support a large-scale study, published in Nature Methods, that compared 16 integration methods on 13 integration tasks comprising 85 batches of data.

Target Audience

Bioinformaticians, computational biologists, and researchers working with single-cell RNA-seq or ATAC-seq data who need to integrate multiple datasets and evaluate integration quality.

Value Proposition

Developers choose scib because it offers a comprehensive, reproducible, and extensible framework specifically designed for benchmarking single-cell data integration, backed by a peer-reviewed study and integration with the popular scanpy ecosystem.

Overview

Benchmarking analysis of data integration tools

Use Cases

Best For

Benchmarking new single-cell data integration methods against existing tools
Evaluating batch correction performance in combined single-cell datasets
Assessing biological conservation after integrating multiple single-cell experiments
Reproducing integration benchmarks from the original Nature Methods study
Streamlining preprocessing and integration workflows in single-cell analysis pipelines
Comparing integration methods across different preprocessing combinations

Not Ideal For

Projects requiring quick, one-off integration without benchmarking overhead
Researchers working with bulk RNA-seq or other non-single-cell omics data
Teams exclusively using R-based workflows without Python integration

Pros & Cons

Pros

Comprehensive Metric Suite

Implements over a dozen metrics for batch correction (e.g., Batch ASW, kBET) and biological conservation (e.g., ARI, cell type ASW), providing a standardized evaluation framework as detailed in the README table.
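As an illustration of what one of these biological conservation metrics measures, the sketch below computes the adjusted Rand index (ARI) between annotated cell types and cluster assignments from scratch. It follows the standard ARI formula and uses only the Python standard library; it is not scib's code, and the function name is chosen for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same cells (1.0 = identical partitions)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs of cells to disagree on

    # Contingency counts: number of cells in each (cell type, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, or in either, labeling.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Cluster names do not matter, only which cells are grouped together.
same_partition = adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index(["T", "T", "B", "B"], [0, 1, 0, 1])
```

A score of 1 means the clusters found after integration reproduce the annotated cell types exactly, scores near 0 mean agreement no better than chance, and negative values mean less agreement than chance.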

Scanpy Ecosystem Integration

Built on the popular scanpy library, streamlining preprocessing and integration workflows within a familiar single-cell analysis environment, reducing implementation friction.

Reproducible Benchmarking Pipeline

Includes a separate scib-pipeline repository that automates comparisons across preprocessing combinations and integration methods, ensuring reproducibility from the original Nature Methods study.

Extensible with Optional Dependencies

Supports optional dependencies for R-based methods and additional tools, allowing customization for specific integration needs, though installation requires manual steps as noted in the README.

Cons

Complex Dependency Management

Requires manual installation of optional dependencies, such as R packages like kBET, which can be cumbersome and error-prone for users not familiar with mixed Python/R environments.

Limited to Single-Cell Genomics

Focused exclusively on single-cell RNA-seq and ATAC-seq data, making it unsuitable for benchmarking integration in other modalities like spatial transcriptomics or proteomics without significant adaptation.

Computationally Intensive

The comprehensive benchmarking process, involving multiple metrics and preprocessing combinations, can be slow and resource-heavy, which makes it a poor fit for rapid prototyping or very large datasets.

Related Projects

MOSES

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

Stars: 979 · Forks: 280 · Last commit: 2 years ago

TAPE (Tasks Assessing Protein Embeddings)

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.

Stars: 740 · Forks: 135 · Last commit: 3 years ago

GuacaMol

Benchmarks for generative chemistry

Stars: 525 · Forks: 99 · Last commit: 2 years ago

ProteinGym

Official repository for the ProteinGym benchmarks

Stars: 442 · Forks: 58 · Last commit: 3 months ago
