How do I choose the best pK value in DoubletFinder?

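Run paramSweep over a range of pN and pK values, summarize the sweep with summarizeSweep, and pass the summary to find.pK, which computes the mean-variance normalized bimodality coefficient (BCmvn) for each pK. Select the pK at the BCmvn maximum. The README includes example code and recommends spot-checking the results in gene expression space when the choice of pK is ambiguous.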
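The selection rule itself is simple arithmetic. The Python sketch below is not DoubletFinder's API: it assumes you already have a bimodality coefficient (BC) for every (pN, pK) pair in the sweep, stored in a hypothetical dictionary keyed by pK, and it computes BCmvn as the mean of BC divided by its variance across the pN values before taking the pK with the largest score. The function names are illustrative.

```python
from statistics import mean, variance

def bcmvn_by_pk(bc_table):
    """Map each pK to BCmvn = mean(BC) / variance(BC) across the pN values tried.

    bc_table is a hypothetical layout: {pK: [BC at pN_1, BC at pN_2, ...]}.
    variance() is the sample variance, so each pK needs at least two BC values.
    """
    return {pk: mean(bcs) / variance(bcs) for pk, bcs in bc_table.items()}

def choose_pk(bc_table):
    """Return the pK whose BCmvn is largest."""
    scores = bcmvn_by_pk(bc_table)
    return max(scores, key=scores.get)

# Made-up BC values for three pK candidates, three pN values each
sweep = {0.01: [0.50, 0.52, 0.51], 0.09: [0.80, 0.81, 0.79], 0.2: [0.60, 0.90, 0.30]}
print(choose_pk(sweep))  # → 0.09
```

A pK whose BC is both high and stable across pN values wins; a pK with a high but erratic BC (like 0.2 above) is penalized by its variance.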

Can I run DoubletFinder on merged data from multiple 10x lanes?

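Only if all lanes contain the same sample. The README advises against running DoubletFinder on aggregated data from distinct samples or on integrated Seurat objects: artificial doublets built from cells of different samples represent combinations that cannot exist in the real data and skew the results, as detailed in the best practices section.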

DoubletFinder vs Scrublet: which is better?

DoubletFinder is built for R and Seurat and offers dataset-specific parameter tuning and homotypic adjustment, while Scrublet is Python-based and may be simpler for Scanpy users. The choice mostly depends on your pipeline: DoubletFinder fits naturally into Seurat workflows but requires more manual setup.

How do I adjust for homotypic doublets?

Pass cell type annotations to the modelHomotypic function to estimate the homotypic doublet proportion, then reduce the expected doublet count (nExp) by that proportion, because DoubletFinder is largely insensitive to homotypic doublets. The README provides example code and cautions that annotations may not fully capture transcriptional divergence.
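Assuming modelHomotypic's estimate is the sum of squared cell-type frequencies (the probability that two randomly paired cells share an annotation), the adjustment follows the README's nExp_poi and nExp_poi.adj pattern. The Python sketch below reproduces only that arithmetic; the function names, the 7.5% doublet rate, and the annotation counts are illustrative placeholders, not recommendations.

```python
from collections import Counter

def homotypic_proportion(annotations):
    """Sum of squared cell-type frequencies, i.e. the chance that two
    randomly paired cells carry the same annotation."""
    n = len(annotations)
    return sum((count / n) ** 2 for count in Counter(annotations).values())

def adjusted_nexp(annotations, doublet_rate):
    """Expected doublets from a fixed rate, minus the homotypic share."""
    n_exp = round(doublet_rate * len(annotations))                   # analogous to nExp_poi
    return round(n_exp * (1 - homotypic_proportion(annotations)))   # analogous to nExp_poi.adj

# Hypothetical 4,000-cell dataset with three annotated cell types
labels = ["T"] * 2000 + ["B"] * 1200 + ["Mono"] * 800
print(homotypic_proportion(labels))   # about 0.38
print(adjusted_nexp(labels, 0.075))   # 300 expected doublets, 186 after adjustment
```

The more a dataset is dominated by one annotation, the larger the homotypic proportion and the fewer doublets DoubletFinder should be asked to call.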

Is DoubletFinder compatible with the latest Seurat version?

Yes. A November 2023 update made it compatible with Seurat v5, and a maintainer was added in 2025 for ongoing support. Install it from GitHub to make sure you have the latest version.

What's a common mistake when using DoubletFinder?

Overestimating the number of detectable doublets by skipping homotypic adjustment, or using an incorrect pK value. The README emphasizes choosing pK with BCmvn and modeling homotypic proportions to avoid false positives, as illustrated in its example code.

How do I estimate doublet rates without ground truth labels?

Use Poisson statistics based on cell loading density from platform guides, then adjust for homotypic doublets with modelHomotypic. The README's FAQ links to issues discussing anticipated rates and provides strategies for estimation.
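As an illustration of the Poisson reasoning, the sketch below assumes cells are loaded into droplets as a Poisson process with mean λ cells per droplet (`lam`, a hypothetical input you would derive from your platform's loading guide) and returns the fraction of cell-containing droplets that hold two or more cells. Real platform figures also reflect capture efficiency and other factors, so treat the result as a rough starting point rather than a replacement for the vendor's tables.

```python
import math

def multiplet_fraction(lam):
    """Fraction of cell-containing droplets that hold two or more cells,
    assuming Poisson loading with mean `lam` cells per droplet."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    p0 = math.exp(-lam)            # P(droplet is empty)
    p1 = lam * math.exp(-lam)      # P(droplet holds exactly one cell)
    return (1 - p0 - p1) / (1 - p0)

# Hypothetical loading density and recovered cell count
lam, n_cells = 0.1, 5000
n_exp = round(multiplet_fraction(lam) * n_cells)
print(n_exp)
```

For small λ the fraction is close to λ/2, which is why doublet rates climb roughly linearly with the number of cells loaded.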

DoubletFinder — R Package for scRNA-seq Doublets

What is DoubletFinder?

DoubletFinder is an R package that detects doublets in single-cell RNA sequencing data: technical artifacts in which two or more cells are captured together and reported as a single cell. Left in place, doublets contaminate the data, so identifying and removing them is critical for accurate clustering, differential expression analysis, and biological interpretation. DoubletFinder generates artificial doublets from the existing data, merges them with the real cells, and predicts real doublets from each cell's proportion of artificial k-nearest neighbors (pANN) in PCA space.
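To make the pANN idea concrete, here is a deliberately simplified Python sketch, not DoubletFinder's code. It assumes each cell is already a vector of PCA coordinates and builds artificial doublets by averaging those coordinates, whereas DoubletFinder averages gene expression profiles and re-runs preprocessing and PCA on the merged data. pN and pK follow their roles in the package: pN sets the share of artificial doublets in the merged data, and pK sets the neighborhood size as a fraction of it. The helper names are illustrative.

```python
import math
import random

def make_artificial_doublets(cells, n_doublets, rng):
    """Average the coordinates of randomly chosen pairs of distinct real cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)
        doublets.append([(x + y) / 2 for x, y in zip(cells[a], cells[b])])
    return doublets

def pann(cells, pN=0.25, pK=0.09, seed=0):
    """Proportion of artificial nearest neighbors (pANN) for each real cell.

    cells: list of coordinate vectors, assumed to already be in PCA space.
    pN: fraction of the merged data made up of artificial doublets.
    pK: neighborhood size as a fraction of the merged data.
    """
    rng = random.Random(seed)
    n_art = round(len(cells) * pN / (1 - pN))   # so that n_art / (n_real + n_art) is about pN
    merged = cells + make_artificial_doublets(cells, n_art, rng)
    is_art = [False] * len(cells) + [True] * n_art
    k = max(1, round(pK * len(merged)))
    scores = []
    for i, cell in enumerate(cells):
        # Distances from this real cell to every other point in the merged set
        dists = sorted((math.dist(cell, merged[j]), j) for j in range(len(merged)) if j != i)
        scores.append(sum(is_art[j] for _, j in dists[:k]) / k)
    return scores
```

A real doublet sitting between two clusters is surrounded by artificial doublets made from those clusters, so its pANN is high; DoubletFinder then ranks cells by pANN and flags the top nExp as doublets.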

Target Audience

Bioinformaticians, computational biologists, and researchers analyzing single-cell RNA sequencing data who use the Seurat toolkit in R and need to ensure high data quality by removing technical artifacts.

Value Proposition

Users choose DoubletFinder for its dataset-specific parameter optimization via BCmvn, its seamless integration with Seurat workflows, and its ability to adjust for homotypic doublets, which together lead to more accurate and interpretable single-cell analyses than generic thresholding methods.

R package for detecting doublets in single-cell RNA sequencing data

Use Cases

Best For

Identifying heterotypic doublets in 10x Genomics scRNA-seq data
Quality control preprocessing for Seurat-based single-cell analysis pipelines
Validating doublet detection against ground-truth methods like Cell Hashing or Demuxlet
Analyzing transcriptionally diverse cell populations where doublets are likely
Researchers needing to estimate detectable doublet rates without sample multiplexing
Integrating doublet removal into reproducible bioinformatics workflows in R

Not Ideal For

Projects using Python-based single-cell analysis pipelines like Scanpy or AnnData
Datasets with highly homogeneous cell populations or low transcriptional diversity
Aggregated scRNA-seq data from distinct samples or integrated Seurat objects
Researchers needing a fully automated, no-parameter-tuning doublet detection tool

Pros & Cons

Pros

Dataset-Specific Parameter Optimization

Uses the mean-variance normalized bimodality coefficient (BCmvn) to identify the optimal pK value for each dataset, as detailed in the pK selection section, so parameters are tailored to the data at hand.
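BCmvn is built from bimodality coefficients (BC) computed across the parameter sweep. As a self-contained illustration, the sketch below implements a commonly used finite-sample bimodality coefficient in Python; whether DoubletFinder's R dependencies use exactly this estimator is an assumption here. Values above 5/9 (the value for a uniform distribution) are conventionally read as evidence of bimodality.

```python
import math

def bimodality_coefficient(xs):
    """Finite-sample bimodality coefficient
    BC = (G1^2 + 1) / (G2 + 3(n-1)^2 / ((n-2)(n-3))),
    where G1 and G2 are the bias-corrected sample skewness and excess kurtosis."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four values")
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("values must not all be equal")
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3        # population excess kurtosis
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (G1 ** 2 + 1) / (G2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(bimodality_coefficient([0] * 10 + [1] * 10))          # two modes: about 0.77
print(bimodality_coefficient([0, 1, 2, 2, 2, 2, 2, 3, 4]))  # one sharp peak: about 0.17
```

A well-chosen pK should make the pANN distribution clearly bimodal (a large group of singlets and a smaller group of likely doublets), which is what a high BC captures.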

Seamless Seurat Integration

Compatible with Seurat versions 2.0 through 5 and supports both standard and SCTransform workflows, as noted in updates and dependencies, simplifying integration into existing pipelines.

Homotypic Doublet Adjustment

Accounts for doublets formed from transcriptionally similar cell states via the modelHomotypic function, which lowers the expected doublet count and reduces false positives, as explained in the doublet number estimation section.

Ground-Truth Validated Performance

Validated against sample-multiplexing methods such as Cell Hashing and Demuxlet, with results shown in the README figures, demonstrating reliability on real datasets.
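Validation of this kind comes down to comparing two sets of cell barcodes. The minimal Python sketch below assumes you have exported DoubletFinder's doublet calls and the multiplexing method's doublet calls as lists of barcodes; the function name and barcode strings are made up. Keep in mind that hashing or genotype-based demultiplexing only flags doublets formed between different samples, so within-sample doublets that DoubletFinder detects will appear as false positives in this comparison.

```python
def compare_calls(predicted, ground_truth):
    """Precision and recall of predicted doublet barcodes against
    ground-truth doublet barcodes (e.g. from sample multiplexing)."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_pos = len(predicted & ground_truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical barcodes: precision 0.5, recall about 0.67
print(compare_calls(["AAAC", "AAAG", "AACT", "ACGT"], ["AAAG", "AACT", "TTTA"]))
```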

Active Maintenance Updates

Recently gained a new maintainer and Seurat v5 compatibility, per the updates log, ensuring ongoing support.

Cons

Manual Parameter Tuning Complexity

Requires users to manually estimate pK using BCmvn and adjust doublet numbers for homotypic proportions, which can be error-prone and time-consuming, as acknowledged in best practices.

Limited Framework Compatibility

Tightly coupled with Seurat; not designed for other single-cell analysis ecosystems without data conversion, limiting flexibility for non-Seurat users.

Poor Performance on Homogeneous Data

Has reduced sensitivity in transcriptionally homogeneous datasets, as the README's simulation results show, making it less reliable in certain biological contexts.

Dependency on External Annotations

Homotypic adjustment relies on literature-supported cell type annotations, which may be inaccurate or unavailable, introducing potential bias in doublet estimation.
