Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


DoubletFinder

R

An R package that predicts doublets (multiple cells mistaken as one) in single-cell RNA sequencing data using artificial nearest neighbor analysis.

GitHub
553 stars · 128 forks · 0 contributors

What is DoubletFinder?

DoubletFinder is an R package that detects doublets in single-cell RNA sequencing data. Doublets are technical artifacts in which two or more cells are captured together and sequenced as a single cell. If they are left in the data, they distort clustering, differential expression analysis and biological interpretation, so flagging them for removal is an important quality-control step. The package simulates artificial doublets by averaging the expression profiles of randomly chosen pairs of real cells and merges them with the real data. It then computes each real cell's proportion of artificial nearest neighbors (pANN) in PCA space. Real cells are ranked by pANN, and the top-scoring cells, up to the expected number of doublets, are predicted to be doublets.
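The core idea can be sketched in a few lines of Python. This is a simplified illustration, not DoubletFinder's R implementation. It skips normalization and PCA and measures Euclidean distance on raw toy profiles. The function names (make_artificial_doublets, pann, call_doublets) and the toy data are hypothetical. The number of artificial doublets stands in for the pN parameter and the neighborhood size k for pK; here both are set directly rather than as proportions.

```python
import math
import random


def make_artificial_doublets(cells, n_doublets, rng):
    """Simulate doublets by averaging the profiles of random pairs of real cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(range(len(cells)), 2)  # two distinct real cells
        doublets.append([(x + y) / 2 for x, y in zip(cells[a], cells[b])])
    return doublets


def pann(cells, doublets, k):
    """Proportion of artificial nearest neighbors (pANN) for each real cell."""
    merged = [(vec, False) for vec in cells] + [(vec, True) for vec in doublets]
    scores = []
    for i, cell in enumerate(cells):
        # Rank every other profile by distance; at equal distance, real cells sort first.
        ranked = sorted(
            (math.dist(cell, vec), is_artificial)
            for j, (vec, is_artificial) in enumerate(merged)
            if j != i
        )
        scores.append(sum(is_artificial for _, is_artificial in ranked[:k]) / k)
    return scores


def call_doublets(scores, n_expected):
    """Label the n_expected real cells with the highest pANN as doublets."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_expected]
    flagged = set(top)
    return ["Doublet" if i in flagged else "Singlet" for i in range(len(scores))]


# Hypothetical, noise-free toy data: two cell types plus one real A/B doublet.
cells = [[10.0, 0.0] for _ in range(20)] + [[0.0, 10.0] for _ in range(20)] + [[5.0, 5.0]]

artificial = make_artificial_doublets(cells, n_doublets=30, rng=random.Random(0))
scores = pann(cells, artificial, k=5)
labels = call_doublets(scores, n_expected=1)
print(labels[-1], scores[-1])
```

The real A/B doublet sits where averaged A/B profiles land, so most of its nearest neighbors are artificial and it gets the highest pANN. Cells inside either pure cluster score zero. Artificial A/A or B/B doublets fall on top of their parent cluster, which illustrates why homotypic doublets are hard to detect.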

Target Audience

Bioinformaticians, computational biologists, and researchers analyzing single-cell RNA sequencing data who use the Seurat toolkit in R and need to ensure high data quality by removing technical artifacts.

Value Proposition

Developers choose DoubletFinder for three reasons: dataset-specific pK selection via BCmvn, integration with Seurat workflows, and adjustment for homotypic doublets. Together these yield more accurate and interpretable single-cell analyses than generic, dataset-agnostic thresholding methods.

Overview

R package for detecting doublets in single-cell RNA sequencing data

Use Cases

Best For

  • Identifying heterotypic doublets in 10x Genomics scRNA-seq data
  • Quality control preprocessing for Seurat-based single-cell analysis pipelines
  • Validating doublet detection against ground-truth methods like Cell Hashing or Demuxlet
  • Analyzing transcriptionally diverse cell populations where doublets are likely
  • Estimating detectable doublet rates when sample multiplexing is unavailable
  • Integrating doublet removal into reproducible bioinformatics workflows in R

Not Ideal For

  • Projects built on Python single-cell tooling such as Scanpy and AnnData
  • Datasets with highly homogeneous cell populations or low transcriptional diversity
  • Aggregated scRNA-seq data from distinct samples or integrated Seurat objects
  • Researchers needing a fully automated, no-parameter-tuning doublet detection tool

Pros & Cons

Pros

Dataset-Specific Parameter Optimization

Uses the mean-variance normalized bimodality coefficient (BCmvn) to identify the optimal pK value for each dataset without ground-truth data, as detailed in the pK selection section. This tailors the neighborhood size to the data at hand.
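The pK selection step can be illustrated with a short Python sketch. It assumes that a bimodality coefficient (BC) has already been computed from the pANN distribution for every pN-pK combination in a parameter sweep. BCmvn for each pK is the mean BC across pN values divided by the variance of those BC values, and the pK with the largest BCmvn is chosen. In the R package this summary is reported by the find.pK function. The BC formula below is the common sample-adjusted definition and may differ in detail from DoubletFinder's internal one. The function names and sweep values are hypothetical.

```python
import math
import statistics


def bimodality_coefficient(values):
    """Sample bimodality coefficient; values above about 5/9 suggest bimodality."""
    n = len(values)  # needs n > 3 and non-constant values
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Sample-size-adjusted skewness and excess kurtosis.
    skew = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * (m4 / m2 ** 2 - 3) + 6) * (n - 1) / ((n - 2) * (n - 3))
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bcmvn(bc_by_pk):
    """Mean-variance normalized BC: mean(BC) / var(BC) across the pN values of each pK."""
    return {pk: statistics.fmean(bcs) / statistics.variance(bcs) for pk, bcs in bc_by_pk.items()}


def select_pk(bc_by_pk):
    """Return the pK whose BC values are both high and stable across pN."""
    scores = bcmvn(bc_by_pk)
    return max(scores, key=scores.get)


# Hypothetical sweep: BC values for three pN settings at each candidate pK.
sweep = {
    0.01: [0.40, 0.42, 0.44],
    0.05: [0.70, 0.71, 0.72],
    0.10: [0.50, 0.90, 0.60],
}
print(select_pk(sweep))
```

Dividing by the variance rewards pK values whose bimodality holds up across pN settings, not just those with a high average BC.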

Seamless Seurat Integration

Compatible with Seurat versions 2.0 through 5 and supports both standard and SCTransform workflows, as noted in the updates and dependencies sections. This simplifies integration into existing pipelines.

Homotypic Doublet Adjustment

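Uses the modelHomotypic function to estimate the share of homotypic doublets, meaning doublets formed by transcriptionally similar cells, which DoubletFinder is insensitive to. The expected doublet number is lowered accordingly, which reduces false positives, as explained in the doublet number estimation section.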
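The homotypic adjustment reduces to simple arithmetic, sketched here in Python. The homotypic proportion is taken as the sum of squared cell-type frequencies, i.e. the probability that two randomly paired cells share an annotation. To our understanding, this is what modelHomotypic computes. The 7.5% doublet rate, the annotations and the function names are hypothetical placeholders to be replaced with values appropriate for your dataset.

```python
from collections import Counter


def homotypic_proportion(annotations):
    """Probability that two randomly paired cells share the same annotation."""
    n = len(annotations)
    return sum((count / n) ** 2 for count in Counter(annotations).values())


def expected_doublets(n_cells, doublet_rate, annotations):
    """Expected doublets before and after removing the undetectable homotypic share."""
    n_exp = round(doublet_rate * n_cells)
    n_exp_adj = round(n_exp * (1 - homotypic_proportion(annotations)))
    return n_exp, n_exp_adj


# Hypothetical annotations for 10,000 cells: 60% T, 30% B, 10% monocytes.
annotations = ["T"] * 6000 + ["B"] * 3000 + ["Mono"] * 1000
print(homotypic_proportion(annotations))
print(expected_doublets(10_000, 0.075, annotations))
```

Here 0.6² + 0.3² + 0.1² = 0.46, so only about 54% of the expected doublets are treated as detectable. As the homotypic proportion approaches 1, as in a transcriptionally homogeneous sample, the number of doublets DoubletFinder can be expected to find shrinks toward zero.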

Ground-Truth Validated Performance

Validated against ground truth from sample-multiplexing methods such as Cell Hashing and Demuxlet, with benchmark results shown as figures in the project documentation. This demonstrates reliability on real datasets.
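When ground-truth labels from Cell Hashing or Demuxlet are available, DoubletFinder calls can be scored against them. The Python sketch below is a hypothetical helper, not part of the package. It assumes that predictions and ground truth are "Doublet"/"Singlet" labels for the same cell barcodes, in the same order. Sample multiplexing only reveals doublets formed between cells from different samples, so the ground truth itself is incomplete.

```python
def score_against_ground_truth(predicted, truth):
    """Sensitivity, specificity and precision of doublet calls versus ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same cells")
    pairs = list(zip(predicted, truth))
    tp = sum(p == "Doublet" and t == "Doublet" for p, t in pairs)
    fp = sum(p == "Doublet" and t == "Singlet" for p, t in pairs)
    fn = sum(p == "Singlet" and t == "Doublet" for p, t in pairs)
    tn = sum(p == "Singlet" and t == "Singlet" for p, t in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical labels for five cell barcodes.
predicted = ["Doublet", "Doublet", "Singlet", "Singlet", "Singlet"]
truth = ["Doublet", "Singlet", "Doublet", "Singlet", "Singlet"]
print(score_against_ground_truth(predicted, truth))
```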

Active Maintenance Updates

Added a new maintainer and Seurat v5 compatibility, according to the updates log, which signals ongoing support.

Cons

Manual Parameter Tuning Complexity

Requires users to pick pK by hand from the BCmvn sweep and to set the expected doublet number, adjusted for the homotypic proportion. As the best-practices notes acknowledge, these steps can be error-prone and time-consuming.

Limited Framework Compatibility

Tightly coupled with Seurat; not designed for other single-cell analysis ecosystems without data conversion, limiting flexibility for non-Seurat users.

Poor Performance on Homogeneous Data

The authors acknowledge reduced sensitivity on transcriptionally homogeneous datasets, as their simulation results show. This makes the tool less reliable in some biological contexts.

Dependency on External Annotations

Homotypic adjustment relies on literature-supported cell type annotations, which may be inaccurate or unavailable, introducing potential bias in doublet estimation.


Quick Stats

Stars: 553
Forks: 128
Contributors: 0
Open issues: 18
Last commit: 1 year ago
Created: 2018

Tags

#r-package #single-cell-rna-seq #genomics #bioinformatics

Built With

R

Included in

Computational Biology (122)

Related Projects

DeepChem

Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology

Stars: 6,820
Forks: 2,250
Last commit: 9 days ago

RDKit

The official sources for the RDKit library

Stars: 3,501
Forks: 1,029
Last commit: 3 hours ago

STAR

RNA-seq aligner

Stars: 2,218
Forks: 549
Last commit: 1 year ago

CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data

Stars: 792
Forks: 169
Last commit: 2 years ago