An R package that predicts doublets (multiple cells mistaken for one) in single-cell RNA sequencing data using artificial nearest neighbor analysis.
DoubletFinder is an R package that detects doublets (technical artifacts in which two or more cells are captured together and recorded as a single cell) in single-cell RNA sequencing data. Doublets contaminate downstream analysis, so identifying and removing them is critical for accurate clustering, differential expression analysis, and biological interpretation. The package generates artificial doublets from the existing data, merges them with the real cells, and computes, for each real cell, the proportion of artificial doublets among its k nearest neighbors in PCA space (pANN). Real cells with a high proportion are predicted to be doublets.
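The artificial-doublet strategy described above can be illustrated with a small, language-neutral sketch. This is not DoubletFinder's R code: it is a minimal Python approximation that treats each cell as a point in a low-dimensional space (standing in for PCA coordinates), and the function names `simulate_doublets` and `pann_scores` are hypothetical. The package itself works on a Seurat object and re-runs preprocessing on the merged data, which this sketch skips.

```python
import math
import random


def simulate_doublets(real_cells, pN, rng):
    """Average random pairs of real cells to create artificial doublets.

    pN is the fraction of the merged (real + artificial) data that is artificial.
    """
    n_real = len(real_cells)
    n_artificial = round(n_real * pN / (1 - pN))
    doublets = []
    for _ in range(n_artificial):
        a, b = rng.sample(real_cells, 2)  # two distinct real cells
        doublets.append(tuple((x + y) / 2 for x, y in zip(a, b)))
    return doublets


def pann_scores(real_cells, artificial_cells, k):
    """For each real cell, return the share of artificial cells among its k nearest neighbors."""
    merged = list(real_cells) + list(artificial_cells)
    n_real = len(real_cells)
    scores = []
    for i in range(n_real):
        # Rank every other point in the merged data by Euclidean distance to cell i.
        by_distance = sorted(
            (j for j in range(len(merged)) if j != i),
            key=lambda j: math.dist(merged[i], merged[j]),
        )
        neighbors = by_distance[:k]
        # Indices >= n_real belong to artificial doublets.
        n_artificial_neighbors = sum(1 for j in neighbors if j >= n_real)
        scores.append(n_artificial_neighbors / k)
    return scores


# Toy data: two tight clusters plus one cell sitting between them.
real = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
artificial = simulate_doublets(real, pN=0.25, rng=random.Random(0))
print(pann_scores(real, artificial, k=2))
```

A cell that sits between clusters tends to be surrounded by artificial doublets built from cells of both clusters, so it receives a high score. Cells deep inside a cluster are mostly surrounded by other real cells.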
Bioinformaticians, computational biologists, and researchers analyzing single-cell RNA sequencing data who use the Seurat toolkit in R and need to ensure high data quality by removing technical artifacts.
Users choose DoubletFinder for its dataset-specific parameter optimization via BCmvn, its integration with Seurat workflows, and its ability to adjust for homotypic doublets, which together give more accurate and interpretable single-cell analyses than generic thresholding methods.
R package for detecting doublets in single-cell RNA sequencing data
Uses the mean-variance normalized bimodality coefficient (BCmvn) to identify the optimal pK value for each dataset, as detailed in the pK selection section, so the neighborhood size is tailored to the data rather than fixed.
Compatible with Seurat versions 2.0 through 5 and supports both standard and SCTransform workflows, as noted in updates and dependencies, simplifying integration into existing pipelines.
Adjusts the expected number of doublets for those formed from transcriptionally similar cell states, which DoubletFinder detects poorly, via the modelHomotypic function, as explained in the doublet number estimation section.
Validated against sample-multiplexing methods like Cell Hashing and Demuxlet, with results shown in screenshots, demonstrating reliability in real-world scenarios.
A maintainer was recently added to drive improvements, and the package was made compatible with Seurat v5, as per the updates log, indicating ongoing support.
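The BCmvn-based pK selection mentioned above can be sketched as follows. This is a hedged Python illustration, not the package's R implementation. It assumes the standard sample-corrected bimodality coefficient, and it assumes that BCmvn is the mean of that coefficient across the pN values swept, divided by its variance across the same values. The function names and the input layout (a dict from pK to a list of coefficients) are hypothetical.

```python
from statistics import mean, variance


def bimodality_coefficient(values):
    """Sample bimodality coefficient: (skewness^2 + 1) / (excess kurtosis + correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    # Sample-size-corrected skewness and excess kurtosis.
    skew = (m3 / m2 ** 1.5) * (n * (n - 1)) ** 0.5 / (n - 2)
    excess_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    return (skew ** 2 + 1) / (excess_kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bcmvn(bc_by_pk):
    """Map each pK to mean(BC) / variance(BC), taken across the pN values swept.

    Assumes at least two differing BC values per pK, so the variance is nonzero.
    """
    return {pk: mean(bcs) / variance(bcs) for pk, bcs in bc_by_pk.items()}


def best_pk(bc_by_pk):
    """Return the pK with the highest BCmvn."""
    scores = bcmvn(bc_by_pk)
    return max(scores, key=scores.get)


# Hypothetical sweep results: bimodality coefficients of the pANN
# distribution for three pK values, each evaluated at three pN values.
sweep = {
    0.01: [0.5, 0.6, 0.7],
    0.05: [0.8, 0.8, 0.9],
    0.10: [0.4, 0.6, 0.8],
}
print(best_pk(sweep))
```

The intuition is that a good pK yields a clearly bimodal pANN distribution (high mean coefficient) that is also stable across pN values (low variance).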
Requires users to manually estimate pK using BCmvn and adjust doublet numbers for homotypic proportions, which can be error-prone and time-consuming, as acknowledged in best practices.
Tightly coupled with Seurat; not designed for other single-cell analysis ecosystems without data conversion, limiting flexibility for non-Seurat users.
Has reduced sensitivity in transcriptionally homogeneous datasets, as shown in simulation results, making it less reliable for such biological contexts.
Homotypic adjustment relies on literature-supported cell type annotations, which may be inaccurate or unavailable, introducing potential bias in doublet estimation.
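The homotypic adjustment referred to above can be illustrated with a short sketch. This is a Python approximation, not the package's R code. It assumes the homotypic proportion is estimated as the sum of squared annotation frequencies, and that the expected doublet count is scaled by one minus that proportion. The function names and the 8% doublet rate are hypothetical and should be replaced with values appropriate to the dataset.

```python
from collections import Counter


def homotypic_proportion(annotations):
    """Estimate the share of doublets formed by two cells with the same annotation.

    Assumes doublets form by random pairing, so a same-type pair occurs with
    probability equal to the sum of squared annotation frequencies.
    """
    n = len(annotations)
    counts = Counter(annotations)
    return sum((c / n) ** 2 for c in counts.values())


def expected_doublets(annotations, doublet_rate):
    """Return (unadjusted, homotypic-adjusted) expected doublet counts."""
    n_expected = round(doublet_rate * len(annotations))
    n_adjusted = round(n_expected * (1 - homotypic_proportion(annotations)))
    return n_expected, n_adjusted


# Toy annotations: two equally abundant cell types, 100 cells in total.
labels = ["A"] * 50 + ["B"] * 50
print(homotypic_proportion(labels))
print(expected_doublets(labels, doublet_rate=0.08))
```

Because the estimate depends entirely on the annotation frequencies, coarse or incorrect annotations shift the adjusted count directly, which is the source of the bias noted above.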