A fast C++ library for fuzzy string matching using Levenshtein Distance, offering MIT licensing and algorithmic improvements.
RapidFuzz is a fast fuzzy string matching library for C++ that calculates string similarity using the Levenshtein Distance. It offers the same string similarity calculations as FuzzyWuzzy, with significant performance improvements and a permissive MIT license. The library is designed for efficient text comparison, providing several ratio algorithms plus cached scorers that speed up repeated comparisons against the same query.
C++ developers and data engineers who need high-performance fuzzy string matching for applications like data deduplication, search engines, or natural language processing.
Developers choose RapidFuzz over alternatives like FuzzyWuzzy for its MIT licensing (avoiding GPL restrictions), C++-based performance optimizations, and algorithmic improvements that deliver faster matching without sacrificing accuracy.
Rapid fuzzy string matching in C++ using the Levenshtein Distance
Written in C++ with algorithmic improvements, providing significant speed gains over FuzzyWuzzy, as documented in benchmarks linked from the README.
The MIT license allows integration into any project without GPL restrictions, making it suitable for both open-source and commercial use, unlike the GPL-licensed FuzzyWuzzy.
Supports multiple ratio calculations (simple, partial, token sort, token set) based on Levenshtein Distance, enabling diverse fuzzy matching scenarios.
Includes CachedRatio for repeated comparisons against multiple strings, reducing computation time in batch operations, as shown in usage examples.
Easily integrates with OpenMP for multithreading, boosting performance on large datasets with example code provided in the README.
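To make the ratio idea concrete, here is a minimal standard-library sketch of a Levenshtein-based similarity score and a token sort variant. It does not use RapidFuzz and is not the library's implementation: the names `levenshtein`, `simple_ratio`, `sort_tokens`, and `token_sort_ratio` are hypothetical, and the 0 to 100 normalization by the longer string's length is an assumption that may differ from the formula RapidFuzz applies.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Classic two-row dynamic-programming Levenshtein distance with unit
// costs for insertion, deletion, and substitution.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Assumed normalization: 100 means identical, 0 means nothing in common.
double simple_ratio(const std::string& a, const std::string& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;  // two empty strings are identical
    double distance = static_cast<double>(levenshtein(a, b));
    return 100.0 * (1.0 - distance / static_cast<double>(longest));
}

// Split on whitespace, sort the tokens, and rejoin with single spaces.
std::string sort_tokens(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    std::sort(tokens.begin(), tokens.end());
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i != 0) out += ' ';
        out += tokens[i];
    }
    return out;
}

// Token sort variant: word order no longer affects the score.
double token_sort_ratio(const std::string& a, const std::string& b) {
    return simple_ratio(sort_tokens(a), sort_tokens(b));
}
```

With this sketch, `token_sort_ratio("new york mets", "mets new york")` scores the two strings as identical, while `simple_ratio` on the same inputs penalizes the different word order. A cached scorer follows the same idea but preprocesses the query once so that each subsequent comparison does less work.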
Unlike the Python version, the C++ library does not ship ready-to-use helpers such as process.extract; users must implement these functions themselves, as the README notes.
Heavy reliance on CMake for installation and linking can be a barrier for projects using alternative build systems or requiring simple package manager support.
Common tasks such as comparing one string against a list of choices require additional user-written code, which increases development time compared to libraries that provide these helpers out of the box.
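The kind of helper users end up writing can be sketched as follows. This is an illustrative, standard-library-only example, not code from the RapidFuzz README: `Match`, `extract_one`, and the stand-in scorer `positional_match` are hypothetical names, and in a real project the scorer argument would be one of the library's ratio functions or a cached scorer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Result of a best-match search; `found` is false when no choice
// reaches the score cutoff (hypothetical type for this sketch).
struct Match {
    std::size_t index;
    double score;
    bool found;
};

// Score `query` against every choice and keep the highest-scoring one
// whose score is at least `score_cutoff`. Ties keep the earliest choice.
template <typename Scorer>
Match extract_one(const std::string& query,
                  const std::vector<std::string>& choices,
                  Scorer scorer,
                  double score_cutoff = 0.0) {
    Match best = {0, 0.0, false};
    for (std::size_t i = 0; i < choices.size(); ++i) {
        double score = scorer(query, choices[i]);
        if (score >= score_cutoff && (!best.found || score > best.score)) {
            best.index = i;
            best.score = score;
            best.found = true;
        }
    }
    return best;
}

// Stand-in scorer (NOT a RapidFuzz algorithm): percentage of positions
// holding the same character, relative to the longer string.
double positional_match(const std::string& a, const std::string& b) {
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;
    std::size_t shortest = std::min(a.size(), b.size());
    std::size_t same = 0;
    for (std::size_t i = 0; i < shortest; ++i) {
        if (a[i] == b[i]) ++same;
    }
    return 100.0 * static_cast<double>(same) / static_cast<double>(longest);
}
```

Each call to the scorer is independent of the others, which is the property that makes this loop a candidate for the OpenMP parallelization mentioned above; a parallel version would need to combine per-thread best results rather than update `best` directly.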