A Ruby gem for calculating edit distance between strings using Levenshtein, Damerau-Levenshtein, and Boehmer & Rees algorithms.
damerau-levenshtein is a Ruby library that implements edit distance algorithms for measuring the similarity of strings or arrays. It calculates the minimum number of edits needed to turn one sequence into another: insertions, deletions, and substitutions of single characters, plus transpositions when the chosen variant supports them. The gem provides three variants: classic Levenshtein, Damerau-Levenshtein (which counts a swap of two adjacent characters as one edit), and the Boehmer & Rees modification, which also allows transpositions of larger blocks.
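For reference, below is the textbook dynamic-programming recurrence for the restricted (optimal string alignment) form of Damerau-Levenshtein distance between sequences a and b. Here d(i, j) is the distance between the first i elements of a and the first j elements of b, and [a_i ≠ b_j] is 1 when the two elements differ and 0 otherwise. Dropping the last case gives classic Levenshtein distance. This summary does not say whether the gem uses exactly this restricted form, so treat the recurrence as background rather than a description of its internals.

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j,\\
d(i,j) &= \min\begin{cases}
  d(i-1,j) + 1 & \text{deletion}\\
  d(i,j-1) + 1 & \text{insertion}\\
  d(i-1,j-1) + [a_i \neq b_j] & \text{substitution or match}\\
  d(i-2,j-2) + 1 & \text{transposition, if } i,j>1,\ a_i = b_{j-1},\ a_{i-1} = b_j
\end{cases}
\end{aligned}
```

The distance is d(|a|, |b|). Filling the whole table takes O(|a|·|b|) time, which is the O(N*M) cost discussed among the drawbacks.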
Ruby developers working on text processing, spell checking, fuzzy matching, bioinformatics, or any application requiring string similarity measurements. It's particularly useful for those needing fine-grained control over edit distance calculations.
Unlike simpler string comparison methods, this gem offers a choice of algorithms with configurable parameters such as block size and a maximum-distance cutoff. Its ability to generate detailed diffs and to handle UTF-8 text makes it more versatile than basic implementations.
Calculates edit distance using the Damerau-Levenshtein algorithm.
Supports Levenshtein, Damerau-Levenshtein, and the Boehmer & Rees modification, so the edit model can be matched to the use case. The README demonstrates this with examples that vary block_size.
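To illustrate what a block_size parameter can control, the pure-Ruby sketch below generalizes the transposition rule: swapping two adjacent, equal-length blocks of up to block_size characters counts as a single edit. With block_size = 0 it reduces to plain Levenshtein, and with block_size = 1 it becomes the restricted Damerau-Levenshtein distance. The function name block_distance is hypothetical. The one-edit cost for longer block swaps is an assumption and may not match the gem's Boehmer & Rees implementation exactly.

```ruby
# Hypothetical sketch, not the gem's API or implementation.
# Swapping two adjacent blocks of equal length k (1 <= k <= block_size) costs one edit
# (an assumed cost model). block_size = 0 disables transpositions (plain Levenshtein).
def block_distance(a, b, block_size = 1)
  s = a.chars
  t = b.chars
  # d[i][j] = distance between the first i chars of s and the first j chars of t
  d = Array.new(s.size + 1) { |i| Array.new(t.size + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }

  (1..s.size).each do |i|
    (1..t.size).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      best = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min

      (1..block_size).each do |k|
        break if i < 2 * k || j < 2 * k
        # The last two k-blocks of s appear in swapped order at the end of t's prefix.
        if s[i - 2 * k, k] == t[j - k, k] && s[i - k, k] == t[j - 2 * k, k]
          best = [best, d[i - 2 * k][j - 2 * k] + 1].min
        end
      end

      d[i][j] = best
    end
  end

  d[s.size][t.size]
end
```

Under this model, "ca" and "ac" are two edits apart at block_size 0 and one edit apart at block_size 1. Swapping the two-character blocks that turn "abcd" into "cdab" costs one edit at block_size 2 but more than one at block_size 1.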
Handles international characters correctly (e.g., 'Sjöstedt' vs 'Sjostedt') and can compare arrays of integers, making it versatile for text and sequence analysis beyond simple strings.
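The sketch below shows why character-level comparison matters for the 'Sjöstedt' example. It uses a generic Levenshtein function that accepts any arrays. Compared character by character, 'ö' versus 'o' is a single substitution. Compared as raw UTF-8 bytes, 'ö' occupies two bytes, so the distance becomes two. The helper sequence_distance is hypothetical, omits transpositions, and is not the gem's API.

```ruby
# Hypothetical helper: plain Levenshtein distance over any two arrays
# (characters, bytes, integers, ...), keeping only two rows of the table.
def sequence_distance(s, t)
  prev = (0..t.size).to_a
  s.each_with_index do |x, i|
    curr = [i + 1]
    t.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    prev = curr
  end
  prev.last
end

puts sequence_distance("Sjöstedt".chars, "Sjostedt".chars) # prints 1 (one substitution)
puts sequence_distance("Sjöstedt".bytes, "Sjostedt".bytes) # prints 2 ("ö" is two bytes)
puts sequence_distance([1, 2, 3, 5], [1, 2, 3, 4])         # prints 1 (integer arrays)
```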
Generates diffs between strings in either a tag-based format (<ins>, <del>, <subst>) or a raw format. The tagged output can be parsed with tools like Nokogiri to highlight changes, as the README's parsing example shows.
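The sketch below shows the general idea behind tag-based diffs. It fills a plain Levenshtein table, then walks back from the bottom-right cell to recover one operation per step, and wraps edited characters in <ins>, <del>, and <subst> tags. The name tagged_diff and its output format are hypothetical: it returns one marked-up string for each input. The gem's own diff output, including how it treats transpositions, may differ.

```ruby
# Hypothetical sketch: returns [marked_a, marked_b]. Deleted characters are tagged
# <del> in marked_a, inserted ones <ins> in marked_b, and substitutions <subst> in both.
# Only insert/delete/substitute are modeled (no transpositions).
def tagged_diff(a, b)
  s = a.chars
  t = b.chars
  d = Array.new(s.size + 1) { |i| Array.new(t.size + 1) { |j| i.zero? ? j : (j.zero? ? i : 0) } }
  (1..s.size).each do |i|
    (1..t.size).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      d[i][j] = [d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost].min
    end
  end

  # Backtrack, preferring match, then substitution, then deletion, then insertion.
  ops = []
  i = s.size
  j = t.size
  while i > 0 || j > 0
    if i > 0 && j > 0 && s[i - 1] == t[j - 1] && d[i][j] == d[i - 1][j - 1]
      ops << [:keep, s[i - 1], t[j - 1]]
      i -= 1
      j -= 1
    elsif i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1
      ops << [:subst, s[i - 1], t[j - 1]]
      i -= 1
      j -= 1
    elsif i > 0 && d[i][j] == d[i - 1][j] + 1
      ops << [:del, s[i - 1], nil]
      i -= 1
    else
      ops << [:ins, nil, t[j - 1]]
      j -= 1
    end
  end

  left = String.new
  right = String.new
  ops.reverse_each do |op, x, y|
    case op
    when :keep
      left << x
      right << y
    when :subst
      left << "<subst>#{x}</subst>"
      right << "<subst>#{y}</subst>"
    when :del
      left << "<del>#{x}</del>"
    when :ins
      right << "<ins>#{y}</ins>"
    end
  end
  [left, right]
end
```

For example, tagged_diff("cat", "cart") marks only the inserted "r", producing "cat" and "ca<ins>r</ins>t".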
Includes a documented max_distance parameter that stops the computation early, avoiding unnecessary work when only small edit distances are of interest.
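A common way to implement an early cutoff like max_distance has two parts. First, reject pairs whose length difference already exceeds the limit. Second, keep only two rows of the table and stop as soon as every entry in the current row exceeds the limit; this is safe because row minima never decrease. In this sketch, bounded_levenshtein is a hypothetical name and transpositions are omitted. Exceeding the limit returns max_distance + 1 as a sentinel. The gem's own return convention is not described here and may differ.

```ruby
# Hypothetical sketch of an early-exit Levenshtein distance.
# Returns the exact distance if it is <= max_distance, otherwise max_distance + 1.
def bounded_levenshtein(a, b, max_distance)
  s = a.chars
  t = b.chars
  # The length difference is a lower bound on the edit distance.
  return max_distance + 1 if (s.size - t.size).abs > max_distance

  prev = (0..t.size).to_a
  s.each_with_index do |sc, i|
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    # Row minima never decrease, so the final distance must also exceed the bound.
    return max_distance + 1 if curr.min > max_distance
    prev = curr
  end
  [prev.last, max_distance + 1].min
end
```

For example, bounded_levenshtein("kitten", "sitting", 1) returns the sentinel 2, while a limit of 5 returns the true distance, 3.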
The documented setup installs build-essential and libgmp3-dev via apt-get, which can complicate installation on non-Debian-based systems or in environments without sudo access and adds friction to cross-platform deployment.
Focuses solely on edit distance without built-in support for common fuzzy matching techniques like n-grams or phonetic algorithms, which might necessitate additional libraries for broader text similarity tasks.
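If n-gram similarity is needed alongside edit distance, a small bigram Dice coefficient can be written in plain Ruby without another dependency. The sketch below is one such technique, unrelated to the gem itself. The names bigrams and dice_coefficient are hypothetical, the input is lowercased (an assumption), and shared bigrams are counted as a multiset.

```ruby
# Hypothetical sketch: Dice coefficient over character bigrams,
# 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|). Requires Ruby 2.7+ (Array#tally).
def bigrams(str)
  str.downcase.chars.each_cons(2).map(&:join)
end

def dice_coefficient(a, b)
  x = bigrams(a)
  y = bigrams(b)
  # Strings shorter than two characters have no bigrams; fall back to exact equality.
  return(a == b ? 1.0 : 0.0) if x.empty? || y.empty?

  counts = x.tally
  shared = 0
  y.each do |g|
    next unless counts[g].to_i > 0
    shared += 1
    counts[g] -= 1
  end
  2.0 * shared / (x.size + y.size)
end
```

For example, "night" and "nacht" have four bigrams each and share one ("ht"), giving 2 · 1 / (4 + 4) = 0.25.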
With O(N*M) time complexity, where N and M are the lengths of the two inputs, it can be slow for very long strings or high-volume comparisons. max_distance helps, but it does not address the fundamental scalability limits of large-scale applications.