A Ruby gem providing a fast, accurate, and encoding-aware implementation of the Jaro-Winkler string similarity algorithm.
jaro_winkler is a Ruby library that calculates the Jaro-Winkler similarity between two strings, providing a measure of how alike they are. It solves the problem of fuzzy string matching with high accuracy and performance, especially for tasks like data deduplication, record linkage, or spell-checking. The implementation supports multiple string encodings and offers configurable parameters for fine-tuned comparisons.
It is aimed at Ruby developers working on text processing, data cleaning, or applications that require fuzzy string matching, such as search engines, data pipelines, or natural language processing tools.
Developers choose jaro_winkler for its combination of speed (fastest among similar gems), accuracy (matches original algorithm results), and encoding support, along with a clean, idiomatic Ruby API that improves upon older alternatives.
A Ruby and C implementation of the Jaro-Winkler distance algorithm that supports UTF-8 strings.
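For readers unfamiliar with the metric, the following is a minimal pure-Ruby sketch of the textbook Jaro-Winkler similarity. It is an illustration only, not the gem's own code: the method names `jaro_similarity` and `jaro_winkler_similarity` and the `weight` and `threshold` keywords are assumptions chosen for this example, and the defaults (0.1 and 0.7) are the values commonly used for the algorithm.

```ruby
# Illustrative sketch of Jaro-Winkler similarity (hypothetical helpers,
# not the jaro_winkler gem's implementation).

# Jaro similarity: based on matching characters within a sliding window
# and the number of transpositions among those matches.
def jaro_similarity(a, b)
  s1 = a.chars
  s2 = b.chars
  return 1.0 if s1.empty? && s2.empty?
  return 0.0 if s1.empty? || s2.empty?

  # Characters count as matching only if they are at most `window` apart.
  window = [[s1.length, s2.length].max / 2 - 1, 0].max
  matched1 = Array.new(s1.length, false)
  matched2 = Array.new(s2.length, false)
  matches = 0

  s1.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, s2.length - 1].min
    (lo..hi).each do |j|
      next if matched2[j] || s2[j] != ch
      matched1[i] = matched2[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Matched characters that appear in a different order are transpositions;
  # each out-of-order pair counts once, hence the division by 2.
  seq1 = s1.each_index.select { |i| matched1[i] }.map { |i| s1[i] }
  seq2 = s2.each_index.select { |j| matched2[j] }.map { |j| s2[j] }
  transpositions = seq1.zip(seq2).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / s1.length + m / s2.length + (m - transpositions) / m) / 3.0
end

# Winkler adjustment: boost the score for a shared prefix of up to 4
# characters, but only when the Jaro score reaches the threshold.
def jaro_winkler_similarity(a, b, weight: 0.1, threshold: 0.7)
  jaro = jaro_similarity(a, b)
  return jaro if jaro < threshold

  prefix = 0
  a.chars.zip(b.chars).first(4).each do |x, y|
    break if x != y
    prefix += 1
  end
  jaro + prefix * weight * (1.0 - jaro)
end

puts jaro_similarity("MARTHA", "MARHTA").round(4)         # 0.9444
puts jaro_winkler_similarity("MARTHA", "MARHTA").round(4) # 0.9611
```

Because the sketch iterates over `String#chars` rather than bytes, it compares whole characters regardless of how many bytes each one occupies. A compiled extension would be expected to run far faster than this version.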
The optimized C extension makes it the fastest among similar gems, as demonstrated in benchmark tests showing it outperforms competitors like fuzzy-string-match and amatch.
It handles any string encoding, including UTF-8, EUC-JP, and Big5, ensuring reliable processing of international and multilingual text without errors.
Automatically switches to a pure Ruby implementation on platforms like JRuby or Rubinius, maintaining compatibility where C extensions aren't available.
Closely matches the original Jaro-Winkler algorithm results, with test data showing consistency against the author's C implementation, unlike some buggy alternatives.
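The encoding point above matters because a similarity score should be computed over characters, not raw bytes. The short sketch below uses only Ruby's built-in `String#encode`, `String#bytes`, and `String#chars` to show that the same text has different byte lengths in different encodings while its character count stays the same. It does not call the gem; the variable names are made up for this example.

```ruby
# The same text in different encodings: byte counts differ,
# character counts do not.
utf8  = "日本語"
eucjp = utf8.encode("EUC-JP")
big5  = "中文".encode("Big5")

puts utf8.bytes.length   # 9 (three bytes per character in UTF-8)
puts eucjp.bytes.length  # 6 (two bytes per character in EUC-JP)
puts utf8.chars.length   # 3
puts eucjp.chars.length  # 3
puts big5.bytes.length   # 4 (two bytes per character in Big5)
puts big5.chars.length   # 2
```

A byte-oriented comparison would treat each of these characters as several independent symbols, which skews both the match count and the transposition count.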
Only implements Jaro-Winkler similarity, so developers needing other fuzzy matching methods must integrate additional libraries, increasing complexity.
The README's TODO list notes that custom adjusting word tables are not yet supported, which limits adaptability to specific phonetic or recognition error patterns beyond the default set.
Although a pure Ruby fallback exists, the reliance on a C extension can cause installation issues on some Ruby environments or operating systems, potentially requiring manual compilation.
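The fallback behavior mentioned above follows a common Ruby pattern: try to load a native extension and switch to a pure Ruby code path if loading fails. The sketch below illustrates that general pattern only. The module name `SimilarityBackend`, the method `pick`, and the feature name `hypothetical_native_ext` are invented for this example and are not how the gem is known to select its backend.

```ruby
# Hypothetical loader illustrating the native-or-pure-Ruby fallback pattern.
module SimilarityBackend
  # Returns :native if the named extension loads, :pure_ruby otherwise.
  def self.pick(native_feature)
    require native_feature
    :native
  rescue LoadError
    # The extension is missing or could not be loaded on this platform.
    :pure_ruby
  end
end

# "hypothetical_native_ext" is a made-up name, so the require fails
# and the pure Ruby path is chosen.
backend = SimilarityBackend.pick("hypothetical_native_ext")
puts "Using #{backend} backend"
```

With a pattern like this, a failed or unavailable native build degrades performance rather than breaking the library outright.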