A Python library for approximate and phonetic string matching, implementing algorithms like Levenshtein distance and Soundex.
Jellyfish is a Python library that implements a variety of algorithms for approximate and phonetic string matching. It solves problems like comparing similar strings with typos, variations, or different spellings, which is common in data cleaning, search, and record linkage tasks.
Python developers working on data processing, data cleaning, record deduplication, search engines, or any application requiring fuzzy string comparison.
Developers choose Jellyfish because it bundles multiple well-known string matching algorithms into a single, easy-to-use library with a consistent API, eliminating the need to implement these algorithms from scratch.
🪼 a python library for doing approximate and phonetic matching of strings.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
Includes a wide range of string comparison and phonetic encoding methods, such as Levenshtein, Jaro-Winkler, and Metaphone, as listed in the README, covering diverse use cases from typos to name matching.
Offers a uniform interface for all functions, exemplified by the clean usage examples in the README, making it easy to integrate and switch between algorithms without steep learning curves.
Hosted on Codeberg with documentation links and issue tracking, as shown in the README, indicating active development and community support for reliability.
Provides multiple phonetic algorithms like Soundex and NYSIIS, detailed in the README, which are essential for tasks like record linkage with spelling variations.
As a pure Python library, it may not handle large-scale string matching as efficiently as compiled alternatives, which could slow down data-intensive pipelines compared to C or Rust implementations.
Limited to the provided algorithms without built-in extensibility for custom methods, requiring code modifications for state-of-the-art or niche similarity measures not included.
Phonetic encodings like Soundex and Metaphone are optimized for English, as noted in the features, limiting effectiveness for multilingual applications without additional preprocessing or external tools.