A Ruby gem for lemmatizing English text, converting inflected words to their base dictionary forms.
Lemmatizer is a Ruby gem that performs lemmatization on English text, converting inflected word forms to their base dictionary forms: 'running' becomes 'run', 'dogs' becomes 'dog', and 'better' (as an adverb) becomes 'well'. It supports text normalization in NLP pipelines by reducing morphological variants of a word to a single form, which makes downstream analysis more consistent.
It is aimed at Ruby developers working on natural language processing, text mining, or linguistic analysis projects that require word normalization.
Developers choose Lemmatizer for its simplicity, WordNet-based accuracy, and extensibility through custom dictionaries, offering a lightweight alternative to heavier NLP suites.
Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy
Uses WordNet data for lemma lookup, which gives reliable results for standard English words.
Supports user-supplied dictionary files that override or extend the built-in entries, enabling domain-specific customization.
Includes functionality to resolve abbreviations via 'abbr' tags in custom dicts, useful for expanding terms like 'utexas' to 'University of Texas'.
Leaves words not found in the dictionary intact, so proper names and unknown terms pass through unchanged instead of being mangled.
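The override, abbreviation, and pass-through behaviour described above can be pictured with a small toy model. This is a sketch only: `ToyLemmatizer`, its `BASE` table, and the three-column line layout (part of speech, form, replacement) are assumptions made for illustration, and none of it is the gem's actual API or implementation.

```ruby
# Toy model only: hypothetical names, not the gem's API or implementation.
class ToyLemmatizer
  # Stand-in for the bundled WordNet-derived data: { pos => { form => lemma } }.
  BASE = {
    noun: { "dogs" => "dog" },
    verb: { "running" => "run" }
  }.freeze

  # custom_lines: strings assumed to look like "<pos> <form> <lemma or expansion>".
  def initialize(custom_lines = [])
    @table = BASE.transform_values(&:dup)
    custom_lines.each do |line|
      pos, form, lemma = line.strip.split(/\s+/, 3)
      next unless pos && form && lemma
      (@table[pos.to_sym] ||= {})[form] = lemma # user entries win over base data
    end
  end

  # Returns the stored lemma, or the word itself when no entry exists.
  def lemma(word, pos)
    @table.fetch(pos, {}).fetch(word, word)
  end
end

toy = ToyLemmatizer.new([
  "noun  MacBooks  MacBook",
  "abbr  utexas    University of Texas"
])

puts toy.lemma("dogs", :noun)     # dog
puts toy.lemma("MacBooks", :noun) # MacBook
puts toy.lemma("utexas", :abbr)   # University of Texas
puts toy.lemma("James", :noun)    # James (no entry, returned unchanged)
```

Splitting each line into at most three fields lets the replacement contain spaces, which is what an abbreviation expansion such as 'University of Texas' needs.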
Limited to English text, making it unsuitable for multilingual projects without additional tools or libraries.
Returns some inflected forms unchanged: a word such as 'higher' is not reduced to 'high' when it is itself listed as a lemma in the dictionary. Fixing this requires editing the dictionary manually or supplying a custom file.
Handling specialized vocabulary or fixing inconsistencies requires users to create and maintain custom dictionary files, which adds a maintenance burden.
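The 'higher' limitation follows from the lookup order in dictionary-based lemmatizers of this kind. The sketch below is a simplified illustration under stated assumptions: `ADJ_INDEX`, `ADJ_SUFFIX_RULES`, and `toy_adj_lemma` are hypothetical names with made-up data, not the gem's code or its real word index.

```ruby
# Toy model only: hypothetical data and method names, not the gem's code.
ADJ_INDEX = %w[high higher hot].freeze               # words listed as lemmas
ADJ_SUFFIX_RULES = [["er", ""], ["est", ""]].freeze  # simplified suffix rules

def toy_adj_lemma(word, overrides = {})
  return overrides[word] if overrides.key?(word) # custom entries take priority
  return word if ADJ_INDEX.include?(word)        # already an index entry: returned as-is
  ADJ_SUFFIX_RULES.each do |suffix, replacement|
    next unless word.end_with?(suffix)
    candidate = word.delete_suffix(suffix) + replacement
    return candidate if ADJ_INDEX.include?(candidate)
  end
  word # nothing matched: leave the word untouched
end

puts toy_adj_lemma("highest")                        # high
puts toy_adj_lemma("higher")                         # higher (listed as a lemma itself)
puts toy_adj_lemma("higher", { "higher" => "high" }) # high (custom entry wins)
```

Because the index check runs before any suffix rule, an inflected form that happens to be an index entry never reaches the rules, and a custom entry is the least invasive way to correct it.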