A native Go implementation of the Porter Stemming algorithm for NLP and machine learning tasks.
Go Porter Stemmer is a Go library that implements the Porter Stemming algorithm, which reduces English words to their stems. It targets natural language processing and machine learning applications that need word normalization. The implementation is written natively in Go and operates on []rune slices for efficiency.
Go developers working on natural language processing, text analysis, or machine learning projects that require word stemming functionality.
Developers choose this library because it is a clean-room Go implementation (not a port), documents where it departs from the original algorithm, processes text efficiently as rune slices, and provides several API functions for different performance needs.
Written from scratch in Go rather than ported from another language, a point the README emphasizes, so the code follows Go idioms and fits into Go projects.
Uses []rune slices and shared buffers to minimize memory allocations, optimizing performance for large-scale text processing tasks.
Documents two departures from the original Porter algorithm, made so that the output matches established test cases.
Offers several functions, including StemString, Stem, and StemWithoutLowerCasing, covering use cases from the simplest call to the most performance-sensitive.
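The rune-slice and shared-buffer approach mentioned above can be illustrated with a minimal sketch. It does not use the library itself: `toRunes` and `trimPluralS` are hypothetical helpers written for this example, and `trimPluralS` is a deliberately trivial stand-in for a real stemming step. The point is only to show how one `[]rune` buffer can be reused across many words so that each word does not cost a fresh allocation.

```go
package main

import "fmt"

// toRunes (hypothetical helper) writes the runes of s into buf, starting at
// index 0. The backing array is reused whenever it is already large enough.
func toRunes(buf []rune, s string) []rune {
	buf = buf[:0]
	for _, r := range s {
		buf = append(buf, r)
	}
	return buf
}

// trimPluralS (hypothetical stand-in for a stemming step) drops one trailing
// 's' by reslicing, so it allocates nothing. It is NOT the Porter algorithm.
func trimPluralS(word []rune) []rune {
	if n := len(word); n > 1 && word[n-1] == 's' {
		return word[:n-1]
	}
	return word
}

func main() {
	buf := make([]rune, 0, 32) // one buffer shared by every word
	for _, w := range []string{"cats", "dog", "trees"} {
		buf = toRunes(buf, w)
		fmt.Println(string(trimPluralS(buf)))
	}
}
```

A shared buffer like this is only safe when it is used from a single goroutine and the result is consumed before the next word overwrites it.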
The slice-based functions modify their input as a side effect, which can corrupt data if the same slice is reused or accessed concurrently, so callers must copy or otherwise guard their slices.
Only implements the Porter Stemmer for English, making it unsuitable for projects needing multi-lingual or language-agnostic stemming capabilities.
The README provides basic usage examples but lacks detailed API docs, error handling guidance, or performance benchmarks for advanced scenarios.
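The in-place modification noted above is usually handled with a defensive copy. The sketch below shows that pattern under stated assumptions: `lowerInPlace` is a hypothetical stand-in for any function that mutates the `[]rune` it receives, and `safeCall` is an illustrative wrapper, not part of the library's API.

```go
package main

import "fmt"

// lowerInPlace (hypothetical stand-in for a mutating function) lowercases
// ASCII letters directly in the caller's slice and returns that same slice.
func lowerInPlace(word []rune) []rune {
	for i, r := range word {
		if r >= 'A' && r <= 'Z' {
			word[i] = r + ('a' - 'A')
		}
	}
	return word
}

// safeCall (illustrative wrapper) copies word into a scratch slice before
// calling f, so the caller's data is never modified.
func safeCall(f func([]rune) []rune, word []rune) []rune {
	scratch := make([]rune, len(word))
	copy(scratch, word)
	return f(scratch)
}

func main() {
	original := []rune("Running")
	out := safeCall(lowerInPlace, original)
	fmt.Println(string(original), string(out)) // prints "Running running"
}
```

The copy costs one allocation per call, which is the trade-off against the allocation savings of the in-place design. Giving each goroutine its own scratch slice avoids both the data race and repeated allocation.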