A Unicode-aware, append-only n-gram index library for Go with memory-efficient string pooling.
go-ngram is a Go library for creating n-gram indexes, which break text into overlapping character sequences to enable fast search and similarity matching. It solves the problem of efficient text indexing and retrieval in Go applications, particularly where memory usage and performance are critical.
It is aimed at Go developers building applications that need text search, autocomplete, or similarity detection, such as search engines, data-processing pipelines, or NLP tools.
Developers choose go-ngram for its focus on memory efficiency through string pooling, Unicode support, and a simple, append-only API that avoids complex document management overhead.
Ngram index for golang
Supports international characters beyond ASCII, enabling reliable indexing of multilingual text as highlighted in the README.
Uses string pooling and compression to reduce memory overhead and garbage collection pressure, making it efficient for text-heavy applications.
Application agnostic with no built-in document model, allowing developers to integrate it into various use cases without unnecessary complexity.
Append-only by design: entries are never deleted, so the index structure stays consistent and free of fragmentation, which simplifies maintenance.
The append-only design means data cannot be deleted, forcing workarounds like versioning or external filtering for dynamic datasets.
Smoothing functions (e.g., Laplace) are still listed in the project's TODO and are not yet implemented, so advanced text-processing tasks that need them require custom code.
Users must build document management and search logic from scratch, as it provides only basic n-gram indexing without out-of-the-box search capabilities.