A high-performance, regex-free Go tokenizer for parsing strings, slices, and infinite streams into customizable tokens.
Tokenizer is a Go library for lexical analysis that converts strings, slices, or infinite data streams into sequences of tokens. It is designed for high performance without using regular expressions, making it suitable for parsing various data formats like JSON, XML, YAML, and programming languages. The library emphasizes single-pass parsing, Unicode support, and customizable token definitions.
Tokenizer is aimed at Go developers building parsers for custom data formats, programming languages, templates, or large or infinite data streams, for example those working on compilers, interpreters, configuration file processors, or template engines.
Developers choose Tokenizer for its regex-free, high-performance parsing that handles infinite streams without panicking, along with extensive customization options for tokens and support for Unicode and template injections. Its minimal API and flexibility make it stand out for complex parsing tasks where speed and control are critical.
Tokenizer (lexer) for golang
Benchmarks show parsing speeds of up to 9.5 MB/s for strings and 25 MB/s for infinite streams; the library relies on single-pass parsing and avoids regex overhead.
Avoids regular expressions entirely, using manual token definitions for better control and performance, as emphasized in the README's philosophy.
Allows defining custom tokens, framed strings with injections, and Unicode support, demonstrated in SQL and JSON parsing examples.
Parses data from io.Reader in chunks without panicking, suitable for large or continuous data streams, as shown in the ParseStream method.
The README documents a known issue: a zero byte (\x00) stops parsing, which limits use with binary data or text formats that may contain null bytes.
Requires extensive configuration to define tokens and build parsers; not a drop-in solution, adding development overhead.
Only provides lexical analysis; users must implement grammar rules and AST construction separately, which can be complex for full language parsing.
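To give a sense of the work left to the user once tokens are available, here is a rough sketch of a grammar and AST layer built on top of a token sequence. It assumes the lexer has already produced a slice of token strings and handles only a toy grammar, `expr = number { ("+" | "-") number }`. The names `Node`, `Parse`, `leaf`, and `Eval` are hypothetical and unrelated to the library.

```go
package main

import (
	"fmt"
	"strconv"
)

// Node is a hypothetical AST node: a number leaf (Op == "") or a
// binary operation with two children.
type Node struct {
	Op          string
	Value       int
	Left, Right *Node
}

// leaf converts a single token into a number node.
func leaf(tok string) (*Node, error) {
	n, err := strconv.Atoi(tok)
	if err != nil {
		return nil, fmt.Errorf("expected number, got %q", tok)
	}
	return &Node{Value: n}, nil
}

// Parse builds a left-associative AST from tokens that follow the
// grammar: expr = number { ("+" | "-") number }.
func Parse(tokens []string) (*Node, error) {
	if len(tokens) == 0 {
		return nil, fmt.Errorf("empty input")
	}
	left, err := leaf(tokens[0])
	if err != nil {
		return nil, err
	}
	for i := 1; i < len(tokens); i += 2 {
		op := tokens[i]
		if op != "+" && op != "-" {
			return nil, fmt.Errorf("unexpected token %q", op)
		}
		if i+1 >= len(tokens) {
			return nil, fmt.Errorf("missing operand after %q", op)
		}
		right, err := leaf(tokens[i+1])
		if err != nil {
			return nil, err
		}
		left = &Node{Op: op, Left: left, Right: right}
	}
	return left, nil
}

// Eval walks the AST and computes its integer value.
func Eval(n *Node) int {
	switch n.Op {
	case "+":
		return Eval(n.Left) + Eval(n.Right)
	case "-":
		return Eval(n.Left) - Eval(n.Right)
	default:
		return n.Value
	}
}

func main() {
	ast, err := Parse([]string{"10", "-", "4", "+", "1"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(Eval(ast))
}
```

Even this toy grammar needs its own error handling and tree type, which suggests how the effort grows for a full language with precedence, nesting, and statements.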