A Swift library for tokenizing strings using character sets and custom tokenizers when whitespace splitting is insufficient.
Mustard is a Swift library for tokenizing strings when splitting by whitespace is inadequate. It enables substring extraction based on character sets or custom tokenizers, allowing developers to parse text with complex or missing separators. The library returns tokens with metadata like ranges and additional context, such as dates for matched date patterns.
Mustard is aimed at Swift developers working on iOS, macOS, or other Apple platforms who need to parse strings with irregular formats, such as log files, user input, or text with embedded data like dates and emojis.
Developers choose Mustard for its protocol-oriented extensibility, which allows custom tokenizers for specific patterns, and for its attention to performance, which the project benchmarks against alternatives such as Foundation's `Scanner`. It fills a gap where standard string-splitting methods fall short.
🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
Matches substrings using `CharacterSet` instances, which is useful for parsing mixed formats that have no separators. For example, 'hello2017year' splits into the letter runs 'hello' and 'year' and the digit run '2017'.
Custom tokenizers conform to the `TokenizerType` protocol to handle more sophisticated matching, such as date formats or emoji detection, so domain-specific parsing logic does not have to be forced into rigid patterns.
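The character-set splitting described above can be sketched with Foundation alone. This is a minimal illustration of the idea, not Mustard's API or implementation: `SimpleToken` and `simpleTokens(in:matchedWith:)` are hypothetical names introduced here, and the sketch assumes that scalars matching none of the sets are simply skipped.

```swift
import Foundation

// Hypothetical token type for this sketch (not a Mustard type).
struct SimpleToken: Equatable {
    let text: String
    let setIndex: Int   // position of the matching set in the `sets` argument
}

// Hypothetical helper: groups consecutive unicode scalars that belong to the
// same character set into one token. Scalars matching no set end the current
// run and are dropped.
func simpleTokens(in input: String, matchedWith sets: [CharacterSet]) -> [SimpleToken] {
    var tokens: [SimpleToken] = []
    var buffer = ""
    var activeSet: Int? = nil

    // Emit the buffered run (if any) as a token and reset the buffer.
    func flush() {
        if let index = activeSet, !buffer.isEmpty {
            tokens.append(SimpleToken(text: buffer, setIndex: index))
        }
        buffer = ""
    }

    for scalar in input.unicodeScalars {
        // First set that contains this scalar, or nil if none does.
        let matching = sets.firstIndex(where: { $0.contains(scalar) })
        if matching != activeSet {
            flush()
            activeSet = matching
        }
        if matching != nil {
            buffer.unicodeScalars.append(scalar)
        }
    }
    flush()
    return tokens
}

let parts = simpleTokens(in: "hello2017year", matchedWith: [.letters, .decimalDigits])
print(parts.map({ $0.text }))   // ["hello", "2017", "year"]
```

Because a run ends whenever the matching set changes, no separator is needed between 'hello', '2017', and 'year'.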
Tokens include matched text, range, and additional context such as Date objects for date tokens, providing more insight than simple substring extraction.
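To make the idea of a custom tokenizer that returns context alongside each match more concrete, here is a small self-contained sketch. It does not reproduce Mustard's `TokenizerType` protocol, whose actual requirements are not shown here. `SketchTokenizer`, `SketchToken`, `DateSketchTokenizer`, and `sketchTokens(in:using:)` are hypothetical names, and the `MM/dd/yy` date format is an assumption chosen for illustration.

```swift
import Foundation

// Hypothetical protocol for this sketch; Mustard's TokenizerType may differ.
protocol SketchTokenizer {
    associatedtype Context
    // Returns the matched prefix of `text` plus its context, or nil if
    // `text` does not begin with a match.
    func matchPrefix(of text: Substring) -> (match: Substring, context: Context)?
}

// A token carrying the matched text, its range in the input, and extra context.
struct SketchToken<Context> {
    let text: String
    let range: Range<String.Index>
    let context: Context
}

// Matches dates shaped like "03/14/17" (assumed format: MM/dd/yy).
struct DateSketchTokenizer: SketchTokenizer {
    let formatter: DateFormatter = {
        let f = DateFormatter()
        f.locale = Locale(identifier: "en_US_POSIX")
        f.timeZone = TimeZone(identifier: "UTC")
        f.dateFormat = "MM/dd/yy"
        return f
    }()

    func matchPrefix(of text: Substring) -> (match: Substring, context: Date)? {
        let candidate = text.prefix(8)
        guard candidate.count == 8 else { return nil }
        let digits: ClosedRange<Character> = "0"..."9"
        // Check the shape: two digits, slash, two digits, slash, two digits.
        for (offset, ch) in candidate.enumerated() {
            let ok = (offset == 2 || offset == 5) ? ch == "/" : digits.contains(ch)
            if !ok { return nil }
        }
        // Reject well-shaped but invalid dates such as "99/99/99".
        guard let date = formatter.date(from: String(candidate)) else { return nil }
        return (candidate, date)
    }
}

// Scans the input left to right, collecting every non-overlapping match.
func sketchTokens<T: SketchTokenizer>(in input: String, using tokenizer: T) -> [SketchToken<T.Context>] {
    var tokens: [SketchToken<T.Context>] = []
    var index = input.startIndex
    while index < input.endIndex {
        if let found = tokenizer.matchPrefix(of: input[index...]) {
            let range = found.match.startIndex..<found.match.endIndex
            tokens.append(SketchToken(text: String(found.match), range: range, context: found.context))
            index = found.match.endIndex
        } else {
            index = input.index(after: index)
        }
    }
    return tokens
}

let sample = "shipped 03/14/17, arrived 03/20/17"
let dateTokens = sketchTokens(in: sample, using: DateSketchTokenizer())
print(dateTokens.map({ $0.text }))   // ["03/14/17", "03/20/17"]
```

Each token's `context` is a parsed `Date`, so callers can work with the value directly instead of re-parsing the matched substring, and `range` locates the match in the original string.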
Includes benchmarks and performance comparisons against alternatives such as `Scanner`, which helps when evaluating it for performance-sensitive applications.
Limited to Apple platforms and Swift development, making it unsuitable for cross-platform projects or environments outside the Swift ecosystem.
Creating advanced tokenizers requires implementing protocols and handling matching logic, which can be more complex and time-consuming than using built-in or drop-in libraries.
Roadmap items like 'Include interface for working with Character tokenizers' indicate some features are still in development, potentially limiting immediate usability for certain scenarios.