Generates optimized regular expressions from a set of strings using automata theory and algorithmic minimization.
regexgen is a JavaScript library that generates an optimized regular expression from a set of input strings. Instead of crafting a complex pattern by hand, you supply the strings and the library builds a trie from them, minimizes the resulting automaton with a DFA minimization algorithm, and emits a compact regex. It ships with both a Node.js API and a CLI tool.
Developers and data engineers working with text processing, validation, or pattern matching who need to generate regexes programmatically from dynamic string sets.
It eliminates the tedious and error-prone process of manual regex writing by algorithmically deriving optimized patterns, with support for Unicode and ES2015 features, making it a robust tool for modern JavaScript environments.
Generate regular expressions that match a set of strings
Builds a trie from the input strings so that common prefixes are shared, then applies Hopcroft's DFA minimization algorithm to merge equivalent states, which yields a compact regex.
Supports ES2015 Unicode regexes through the -u flag, covering characters outside the Basic Multilingual Plane such as emoji and some international scripts. The README's CLI examples show these characters written as surrogate pairs.
Includes a command-line interface for quick regex generation directly from terminal inputs, making it accessible without writing JavaScript code.
Converts the minimized DFA to a regular expression with Brzozowski's algebraic method, so the pattern is derived systematically from the automaton rather than written by hand, which reduces errors in pattern generation.
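The prefix-sharing idea behind the trie step can be sketched in a few lines of plain JavaScript. This is a simplified illustration, not regexgen's implementation: the helper names (`escapeRegex`, `buildTrie`, `trieToPattern`, `sketchRegex`) are hypothetical, and the sketch only merges common prefixes. It performs no DFA minimization and no character-class folding, so its output is less compact than what the library would produce.

```javascript
// Hypothetical helpers for illustration only; none of these are regexgen APIs.

// Escape characters that carry special meaning in a regex.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Insert every word into a trie; `end` marks nodes where a word stops.
function buildTrie(words) {
  const root = { children: new Map(), end: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), end: false });
      }
      node = node.children.get(ch);
    }
    node.end = true;
  }
  return root;
}

// Serialize the trie: each branching point becomes a non-capturing
// alternation, and a node that also ends a word makes its tail optional.
function trieToPattern(node) {
  const branches = [];
  for (const [ch, child] of node.children) {
    branches.push(escapeRegex(ch) + trieToPattern(child));
  }
  if (branches.length === 0) return '';
  if (branches.length === 1 && !node.end) return branches[0];
  const body = '(?:' + branches.join('|') + ')';
  return node.end ? body + '?' : body;
}

function sketchRegex(words) {
  return new RegExp(trieToPattern(buildTrie(words)));
}

const sketch = sketchRegex(['foobar', 'foobaz', 'foozap', 'fooza']);
console.log(sketch.source); // foo(?:ba(?:r|z)|za(?:p)?)
```

The shared prefix `foo` is emitted once, but `(?:r|z)` and `(?:p)?` are left unsimplified. Collapsing those into forms like `[rz]` and `p?`, and merging shared suffixes, is the kind of work the minimization and conversion steps are there for.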
The optimized regexes can be difficult for humans to read and maintain, as evidenced by the cryptic Unicode examples in the README with surrogate pairs and hoisted alternations.
DFA minimization and algebraic conversion algorithms are computationally intensive for large input sets, potentially causing slowdowns in performance-critical applications.
As a JavaScript library, it cannot be used directly from other languages without porting it or shelling out to the CLI, which limits where it fits.
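The surrogate pairs mentioned above come from how JavaScript stores strings, and a short sketch using only built-in language features (no regexgen calls) shows why the `u` flag changes what a pattern looks like. The code point U+1F469 is chosen here purely as an example.

```javascript
// U+1F469 lies outside the Basic Multilingual Plane, so a JavaScript string
// stores it as two UTF-16 code units (a surrogate pair).
const woman = '\u{1F469}';
const units = [];
for (let i = 0; i < woman.length; i++) {
  units.push(woman.charCodeAt(i).toString(16).toUpperCase());
}
console.log(units); // [ 'D83D', 'DC69' ]

// Without the u flag, a pattern has to spell out both code units.
const withoutU = /^\uD83D\uDC69$/;
// With the u flag, the code point can be written directly.
const withU = /^\u{1F469}$/u;

console.log(withoutU.test(woman)); // true
console.log(withU.test(woman));    // true

// The difference matters inside character classes: without u, a class
// matches one code unit, so it cannot match the whole character.
console.log(/^[\uD83D\uDC69]$/.test(woman)); // false
console.log(/^[\u{1F469}]$/u.test(woman));   // true
```

This is why patterns written without the `u` flag tend to be longer and harder to read when the input contains emoji: each astral character has to be expressed as a sequence of two code units rather than as a single class member.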