A fast regular expression engine for Common Lisp that compiles regexes to machine code using derivative-based DFA compilation.
one-more-re-nightmare is a regular expression engine for Common Lisp that compiles regex patterns to machine code using a derivative-based approach to construct deterministic finite automata (DFA). It solves the problem of slow regex matching in Common Lisp by generating specialized, inlined native code for each regex, resulting in significantly faster matching times compared to interpreters like CL-PPCRE.
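The derivative-based construction mentioned above can be illustrated with a small sketch. This is not the library's code: one-more-re-nightmare is written in Common Lisp and emits native code, whereas this Python sketch only builds a transition table. All names here (`deriv`, `nullable`, `build_dfa`, `matches`) are hypothetical and chosen for illustration. Each DFA state is a regex, and the transition on character `c` leads to the derivative of that regex with respect to `c`.

```python
from dataclasses import dataclass

# Regex AST nodes; frozen dataclasses are hashable, so a regex can be a DFA state.
@dataclass(frozen=True)
class Empty: pass            # matches nothing
@dataclass(frozen=True)
class Eps: pass              # matches only the empty string
@dataclass(frozen=True)
class Chr:
    c: str
@dataclass(frozen=True)
class Cat:
    a: object
    b: object
@dataclass(frozen=True)
class Alt:
    items: frozenset         # a set makes alternation associative, commutative, idempotent
@dataclass(frozen=True)
class Star:
    a: object

def cat(a, b):
    """Concatenation with basic simplification."""
    if isinstance(a, Empty) or isinstance(b, Empty): return Empty()
    if isinstance(a, Eps): return b
    if isinstance(b, Eps): return a
    return Cat(a, b)

def alt(a, b):
    """Alternation, flattened and deduplicated so the state set stays finite."""
    items = set()
    for r in (a, b):
        if isinstance(r, Alt): items |= r.items
        elif not isinstance(r, Empty): items.add(r)
    if not items: return Empty()
    if len(items) == 1: return next(iter(items))
    return Alt(frozenset(items))

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Chr)): return False
    if isinstance(r, Cat): return nullable(r.a) and nullable(r.b)
    return any(nullable(x) for x in r.items)

def deriv(r, c):
    """Derivative of r with respect to c: what remains to match after reading c."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chr): return Eps() if r.c == c else Empty()
    if isinstance(r, Cat):
        left = cat(deriv(r.a, c), r.b)
        return alt(left, deriv(r.b, c)) if nullable(r.a) else left
    if isinstance(r, Star): return cat(deriv(r.a, c), r)
    out = Empty()
    for x in r.items:
        out = alt(out, deriv(x, c))
    return out

def build_dfa(start, alphabet):
    """Explore all derivatives of start; each distinct regex becomes one state."""
    states, table, accepting, work = {start: 0}, {}, set(), [start]
    while work:
        r = work.pop()
        if nullable(r): accepting.add(states[r])
        for c in alphabet:
            d = deriv(r, c)
            if d not in states:
                states[d] = len(states)
                work.append(d)
            table[(states[r], c)] = states[d]
    return table, accepting

def matches(table, accepting, text):
    """Run the DFA over text: one table lookup per character."""
    state = 0
    for ch in text:
        state = table.get((state, ch))
        if state is None: return False   # character outside the alphabet
    return state in accepting

# (ab)*c over the alphabet {a, b, c}
regex = cat(Star(cat(Chr("a"), Chr("b"))), Chr("c"))
table, accepting = build_dfa(regex, "abc")
```

Once the table is built, matching visits each input character exactly once, which is where the linear running time of a DFA comes from. A compiler such as the one described here would go further and turn each state into a block of native code rather than a table row.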
It is aimed at Common Lisp developers who need high-performance, POSIX-compliant regular expression matching, especially in applications where the same regex is applied to large amounts of text or to many strings.
Developers choose one-more-re-nightmare for its superior matching speed, achieved by compiling regexes directly to machine code and offering optional SIMD vectorization. Its focus on POSIX compliance and optimization for repeated use cases makes it a compelling alternative to existing regex libraries in the Common Lisp ecosystem.
A fast regular expression compiler in Common Lisp
Compiles regexes to native machine code using deterministic finite automata, so matching runs in O(n) time in the length of the input and outperforms interpreters like CL-PPCRE in benchmarks; SIMD vectorization boosts speed further.
Aims to fully implement POSIX regex semantics as per Open Group specifications, ensuring standard-compliant matching behavior, with deviations treated as bugs.
Generates inlined code for specific array types (e.g., simple-base-string) and offers optional SIMD vectorization on supported platforms, optimizing performance for different data structures.
Caches compiled regexes in the high-level interface, and constant regex strings are compiled at compile-time with zero runtime overhead, reducing amortized costs for repeated use.
Each regex is compiled by invoking the Common Lisp compiler, so the initial compilation overhead is high (benchmarks note compile times in the milliseconds), which makes the engine a poor fit for patterns that change frequently.
Advanced features like SIMD vectorization require specific setups (SBCL 2.1.10+, AVX2), limiting portability and ease of use across different Common Lisp implementations or hardware.
The README admits the syntax is 'wonky' and differs from typical regex representations, which may hinder adoption and increase the learning curve for developers.