A high-performance string library leveraging SIMD and SWAR to accelerate search, hashing, sorting, and edit distances across C, C++, Python, Rust, and more.
StringZilla is a high-performance string library that uses SIMD and SWAR techniques to accelerate common string operations such as search, hashing, sorting, and edit distances. Its benchmarks report up to 10x higher throughput on CPUs and up to 100x faster kernels on GPUs than standard libraries. It targets known inefficiencies in libc and the STL and ships bindings for multiple programming languages.
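To make the SWAR (SIMD within a register) idea concrete, here is a minimal sketch of a byte search that tests eight bytes per step using plain 64-bit integer arithmetic. It is an illustration of the general technique only, not StringZilla's implementation or API; the function name `swar_find_byte` and the constants are hypothetical names chosen for this example.

```python
LOW_BITS = 0x0101010101010101   # 0x01 in each of the 8 byte lanes
HIGH_BITS = 0x8080808080808080  # 0x80 in each of the 8 byte lanes


def swar_find_byte(haystack: bytes, needle: int) -> int:
    """Return the index of the first `needle` byte in `haystack`, or -1.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0 <= needle <= 255:
        raise ValueError("needle must be a single byte value")

    # Broadcast the needle into all 8 lanes of a 64-bit word.
    pattern = LOW_BITS * needle
    full_words_end = len(haystack) - len(haystack) % 8

    for offset in range(0, full_words_end, 8):
        # Little-endian load: the first byte lands in the lowest lane.
        word = int.from_bytes(haystack[offset:offset + 8], "little")
        diff = word ^ pattern  # lanes equal to the needle become 0x00
        # Classic zero-byte test. Lanes above a true match may be flagged
        # spuriously, but the lowest flagged lane is always a real match.
        zero_lanes = (diff - LOW_BITS) & ~diff & HIGH_BITS
        if zero_lanes:
            lowest_bit = zero_lanes & -zero_lanes
            return offset + (lowest_bit.bit_length() - 1) // 8

    # Scalar fallback for the trailing bytes that do not fill a word.
    for index in range(full_words_end, len(haystack)):
        if haystack[index] == needle:
            return index
    return -1
```

In Python this is slower than the built-in `bytes.find`; the point is only to show how one word-sized comparison can stand in for eight byte-sized ones, which is what makes SWAR useful on hardware without wide SIMD registers.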
It is aimed at data engineers parsing large datasets (e.g., CommonCrawl), software engineers optimizing string-heavy applications, bioinformaticians, search engineers, DBMS developers, and hardware designers who need efficient string-processing baselines.
Developers choose StringZilla for its speed, cross-language consistency, and functionality beyond what standard libraries offer, including Unicode support, batch processing, and memory-efficient operations, all of which suit high-throughput string processing.
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖
Delivers up to 10x higher CPU throughput and 100x faster GPU kernels than standard libraries, as shown in benchmarks for substring search and edit distances.
Provides native bindings for C, C++, Python, Rust, and more, ensuring predictable performance across diverse programming environments.
Implements case folding and case-insensitive UTF-8 search covering over 1 million codepoints, addressing gaps in libc and STL.
Offers lazy iterators and memory-mapped file support, reducing overhead for large datasets like CommonCrawl, with near-zero allocation in splits.
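The lazy, near-zero-allocation splitting mentioned above can be sketched with a generator that yields views into the original buffer instead of copying each piece. This is a generic illustration under stated assumptions, not StringZilla's API: `lazy_split` is a hypothetical helper, and the buffer is assumed to be a `bytes` object or an `mmap.mmap` object, both of which expose `find` and the buffer protocol.

```python
from typing import Iterator


def lazy_split(buffer, separator: bytes) -> Iterator[memoryview]:
    """Yield zero-copy slices of `buffer` between occurrences of `separator`.

    Hypothetical helper for illustration. `buffer` is assumed to be `bytes`
    or `mmap.mmap`.
    """
    if not separator:
        raise ValueError("separator must not be empty")

    view = memoryview(buffer)
    start = 0
    while True:
        end = buffer.find(separator, start)
        if end == -1:
            # No more separators: the rest of the buffer is the last piece.
            yield view[start:]
            return
        yield view[start:end]
        start = end + len(separator)


# Pieces are produced one at a time and copied only when converted to bytes.
for piece in lazy_split(b"alpha\nbeta\ngamma", b"\n"):
    print(bytes(piece))
```

Because each piece is a `memoryview` slice, iterating over a large memory-mapped file would not materialize a list of substrings. With `mmap`, the yielded views need to be released before the mapping is closed.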
Functionality varies across languages; for example, sorting is unavailable in JavaScript and Go, and some features like TR29 word boundaries are under development.
Peak performance requires modern CPU features (e.g., AVX-512) or GPUs, limiting benefits on older or resource-constrained systems.
Deviates from libc/STL conventions by using length-based strings and custom interfaces, increasing integration effort for existing codebases.