Rust edit distance library accelerated with SIMD for fast Hamming, Levenshtein, and Damerau-Levenshtein calculations.
triple_accel is a Rust library that accelerates edit distance calculations using SIMD instructions. It provides fast implementations of Hamming, Levenshtein, and restricted Damerau-Levenshtein distances, along with string search capabilities. The library automatically selects optimized vectorized or scalar routines based on CPU support, delivering significant speedups for string comparison tasks.
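As a point of reference, here are small scalar routines that define the three distances. The function names (hamming_ref, levenshtein_ref, rdamerau_ref) are hypothetical, and the code is plain dynamic programming. It is not triple_accel's code. It is only the kind of baseline that the library's SIMD routines are meant to outperform.

```rust
/// Hamming distance: number of positions where two equal-length byte
/// strings differ. Returns None if the lengths differ.
pub fn hamming_ref(a: &[u8], b: &[u8]) -> Option<u32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count() as u32)
}

/// Unit-cost Levenshtein distance, keeping only two rows of the DP table.
pub fn levenshtein_ref(a: &[u8], b: &[u8]) -> u32 {
    let mut prev: Vec<u32> = (0..=b.len() as u32).collect();
    let mut curr = vec![0u32; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i as u32 + 1;
        for (j, &cb) in b.iter().enumerate() {
            let sub = prev[j] + (ca != cb) as u32; // match or substitution
            curr[j + 1] = sub.min(prev[j + 1] + 1).min(curr[j] + 1); // deletion, insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Restricted Damerau-Levenshtein (optimal string alignment): Levenshtein
/// plus swaps of two adjacent bytes, where no substring is edited twice.
pub fn rdamerau_ref(a: &[u8], b: &[u8]) -> u32 {
    let (n, m) = (a.len(), b.len());
    let mut d = vec![vec![0u32; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i as u32;
    }
    for j in 0..=m {
        d[0][j] = j as u32;
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = (a[i - 1] != b[j - 1]) as u32;
            let mut best = (d[i - 1][j] + 1)
                .min(d[i][j - 1] + 1)
                .min(d[i - 1][j - 1] + cost);
            // Adjacent transposition, e.g. "ca" -> "ac".
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}
```

For example, levenshtein_ref(b"kitten", b"sitting") is 3. rdamerau_ref(b"ca", b"ac") is 1 because the adjacent swap counts as a single edit, whereas plain Levenshtein needs 2 edits for the same pair.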
Rust developers working on performance-sensitive applications involving string matching, such as bioinformatics pipelines, natural language processing tools, or data deduplication systems.
Developers choose triple_accel because it combines SIMD-driven performance (roughly 20-30x faster than scalar code, according to the README) with ease of use through automatic CPU feature detection and fallback. Its lightweight design and clean abstraction over platform-specific details make it straightforward to use in cross-platform projects.
Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, and other distance calculations, as well as string search.
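In this context, string search means finding where a short pattern occurs in a longer text with at most k edits. The scalar sketch below illustrates the idea with Sellers' dynamic-programming algorithm. That is a standard technique, but triple_accel does not necessarily implement its search this way. The name search_ref and its (end index, distance) return format are hypothetical.

```rust
/// Approximate search (Sellers' algorithm). Returns (end, distance) for every
/// haystack position `end` (exclusive) where some substring ending there is
/// within `k` edits of `needle`.
pub fn search_ref(needle: &[u8], haystack: &[u8], k: u32) -> Vec<(usize, u32)> {
    // col[i] = best distance between needle[..i] and a haystack substring
    // ending at the current position. col[0] stays 0, so a match may start anywhere.
    let mut col: Vec<u32> = (0..=needle.len() as u32).collect();
    let mut hits = Vec::new();
    for (j, &h) in haystack.iter().enumerate() {
        let mut diag = col[0]; // value of col[i - 1] before this column was updated
        for i in 1..=needle.len() {
            let above = col[i]; // col[i] from the previous haystack position
            let cost = (needle[i - 1] != h) as u32;
            let val = (diag + cost).min(above + 1).min(col[i - 1] + 1);
            diag = above;
            col[i] = val;
        }
        let d = col[needle.len()];
        if d <= k {
            hits.push((j + 1, d));
        }
    }
    hits
}
```

For example, search_ref(b"abc", b"xxabdxx", 1) reports two ends: one after "ab" (a deletion) and one after "abd" (a substitution), each at distance 1.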
Delivers roughly 20-30x faster edit distance calculations by using AVX2 and SSE4.1 instructions, with automatic fallback to scalar code on CPUs that lack them, as highlighted in the README.
Supports Hamming, Levenshtein, and restricted Damerau-Levenshtein distances with customizable edit costs, enabling tailored string comparison for diverse use cases like bioinformatics.
Dynamically chooses the vector width and data structures at runtime based on input string lengths, aiming for high efficiency on both short and long strings, as described in the README.
Lightweight with no heavy dependencies, making it portable and easy to compile even on machines without SIMD support, as noted in the features section.
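Automatic fallback of this kind usually follows a runtime-dispatch pattern. The sketch below assumes an x86/x86-64 target for the fast path. It uses std's is_x86_feature_detected! macro together with a hypothetical AVX2 Hamming kernel (hamming_avx2) built from std::arch intrinsics. It is meant only to show the dispatch pattern and is not triple_accel's actual kernel.

```rust
/// Portable scalar fallback: count differing bytes one at a time.
fn hamming_scalar(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).filter(|(x, y)| x != y).count() as u32
}

/// Hypothetical AVX2 kernel: compares 32 bytes per iteration.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn hamming_avx2(a: &[u8], b: &[u8]) -> u32 {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{__m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{__m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8};

    let full = a.len() / 32 * 32; // bytes covered by whole 32-byte blocks
    let mut total = 0u32;
    let mut i = 0;
    while i < full {
        unsafe {
            // Unaligned 256-bit loads; i + 32 <= len, so the reads stay in bounds.
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // 0xFF in each byte lane where a and b are equal.
            let eq = _mm256_cmpeq_epi8(va, vb);
            // One bit per lane: set when the bytes are equal.
            let equal_mask = _mm256_movemask_epi8(eq) as u32;
            total += 32 - equal_mask.count_ones();
        }
        i += 32;
    }
    // Tail shorter than one vector goes through the scalar path.
    total + hamming_scalar(&a[full..], &b[full..])
}

/// Picks the AVX2 kernel when the running CPU supports it, otherwise scalar.
pub fn hamming_dispatch(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "Hamming distance needs equal lengths");
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { hamming_avx2(a, b) };
        }
    }
    hamming_scalar(a, b)
}
```

On CPUs without AVX2, and on non-x86 targets where the fast path is compiled out, hamming_dispatch falls back to hamming_scalar. Callers get the same result either way.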
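Custom edit costs change the weights used in the dynamic-programming recurrence. The sketch below uses a hypothetical Costs struct with a linear gap cost and an optional transposition cost. triple_accel's own cost configuration may differ, so read this only as an illustration of weighted Levenshtein and restricted Damerau-Levenshtein.

```rust
/// Hypothetical cost model for this sketch (not the library's cost type).
#[derive(Clone, Copy)]
pub struct Costs {
    pub mismatch: u32,          // substitution
    pub gap: u32,               // insertion or deletion
    pub transpose: Option<u32>, // None = plain weighted Levenshtein
}

pub fn weighted_distance(a: &[u8], b: &[u8], c: Costs) -> u32 {
    let (n, m) = (a.len(), b.len());
    let mut d = vec![vec![0u32; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i as u32 * c.gap;
    }
    for j in 0..=m {
        d[0][j] = j as u32 * c.gap;
    }
    for i in 1..=n {
        for j in 1..=m {
            let sub = if a[i - 1] == b[j - 1] { 0 } else { c.mismatch };
            let mut best = (d[i - 1][j - 1] + sub)
                .min(d[i - 1][j] + c.gap)
                .min(d[i][j - 1] + c.gap);
            // Optional adjacent swap (restricted Damerau-Levenshtein).
            if let Some(t) = c.transpose {
                if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                    best = best.min(d[i - 2][j - 2] + t);
                }
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}
```

With mismatch set to 3 and gap set to 1, replacing a byte costs more than deleting it and inserting another. The distance between "abc" and "abd" is therefore 2 rather than 3.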
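One way to pick a vector width from the input lengths is to choose the narrowest integer lane that cannot overflow. Narrower lanes pack more DP cells into each register. The rule below is a hypothetical illustration for unit-cost Levenshtein, whose scores never exceed the longer input length. It keeps one unit of headroom for the +1 computed before each min, and it is not triple_accel's actual selection heuristic.

```rust
/// Lane type a SIMD DP kernel could use for its cells.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum LaneWidth {
    U8,
    U16,
    U32,
}

impl LaneWidth {
    pub fn bits(self) -> usize {
        match self {
            LaneWidth::U8 => 8,
            LaneWidth::U16 => 16,
            LaneWidth::U32 => 32,
        }
    }

    /// Number of lanes in a 256-bit (AVX2-sized) register.
    pub fn lanes_per_256(self) -> usize {
        256 / self.bits()
    }
}

/// Narrowest lane that can hold max(len_a, len_b) + 1 without overflow.
pub fn choose_lane_width(len_a: usize, len_b: usize) -> LaneWidth {
    let max_score = len_a.max(len_b);
    if max_score < u8::MAX as usize {
        LaneWidth::U8
    } else if max_score < u16::MAX as usize {
        LaneWidth::U16
    } else {
        LaneWidth::U32
    }
}
```

Under this rule, short strings get 32 u8 lanes per 256-bit register. Longer inputs trade parallelism for range.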
Works only on byte strings (u8 slices) because of its SIMD design. Unicode text must therefore be encoded manually, and byte-level distances may not match character-level edit distances; the README states this limitation.
Vectorized implementations are specific to x86/x86-64 CPUs, so projects on ARM or other architectures miss out on peak performance gains, relying solely on slower scalar fallbacks.
Lower-level functions such as levenshtein_simd_k_with_opts offer fine-grained control but add API complexity, which can be daunting for users who only need simple edit distance calculations.
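Accented text makes the byte-versus-character mismatch easy to see, because é occupies two bytes in UTF-8. The sketch below uses a generic, hypothetical lev helper to compare both views. It also shows one possible manual encoding (encode_pair, also hypothetical) that maps each distinct character to a single byte code when the two strings together contain at most 256 distinct characters.

```rust
use std::collections::HashMap;

/// Unit-cost Levenshtein over any comparable symbol type (bytes, chars, ...).
pub fn lev<T: PartialEq>(a: &[T], b: &[T]) -> u32 {
    let mut prev: Vec<u32> = (0..=b.len() as u32).collect();
    let mut curr = vec![0u32; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i as u32 + 1;
        for (j, cb) in b.iter().enumerate() {
            let sub = prev[j] + (ca != cb) as u32;
            curr[j + 1] = sub.min(prev[j + 1] + 1).min(curr[j] + 1);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Encode one string, assigning new byte codes to unseen chars.
fn encode_with(s: &str, codes: &mut HashMap<char, u8>) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(s.len());
    for ch in s.chars() {
        let code = match codes.get(&ch) {
            Some(&c) => c,
            None => {
                let next = codes.len();
                if next > u8::MAX as usize {
                    return None; // more than 256 distinct chars
                }
                codes.insert(ch, next as u8);
                next as u8
            }
        };
        out.push(code);
    }
    Some(out)
}

/// Shared char-to-byte mapping for a pair of strings, so a byte-level
/// routine sees exactly one symbol per char.
pub fn encode_pair(a: &str, b: &str) -> Option<(Vec<u8>, Vec<u8>)> {
    let mut codes = HashMap::new();
    let ea = encode_with(a, &mut codes)?;
    let eb = encode_with(b, &mut codes)?;
    Some((ea, eb))
}
```

Comparing "café" with "cafe" byte by byte gives 2 (substitute one byte of é and delete the other). Comparing by character, or on the encoded bytes, gives the expected 1.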
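A bounded variant, which the _k suffix in names like levenshtein_simd_k_with_opts suggests, typically lets a routine give up as soon as the distance is known to exceed k. The scalar sketch below illustrates that idea under the hypothetical name levenshtein_within. It does not reproduce the library function's signature, options, or return type. Stopping early is safe because the minimum of each DP row never decreases from one row to the next.

```rust
/// Hypothetical k-bounded Levenshtein: Some(d) if d <= k, otherwise None.
pub fn levenshtein_within(a: &[u8], b: &[u8], k: u32) -> Option<u32> {
    // The length difference is a lower bound on the distance.
    if a.len().abs_diff(b.len()) as u32 > k {
        return None;
    }
    let mut prev: Vec<u32> = (0..=b.len() as u32).collect();
    let mut curr = vec![0u32; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i as u32 + 1;
        let mut row_min = curr[0];
        for (j, &cb) in b.iter().enumerate() {
            let sub = prev[j] + (ca != cb) as u32;
            curr[j + 1] = sub.min(prev[j + 1] + 1).min(curr[j] + 1);
            row_min = row_min.min(curr[j + 1]);
        }
        // Every cell already exceeds k, so the final distance must too.
        if row_min > k {
            return None;
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let d = prev[b.len()];
    if d <= k { Some(d) } else { None }
}
```

Callers that only need a yes/no answer ("are these within 2 edits?") can skip most of the work on dissimilar inputs. This is the main payoff of exposing a threshold.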