A C/C++ library implementing Unicode algorithms with a focus on security, performance, and portability, including correct handling of ill-formed UTF sequences.
uni-algo is a C/C++ library that implements Unicode algorithms such as case conversion, normalization, and text segmentation. It addresses unsafe Unicode handling in C/C++ by guaranteeing well-formed UTF output even from malformed input, which prevents security vulnerabilities and undefined behavior.
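To make the "well-formed output from malformed input" guarantee concrete, here is a minimal standalone C++17 sketch. It is not uni-algo's implementation or API: `sanitize_utf8` is a hypothetical helper, and the policy of emitting one U+FFFD per rejected byte is an assumption chosen for brevity (other replacement policies exist).

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (NOT part of uni-algo): copies well-formed UTF-8
// sequences unchanged and replaces every rejected byte with U+FFFD.
// The result is always well-formed UTF-8, whatever the input bytes are.
std::string sanitize_utf8(std::string_view in) {
    static const char replacement[] = "\xEF\xBF\xBD"; // U+FFFD encoded as UTF-8
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                // 0 means "invalid lead byte"
        unsigned char lo = 0x80, hi = 0xBF; // allowed range of the second byte
        if (b0 <= 0x7F) {
            len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;      // reject overlong 3-byte forms
            else if (b0 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;      // reject overlong 4-byte forms
            else if (b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
        }
        // A sequence is accepted only if it is complete and every trailing
        // byte falls inside its allowed range.
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            ok = (k == 1) ? (b >= lo && b <= hi) : (b >= 0x80 && b <= 0xBF);
        }
        if (ok) {
            out.append(in.data() + i, len);
            i += len;
        } else {
            out += replacement;
            i += 1; // skip one byte and resynchronize
        }
    }
    return out;
}
```

Passing valid text through this function leaves it untouched, while stray lead bytes, truncated sequences, overlong forms, and encoded surrogates each turn into replacement characters rather than propagating downstream.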
It is aimed at C/C++ developers who work with international text and need robust Unicode support for tasks like case-insensitive comparison, normalization, or grapheme/word segmentation.
Developers choose uni-algo for its strict adherence to The Unicode Standard, security-first design that handles ill-formed sequences safely, and high performance comparable to ICU, all in a portable, dependency-free library.
Unicode Algorithms Implementation for C/C++
Guarantees no ill-formed UTF sequences are produced, even from random bytes, preventing undefined behavior and security vulnerabilities, as emphasized in the introduction and design principles.
Implements algorithms like case conversion, normalization, and text segmentation exactly as per The Unicode Standard, using official test files for validation, ensuring reliable international text handling.
Optimized low-level implementation with performance comparable to ICU and WinAPI, as demonstrated in the performance comparison section and design focus on speed.
Offers header-only usage with constexpr support in C++20+ and C++20-style ranges for flexible, single-pass operations, showcased in the examples and ranges vs functions section.
Works consistently across platforms without relying on broken std::locale, and includes locale-specific case mapping for languages like Turkish and Greek, as detailed in the locale and case functions examples.
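The reason locale-specific case mapping matters can be shown with the Turkish dotted and dotless i. The sketch below is a standalone illustration, not uni-algo's API: `to_upper_sketch` is a hypothetical helper that only handles Basic Latin letters plus the two Turkish i letters, and it works on UTF-32 code points to keep the logic short.

```cpp
#include <string>

// Hypothetical helper (NOT uni-algo's API): uppercases a UTF-32 string that
// contains Basic Latin letters and the Turkish dotted/dotless i.
// All other code points are copied through unchanged.
std::u32string to_upper_sketch(const std::u32string& in, bool turkish) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'i') {
            // Turkish uppercases "i" to U+0130 (capital I with dot above);
            // the default mapping gives plain "I".
            out += turkish ? U'\u0130' : U'I';
        } else if (c == U'\u0131') {
            // Dotless i (U+0131) uppercases to plain "I" in every locale.
            out += U'I';
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;
        }
    }
    return out;
}
```

With `turkish` set to `false`, `U"istanbul"` becomes `U"ISTANBUL"`; with `turkish` set to `true`, the first letter becomes U+0130 instead. A real application would rely on the library's own locale-aware case functions, which cover the full Unicode repertoire.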
Lacks key Unicode algorithms such as the Unicode Collation Algorithm (UCA), bidirectional algorithm, and line breaking, which are only planned for future versions, limiting its use for advanced text processing.
Provides multiple integration methods (vcpkg, CMake add_subdirectory, find_package, FetchContent, manual) with various configuration defines, which can be overwhelming for users unfamiliar with modern C++ build systems.
Requires C++17 or higher, as stated in the examples, which may not be compatible with legacy codebases stuck on older C++ standards.
As a newer project, it has fewer community resources, tutorials, and third-party integrations compared to established libraries like ICU, potentially increasing the learning curve for some developers.
uni-algo is an open-source alternative to the following products:
📚 single header utf8 string functions for C and C++
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64, POWER. Part of Node.js, WebKit/Safari, Ladybird, Chromium, Cloudflare Workers, Ghostty and Bun.
a clean C library for processing UTF-8 Unicode data