A clean C library for Unicode normalization, case-folding, and UTF-8 processing.
utf8proc is a C library for processing UTF-8 Unicode data, providing functions for normalization, case-folding, and character encoding/decoding. It solves the problem of handling Unicode text consistently across different platforms and applications, ensuring correct text representation and comparison.
It is aimed at C and C++ developers working with internationalized text, embedded systems programmers who need lightweight Unicode support, and language implementers (like Julia) who require reliable UTF-8 processing.
Developers choose utf8proc for its clean API, small footprint, and regular updates to the latest Unicode standards. It offers a focused, portable alternative to larger Unicode libraries, with proven reliability in production environments like the Julia language.
Small codebase with minimal dependencies, suitable for embedded systems. It builds across platforms with Make or CMake, as highlighted in the README's cross-platform compatibility section.
Regularly updated to track the latest Unicode standard (currently 17.0.0), which keeps its results correct and reliable. Its use in the Julia language helps ensure it stays current.
Provides straightforward functions like utf8proc_map for common operations, with helper functions for normalization forms, making it easy to integrate into C projects.
Serves as the Unicode backend for the Julia programming language, indicating production-ready stability and long-term maintenance.
Focuses on normalization and case-folding and omits collation, word and line segmentation, and other complex operations, in keeping with the minimalistic philosophy the README describes.
As a C library, it leaves memory management to the caller: functions such as utf8proc_map return heap-allocated buffers that must be released explicitly (e.g., using utf8proc_free), which can be error-prone and adds complexity for developers.
Documentation is primarily confined to the utf8proc.h header file, with no extensive tutorials or guides, making it less accessible for beginners.
Simple Dynamic Strings library for C
📚 single header utf8 string functions for C and C++
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64, POWER. Part of Node.js, WebKit/Safari, Ladybird, Chromium, Cloudflare Workers, Ghostty and Bun.