A Rust implementation of the Encoding Standard for decoding and encoding Web-compatible character encodings, used in Firefox.
encoding_rs is a Rust library that implements the Encoding Standard, providing robust decoding and encoding of Web-compatible character encodings. It solves the problem of handling legacy text encodings in modern applications, particularly web browsers like Firefox, by converting legacy encodings such as Shift_JIS and GBK to and from UTF-8 and UTF-16.
It is aimed at developers working on web browsers, text processing tools, or applications that need to handle multilingual text data in legacy encodings, especially those integrating with Gecko or requiring no_std support.
It offers a standards-compliant, performant implementation with optional SIMD acceleration, designed for real-world web use cases, and is battle-tested in Firefox. Unlike generic text libraries, it focuses specifically on the Encoding Standard's requirements.
A Gecko-oriented implementation of the Encoding Standard in Rust
Implements the Encoding Standard as specified, ensuring compatibility with web content; it has been integrated into Firefox since version 56.
Offers optional SIMD acceleration via the simd-accel feature for critical targets like x86_64 and aarch64, and the mem module provides efficient in-RAM operations like ASCII validation and Latin1 conversion.
Works in no_std environments, and the default alloc feature can be turned off for targets without an allocator, making it suitable for embedded systems or kernels without a standard library.
Covers all encodings defined by the Encoding Standard, with decoding and encoding to both UTF-8 and UTF-16, catering to Rust and Gecko use cases.
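The in-RAM operations mentioned above for the mem module (ASCII validation and Latin1 conversion) can be illustrated with a minimal std-only sketch. This is not the crate's API: the helper names ascii_prefix_len and latin1_to_string are hypothetical and exist only to show what such operations compute, without any of the SIMD acceleration the crate offers.

```rust
/// Hypothetical helper: length of the longest all-ASCII prefix of `bytes`.
/// If the whole slice is ASCII, this equals `bytes.len()`.
fn ascii_prefix_len(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .position(|b| !b.is_ascii())
        .unwrap_or(bytes.len())
}

/// Hypothetical helper: Latin1 to UTF-8 conversion.
/// Each Latin1 byte is the Unicode code point with the same numeric value,
/// so the conversion never fails.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // "café" in Latin1: the final byte 0xE9 is not ASCII.
    let input = b"caf\xE9";
    println!("ASCII prefix length: {}", ascii_prefix_len(input));
    println!("As UTF-8: {}", latin1_to_string(input));
}
```

A scalar loop like this is the baseline that a SIMD-accelerated implementation would be expected to outperform on long ASCII-heavy inputs.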
Encoding to legacy encodings is intentionally slow by default to minimize binary size; fast CJK encoding requires optional features that add up to 176 KB to the binary.
SIMD acceleration requires nightly Rust and is limited to specific targets (e.g., x86_64, aarch64); enabling it can break builds on other architectures and opts out of Rust's stability guarantees.
Does not support encodings such as DOS encodings and UTF-7; these require separate crates such as oem_cp and charset, which adds complexity.