A header-only C/C++ library that replaces slow integer division instructions with fast shift/add/multiply sequences.
libdivide is a header-only C/C++ library that optimizes integer division by replacing slow CPU division instructions with faster sequences of shift, add, and multiply operations. Integer division is among the slowest instructions on most CPUs: on x64, a 64-bit division can have a latency of up to 90 clock cycles, compared with about 3 for a multiplication. The library pays off when the same divisor is reused many times, as in a loop, because the replacement sequence is computed once and then applied to every dividend.
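The idea of paying for one real division up front and then reusing a multiply-and-shift sequence can be sketched as follows. This is a minimal illustration for 32-bit unsigned values, not libdivide's actual implementation or API: `FastDivU32` and `divide_all` are hypothetical names, and the precomputed constant is the standard ceil(2^64 / d) reciprocal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of libdivide: a precomputed divider for
// 32-bit unsigned values. The only hardware division happens once, in the
// constructor; every later quotient uses multiplies, an add and shifts.
struct FastDivU32 {
    uint64_t magic;  // ceil(2^64 / d) when d >= 2; unused when d == 1
    uint32_t d;

    explicit FastDivU32(uint32_t divisor) : magic(0), d(divisor) {
        assert(divisor != 0);
        if (divisor > 1) magic = UINT64_MAX / divisor + 1;
    }

    uint32_t divide(uint32_t n) const {
        if (d == 1) return n;  // 2^64 / 1 does not fit in 64 bits
        // Upper 64 bits of the product magic * n, assembled from two
        // 32x32 -> 64 bit multiplies so no 128-bit type is needed.
        uint64_t hi = (magic >> 32) * n;
        uint64_t lo = (magic & 0xFFFFFFFFu) * n;
        return static_cast<uint32_t>((hi + (lo >> 32)) >> 32);
    }
};

// Divides every element in place, reusing one precomputed divider.
void divide_all(uint32_t* data, std::size_t size, uint32_t divisor) {
    const FastDivU32 fast_d(divisor);
    for (std::size_t i = 0; i < size; ++i) {
        data[i] = fast_d.divide(data[i]);
    }
}
```

The divider is constructed once outside the hot loop, as `divide_all` does; constructing it per element would reintroduce the division it is meant to avoid.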
It is aimed at C and C++ developers working on performance-critical applications, such as game engines, scientific computing, embedded systems, and data processing pipelines, where integer division is a bottleneck.
Developers choose libdivide for its substantial speed improvements (up to 10x), support for vectorized division, and compatibility with a wide range of hardware, from high-end x64 CPUs to 8-bit microcontrollers. Its header-only design and simple API make integration straightforward.
Official git repository for libdivide: optimized integer division
Achieves up to 10x faster 64-bit integer division by replacing slow CPU division instructions with shift and multiply sequences, according to the project's benchmark output.
Supports SSE2, AVX2, and AVX512 for vectorized integer division, enabling significant performance boosts in parallel computations on x86/x64 CPUs.
Works on 8-bit microcontrollers such as AVR that lack a hardware divider, which makes it valuable for resource-constrained embedded applications.
Provides both runtime C/C++ APIs for variable divisors and compile-time macros/templates for constant divisors, offering optimization versatility.
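For a divisor known at compile time, the reciprocal can be computed by the compiler so that no division instruction is left at run time. The sketch below illustrates that idea under stated assumptions and does not reproduce libdivide's macros or templates: `ConstDivU32` is a hypothetical name, and it only handles 32-bit unsigned values with a divisor of at least 2.

```cpp
#include <cstdint>

// Hypothetical sketch, not libdivide's API: the divisor D is a template
// parameter, so the reciprocal constant is evaluated at compile time.
template <uint32_t D>
struct ConstDivU32 {
    static_assert(D >= 2, "this sketch only covers divisors of 2 or more");

    // ceil(2^64 / D), computed by the compiler.
    static constexpr uint64_t magic = UINT64_MAX / D + 1;

    // Returns n / D as the upper 64 bits of magic * n, built from two
    // 32x32 -> 64 bit multiplies.
    static constexpr uint32_t divide(uint32_t n) {
        return static_cast<uint32_t>(
            (((magic >> 32) * n) + (((magic & 0xFFFFFFFFu) * n) >> 32)) >> 32);
    }
};

// The result can itself be checked at compile time.
static_assert(ConstDivU32<10>::divide(12345u) == 1234u, "12345 / 10");
```

Making the divisor a template parameter trades flexibility for zero setup cost: each distinct constant gets its own instantiation, whereas a runtime divider would be the choice when the divisor is only known while the program runs.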
Exclusively optimizes integer division, leaving floating-point division and other arithmetic operations unaddressed, which can limit its utility in mixed workloads.
The branchfree dividers come with caveats: an unsigned branchfree divider cannot be 1, and branchfree division is slower for signed types than for unsigned ones, which adds complexity and potential pitfalls in implementation.
Requires manual definition of macros like LIBDIVIDE_SSE2 for vector division, introducing platform-specific configuration and build complexity.
The replacement of single division instructions with multiple operations can inflate binary size, a concern for memory-sensitive environments like embedded systems.