A C++ template library providing high-performance SIMD-accelerated sorting algorithms for integers, floats, and custom objects.
x86-simd-sort is a C++ template library that provides SIMD-accelerated sorting algorithms for built-in numeric types and custom objects. It targets sorting as a performance bottleneck, using AVX-512 and AVX2 instructions to achieve up to 10x speedups over standard library sorts.
C++ developers working on performance-critical applications such as numerical computing, data processing, and scientific simulations where sorting large arrays is a bottleneck.
Developers choose x86-simd-sort for its exceptional performance gains through low-level SIMD optimizations, seamless integration with existing C++ code, and support for both built-in types and custom objects with minimal overhead.
Uses AVX-512/AVX2 instructions to sort numeric types up to 10x faster than std::sort, according to the benchmark data in the README.
Supports sorting custom C++ objects via key-extraction lambdas, with speedups of up to 10x for keys computed from complex metrics such as Euclidean distance, per the provided examples.
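The key-extraction idea can be sketched with the standard library. This is a scalar analogue, assuming (consistent with the documented O(N) key and index storage) that each key is computed once into a key array and a `uint32_t` index array is then sorted by those keys. `sort_by_key_sketch` and `Point` are hypothetical names, and `std::sort` stands in for the SIMD kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

struct Point { double x, y; };  // hypothetical example object

// Hypothetical helper: sort objects by a key evaluated once per element.
// Scalar sketch only; std::sort replaces the library's SIMD kernel.
template <typename T, typename KeyFunc>
void sort_by_key_sketch(std::vector<T>& objs, KeyFunc key_func) {
    using Key = std::decay_t<decltype(key_func(objs[0]))>;
    const std::size_t n = objs.size();
    std::vector<Key> keys(n);             // n * sizeof(Key) bytes
    std::vector<std::uint32_t> idx(n);    // n * sizeof(uint32_t) bytes
    for (std::size_t i = 0; i < n; ++i) {
        keys[i] = key_func(objs[i]);      // each key computed exactly once
        idx[i] = static_cast<std::uint32_t>(i);
    }
    // Sort indices by the precomputed keys instead of re-running key_func
    // inside every comparison.
    std::sort(idx.begin(), idx.end(),
              [&](std::uint32_t a, std::uint32_t b) { return keys[a] < keys[b]; });
    // Apply the permutation. This sketch copies the objects for simplicity;
    // an in-place cycle-following permutation would avoid that extra buffer.
    std::vector<T> sorted;
    sorted.reserve(n);
    for (std::uint32_t i : idx) sorted.push_back(objs[i]);
    objs.swap(sorted);
}
```

For example, `sort_by_key_sketch(pts, [](const Point& p) { return std::hypot(p.x, p.y); })` orders points by distance from the origin while evaluating `std::hypot` only once per point, which is where the gain over a comparison-based lambda comes from when the key is expensive.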
Optional OpenMP integration provides up to 3x speedup for large arrays by utilizing multiple threads, configurable via environment variables like OMP_NUM_THREADS.
Dynamically selects the best SIMD implementation (AVX-512 or AVX2) at runtime based on the host processor's capabilities, so callers get the fastest available path without manual configuration.
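Runtime dispatch of this kind can be sketched as follows, assuming GCC or Clang on x86, where `__builtin_cpu_supports` queries the host CPU and the chosen function pointer is cached on first use. The kernel names are hypothetical and all three simply call `std::sort`; the check tests only `avx512f` for brevity, whereas a real AVX-512 path may require additional AVX-512 subsets. On other compilers or architectures the sketch falls back to the scalar kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical kernels: a real build would compile these with AVX-512 or
// AVX2 code paths. Here all three just call std::sort.
static void sort_avx512(float* a, std::size_t n) { std::sort(a, a + n); }
static void sort_avx2(float* a, std::size_t n) { std::sort(a, a + n); }
static void sort_scalar(float* a, std::size_t n) { std::sort(a, a + n); }

using SortFn = void (*)(float*, std::size_t);
struct Dispatch { SortFn fn; const char* name; };

// Pick the widest instruction set the host CPU reports.
static Dispatch select_sort() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return {sort_avx512, "avx512"};
    if (__builtin_cpu_supports("avx2")) return {sort_avx2, "avx2"};
#endif
    return {sort_scalar, "scalar"};
}

// Resolve the kernel once (thread-safe static init), then reuse it.
void dispatched_sort(float* a, std::size_t n) {
    static const Dispatch d = select_sort();
    d.fn(a, n);
}
```

Caching the choice in a function-local static keeps the CPU query off the hot path: detection runs once, and every later call is a single indirect jump.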
Offers a broad set of routines, including qsort, qselect, argsort, and key-value sorts, that mirror common NumPy and C++ STL operations, as detailed in the functions list.
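For readers coming from NumPy or the STL, the semantics of these routine families can be pinned down with scalar standard-library equivalents. The `ref_*` functions are hypothetical reference implementations for illustration only; they are not the library's API and make no attempt at SIMD speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// qsort: sort in place, ascending.
void ref_qsort(std::vector<double>& a) { std::sort(a.begin(), a.end()); }

// qselect: put the k-th smallest value at index k (requires k < a.size());
// smaller values end up before it, larger ones after, in unspecified order.
void ref_qselect(std::vector<double>& a, std::size_t k) {
    std::nth_element(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(k), a.end());
}

// argsort: indices that would sort the array; the input is left unchanged.
std::vector<std::size_t> ref_argsort(const std::vector<double>& a) {
    std::vector<std::size_t> idx(a.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t i, std::size_t j) { return a[i] < a[j]; });
    return idx;
}

// key-value sort: sort the keys and apply the same permutation to the values.
void ref_keyvalue_sort(std::vector<double>& keys, std::vector<int>& vals) {
    const std::vector<std::size_t> idx = ref_argsort(keys);
    std::vector<double> k2;
    std::vector<int> v2;
    k2.reserve(idx.size());
    v2.reserve(idx.size());
    for (std::size_t i : idx) {
        k2.push_back(keys[i]);
        v2.push_back(vals[i]);
    }
    keys.swap(k2);
    vals.swap(v2);
}
```

Like NumPy's default `argsort`, `std::sort` is not stable, so the relative order of equal keys is unspecified in this sketch.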
Tied exclusively to x86 processors with AVX-512 or AVX2, so it offers no acceleration on ARM or older x86 CPUs and is a poor fit for cross-platform projects, as its x86-only focus makes clear.
Custom object sorting requires O(N) extra space for keys and indices—specifically arrsize * sizeof(key_t) + arrsize * sizeof(uint32_t) bytes—which can be prohibitive for very large arrays.
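A worked instance of the formula: a `double` key costs 8 + 4 = 12 bytes of scratch per element, so 100 million objects need about 1.2 GB on top of the objects themselves. The helper name `object_sort_scratch_bytes` is hypothetical and simply encodes the stated formula.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper encoding the stated formula:
// arrsize * sizeof(key_t) + arrsize * sizeof(uint32_t) bytes.
template <typename Key>
constexpr std::size_t object_sort_scratch_bytes(std::size_t arrsize) {
    return arrsize * sizeof(Key) + arrsize * sizeof(std::uint32_t);
}

// 100 million objects keyed by double: 100e6 * (8 + 4) = 1.2e9 bytes.
static_assert(object_sort_scratch_bytes<double>(100'000'000) == 1'200'000'000,
              "double key: 12 bytes of scratch per element");
```

Choosing a narrower key type (for example `float` instead of `double`) cuts the per-element overhead from 12 to 8 bytes.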
Requires C++17 and specific compiler versions (e.g., GCC 12 for _Float16 support), adding setup complexity and limiting compatibility with older or constrained environments.
Handles NaNs by replacing them with quiet_NaN rather than preserving their original bit patterns, which may break applications that rely on exact NaN payloads, as noted in the documentation.
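The sketch below shows why this canonicalization loses information: two NaNs can differ in their payload bits while both satisfying `std::isnan`. `sort_nans_last_canonical` is a hypothetical illustration that assumes NaNs are moved to the end of the array (as NumPy does) and rewritten as `std::numeric_limits<double>::quiet_NaN()`; it is not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Reinterpret a double's bits; memcpy is the portable C++17 way to do this.
std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

double from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// A quiet NaN carrying a nonzero payload in its low mantissa bits.
const double payload_nan = from_bits(0x7FF8000000000001ULL);
// The canonical quiet NaN that replaces it after sorting.
const double canonical_nan = std::numeric_limits<double>::quiet_NaN();

// Hypothetical sketch: sort ascending, move NaNs to the end, and overwrite
// them with the canonical quiet NaN, discarding any original payload.
void sort_nans_last_canonical(std::vector<double>& a) {
    auto mid = std::partition(a.begin(), a.end(),
                              [](double x) { return !std::isnan(x); });
    std::sort(a.begin(), mid);
    std::fill(mid, a.end(), canonical_nan);
}
```

After sorting, the NaN still tests as NaN, but its payload bits are gone, which is exactly the case that matters to code using NaN boxing or payload-tagged missing values.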