A C++ template library optimized for GPUs providing high-performance implementations of common algorithms like scan, reduce, transform, and sort.
Bolt is a C++ template library optimized for GPU computing that provides high-performance implementations of common algorithms like scan, reduce, transform, and sort. It enables developers to leverage heterogeneous computing resources through a familiar STL-like interface while significantly reducing code complexity compared to writing equivalent OpenCL functionality.
Bolt is aimed at C++ developers working on performance-critical applications who want GPU acceleration without learning low-level GPU programming models like OpenCL.
Bolt offers a familiar STL-like interface that reduces the learning curve for GPU programming while providing optimized performance across AMD GPUs and CPUs through a single code path, making heterogeneous computing more accessible to C++ developers.
APIs are modeled on the C++ STL, so calls such as bolt::cl::sort follow the familiar std::sort pattern (a begin iterator and an end iterator) and carry a minimal learning curve.
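To illustrate the STL-like call pattern, here is a minimal sketch rather than an authoritative Bolt program. The `USE_BOLT` build flag and the helper names `sort_values` and `make_descending` are hypothetical and not part of Bolt. The guarded branch assumes the `<bolt/cl/sort.h>` header and the iterator-pair form of `bolt::cl::sort`. Without the flag, the code falls back to `std::sort`, so it compiles on a machine with no Bolt installation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_BOLT                  // hypothetical build flag, not defined by Bolt
#include <bolt/cl/sort.h>        // assumed header location for bolt::cl::sort
#endif

// Sorts `values` in ascending order.
// With USE_BOLT defined, the call is routed through Bolt; otherwise std::sort runs.
inline void sort_values(std::vector<int>& values) {
#ifdef USE_BOLT
    // Assumed to take the same iterator pair as std::sort.
    bolt::cl::sort(values.begin(), values.end());
#else
    std::sort(values.begin(), values.end());
#endif
    // Postcondition check, active in debug builds only.
    assert(std::is_sorted(values.begin(), values.end()));
}

// Builds the deterministic sequence n, n-1, ..., 1 as sample input.
inline std::vector<int> make_descending(std::size_t n) {
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(n - i);
    }
    return v;
}
```

Calling `sort_values` on the result of `make_descending(8)` leaves the vector in ascending order. The only line that differs between the two branches is the namespace of the sort call, which is the point of the STL-modeled interface.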
A single codebase runs on both CPUs and OpenCL-capable accelerators, reducing development effort for mixed hardware environments.
The bolt::cl::device_vector class abstracts device resident memory with an interface similar to std::vector, easing data transfers between host and device.
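The device_vector idea can be sketched as follows, under stated assumptions. The `USE_BOLT` flag and the helper `sorted_copy` are hypothetical. The guarded branch assumes that `bolt::cl::device_vector` lives in `<bolt/cl/device_vector.h>`, can be constructed from an iterator range, and exposes `begin()`/`end()` like `std::vector`. Those details should be checked against the Bolt documentation before use. The fallback branch stages the data in an ordinary `std::vector` so the example builds without Bolt.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

#ifdef USE_BOLT                       // hypothetical build flag, not defined by Bolt
#include <bolt/cl/device_vector.h>    // assumed header location
#include <bolt/cl/sort.h>             // assumed header location
#endif

// Returns a sorted copy of `host`, leaving the input untouched.
inline std::vector<int> sorted_copy(const std::vector<int>& host) {
    std::vector<int> result(host.size());
#ifdef USE_BOLT
    // Assumption: the range constructor copies host data into device-resident memory.
    bolt::cl::device_vector<int> dev(host.begin(), host.end());
    bolt::cl::sort(dev.begin(), dev.end());
    // Assumption: device_vector iterators can be read by std::copy to bring data back.
    std::copy(dev.begin(), dev.end(), result.begin());
#else
    // Fallback: a host-side staging vector stands in for the device buffer.
    std::vector<int> staged(host.begin(), host.end());
    std::sort(staged.begin(), staged.end());
    std::copy(staged.begin(), staged.end(), result.begin());
#endif
    assert(result.size() == host.size());
    return result;
}
```

Both branches follow the same three steps (copy in, run the algorithm, copy out), which is what an interface similar to std::vector is meant to make easy. Keeping data in a device-resident container across several algorithm calls would avoid repeating the transfers.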
Requires significantly fewer lines of code than writing the equivalent OpenCL functionality by hand, making GPU acceleration accessible without low-level programming.
Only supports AMD GPUs and APUs listed in the README, with no mention of NVIDIA CUDA or Intel accelerators, limiting versatility in heterogeneous setups.
Requires specific versions of the APP SDK and Catalyst drivers (e.g., 13.11 Beta), plus TBB for the CPU path, which can be cumbersome to install and maintain across platforms.
Prerequisites include older software like Visual Studio 2010 and Catalyst 13.11 Beta, suggesting the project may not be actively updated for modern development environments.
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
A General-purpose Task-parallel Programming System in C++
A simple C++11 Thread Pool implementation
ArrayFire: a general purpose GPU library.