Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Bolt

NOASSERTION · C++ · v1.3GA

A C++ template library optimized for GPUs providing high-performance implementations of common algorithms like scan, reduce, transform, and sort.

GitHub
379 stars · 64 forks · 0 contributors

What is Bolt?

Bolt is a C++ template library optimized for GPU computing that provides high-performance implementations of common algorithms like scan, reduce, transform, and sort. It enables developers to leverage heterogeneous computing resources through a familiar STL-like interface while significantly reducing code complexity compared to writing equivalent OpenCL functionality.

Target Audience

C++ developers working on performance-critical applications who want to leverage GPU acceleration without learning low-level GPU programming models like OpenCL.

Value Proposition

Bolt offers a familiar STL-like interface that reduces the learning curve for GPU programming while providing optimized performance across AMD GPUs and CPUs through a single code path, making heterogeneous computing more accessible to C++ developers.

Overview

Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort.

Use Cases

Best For

  • Accelerating common algorithms like sorting and reduction on AMD GPUs
  • Adding GPU acceleration to existing C++ codebases with minimal code changes
  • Developing applications that need to run on both CPUs and GPUs with a single code path
  • Simplifying memory management between host and device in heterogeneous computing
  • Reducing OpenCL boilerplate code for common parallel algorithms
  • Performance-critical scientific computing and data processing applications

Not Ideal For

  • Projects targeting NVIDIA or Intel GPUs, as Bolt is optimized primarily for AMD hardware and no cross-vendor support is mentioned.
  • Teams needing minimal setup without managing complex dependencies like APP SDK, specific Catalyst drivers, and TBB.
  • Applications requiring the latest C++ standards or compiler features, given that the prerequisites list older toolchains such as Visual Studio 2010 and GCC 4.6.3.
  • Environments where portability across diverse hardware vendors is critical, due to Bolt's focus on AMD-specific OpenCL devices.

Pros & Cons

Pros

Familiar STL Interface

APIs are modeled on the C++ STL, so developers can use familiar patterns such as bolt::cl::sort with a minimal learning curve, as shown in the example code.

Heterogeneous Execution Path

Enables single codebase execution on both CPUs and OpenCL-capable accelerators, reducing development effort for mixed hardware environments.

Simplified Memory Management

The bolt::cl::device_vector class abstracts device resident memory with an interface similar to std::vector, easing data transfers between host and device.

Reduced Code Complexity

Requires significantly fewer lines of code compared to writing equivalent OpenCL functionality, making GPU acceleration more accessible without low-level programming.

Cons

AMD-Centric Hardware Lock-in

Only supports AMD GPUs and APUs listed in the README, with no mention of NVIDIA CUDA or Intel accelerators, limiting versatility in heterogeneous setups.

Complex Dependency Management

Requires specific versions of APP SDK, Catalyst drivers (e.g., 13.11 Beta), and TBB for the CPU path, which can be cumbersome to install and maintain across platforms.

Potentially Outdated Toolchain

Prerequisites include older software like Visual Studio 2010 and Catalyst 13.11 Beta, suggesting the project may not be actively updated for modern development environments.

Quick Stats

Stars: 379
Forks: 64
Contributors: 0
Open Issues: 19
Last commit: 10 years ago
Created: 2012

Tags

#template-library #parallel-computing #high-performance-computing #opencl #c-plus-plus #algorithm-library #gpu-computing #amd-gpu #stl-like #heterogeneous-computing

Built With

  • GCC
  • OpenCL
  • Visual Studio
  • TBB
  • CMake
  • C++

Included in

C/C++ (70.6k)
Auto-fetched 1 day ago

Related Projects

concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

Stars: 12,292
Forks: 1,919
Last commit: 1 month ago

Taskflow

A General-purpose Task-parallel Programming System in C++

Stars: 11,987
Forks: 1,394
Last commit: 4 days ago

ThreadPool

A simple C++11 Thread Pool implementation

Stars: 8,747
Forks: 2,344
Last commit: 1 year ago

ArrayFire

ArrayFire: a general purpose GPU library.

Stars: 4,884
Forks: 551
Last commit: 2 months ago