A C++ parallel algorithms library that enables high-performance computing on GPUs and multicore CPUs with a productivity-focused interface.
Thrust is a C++ parallel algorithms library that provides high-level abstractions for parallel programming on GPUs and multicore CPUs. It offers STL-like interfaces for common operations like sorting, reduction, and transformation, enabling developers to write portable parallel code with minimal effort. The library solves the problem of writing performant parallel algorithms that work across different hardware architectures without rewriting code.
C++ developers working on high-performance computing applications, scientific computing, or data processing who need to leverage GPU acceleration or multicore CPU parallelism. Researchers and engineers in fields like machine learning, computational physics, and financial modeling.
Developers choose Thrust for its productivity-focused high-level interface that reduces parallel programming complexity while maintaining performance portability. Its seamless integration with CUDA for GPU acceleration and support for CPU backends (OpenMP, TBB) allows writing parallel code once and running it efficiently on different hardware.
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Provides vectors and algorithms like thrust::sort and thrust::reduce that mirror the C++ Standard Library, reducing boilerplate and learning curve for C++ developers.
Enables the same code to run on multicore CPUs via OpenMP or TBB and on NVIDIA GPUs via CUDA by changing compiler definitions, as highlighted in the backend configuration section.
Thrust is header-only: no separate compilation step is required, which simplifies integration into existing projects without build-system complexity.
Supports async copies and reductions for overlapping computation and data transfer, demonstrated in the asynchronous example with device_future.
Allows switching between host and device systems (CPP, OMP, TBB, CUDA) via compiler flags, offering flexibility for different parallel environments.
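The switch is a compile-time macro, so the same source file can target each backend; a sketch, where `saxpy.cpp`/`saxpy.cu` is a hypothetical source file:

```shell
# CUDA backend (the default when compiling with nvcc):
nvcc -O2 saxpy.cu -o saxpy_cuda

# OpenMP backend with an ordinary C++ compiler:
g++ -O2 -fopenmp saxpy.cpp -o saxpy_omp \
    -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP

# TBB backend (links against Intel TBB):
g++ -O2 saxpy.cpp -o saxpy_tbb -ltbb \
    -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB
```

`THRUST_HOST_SYSTEM` can be set analogously to choose where the host-side algorithms run.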
The Thrust repository is archived and its development has moved into NVIDIA/cccl; as the README warning notes, future work is focused on that unified repository, which may confuse users looking for a standalone project.
GPU acceleration relies on CUDA, effectively locking users into NVIDIA hardware; AMD and Intel GPUs are unsupported without third-party ports such as AMD's rocThrust.
Using non-default backends requires manual compiler definitions and managing dependencies like CUB, which can be error-prone and less straightforward than plug-and-play libraries.
The high-level interface may introduce performance penalties compared to hand-optimized CUDA or OpenMP code, especially for niche algorithms not directly supported by Thrust's abstractions.