Thin, unified C++ wrappers for NVIDIA's CUDA APIs (Runtime, Driver, NVRTC, NVTX) that improve safety and ease of use.
cuda-api-wrappers is a header-only C++ library that provides thin, unified wrappers for NVIDIA's CUDA APIs, including the Runtime, Driver, NVRTC, and NVTX APIs. It addresses the verbosity, error-proneness, and inconsistency of the raw C-style CUDA APIs by offering a modern C++ interface with exception-based error handling, RAII resource management, and improved API design.
C++ developers working directly with NVIDIA's CUDA APIs for GPU computing who want a safer, more intuitive, and consistent programming experience without sacrificing control or performance.
Developers choose cuda-api-wrappers because it eliminates boilerplate error checking, automates resource management, and unifies disparate CUDA APIs into a single coherent interface, while remaining lightweight, header-only, and fully compatible with the underlying CUDA functionality.
Thin, unified, C++-flavored wrappers for the CUDA APIs
Functions throw exceptions on failure instead of returning status codes, eliminating manual checks and making error handling more robust, as stated in the key features.
Proxy objects for devices, streams, and other resources automatically manage lifetimes using RAII, preventing leaks and simplifying code, highlighted in the design principles.
Seamlessly integrates the Runtime, Driver, NVRTC, and NVTX APIs into a single coherent interface, addressing the fragmentation of NVIDIA's original APIs, per the motivation section.
The library is header-only with no compilation needed, and wrappers are thin with minimal overhead, ensuring easy integration and performance close to raw APIs.
Uses C++11 idioms like namespacing, value returns instead of out-parameters, and adorned POD structs for clarity and safety, as demonstrated in the taste examples.
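The error-handling, RAII, and value-return points above can be illustrated without a GPU. The following is a minimal sketch of the general technique only: `fake_c_api`, `wrapped::buffer`, `wrapped::api_error`, `make_buffer`, and `try_allocate` are hypothetical names invented for this illustration, and none of them is part of CUDA or of cuda-api-wrappers. The fake C-style function returns a status code and writes its result through an out-parameter; the wrapper turns a failure status into an exception, returns the resource by value, and releases it in a destructor.

```cpp
// Hypothetical stand-in for a C-style API. These names are assumptions made
// for illustration; they are NOT CUDA or cuda-api-wrappers identifiers.
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

namespace fake_c_api {

enum status_t { success = 0, error_invalid_size = 1, error_out_of_memory = 2 };

// Counts live allocations so the example can observe that nothing leaks.
inline int& live_allocations() { static int count = 0; return count; }

// C style: the status is the return value, the result is an out-parameter.
inline status_t allocate(void** out, std::size_t num_bytes) {
    if (num_bytes == 0) { return error_invalid_size; }
    *out = std::malloc(num_bytes);
    if (*out == nullptr) { return error_out_of_memory; }
    ++live_allocations();
    return success;
}

inline status_t release(void* ptr) {
    std::free(ptr);
    --live_allocations();
    return success;
}

} // namespace fake_c_api

namespace wrapped {

// Exception that keeps the original status code available to the caller.
class api_error : public std::runtime_error {
public:
    api_error(fake_c_api::status_t status, const std::string& what)
        : std::runtime_error(what), status_(status) {}
    fake_c_api::status_t code() const noexcept { return status_; }
private:
    fake_c_api::status_t status_;
};

// Converts a failure status into an exception; success passes through.
inline void throw_if_error(fake_c_api::status_t status, const std::string& what) {
    if (status != fake_c_api::success) { throw api_error(status, what); }
}

// RAII owner: acquires in the constructor, releases in the destructor.
class buffer {
public:
    explicit buffer(std::size_t num_bytes) : size_(num_bytes) {
        throw_if_error(fake_c_api::allocate(&ptr_, num_bytes), "allocation failed");
    }
    ~buffer() { if (ptr_ != nullptr) { fake_c_api::release(ptr_); } }
    buffer(const buffer&) = delete;
    buffer& operator=(const buffer&) = delete;
    buffer(buffer&& other) noexcept : ptr_(other.ptr_), size_(other.size_) {
        other.ptr_ = nullptr;  // the moved-from object no longer owns anything
        other.size_ = 0;
    }
    buffer& operator=(buffer&&) = delete;
    void* get() const noexcept { return ptr_; }
    std::size_t size() const noexcept { return size_; }
private:
    void* ptr_ = nullptr;
    std::size_t size_ = 0;
};

// Value return instead of an out-parameter.
inline buffer make_buffer(std::size_t num_bytes) { return buffer(num_bytes); }

} // namespace wrapped

// Returns `success`, or the status code carried by the thrown exception.
inline fake_c_api::status_t try_allocate(std::size_t num_bytes) {
    try {
        wrapped::buffer b = wrapped::make_buffer(num_bytes);
        assert(b.get() != nullptr);
        return fake_c_api::success;  // b is released here by its destructor
    } catch (const wrapped::api_error& e) {
        return e.code();
    }
}
```

Calling `try_allocate(64)` from a `main` function exercises the success path, and `try_allocate(0)` exercises the exception path; in both cases `fake_c_api::live_allocations()` is back at zero afterwards, because the destructor runs on every exit route. The sketch compiles as C++11, matching the language level the listing mentions.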
Some wrapper calls require additional CUDA context pushes and pops to ensure safety, which are cheap but non-trivial and can't be optimized away, as noted in the caveats.
Graphics interoperability APIs such as OpenGL and Direct3D are not supported, and some other CUDA features are omitted, which limits use in graphics-heavy applications.
Works best with CUDA v11.x or later and requires Unified Virtual Addressing support, which may rule out older GPUs or CUDA installations.
Although the library is header-only, integration is smoothest with CMake; including it manually means managing CUDA dependencies and linking yourself, which can be complex in non-CMake projects.
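The context push/pop caveat listed above is easier to picture with a small model. This is a sketch under stated assumptions, not the library's implementation: `fake_driver`, `scoped_context`, and `context_seen_by_operation` are hypothetical names, and the vector-based stack merely imitates the per-thread context stack that the CUDA Driver API manipulates through `cuCtxPushCurrent` and `cuCtxPopCurrent`. The guard makes the required context current for the duration of one call and restores the caller's context afterwards, which is the extra work such a wrapper call would pay for.

```cpp
// Hypothetical model of a per-thread context stack. These names are
// assumptions for illustration; they are NOT CUDA or cuda-api-wrappers calls.
#include <cassert>
#include <vector>

namespace fake_driver {

using context_handle = int;

// One stack per thread, standing in for the driver's current-context stack.
inline std::vector<context_handle>& context_stack() {
    static thread_local std::vector<context_handle> stack;
    return stack;
}

inline void push_context(context_handle context) { context_stack().push_back(context); }
inline void pop_context() { context_stack().pop_back(); }

// Returns -1 when no context is current.
inline context_handle current_context() {
    return context_stack().empty() ? -1 : context_stack().back();
}

} // namespace fake_driver

// Pushes a context on construction and pops it on destruction, so the
// caller's current context is restored even if the guarded code throws.
class scoped_context {
public:
    explicit scoped_context(fake_driver::context_handle context) {
        fake_driver::push_context(context);
    }
    ~scoped_context() { fake_driver::pop_context(); }
    scoped_context(const scoped_context&) = delete;
    scoped_context& operator=(const scoped_context&) = delete;
};

// Stands in for a wrapper call that must run with `owner` current,
// regardless of which context the caller had current beforehand.
inline fake_driver::context_handle context_seen_by_operation(
        fake_driver::context_handle owner) {
    scoped_context guard(owner);            // the extra push
    return fake_driver::current_context();  // placeholder for the real work
}                                           // the extra pop happens here
```

If context 1 is current and `context_seen_by_operation(2)` is called, the operation observes context 2 while the caller still finds context 1 current on return. The push and pop are real work performed on every such call, which is why a compiler cannot simply remove them.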