A PPX-based DSL for writing GPU kernels in OCaml syntax that compiles to multiple backends (CUDA, OpenCL, Vulkan, Metal).
Sarek is a domain-specific language and compiler that enables GPU computing directly from OCaml. It allows developers to write SIMT (Single Instruction, Multiple Thread) kernels using OCaml syntax, which are then compiled at build time to various GPU backends without requiring code changes. This provides a type-safe, high-level abstraction for GPU programming while maintaining performance and portability across different hardware platforms.
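To make the SIMT model concrete, here is a plain-OCaml sketch that emulates it on the CPU. A kernel is a function of a global thread index, and a launcher runs it once per index across a grid of blocks. The names `vec_add_kernel` and `launch` are hypothetical and are not Sarek's API. A real GPU runs these threads in parallel, while this sketch runs them one after another.

```ocaml
(* Hypothetical CPU emulation of the SIMT execution model; not Sarek's API. *)

(* A "kernel" receives its global thread index and handles one element. *)
let vec_add_kernel (a : float array) (b : float array) (c : float array)
    (tid : int) : unit =
  (* Bounds guard: the grid may contain more threads than data elements. *)
  if tid < Array.length c then c.(tid) <- a.(tid) +. b.(tid)

(* Emulate launching [blocks] blocks of [threads_per_block] threads each.
   On a GPU these would run in parallel; here they run sequentially. *)
let launch ~blocks ~threads_per_block (kernel : int -> unit) : unit =
  for block_idx = 0 to blocks - 1 do
    for thread_idx = 0 to threads_per_block - 1 do
      let global_id = (block_idx * threads_per_block) + thread_idx in
      kernel global_id
    done
  done

let () =
  let n = 10 in
  let a = Array.init n float_of_int in
  let b = Array.make n 1.0 in
  let c = Array.make n 0.0 in
  let threads_per_block = 4 in
  (* Round up so that every element is covered by some thread. *)
  let blocks = (n + threads_per_block - 1) / threads_per_block in
  launch ~blocks ~threads_per_block (vec_add_kernel a b c);
  Array.iter (fun x -> Printf.printf "%g " x) c;
  print_newline ()
```

Every thread runs the same code on a different index. This per-index structure is what lets a single kernel source map onto GPU thread hierarchies in general.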
OCaml developers who need to write high-performance, portable GPU kernels for scientific computing, data processing, or parallel algorithms without learning low-level GPU APIs like CUDA or OpenCL.
Developers choose Sarek because it offers a unified, type-safe interface built on OCaml's syntax and type system. It hides the differences between GPU APIs and lets a single kernel source compile to multiple backends (CUDA, OpenCL, Vulkan, Metal, CPU). Its zero-copy memory sharing and plugin architecture are designed for efficiency and extensibility.
SIMT Abstractions for Runtime Extensible Kernels (GPGPU programming with OCaml)
Kernels compile automatically to CUDA, OpenCL, Vulkan, Metal, and CPU backends, so the same kernel source runs on every supported backend without modification.
Uses GADTs and phantom types to provide compile-time guarantees that help prevent common GPU programming errors, such as invalid memory accesses in kernels.
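The following generic OCaml sketch shows how phantom types and GADTs can turn some buffer misuses into type errors. The types `host`, `device`, `elt`, and `buffer`, and the functions below, are hypothetical illustrations and not Sarek's actual definitions. The sketch checks element kind and buffer location statically. It does not check array bounds, which OCaml still checks at runtime here.

```ocaml
(* Hypothetical illustration of phantom types and GADTs; not Sarek's API. *)

(* Phantom tags for where a buffer lives; they have no values. *)
type host
type device

(* A GADT describing element kinds, so the element type is known statically. *)
type _ elt =
  | F32 : float elt
  | I32 : int32 elt

(* ['a] is the element type; ['loc] is a phantom location tag. *)
type ('a, 'loc) buffer = { kind : 'a elt; data : 'a array }

let create_host (kind : 'a elt) (data : 'a array) : ('a, host) buffer =
  { kind; data }

(* Stand-in for a host-to-device transfer: copies the array and changes
   the phantom tag from [host] to [device]. *)
let to_device (b : ('a, host) buffer) : ('a, device) buffer =
  { kind = b.kind; data = Array.copy b.data }

(* A "kernel" that accepts only device-resident float buffers. Passing a
   host buffer or an int32 buffer is rejected by the type checker. *)
let scale_on_device (factor : float) (b : (float, device) buffer) : unit =
  Array.iteri (fun i x -> b.data.(i) <- factor *. x) b.data

(* Recover a printable element-kind name by matching on the GADT. *)
let kind_name : type a. a elt -> string = function
  | F32 -> "float32"
  | I32 -> "int32"

let () =
  let h = create_host F32 [| 1.0; 2.0; 3.0 |] in
  let d = to_device h in
  scale_on_device 2.0 d;
  (* scale_on_device 2.0 h;  would not compile: host is not device *)
  Printf.printf "%s: %s\n" (kind_name d.kind)
    (String.concat " " (Array.to_list (Array.map string_of_float d.data)))
```

The location and element-kind tags exist only at the type level, so these checks add no runtime cost. This is the general appeal of phantom types for this kind of API.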
Implements efficient memory sharing between host and device, minimizing data transfer overhead and improving performance for data-intensive computations.
Fully supports OCaml 5.4, including effect handlers and domains, so modern OCaml features can be used alongside GPU code.
Requires specific driver and toolkit versions (e.g., CUDA 12.9+ for newer GPUs), which makes setup system-dependent and error-prone, as the project's troubleshooting documentation details.
Lacks the extensive pre-built GPU libraries available in ecosystems such as PyTorch or CUDA (e.g., cuBLAS, cuDNN), so common tasks require more custom kernel code.
The recent rework, done with AI assistance, may have left the documentation incomplete or inconsistent, and the plugin architecture could introduce instability in newer backends.