A C++17 library providing efficient STL-like data structures (vector, unordered_map, etc.) for GPU programming with CUDA, OpenMP, and HIP backends.
An open-source suite of ab initio quantum chemistry programs for high-accuracy molecular simulations, written in C++ with a Python driver.
A container runtime that enables GPU acceleration in Docker containers (deprecated in favor of NVIDIA Container Toolkit).
A high-performance Clojure library for matrix and linear algebra computations using optimized BLAS/LAPACK routines on CPU and GPU.
A Kubernetes batch scheduler for high-performance workloads such as AI/ML, big data, and HPC.
A C++ template library providing high-performance SIMD-accelerated sorting algorithms for integers, floats, and custom objects.
A high-performance, portable deep reinforcement learning library for continuous control, optimized for speed across CPUs, GPUs, and microcontrollers.
A tool to convert container images into unprivileged sandboxes, optimized for high-performance and virtualized environments.
A CPU- and GPU-accelerated machine learning library optimized for high-performance computing.
An AWS-supported open-source tool to deploy and manage High Performance Computing (HPC) clusters in the AWS cloud.
A Pythonic orchestration tool for AI/ML, HPC, and quantum computing workflows across heterogeneous compute environments.
A lightweight, header-only C++ tensor algebra framework delivering bare-metal performance for small matrix/tensor operations via compile-time optimizations.
A high-performance, concurrent distributed cache system built in Rust for low-latency, high-throughput workloads.
A high-productivity C++ library for parallel programming across devices using Data Parallel C++ (DPC++) APIs.
An open-source high-performance computing platform for systems analysis and multidisciplinary optimization, written in Python.
A C++ vector expression template library for OpenCL, CUDA, and OpenMP that simplifies GPGPU development.
A fast GPU-accelerated library for training Gradient Boosting Decision Trees (GBDT) and Random Forests.
A high-performance C++/DPC++ library for accelerated machine learning on CPUs, GPUs, and distributed systems.
A header-only C++ template library providing custom arithmetic plug-in types for mixed-precision algorithm development and optimization.
A framework for executing native Java and Scala code on the GPU via OpenCL for data-parallel computation.
An image processing library built on JAX, designed to be optimized and parallelized with JAX transformations.
A high-performance C++ automatic differentiation library for large-scale, performance-critical systems.
A collection of CI pipelines, Docker images, and optimized examples to simplify JAX development on NVIDIA GPUs.
A curated list of awesome Fortran frameworks, libraries, and software for scientific and high-performance computing.
A tutorial demonstrating how to extend JAX with custom C++ and CUDA operations for high-performance computing.
A V library for AI and high-performance scientific computing with pure-V BLAS/LAPACK implementations.
A C++ template library optimized for GPUs providing high-performance implementations of common algorithms like scan, reduce, transform, and sort.
A Clojure library for high-performance Bayesian data analysis and machine learning on the GPU.
An exascale many-physics flow solver for compressible multi-phase simulations, scaling to 200 trillion grid points on 43K+ GPUs.
A Vulkan-based GPGPU computing framework that reduces boilerplate for portable, high-performance GPU computing.
A CUDA backend for Torch7 that enables GPU-accelerated tensor operations with a familiar Torch API.
A parallel Monte Carlo and machine learning library for scientific inference, available in Python, MATLAB, Fortran, C++, and C.
A Common Lisp library for NVIDIA CUDA programming, providing a kernel description language and memory management.