A deep learning library in Rust featuring shape-checked tensors and neural networks with compile-time safety.
A deep learning library for Rust featuring shape-checked tensors and neural networks with compile-time safety.
A header-only C++ library for CUDA providing accelerated primitives for solving irregularly parallel problems on GPUs.
A domain-specific language and C++ library for automatically synthesizing high-performance machine learning kernels.
A high-performance GPU-accelerated Fast Fourier Transform library supporting Vulkan, CUDA, HIP, OpenCL, Level Zero, and Metal backends.
A JIT compiler for writing high-performance GPU programs in .NET languages like C#, offering CUDA-level performance with C# convenience.
A collection of high-performance GICP-based point cloud registration algorithms with multi-threaded and GPU-accelerated implementations.
A fast Support Vector Machine (SVM) library that leverages GPUs and multi-core CPUs for high-performance machine learning.
A modern C++20 GPU numerical computing library with Python-like syntax for near-native performance on NVIDIA GPUs.
A Pythonic deep learning framework built on NumPy with optional CUDA acceleration.
A deep learning technique for finding semantically meaningful dense correspondences between images to enable visual attribute transfer.
A library for building high-performance custom human pose estimation applications with real-time inference and flexible model development.
A C++17 library providing efficient STL-like data structures (vector, unordered_map, etc.) for GPU programming with CUDA, OpenMP, and HIP backends.
A GPU-accelerated deep learning library for Python using CUDA via PyCUDA, implementing neural networks with various training methods.
A container runtime that enables GPU acceleration in Docker containers (deprecated in favor of NVIDIA Container Toolkit).
A high-performance Clojure library for matrix and linear algebra computations using optimized BLAS/LAPACK routines on CPU and GPU.
A CUDA-accelerated library for rapid 3D data processing in robotics, enabling GPU-powered SLAM, collision avoidance, and path planning.
A CVPR 2018 algorithm for efficient multi-person pose estimation and tracking in videos, ranking first in the ICCV 2017 PoseTrack challenge.
An open-source GPU-accelerated password cracking tool for BitLocker-encrypted storage devices using dictionary attacks.
A CPU- and GPU-accelerated machine learning library optimized for high-performance computing.
A header-only Vulkan-based library providing a CUDA Runtime API interface for GPU-accelerated applications.
A GPU-accelerated Python implementation of six fundamental deep learning algorithms using CUDA libraries.
Thin, unified C++ wrappers for NVIDIA's CUDA APIs (Runtime, Driver, NVRTC, NVTX) that improve safety and ease of use.
A header-only C++ library for solving large sparse linear systems using the algebraic multigrid (AMG) method, with support for GPU acceleration.
Rust bindings for ArrayFire, a high-performance parallel computing library with support for CUDA, OpenCL, and CPU backends.
A C++ vector expression template library for OpenCL, CUDA, and OpenMP that simplifies GPGPU development.
A fast GPU-accelerated library for training Gradient Boosting Decision Trees (GBDT) and Random Forests.
A CUDA-accelerated library collection for point cloud processing, providing GPU-optimized alternatives to PCL functions.
A Common Lisp machine learning library focusing on neural networks, Boltzmann machines, and Gaussian processes with BLAS and CUDA support.
A Lisp-like macro language that compiles to C and C++ code, designed for expressive metaprogramming and high-performance systems.
An extensible Rust framework for backend-agnostic, high-performance parallel computations on CUDA, OpenCL, and CPU.
A fast Clojure library for tensor operations and deep learning with optimized CPU/GPU support.
A GPU-accelerated C++ library for visual-inertial odometry frontend tasks, optimized for high-speed robotics.
A source-to-source compiler that uses Lisp macros for metaprogramming of C, C++, CUDA, GLSL, and OpenCL.
A minimalist GPU-only framework for N-dimensional convolutional neural networks focused on speed and hackability.
A tutorial demonstrating how to extend JAX with custom C++ and CUDA operations for high-performance computing.