Open-Awesome
© 2026 Open-Awesome. Curated for the developer elite.

CUDA

Other
80 projects · 981.5k total stars · 202.0k total forks · 10 languages

Open-source projects built with CUDA

There are currently 80 open-source projects built with CUDA, with a combined total of 981.5k GitHub stars. The most common language among these projects is C++.

Showing 80 open-source projects · page 2 of 3

Community-curated · Updated weekly · 100% open source

Thrust
thrust/thrust

A C++ parallel algorithms library that enables high-performance computing on GPUs and multicore CPUs with a productivity-focused interface.

5.0k stars · 759 forks · C++
2 years ago
ArrayFire
arrayfire/arrayfire

A general-purpose tensor library for parallel computing across CPUs, GPUs, and hardware accelerators.

4.9k stars · 548 forks · C++
1 month ago
NCCL
NVIDIA/nccl

A library of optimized communication primitives for multi-GPU and multi-node collective operations.

4.6k stars · 1.2k forks · C++
1 day ago
Warp-CTC
baidu-research/warp-ctc

A fast parallel implementation of the Connectionist Temporal Classification (CTC) loss function for CPU and GPU.

4.1k stars · 1.0k forks · Cuda
2 years ago
Boltz-1
jwohlwend/boltz

A family of open-source deep learning models for accurate biomolecular interaction and binding affinity prediction, rivaling AlphaFold3 and physics-based methods.

3.9k stars · 802 forks · Python
26 days ago
ruvector
ruvnet/ruvector

A self-learning vector database with graph intelligence, local AI, and PostgreSQL integration, built for real-time adaptation.

3.8k stars · 472 forks · Rust
1 day ago
TurboPilot
ravenscroftj/turbopilot

An open-source, locally-runnable code completion engine using large language models that works on CPU.

3.8k stars · 122 forks · C++
2 years ago
implicit
benfred/implicit

Fast Python library for collaborative filtering recommendation algorithms on implicit feedback datasets.

3.8k stars · 628 forks · Python
1 year ago
dora
dora-rs/dora

A Rust-based middleware framework for building low-latency, composable, and distributed AI robotic applications using dataflow graphs.

3.7k stars · 386 forks · Rust
1 day ago
StringZilla
ashvardanian/StringZilla

A high-performance string library leveraging SIMD and SWAR to accelerate search, hashing, sorting, and edit distances across C, C++, Python, Rust, and more.

3.4k stars · 123 forks · C
1 month ago
flownet2-pytorch
NVIDIA/flownet2-pytorch

PyTorch implementation of FlowNet 2.0 for optical flow estimation using deep neural networks.

3.3k stars · 747 forks · Python
24 days ago
Fast-Planner
HKUST-Aerial-Robotics/Fast-Planner

A robust and efficient trajectory planner enabling quadrotor fast flight in complex unknown environments.

3.3k stars · 764 forks · C++
1 year ago
captcha_trainer
kerlomz/captcha_trainer

A deep learning framework for training image classification models to solve complex captcha and OCR tasks.

3.2k stars · 824 forks · Python
5 months ago
Falcor
NVIDIAGameWorks/Falcor

A real-time rendering framework for DirectX 12 and Vulkan that improves productivity in graphics research and prototyping.

3.1k stars · 593 forks · C++
1 year ago
Neural Style
cysmith/neural-style-tf

A TensorFlow implementation of neural style transfer for images and videos, blending content and artistic styles using convolutional neural networks.

3.1k stars · 815 forks · Python
5 years ago
simpledet
tusimple/simpledet

A simple and versatile framework for object detection and instance recognition with extensive model coverage and distributed training.

3.1k stars · 484 forks · Python
4 years ago
OpenSubdiv
PixarAnimationStudios/OpenSubdiv

An open-source library for high-performance subdivision surface evaluation on CPU and GPU, matching the precision of Pixar's RenderMan.

3.0k stars · 580 forks · C++
2 months ago
Chatbot
Conchylicultor/DeepQA

A TensorFlow implementation of a neural conversational model (seq2seq) for building deep learning chatbots.

2.9k stars · 1.2k forks · Python
3 years ago
nnabla
sony/nnabla

A deep learning framework for research, development, and production with flexible Python API and C++ core.

2.8k stars · 335 forks · Python
7 months ago
InvoiceNet
naiveHobo/InvoiceNet

Deep neural network to extract structured information from invoice documents with a customizable UI and training tools.

2.7k stars · 413 forks · Python
2 years ago
Kokkos
kokkos/kokkos

A C++ programming model for writing performance-portable applications targeting all major HPC platforms.

2.5k stars · 494 forks · C++
1 day ago
Decord
dmlc/decord

An efficient video and audio loader for deep learning with hardware-accelerated decoding and smart shuffling.

2.5k stars · 223 forks · C++
1 year ago
darknet_ros
leggedrobotics/darknet_ros

A ROS package for real-time object detection in camera images using YOLO (V3) on GPU and CPU.

2.4k stars · 1.2k forks · C++
1 year ago
EGO-Planner
ZJU-FAST-Lab/ego-planner

A lightweight gradient-based local planner for quadrotors that eliminates ESDF construction, achieving planning times of around 1 ms.

2.4k stars · 385 forks · C++
1 year ago
libcudacxx
NVIDIA/libcudacxx

NVIDIA's implementation of the C++ Standard Library for CUDA C++ development.

2.3k stars · 191 forks · C++
2 years ago
RAPIDS cuGraph
rapidsai/cugraph

A collection of GPU-accelerated graph analytics libraries for creating, manipulating, and executing scalable graph algorithms.

2.2k stars · 350 forks · Cuda
1 day ago
GNSS-SDR
gnss-sdr/gnss-sdr

An open-source software-defined receiver for GPS, Galileo, GLONASS, and BeiDou signals, enabling custom GNSS processing.

2.1k stars · 692 forks · C++
2 days ago
OpenImageDenoise
OpenImageDenoise/oidn

An open-source library of high-performance, high-quality denoising filters for ray-traced images using deep learning.

2.0k stars · 191 forks · C++
1 day ago
SfMLearner
tinghuiz/SfMLearner

An unsupervised learning framework for depth and ego-motion estimation from monocular videos using TensorFlow.

2.0k stars · 554 forks · Jupyter Notebook
4 years ago
Chai-1
chaidiscovery/chai-lab

A multi-modal foundation model for state-of-the-art molecular structure prediction of proteins, small molecules, DNA, RNA, and glycosylations.

1.9k stars · 266 forks · Python
13 days ago
dfdx
coreylowman/dfdx

A deep learning library for Rust featuring shape-checked tensors and neural networks with compile-time safety.

1.9k stars · 104 forks · Rust
1 year ago
dfdx
chelsea0x3b/dfdx

A deep learning library in Rust featuring shape-checked tensors and neural networks with compile-time safety.

1.9k stars · 104 forks · Rust
1 year ago
The original code from the DeepMind article + tweaks
kuz/DeepMind-Atari-Deep-Q-Learner

Original DeepMind DQN 3.0 implementation for Atari game reinforcement learning, with community tweaks.

1.8k stars · 531 forks · Lua
8 years ago
nndeploy
nndeploy/nndeploy

A visual workflow-based AI deployment framework for multi-platform and multi-backend inference, supporting large models and edge devices.

1.8k stars · 214 forks · C++
2 days ago
moderngpu
moderngpu/moderngpu

A header-only C++ library for CUDA providing accelerated primitives for solving irregularly parallel problems on GPUs.

1.8k stars · 283 forks · C++
3 months ago