Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


CUDA

Other
177 projects · 1055.4k total stars · 216.9k total forks · 23 languages

Open-source projects built with CUDA

There are currently 177 open-source projects built with CUDA, with a combined total of 1055.4k GitHub stars. The most common language among these projects is Python.

Showing 177 open-source projects · page 2 of 5

Community-curated · Updated weekly · 100% open source

Thrust · thrust/thrust

A C++ parallel algorithms library that enables high-performance computing on GPUs and multicore CPUs with a productivity-focused interface.

5.0k stars · 760 forks · C++
2 years ago
ArrayFire · arrayfire/arrayfire

A general-purpose tensor library for parallel computing across CPUs, GPUs, and hardware accelerators.

4.9k stars · 552 forks · C++
3 months ago
NCCL · NVIDIA/nccl

A library of optimized communication primitives for multi-GPU and multi-node collective operations.

4.8k stars · 1.3k forks · C++
2 days ago
ruvector · ruvnet/ruvector

A self-learning vector database with graph intelligence, local AI, and PostgreSQL integration, built for real-time adaptation.

4.2k stars · 554 forks · Rust
1 day ago
Warp-CTC · baidu-research/warp-ctc

A fast parallel implementation of the Connectionist Temporal Classification (CTC) loss function for CPU and GPU.

4.1k stars · 1.0k forks · Cuda
2 years ago
Boltz-1 · jwohlwend/boltz

A family of open-source deep learning models for accurate biomolecular interaction and binding affinity prediction, rivaling AlphaFold3 and physics-based methods.

4.0k stars · 837 forks · Python
10 days ago
implicit · benfred/implicit

A fast Python library for collaborative filtering recommendation algorithms on implicit feedback datasets.

3.8k stars · 629 forks · Python
1 month ago
TurboPilot · ravenscroftj/turbopilot

An open-source, locally runnable code completion engine that uses large language models and works on CPU.

3.8k stars · 122 forks · C++
2 years ago
dora · dora-rs/dora

A Rust-based middleware framework for building low-latency, composable, and distributed AI robotic applications using dataflow graphs.

3.8k stars · 410 forks · Rust
2 days ago
StringZilla · ashvardanian/StringZilla

A high-performance string library leveraging SIMD and SWAR to accelerate search, hashing, sorting, and edit distances across C, C++, Python, Rust, and more.

3.5k stars · 125 forks · C
1 day ago
Fast-Planner · HKUST-Aerial-Robotics/Fast-Planner

A robust and efficient trajectory planner that enables fast quadrotor flight in complex unknown environments.

3.3k stars · 768 forks · C++
1 year ago
flownet2-pytorch · NVIDIA/flownet2-pytorch

A PyTorch implementation of FlowNet 2.0 for optical flow estimation using deep neural networks.

3.3k stars · 749 forks · Python
2 months ago
captcha_trainer · kerlomz/captcha_trainer

A deep learning framework for training image classification models to solve complex captcha and OCR tasks.

3.2k stars · 823 forks · Python
7 months ago
Falcor · NVIDIAGameWorks/Falcor

A real-time rendering framework for DirectX 12 and Vulkan that improves productivity in graphics research and prototyping.

3.2k stars · 604 forks · C++
1 year ago
Neural Style · cysmith/neural-style-tf

A TensorFlow implementation of neural style transfer for images and videos, blending content and artistic styles using convolutional neural networks.

3.1k stars · 812 forks · Python
5 years ago
simpledet · tusimple/simpledet

A simple and versatile framework for object detection and instance recognition with extensive model coverage and distributed training.

3.1k stars · 483 forks · Python
4 years ago
OpenSubdiv · PixarAnimationStudios/OpenSubdiv

An open-source library for high-performance subdivision surface evaluation on CPU and GPU, matching the precision of Pixar's RenderMan.

3.1k stars · 580 forks · C++
3 months ago
Chatbot · Conchylicultor/DeepQA

A TensorFlow implementation of a neural conversational model (seq2seq) for building deep learning chatbots.

2.9k stars · 1.2k forks · Python
3 years ago
nnabla · sony/nnabla

A deep learning framework for research, development, and production with a flexible Python API and a C++ core.

2.8k stars · 335 forks · Python
9 months ago
InvoiceNet · naiveHobo/InvoiceNet

A deep neural network that extracts structured information from invoice documents, with a customizable UI and training tools.

2.7k stars · 413 forks · Python
2 years ago
Kokkos · kokkos/kokkos

A C++ programming model for writing performance-portable applications targeting all major HPC platforms.

2.6k stars · 503 forks · C++
3 days ago
EGO-Planner · ZJU-FAST-Lab/ego-planner

A lightweight gradient-based local planner for quadrotors that eliminates ESDF construction, achieving planning times around 1 ms.

2.5k stars · 396 forks · C++
1 year ago
Decord · dmlc/decord

An efficient video and audio loader for deep learning with hardware-accelerated decoding and smart shuffling.

2.5k stars · 227 forks · C++
1 year ago
darknet_ros · leggedrobotics/darknet_ros

A ROS package for real-time object detection in camera images using YOLO (V3) on GPU and CPU.

2.4k stars · 1.2k forks · C++
1 year ago
libcudacxx · NVIDIA/libcudacxx

NVIDIA's implementation of the C++ Standard Library for CUDA C++ development.

2.3k stars · 192 forks · C++
2 years ago
RAPIDS cuGraph · rapidsai/cugraph

A collection of GPU-accelerated graph analytics libraries for creating, manipulating, and executing scalable graph algorithms.

2.2k stars · 358 forks · Cuda
3 days ago
GNSS-SDR · gnss-sdr/gnss-sdr

An open-source software-defined receiver for GPS, Galileo, GLONASS, and BeiDou signals, enabling custom GNSS processing.

2.1k stars · 704 forks · C++
4 days ago
OpenImageDenoise · OpenImageDenoise/oidn

An open-source library of high-performance, high-quality denoising filters for ray-traced images using deep learning.

2.1k stars · 196 forks · C++
6 days ago
SfMLearner · tinghuiz/SfMLearner

An unsupervised learning framework for depth and ego-motion estimation from monocular videos using TensorFlow.

2.0k stars · 555 forks · Jupyter Notebook
4 years ago
Chai-1 · chaidiscovery/chai-lab

A multi-modal foundation model for state-of-the-art molecular structure prediction of proteins, small molecules, DNA, RNA, and glycosylations.

1.9k stars · 275 forks · Python
1 month ago
dfdx · chelsea0x3b/dfdx

A deep learning library in Rust featuring shape-checked tensors and neural networks with compile-time safety.

1.9k stars · 105 forks · Rust
1 year ago
dfdx · coreylowman/dfdx

A deep learning library for Rust featuring shape-checked tensors and neural networks with compile-time safety.

1.9k stars · 105 forks · Rust
1 year ago
The original code from the DeepMind article + tweaks · kuz/DeepMind-Atari-Deep-Q-Learner

The original DeepMind DQN 3.0 implementation for Atari game reinforcement learning, with community tweaks.

1.8k stars · 529 forks · Lua
8 years ago
nndeploy · nndeploy/nndeploy

A visual workflow-based AI deployment framework for multi-platform and multi-backend inference, supporting large models and edge devices.

1.8k stars · 221 forks · C++
1 month ago
moderngpu · moderngpu/moderngpu

A header-only C++ library for CUDA providing accelerated primitives for solving irregularly parallel problems on GPUs.

1.8k stars · 283 forks · C++
4 months ago