Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


CUDA

Other
80 projects · 981.5k total stars · 202.0k total forks · 10 languages

Open-source projects built with CUDA

There are currently 80 open-source projects built with CUDA, with a combined total of 981.5k GitHub stars. The most common language among these projects is C++.

Showing 80 open-source projects · page 1 of 3

Community-curated · Updated weekly · 100% open source

llama.cpp (ggml-org/llama.cpp)

A C/C++ library for efficient, cross-platform LLM inference with extensive hardware support and quantization.

105.8k stars · 17.2k forks · C++
1 day ago
PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration (pytorch/pytorch)

A Python package for tensor computation with GPU acceleration and dynamic neural networks built on a tape-based autograd system.

99.4k stars · 27.6k forks · Python
1 day ago
vllm (vllm-project/vllm)

A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).

77.8k stars · 16.0k forks · Python
1 day ago
whisper.cpp (ggerganov/whisper.cpp)

High-performance C/C++ port of OpenAI's Whisper for efficient, cross-platform speech recognition.

48.9k stars · 5.4k forks · C++
4 days ago
Bindings for many languages (ggerganov/whisper.cpp)

A high-performance C/C++ port of OpenAI's Whisper model for efficient, cross-platform speech recognition.

48.9k stars · 5.4k forks · C++
4 days ago
Colossal-AI - An Integrated Large-scale Model Training System with Efficient Parallelization Techniques (hpcaitech/ColossalAI)

A unified deep learning system for efficient large-scale model training and inference with advanced parallelism strategies.

41.4k stars · 4.5k forks · Python
11 days ago
FAISS (facebookresearch/faiss)

A library for efficient similarity search and clustering of dense vectors, scaling to billions of vectors on a single server.

39.8k stars · 4.3k forks · C++
1 day ago
DragGAN (XingangPan/DragGAN)

Interactive point-based manipulation tool for editing GAN-generated images by dragging points to target positions.

35.9k stars · 3.4k forks · Python
1 year ago
Caffe Model Zoo (BVLC/caffe)

A fast open framework for deep learning with a focus on expression, speed, and modularity.

34.6k stars · 18.5k forks · C++
1 year ago
Openpose (CMU-Perceptual-Computing-Lab/openpose)

Real-time multi-person keypoint detection library for body, face, hands, and foot estimation.

34.0k stars · 8.1k forks · C++
1 year ago
GitHub repository (ApolloAuto/apollo)

An open-source, high-performance platform for developing, testing, and deploying autonomous vehicles.

26.6k stars · 9.9k forks · C++
8 days ago
Darknet (pjreddie/darknet)

An open source neural network framework in C and CUDA, known for YOLO real-time object detection models.

26.4k stars · 21.1k forks · C
2 years ago
candle-wasm-examples (huggingface/candle)

A minimalist, high-performance machine learning framework for Rust with a focus on serverless inference and GPU support.

20.1k stars · 1.5k forks · Rust
2 days ago
Buzz (chidiwilliams/Buzz)

An offline desktop application for transcribing and translating audio/video files, live recordings, and YouTube links using OpenAI's Whisper.

18.8k stars · 1.4k forks · Python
2 days ago
lightgbm (lightgbm-org/LightGBM)

A fast, distributed gradient boosting framework based on decision tree algorithms for ranking, classification, and other ML tasks.

18.3k stars · 4.0k forks · C++
1 day ago
CNTK - Microsoft Cognitive Toolkit (Microsoft/CNTK)

A unified deep learning toolkit for describing neural networks as computational graphs, supporting feed-forward DNNs, CNNs, and RNNs/LSTMs.

17.6k stars · 4.2k forks · C++
3 years ago
Kaldi (kaldi-asr/kaldi)

A comprehensive open-source toolkit for speech recognition research and development.

15.4k stars · 5.4k forks · Shell
7 months ago
burn (tracel-ai/burn)

A Rust-based deep learning framework and tensor library optimized for flexibility, efficiency, and cross-platform portability.

14.9k stars · 886 forks · Rust
1 day ago
Deeplearning4j (deeplearning4j/deeplearning4j)

A comprehensive JVM-based deep learning ecosystem for building, training, and deploying models with support for model import and distributed training.

14.2k stars · 3.8k forks · Java
22 days ago
TensorRT (NVIDIA/TensorRT)

NVIDIA's SDK for high-performance deep learning inference optimization and deployment on NVIDIA GPUs.

12.9k stars · 2.3k forks · C++
10 days ago
Taskflow (taskflow/taskflow)

A fast, expressive, and header-only C++ library for building task-parallel programs with static, dynamic, and conditional task graphs.

11.9k stars · 1.4k forks · C++
2 days ago
cupy (cupy/cupy)

A NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, supporting NVIDIA CUDA and AMD ROCm.

10.9k stars · 1.0k forks · Python
2 days ago
cudf (rapidsai/cudf)

A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.

9.6k stars · 1.0k forks · C++
1 day ago
Torch7 Cheat sheet (torch/torch7)

A scientific computing framework with wide support for machine learning algorithms, built around multi-dimensional tensor operations.

9.1k stars · 2.4k forks · C
1 year ago
gocv (hybridgroup/gocv)

Go language bindings for OpenCV 4, enabling computer vision applications with support for CUDA, DNN, and OpenVINO.

7.4k stars · 902 forks · Go
2 months ago
mistral.rs (EricLBuehler/mistral.rs)

A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.

7.0k stars · 581 forks · Rust
9 days ago
Chainer (chainer/chainer)

A flexible Python deep learning framework using define-by-run dynamic computational graphs for neural network research.

5.9k stars · 1.4k forks · Python
2 years ago
gorgonia (gorgonia/gorgonia)

A library for building and evaluating mathematical expressions and neural networks in Go, with automatic differentiation and GPU support.

5.9k stars · 449 forks · Go
1 year ago
NeuralTalk (karpathy/neuraltalk2)

Efficient image captioning code in Torch, using a CNN-RNN model to generate captions for images, optimized for GPU training.

5.6k stars · 1.3k forks · Jupyter Notebook
8 years ago
leaf (autumnai/leaf)

An open-source machine learning framework for building classical, deep, or hybrid ML applications with a focus on performance and portability.

5.5k stars · 269 forks · Rust
2 years ago
rust (tensorflow/rust)

Rust language bindings for TensorFlow, providing idiomatic access to machine learning capabilities.

5.5k stars · 436 forks · Rust
1 year ago
flashlight (facebookresearch/flashlight)

A fast, flexible C++ standalone library for machine learning with high-performance defaults and total internal modifiability.

5.4k stars · 503 forks · C++
2 months ago
flashlight (flashlight/flashlight)

A fast, flexible C++ standalone library for machine learning with high-performance defaults and total internal modifiability.

5.4k stars · 503 forks · C++
2 months ago
yolact (dbolya/yolact)

A fully convolutional neural network for real-time instance segmentation, achieving high speed and accuracy on COCO.

5.2k stars · 1.3k forks · Python
7 months ago
cuML (rapidsai/cuml)

A suite of GPU-accelerated machine learning algorithms with scikit-learn compatible APIs for 10-50x faster performance on large datasets.

5.2k stars · 622 forks · C++
1 day ago
GitHub repository (NVIDIAGameWorks/kaolin)

A PyTorch library providing GPU-accelerated tools for 3D deep learning, including differentiable rendering and geometric operations.

5.1k stars · 619 forks · Python
2 days ago