Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

CUDA

Other
176 projects · 1055.4k total stars · 216.9k total forks · 23 languages

Open-source projects built with CUDA

There are currently 176 open-source projects built with CUDA, with a combined total of 1055.4k GitHub stars. The most common language among these projects is Python.

Showing 176 open-source projects · page 1 of 5

llama.cpp
ggml-org/llama.cpp

A C/C++ library for efficient, cross-platform LLM inference with extensive hardware support and quantization.

115.4k stars · 19.3k forks · C++
23 hours ago
PyTorch
pytorch/pytorch

A Python package for tensor computation with GPU acceleration and dynamic neural networks built on a tape-based autograd system.

100.6k stars · 28.0k forks · Python
21 hours ago
vLLM
vllm-project/vllm

A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).

82.2k stars · 17.8k forks · Python
22 hours ago
whisper.cpp
ggerganov/whisper.cpp

High-performance C/C++ port of OpenAI's Whisper for efficient, cross-platform speech recognition.

50.5k stars · 5.6k forks · C++
2 days ago
Colossal-AI
hpcaitech/ColossalAI

A unified deep learning system for efficient large-scale model training and inference with advanced parallelism strategies.

41.4k stars · 4.5k forks · Python
14 days ago
FAISS
facebookresearch/faiss

A library for efficient similarity search and clustering of dense vectors, scaling to billions of vectors on a single server.

40.2k stars · 4.4k forks · C++
21 hours ago
DragGAN
XingangPan/DragGAN

Interactive point-based manipulation tool for editing GAN-generated images by dragging points to target positions.

35.8k stars · 3.4k forks · Python
2 years ago
Caffe
BVLC/caffe

A fast open framework for deep learning with a focus on expression, speed, and modularity.

34.6k stars · 18.5k forks · C++
1 year ago
OpenPose
CMU-Perceptual-Computing-Lab/openpose

Real-time multi-person keypoint detection library for body, face, hand, and foot keypoint estimation.

34.1k stars · 8.0k forks · C++
1 year ago
Apollo
ApolloAuto/apollo

An open-source, high-performance platform for developing, testing, and deploying autonomous vehicles.

26.7k stars · 9.9k forks · C++
1 month ago
Darknet
pjreddie/darknet

An open-source neural network framework written in C and CUDA, known for the YOLO real-time object detection models.

26.5k stars · 21.1k forks · C
2 years ago
Candle
huggingface/candle

A minimalist, high-performance machine learning framework for Rust with a focus on serverless inference and GPU support.

20.4k stars · 1.6k forks · Rust
1 day ago
Buzz
chidiwilliams/Buzz

An offline desktop application for transcribing and translating audio/video files, live recordings, and YouTube links using OpenAI's Whisper.

19.6k stars · 1.4k forks · Python
1 day ago
LightGBM
lightgbm-org/LightGBM

A fast, distributed gradient boosting framework based on decision tree algorithms for ranking, classification, and other ML tasks.

18.4k stars · 4.0k forks · C++
22 hours ago
CNTK - Microsoft Cognitive Toolkit
Microsoft/CNTK

A unified deep learning toolkit for describing neural networks as computational graphs, supporting feed-forward DNNs, CNNs, and RNNs/LSTMs.

17.6k stars · 4.2k forks · C++
3 years ago
Kaldi
kaldi-asr/kaldi

A comprehensive open-source toolkit for speech recognition research and development.

15.4k stars · 5.4k forks · Shell
8 months ago
Burn
tracel-ai/burn

A Rust-based deep learning framework and tensor library optimized for flexibility, efficiency, and cross-platform portability.

15.4k stars · 933 forks · Rust
23 hours ago
Deeplearning4j
deeplearning4j/deeplearning4j

A comprehensive JVM-based deep learning ecosystem for building, training, and deploying models, with support for model import and distributed training.

14.2k stars · 3.8k forks · Java
3 days ago
TensorRT
NVIDIA/TensorRT

NVIDIA's SDK for high-performance deep learning inference optimization and deployment on NVIDIA GPUs.

13.0k stars · 2.4k forks · C++
5 days ago
Taskflow
taskflow/taskflow

A fast, expressive, header-only C++ library for building task-parallel programs with static, dynamic, and conditional task graphs.

12.0k stars · 1.4k forks · C++
1 day ago
CuPy
cupy/cupy

A NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, supporting NVIDIA CUDA and AMD ROCm.

11.0k stars · 1.0k forks · Python
2 days ago
cuDF
rapidsai/cudf

A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.

9.7k stars · 1.1k forks · C++
1 day ago
Torch7
torch/torch7

A scientific computing framework with wide support for machine learning algorithms, built around multi-dimensional tensor operations.

9.1k stars · 2.3k forks · C
1 year ago
GoCV
hybridgroup/gocv

Go language bindings for OpenCV 4, enabling computer vision applications with support for CUDA, DNN, and OpenVINO.

7.5k stars · 897 forks · Go
11 days ago
mistral.rs
EricLBuehler/mistral.rs

A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.

7.3k stars · 621 forks · Rust
1 day ago
Chainer
chainer/chainer

A flexible Python deep learning framework using define-by-run dynamic computational graphs for neural network research.

5.9k stars · 1.3k forks · Python
2 years ago
Gorgonia
gorgonia/gorgonia

A library for building and evaluating mathematical expressions and neural networks in Go, with automatic differentiation and GPU support.

5.9k stars · 450 forks · Go
1 year ago
NeuralTalk2
karpathy/neuraltalk2

Efficient image captioning code in Torch that uses a CNN-RNN model to generate captions for images, optimized for GPU training.

5.6k stars · 1.3k forks · Jupyter Notebook
8 years ago
Leaf
autumnai/leaf

An open-source machine learning framework for building classical, deep, or hybrid ML applications with a focus on performance and portability.

5.5k stars · 269 forks · Rust
2 years ago
TensorFlow Rust
tensorflow/rust

Rust language bindings for TensorFlow, providing idiomatic access to machine learning capabilities.

5.5k stars · 435 forks · Rust
1 year ago
Flashlight
facebookresearch/flashlight

A fast, flexible C++ standalone library for machine learning with high-performance defaults and total internal modifiability.

5.4k stars · 502 forks · C++
3 months ago
YOLACT
dbolya/yolact

A fully convolutional neural network for real-time instance segmentation, achieving high speed and accuracy on COCO.

5.2k stars · 1.3k forks · Python
9 months ago
cuML
rapidsai/cuml

A suite of GPU-accelerated machine learning algorithms with scikit-learn-compatible APIs, typically running 10-50x faster than CPU equivalents on large datasets.

5.2k stars · 629 forks · Python
3 days ago
Kaolin
NVIDIAGameWorks/kaolin

A PyTorch library providing GPU-accelerated tools for 3D deep learning, including differentiable rendering and geometric operations.

5.1k stars · 624 forks · Python
5 days ago
Community-curated · Updated weekly · 100% open source

Found a gem we're missing?

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
