There are currently 176 open-source projects built with CUDA, with a combined total of 1055.4k GitHub stars. The most common language among these projects is Python.
A C/C++ library for efficient, cross-platform LLM inference with extensive hardware support and quantization.
A Python package for tensor computation with GPU acceleration and dynamic neural networks built on a tape-based autograd system.
A high-throughput, memory-efficient inference and serving engine for large language models (LLMs).
High-performance C/C++ port of OpenAI's Whisper for efficient, cross-platform speech recognition.
A high-performance C/C++ port of OpenAI's Whisper model for efficient, cross-platform speech recognition.
A unified deep learning system for efficient large-scale model training and inference with advanced parallelism strategies.
A library for efficient similarity search and clustering of dense vectors, scaling to billions of vectors on a single server.
Interactive point-based manipulation tool for editing GAN-generated images by dragging points to target positions.
A fast open framework for deep learning with a focus on expression, speed, and modularity.
Real-time multi-person keypoint detection library for body, face, hands, and foot estimation.
An open-source, high-performance platform for developing, testing, and deploying autonomous vehicles.
An open-source neural network framework in C and CUDA, known for YOLO real-time object detection models.
A minimalist, high-performance machine learning framework for Rust with a focus on serverless inference and GPU support.
An offline desktop application for transcribing and translating audio/video files, live recordings, and YouTube links using OpenAI's Whisper.
A fast, distributed gradient boosting framework based on decision tree algorithms for ranking, classification, and other ML tasks.
A unified deep learning toolkit for describing neural networks as computational graphs, supporting feed-forward DNNs, CNNs, and RNNs/LSTMs.
A comprehensive open-source toolkit for speech recognition research and development.
A Rust-based deep learning framework and tensor library optimized for flexibility, efficiency, and cross-platform portability.
A comprehensive JVM-based deep learning ecosystem for building, training, and deploying models with support for model import and distributed training.
NVIDIA's SDK for high-performance deep learning inference optimization and deployment on NVIDIA GPUs.
A fast, expressive, and header-only C++ library for building task-parallel programs with static, dynamic, and conditional task graphs.
A NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, supporting NVIDIA CUDA and AMD ROCm.
A GPU-accelerated DataFrame library for tabular data processing, part of the RAPIDS data science suite.
A scientific computing framework with wide support for machine learning algorithms, built around multi-dimensional tensor operations.
Go language bindings for OpenCV 4, enabling computer vision applications with support for CUDA, DNN, and OpenVINO.
A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.
A flexible Python deep learning framework using define-by-run dynamic computational graphs for neural network research.
A library for building and evaluating mathematical expressions and neural networks in Go, with automatic differentiation and GPU support.
Efficient image-captioning code in Torch that uses a CNN-RNN model to describe images and is optimized for GPU training.
An open-source machine learning framework for building classical, deep, or hybrid ML applications with a focus on performance and portability.
Rust language bindings for TensorFlow, providing idiomatic access to machine learning capabilities.
A fast, flexible C++ standalone library for machine learning with high-performance defaults and total internal modifiability.
A fully convolutional neural network for real-time instance segmentation, achieving high speed and accuracy on COCO.
A suite of GPU-accelerated machine learning algorithms with scikit-learn-compatible APIs, delivering 10-50x speedups on large datasets.
A PyTorch library providing GPU-accelerated tools for 3D deep learning, including differentiable rendering and geometric operations.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.