A high-performance C/C++ port of OpenAI's Whisper model for efficient, cross-platform speech recognition.
whisper.cpp is a high-performance, portable C/C++ implementation of OpenAI's Whisper automatic speech recognition (ASR) model. It enables efficient, offline transcription and translation of audio across diverse hardware, from Apple Silicon and x86 CPUs to NVIDIA GPUs and mobile devices, filling the need for a lightweight, dependency-free ASR solution that runs fully on-device without cloud services.
It is aimed at developers and researchers building speech-enabled applications that need offline, low-latency, or privacy-focused transcription on embedded systems, mobile apps, servers, or edge devices, as well as anyone deploying Whisper in resource-constrained environments.
Developers choose whisper.cpp for its exceptional performance, minimal footprint, and broad hardware support. Unlike the original Python implementation, it offers zero-dependency deployment, optimized inference across CPU/GPU architectures, and the ability to run completely offline on everything from smartphones to high-performance servers.
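As a sketch of what on-device use looks like, here is a minimal transcription loop against the library's C API (function names as declared in `whisper.h`; the model path is a placeholder, and audio loading is stubbed out with silence for brevity):

```c
#include <stdio.h>
#include "whisper.h" // from the whisper.cpp repository

int main(void) {
    // Load a ggml model file (path is a placeholder)
    struct whisper_context_params cparams = whisper_context_default_params();
    struct whisper_context *ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (!ctx) return 1;

    // whisper_full expects 16 kHz mono float PCM in [-1, 1];
    // one second of silence stands in for real audio here
    static float pcm[16000] = {0};

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcm, 16000) == 0) {
        const int n = whisper_full_n_segments(ctx);
        for (int i = 0; i < n; ++i) {
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
        }
    }
    whisper_free(ctx);
    return 0;
}
```

In a real application the PCM buffer would come from a decoded audio file or a microphone capture loop.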
Runs on a wide range of platforms including macOS, iOS, Android, Linux, Windows, FreeBSD, and Docker, with native optimizations for Apple Silicon, x86 AVX, ARM NEON, and more, enabling deployment from servers to embedded devices.
Supports GPU inference via NVIDIA CUDA, Vulkan, OpenVINO, Apple Metal, and other backends, significantly speeding up processing on compatible hardware; the repository's Metal acceleration demos on Apple devices illustrate the gains.
Performs zero memory allocations at runtime and offers integer quantization options that reduce model size and memory footprint; the 'tiny' model, for example, needs only ~273 MB of RAM, making it suitable for resource-constrained environments.
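As an illustration, quantizing a model with the bundled tool looks like the following (binary and model paths assume a default CMake build of the repository; `q5_0` is one of the supported quantization types):

```shell
# Quantize a ggml model to 5-bit (paths assume the repo layout)
./build/bin/quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

# Run inference with the smaller quantized model
./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin -f samples/jfk.wav
```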
Includes CLI tools, real-time streaming examples, a web server, and demos for voice assistants and karaoke-style videos, providing a comprehensive suite for various ASR applications out of the box.
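A typical out-of-the-box session might look like this (binary names reflect recent CMake builds; older releases used `main` and `stream` instead):

```shell
# Fetch a model with the bundled helper script, then transcribe a sample
./models/download-ggml-model.sh base.en
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav

# Real-time microphone transcription (requires SDL2 at build time)
./build/bin/whisper-stream -m models/ggml-base.en.bin
```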
The project is explicitly inference-only: it has no built-in support for training or fine-tuning, so model customization requires external tools.
Setting up hardware acceleration (e.g., CUDA, OpenVINO) involves manual CMake flags and environment setup, which can be error-prone and time-consuming compared to drop-in Python libraries.
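For example, enabling a GPU backend is a matter of passing the right CMake flag (flag names track the bundled ggml library and have changed across releases, so check the README for your version):

```shell
# Default CPU-only build
cmake -B build
cmake --build build -j --config Release

# NVIDIA CUDA build (requires the CUDA toolkit to be installed)
cmake -B build -DGGML_CUDA=1
cmake --build build -j --config Release
```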
The CLI tool natively accepts only 16-bit WAV files sampled at 16 kHz; handling other formats such as MP3 or Opus requires FFmpeg integration, which the project notes is Linux-only and pulls in additional dependencies.
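In practice, the simplest workaround on any platform is to convert the audio to the expected format with a standalone `ffmpeg` invocation before transcribing:

```shell
# Convert any input to 16 kHz, mono, 16-bit PCM WAV for whisper-cli
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
```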