High-performance C/C++ port of OpenAI's Whisper for efficient, cross-platform speech recognition.
whisper.cpp is a high-performance C/C++ port of OpenAI's Whisper model for automatic speech recognition (ASR). It provides efficient, offline transcription and translation of audio files, optimized to run on a wide variety of hardware from CPUs to GPUs and specialized accelerators. The implementation is lightweight, dependency-free, and designed for cross-platform deployment.
It targets developers and researchers who need efficient, offline speech recognition on resource-constrained devices, embedded systems, mobile applications, or servers. It is ideal for projects requiring portable ASR without Python dependencies or cloud services.
It offers significantly faster inference and lower resource usage than the original Python implementation, with support for hardware acceleration across multiple platforms. The self-contained, portable design allows integration into diverse applications, from real-time voice assistants to batch transcription pipelines.
Port of OpenAI's Whisper model in C/C++
Supports Apple Silicon with Metal/Core ML, x86 with AVX, ARM NEON, and WebAssembly, enabling efficient deployment from servers down to mobile devices.
Offers GPU inference through NVIDIA CUDA, Vulkan, and OpenVINO backends, and also targets specialized NPUs.
Performs zero memory allocations at runtime, uses mixed F16/F32 precision, and supports optional integer quantization to reduce model size and speed up processing.
Implemented in plain C/C++ with no third-party dependencies, giving a minimal footprint and fully offline operation on edge devices.
The CLI tool only accepts 16-bit WAV files sampled at 16 kHz, so other formats such as MP3 or Opus must first be converted with an external tool like FFmpeg, which adds a step to the pipeline.
Enabling hardware acceleration such as Core ML or OpenVINO involves multiple steps, including installing Python dependencies and generating converted models, which increases initial setup time.
Focused solely on inference with no support for training or fine-tuning models, restricting use cases to pre-trained models only.