A fast, memory-efficient reimplementation of OpenAI's Whisper speech-to-text model using CTranslate2.
Faster Whisper is a Python library that reimplements OpenAI's Whisper model on the CTranslate2 inference engine. It addresses the main drawback of the original implementation—slow, memory-intensive transcription—by providing a significantly faster and more efficient alternative at the same accuracy. The library lets developers transcribe audio files quickly, with support for batched processing, quantization, and advanced features like word-level timestamps.
Developers and researchers working on speech recognition applications, audio processing pipelines, or any project requiring fast and accurate transcription of audio content. It's particularly valuable for those dealing with large volumes of audio data or resource-constrained environments.
Developers choose Faster Whisper because it offers a substantial performance boost over the original Whisper implementation—up to 4x faster with lower memory usage—while being a drop-in replacement. Its support for quantization, batch processing, and integration with models like Distil-Whisper provides flexibility and efficiency unmatched by other open-source Whisper reimplementations.
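The drop-in usage described above can be sketched in a few lines. This is a minimal, illustrative example: the model size, device settings, and the file name `audio.mp3` are placeholders, not prescriptions.

```python
from faster_whisper import WhisperModel

# Load a Whisper model converted to CTranslate2 format.
# "small" on CPU with int8 is an illustrative low-resource choice;
# larger models such as "large-v3" trade speed for accuracy.
model = WhisperModel("small", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator of segments plus info
# about the detected language.
segments, info = model.transcribe("audio.mp3")

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```

Because `segments` is a generator, transcription only runs as the loop consumes it, which keeps memory usage low for long recordings.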
Faster Whisper transcription with CTranslate2
Benchmarks show up to 4x faster transcription than OpenAI's Whisper with similar accuracy, thanks to the optimized CTranslate2 engine, reducing time from minutes to seconds for batch processing.
Supports 8-bit integer quantization for both CPU and GPU, cutting VRAM usage by nearly half in tests (e.g., from 4525MB to 2926MB for large-v2 on GPU) without significant accuracy loss.
The BatchedInferencePipeline enables parallel transcription of multiple audio segments, increasing throughput dramatically—benchmarks show a 17s transcription time with batch_size=8 versus 1m03s without.
Includes built-in Silero VAD filtering to remove non-speech segments and word-level timestamps for detailed analysis, enhancing quality out-of-the-box without extra dependencies.
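The batching, VAD filtering, and word-level timestamps described above combine in one call. A sketch, assuming a CUDA-capable GPU; the model size, quantization mode, and audio file name are illustrative:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# int8_float16 applies 8-bit quantization on GPU, roughly halving VRAM.
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size=8 mirrors the benchmark above; vad_filter uses the
# bundled Silero VAD to skip non-speech, and word_timestamps adds
# per-word start/end times to each segment.
segments, info = batched_model.transcribe(
    "audio.mp3",
    batch_size=8,
    vad_filter=True,
    word_timestamps=True,
)

for segment in segments:
    for word in segment.words:
        print(f"[{word.start:.2f}s -> {word.end:.2f}s] {word.word}")
```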
Requires specific NVIDIA libraries (CUDA 12 and cuDNN 9) with manual configuration or workarounds for older versions, adding deployment friction and potential compatibility issues.
As a Python library with no native bindings, it is hard to integrate into projects written in other languages, limiting its use in polyglot environments.
Breaking changes in CTranslate2 versions can affect functionality, as noted with the need to downgrade to specific releases for CUDA 11 or cuDNN 8 support, risking maintenance overhead.
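The NVIDIA-library friction mentioned above has documented workarounds. One approach on Linux is to install cuBLAS and cuDNN as pip wheels rather than system-wide; the exact package and version pins below follow the project's installation notes and may change between releases:

```shell
# Install CUDA 12 cuBLAS and cuDNN 9 as pip wheels.
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

# Make the wheel-provided libraries visible to CTranslate2.
export LD_LIBRARY_PATH=$(python3 -c 'import os, nvidia.cublas.lib, nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))')

# For older CUDA 11 stacks, the workaround is pinning CTranslate2:
pip install --force-reinstall ctranslate2==3.24.0
```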
Port of OpenAI's Whisper model in C/C++
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.