A faster, memory-efficient command-line client for OpenAI's Whisper speech recognition, powered by CTranslate2.
whisper-ctranslate2 is a high-performance command-line client for OpenAI's Whisper model, designed for speech recognition and translation tasks. It uses the CTranslate2 inference engine to transcribe up to 4x faster than the original Whisper while using less memory, and it stays fully compatible with the original Whisper CLI. The tool supports advanced features like speaker diarization, live microphone input, and quantization for efficient CPU/GPU execution.
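Quantization is what lets CTranslate2 trade a little numerical precision for a smaller, faster model. The sketch below shows symmetric per-tensor int8 quantization of a weight vector in plain Python. It is a simplified illustration of the general technique, not CTranslate2's actual implementation, which differs in details such as how scales are computed; the weight values are made up.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map the integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical float32 weights: stored as int8 they need 1 byte each instead of 4.
weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale)
print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The rounding error per weight is at most half a quantization step (`scale / 2`), which is why int8 models usually lose little accuracy while cutting weight memory to a quarter of float32.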
Developers, researchers, and content creators who need fast, accurate speech-to-text transcription and translation from audio files or live input, especially those already using OpenAI's Whisper CLI.
It offers significantly faster inference and reduced memory footprint compared to the original Whisper, with added features like speaker identification and VAD filtering, all while maintaining a seamless migration path due to full CLI compatibility.
Whisper command-line client compatible with the original OpenAI client, based on CTranslate2.
Transcribes up to 4x faster than OpenAI Whisper while using less memory; according to the project's performance claims, batched inference raises the speedup to as much as 16x.
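Batched inference gains its extra speed by decoding several audio chunks in one pass instead of one at a time. The grouping step can be sketched as follows. The chunk boundaries and batch size are hypothetical, and the real pipeline (presumably faster-whisper's batched mode, which whisper-ctranslate2 builds on) also handles padding and device scheduling, which this sketch omits.

```python
def make_batches(chunks: list, batch_size: int) -> list[list]:
    """Group chunks into consecutive lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


# Hypothetical (start, end) chunk boundaries in seconds, e.g. produced by a VAD step.
chunks = [(0, 30), (30, 60), (60, 90), (90, 120), (120, 150)]
batches = make_batches(chunks, batch_size=2)
print(len(batches), "batched passes instead of", len(chunks), "sequential ones")
```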
Uses the exact same command-line interface as OpenAI Whisper, making migration seamless for existing users without requiring changes to scripts or workflows.
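Because the interface mirrors OpenAI Whisper's, migrating a script can be as small as changing the executable name. A minimal sketch, assuming the script invokes the CLI through `subprocess` with an argument list; `migrate_argv` is a hypothetical helper, and the flags shown (`--model`, `--language`, `--output_dir`) are ones the original Whisper CLI accepts.

```python
import shlex


def migrate_argv(argv: list[str]) -> list[str]:
    """Swap the `whisper` executable for `whisper-ctranslate2`, keeping every argument unchanged."""
    if not argv or argv[0] != "whisper":
        raise ValueError("expected a command starting with 'whisper'")
    return ["whisper-ctranslate2", *argv[1:]]


original = ["whisper", "interview.mp3", "--model", "medium", "--language", "en", "--output_dir", "out"]
migrated = migrate_argv(original)
print(shlex.join(migrated))
# To actually run it (requires whisper-ctranslate2 on PATH):
#   import subprocess; subprocess.run(migrated, check=True)
```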
Includes Voice Activity Detection (VAD) filtering and speaker diarization, which improve transcription quality by removing non-speech parts and identifying different speakers, making the tool more useful for interviews and meetings.
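To make the VAD idea concrete, here is a toy energy-threshold detector that keeps only the stretches of audio loud enough to count as speech. This is a deliberately simple stand-in: whisper-ctranslate2's VAD filter relies on a neural model (Silero VAD, through faster-whisper), so the frame size, threshold, and sample values below are illustrative assumptions.

```python
import math


def frame_rms(samples: list[float], frame_size: int) -> list[float]:
    """Split samples into non-overlapping frames and return each frame's RMS energy."""
    energies = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return energies


def speech_regions(samples: list[float], frame_size: int = 4, threshold: float = 0.1) -> list[tuple[int, int]]:
    """Return (start, end) sample ranges made of consecutive frames whose RMS exceeds threshold."""
    regions = []
    current_start = None
    for i, energy in enumerate(frame_rms(samples, frame_size)):
        start = i * frame_size
        if energy > threshold and current_start is None:
            current_start = start  # speech begins
        elif energy <= threshold and current_start is not None:
            regions.append((current_start, start))  # speech ends
            current_start = None
    if current_start is not None:
        regions.append((current_start, len(samples)))  # speech runs to the end
    return regions


# Silence, a burst of "speech", then near-silence.
audio = [0.0] * 8 + [0.5, -0.5] * 4 + [0.01] * 8
regions = speech_regions(audio)
# Only the detected speech would be passed on to the recognizer.
kept = [s for a, b in regions for s in audio[a:b]]
print(regions, len(kept))
```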
Supports Docker images with pre-loaded models, live microphone transcription, and custom model loading in CTranslate2 format, providing versatility for local or containerized environments.
Speaker identification requires installing pyannote.audio, accepting HuggingFace user conditions, and managing API tokens, adding significant setup overhead beyond the core tool.
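One way to contain the token-management overhead is to keep the HuggingFace token out of scripts and read it from the environment at run time. A sketch, assuming the token is stored in an `HF_TOKEN` environment variable (a common convention, chosen here for illustration) and passed through whisper-ctranslate2's `--hf_token` option; confirm the exact flag with `whisper-ctranslate2 --help` for your installed version. `diarization_argv` is a hypothetical helper.

```python
import os


def diarization_argv(audio_path: str, env=None) -> list[str]:
    """Build a whisper-ctranslate2 command that enables speaker diarization.

    Assumes the token lives in the HF_TOKEN environment variable and that the
    installed version accepts --hf_token (verify with --help).
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("set HF_TOKEN after accepting the pyannote model conditions on HuggingFace")
    return ["whisper-ctranslate2", audio_path, "--hf_token", token]


# A fake token passed explicitly so the example runs without touching the real environment.
argv = diarization_argv("meeting.wav", env={"HF_TOKEN": "hf_example"})
print(argv[:3])  # avoid printing the token itself
```

Failing early with a clear message is cheaper than discovering a missing or unauthorized token partway through a long transcription.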
Translation tasks only convert audio to English, which rules out use cases that need a non-English target language, as the usage notes state.
Custom fine-tuned models must first be converted to CTranslate2 format, an extra step that can be a barrier for users whose models live in other frameworks such as PyTorch or TensorFlow.
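For a fine-tuned Hugging Face Transformers checkpoint, the conversion is usually a single call to CTranslate2's `ct2-transformers-converter`, after which whisper-ctranslate2 can load the output directory. The sketch below only assembles the two commands. The model ID `my-org/whisper-small-finetuned`, the directory names, and the helper `conversion_commands` are hypothetical, and `--model_directory` is assumed here to be whisper-ctranslate2's option for loading a local model; check both tools' `--help` before relying on it.

```python
import shlex


def conversion_commands(hf_model: str, output_dir: str, audio_path: str) -> tuple[list[str], list[str]]:
    """Return (convert, transcribe) argv lists for a fine-tuned Whisper checkpoint."""
    convert = [
        "ct2-transformers-converter",
        "--model", hf_model,               # Hugging Face model ID or local checkpoint path
        "--output_dir", output_dir,        # where the CTranslate2 model is written
        "--copy_files", "tokenizer.json",  # keep the tokenizer next to the converted weights
        "--quantization", "float16",       # store weights in half precision
    ]
    transcribe = ["whisper-ctranslate2", audio_path, "--model_directory", output_dir]
    return convert, transcribe


convert, transcribe = conversion_commands("my-org/whisper-small-finetuned", "whisper-small-ct2", "sample.wav")
for cmd in (convert, transcribe):
    print(shlex.join(cmd))
```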