A JAX implementation of OpenAI's Whisper model offering up to 70x faster transcription on TPUs.
Whisper JAX is a high-performance implementation of OpenAI's Whisper speech-to-text model built on JAX. It addresses slow audio transcription by leveraging JAX's Just-In-Time (JIT) compilation and parallelization to achieve up to 70x faster inference than the PyTorch implementation. The library supports transcription, translation, and timestamp prediction across TPUs, GPUs, and CPUs.
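A minimal sketch of the JIT mechanism the speed-up relies on: the first call to a `jax.jit`-wrapped function compiles it with XLA, and later calls reuse the cached executable. The function below is a hypothetical stand-in for real model work, not part of the Whisper JAX API.

```python
import jax
import jax.numpy as jnp

@jax.jit
def log_mel_like(x):
    # Hypothetical stand-in for a feature-extraction step;
    # the decorator compiles it with XLA on first call.
    return jnp.log1p(jnp.abs(x))

x = jnp.ones((80, 3000))
first = log_mel_like(x)   # triggers compilation (slow)
second = log_mel_like(x)  # reuses the compiled executable (fast)
```

This same compile-once, run-fast pattern is why repeated transcription calls with the same input shapes are so much quicker than the first.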
Machine learning engineers and researchers working with large-scale audio processing, particularly those needing fast Whisper inference on TPU/GPU clusters. Also suitable for developers building speech recognition services requiring high throughput.
Developers choose Whisper JAX for its unmatched inference speed, hardware flexibility, and seamless integration with the Hugging Face ecosystem. Its unique selling point is the 70x speed-up on TPUs while maintaining full Whisper functionality.
Benchmarks show up to 70x faster inference than PyTorch on TPUs, with JIT compilation and batched processing delivering roughly 10x speed-ups on GPUs.
Runs seamlessly on CPU, GPU, and TPU using pmap for data parallelism, as outlined in the pipeline usage for multi-device support.
Parallel transcription of audio chunks with minimal accuracy loss (~1% WER penalty) provides significant throughput gains, detailed in the batching section.
Built on Transformers and supports all Whisper models from the Hub, including easy conversion from PyTorch checkpoints using the from_pt argument.
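The pmap-based data parallelism mentioned above can be sketched as follows: input chunks are sharded across all local devices and processed in a single parallel call. The per-device function here is a hypothetical placeholder, not the library's actual transcription step.

```python
import jax
import jax.numpy as jnp

n = jax.local_device_count()

@jax.pmap
def process_chunk(chunk):
    # Placeholder for per-device work; each device receives
    # one slice of the leading (device) axis.
    return chunk.sum(axis=-1)

# One chunk per device: shape (n_devices, chunk_len)
chunks = jnp.arange(n * 4.0).reshape(n, 4)
out = process_chunk(chunks)  # shape (n_devices,)
```

On a single-device CPU host this runs with `n == 1`; on a TPU pod slice the same call fans out across all cores, which is the mechanism behind the library's multi-device throughput.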
Requires JAX installation tailored to specific hardware (e.g., TPU/GPU variants), and advanced features like T5x partitioning demand deep JAX expertise.
The first inference run is slow because JAX must compile the function, which hinders one-off or dynamically shaped workloads; the pipeline depends on caching the compiled function for subsequent speed.
Fine-tuned PyTorch checkpoints must be converted to Flax before use (e.g., via the from_pt argument), adding extra steps and a dependency on both PyTorch and Flax.