A high-performance automatic speech recognition toolkit from Facebook AI Research, built with fully convolutional neural networks.
wav2letter++ is Facebook AI Research's automatic speech recognition toolkit, implementing state-of-the-art end-to-end speech recognition models with fully convolutional neural networks. It provides recipes and pre-trained models for reproducing research results and building production ASR systems. The toolkit focuses on efficiency and scalability, and supports both streaming and offline speech recognition.
AI researchers and engineers working on speech recognition systems, particularly those interested in end-to-end models, convolutional architectures, and reproducible research implementations.
Developers choose wav2letter++ for its production-ready implementation of cutting-edge ASR research, its fully convolutional architecture that offers performance advantages over recurrent models, and its comprehensive recipes that reproduce published paper results with pre-trained models.
Facebook AI Research's Automatic Speech Recognition Toolkit
Uses only convolutional layers, with no recurrence, so computation parallelizes across time for faster training and inference, in line with the README's emphasis on efficiency and scalability.
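To illustrate why a convolution-only design parallelizes well, here is a minimal sketch (plain Python, not wav2letter++ code): each output frame of a 1-D convolution depends only on a fixed window of inputs, so all frames can be computed independently, whereas a recurrent layer must wait for step t-1 before computing step t.

```python
# Illustrative sketch, not wav2letter++ code: valid-mode 1-D convolution
# over a sequence of scalar audio features. Every output index t is
# independent of the others, so the loop could run fully in parallel.

def conv1d(frames, kernel):
    """Valid-mode 1-D convolution over a sequence of scalar features."""
    k = len(kernel)
    return [
        sum(frames[t + i] * kernel[i] for i in range(k))
        for t in range(len(frames) - k + 1)  # each t is independent
    ]

features = [0.0, 1.0, 2.0, 3.0, 4.0]
smoothed = conv1d(features, [0.25, 0.5, 0.25])  # -> [1.0, 2.0, 3.0]
```

A recurrent cell computing `h[t] = f(h[t-1], x[t])` has no such independence, which is the core efficiency argument for the fully convolutional approach.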
Supports lexicon-free and sequence-to-sequence models that map audio directly to text, simplifying the ASR pipeline; the recipes highlight these modern architectures.
Includes recipes for real-time, low-latency speech recognition, making it production-ready for applications requiring immediate processing.
Provides models that reproduce results from published papers, aiding in reproducible research with clear recipes linked in the README.
Implements semi-supervised learning techniques for improved accuracy, as detailed in the self-training recipe section.
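The general pseudo-labeling loop behind self-training can be sketched in a few lines (a toy illustration of the technique, with hypothetical names; not the wav2letter++ recipe code): train on labeled data, label the unlabeled pool with the model, keep only confident predictions, then retrain on the enlarged set.

```python
# Toy sketch of the pseudo-labeling step in self-training. The model
# interface (label, confidence) and the threshold are illustrative
# assumptions, not wav2letter++ APIs.

def pseudo_label(model_predict, unlabeled, threshold=0.9):
    """Return (example, label) pairs whose confidence clears the threshold."""
    kept = []
    for x in unlabeled:
        label, confidence = model_predict(x)
        if confidence >= threshold:
            kept.append((x, label))
    return kept

# Dummy stand-in "model": confident on short inputs, unsure on long ones.
def toy_predict(x):
    return ("short" if len(x) < 5 else "long", 1.0 if len(x) < 5 else 0.5)

augmented = pseudo_label(toy_predict, ["abc", "abcdefg"])
# Only "abc" survives the confidence filter.
```

In practice the filtering is what matters: training on low-confidence machine labels can reinforce the model's own errors, so the threshold trades added data against label noise.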
Requires building from source with Flashlight and CMake, and the README specifies using the 0.3 branch, adding installation complexity and potential dependency conflicts.
The toolkit has been consolidated into Flashlight, with the old repository less actively developed, leading to confusion and breaking changes for users of pre-consolidation versions.
The README is brief and focused on research reproduction, lacking detailed tutorials or examples for newcomers to speech recognition or the codebase.
Primarily supports fully convolutional models, which may not be optimal for all ASR tasks compared to transformer-based or hybrid approaches available in other toolkits.