A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks.
A high-performance C/C++ port of OpenAI's Whisper model for efficient, cross-platform speech recognition.
High-performance C/C++ port of OpenAI's Whisper for efficient, cross-platform speech recognition.
A fast, memory-efficient reimplementation of OpenAI's Whisper speech-to-text model using CTranslate2.
Fast automatic speech recognition with accurate word-level timestamps and speaker diarization, built on OpenAI's Whisper.
A comprehensive open-source toolkit for speech recognition research and development.
Offline speech recognition toolkit supporting 20+ languages with small models and a streaming API.
An end-to-end speech processing toolkit for speech recognition, text-to-speech, translation, enhancement, and more.
An open-source Android app for real-time, offline voice translation between multiple languages using on-device AI models.
A browser extension that solves difficult CAPTCHAs by completing reCAPTCHA audio challenges using speech recognition.
A tiny JavaScript library for adding speech recognition and voice commands to websites.
A high-performance automatic speech recognition toolkit from Facebook AI Research, built with fully convolutional neural networks.
Facebook AI Research's automatic speech recognition toolkit for end-to-end ASR with modern neural architectures.
A pipeline that combines OpenAI Whisper for speech-to-text with speaker diarization to identify who said what in audio.
A highly accurate, lightweight, on-device wake word detection engine powered by deep learning.
A JAX implementation of OpenAI's Whisper model offering up to 70x faster transcription on TPUs.
A native macOS voice-to-text app that transcribes speech to text instantly with 100% offline processing.
A lightweight, open-source continuous speech recognition engine for embedded and offline applications.
A fast parallel implementation of the Connectionist Temporal Classification (CTC) loss function for CPU and GPU.
A fully customizable AI chat component for websites, connecting to any API or hosting models directly in the browser.
Standalone executables of OpenAI's Whisper and Faster-Whisper for speech-to-text transcription without Python dependencies.
A proof-of-concept system that defeats Google's audio reCAPTCHA with 85% accuracy using speech recognition and browser automation.
A Python library that extends OpenAI's Whisper to provide accurate word-level timestamps and confidence scores for multilingual speech recognition.
An open-source, fully offline voice assistant for many languages, designed for private home automation.
An open-source ChatGPT app with realistic voice capabilities using ElevenLabs text-to-speech.
A curated list of resources, tools, and applications for OpenAI's Whisper speech recognition system.
A curated list of resources, tools, and applications for OpenAI's Whisper speech recognition model.
A Chrome/Edge extension that enables voice conversations with ChatGPT using speech recognition and text-to-speech.
A Swift SDK for fully local, low-latency audio AI on Apple devices, including transcription, text-to-speech, voice activity detection, and speaker diarization.
A robust yet lenient forced aligner built on Kaldi for aligning speech audio with text transcripts.