A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks.
Cross-platform framework for building customizable on-device machine learning pipelines for live and streaming media.
A fast, memory-efficient reimplementation of OpenAI's Whisper speech-to-text model using CTranslate2.
Fast automatic speech recognition with accurate word-level timestamps and speaker diarization, built on OpenAI's Whisper.
A modern macOS virtual audio loopback driver for routing audio between applications with zero additional latency.
An easy-to-use, multi-track audio editor and recorder for Windows, macOS, GNU/Linux, and other operating systems.
A comprehensive open-source toolkit for speech recognition research and development.
A Web Audio framework for creating interactive music and audio applications in the browser.
A next-generation Kaldi-based toolkit for offline speech-to-text, text-to-speech, and audio processing across 12 languages and diverse hardware.
Audio synthesis, processing, and analysis platform for iOS, macOS, and tvOS applications.
A hands-on tutorial teaching how to use FFmpeg's libav libraries for media processing, from basics to transcoding and transmuxing.
An open-source Python toolkit for speaker diarization with state-of-the-art pretrained models and pipelines.
A fluent Node.js API for FFmpeg that simplifies complex command-line video and audio processing.
A single-file C audio library for playback, capture, and processing with no external dependencies.
A system-wide audio equalizer and volume mixer for macOS with free and pro features.
A Python library for audio feature extraction, classification, segmentation, and machine learning applications.
A comprehensive .NET audio library for playback, recording, format conversion, MIDI, and audio manipulation.
A peer-reviewed, free, open-source C++ library for professional-quality creative coding.
A pipeline that combines OpenAI Whisper for speech-to-text with speaker diarization to identify who said what in audio.
An object-oriented PHP library for video and audio manipulation using FFmpeg binaries.
A simple, intuitive audio visualization and processing framework for iOS and macOS built on Core Audio.
A lightweight, open-source continuous speech recognition engine for embedded and offline applications.
A JavaScript plugin for recording and exporting audio from Web Audio API nodes as WAV files.
A cross-platform open-source library for rendering Milkdrop-compatible music visualizations from audio input.
FFmpeg compiled to JavaScript via Emscripten for in-browser video/audio processing.
A pure Rust library for demuxing media formats, reading metadata tags, and decoding audio codecs.
Standalone executables of OpenAI's Whisper and Faster-Whisper for speech-to-text transcription without Python dependencies.
A community-driven Swift package for interacting with the OpenAI API and other compatible providers.
An audio library for PyTorch providing data manipulation, transformations, and dataset loaders for machine learning applications.
A proof-of-concept system that defeats Google's audio reCAPTCHA with 85% accuracy using speech recognition and browser automation.
A Python library that extends OpenAI's Whisper to provide accurate word-level timestamps and confidence scores for multilingual speech recognition.
A cross-platform, LGPL-licensed software implementation of the OpenAL 3D audio API.
An open-source tool for audio matching and mastering that makes your track sound like a reference song.
A fast, extensible gapless audio player and streamer for iOS and macOS with low CPU usage.
A web implementation of the SoundWave paper that detects motion using the Doppler effect with microphone and speakers.
A comprehensive .NET audio library for playing, recording, encoding, decoding, and real-time processing of audio in C#.