A deep learning toolkit for text-to-speech generation, with pretrained models in over 1100 languages and tools for training.
An open-source, cross-platform ebook reader with multi-format support, annotations, sync, and accessibility features.
A multi-voice text-to-speech system that produces highly realistic prosody and intonation using autoregressive and diffusion decoders.
A concise and elegant macOS dictionary and translation app with OCR, supporting 20+ services including Apple Dictionary, OpenAI, and DeepL.
A next-generation Kaldi-based toolkit for offline speech-to-text, text-to-speech, and audio processing across 12 languages and diverse hardware.
SDKs for adding private, on-device AI features like LLM chat, speech-to-text, and text-to-speech to mobile and web apps.
An end-to-end speech processing toolkit for speech recognition, text-to-speech, translation, enhancement, and more.
A TensorFlow implementation of DeepMind's WaveNet neural network for generating raw audio waveforms.
A unified web interface for text-to-speech, voice cloning, and audio generation with support for dozens of AI models.
A Swift ePub reader and parser framework for iOS with rich customization and accessibility features.
A Python library and CLI tool that interfaces with Google Translate's text-to-speech API to generate MP3 audio from text.
An open-source ChatGPT app with realistic voice capabilities using ElevenLabs text-to-speech.
A flow-based generative network for fast, high-quality speech synthesis from mel-spectrograms.
A lightweight desktop translator that translates and speaks text using multiple online translation APIs.
A Chrome/Edge extension that enables voice conversations with ChatGPT using speech recognition and text-to-speech.
A Swift SDK for fully local, low-latency audio AI on Apple devices, including transcription, text-to-speech, voice activity detection, and speaker diarization.
An Arduino library for ESP32 multi-core chips to play audio files and streams from SD card or network via I2S to external DACs/amplifiers.
A Python library and CLI tool for converting text to phonetic transcriptions (phones) across multiple languages using various backends.