A deep learning toolkit for Text-to-Speech generation with pretrained models in over 1100 languages and tools for training.
An open-source, cross-platform ebook reader with multi-format support, annotations, sync, and accessibility features.
A multi-voice text-to-speech system that produces highly realistic prosody and intonation using autoregressive and diffusion decoders.
A concise and elegant macOS dictionary and translation app with OCR, supporting 20+ services including Apple Dictionary, OpenAI, and DeepL.
A next-generation Kaldi-based toolkit for offline speech-to-text, text-to-speech, and audio processing across 12 languages and diverse hardware.
SDKs for adding private, on-device AI features like LLM chat, speech-to-text, and text-to-speech to mobile and web apps.
An end-to-end speech processing toolkit for speech recognition, text-to-speech, translation, enhancement, and more.
A TensorFlow implementation of DeepMind's WaveNet neural network for generating raw audio waveforms.
A unified web interface for text-to-speech, voice cloning, and audio generation with support for dozens of AI models.
A Swift ePub reader and parser framework for iOS with rich customization and accessibility features.
A Python library and CLI tool for interfacing with Google Translate's text-to-speech API to generate MP3 audio from text.
An open-source ChatGPT app with realistic voice capabilities using ElevenLabs text-to-speech.
A flow-based generative network for fast, high-quality speech synthesis from mel-spectrograms.
A Swift SDK for fully local, low-latency audio AI on Apple devices, including transcription, text-to-speech, voice activity detection, and speaker diarization.
A lightweight desktop translator that translates and speaks text using multiple online translation APIs.
A Chrome/Edge extension that enables voice conversations with ChatGPT using speech recognition and text-to-speech.
An Arduino library for ESP32 multi-core chips to play audio files and streams from SD card or network via I2S to external DACs/amplifiers.
A Python library and CLI tool for converting text to phonetic transcriptions (phones) across multiple languages using various backends.
A Linux desktop app for offline note-taking, reading, and translation using speech-to-text, text-to-speech, and machine translation.
Node.js client library for accessing IBM Watson AI services like Assistant, Speech-to-Text, and Natural Language Understanding.
A Python client library for interacting with IBM Watson AI services, available via pip as ibm-watson.
Enables natural two-way voice conversations with Claude Code and other MCP agents for hands-free coding assistance.
Swift SDK for integrating IBM Watson AI services like speech, language, and assistant into iOS and Linux applications.
A voice-based conversation interface for ChatGPT that allows users to speak and receive spoken responses.
A fast and stable translation plugin for PowerToys Run, enabling quick text and clipboard translation with multi-platform support.
An AI plugin for Payload CMS that adds content generation, translation, proofreading, and image/voice creation to your content workflow.
A React Native library for text-to-speech functionality with voice and rate control.
A JavaScript library for adding IBM Watson Speech to Text and Text to Speech capabilities to web applications.
An Arduino library for text-to-speech synthesis using PWM or DAC outputs with an external amplifier.
Ruby wrapper for espeak and lame to generate Text-To-Speech MP3 files with customizable voice parameters.
An Android chatbot with voice interaction capabilities powered by IBM Watson's AI services on IBM Cloud.
A high-performance real-time voice processing server in Rust providing unified STT/TTS services via WebSocket and REST APIs.
A JavaScript library for building Speech Synthesis Markup Language (SSML) using a clean builder pattern API.
Android client library for integrating IBM Watson cognitive services like speech recognition, text-to-speech, and visual recognition.
A Capacitor plugin for synthesizing speech from text in cross-platform mobile apps.
Convert folders or RSS feeds into Studio pack zip files for Lunii and compatible audio storytelling devices.