A deep learning toolkit for text-to-speech generation, offering pretrained models in over 1,100 languages along with tools for training.
An open-source, cross-platform ebook reader with multi-format support, annotations, sync, and accessibility features.
A multi-voice text-to-speech system that produces highly realistic prosody and intonation using autoregressive and diffusion decoders.
A concise and elegant macOS dictionary and translation app with OCR, supporting 20+ services including Apple Dictionary, OpenAI, and DeepL.
A next-generation Kaldi-based toolkit for offline speech-to-text, text-to-speech, and audio processing, with APIs in 12 programming languages and support for diverse hardware.
SDKs for adding private, on-device AI features like LLM chat, speech-to-text, and text-to-speech to mobile and web apps.
An end-to-end speech processing toolkit for speech recognition, text-to-speech, translation, enhancement, and more.
A TensorFlow implementation of DeepMind's WaveNet neural network for generating raw audio waveforms.
A unified web interface for text-to-speech, voice cloning, and audio generation with support for dozens of AI models.
A Swift ePub reader and parser framework for iOS with rich customization and accessibility features.
A Python library and CLI tool for interfacing with Google Translate's text-to-speech API to generate MP3 audio from text.
An open-source ChatGPT app with realistic voice capabilities using ElevenLabs text-to-speech.
A flow-based generative network for fast, high-quality speech synthesis from mel-spectrograms.
A Swift SDK for fully local, low-latency audio AI on Apple devices, including transcription, text-to-speech, voice activity detection, and speaker diarization.
A lightweight desktop translator that translates and speaks text using multiple online translation APIs.
A Chrome/Edge extension that enables voice conversations with ChatGPT using speech recognition and text-to-speech.
An Arduino library for multi-core ESP32 chips that plays audio files and streams from an SD card or the network over I2S to external DACs or amplifiers.
A Python library and CLI tool for converting text to phonetic transcriptions (phones) across multiple languages using various backends.
A Linux desktop app for offline note-taking, reading, and translation using speech-to-text, text-to-speech, and machine translation.
A Node.js client library for accessing IBM Watson AI services such as Assistant, Speech to Text, and Natural Language Understanding.
A Python client library for interacting with IBM Watson AI services, available via pip as ibm-watson.
Enables natural two-way voice conversations with Claude Code and other MCP agents for hands-free coding assistance.
A Swift SDK for integrating IBM Watson AI services, such as speech, language, and Assistant, into iOS and Linux applications.
A voice-based conversation interface for ChatGPT that allows users to speak and receive spoken responses.
A fast and stable translation plugin for PowerToys Run, enabling quick text and clipboard translation with multi-platform support.
An AI plugin for Payload CMS that adds content generation, translation, proofreading, and image/voice creation to your content workflow.
A React Native library for text-to-speech functionality with voice and rate control.
A JavaScript library for adding IBM Watson Speech to Text and Text to Speech capabilities to web applications.
An Arduino library for text-to-speech synthesis using PWM or DAC outputs with an external amplifier.
An Android chatbot with voice interaction capabilities powered by IBM Watson's AI services on IBM Cloud.
A Ruby wrapper for espeak and lame that generates text-to-speech MP3 files with customizable voice parameters.
A high-performance real-time voice processing server in Rust providing unified STT/TTS services via WebSocket and REST APIs.
A JavaScript library for building Speech Synthesis Markup Language (SSML) using a clean builder pattern API.
An Android client library for integrating IBM Watson cognitive services such as speech recognition, text-to-speech, and visual recognition.
A Capacitor plugin for synthesizing speech from text in cross-platform mobile apps.
A tool that converts folders or RSS feeds into Studio pack ZIP files for Lunii and compatible audio storytelling devices.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.