A high-performance real-time voice processing server in Rust providing unified STT/TTS services via WebSocket and REST APIs.
Sayna is a high-performance real-time voice processing server built in Rust that provides unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs. It serves as a voice layer for AI agents, enabling seamless integration with existing agentic frameworks by abstracting multiple voice providers behind a single interface.
Sayna is aimed at developers building AI agents, voice-enabled applications, or real-time communication systems who need reliable STT/TTS services with provider flexibility and low-latency processing.
Sayna offers a unified API that supports multiple voice providers simultaneously, real-time WebSocket streaming, and advanced features like noise filtering and turn detection. All of this ships in a self-hostable, high-performance Rust server that simplifies voice integration.
Sayna is a unified Voice Layer for AI Agents with seamless integration into existing agentic frameworks.
Abstracts multiple STT/TTS providers like Deepgram, ElevenLabs, Google Cloud, and Azure behind a single API, enabling easy switching and fallback strategies as highlighted in the 'Unified Voice API' feature.
Supports bidirectional audio streaming over WebSockets for low-latency processing, crucial for interactive AI agents and real-time applications, with configurable sample rates and encodings.
Includes optional features like Silero-VAD for voice activity detection and DeepFilterNet noise suppression, enhancing audio quality and enabling turn detection when the 'stt-vad' feature is enabled.
Supports external authentication services, per-request credential overrides, and audio-disabled mode for development, providing security and testing flexibility without provider keys.
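As a rough illustration of what "configurable sample rates and encodings" means for a streaming client, the sketch below computes how many bytes one audio chunk occupies and splits a buffer into chunks that could each be sent as one binary WebSocket frame. It assumes uncompressed 16-bit linear PCM, and the function names are hypothetical rather than part of Sayna's API; the encodings and chunk sizes Sayna actually accepts should be checked in its documentation.

```rust
/// Bytes in one chunk of uncompressed 16-bit PCM audio.
/// Assumption: 2 bytes per sample; other encodings need a different size.
pub fn pcm16_chunk_bytes(sample_rate_hz: u32, channels: u16, chunk_ms: u32) -> usize {
    const BYTES_PER_SAMPLE: u64 = 2;
    // Samples per channel that fit in `chunk_ms` milliseconds.
    let samples_per_channel = sample_rate_hz as u64 * chunk_ms as u64 / 1000;
    (samples_per_channel * channels as u64 * BYTES_PER_SAMPLE) as usize
}

/// Splits a PCM buffer into chunks of at most `chunk_bytes` bytes.
/// The last chunk may be shorter; a zero chunk size yields no chunks.
pub fn split_into_chunks(pcm: &[u8], chunk_bytes: usize) -> Vec<&[u8]> {
    if chunk_bytes == 0 {
        return Vec::new();
    }
    pcm.chunks(chunk_bytes).collect()
}
```

For example, 20 ms of 16 kHz mono audio is 320 samples, or 640 bytes. Smaller chunks reduce latency at the cost of more frames per second.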
Requires Docker or Rust compilation, configuration of numerous environment variables, and optional feature flags like 'stt-vad' and 'noise-filter', which can be daunting for quick prototyping.
Advanced features like DeepFilterNet noise suppression are CPU-heavy, as noted in the Performance Considerations, potentially impacting server resource usage and scalability.
Core STT/TTS functionality still relies on third-party API keys and services, introducing potential cost, rate limits, and vendor lock-in despite the unified interface.
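To make the "single interface with fallback" idea concrete, here is a minimal sketch of how several providers might sit behind one trait, with the next provider tried when one fails. Every name in it (`SttProvider`, `FallbackStt`, `MockProvider`, `ProviderError`) is hypothetical and chosen for illustration; this is not Sayna's actual API or implementation, and a real server would use async, streaming interfaces rather than a blocking call.

```rust
/// Error reported by a (hypothetical) provider.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderError {
    pub provider: String,
    pub message: String,
}

/// Hypothetical provider-agnostic STT interface (not Sayna's real trait).
pub trait SttProvider {
    fn name(&self) -> &str;
    fn transcribe(&self, audio: &[u8]) -> Result<String, ProviderError>;
}

/// Tries each configured provider in order.
pub struct FallbackStt {
    providers: Vec<Box<dyn SttProvider>>,
}

impl FallbackStt {
    pub fn new(providers: Vec<Box<dyn SttProvider>>) -> Self {
        Self { providers }
    }

    /// Returns (provider name, transcript) from the first provider that
    /// succeeds, or every collected error if all of them fail.
    pub fn transcribe(&self, audio: &[u8]) -> Result<(String, String), Vec<ProviderError>> {
        let mut errors = Vec::new();
        for provider in &self.providers {
            match provider.transcribe(audio) {
                Ok(text) => return Ok((provider.name().to_string(), text)),
                Err(err) => errors.push(err),
            }
        }
        Err(errors)
    }
}

/// Stand-in provider for demos: returns a canned transcript or fails.
pub struct MockProvider {
    pub label: &'static str,
    pub canned: Option<&'static str>,
}

impl SttProvider for MockProvider {
    fn name(&self) -> &str {
        self.label
    }

    fn transcribe(&self, _audio: &[u8]) -> Result<String, ProviderError> {
        match self.canned {
            Some(text) => Ok(text.to_string()),
            None => Err(ProviderError {
                provider: self.label.to_string(),
                message: "unavailable".to_string(),
            }),
        }
    }
}
```

Because callers only see the trait, swapping or reordering providers is a configuration change rather than a code change. The third-party costs and rate limits noted above still apply to whichever provider ends up serving the request.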