How do I run Sayna without API keys for local testing?

Use audio-disabled mode by sending a WebSocket configuration message with 'audio: false'. This allows testing control flows and WebSocket messaging without setting up provider credentials, as described in the Quick Start section.
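
A minimal sketch of such a configuration message, assuming it is sent as a JSON text frame. Only the audio flag comes from the description above. The "type" key and its "config" value are hypothetical placeholders, so check the Quick Start section for the real field names.

```python
import json


def build_config_message(audio_enabled: bool = False) -> str:
    """Serialize a Sayna-style configuration message as a JSON string.

    ASSUMPTION: the "type": "config" field is a hypothetical placeholder;
    only the audio flag is taken from the Sayna documentation.
    """
    message = {"type": "config", "audio": audio_enabled}
    return json.dumps(message)


if __name__ == "__main__":
    # A real client would send this string as a text frame right after
    # opening the WebSocket connection; here we only print it.
    print(build_config_message())
```

Running it prints {"type": "config", "audio": false}. Any WebSocket client library can send that string as its first text frame.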

Sayna vs using direct provider APIs: what's the advantage?

Sayna provides a unified interface across multiple providers, real-time streaming, and integrated features such as noise filtering and voice activity detection (VAD). Direct provider APIs are simpler when you only use one provider, but they lack Sayna's abstraction layer and built-in audio processing.

Can Sayna handle group voice calls or conferences?

Yes. Sayna's LiveKit integration streams audio over WebRTC with room-based communication, which supports multi-party calls. With proper SIP configuration, this setup can be extended to phone call systems.

How to enable noise filtering in Sayna?

Enable the 'noise-filter' feature flag during build or use the default Docker image. Configure audio processing pipelines in WebSocket messages to apply DeepFilterNet noise suppression for cleaner audio input.

What authentication methods does Sayna support for production?

Sayna supports JWT-based external authentication services as well as API keys passed in headers or query parameters. Setup involves environment variables such as AUTH_SERVICE_URL and signing keys; the Authentication section gives detailed steps.
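
On the client side, the header and query-parameter options come down to either an HTTP header or an extra parameter on the connection URL. The sketch below assumes a Bearer-style Authorization header, a hypothetical query parameter named "token", and a placeholder host and path. Confirm the actual header format and parameter name in the Authentication section.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_headers(token: str) -> dict:
    """Headers for REST or WebSocket requests.

    ASSUMPTION: a Bearer-style Authorization header; verify against the
    Sayna Authentication docs.
    """
    return {"Authorization": f"Bearer {token}"}


def with_token_query(url: str, token: str, param: str = "token") -> str:
    """Append the credential as a query parameter, keeping any existing ones.

    Useful for clients such as browsers, whose WebSocket API cannot set
    custom headers. The default parameter name "token" is hypothetical.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Placeholder host and path; substitute your own deployment.
    print(auth_headers("my-api-key"))
    print(with_token_query("wss://sayna.example.com/ws", "my-api-key"))
```

Headers are preferable where the client allows them, because query strings are more likely to end up in proxy or server logs.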

Is Sayna suitable for low-latency real-time applications?

Yes, with WebSocket-based bidirectional streaming and optimized audio buffering, it's designed for low latency. However, performance depends on server resources and optional CPU-intensive features like DeepFilterNet.

Sayna — Real-Time Voice Processing Server

What is Sayna?

Sayna is a high-performance real-time voice processing server built in Rust that provides unified Speech-to-Text (STT) and Text-to-Speech (TTS) services through WebSocket and REST APIs. It serves as a voice layer for AI agents, enabling seamless integration with existing agentic frameworks by abstracting multiple voice providers behind a single interface.

Target Audience

Developers building AI agents, voice-enabled applications, or real-time communication systems who need reliable STT/TTS services with provider flexibility and low-latency processing.

Value Proposition

Sayna offers a unified API that supports multiple voice providers simultaneously, real-time WebSocket streaming, and advanced features like noise filtering and turn detection, all in a self-hostable, high-performance Rust server that reduces the complexity of voice integration.

Sayna is a unified Voice Layer for AI Agents with seamless integration into existing agentic frameworks.

Use Cases

Best For

Adding voice interfaces to AI agent frameworks
Building real-time voice applications with WebSocket streaming
Creating multi-provider STT/TTS systems with fallback options
Developing WebRTC-based voice communication with LiveKit integration
Implementing voice features with advanced noise suppression and VAD
Self-hosting voice processing infrastructure for privacy/control

Not Ideal For

Projects requiring simple client-side voice APIs without server infrastructure
Teams that prefer fully managed cloud voice services with zero self-hosting or configuration
Applications committed to a single voice provider with no need for multi-provider abstraction or fallbacks

Pros & Cons

Pros

Unified Provider Interface

Abstracts multiple STT/TTS providers like Deepgram, ElevenLabs, Google Cloud, and Azure behind a single API, enabling easy switching and fallback strategies as highlighted in the 'Unified Voice API' feature.
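
The fallback idea can be illustrated with a generic failover loop that tries providers in order and returns the first successful result. This is a client-side sketch of the pattern, not Sayna's internal implementation. The provider functions and the ProviderError type are hypothetical stand-ins for real Deepgram, Google Cloud, or Azure calls.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


# A provider is modeled as (name, function that turns audio bytes into text).
Provider = Tuple[str, Callable[[bytes], str]]


def transcribe_with_fallback(audio: bytes, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, transcript).

    Raises ProviderError listing every failure if no provider succeeds.
    """
    errors: List[str] = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical demo providers standing in for real STT calls.
def flaky_provider(_: bytes) -> str:
    raise ProviderError("rate limited")


def backup_provider(_: bytes) -> str:
    return "hello world"


if __name__ == "__main__":
    chain = [("primary", flaky_provider), ("backup", backup_provider)]
    print(transcribe_with_fallback(b"\x00\x00", chain))
```

Ordering the list by preference (cost, accuracy, or latency) is the main design decision; the loop itself stays the same.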

Real-Time WebSocket Streaming

Supports bidirectional audio streaming over WebSockets for low-latency processing, crucial for interactive AI agents and real-time applications, with configurable sample rates and encodings.
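
Sample rate and encoding determine how many bytes each streamed chunk carries, which in turn sets a floor on buffering latency. A minimal sketch, assuming uncompressed PCM (16-bit linear by default) and fixed-duration chunks. The 20 ms chunk length is a common choice for real-time audio, not a documented Sayna default.

```python
from typing import Iterator


def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes in one chunk of uncompressed PCM audio.

    bytes_per_sample=2 models 16-bit linear PCM; 1 models 8-bit codecs
    such as mu-law.
    """
    samples_per_channel = sample_rate_hz * chunk_ms // 1000
    return samples_per_channel * bytes_per_sample * channels


def split_into_chunks(pcm: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield fixed-size chunks; the final chunk may be shorter."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


if __name__ == "__main__":
    size = pcm_chunk_bytes(16_000, 20)  # 320 samples * 2 bytes = 640
    one_second = bytes(pcm_chunk_bytes(16_000, 1000))  # 1 s of silence
    chunks = list(split_into_chunks(one_second, size))
    print(size, len(chunks))  # 640 50
```

Halving chunk_ms halves the per-chunk buffering delay but doubles the number of WebSocket messages per second, which is the usual trade-off when tuning for latency.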

Advanced Audio Processing

Includes optional features like Silero-VAD for voice activity detection and DeepFilterNet noise suppression, enhancing audio quality and enabling turn detection when the 'stt-vad' feature is enabled.
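
Turn detection built on VAD generally means declaring the speaker's turn over once speech has been followed by enough consecutive non-speech frames. The sketch below illustrates that logic with a simple energy (RMS) threshold in place of Silero-VAD's neural model. The threshold and frame counts are arbitrary assumptions, and Sayna's actual turn-detection logic may differ.

```python
import math
import struct
from typing import Iterable, Optional


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_end_of_turn(frames: Iterable[bytes], threshold: float = 500.0,
                       silence_frames_needed: int = 3) -> Optional[int]:
    """Return the index of the frame where the turn ends, or None.

    A frame counts as speech when its RMS reaches the threshold (an
    energy heuristic standing in for Silero-VAD). The turn ends after
    silence_frames_needed consecutive quiet frames that follow speech.
    """
    speaking = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            speaking = True
            silent_run = 0
        elif speaking:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                return index
    return None


if __name__ == "__main__":
    loud = struct.pack("<4h", 1000, -1000, 1000, -1000)  # RMS 1000
    quiet = bytes(8)  # four zero samples, RMS 0
    print(detect_end_of_turn([quiet, loud, loud, quiet, quiet, quiet]))  # 5
```

Requiring several quiet frames rather than one keeps short pauses between words from ending the turn too early.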

Flexible Authentication Options

Supports external authentication services, per-request credential overrides, and an audio-disabled mode for development, covering production security needs while still allowing testing without provider keys.

Cons

Complex Initial Setup

Requires Docker or Rust compilation, configuration of numerous environment variables, and optional feature flags like 'stt-vad' and 'noise-filter', which can be daunting for quick prototyping.

CPU-Intensive Processing

Advanced features like DeepFilterNet noise suppression are CPU-heavy, as noted in the Performance Considerations section, which can increase server resource usage and limit scalability.

Dependency on External Providers

Core STT/TTS functionality still relies on third-party API keys and services, introducing potential cost, rate limits, and vendor lock-in despite the unified interface.
