A Swift SDK for fully local, low-latency audio AI on Apple devices, including transcription, text-to-speech, voice activity detection, and speaker diarization.
FluidAudio is a Swift SDK that provides fully local, low-latency audio AI capabilities for Apple devices. It enables developers to integrate text-to-speech, speech-to-text, voice activity detection, and speaker diarization directly into their macOS and iOS apps, with inference optimized for the Apple Neural Engine to ensure privacy and efficiency.
iOS and macOS developers building applications that require private, on-device audio processing, such as dictation apps, meeting assistants, voice-controlled tools, and accessibility features.
Developers choose FluidAudio because it integrates tightly with Apple hardware, running state-of-the-art audio models entirely on-device for maximum privacy, low latency, and minimal battery impact compared with cloud-based alternatives.
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Runs inference on the Apple Neural Engine for maximum speed and minimal power consumption, with benchmarks showing ~190x real-time factor on M4 Pro for ASR.
All models process audio locally, ensuring no data leaves the device, which is critical for privacy-focused apps like dictation tools and meeting assistants.
Uses permissively licensed models from Hugging Face (MIT/Apache 2.0), allowing transparency and customization without vendor lock-in.
Integrates ASR, TTS, VAD, and speaker diarization in one SDK, reducing dependency on multiple libraries for end-to-end audio processing.
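Since the entry does not show installation, here is a minimal Swift Package Manager sketch for pulling the SDK into a project. The repository URL, version, and platform minimums below are assumptions, not confirmed by this entry; check the FluidAudio README for the current values.

```swift
// swift-tools-version:5.9
// Hypothetical Package.swift fragment. The repository URL, "from" version,
// and platform minimums are assumptions; verify against the FluidAudio docs.
import PackageDescription

let package = Package(
    name: "MyAudioApp",
    platforms: [.macOS(.v14), .iOS(.v17)],  // assumed minimums
    dependencies: [
        // Assumed repository location for the FluidAudio package.
        .package(url: "https://github.com/FluidInference/FluidAudio.git", from: "0.1.0")
    ],
    targets: [
        .executableTarget(
            name: "MyAudioApp",
            dependencies: [.product(name: "FluidAudio", package: "FluidAudio")]
        )
    ]
)
```

With the dependency resolved, the ASR, TTS, VAD, and diarization APIs are all imported from the single `FluidAudio` module rather than from separate libraries.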
Exclusively supports macOS and iOS, with no native options for other operating systems, limiting cross-platform development.
Text-to-speech is currently in beta and supports only American English, hindering use in multilingual applications despite plans for expansion.
In restricted networks, model downloads may require proxy configuration or a custom registry URL, adding setup overhead that the documentation notes.
Speaker diarization pipelines have varying speeds; for example, the Pyannote pipeline is slower than LS-EEND, and real-time performance depends heavily on ANE-capable hardware.