Transcription Models

Local and cloud transcription models — Whisper, Parakeet, and cloud providers.

VivaDicta supports a wide range of transcription engines: local models that run entirely on your iPhone or iPad, and cloud providers that process audio on remote servers for speed and accuracy.

Local Models

Local models process audio directly on your device. No audio leaves your iPhone or iPad, and transcription works without an internet connection.

Whisper — OpenAI's Whisper model optimized for Apple hardware. Multiple model sizes available (we recommend Large Turbo). Best balance of accuracy and speed on modern iPhones.
Parakeet — NVIDIA's speech recognition model running via FluidAudio. Fast and accurate, optimized for Apple Silicon.

Cloud Providers

Cloud providers process audio on remote servers. They tend to be faster (especially on older devices) and often more accurate, but require an internet connection and an API key.

Groq — free forever, ultra-fast Whisper on custom LPU hardware. Our #1 recommendation.
Deepgram — Nova-3 model with excellent accuracy. $200 free credits for new accounts.
ElevenLabs — Scribe v2 with support for 99 languages. Free tier available.
Gemini — Google's multimodal model with speech transcription capabilities.
Mistral — European AI provider with transcription support.
Soniox — high-accuracy cloud transcription.
OpenAI-compatible — connect any provider that supports the OpenAI Whisper API format.
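
For readers wondering what "OpenAI Whisper API format" means in practice, here is a minimal sketch of the request such a provider is expected to accept: a multipart POST to an /audio/transcriptions path with a file field, a model field, and a Bearer API key. The base URL, key, and model name below are placeholders, and the helper name build_transcription_request is hypothetical. This illustrates the wire format only and is not VivaDicta's own implementation.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, audio_bytes,
                                filename="audio.wav"):
    """Build (but do not send) an OpenAI-style transcription request.

    base_url, api_key and model are placeholders supplied by the caller;
    each provider documents its own values.
    """
    boundary = uuid.uuid4().hex

    # Form field naming the model the provider should run.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")

    # File field carrying the raw audio bytes.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + b"\r\n" + closing

    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to urllib.request.urlopen would send it. With the default response format, OpenAI-style endpoints typically answer with a JSON object whose text field holds the transcript, though individual providers may differ.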

Local vs Cloud

	Local	Cloud
Cost	Free	Free tiers or pay-per-use
Privacy	Full — nothing leaves your device	Audio sent to provider servers
Internet	Not required	Required
Speed	Depends on device hardware	Consistently fast
Accuracy	Good to excellent	Excellent
Setup	Model download required	API key required
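
The comparison above can be read as a simple decision rule. The sketch below is one possible reading, not logic taken from the app: the function name and its three inputs are hypothetical, and it treats offline use and on-device privacy as hard requirements that rule out cloud providers.

```python
def choose_engine_kind(needs_offline, audio_must_stay_on_device, has_api_key):
    """Return "local" or "cloud" based on the trade-offs in the table.

    All three parameters are illustrative booleans, not app settings.
    """
    # Cloud providers need internet and receive the audio, so either
    # requirement forces a local model.
    if needs_offline or audio_must_stay_on_device:
        return "local"
    # Cloud providers require an API key; without one, fall back to local.
    if has_api_key:
        return "cloud"
    return "local"
```

For example, choose_engine_kind(False, False, True) selects a cloud provider, while setting either of the first two arguments to True always selects a local model.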

Speaker Diarization

Some models can identify and label different speakers in a multi-voice recording, producing a speaker-separated transcript. Turn this on in Settings > Transcription > Speaker Labels. The setting only takes effect when your active model supports it.

Model	Speaker Labels
Whisper (local)	Yes
Deepgram	Yes
Mistral	Yes
Other models	Not supported
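
To illustrate how the setting and the model interact, here is a minimal sketch. The support map mirrors the table above; the function names, the (speaker, text) segment shape, and the "Speaker 1" label style are assumptions made for the example and may not match the app's actual output.

```python
# Models listed as supporting speaker labels in the table above.
SPEAKER_LABEL_SUPPORT = {
    "Whisper (local)": True,
    "Deepgram": True,
    "Mistral": True,
}


def speaker_labels_active(setting_enabled, model_name):
    """The setting only applies when the active model supports it."""
    return setting_enabled and SPEAKER_LABEL_SUPPORT.get(model_name, False)


def format_transcript(segments, with_speakers):
    """Render (speaker, text) segments as plain or speaker-separated text."""
    if not with_speakers:
        return " ".join(text for _, text in segments)

    lines = []
    previous_speaker = None
    for speaker, text in segments:
        if speaker == previous_speaker:
            # Same speaker keeps talking: extend the current line.
            lines[-1] += " " + text
        else:
            lines.append(f"{speaker}: {text}")
            previous_speaker = speaker
    return "\n".join(lines)
```

With the setting on and Deepgram active, speaker_labels_active returns True and the transcript is grouped by speaker. With a model outside the table, it returns False and the same segments are joined into a single unlabeled paragraph.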

See Speaker Labels for how to enable the feature and tips for cleaner results.

Language Support

Language support varies by model; most support somewhere between 50 and 100+ languages. Whisper (local) and Groq (cloud, also using Whisper) support 100+ languages. ElevenLabs Scribe v2 supports 99 languages.

See Recommended Models for our top picks based on your use case.