Transcription Models

Local and cloud transcription models — Whisper, Parakeet, and cloud providers.

VivaDicta supports a wide range of transcription engines: local models that run entirely on your iPhone or iPad, and cloud providers that process audio on remote servers, usually faster and often more accurately.

Local Models

Local models process audio directly on your device. No data leaves your iPhone or iPad, and they work without an internet connection.

  • Whisper — OpenAI's Whisper model optimized for Apple hardware. Multiple model sizes available (we recommend Large Turbo). Best balance of accuracy and speed on modern iPhones.
  • Parakeet — NVIDIA's speech recognition model running via FluidAudio. Fast and accurate, optimized for Apple Silicon.

Cloud Providers

Cloud providers process audio on remote servers. They tend to be faster (especially on older devices) and often more accurate, but require an internet connection and an API key.

  • Groq — free forever, ultra-fast Whisper on custom LPU hardware. Our #1 recommendation.
  • Deepgram — Nova-3 model with excellent accuracy. $200 free credits for new accounts.
  • ElevenLabs — Scribe v2 with support for 99 languages. Free tier available.
  • Gemini — Google's multimodal model with speech transcription capabilities.
  • Mistral — European AI provider with transcription support.
  • Soniox — high-accuracy cloud transcription.
  • OpenAI-compatible — connect any provider that supports the OpenAI Whisper API format.
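
For readers wondering what "OpenAI Whisper API format" means in practice, here is a minimal Python sketch of the request shape: a multipart POST to an `/audio/transcriptions` path with a `model` field, a `file` field, and a bearer API key. It only builds the request and does not send it. The function name, the `https://api.example.com/v1` base URL, and the default file name are illustrative assumptions, not VivaDicta internals, and the model name a given provider expects may differ from `whisper-1`.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, api_key, audio_bytes,
                                filename="audio.wav", model="whisper-1"):
    """Build (but do not send) an OpenAI-style transcription request.

    Hypothetical helper for illustration; base_url and model depend on
    the provider you connect.
    """
    boundary = uuid.uuid4().hex

    # Text field naming the model the provider should run.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n'
        "\r\n"
        f"{model}\r\n"
    ).encode("utf-8")

    # Header of the file field; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")

    # Closing delimiter that ends the multipart body.
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=model_part + file_header + audio_bytes + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. OpenAI's own endpoint replies with JSON containing a `text` field by default; compatible providers generally mirror that, but it is worth confirming in each provider's documentation.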

Local vs Cloud

          Local                             Cloud
Cost      Free                              Free tiers or pay-per-use
Privacy   Full: nothing leaves your device  Audio sent to provider servers
Internet  Not required                      Required
Speed     Depends on device hardware        Consistently fast
Accuracy  Good to excellent                 Excellent
Setup     Model download required           API key required
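
The hard requirements in this comparison (model download for local; internet and an API key for cloud; audio leaving the device for cloud) can be read as a simple checklist. The Python sketch below encodes only those constraints and returns which kinds of engine are usable in a given situation. The `Situation` fields and the function name are hypothetical, and this is not how VivaDicta itself selects an engine.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    online: bool                 # an internet connection is available
    has_api_key: bool            # a cloud provider API key is configured
    model_downloaded: bool       # a local model has been downloaded
    audio_must_stay_local: bool  # sending audio to a server is unacceptable


def viable_engine_kinds(s: Situation) -> list:
    """List the engine kinds ("local", "cloud") usable in this situation."""
    kinds = []
    # Local only needs the model on the device.
    if s.model_downloaded:
        kinds.append("local")
    # Cloud needs connectivity and a key, and sends audio off the device.
    if s.online and s.has_api_key and not s.audio_must_stay_local:
        kinds.append("cloud")
    return kinds
```

When both kinds are viable, the choice comes down to the softer rows of the comparison (speed, accuracy, cost), which the sketch deliberately leaves to the reader.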

Language Support

Language support varies by model. Most models support anywhere from 50 to more than 100 languages. Whisper (local) and Groq (cloud, also using Whisper) support 100+ languages. ElevenLabs Scribe v2 supports 99 languages.

See Recommended Models for our top picks based on your use case.