How It Works
From speech to polished text — recording, transcription, AI processing, and output.
When you record with VivaDicta, your speech goes through a multi-stage pipeline that turns raw audio into clean, polished text.
The Pipeline
- Recording: Audio is captured via your device's microphone.
- Transcription: Speech is converted to text using your chosen model, either a local model (Whisper, Parakeet) or a cloud provider.
- Word Replacements: Your custom find-and-replace rules are applied to fix names, expand abbreviations, and standardize terms.
- Custom Vocabulary: Domain-specific terms and proper nouns you've defined are included as context for more accurate results.
- AI Processing: If enabled, the text is sent to your chosen AI provider with your selected preset for polishing, summarizing, translating, or reformatting.
- Output Cleanup: AI artifacts (thinking tags, wrapper XML) are stripped for a clean final result.
- Paragraph Formatting: Continuous text is broken into readable paragraphs using language-aware sentence detection.
- Smart Insert: When using the keyboard, Smart Insert automatically adds spacing and adjusts capitalization so text flows naturally into what you've already typed.
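To make the ordering concrete, here is a minimal Python sketch of a text pipeline in which each stage is a function from text to text. It is an illustration only, not VivaDicta's actual code: the function names (`run_pipeline`, `build_stages`, `replace_words`, `fake_ai_polish`, `strip_wrapper`) and the `<output>` wrapper tag are hypothetical stand-ins, and the real stages are far more capable.

```python
from typing import Callable, List

# A stage takes text and returns transformed text.
Stage = Callable[[str], str]


def replace_words(text: str) -> str:
    # Stand-in for the Word Replacements stage (one hard-coded rule).
    return text.replace("viva dicta", "VivaDicta")


def fake_ai_polish(text: str) -> str:
    # Stand-in for AI Processing: capitalizes, adds a period, and wraps
    # the result in a hypothetical <output> tag, as some models do.
    return "<output>" + text[:1].upper() + text[1:] + ".</output>"


def strip_wrapper(text: str) -> str:
    # Stand-in for Output Cleanup: drops the hypothetical wrapper tag.
    return text.replace("<output>", "").replace("</output>", "")


def build_stages(ai_enabled: bool) -> List[Stage]:
    # AI Processing is optional; the other stages always run in this sketch.
    stages: List[Stage] = [replace_words]
    if ai_enabled:
        stages.append(fake_ai_polish)
    stages.append(strip_wrapper)
    return stages


def run_pipeline(text: str, stages: List[Stage]) -> str:
    # Feed the output of each stage into the next, in order.
    for stage in stages:
        text = stage(text)
    return text


raw = "i dictated this with viva dicta"
print(run_pipeline(raw, build_stages(ai_enabled=True)))
print(run_pipeline(raw, build_stages(ai_enabled=False)))
```

Modeling every stage as the same kind of function keeps the order explicit and makes it easy to skip an optional step such as AI processing.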
Recording
VivaDicta captures high-quality audio from your device's microphone. Recordings are stored locally and can be played back from your transcription history.
Transcription
Your audio is routed to whichever transcription engine you've selected:
- Local: Whisper (optimized for Apple hardware) and Parakeet. Fully private, no internet needed.
- Cloud: Groq, Deepgram, ElevenLabs, Gemini, Mistral, and more. Fast and accurate.
See Transcription Models for a full comparison.
Text Processing
After transcription, the raw text goes through several processing stages:
- Word Replacements — custom find-and-replace rules you define in settings. Useful for correcting names, expanding abbreviations, and fixing persistent transcription errors.
- Custom Vocabulary — domain-specific terms included as context to improve AI processing accuracy.
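As an illustration of how find-and-replace rules of this kind can work, the following Python sketch applies a dictionary of rules in a single pass. The whole-word, case-insensitive matching and the function name `apply_replacements` are assumptions made for the example; the source does not say how VivaDicta matches rules.

```python
import re
from typing import Dict


def apply_replacements(text: str, rules: Dict[str, str]) -> str:
    """Apply find-and-replace rules (assumed whole-word, case-insensitive)."""
    if not rules:
        return text
    # Try longer phrases first so "new york city" wins over "new york".
    keys = sorted(rules, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    # Look replacements up by lowercased match, since matching ignores case.
    lookup = {k.lower(): v for k, v in rules.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


rules = {
    "viva dicta": "VivaDicta",      # fix a product name
    "asap": "as soon as possible",  # expand an abbreviation
}
print(apply_replacements("Send the viva dicta notes ASAP", rules))
```

Compiling all rules into one pattern means a replacement is never re-scanned by a later rule, which avoids chained substitutions. The `\b` anchors assume each rule starts and ends with a letter or digit.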
AI Processing
If AI processing is enabled, the cleaned transcription is sent to your chosen AI provider with your selected preset. The AI can:
- Fix grammar and punctuation (Regular preset).
- Reformat for a specific tone (Professional, Casual, Chat).
- Summarize or extract action points.
- Translate to another language.
- Format as code or technical content.
- Follow any custom instructions you've written.
See Transcription Models and Recommended Models for provider options.
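Some AI models return extra markup around their answer, which is why the pipeline includes an Output Cleanup step after AI processing. A minimal sketch of that kind of cleanup in Python follows. The tag names (`think`, `thinking`) and the single-wrapper rule are assumptions for illustration; the source does not list which tags VivaDicta strips.

```python
import re

# Reasoning blocks such as <think>...</think> (tag names are assumed).
THINK_BLOCK = re.compile(r"<(think|thinking)>.*?</\1>", re.DOTALL | re.IGNORECASE)

# A single XML-style element wrapping the whole response, e.g. <output>...</output>.
WRAPPER = re.compile(r"^\s*<(\w+)>(.*)</\1>\s*$", re.DOTALL)


def clean_ai_output(text: str) -> str:
    """Remove thinking blocks, then unwrap one enclosing tag if present."""
    text = THINK_BLOCK.sub("", text)
    match = WRAPPER.match(text)
    if match:
        text = match.group(2)
    return text.strip()


response = "<think>user wants a formal tone</think>\n<output>Hello, team.</output>"
print(clean_ai_output(response))
```

Text without any tags passes through unchanged apart from trimming surrounding whitespace.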
Smart Insert (Keyboard)
When inserting text via the VivaDicta custom keyboard, Smart Insert automatically formats text to flow naturally into what you've already typed:
- Smart Spacing — adds a space before the inserted text if it follows a letter, number, or punctuation. Adds a trailing space unless the next character is already whitespace or punctuation.
- Smart Capitalization — capitalizes the first letter when inserting at the start of a sentence (after a period, exclamation mark, question mark, or newline). Lowercases it when continuing a sentence mid-flow.
Smart Insert is a per-mode setting — enable or disable it for each Viva Mode in your mode settings.
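The spacing and capitalization rules above can be sketched in a few lines of Python. This is an illustrative reading of the rules, not VivaDicta's implementation. The function name `smart_insert` and the parameters `before` and `after` (the text on either side of the cursor) are hypothetical, and treating an empty field as the start of a sentence is an assumption.

```python
import string


def smart_insert(before: str, text: str, after: str = "") -> str:
    """Return `text` adjusted to fit between `before` and `after`."""
    if not text:
        return text

    # Smart Capitalization: look at the last character before the cursor,
    # ignoring trailing spaces and tabs. Newlines count as sentence starts.
    context = before.rstrip(" \t")
    starts_sentence = context == "" or context[-1] in ".!?\n"
    first = text[0].upper() if starts_sentence else text[0].lower()
    result = first + text[1:]

    # Smart Spacing: leading space after a letter, number, or punctuation.
    if before and (before[-1].isalnum() or before[-1] in string.punctuation):
        result = " " + result

    # Smart Spacing: trailing space unless whitespace or punctuation follows.
    if not after or not (after[0].isspace() or after[0] in string.punctuation):
        result = result + " "

    return result


print(repr(smart_insert("Hello there.", "how are you?")))
print(repr(smart_insert("I will call you ", "Tomorrow morning", ", okay?")))
```

One limitation of this sketch: it lowercases the first letter mid-sentence even for words that should stay capitalized, such as "I" or a proper noun. A production version would need an exception for those cases.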
Output
The finished text is displayed in the app, saved to your transcription history, and can be copied to your clipboard. When using the VivaDicta keyboard, text is inserted directly into the active text field.
Both the raw transcription and the AI-processed version are saved as separate variations, so you can always go back to the original.