How It Works

From speech to polished text — recording, transcription, AI processing, and output.

When you record with VivaDicta, your speech goes through a multi-stage pipeline that turns raw audio into clean, polished text.

The Pipeline

  1. Recording: Audio is captured via your device's microphone.
  2. Transcription: Speech is converted to text using your chosen model, either local (Whisper, Parakeet) or a cloud provider.
  3. Word Replacements: Your custom find-and-replace rules are applied, fixing names, expanding abbreviations, and standardizing terms.
  4. Custom Vocabulary: Domain-specific terms and proper nouns you've defined are included as context for more accurate results.
  5. AI Processing: If enabled, the text is sent to your chosen AI provider with your selected preset for polishing, summarizing, translating, or reformatting.
  6. Output Cleanup: AI artifacts (thinking tags, wrapper XML) are stripped for a clean final result.
  7. Paragraph Formatting: Continuous text is broken into readable paragraphs using language-aware sentence detection.
  8. Smart Insert: When using the keyboard, Smart Insert automatically adds spacing and adjusts capitalization so text flows naturally into what you've already typed.
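
To make the Output Cleanup and Paragraph Formatting stages more concrete, here is a minimal Python sketch of the idea. It is an illustration only, not VivaDicta's actual code: the tag names (`think`, `thinking`, `output`, `result`, `response`), the function names, and the fixed three-sentences-per-paragraph rule are all assumptions, and a naive English-only regex stands in for the language-aware sentence detection the app uses.

```python
import re

# Hypothetical tag names; the real set of stripped tags is not documented here.
THINKING_BLOCK = re.compile(r"<(think|thinking)>.*?</\1>", re.DOTALL | re.IGNORECASE)
WRAPPER_TAG = re.compile(r"</?(output|result|response)>", re.IGNORECASE)


def clean_ai_output(text: str) -> str:
    """Drop thinking blocks entirely, then unwrap known wrapper tags."""
    text = THINKING_BLOCK.sub("", text)   # remove the tag and everything inside it
    text = WRAPPER_TAG.sub("", text)      # remove only the tag, keep the content
    return text.strip()


def split_paragraphs(text: str, sentences_per_paragraph: int = 3) -> str:
    """Group sentences into fixed-size paragraphs (naive splitter, assumption)."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    groups = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(groups)
```

Calling `split_paragraphs(clean_ai_output(raw_reply))` would chain the two stages in the order the pipeline lists them.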

Recording

VivaDicta captures high-quality audio from your device's microphone. Recordings are stored locally and can be played back from your transcription history.

Transcription

Your audio is routed to whichever transcription engine you've selected:

  • Local: Whisper (optimized for Apple hardware) and Parakeet. Fully private, no internet needed.
  • Cloud: Groq, Deepgram, ElevenLabs, Gemini, Mistral, and more. Fast and accurate.

See Transcription Models for a full comparison.

Text Processing

After transcription, the raw text goes through two processing stages before any AI processing:

  • Word Replacements — custom find-and-replace rules you define in settings. Useful for correcting names, expanding abbreviations, and fixing persistent transcription errors.
  • Custom Vocabulary — domain-specific terms included as context to improve AI processing accuracy.
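
A Word Replacements pass can be pictured as a small find-and-replace loop. The Python sketch below is a simplified illustration under stated assumptions: the function name `apply_replacements` is hypothetical, and whole-word, case-insensitive matching is a guess at sensible behavior rather than a description of how VivaDicta matches rules.

```python
import re
from typing import Mapping


def apply_replacements(text: str, rules: Mapping[str, str]) -> str:
    """Apply find-and-replace rules in order (whole-word, case-insensitive: assumptions)."""
    for find, replace in rules.items():
        # re.escape keeps characters like "." in the search term literal.
        pattern = re.compile(rf"\b{re.escape(find)}\b", re.IGNORECASE)
        # A function replacement keeps any backslashes in `replace` literal.
        text = pattern.sub(lambda _m, r=replace: r, text)
    return text


rules = {"viva dikta": "VivaDicta", "asap": "as soon as possible"}
print(apply_replacements("send it to viva dikta asap", rules))
```

With the two example rules, the misheard product name is corrected and the abbreviation is expanded in a single pass.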

AI Processing

If AI processing is enabled, the cleaned transcription is sent to your chosen AI provider with your selected preset. The AI can:

  • Fix grammar and punctuation (Regular preset).
  • Reformat for a specific tone (Professional, Casual, Chat).
  • Summarize or extract action points.
  • Translate to another language.
  • Format as code or technical content.
  • Follow any custom instructions you've written.
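
As a rough picture of how a preset, your Custom Vocabulary, and the transcription might be combined into one request, consider the Python sketch below. Everything in it is hypothetical: the function name `build_prompt`, the wording of the vocabulary hint, and the overall prompt layout are assumptions for illustration, not the prompts VivaDicta actually sends.

```python
from typing import Sequence


def build_prompt(preset_instructions: str, vocabulary: Sequence[str], transcript: str) -> str:
    """Combine preset instructions, vocabulary hints, and the transcript (layout is an assumption)."""
    parts = [preset_instructions.strip()]
    if vocabulary:
        # Vocabulary is passed as context so the model keeps these spellings.
        parts.append("Known terms (spell exactly as written): " + ", ".join(vocabulary))
    parts.append("Transcript:\n" + transcript.strip())
    return "\n\n".join(parts)


print(build_prompt("Fix grammar and punctuation.", ["VivaDicta", "Parakeet"], "hello world"))
```

Switching presets in this picture changes only the first part of the request; the vocabulary and transcript stay the same.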

See Transcription Models and Recommended Models for provider options.

Smart Insert (Keyboard)

When inserting text via the VivaDicta custom keyboard, Smart Insert automatically formats text to flow naturally into what you've already typed:

  • Smart Spacing — adds a space before the inserted text if it follows a letter, number, or punctuation. Adds a trailing space unless the next character is already whitespace or punctuation.
  • Smart Capitalization — capitalizes the first letter when inserting at the start of a sentence (after a period, exclamation mark, question mark, or newline). Lowercases it when continuing a sentence mid-flow.

Smart Insert is a per-mode setting — enable or disable it for each Viva Mode in your mode settings.
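
The spacing and capitalization rules above can be sketched in a few lines of Python. This is an approximation, not the keyboard's real implementation: the function name `smart_insert` and its parameters are hypothetical, punctuation is limited to ASCII, and treating an empty text field as the start of a sentence is an assumption.

```python
import string

SENTENCE_END = ".!?"


def smart_insert(before: str, text: str, after: str = "") -> str:
    """Return `text` adjusted for insertion between `before` and `after`."""
    if not text:
        return text

    # Smart Capitalization: inspect the last character before the cursor,
    # ignoring trailing spaces and tabs (but not newlines).
    trimmed = before.rstrip(" \t")
    at_sentence_start = not trimmed or trimmed[-1] in SENTENCE_END + "\n"
    first = text[0].upper() if at_sentence_start else text[0].lower()
    result = first + text[1:]

    # Smart Spacing: leading space after a letter, number, or punctuation.
    if before and (before[-1].isalnum() or before[-1] in string.punctuation):
        result = " " + result

    # Trailing space unless the next character is whitespace or punctuation.
    if not (after and (after[0].isspace() or after[0] in string.punctuation)):
        result += " "
    return result
```

For example, `smart_insert("Hello world.", "how are you")` yields `" How are you "`, while `smart_insert("I think", "That works", " today")` yields `" that works"`, because the text continues a sentence and a space already follows the cursor.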

Output

The finished text is displayed in the app, saved to your transcription history, and can be copied to your clipboard. When using the VivaDicta keyboard, text is inserted directly into the active text field.

Both the raw transcription and the AI-processed version are saved as separate variations, so you can always go back to the original.