The models shown are defaults, not hardcoded choices. Everything is configurable: STT model, LLM clean loop, dictionary rules, and text injection behavior.
Pipeline
Hold or toggle your global shortcut (default: Ctrl+Shift) and Voqify captures audio.
Audio is chunked in real time, transcribed locally, then typed directly into the active app.
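A minimal sketch of the chunking step, assuming a plain list of samples and a hypothetical fixed chunk size (the real capture path works on live audio buffers, not Python lists):

```python
from typing import Iterator, List

def chunk_samples(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks from a sample stream; the last chunk may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]
```

Each chunk would then be handed to the local STT model as it arrives, which is what keeps latency low.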
The STT output can then pass through an LLM cleanup loop plus dictionary rules to produce polished final text.
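The post-processing stages above can be sketched as a small composition. The names here (`run_pipeline`, the `clean` callable, the `rules` dict) are illustrative, not Voqify's actual API:

```python
from typing import Callable, Dict, Optional

def run_pipeline(
    raw_text: str,
    clean: Optional[Callable[[str], str]] = None,
    rules: Optional[Dict[str, str]] = None,
) -> str:
    """Post-process raw STT output: optional LLM clean pass, then dictionary rules."""
    text = raw_text
    if clean is not None:
        text = clean(text)  # stand-in for the local LLM cleanup call
    for wrong, right in (rules or {}).items():
        text = text.replace(wrong, right)
    return text
```

With no `clean` callable and no rules this is a passthrough, which matches the low-latency normal mode.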
Core capabilities
Ships with NVIDIA Parakeet TDT 0.6B v3 by default. The STT model is fully configurable and can be replaced at any time.
No telemetry. First run downloads defaults, then inference runs from local cache under your control.
Use normal mode for low latency, or enable --clean to route the transcript through a local LLM cleanup loop.
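A hedged sketch of how a flag like --clean could be wired up; `build_parser` and the help text are assumptions for illustration, not Voqify's actual CLI code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: --clean opts into the LLM cleanup loop, off by default."""
    parser = argparse.ArgumentParser(prog="voqify")
    parser.add_argument(
        "--clean",
        action="store_true",
        help="route transcripts through the local LLM cleanup loop",
    )
    return parser
```

Because the flag defaults to off, the fast STT-only path is what you get unless you ask for cleanup.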
Custom dictionary rules enforce product names, acronyms, and forced replacements before final output.
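One way such forced replacements might work, as a sketch: case-insensitive, word-boundary substitutions so product names and acronyms come out exactly as configured without mangling substrings (`apply_rules` and the sample rules are hypothetical):

```python
import re
from typing import Dict

def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Force each configured spelling; word boundaries keep substrings intact."""
    for pattern, replacement in rules.items():
        text = re.sub(
            rf"\b{re.escape(pattern)}\b", replacement, text, flags=re.IGNORECASE
        )
    return text
```

Running these rules last means they win even over the LLM cleanup pass, which is what "forced replacement" implies.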
Types into the focused window using native SendInput on Windows and pynput on macOS/Linux.
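The platform dispatch could look like this sketch; the backend names come from the description above, while the function itself is hypothetical:

```python
import sys

def pick_injection_backend(platform: str = sys.platform) -> str:
    """Select the text-injection backend for the current platform."""
    if platform.startswith("win"):
        return "sendinput"  # native SendInput on Windows
    return "pynput"  # pynput keyboard controller on macOS/Linux
```

Selecting the backend by `sys.platform` at startup keeps the typing path native on Windows while staying portable elsewhere.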
Live status icons and cursor states show recording and cleanup phases in real time.
Behavior lives in config: defaults are editable and runtime flags let you switch mode without rewriting code.
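A sketch of what such editable defaults might look like as a dataclass; every field name and value here is an assumption mirroring the description, not Voqify's real config schema:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VoqifyConfig:
    """Illustrative defaults; each field can be overridden in config or at runtime."""
    stt_model: str = "nvidia/parakeet-tdt-0.6b-v3"  # assumed model identifier
    clean: bool = False            # normal mode by default
    shortcut: str = "ctrl+shift"   # global record shortcut
    dictionary: Dict[str, str] = field(default_factory=dict)
```

Overriding a single field, e.g. `VoqifyConfig(clean=True)`, is the "switch mode without rewriting code" idea in miniature.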
Use standard STT for speed, then switch to clean loop when you need publication-ready text quality.
Quick start
The distribution target is a single root archive, voqify.zip: extract it, run it, then switch between normal and clean loop mode. If the archive is not available here yet, this describes the intended final workflow.
Miniconda/Conda

"Voqify is built for teams that want transparent local speech-to-text. Models are defaults and fully configurable, the dictionary is first-class, and you can switch between normal STT and a clean LLM loop."
Normal mode for speed, clean loop mode for quality, dictionary for precise output.