Local speech stack

Local speech-to-text.
Clean output, full control.

The models shown are defaults, not hardcoded choices. Everything is configurable: STT model, LLM clean loop, dictionary rules, and text injection behavior.

VOQIFY
Streaming STT
Ctrl+Shift
Parakeet TDT 0.6B v3 by default
Dictionary + optional LLM clean loop
Windows
macOS
Linux
PYTHON 3.11+
MIT LICENSE

Pipeline

One fast mode, one ultra-clean mode.
Choose based on context.

1. Trigger your hotkey

Hold or toggle your global shortcut (default: Ctrl+Shift) and Voqify starts capturing audio.

2. Standard mode: direct STT

Audio is chunked in real time, transcribed locally, then typed directly into the active app.

3. Clean mode: LLM loop

The STT output can then pass through an LLM loop plus dictionary rules to produce polished final text.
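The two modes can be sketched as a single function that differs only in whether an LLM step runs. This is a minimal sketch, not Voqify's code: `transcribe`, `apply_dictionary`, `llm_clean`, and `type_text` are hypothetical callables you would supply, fixed-size chunking stands in for whatever segmentation Voqify actually uses, and the text is typed once at the end rather than streamed chunk by chunk.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


def chunk_samples(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size chunks; the final chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def run_pipeline(
    chunks: Iterable[Sequence[float]],
    transcribe: Callable[[Sequence[float]], str],
    apply_dictionary: Callable[[str], str],
    type_text: Callable[[str], None],
    llm_clean: Optional[Callable[[str], str]] = None,
) -> str:
    """Standard mode when llm_clean is None, clean mode otherwise (hypothetical helpers)."""
    # Transcribe every chunk and drop empty results.
    pieces: List[str] = [transcribe(chunk) for chunk in chunks]
    text = " ".join(piece for piece in pieces if piece)
    # Clean mode only: hand the raw transcript to the LLM loop.
    if llm_clean is not None:
        text = llm_clean(text)
    # Dictionary rules run last so the LLM cannot undo forced replacements.
    text = apply_dictionary(text)
    type_text(text)
    return text
```

Running dictionary rules after the LLM step is an assumption of this sketch; it keeps forced replacements from being rewritten by the model.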

Core capabilities

Configurable by design, stable in real-world use.

Default STT model stack

Ships with NVIDIA Parakeet TDT 0.6B v3 by default. The STT model is fully configurable and can be swapped out at any time.

Local by default

No telemetry. The first run downloads the default models; after that, inference runs from a local cache under your control.
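The "download once, then run from cache" behavior can be pictured as a cache-first lookup. A minimal sketch, assuming a hypothetical `download` callable and a one-entry-per-model cache layout; Voqify's actual cache location and download mechanism are not described here.

```python
from pathlib import Path
from typing import Callable


def resolve_model(cache_dir: Path, name: str, download: Callable[[str, Path], None]) -> Path:
    """Return the local path for `name`, downloading only if it is not cached yet."""
    target = cache_dir / name
    if not target.exists():
        # First run only: create the cache directory and fetch the model once.
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(name, target)
    # Every later run resolves straight to the local copy; no network access.
    return target
```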

Normal STT or clean loop

Use normal mode for low latency, or enable --clean to route the transcript through a local LLM cleanup loop.

Dictionary control

Custom dictionary rules enforce product names, acronyms, and forced replacements before final output.
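A minimal sketch of dictionary enforcement, assuming rules are stored as a plain mapping from a spoken or misrecognized form to its canonical spelling. The `apply_dictionary` function and the rule format are hypothetical; Voqify's dictionary may support more rule types.

```python
import re
from typing import Dict


def apply_dictionary(text: str, rules: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each rule key with its value."""
    # Longer keys first, so a multi-word entry wins over a shorter key it contains.
    for source in sorted(rules, key=len, reverse=True):
        replacement = rules[source]
        # Lookarounds instead of \b so keys like "C++" still match as whole tokens.
        pattern = re.compile(r"(?<!\w)" + re.escape(source) + r"(?!\w)", re.IGNORECASE)
        # A function replacement avoids backslash escapes in the target string.
        text = pattern.sub(lambda _match: replacement, text)
    return text
```

For example, with rules `{"voquify": "Voqify", "parakeet tdt": "Parakeet TDT", "stt": "STT"}`, the input "voquify uses parakeet tdt for stt" becomes "Voqify uses Parakeet TDT for STT". Rules apply one after another, so a later key can still match text produced by an earlier replacement.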

Direct text injection

Types into the focused window using native SendInput on Windows and pynput on macOS/Linux.
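Choosing a typing backend per operating system might look like the following. `pynput.keyboard.Controller.type()` is a real pynput call; the `injection_backend` and `type_text` names are hypothetical, and the Windows SendInput path is left as a stub because a full ctypes binding would not fit in a short sketch.

```python
import platform
from typing import Optional


def injection_backend(system: Optional[str] = None) -> str:
    """Map an OS name (as reported by platform.system()) to a typing backend."""
    name = system if system is not None else platform.system()
    if name == "Windows":
        return "sendinput"
    if name in ("Darwin", "Linux"):  # platform.system() reports macOS as "Darwin"
        return "pynput"
    raise RuntimeError(f"unsupported platform: {name}")


def type_text(text: str) -> None:
    """Type `text` into whichever window currently has keyboard focus."""
    if injection_backend() == "pynput":
        # Imported lazily so this module still loads where pynput is not installed.
        from pynput.keyboard import Controller
        Controller().type(text)
    else:
        # A ctypes binding to the Win32 SendInput API would go here; omitted in this sketch.
        raise NotImplementedError("SendInput binding not included in this sketch")
```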

Tray + cursor feedback

Live status icons and cursor states show recording and cleanup phases in real time.
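The recording and cleanup phases map onto a small state machine, which also captures the hold versus toggle hotkey behavior. This is a sketch with a hypothetical `StatusTracker` class; in a real app, the tray icon and cursor would redraw whenever `status` changes.

```python
from enum import Enum


class Status(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    CLEANING = "cleaning"


class StatusTracker:
    """Hypothetical status source that a tray icon or cursor overlay could read."""

    def __init__(self, mode: str = "hold", clean: bool = False) -> None:
        if mode not in ("hold", "toggle"):
            raise ValueError("mode must be 'hold' or 'toggle'")
        self.mode = mode
        self.clean = clean
        self.status = Status.IDLE

    def _stop_recording(self) -> None:
        # In clean mode the transcript still has to go through the LLM loop.
        self.status = Status.CLEANING if self.clean else Status.IDLE

    def on_press(self) -> None:
        if self.status is Status.IDLE:
            self.status = Status.RECORDING
        elif self.mode == "toggle" and self.status is Status.RECORDING:
            self._stop_recording()

    def on_release(self) -> None:
        # Only hold mode stops recording when the shortcut is released.
        if self.mode == "hold" and self.status is Status.RECORDING:
            self._stop_recording()

    def on_cleanup_done(self) -> None:
        if self.status is Status.CLEANING:
            self.status = Status.IDLE
```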

Config-first workflow

Behavior lives in config: defaults are editable, and runtime flags let you switch modes without editing code.
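One way to layer editable defaults, a config file, and runtime flags is a simple precedence merge. The `--live` and `--clean` flags come from the quick start; the `DEFAULTS` keys, the model placeholder, and the `load_settings` helper are hypothetical, and reading the config file itself is left out.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical defaults; Voqify's real config keys and file format may differ.
DEFAULTS: Dict[str, Any] = {
    "stt_model": "parakeet-tdt-0.6b-v3",  # placeholder identifier
    "hotkey": "ctrl+shift",
    "live": False,
    "clean": False,
}


def load_settings(argv: Optional[List[str]] = None, config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge with precedence: defaults < config file values < command-line flags."""
    settings = dict(DEFAULTS)
    settings.update(config or {})
    parser = argparse.ArgumentParser(prog="voqify.py")
    parser.add_argument("--live", action="store_true")
    parser.add_argument("--clean", action="store_true")
    args = parser.parse_args(argv)
    # store_true flags only override when actually passed, so config values survive otherwise.
    if args.live:
        settings["live"] = True
    if args.clean:
        settings["clean"] = True
    return settings
```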

Fast mode switching

Use standard STT for speed, then switch to the clean loop when you need publication-ready text.

Quick start

Run locally from voqify.zip.

The distribution target is an archive named voqify.zip at the project root. Extract it, run it, then switch between normal and clean loop mode as needed. If the archive has not been published yet, these steps describe the intended workflow.

Requires Miniconda/Conda and Python 3.11+ (PowerShell commands shown)
# 1) Extract the release archive at the project root
> Expand-Archive .\voqify.zip -DestinationPath .\voqify -Force
# 2) Enter the extracted folder
> Set-Location .\voqify
# 3) Launch normal mode (direct STT)
> python voqify.py --live
# 4) Optional: launch clean loop mode (STT + LLM) instead
> python voqify.py --live --clean
OK Ready - hold [Ctrl + Shift] to record
Why this stack exists

"Voqify is built for teams that want transparent local speech-to-text. Models are defaults and fully configurable, the dictionary is first-class, and you can switch between normal STT and a clean LLM loop."

Ready to deploy local speech-to-text?

Normal mode for speed, clean loop mode for quality, dictionary for precise output.