The models shown are defaults, not hardcoded choices. Everything is configurable: STT model, LLM clean loop, dictionary rules, and text injection behavior.
Pipeline
Hold or toggle your global shortcut (default: Ctrl+Shift) and Voqify captures audio.
Audio is chunked in real time, transcribed locally, then typed directly into the active app.
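A minimal sketch of the chunking step, assuming a plain list of samples and a hypothetical fixed chunk size (the real capture path works on live audio buffers, not Python lists):

```python
from typing import Iterator, List

def chunk_samples(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks from a sample stream; the last chunk may be shorter."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]
```

Each chunk would then be handed to the local STT model as it arrives, which is what keeps latency low.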
The STT output can then pass through an LLM cleanup loop plus dictionary rules to produce polished final text.
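The post-processing stages above can be sketched as a small composition. The names here (`run_pipeline`, the `clean` callable, the `rules` dict) are illustrative, not Voqify's actual API:

```python
from typing import Callable, Dict, Optional

def run_pipeline(
    raw_text: str,
    clean: Optional[Callable[[str], str]] = None,
    rules: Optional[Dict[str, str]] = None,
) -> str:
    """Post-process raw STT output: optional LLM clean pass, then dictionary rules."""
    text = raw_text
    if clean is not None:
        text = clean(text)  # stand-in for the local LLM cleanup call
    for wrong, right in (rules or {}).items():
        text = text.replace(wrong, right)
    return text
```

With no `clean` callable and no rules this is a passthrough, which matches the low-latency normal mode.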
Core capabilities
Ships with NVIDIA Parakeet TDT 0.6B v3 by default. The STT model is fully configurable and can be replaced at any time.
No telemetry. First run downloads defaults, then inference runs from local cache under your control.
Use normal mode for low latency, or enable --clean to route the transcript through a local LLM cleanup loop.
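A hedged sketch of how a flag like --clean could be wired up; `build_parser` and the help text are assumptions for illustration, not Voqify's actual CLI code:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: --clean opts into the LLM cleanup loop, off by default."""
    parser = argparse.ArgumentParser(prog="voqify")
    parser.add_argument(
        "--clean",
        action="store_true",
        help="route transcripts through the local LLM cleanup loop",
    )
    return parser
```

Because the flag defaults to off, the fast STT-only path is what you get unless you ask for cleanup.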
Custom dictionary rules enforce product names, acronyms, and forced replacements before final output.
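One way such forced replacements might work, as a sketch: case-insensitive, word-boundary substitutions so product names and acronyms come out exactly as configured without mangling substrings (`apply_rules` and the sample rules are hypothetical):

```python
import re
from typing import Dict

def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Force each configured spelling; word boundaries keep substrings intact."""
    for pattern, replacement in rules.items():
        text = re.sub(
            rf"\b{re.escape(pattern)}\b", replacement, text, flags=re.IGNORECASE
        )
    return text
```

Running these rules last means they win even over the LLM cleanup pass, which is what "forced replacement" implies.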
Types into the focused window using native SendInput on Windows and pynput on macOS/Linux.
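The platform dispatch could look like this sketch; the backend names come from the description above, while the function itself is hypothetical:

```python
import sys

def pick_injection_backend(platform: str = sys.platform) -> str:
    """Select the text-injection backend for the current platform."""
    if platform.startswith("win"):
        return "sendinput"  # native SendInput on Windows
    return "pynput"  # pynput keyboard controller on macOS/Linux
```

Selecting the backend by `sys.platform` at startup keeps the typing path native on Windows while staying portable elsewhere.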
Live status icons and cursor states show recording and cleanup phases in real time.
Behavior lives in config: defaults are editable and runtime flags let you switch mode without rewriting code.
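A sketch of what such editable defaults might look like as a dataclass; every field name and value here is an assumption mirroring the description, not Voqify's real config schema:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class VoqifyConfig:
    """Illustrative defaults; each field can be overridden in config or at runtime."""
    stt_model: str = "nvidia/parakeet-tdt-0.6b-v3"  # assumed model identifier
    clean: bool = False            # normal mode by default
    shortcut: str = "ctrl+shift"   # global record shortcut
    dictionary: Dict[str, str] = field(default_factory=dict)
```

Overriding a single field, e.g. `VoqifyConfig(clean=True)`, is the "switch mode without rewriting code" idea in miniature.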
Use standard STT for speed, then switch to clean loop when you need publication-ready text quality.
Quick start
The distribution target is a single root archive, voqify.zip: extract it, run it, then switch between normal and clean loop mode. If the archive is not available here yet, this describes the intended final workflow.
Miniconda/Conda

"Voqify is built for teams that want transparent local speech-to-text. Models are defaults and fully configurable, the dictionary is first-class, and you can switch between normal STT and a clean LLM loop."
Normal mode for speed, clean loop mode for quality, dictionary for precise output.