
title: Technical Architecture – Speakr
version: 2025-07-20
status: Draft

Speakr – Technical Architecture

1. Purpose

Speakr is a privacy-first hot-key dictation utility for macOS (with Windows/Linux on the roadmap). When the user presses a global shortcut, it records a short audio segment, transcribes it with an on-device Whisper model, and synthesises keystrokes to type the transcript into the currently focused application, typically within a few seconds.


2. High-Level Architecture

flowchart TB
    subgraph Tauri Shell
        direction TB
        GlobalShortcut["Global Shortcut<br/><i>tauri-plugin-global-shortcut</i>"]
        IPC["IPC Bridge<br/><i>tauri invoke / emit</i>"]
        Tray["System Tray / UI<br/><i>Leptos + WASM</i>"]
    end

    subgraph Core Library
        direction TB
        Recorder["Audio Recorder<br/><i>cpal</i>"]
        STT["Speech-to-Text<br/><i>whisper-rs</i>"]
        Injector["Text Injector<br/><i>enigo</i>"]
    end

    GlobalShortcut -- "hot-key pressed" --> Recorder
    Recorder -- "PCM samples" --> STT
    STT -- "transcript" --> Injector
    Injector -- "keystrokes" --> FocusApp(["Focused Application"])

    %% UI flow
    Recorder -- "status events" --- IPC
    STT ---- IPC
    Injector --- IPC
    IPC ==> Tray

Key points:

  1. All heavy-weight logic lives in pure Rust (speakr-core). The UI may be hidden without affecting functionality.
  2. No network access – Whisper runs entirely on-device.
  3. Plugin isolation – Optional features (auto-start, clipboard, etc.) are added via Tauri plugins with explicit capability JSON.

3. Crate & Directory Layout

| Layer | Crate / Path | Main Responsibilities |
|---|---|---|
| Core | speakr-core/ | Record audio (cpal) ➜ transcribe (whisper-rs) ➜ inject text (enigo) |
| Backend | speakr-tauri/ | Registers global hot-key, exposes #[tauri::command] wrappers, persists settings |
| Frontend | speakr-ui/ (optional) | Leptos WASM UI for tray, preferences, status overlay |
| Assets | models/ | GGUF Whisper models downloaded post-install |

All crates live in a single Cargo workspace to guarantee compatible dependency versions.

3.1 Speakr-Tauri Internal Structure

The speakr-tauri backend is organised into focused modules for maintainability and testability:

speakr-tauri/src/
├── commands/           # Tauri command implementations
│   ├── mod.rs         # Command organisation and documentation
│   ├── validation.rs  # Input validation (hotkey format, etc.)
│   ├── system.rs      # System integration (model availability, auto-launch)
│   └── legacy.rs      # Backward compatibility commands
├── services/          # Background services and state management
│   ├── mod.rs         # Service coordination
│   ├── hotkey.rs      # Global hotkey registration and management
│   ├── status.rs      # Backend service status tracking
│   └── types.rs       # Shared service types and enums
├── settings/          # Configuration persistence and validation
│   ├── mod.rs         # Settings management
│   ├── persistence.rs # File I/O for settings
│   ├── migration.rs   # Settings schema migration
│   └── validation.rs  # Settings validation logic
├── debug/             # Debug-only functionality
│   ├── mod.rs         # Debug command coordination
│   ├── commands.rs    # Debug-specific Tauri commands
│   ├── storage.rs     # Debug log storage
│   └── types.rs       # Debug-specific types
├── audio/             # Audio handling utilities
│   ├── mod.rs         # Audio module coordination
│   ├── files.rs       # Audio file operations
│   └── recording.rs   # Audio recording helpers
└── lib.rs             # Tauri app setup, command registration

Key architectural principles:

  • Separation of concerns: Business logic in *_internal() functions, Tauri integration in lib.rs
  • Testability: Internal functions can be tested without Tauri runtime overhead
  • Modularity: Commands grouped by functional domain rather than technical implementation
  • Documentation: Each module has comprehensive rustdoc explaining its purpose and usage
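
The separation between `*_internal()` functions and thin Tauri wrappers can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `validate_hot_key_internal`, the accepted `Modifier+...+Key` format, and the modifier list are all assumptions made for the example.

```rust
/// Pure validation logic with no Tauri types, so it can be unit-tested directly.
/// The accepted format ("Modifier+...+Key") is an assumption for illustration.
pub fn validate_hot_key_internal(hot_key: &str) -> Result<(), String> {
    // Hypothetical list of accepted modifier names.
    const MODIFIERS: [&str; 5] = ["CmdOrCtrl", "Cmd", "Ctrl", "Alt", "Shift"];

    let parts: Vec<&str> = hot_key.split('+').map(str::trim).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err(format!("empty segment in hot-key '{hot_key}'"));
    }

    // The last segment is the key; everything before it must be a modifier.
    let (key, modifiers) = parts
        .split_last()
        .expect("str::split always yields at least one item");
    if modifiers.is_empty() {
        return Err("hot-key needs at least one modifier".to_string());
    }
    if let Some(bad) = modifiers.iter().find(|m| !MODIFIERS.contains(*m)) {
        return Err(format!("unknown modifier '{bad}'"));
    }
    if MODIFIERS.contains(key) {
        return Err("hot-key must end with a non-modifier key".to_string());
    }
    Ok(())
}

// In lib.rs the Tauri-facing wrapper would then stay a one-liner, e.g.:
//
// #[tauri::command]
// async fn validate_hot_key(hot_key: String) -> Result<(), String> {
//     validate_hot_key_internal(&hot_key)
// }
```

Because the internal function takes and returns only standard types, an ordinary `#[test]` can call it without starting a Tauri runtime.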

4. Runtime Flow (Happy Path)

| Step | Thread/Task | Action | Typical Latency |
|---|---|---|---|
| 1 | Main (OS) | User presses ⌘⌥Space | |
| 2 | Tauri shortcut handler | Spawns async task transcribe() | < 1 ms |
| 3 | Tokio worker | cpal::Stream captures 16-kHz mono PCM into ring-buffer | 0–10 s (configurable) |
| 4 | Same task | PCM fed into whisper_rs::full() | ~1 s per 10 s audio on M-series |
| 5 | Same task | Transcript returned → enigo.text() synthesises keystrokes | ≤ 300 ms |
| 6 | UI task | Frontend receives status events via emit() and updates overlay | realtime |

Failure cases (no mic, model missing, permission denied) surface via error events and native notifications.
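
The record ➜ transcribe ➜ inject flow, together with its status events and failure cases, can be sketched with three traits and one driver function. Everything named here (`SpeakrError`, `Status`, the traits, `run_pipeline`, and the `Fake*` test doubles) is hypothetical and exists only for illustration; the real crate wires these steps to cpal, whisper-rs and enigo.

```rust
use std::fmt;

/// The failure cases named in the text, as one error type (illustrative names).
#[derive(Debug, Clone, PartialEq)]
pub enum SpeakrError { NoMicrophone, ModelMissing, PermissionDenied }

impl fmt::Display for SpeakrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            SpeakrError::NoMicrophone => "no microphone available",
            SpeakrError::ModelMissing => "model file missing",
            SpeakrError::PermissionDenied => "permission denied",
        })
    }
}

/// Status events the UI could receive (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
pub enum Status { Recording, Transcribing, Injecting, Done, Failed(String) }

pub trait Recorder { fn record(&mut self) -> Result<Vec<f32>, SpeakrError>; }
pub trait Transcriber { fn transcribe(&mut self, pcm: &[f32]) -> Result<String, SpeakrError>; }
pub trait Injector { fn inject(&mut self, text: &str) -> Result<(), SpeakrError>; }

/// Runs steps 3 to 5 of the table in order, stopping at the first error.
fn steps(
    recorder: &mut dyn Recorder,
    stt: &mut dyn Transcriber,
    injector: &mut dyn Injector,
    emit: &mut dyn FnMut(Status),
) -> Result<String, SpeakrError> {
    emit(Status::Recording);
    let pcm = recorder.record()?;
    emit(Status::Transcribing);
    let text = stt.transcribe(&pcm)?;
    emit(Status::Injecting);
    injector.inject(&text)?;
    Ok(text)
}

/// Drives the pipeline and always finishes with a `Done` or `Failed` event.
pub fn run_pipeline(
    recorder: &mut dyn Recorder,
    stt: &mut dyn Transcriber,
    injector: &mut dyn Injector,
    emit: &mut dyn FnMut(Status),
) -> Result<String, SpeakrError> {
    let outcome = steps(recorder, stt, injector, emit);
    match &outcome {
        Ok(_) => emit(Status::Done),
        Err(e) => emit(Status::Failed(e.to_string())),
    }
    outcome
}

// Test doubles standing in for cpal, whisper-rs and enigo.
pub struct FakeRecorder(pub Result<Vec<f32>, SpeakrError>);
impl Recorder for FakeRecorder {
    fn record(&mut self) -> Result<Vec<f32>, SpeakrError> { self.0.clone() }
}

pub struct FakeStt;
impl Transcriber for FakeStt {
    fn transcribe(&mut self, pcm: &[f32]) -> Result<String, SpeakrError> {
        Ok(format!("{} samples", pcm.len()))
    }
}

#[derive(Default)]
pub struct FakeInjector { pub typed: String }
impl Injector for FakeInjector {
    fn inject(&mut self, text: &str) -> Result<(), SpeakrError> {
        self.typed.push_str(text);
        Ok(())
    }
}
```

Passing the event sink as a closure keeps the core free of Tauri types; the backend could forward each `Status` through `emit()` while tests simply collect them into a `Vec`.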


5. Concurrency & Safety

  • Tokio multi-thread runtime drives asynchronous recording and Whisper inference.
  • The AppState(Mutex<Option<Speakr>>) guards the singleton Whisper context; loading occurs once at app start.
  • Hot-key handler offloads work to the runtime to keep the UI thread non-blocking.
  • Audio buffer uses a bounded sync_channel to avoid unbounded RAM growth.
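
A minimal sketch of the last two ideas, using only the standard library. The `Speakr` struct here is a stand-in for the real engine, and the helper names (`with_engine`, `push_chunk`, `audio_channel`) are hypothetical. Dropping chunks when the channel is full is one possible policy, assumed here so the audio callback never blocks; the document does not state which policy the project uses.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Mutex;

/// Stand-in for the loaded Whisper context (hypothetical fields).
pub struct Speakr {
    pub model_name: String,
}

/// Mirrors the `AppState(Mutex<Option<Speakr>>)` shape from the text.
pub struct AppState(pub Mutex<Option<Speakr>>);

impl AppState {
    /// State before the model has been loaded.
    pub fn empty() -> Self {
        AppState(Mutex::new(None))
    }

    /// Store the engine once at start-up.
    pub fn load(&self, engine: Speakr) {
        if let Ok(mut guard) = self.0.lock() {
            *guard = Some(engine);
        }
    }

    /// Run `f` with exclusive access to the engine, or report why that is impossible.
    pub fn with_engine<T>(&self, f: impl FnOnce(&mut Speakr) -> T) -> Result<T, String> {
        let mut guard = self
            .0
            .lock()
            .map_err(|_| "state mutex poisoned".to_string())?;
        let result = match guard.as_mut() {
            Some(engine) => Ok(f(engine)),
            None => Err("model not loaded".to_string()),
        };
        result
    }
}

/// Create a channel that holds at most `max_chunks` pending audio chunks,
/// which caps memory use at roughly `max_chunks * chunk_len` samples.
pub fn audio_channel(max_chunks: usize) -> (SyncSender<Vec<f32>>, Receiver<Vec<f32>>) {
    sync_channel(max_chunks)
}

/// Audio-callback side: never blocks. Returns `false` when the chunk was
/// dropped because the channel is full or the receiver has gone away.
pub fn push_chunk(tx: &SyncSender<Vec<f32>>, chunk: Vec<f32>) -> bool {
    tx.try_send(chunk).is_ok()
}
```

Using `try_send` rather than `send` matters on the producer side: a blocking `send` inside a real-time audio callback could stall capture, whereas dropping a chunk only loses a few milliseconds of audio.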

6. Security & Permissions

| Platform | Permission | Why | Request Mechanism |
|---|---|---|---|
| macOS | Microphone access | Record audio | NSMicrophoneUsageDescription (Info.plist) |
| macOS | Accessibility | Send synthetic keystrokes | User enables app in System Settings ▸ Accessibility |
| All | Global shortcut | Register hot-key | global-shortcut:allow-register capability |

The app runs offline; no data leaves the device.


7. Build & Packaging

  1. Dev: trunk serve & (frontend) + cargo tauri dev (backend)
  2. Release: trunk build --release, then cargo tauri build
  3. macOS notarisation: xcrun notarytool submit --wait after codesign.
  4. Universal binary size ≈ 15 MB (+ model).

8. Extensibility Points

  • Voice Activity Detection: plug in webrtc-vad before Whisper to auto-stop on silence.
  • Streaming transcripts: call whisper_rs::full_partial() and enqueue keystrokes incrementally.
  • Multi-language: call params.set_language(None) to enable language auto-detection.
  • Cross-platform: replace enigo backend with send_input (Win) or xdo (X11) while keeping public API.
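
To illustrate where auto-stop on silence would sit in the pipeline, here is a small sketch that uses a plain RMS energy threshold. This is deliberately not webrtc-vad: it is a much simpler stand-in technique, and the type `SilenceDetector`, its parameters, and the threshold values in the usage note are assumptions chosen for the example.

```rust
/// Root-mean-square level of one frame of samples in the range [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Tracks consecutive quiet frames and signals when recording should stop.
pub struct SilenceDetector {
    threshold: f32,
    max_quiet_frames: usize,
    quiet_frames: usize,
    heard_speech: bool,
}

impl SilenceDetector {
    /// `threshold` is the RMS level below which a frame counts as quiet;
    /// `max_quiet_frames` is how many quiet frames in a row end the recording.
    pub fn new(threshold: f32, max_quiet_frames: usize) -> Self {
        SilenceDetector {
            threshold,
            max_quiet_frames,
            quiet_frames: 0,
            heard_speech: false,
        }
    }

    /// Feed one frame. Returns `true` once speech has been heard and then
    /// `max_quiet_frames` consecutive frames fall below the threshold.
    /// Silence before the first loud frame is ignored.
    pub fn should_stop(&mut self, frame: &[f32]) -> bool {
        if rms(frame) >= self.threshold {
            self.heard_speech = true;
            self.quiet_frames = 0;
        } else if self.heard_speech {
            self.quiet_frames += 1;
        }
        self.heard_speech && self.quiet_frames >= self.max_quiet_frames
    }
}
```

With 16-kHz audio, a 320-sample frame covers 20 ms, so `SilenceDetector::new(0.02, 50)` would stop after roughly one second of quiet following speech. A real voice-activity detector would replace the `rms(frame) >= threshold` test while the surrounding counting logic could stay the same.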

9. Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Keystroke injection blocked in secure fields | Fallback to clipboard-paste mode with warning |
| Whisper latency on older CPUs | Offer tiny.en.gguf and shorter max record time |
| Shortcut clashes | UI lets user redefine hot-key and validates uniqueness |
| Model file missing/corrupt | Verify checksum on load and show error dialogue |
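
The first mitigation, falling back to clipboard paste with a warning, could look roughly like the sketch below. It assumes the keystroke backend actually reports a failure when injection is blocked, which may not hold on every platform. `TextSink`, `InjectOutcome`, `inject_with_fallback` and `ScriptedSink` are hypothetical names used only for this example.

```rust
/// Result of an injection attempt (illustrative type).
#[derive(Debug, PartialEq)]
pub enum InjectOutcome {
    /// Keystrokes were delivered normally.
    Typed,
    /// Keystrokes failed; the text was pasted instead. Carries the warning to show.
    PastedWithWarning(String),
}

/// Anything that can deliver text to the focused application.
pub trait TextSink {
    fn send(&mut self, text: &str) -> Result<(), String>;
}

/// Try synthetic keystrokes first; on failure, fall back to the clipboard sink.
pub fn inject_with_fallback(
    keystrokes: &mut dyn TextSink,
    clipboard: &mut dyn TextSink,
    text: &str,
) -> Result<InjectOutcome, String> {
    match keystrokes.send(text) {
        Ok(()) => Ok(InjectOutcome::Typed),
        Err(reason) => {
            // Both paths failed: surface a combined error to the caller.
            clipboard.send(text).map_err(|e| {
                format!("keystrokes failed ({reason}); clipboard failed ({e})")
            })?;
            Ok(InjectOutcome::PastedWithWarning(format!(
                "typed input was blocked ({reason}); pasted from clipboard instead"
            )))
        }
    }
}

/// Test double: fails with `fail_with` if set, otherwise records the text.
#[derive(Default)]
pub struct ScriptedSink {
    pub fail_with: Option<String>,
    pub received: Vec<String>,
}

impl TextSink for ScriptedSink {
    fn send(&mut self, text: &str) -> Result<(), String> {
        match &self.fail_with {
            Some(msg) => Err(msg.clone()),
            None => {
                self.received.push(text.to_string());
                Ok(())
            }
        }
    }
}
```

Returning the warning as data rather than showing it directly keeps the decision about notifications in the backend layer, consistent with the core library staying UI-free.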

10. Future Roadmap

  1. Settings sync via tauri-plugin-store (JSON in AppData).
  2. Auto-start on login (tauri-plugin-autostart).
  3. GPU inference when Whisper Metal backend stabilises.
  4. Installer bundles (DMG/MSI/DEB) with model downloader.

This document replaces the previous placeholder docs/ARCHITECTURE.md and should be kept up-to-date with all architectural changes.