title: Technical Architecture – Speakr
version: 2025-07-20
status: Draft

Speakr – Technical Architecture

1. Purpose
2. High-Level Architecture
3. Crate & Directory Layout
4. Runtime Flow (Happy Path)
5. Concurrency & Safety
6. Security & Permissions
7. Build & Packaging
8. Extensibility Points
9. Risks & Mitigations
10. Future Roadmap

1. Purpose

Speakr is a privacy-first hot-key dictation utility for macOS (with Windows/Linux on the roadmap). When the user presses a global shortcut, it records a short audio segment, runs an on-device Whisper model, and synthesises keystrokes to type the transcript into the currently focused application, typically within a few seconds of the recording ending.

2. High-Level Architecture

flowchart TB
    subgraph Tauri Shell
        direction TB
        GlobalShortcut["Global Shortcut<br/><i>tauri-plugin-global-shortcut</i>"]
        IPC["IPC Bridge<br/><i>tauri invoke / emit</i>"]
        Tray["System Tray / UI<br/><i>Leptos + WASM</i>"]
    end

    subgraph Core Library
        direction TB
        Recorder["Audio Recorder<br/><i>cpal</i>"]
        STT["Speech-to-Text<br/><i>whisper-rs</i>"]
        Injector["Text Injector<br/><i>enigo</i>"]
    end

    GlobalShortcut -- "hot-key pressed" --> Recorder
    Recorder -- "PCM samples" --> STT
    STT -- "transcript" --> Injector
    Injector -- "keystrokes" --> FocusApp(["Focused Application"])

    %% UI flow
    Recorder -- "status events" --- IPC
    STT ---- IPC
    Injector --- IPC
    IPC ==> Tray

Key points:

All heavy-weight logic lives in pure Rust (speakr-core). The UI may be hidden without affecting functionality.
No network access – Whisper runs entirely on-device.
Plugin isolation – Optional features (auto-start, clipboard, etc.) are added via Tauri plugins with explicit capability JSON.
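
The record ➜ transcribe ➜ inject chain in the diagram can be sketched as three small traits wired together by one function. This is a minimal illustration only: the trait names, `run_pipeline`, and the fake stages are hypothetical and are not taken from speakr-core, which wraps cpal, whisper-rs and enigo instead.

```rust
/// Hypothetical stage traits; names are illustrative, not speakr-core's API.
pub trait Recorder {
    /// Returns mono PCM samples (16 kHz is assumed here).
    fn record(&mut self) -> Result<Vec<f32>, String>;
}

pub trait Transcriber {
    fn transcribe(&mut self, pcm: &[f32]) -> Result<String, String>;
}

pub trait Injector {
    fn inject(&mut self, text: &str) -> Result<(), String>;
}

/// Runs record -> transcribe -> inject and returns the transcript.
/// Any stage error aborts the run and is passed back to the caller.
pub fn run_pipeline<R: Recorder, T: Transcriber, I: Injector>(
    recorder: &mut R,
    stt: &mut T,
    injector: &mut I,
) -> Result<String, String> {
    let pcm = recorder.record()?;
    let transcript = stt.transcribe(&pcm)?;
    injector.inject(&transcript)?;
    Ok(transcript)
}

/// Stand-in recorder: one second of silence at 16 kHz.
pub struct FakeRecorder;
impl Recorder for FakeRecorder {
    fn record(&mut self) -> Result<Vec<f32>, String> {
        Ok(vec![0.0; 16_000])
    }
}

/// Stand-in transcriber: reports how many samples it received.
pub struct FakeStt;
impl Transcriber for FakeStt {
    fn transcribe(&mut self, pcm: &[f32]) -> Result<String, String> {
        Ok(format!("{} samples", pcm.len()))
    }
}

/// Stand-in injector: appends to a buffer instead of sending keystrokes.
#[derive(Default)]
pub struct BufferInjector {
    pub typed: String,
}
impl Injector for BufferInjector {
    fn inject(&mut self, text: &str) -> Result<(), String> {
        self.typed.push_str(text);
        Ok(())
    }
}
```

Because the stages only meet at trait boundaries, the whole chain can be exercised without a microphone, a model file, or Accessibility permission, which is consistent with the point that the UI may be hidden without affecting functionality.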

3. Crate & Directory Layout

Layer	Crate / Path	Main Responsibilities
Core	`speakr-core/`	Record audio (cpal) ➜ transcribe (whisper-rs) ➜ inject text (enigo)
Backend	`speakr-tauri/`	Registers global hot-key, exposes `#[tauri::command]` wrappers, persists settings
Frontend	`speakr-ui/` (optional)	Leptos WASM UI for tray, preferences, status overlay
Assets	`models/`	GGUF Whisper models downloaded post-install

All crates live in a single Cargo workspace to guarantee compatible dependency versions.

3.1 Speakr-Tauri Internal Structure

The speakr-tauri backend is organised into focused modules for maintainability and testability:

speakr-tauri/src/
├── commands/           # Tauri command implementations
│   ├── mod.rs         # Command organisation and documentation
│   ├── validation.rs  # Input validation (hotkey format, etc.)
│   ├── system.rs      # System integration (model availability, auto-launch)
│   └── legacy.rs      # Backward compatibility commands
├── services/          # Background services and state management
│   ├── mod.rs         # Service coordination
│   ├── hotkey.rs      # Global hotkey registration and management
│   ├── status.rs      # Backend service status tracking
│   └── types.rs       # Shared service types and enums
├── settings/          # Configuration persistence and validation
│   ├── mod.rs         # Settings management
│   ├── persistence.rs # File I/O for settings
│   ├── migration.rs   # Settings schema migration
│   └── validation.rs  # Settings validation logic
├── debug/             # Debug-only functionality
│   ├── mod.rs         # Debug command coordination
│   ├── commands.rs    # Debug-specific Tauri commands
│   ├── storage.rs     # Debug log storage
│   └── types.rs       # Debug-specific types
├── audio/             # Audio handling utilities
│   ├── mod.rs         # Audio module coordination
│   ├── files.rs       # Audio file operations
│   └── recording.rs   # Audio recording helpers
└── lib.rs             # Tauri app setup, command registration

Key architectural principles:

Separation of concerns: Business logic in *_internal() functions, Tauri integration in lib.rs
Testability: Internal functions can be tested without Tauri runtime overhead
Modularity: Commands grouped by functional domain rather than technical implementation
Documentation: Each module has comprehensive rustdoc explaining its purpose and usage
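
The `*_internal()` convention can be illustrated with a hot-key validator that uses no Tauri types and is therefore testable as plain Rust. This is a sketch under stated assumptions: the function name, the `Modifier+Key` string format, and the list of accepted modifiers are hypothetical and may not match `commands/validation.rs`.

```rust
/// Hypothetical internal validator: plain Rust, no Tauri runtime needed.
/// Assumes hot-keys are written as `Modifier+...+Key`, e.g. `CmdOrCtrl+Alt+Space`.
pub fn validate_hotkey_internal(hotkey: &str) -> Result<(), String> {
    // Assumed modifier names; the real list may differ.
    const MODIFIERS: [&str; 5] = ["CmdOrCtrl", "Cmd", "Ctrl", "Alt", "Shift"];

    let parts: Vec<&str> = hotkey.split('+').map(str::trim).collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err("hot-key contains an empty segment".to_string());
    }

    // The last segment is the key; everything before it must be a modifier.
    let (key, modifiers) = parts
        .split_last()
        .expect("split always yields at least one segment");
    if modifiers.is_empty() {
        return Err("hot-key needs at least one modifier".to_string());
    }
    for m in modifiers {
        if !MODIFIERS.contains(m) {
            return Err(format!("unknown modifier: {m}"));
        }
    }
    if MODIFIERS.contains(key) {
        return Err("hot-key must end with a non-modifier key".to_string());
    }
    Ok(())
}

// In lib.rs the Tauri-facing wrapper would then stay a thin shim, roughly:
//
// #[tauri::command]
// async fn validate_hot_key(hot_key: String) -> Result<(), String> {
//     validate_hotkey_internal(&hot_key)
// }
```

Keeping the wrapper this thin means unit tests can call the internal function directly, with no Tauri runtime involved.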

4. Runtime Flow (Happy Path)

Step	Thread/Task	Action	Typical Latency
1	Main (OS)	User presses ⌘⌥Space	–
2	Tauri shortcut handler	Spawns async task `transcribe()`	< 1 ms
3	Tokio worker	`cpal::Stream` captures 16-kHz mono PCM into ring-buffer	0–10 s (configurable)
4	Same task	PCM fed into whisper-rs `WhisperState::full()`	~1 s per 10 s audio on M-series
5	Same task	Transcript returned → `enigo.text()` synthesises keystrokes	≤ 300 ms
6	UI task	Frontend receives status events via `emit()` and updates overlay	real time
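
As a worked example of the latency figures in the table, the delay between the end of recording and the last keystroke can be estimated from the clip length. The function and constant names are hypothetical, and the sketch simply takes the "~1 s per 10 s audio" and "≤ 300 ms" figures at face value; real numbers depend on the model and hardware.

```rust
use std::time::Duration;

/// Upper bound on injection time, taken from step 5 of the table.
const INJECT_BUDGET: Duration = Duration::from_millis(300);

/// Rough post-recording latency: inference at about one tenth of the
/// clip length (step 4), plus the injection budget (step 5).
pub fn estimate_post_record_latency(audio: Duration) -> Duration {
    let inference = audio / 10;
    inference + INJECT_BUDGET
}
```

Under these assumptions a maximum-length 10 s clip gives 1 s + 300 ms = 1.3 s, and a 5 s clip gives 800 ms.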

Failure cases (no mic, model missing, permission denied) surface via error events and native notifications.
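
One way to model these failure cases is a single error enum that carries both a user-facing message and the name of the event to emit. This is a sketch only: the type `DictationError`, its variants, and the event names are assumptions for illustration, not the project's actual error type.

```rust
use std::fmt;

/// Hypothetical error type covering the failure cases listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DictationError {
    NoMicrophone,
    ModelMissing { path: String },
    PermissionDenied { permission: &'static str },
}

impl fmt::Display for DictationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoMicrophone => write!(f, "no microphone input device found"),
            Self::ModelMissing { path } => write!(f, "Whisper model not found at {path}"),
            Self::PermissionDenied { permission } => {
                write!(f, "{permission} permission denied")
            }
        }
    }
}

impl std::error::Error for DictationError {}

impl DictationError {
    /// Assumed event name that the backend would pass to `emit()`.
    pub fn event_name(&self) -> &'static str {
        match self {
            Self::NoMicrophone => "error/no-microphone",
            Self::ModelMissing { .. } => "error/model-missing",
            Self::PermissionDenied { .. } => "error/permission-denied",
        }
    }
}
```

The `Display` text is what a native notification would show, while `event_name()` gives the frontend a stable key to match on.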

5. Concurrency & Safety

Tokio multi-thread runtime drives asynchronous recording and Whisper inference.
The AppState(Mutex<Option<Speakr>>) guards the singleton Whisper context; loading occurs once at app start.
Hot-key handler offloads work to the runtime to keep the UI thread non-blocking.
Audio buffer uses a bounded sync_channel to avoid unbounded RAM growth.
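
The two synchronisation points above can be sketched with the standard library alone. `Speakr` is a placeholder struct here, and `ensure_loaded` and `capture_with_backpressure` are hypothetical helpers; dropping a chunk when the channel is full is an assumed policy, chosen so that a real-time audio callback never blocks.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::sync::Mutex;
use std::thread;

/// Placeholder for the type that owns the loaded Whisper context.
pub struct Speakr {
    pub model_name: String,
}

/// Mirrors the `AppState(Mutex<Option<Speakr>>)` shape; `None` means not loaded.
pub struct AppState(pub Mutex<Option<Speakr>>);

impl AppState {
    pub fn new() -> Self {
        AppState(Mutex::new(None))
    }

    /// Loads the model once; returns `false` if one was already loaded.
    pub fn ensure_loaded(&self, model_name: &str) -> bool {
        let mut guard = self.0.lock().expect("state mutex poisoned");
        if guard.is_some() {
            return false;
        }
        *guard = Some(Speakr { model_name: model_name.to_string() });
        true
    }
}

/// Pushes `chunks` audio chunks through a bounded channel of size `capacity`
/// and returns `(received, dropped)`. A full channel drops the chunk rather
/// than blocking the producer, so memory use stays bounded.
pub fn capture_with_backpressure(chunks: usize, capacity: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<Vec<f32>>(capacity);

    let producer = thread::spawn(move || {
        let mut dropped = 0usize;
        for _ in 0..chunks {
            match tx.try_send(vec![0.0; 160]) {
                Ok(()) => {}
                Err(TrySendError::Full(_)) => dropped += 1,
                Err(TrySendError::Disconnected(_)) => break,
            }
        }
        dropped // `tx` is dropped here, which ends the receiver's iterator
    });

    let received = rx.iter().count();
    let dropped = producer.join().expect("producer thread panicked");
    (received, dropped)
}
```

At most `capacity` chunks are ever queued, so a stalled consumer costs dropped audio rather than unbounded RAM growth.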

6. Security & Permissions

Platform	Permission	Why	Request Mechanism
macOS	Microphone access	Record audio	`NSMicrophoneUsageDescription` (Info.plist)
macOS	Accessibility	Send synthetic keystrokes	User enables app in System Settings ▸ Accessibility
All	Global shortcut	Register hot-key	`global-shortcut:allow-register` capability

The app runs offline; no data leaves the device.

7. Build & Packaging

Dev: `trunk serve &` (frontend), then `cargo tauri dev` (backend)
Release: `trunk build --release` ➜ `cargo tauri build`
macOS notarisation: `xcrun notarytool submit --wait` after `codesign`.
Universal binary size ≈ 15 MB (+ model).

8. Extensibility Points

Voice Activity Detection: plug-in webrtc-vad before Whisper to auto-stop on silence.
Streaming transcripts: call whisper_rs::full_partial() and enqueue keystrokes incrementally.
Multi-language: set params.set_language(None) for auto-detect.
Cross-platform: replace enigo backend with send_input (Win) or xdo (X11) while keeping public API.
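
To illustrate the auto-stop-on-silence idea, the sketch below uses a plain RMS energy threshold. This is a deliberately simpler stand-in technique, not webrtc-vad; the function names, the threshold, and the frame layout are all assumptions chosen for illustration.

```rust
/// Root-mean-square energy of one frame of PCM samples in [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Returns the index of the frame that completes `max_silent_frames`
/// consecutive quiet frames (the point at which recording would stop),
/// or `None` if the audio never stays quiet for that long.
pub fn auto_stop_frame(
    frames: &[Vec<f32>],
    threshold: f32,
    max_silent_frames: usize,
) -> Option<usize> {
    let mut silent_run = 0usize;
    for (i, frame) in frames.iter().enumerate() {
        if rms(frame) < threshold {
            silent_run += 1;
            if silent_run >= max_silent_frames {
                return Some(i);
            }
        } else {
            // Any loud frame resets the silence counter.
            silent_run = 0;
        }
    }
    None
}
```

A dedicated VAD would replace the `rms(frame) < threshold` test with its own speech/non-speech decision, while the consecutive-frame counting could stay the same.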

9. Risks & Mitigations

Risk	Mitigation
Keystroke injection blocked in secure fields	Fallback to clipboard-paste mode with warning
Whisper latency on older CPUs	Offer `tiny.en.gguf` and shorter max record time
Shortcut clashes	UI lets user redefine hot-key and validates uniqueness
Model file missing/corrupt	Verify checksum on load and show error dialogue
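
The "verify checksum on load" mitigation can be sketched as follows. To stay dependency-free the example uses the non-cryptographic FNV-1a hash as a stand-in; a real build would more likely compare a SHA-256 digest from a hashing crate. `ModelCheck` and `verify_model` are hypothetical names.

```rust
/// FNV-1a 64-bit hash. A std-only stand-in for a cryptographic digest;
/// it detects accidental corruption but not deliberate tampering.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(OFFSET_BASIS, |hash, &b| (hash ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Outcome of checking a model file's bytes before handing them to Whisper.
#[derive(Debug, PartialEq, Eq)]
pub enum ModelCheck {
    Ok,
    Empty,
    ChecksumMismatch { expected: u64, actual: u64 },
}

/// Compares the hash of `bytes` with the expected value shipped alongside
/// the model (how that value is distributed is outside this sketch).
pub fn verify_model(bytes: &[u8], expected: u64) -> ModelCheck {
    if bytes.is_empty() {
        return ModelCheck::Empty;
    }
    let actual = fnv1a64(bytes);
    if actual == expected {
        ModelCheck::Ok
    } else {
        ModelCheck::ChecksumMismatch { expected, actual }
    }
}
```

Any result other than `ModelCheck::Ok` would map to the error dialogue described in the table.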

10. Future Roadmap

Settings sync via tauri-plugin-store (JSON in AppData).
Auto-start on login (tauri-plugin-autostart).
GPU inference when Whisper Metal backend stabilises.
Installer bundles (DMG/MSI/DEB) with model downloader.

This document replaces the previous placeholder docs/ARCHITECTURE.md and should be kept up-to-date with all architectural changes.
