🎙️ Speakr Documentation

note

Speakr is a privacy-first, hot-key–driven dictation utility that turns your speech into typed text entirely on-device. No cloud, no latency, no compromises.


✨ What is Speakr?

Speakr transforms the way you capture thoughts into text. With a single keystroke, record speech, transcribe it locally using Whisper models, and have the text instantly typed into any application. Perfect for developers, writers, and anyone who thinks faster than they type.

🔐 Privacy First

  • 100% offline processing – your voice never leaves your device
  • No cloud dependencies – works in air-gapped environments
  • Minimal permissions – only microphone and accessibility access

⚡ Built for Speed

  • ≤ 3 second end-to-end latency for 5-second recordings
  • Global hotkeys work across all applications
  • Lightweight universal macOS binary < 20 MB

🧭 Navigate the Documentation

tip

Use the search box (⌘/Ctrl + K) to quickly jump to any topic, or browse by your role below.

📋 Product & Planning

| Document | Description | Audience |
| --- | --- | --- |
| Product Requirements | Vision, goals, and feature specifications | Product owners, stakeholders |
| Implementation Plan | Development roadmap and milestones | Project managers, engineers |

🏗️ Architecture & Engineering

| Document | Description | Audience |
| --- | --- | --- |
| Technical Architecture | System design and component overview | Engineers, architects |
| System Description | Detailed system behaviour and flows | Developers, maintainers |
| Development Overview | Getting started with development | New contributors |

📐 Functional Specifications

| Document | Description | Status |
| --- | --- | --- |
| FR-1: Global Hotkey | Hot-key registration and handling | ✅ Implemented |
| FR-2: Audio Capture | Microphone access and recording | ✅ Implemented |
| FR-3: Transcription | Local Whisper integration | 🔄 In Progress |
| FR-4: Text Injection | Cross-app text insertion | 🔄 In Progress |
| FR-5: Injection Fallback | Clipboard fallback mechanism | 📋 Planned |
| FR-6: Settings UI | Configuration interface | ✅ Implemented |

warning

See Specs Overview for the complete functional requirements including non-functional requirements (NFRs) for security, performance, and accessibility.

🔧 Development & Debugging

| Document | Description | Audience |
| --- | --- | --- |
| Debug Panel | Development and troubleshooting tools | Developers, QA |
| Pre-commit Hooks | Code quality and testing setup | Contributors |
| Tauri Plugins | Plugin architecture and integrations | Backend developers |

🚀 Quick Start

note

New to the project? Start with the Development Overview for setup instructions.

For Product People

  1. Read the Product Requirements to understand the vision
  2. Check the Implementation Plan for current progress
  3. Review Functional Specs for detailed features

For Engineers

  1. Study the Technical Architecture for system design
  2. Follow Development Setup to get coding
  3. Reference System Description for implementation details

For Contributors

  1. Set up pre-commit hooks for code quality
  2. Browse functional requirements to find tasks
  3. Use the Debug Panel for development workflow

📊 Project Status

tip

Current Focus: Core transcription engine and text injection reliability

| Component | Status | Notes |
| --- | --- | --- |
| Global Hotkeys | ✅ Complete | Cross-app hotkey registration working |
| Audio Capture | ✅ Complete | High-quality microphone input |
| Settings UI | ✅ Complete | Leptos-based configuration interface |
| Transcription | 🔄 Active | Whisper integration in progress |
| Text Injection | 🔄 Active | Cross-app compatibility improvements |
| Model Management | 📋 Planned | GGUF model download and validation |

🤝 Contributing

note

This documentation is a living document. Found something unclear or outdated?

  • 📂 Browse specs in the specs directory for implementation tasks
  • 🐛 Report issues via GitHub Issues
  • 📝 Improve docs by opening a pull request
  • 💡 Suggest features in GitHub Discussions

Built with 🦀 Rust, ⚡ Tauri 2, and 🎨 Leptos

Privacy-first dictation for the modern developer


title: Product Requirements Document – Speakr
version: 2025-07-20
status: Draft
authors: David Jessup

Product Requirements Document – Speakr

1. Purpose / Vision

Speakr is a privacy-first dictation hot-key utility for macOS (Windows/Linux later). In a single keystroke, users can record speech, transcribe entirely on-device, and have the text typed directly into any active input field. Speakr aims to be the fastest way for developers, writers, and power-users to turn fleeting thoughts into code or prose without breaking flow, and without sending audio to the cloud.

2. Problem Statement

  1. Switching to dedicated dictation apps breaks focus and incurs network latency.
  2. Many corporate or offline environments forbid cloud speech services for privacy reasons.
  3. OS-level dictation is unreliable for code, lacks custom hot-keys, and has high latency on older hardware.

Opportunity: A lightweight, keyboard-driven tool that works anywhere text can be typed, requires no network, and respects user privacy.

3. Goals & Non-Goals

3.1 Goals

  1. <= 3 s end-to-end latency for 5-second recordings on Apple Silicon (M-series).
  2. 100% offline – no external network calls.
  3. Global hot-key works in background apps.
  4. Support customisable models & hot-keys via UI.
  5. Ship notarised universal macOS binary < 20 MB (excluding model).
  6. Provide a clean upgrade path to Windows & Linux.

3.2 Non-Goals

  • Real-time streaming (v1 may paste only after stop).
  • Mobile platforms.
  • Full grammar / punctuation correction.
  • Server-side sync or accounts.

4. Personas

| Persona | Needs / Pain-points |
| --- | --- |
| Dev Dana | Insert comments/code quickly without losing keyboard context. |
| Writer Will | Draft snippets into any text editor without toggling apps. |
| Privacy Peter | Dictate confidential material offline, no data leaves device. |
| Accessibility Ava | Replace or augment typing due to RSI, keep workflow keyboard-first. |

5. User Stories

MoSCoW method: Must, Should, Could, Won’t (for now)

| Priority | Description |
| --- | --- |
| Must | “As a user, I press <Opt> + ~ and my spoken words (≤30 s) are typed into the active field within ~3 s.” |
| Must | “As a user, the app asks for mic + Accessibility permissions on first run and explains why.” |
| Must | “As a user, I can change the hot-key in settings and be warned of conflicts.” |
| Should | “As a user, I can pick a smaller/faster model if my machine is slow.” |
| Should | “As a user, a subtle overlay shows ‘Recording… / Transcribing…’ states.” |
| Could | “As an advanced user, I can turn on auto-punctuation.” |
| Could | “As an advanced user, I can add bespoke words to the dictionary.” |
| Won’t (v1) | Live transcript shown word-by-word while speaking. |

6. Functional Requirements

| FR | Description |
| --- | --- |
| FR-1 | Global hot-key registers at app start and triggers record/transcribe/inject flow. |
| FR-2 | Audio capture uses 16 kHz mono via cpal, max configurable duration (default 10 s). |
| FR-3 | Transcription runs through Whisper (GGUF) via whisper-rs; language default EN. |
| FR-4 | Transcript is injected via synthetic keystrokes (enigo) into current focus. |
| FR-5 | If injection fails (secure field), fallback to clipboard-paste with user warning. |
| FR-6 | UI (tray or window) exposes: hot-key picker, model selector, auto-launch toggle. |
| FR-7 | App emits status events for UI overlay and logs (Recording, Transcribing, Error). |
| FR-8 | Settings persist locally (JSON in AppData, no cloud). |
| FR-9 | App auto-updates via GitHub Releases (optional in v1). |
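
The interplay between FR-4 and FR-5 can be sketched as a small decision function. This is only an illustration under stated assumptions: the `KeystrokeInjector` and `Clipboard` traits, the `Delivery` enum, and the `deliver` function are hypothetical names invented here, not part of speakr-core, enigo, or any clipboard crate.

```rust
/// Outcome of delivering a transcript to the focused application.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// Text was typed via synthetic keystrokes (FR-4).
    Typed,
    /// Keystrokes were rejected, so the text went to the clipboard (FR-5).
    ClipboardFallback { warning: String },
}

/// Hypothetical abstraction over a keystroke backend such as enigo.
pub trait KeystrokeInjector {
    fn type_text(&mut self, text: &str) -> Result<(), String>;
}

/// Hypothetical abstraction over the system clipboard.
pub trait Clipboard {
    fn set_text(&mut self, text: &str) -> Result<(), String>;
}

/// Tries keystroke injection first and falls back to the clipboard,
/// returning a warning the UI could show to the user.
pub fn deliver(
    injector: &mut dyn KeystrokeInjector,
    clipboard: &mut dyn Clipboard,
    text: &str,
) -> Result<Delivery, String> {
    match injector.type_text(text) {
        Ok(()) => Ok(Delivery::Typed),
        Err(reason) => {
            // Only report success if the fallback itself worked.
            clipboard.set_text(text)?;
            Ok(Delivery::ClipboardFallback {
                warning: format!("Typing failed ({reason}); transcript copied to clipboard."),
            })
        }
    }
}
```

Keeping both backends behind traits means the fallback path can be exercised in unit tests with in-memory doubles, without Accessibility permissions.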

7. Non-Functional Requirements

| Category | Requirement | Metric / Acceptance |
| --- | --- | --- |
| Latency | End-to-end ≤ 3 s (M1, 5 s audio, small model) | 95th percentile measured in telemetry log (local). |
| Footprint | Binary ≤ 20 MB; RAM ≤ 400 MB including model. | du -sh and Activity Monitor/smoke tests. |
| Reliability | No crashes in 1-hour monkey test (500 invocations). | CI integration test + manual QA. |
| Security | No outbound network sockets except auto-update domain (opt-out). | Static analysis + firewall test. |
| Compatibility | macOS 13+. Intel Macs may see doubled latency but remain functional. | QA on Intel MBP (2020) & M1. |
| Accessibility | Follows macOS VoiceOver / high-contrast guidelines. | Apple Accessibility Inspector score ≥ 85. |
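
The latency acceptance check is a percentile over locally logged timings. A minimal sketch, assuming the log can be read as a slice of end-to-end durations in seconds and using the nearest-rank definition of a percentile (the document does not say which definition is intended); `p95` and `meets_latency_target` are illustrative names:

```rust
/// Nearest-rank 95th percentile of latency samples (in seconds).
/// Returns `None` when there are no samples.
pub fn p95(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: ceil(0.95 * n), computed in integers to avoid float rounding.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the 95th percentile is at or below the target (3 s in the NFR).
pub fn meets_latency_target(samples: &[f64], target_s: f64) -> bool {
    p95(samples).map_or(false, |v| v <= target_s)
}
```

With 20 samples the nearest rank is 19, so a single slow outlier does not fail the check, but two do.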

8. Metrics / KPIs

| Metric | Target |
| --- | --- |
| Time-to-text (P95) | ≤ 3 s. |
| Activation success rate | ≥ 99% (hot-key triggers & types). |
| Crash-free sessions | > 99.5%. |
| Daily active users (DAU) | post-launch target: 1 k. |
| % of transcripts requiring manual fix | < 15% (optional feedback prompt). |

9. Milestones

| Milestone | Scope |
| --- | --- |
| M0 – Prototype spike | Hot-key → record → transcribe → paste (CLI) |
| M1 – MVP macOS app | Tauri shell, settings window, notarised DMG |
| M2 – Public beta | Auto-update, error logs, model manager |
| M3 – Windows/Linux alpha | Replace injection backend, install bundles |
| M4 – v1.0 GA | Streaming (optional), website + docs |

10. Open Questions

  1. Should we bundle a small GGUF model or trigger a first-run download wizard?
  2. How to handle non-Latin languages (auto-detect vs user-select)?
  3. Do we sandbox the app on macOS or rely on hardened runtime?
  4. Which licence (MIT vs GPL) given we embed Whisper weights?
  5. Accept user telemetry opt-in for latency metrics?

11. Appendix – Stakeholders & Review

  • Product Lead – @PM
  • Engineering Lead – @TechLead
  • Design – @UX
  • Security – @Sec
  • QA – @QA

Reviews: Architecture (Tech), Security (Sec), Accessibility (UX).

System Description

Speakr – a Local Dictation Utility (Rust + Tauri + Leptos)

A tiny, privacy-first macOS desktop app that listens for a global hot-key, records a short audio clip, transcribes it locally with Whisper, then types the text into whatever currently has focus.

Everything runs on-device; no network calls (besides the initial model download).


1. System Overview

┌──────────────────────────────┐
│        Speakr (UI)           │  ← Leptos + Tauri WebView (optional window / tray)
└───────────────┬──────────────┘
                │ <invoke/emit>
        Global Shortcut   ▲    Settings (model path, hot-key, …)
                ▼         │
┌─────────────────────────┴──────────────────────────┐
│            speakr-core  (Rust lib)                 │
│                                                    │
│ 1. Audio capture  – **cpal**                       │
│ 2. Transcription  – **whisper-rs** (GGUF models)   │
│ 3. Text inject    – **enigo** (synthetic keys)     │
└────────────────────────────────────────────────────┘

Global shortcut, audio, and keystroke injection all live in the backend so Speakr continues to work when the UI window is hidden.


2. Key Crates & Decisions

| Concern | Crate / Tool | Why it was chosen |
| --- | --- | --- |
| Hot-key | tauri-plugin-global-shortcut = "2" | Official plugin, cross-platform, Tauri ≥ 2.0 |
| Audio capture | cpal = "0.15" | Mature, async-friendly, works on macOS/Win/Linux |
| Speech-to-Text | whisper-rs = "0.8" | Safe Rust bindings to whisper.cpp; supports GGUF models |
| Keystroke injection | enigo = "0.1" | Simple cross-platform input simulation |
| UI | leptos = "0.6" + trunk | All-Rust reactive UI compiled to WASM |
| Async runtime | tokio = "1" (multi-thread) | Needed for non-blocking recording & transcription |

Tip: Quantised small.en.gguf (~30 MB) loads in ≈ 2 s on Apple Silicon and is usually accurate enough for notes & code comments.


3. Workspace Layout

/speakr
├─ speakr-core        # library crate (audio → text → inject)
├─ speakr-tauri       # Tauri shell (`src-tauri` here)
├─ speakr-ui          # Leptos front-end (optional window)
└─ models/ggml-small.en.gguf  # user-downloaded Whisper model

Use a Cargo workspace so all three crates share versions and CI.


4. Bootstrapping

4.1 Prerequisites

  • Rust 1.88.0+ (stable)
  • Node 18+ & pnpm/yarn/npm (for Tauri/Trunk helpers)
  • Xcode Command-Line Tools (macOS)
  • Download a GGUF Whisper model → models/ggml-small.en.gguf

4.2 Create the workspace

cargo new --lib speakr-core
cargo tauri init --template leptos speakr-tauri   # generates src-tauri + Leptos wiring
cd speakr-tauri
pnpm tauri add global-shortcut                     # JavaScript guest bindings

(Add a sibling speakr-ui crate only if you want the UI separate from the template.)


5. Core Library (speakr-core)

Cargo.toml

[package]
name    = "speakr-core"
version = "0.1.0"
edition = "2021"

[dependencies]
cpal        = "0.15"
whisper-rs  = "0.8"
enigo       = "0.1"
tokio       = { version = "1", features = ["rt-multi-thread", "macros"] }
anyhow      = "1"

src/lib.rs

use anyhow::{Context, Result};
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use enigo::{Enigo, KeyboardControllable};
use std::sync::mpsc;
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext};

/// Whisper expects 16 kHz mono f32 samples.
const SAMPLE_RATE: u32 = 16_000;

pub struct Speakr {
    whisper: WhisperContext,
    enigo:   Enigo,
}

impl Speakr {
    pub fn new(model_path: &str) -> Result<Self> {
        Ok(Self {
            whisper: WhisperContext::new(model_path)?,
            enigo:   Enigo::new(),
        })
    }

    pub async fn capture_and_type(&mut self, seconds: u32) -> Result<()> {
        // 1️⃣  Capture PCM samples --------------------------------------------------
        let total = (seconds * SAMPLE_RATE) as usize;
        let (tx, rx) = mpsc::sync_channel(total);
        let host = cpal::default_host();
        let dev  = host.default_input_device().context("no input device")?;
        // Request 16 kHz mono explicitly; the device default is often
        // 44.1/48 kHz stereo, which Whisper would misinterpret.
        let cfg = cpal::StreamConfig {
            channels: 1,
            sample_rate: cpal::SampleRate(SAMPLE_RATE),
            buffer_size: cpal::BufferSize::Default,
        };
        let stream = dev.build_input_stream(
            &cfg,
            // try_send never blocks the audio callback; overflow is dropped.
            move |data: &[f32], _: &cpal::InputCallbackInfo| {
                for &s in data { let _ = tx.try_send(s); }
            },
            move |e| eprintln!("cpal error: {e}"),
            None,
        )?;
        stream.play()?;
        let mut samples = Vec::with_capacity(total);
        for _ in 0..total {
            samples.push(rx.recv()?);
        }
        drop(stream);

        // 2️⃣  Transcribe -----------------------------------------------------------
        let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
        params.set_language(Some("en"));
        // Inference runs on a state object; the text is read back per segment.
        let mut state = self.whisper.create_state()?;
        state.full(params, &samples)?;
        let mut text = String::new();
        for i in 0..state.full_n_segments()? {
            text.push_str(&state.full_get_segment_text(i)?);
        }

        // 3️⃣  Inject ---------------------------------------------------------------
        // enigo 0.1 types a string with `key_sequence`.
        self.enigo.key_sequence(text.trim());
        Ok(())
    }
}

6. Tauri Backend (speakr-tauri / src-tauri)

`src-tauri/Cargo.toml` extras

[dependencies]
speakr-core = { path = "../speakr-core" }
# Tauri 2 has no `api-all` feature (that was v1); access is granted via capabilities
tauri       = { version = "2", features = [] }
# Global hot-key plugin
tauri-plugin-global-shortcut = "2"
tokio       = { version = "1", features = ["sync"] }
anyhow      = "1"

src-tauri/src/main.rs

#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]
use speakr_core::Speakr;
use tauri::{AppHandle, Manager};
use tauri_plugin_global_shortcut::{GlobalShortcutExt, ShortcutState};
use tokio::sync::Mutex; // async-aware: its guard may be held across `.await`

struct AppState(Mutex<Option<Speakr>>);

/// Shared by the hot-key handler and the `transcribe` command.
/// Assumes `Speakr` is `Send`, which Tauri managed state requires.
async fn run_dictation(app: AppHandle) -> Result<(), String> {
    let state = app.state::<AppState>();
    let mut guard = state.0.lock().await;
    let speakr = guard.as_mut().ok_or("model not ready")?;
    speakr
        .capture_and_type(10)        // 10 s max
        .await
        .map_err(|e| e.to_string())
}

#[tauri::command]
async fn transcribe(app: AppHandle) -> Result<(), String> {
    run_dictation(app).await
}

fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_global_shortcut::Builder::new().build())
        .setup(|app| {
            // Pre-load Whisper model once at startup
            let model = Speakr::new("../models/ggml-small.en.gguf")?;
            app.manage(AppState(Mutex::new(Some(model))));

            // Register ⌘⌥Space
            #[cfg(desktop)]
            app.global_shortcut().on_shortcut("CMD+OPTION+SPACE", |app, _shortcut, event| {
                // The handler fires on press and release; react to the press only.
                if event.state() == ShortcutState::Pressed {
                    let handle = app.clone();
                    tauri::async_runtime::spawn(async move {
                        if let Err(e) = run_dictation(handle).await {
                            eprintln!("dictation failed: {e}");
                        }
                    });
                }
            })?;
            Ok(())
        })
        .invoke_handler(tauri::generate_handler![transcribe])
        .run(tauri::generate_context!())
        .expect("error while running Speakr");
}

Capability JSON Add global-shortcut:allow-register to src-tauri/capabilities/default.json (see Tauri docs for full schema).


7. Leptos Front-End (optional)

The Tauri template already wires Trunk + Leptos. A minimal status UI:

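// Needs the wasm-bindgen, wasm-bindgen-futures and js-sys crates alongside leptos.
use leptos::*;
use wasm_bindgen::prelude::*;

// Bindings to the global Tauri API, available because `withGlobalTauri` is true.
#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(js_namespace = ["window", "__TAURI__", "core"])]
    async fn invoke(cmd: &str, args: JsValue) -> JsValue;

    #[wasm_bindgen(js_namespace = ["window", "__TAURI__", "event"])]
    async fn listen(event: &str, handler: &js_sys::Function) -> JsValue;
}

#[component]
pub fn App() -> impl IntoView {
    let (status, set_status) = create_signal(String::from("Idle"));

    // Listen for status updates from backend; the text arrives in `event.payload`.
    let on_status = Closure::<dyn FnMut(JsValue)>::new(move |evt: JsValue| {
        let payload = js_sys::Reflect::get(&evt, &JsValue::from_str("payload")).ok();
        if let Some(text) = payload.and_then(|p| p.as_string()) {
            set_status.set(text);
        }
    });
    spawn_local(async move {
        listen("speakr-status", on_status.as_ref().unchecked_ref()).await;
        on_status.forget(); // keep the callback alive for the app's lifetime
    });

    // Calls the backend `transcribe` command with an empty argument object.
    let transcribe = move |_| {
        spawn_local(async move {
            invoke("transcribe", js_sys::Object::new().into()).await;
        });
    };

    view! {
        <div class="p-4">
            <h1 class="text-xl font-bold">Speakr</h1>
            <p>{move || format!("Status: {}", status.get())}</p>
            <button class="mt-4 bg-blue-600 text-white px-3 py-1 rounded"
                    on:click=transcribe>
                "Record & Type"
            </button>
        </div>
    }
}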

tauri.conf.json should already contain:

{
  "build": {
    "beforeDevCommand": "trunk serve",
    "beforeBuildCommand": "trunk build --release",
    "devUrl": "http://localhost:1420",
    "frontendDist": "../dist"
  },
  "app": { "withGlobalTauri": true }
}

8. macOS Permissions

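  1. Microphone – Declare NSMicrophoneUsageDescription in src-tauri/Info.plist (Tauri merges this file into the bundle's Info.plist); macOS then shows the permission prompt on first microphone use.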
  2. Accessibility – Ask the user to enable Speakr under System Settings → Privacy & Security → Accessibility so Enigo keystrokes reach other apps.
  3. Codesign & Notarise – For distribution run:
cargo tauri build --target universal-apple-darwin   # produces .app bundle
# then codesign & notarise with `xcrun notarytool`

9. Dev & Release Workflow

# hot-reload UI + backend
trunk serve &              # terminal 1 – WASM
cargo tauri dev            # terminal 2 – desktop shell

# production
trunk build --release      # build UI assets
cargo tauri build          # build .app or MSI/DEB

10. Performance Levers

| Lever | Effect | Hint |
| --- | --- | --- |
| Model size | Latency vs accuracy | tiny.en ≈ 30 MB loads fastest |
| params.set_* | Threads / strategy | Set set_n_threads(num_cpus::get() as i32) |
| Audio chunk length | Turn-around time | Push-to-talk (≤ 10 s) keeps UI snappy |
| VAD (optional) | Trim silence & hallucination | Add webrtc-vad if needed |
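
To illustrate what the VAD lever buys, here is a deliberately simple stand-in: an energy threshold over fixed-size frames that trims leading and trailing silence before the samples reach Whisper. It is not webrtc-vad and makes no claim to its accuracy; `trim_silence`, the frame length, and the threshold are all assumptions chosen for the sketch.

```rust
/// Root-mean-square energy test for one frame of samples.
fn is_voiced(frame: &[f32], rms_threshold: f32) -> bool {
    let mean_square = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
    mean_square.sqrt() >= rms_threshold
}

/// Returns the sub-slice between the first and last voiced frame.
/// An all-silent input yields an empty slice.
pub fn trim_silence(samples: &[f32], frame_len: usize, rms_threshold: f32) -> &[f32] {
    if frame_len == 0 {
        return samples;
    }
    let frames: Vec<&[f32]> = samples.chunks(frame_len).collect();
    let first = frames.iter().position(|f| is_voiced(f, rms_threshold));
    let last = frames.iter().rposition(|f| is_voiced(f, rms_threshold));
    match (first, last) {
        (Some(a), Some(b)) => {
            let start = a * frame_len;
            let end = ((b + 1) * frame_len).min(samples.len());
            &samples[start..end]
        }
        _ => &samples[0..0],
    }
}
```

At 16 kHz, a `frame_len` of 160 corresponds to 10 ms frames. Shorter input means less work for the model, which is where the turn-around gain would come from.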

11. Roadmap Ideas

  • Config window for model selection & hot-key change
  • Streaming, real-time transcription (partial results)
  • Windows/Linux support (replace Enigo backend where needed)
  • Auto-punctuation & language detection

🎉 You now have a single, coherent guide (a merge of all three GPT drafts), ready to get Speakr typing for you on macOS in a weekend.


title: Technical Architecture – Speakr
version: 2025-07-20
status: Draft

Speakr – Technical Architecture

1. Purpose

Speakr is a privacy-first hot-key dictation utility for macOS (with Windows/Linux on the roadmap). When the user presses a global shortcut, it records a short audio segment, runs an on-device Whisper model, and synthesises keystrokes to type the transcript into the currently-focused application – all in under a few seconds.


2. High-Level Architecture

flowchart TB
    subgraph Tauri Shell
        direction TB
        GlobalShortcut["Global Shortcut<br/><i>tauri-plugin-global-shortcut</i>"]
        IPC["IPC Bridge<br/><i>tauri invoke / emit</i>"]
        Tray["System Tray / UI<br/><i>Leptos + WASM</i>"]
    end

    subgraph Core Library
        direction TB
        Recorder["Audio Recorder<br/><i>cpal</i>"]
        STT["Speech-to-Text<br/><i>whisper-rs</i>"]
        Injector["Text Injector<br/><i>enigo</i>"]
    end

    GlobalShortcut -- "hot-key pressed" --> Recorder
    Recorder -- "PCM samples" --> STT
    STT -- "transcript" --> Injector
    Injector -- "keystrokes" --> FocusApp(["Focused Application"])

    %% UI flow
    Recorder -- "status events" --- IPC
    STT ---- IPC
    Injector --- IPC
    IPC ==> Tray

Key points:

  1. All heavy-weight logic lives in pure Rust (speakr-core). The UI may be hidden without affecting functionality.
  2. No network access – Whisper runs entirely on-device.
  3. Plugin isolation – Optional features (auto-start, clipboard, etc.) are added via Tauri plugins with explicit capability JSON.

3. Crate & Directory Layout

| Layer | Crate / Path | Main Responsibilities |
| --- | --- | --- |
| Core | speakr-core/ | Record audio (cpal) ➜ transcribe (whisper-rs) ➜ inject text (enigo) |
| Backend | speakr-tauri/ | Registers global hot-key, exposes #[tauri::command] wrappers, persists settings |
| Frontend | speakr-ui/ (optional) | Leptos WASM UI for tray, preferences, status overlay |
| Assets | models/ | GGUF Whisper models downloaded post-install |

All crates live in a single Cargo workspace to guarantee compatible dependency versions.

3.1 Speakr-Tauri Internal Structure

The speakr-tauri backend is organised into focused modules for maintainability and testability:

speakr-tauri/src/
├── commands/           # Tauri command implementations
│   ├── mod.rs         # Command organisation and documentation
│   ├── validation.rs  # Input validation (hotkey format, etc.)
│   ├── system.rs      # System integration (model availability, auto-launch)
│   └── legacy.rs      # Backward compatibility commands
├── services/          # Background services and state management
│   ├── mod.rs         # Service coordination
│   ├── hotkey.rs      # Global hotkey registration and management
│   ├── status.rs      # Backend service status tracking
│   └── types.rs       # Shared service types and enums
├── settings/          # Configuration persistence and validation
│   ├── mod.rs         # Settings management
│   ├── persistence.rs # File I/O for settings
│   ├── migration.rs   # Settings schema migration
│   └── validation.rs  # Settings validation logic
├── debug/             # Debug-only functionality
│   ├── mod.rs         # Debug command coordination
│   ├── commands.rs    # Debug-specific Tauri commands
│   ├── storage.rs     # Debug log storage
│   └── types.rs       # Debug-specific types
├── audio/             # Audio handling utilities
│   ├── mod.rs         # Audio module coordination
│   ├── files.rs       # Audio file operations
│   └── recording.rs   # Audio recording helpers
└── lib.rs             # Tauri app setup, command registration

Key architectural principles:

  • Separation of concerns: Business logic in *_internal() functions, Tauri integration in lib.rs
  • Testability: Internal functions can be tested without Tauri runtime overhead
  • Modularity: Commands grouped by functional domain rather than technical implementation
  • Documentation: Each module has comprehensive rustdoc explaining its purpose and usage
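
The `*_internal()` split can be sketched with hot-key validation as the example. The function name, the accepted modifier list, and the error messages below are assumptions made for illustration; the real validation.rs may differ.

```rust
/// Business logic: a plain function, unit-testable without a Tauri runtime.
/// Accepts strings such as "CMD+OPTION+SPACE" (modifiers first, key last).
pub fn validate_hotkey_internal(hotkey: &str) -> Result<(), String> {
    // Assumed modifier vocabulary; adjust to whatever the plugin accepts.
    const MODIFIERS: [&str; 6] = ["CMD", "CTRL", "ALT", "OPTION", "SHIFT", "SUPER"];

    let parts: Vec<String> = hotkey
        .split('+')
        .map(|p| p.trim().to_ascii_uppercase())
        .collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err("hot-key contains an empty segment".to_string());
    }
    let (key, mods) = match parts.split_last() {
        Some(split) => split,
        None => return Err("hot-key is empty".to_string()),
    };
    if mods.is_empty() {
        return Err("hot-key needs at least one modifier".to_string());
    }
    if let Some(bad) = mods.iter().find(|m| !MODIFIERS.contains(&m.as_str())) {
        return Err(format!("unknown modifier: {bad}"));
    }
    if MODIFIERS.contains(&key.as_str()) {
        return Err("hot-key must end with a non-modifier key".to_string());
    }
    Ok(())
}

// In lib.rs the Tauri wrapper would stay thin (shown as a comment so this
// sketch compiles without the tauri crate):
//
// #[tauri::command]
// fn validate_hotkey(hotkey: String) -> Result<(), String> {
//     validate_hotkey_internal(&hotkey)
// }
```

Because the wrapper only forwards its argument, tests can target `validate_hotkey_internal` directly and never need to start a Tauri runtime.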

4. Runtime Flow (Happy Path)

| Step | Thread/Task | Action | Typical Latency |
| --- | --- | --- | --- |
| 1 | Main (OS) | User presses ⌘⌥Space | – |
| 2 | Tauri shortcut handler | Spawns async task transcribe() | < 1 ms |
| 3 | Tokio worker | cpal::Stream captures 16-kHz mono PCM into ring-buffer | 0–10 s (configurable) |
| 4 | Same task | PCM fed into whisper_rs::full() | ~1 s per 10 s audio on M-series |
| 5 | Same task | Transcript returned → enigo.text() synthesises keystrokes | ≤ 300 ms |
| 6 | UI task | Frontend receives status events via emit() and updates overlay | realtime |

Failure cases (no mic, model missing, permission denied) surface via error events and native notifications.
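
The status events that accompany this flow can be modelled as a small state machine. The sketch below is an assumption about how such a type might look: the document names only the Recording, Transcribing, and Error states, so `Idle`, `Injecting`, and the transition rules are illustrative additions.

```rust
/// Pipeline status as it might be emitted to the UI overlay.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Idle,         // assumed resting state
    Recording,
    Transcribing,
    Injecting,    // assumed state while keystrokes are being sent
    Error(String),
}

impl Status {
    /// Label a UI overlay could display.
    pub fn label(&self) -> String {
        match self {
            Status::Idle => "Idle".to_string(),
            Status::Recording => "Recording…".to_string(),
            Status::Transcribing => "Transcribing…".to_string(),
            Status::Injecting => "Typing…".to_string(),
            Status::Error(reason) => format!("Error: {reason}"),
        }
    }

    /// Happy path: Idle → Recording → Transcribing → Injecting → Idle.
    /// Any state may fail into Error, and Error recovers to Idle.
    pub fn can_transition_to(&self, next: &Status) -> bool {
        use Status::*;
        matches!(
            (self, next),
            (Idle, Recording)
                | (Recording, Transcribing)
                | (Transcribing, Injecting)
                | (Injecting, Idle)
                | (_, Error(_))
                | (Error(_), Idle)
        )
    }
}
```

Rejecting illegal transitions in one place makes it harder for a second hot-key press to start a recording while a transcription is still running.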


5. Concurrency & Safety

  • Tokio multi-thread runtime drives asynchronous recording and Whisper inference.
  • The AppState(Mutex<Option<Speakr>>) guards the singleton Whisper context; loading occurs once at app start.
  • Hot-key handler offloads work to the runtime to keep the UI thread non-blocking.
  • Audio buffer uses a bounded sync_channel to avoid unbounded RAM growth.
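
The bounded-channel point can be shown with the standard library alone. In this sketch the producer uses `try_send` so that a full buffer drops samples instead of blocking the audio callback; that policy, and the helper names `audio_channel` and `push_samples`, are assumptions for illustration rather than a description of the actual recorder.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Creates a channel that holds at most `max_seconds` of audio.
pub fn audio_channel(max_seconds: usize, sample_rate: usize) -> (SyncSender<f32>, Receiver<f32>) {
    sync_channel(max_seconds * sample_rate)
}

/// Pushes samples without blocking and returns how many were dropped
/// because the buffer was already full.
pub fn push_samples(tx: &SyncSender<f32>, samples: &[f32]) -> usize {
    let mut dropped = 0;
    for &s in samples {
        match tx.try_send(s) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver is gone: recording has ended, stop pushing.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    dropped
}
```

Memory use is capped at capacity × 4 bytes: 10 s at 16 kHz is 160,000 `f32` samples, roughly 640 kB.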

6. Security & Permissions

| Platform | Permission | Why | Request Mechanism |
| --- | --- | --- | --- |
| macOS | Microphone access | Record audio | NSMicrophoneUsageDescription (Info.plist) |
| macOS | Accessibility | Send synthetic keystrokes | User enables app in System Settings ▸ Accessibility |
| All | Global shortcut | Register hot-key | global-shortcut:allow-register capability |

The app runs offline; no data leaves the device.


7. Build & Packaging

  1. Dev: trunk serve & (frontend) + cargo tauri dev (backend)
  2. Release: trunk build --release ➜ cargo tauri build
  3. macOS notarisation: xcrun notarytool submit --wait after codesign.
  4. Universal binary size ≈ 15 MB (+ model).

8. Extensibility Points

  • Voice Activity Detection: plug-in webrtc-vad before Whisper to auto-stop on silence.
  • Streaming transcripts: call whisper_rs::full_partial() and enqueue keystrokes incrementally.
  • Multi-language: set params.set_language(None) for auto-detect.
  • Cross-platform: replace enigo backend with send_input (Win) or xdo (X11) while keeping public API.

9. Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Keystroke injection blocked in secure fields | Fallback to clipboard-paste mode with warning |
| Whisper latency on older CPUs | Offer tiny.en.gguf and shorter max record time |
| Shortcut clashes | UI lets user redefine hot-key and validates uniqueness |
| Model file missing/corrupt | Verify checksum on load and show error dialogue |

10. Future Roadmap

  1. Settings sync via tauri-plugin-store (JSON in AppData).
  2. Auto-start on login (tauri-plugin-autostart).
  3. GPU inference when Whisper Metal backend stabilises.
  4. Installer bundles (DMG/MSI/DEB) with model downloader.

This document replaces the previous placeholder docs/ARCHITECTURE.md and should be kept up-to-date with all architectural changes.

Development Overview

Pre-commit Setup and Optimization

"Quality is not an act, it's a habit." – Aristotle

This document describes Speakr's pre-commit hook configuration, optimization strategies, and future improvement opportunities.

Overview

Pre-commit hooks ensure code quality by running automated checks before each commit. This prevents broken code from entering the repository and maintains consistent coding standards across the team.

Why Pre-commit?

  • Early Detection: Catch issues before they reach CI/CD
  • Consistent Quality: Enforce formatting and linting standards
  • Fast Feedback: Immediate results during development
  • Team Alignment: Same standards for all contributors

Current Setup

Our optimized pre-commit configuration targets affected packages only, reducing execution time by ~70% for typical changes.

Configuration Files

Hook Categories

1. Package-Specific Rust Hooks

speakr-core (triggered by ^speakr-core/.*\.rs$):

  • cargo-fmt-core: Code formatting check
  • cargo-clippy-core: Linting with all warnings as errors
  • cargo-test-core: Unit and integration tests

speakr-tauri (triggered by ^speakr-tauri/.*\.rs$):

  • cargo-fmt-tauri: Code formatting check
  • cargo-clippy-tauri: Linting with all warnings as errors
  • cargo-test-tauri: Unit and integration tests

speakr-ui (triggered by ^speakr-ui/.*\.rs$):

  • cargo-fmt-ui: Code formatting check
  • cargo-clippy-ui: Linting with all warnings as errors
  • cargo-test-ui: Unit and integration tests

2. Workspace-Level Hooks

Workspace Changes (triggered by ^(Cargo\.(toml|lock)|\.cargo/.*)$):

  • cargo-fmt-workspace: Format all packages
  • cargo-clippy-workspace: Lint entire workspace

3. Smart Integration Hooks

Dependency Awareness:

  • cargo-test-integration: When speakr-core changes, also test speakr-tauri (dependency relationship)

4. General Quality Hooks

  • Trailing whitespace: Remove unnecessary whitespace
  • YAML/JSON/TOML validation: Syntax checking
  • Large file detection: Prevent accidental commits of large files
  • Merge conflict detection: Catch unresolved conflicts
  • Markdown linting: Documentation quality

Optimizations

🎯 Selective Package Testing

Problem: Previous setup ran all checks on all packages for any Rust file change.

Solution: File pattern matching to target only affected packages.

# Before: Always runs on ANY .rs file
files: \.rs$
entry: cargo test --all

# After: Only runs on speakr-core files
files: ^speakr-core/.*\.rs$
entry: cargo test --package speakr-core

🧠 Dependency-Aware Testing

Problem: Changes to speakr-core could break speakr-tauri without running its tests.

Solution: Smart integration testing when dependencies change.

# Integration test: core changes affect tauri
- id: cargo-test-integration
  name: Cargo Test (integration - core affects tauri)
  entry: cargo test --package speakr-tauri
  files: ^speakr-core/.*\.rs$  # Triggered by core changes
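
The combined effect of the selective and dependency-aware patterns can be sketched as a function from changed paths to the packages whose tests should run. This is an illustration, not the logic of scripts/selective-tests.sh: `affected_packages` is a hypothetical name, and treating workspace-level files as affecting every package is an assumption (the hooks above only run fmt and clippy for those).

```rust
use std::collections::BTreeSet;

/// Maps changed file paths to the set of packages whose tests should run.
pub fn affected_packages(changed_files: &[&str]) -> BTreeSet<&'static str> {
    const PACKAGES: [&str; 3] = ["speakr-core", "speakr-tauri", "speakr-ui"];
    let mut affected = BTreeSet::new();

    for file in changed_files {
        // Assumption: workspace-level manifests can affect every package.
        if *file == "Cargo.toml" || *file == "Cargo.lock" || file.starts_with(".cargo/") {
            affected.extend(PACKAGES);
            continue;
        }
        // Mirrors the hook patterns, e.g. ^speakr-core/.*\.rs$
        for pkg in PACKAGES {
            if file.starts_with(&format!("{pkg}/")) && file.ends_with(".rs") {
                affected.insert(pkg);
            }
        }
    }

    // Dependency awareness: speakr-tauri depends on speakr-core.
    if affected.contains("speakr-core") {
        affected.insert("speakr-tauri");
    }
    affected
}
```

A change under speakr-ui selects only speakr-ui, while a change under speakr-core selects both speakr-core and speakr-tauri.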

⚡ Performance Optimizations

  1. Parallel Execution: Each package's hooks can run in parallel
  2. Targeted Scoping: Only affected code gets checked
  3. Smart Caching: Cargo's incremental compilation benefits
  4. Early Exit: Hooks fail fast on first error

Usage Guide

Installation

# Install pre-commit (if not already installed)
pip install pre-commit

# Install hooks in repository
pre-commit install

# Optional: Install for push events too
pre-commit install -t pre-push

Daily Workflow

Automatic (Recommended):

git add .
git commit -m "feat: add new feature"
# Hooks run automatically, commit proceeds if all pass

Manual Testing:

# Run all hooks on all files
pre-commit run --all-files

# Run specific hook
pre-commit run cargo-fmt-core

# Run on specific files
pre-commit run --files speakr-core/src/lib.rs

Advanced Selective Testing

For maximum control, use our custom script:

# Test only packages affected by changes since last commit
./scripts/selective-tests.sh

# Compare against specific commit/branch
./scripts/selective-tests.sh main
./scripts/selective-tests.sh abc123def

# Get help
./scripts/selective-tests.sh --help

Bypassing Hooks (Emergency Only)

# Skip all hooks (use sparingly!)
git commit -m "hotfix: urgent fix" --no-verify

# Skip specific hook
SKIP=cargo-test-core git commit -m "fix: skip tests temporarily"

Performance Metrics

Before Optimization

  • Total packages checked: 3/3 (100%)
  • Average execution time: ~45 seconds
  • Parallel efficiency: Low (redundant work)

After Optimization

  • Typical single-package change: 1/3 packages (33%)
  • Average execution time: ~15 seconds (70% improvement)
  • Parallel efficiency: High (targeted work)
  • Smart dependencies: Core changes → Core + Tauri tests

Real-world Example

Scenario: Modify speakr-ui/src/app.rs

Before: ✗ Tests all 3 packages (~45s)
After: ✓ Tests only speakr-ui package (~12s)
Speedup: 3.75x faster 🚀

Future Improvements

🚀 Performance Enhancements

1. Incremental Testing with Coverage

Goal: Only run tests affected by specific code changes, not entire packages.

Implementation:

# Future: Ultra-granular testing
cargo test --package speakr-core -- --test-affected-by src/audio.rs

Tools to explore:

  • cargo-difftests: Selective re-testing framework
  • LLVM coverage analysis for affected test discovery
  • determinator: Facebook's affected package detection

2. Caching and Memoization

Goal: Skip checks if code hasn't changed since last successful run.

Implementation:

# Cache test results based on content hash
- id: cargo-test-cached
  entry: cache-wrapper cargo test --package speakr-core
  cache_key: "hash:speakr-core/**/*.rs"

Benefits:

  • Near-instant results for unchanged code
  • Perfect for repeated CI runs on same commit
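
The idea behind a content-hash cache key can be sketched with the standard library. The `cache-wrapper` and `cache_key` names in the snippet above are forward-looking, so everything here is hypothetical: `cache_key` and `should_skip` are illustrative helpers that hash path and content pairs, sorted so the key does not depend on traversal order.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Computes a cache key from (path, contents) pairs.
/// Files are sorted by path so the key is order-independent.
pub fn cache_key(files: &[(&str, &[u8])]) -> u64 {
    let mut sorted: Vec<&(&str, &[u8])> = files.iter().collect();
    sorted.sort_by(|a, b| a.0.cmp(b.0));

    let mut hasher = DefaultHasher::new();
    for (path, contents) in sorted {
        path.hash(&mut hasher);
        contents.hash(&mut hasher);
    }
    hasher.finish()
}

/// A check can be skipped when the key matches the last successful run.
pub fn should_skip(previous_key: Option<u64>, files: &[(&str, &[u8])]) -> bool {
    previous_key == Some(cache_key(files))
}
```

`DefaultHasher` keeps the sketch dependency-free, but its output is not guaranteed to be stable across Rust releases, so a cache persisted to disk would want a fixed algorithm such as SHA-256.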

3. Parallel Package Testing

Goal: Run different package tests truly in parallel.

Current: Sequential package testing
Future: Matrix-style parallel execution

# Run in parallel using job control
cargo test --package speakr-core &
cargo test --package speakr-tauri &
cargo test --package speakr-ui &
wait  # Wait for all to complete

🔍 Enhanced Feedback

1. Rich Diff Display

Goal: Show exactly what code caused failures.

Implementation:

# Future: Rich failure reporting
cargo clippy --message-format=json | jq -r 'select(.reason == "compiler-message") | .message.spans[] | "\(.file_name):\(.line_start)"'

Features:

  • Syntax-highlighted diffs
  • Click-to-fix suggestions
  • Context-aware error messages

2. Performance Profiling

Goal: Track and optimize hook execution time.

Metrics to collect:

  • Per-hook execution time
  • Cache hit/miss ratios
  • Package-level timing breakdown
  • Historical performance trends

3. Smart Notifications

Goal: Contextual feedback based on change type.

Examples:

# API changes detected
⚠️  Public API modified in speakr-core - consider semver impact

# Performance impact detected
🐌 Tests are 20% slower - check for performance regressions

# Security sensitive changes
🔒 Cryptographic code modified - extra security review recommended

🧪 Test Quality Improvements

1. Mutation Testing Integration

Goal: Ensure tests actually catch bugs.

Implementation:

# Run mutation tests on changed code (--in-diff expects a diff file)
git diff HEAD~1..HEAD > changes.diff
cargo mutants --package speakr-core --in-diff changes.diff

2. Dependency Impact Analysis

Goal: Understand full impact of changes across the dependency graph.

Visualization:

speakr-core change impact:
├── speakr-core (direct) ✓
├── speakr-tauri (depends on core) ✓
└── speakr-ui (independent) ⏭️ skipped

3. Flaky Test Detection

Goal: Identify and fix unreliable tests.

Implementation:

  • Run tests multiple times in CI
  • Track test success/failure rates
  • Auto-quarantine flaky tests
  • Generate flakiness reports
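The tracking step could look roughly like the following. It is a sketch under a simple assumption: a test counts as flaky when it has both passed and failed across repeated runs of the same code. `TestHistory` and its methods are hypothetical names, not part of any existing tool.

```rust
use std::collections::BTreeMap;

/// Pass/fail counts per test, collected over repeated runs of the same code.
#[derive(Default)]
struct TestHistory {
    /// test name -> (passes, failures)
    runs: BTreeMap<String, (u32, u32)>,
}

impl TestHistory {
    fn record(&mut self, test: &str, passed: bool) {
        let entry = self.runs.entry(test.to_string()).or_insert((0, 0));
        if passed {
            entry.0 += 1;
        } else {
            entry.1 += 1;
        }
    }

    /// Returns (test name, failure rate) for every test that both passed and
    /// failed. Tests that always fail are broken rather than flaky.
    fn flaky_tests(&self) -> Vec<(&str, f64)> {
        self.runs
            .iter()
            .filter(|(_, (passes, failures))| *passes > 0 && *failures > 0)
            .map(|(name, (passes, failures))| {
                let total = f64::from(passes + failures);
                (name.as_str(), f64::from(*failures) / total)
            })
            .collect()
    }
}
```

The failure rate gives a natural threshold for auto-quarantine, for example quarantining only tests that fail in more than a chosen fraction of runs.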

🔧 Developer Experience

1. IDE Integration

Goal: Show pre-commit status in development environment.

Features:

  • Real-time hook status in VS Code/Cursor
  • Inline error highlighting
  • One-click fix suggestions

2. Hook Customization

Goal: Allow per-developer customization.

Implementation:

# .pre-commit-config.local.yaml (git-ignored)
hooks:
  - id: cargo-clippy-core
    args: ["--", "-A", "clippy::pedantic"]  # Less strict for local dev

3. Quick Fix Tools

Goal: Automated fixing of common issues.

Examples:

# Auto-fix formatting
pre-commit run cargo-fmt-core --hook-stage manual

# Auto-fix common clippy warnings
cargo clippy --fix --allow-dirty

# Auto-update dependencies
cargo update && pre-commit run cargo-test-all

Troubleshooting

Common Issues

Hook Fails with "Package not found"

Cause: Package name mismatch in hook configuration.

Solution: Verify package names match Cargo.toml files:

cargo metadata --format-version 1 | jq '.packages[].name'

Tests Pass Locally but Fail in CI

Cause: Different dependency versions or environment.

Solution: Use Cargo.lock and consistent Rust versions:

# CI configuration
rust-toolchain: "1.88.0"  # Pin exact version

Hooks Run on Wrong Files

Cause: Incorrect regex patterns in files: configuration.

Solution: Test patterns with realistic file paths:

# Test regex pattern
echo "speakr-core/src/lib.rs" | grep -E "^speakr-core/.*\.rs$"

Performance Issues

Slow Hook Execution

  1. Check package scoping: Ensure hooks target specific packages
  2. Review test suite: Look for slow integration tests
  3. Enable caching: Reuse build artefacts via a shared target directory (CARGO_TARGET_DIR) or a compiler cache such as sccache (RUSTC_WRAPPER)

Memory Issues

  1. Limit parallel jobs: Set CARGO_BUILD_JOBS=2
  2. Increase memory limits: Configure system swap
  3. Use release mode for tests: cargo test --release (if appropriate)

Getting Help

  1. Check configuration: Validate with pre-commit validate-config
  2. Debug mode: Run with pre-commit run --verbose
  3. Clean cache: Use pre-commit clean to reset
  4. Manual testing: Test individual hooks in isolation

References

Implementation Plan – Speakr

A step-by-step roadmap to deliver the Speakr application using the test-driven, multi-crate approach defined in the specification set under docs/specs/.


1. Repository Scaffold

Reference: INIT-01 Project Scaffold

  1. Execute the migration steps to create the Cargo workspace (speakr-core, speakr-tauri, optional speakr-ui).
  2. Commit and open a draft PR; CI should fail until tests are added.
  3. Add baseline CI workflows (lint, build, placeholder tests) that currently fail.

2. Core Library (speakr-core)

| Order | Spec | Task |
| --- | --- | --- |
| 2.1 | FR-2 | Implement audio capture (cpal). Begin with failing unit test asserting 16-kHz mono stream & duration cap. |
| 2.2 | FR-3 | Implement transcription (whisper-rs). Add latency test harness. |
| 2.3 | FR-4 | Implement text injection (enigo). Integration tests across editors via mock window focus. |
| 2.4 | FR-5 | Implement clipboard fallback; write secure-field simulation tests. |
| 2.5 | FR-7 | Emit status events; test channel delivery & ordering. |

Merge each sub-task when its tests pass and CI is green.

3. Tauri Backend (speakr-tauri)

| Order | Spec | Task |
| --- | --- | --- |
| 3.1 | FR-1 | Register global hot-key via tauri-plugin-global-shortcut; write E2E test with headless Tauri window. |
| 3.2 | - | Wire hot-key → async call into speakr-core pipeline; ensure status events are forwarded via emit. |
| 3.3 | FR-8 | Add settings persistence (JSON). Unit tests for load/save & corruption recovery. |

4. Front-End (Leptos)

| Order | Spec | Task |
| --- | --- | --- |
| 4.1 | FR-6 | Build Settings & Status overlay UI; write component tests with Leptos testing utilities. |
| 4.2 | NFR-accessibility | Add automated axe-core & VoiceOver tests. |

5. Cross-Cutting Non-Functional Work

| Spec | Focus |
| --- | --- |
| NFR-latency | Optimise model loading & thread usage; ensure performance tests pass. |
| NFR-footprint | Strip symbols, enable lto, audit memory. |
| NFR-reliability | Add monkey-test CI job (500 invocations). |
| NFR-security | Socket-mock tests, Hardened Runtime flags, notarisation script. |
| NFR-compatibility | Add Intel macOS runner to CI. |

6. Auto-Update

Reference: FR-9 Auto-update

  1. Integrate update check using tauri-plugin-updater (or custom).
  2. Write integration tests mocking GitHub Releases API & download validation.

7. Documentation & Release

  1. Update docs/book/ with usage & contribution guide.
  2. Ensure mdbook build passes in CI.
  3. Produce signed DMG via CI; attach to GitHub Release.

Progress Checklist

    1. Preparation complete
      • Status: Preparation tasks completed.
    1. Repository scaffold merged (INIT-01)
      • Status: Repository scaffold implemented with four crates: speakr-core (backend processing), speakr-tauri (Tauri backend), speakr-ui (Leptos front-end) and speakr-types (shared types).
  • 2.1 Audio capture (FR-2) implemented & tested - Status: Audio capture tested via debug UI, verified WAV file is written to disk and contains the expected audio.
  • 2.2 Transcription (FR-3) implemented & tested - Status: Not started
  • 2.3 Text injection (FR-4) implemented & tested - Status: Not started
  • 2.4 Injection fallback (FR-5) implemented & tested - Status: Not started
  • 2.5 Status events (FR-7) implemented & tested - Status: Not started
  • 3.1 Global hot-key (FR-1) registered & tested - Status: Not started
  • 3.2 Backend pipeline wired - Status: Not started
  • 3.3 Settings persistence (FR-8) implemented & tested - Status:
  • [~] 4.1 Settings UI (FR-6) implemented & tested - Status: Preparation tasks completed.
  • 4.2 Accessibility audits (NFR-accessibility) passing - Status: Preparation tasks completed.
  • Non-functional targets (Latency, Footprint, Reliability, Security, Compatibility) met
  • Auto-update (FR-9) implemented & tested
  • Docs & Release pipeline finished

Tick each box as the corresponding PR merges with passing CI.

Recent Progress (2025-07-20)

  • Scaffolded speakr-core library crate and added it to the workspace manifest.
  • Added stub implementation (record_to_vec) and constants in speakr-core::audio.
  • Committed failing unit test audio_capture.rs verifying 16 kHz mono stream and placeholders.
  • Workspace compiles; test fails as expected, ready for implementation phase.

Debug Panel Documentation

The Speakr debug panel is a development-only interface that provides debugging tools and testing capabilities. It's designed to help developers test features, monitor system behaviour, and troubleshoot issues during development.

Overview

The debug panel is only available in debug builds (cargo tauri dev) and is completely excluded from release builds for security and performance reasons. It provides a comprehensive debugging interface with real-time logging, feature testing, and system monitoring capabilities.

Accessing the Debug Panel

Availability

  • Debug builds only: The panel is conditionally compiled using #[cfg(debug_assertions)]
  • Toggle button: A red "🛠️ Debug" button appears in the header (debug builds only)
  • Visual indicator: The panel shows a "DEBUG BUILD" badge to remind developers of the build type
  1. Start the application in debug mode: cargo tauri dev
  2. Look for the "🛠️ Debug" button in the top-right corner of the header
  3. Click to toggle between the settings panel and debug panel
  4. The button text changes to "🛠️ Hide Debug" when the panel is active

Features

1. Audio Testing

Legacy Test Button

  • Purpose: Basic audio system testing
  • Behaviour: Click to run a simple audio recording test
  • Feedback: Shows progress in the debug output area

Push-to-Talk Recording

  • Purpose: Test real-time audio recording with push-to-talk interaction
  • Behaviour:
    • Hold the button to start recording
    • Release to stop recording
    • Supports both mouse and touch events
  • Visual feedback:
    • Button changes colour and shows pulsing animation when recording
    • Text updates to show current state
    • Recording state is displayed in system info

2. Logging Console

Real-time Log Display

  • Scrolling console: Shows recent log messages from the backend
  • Auto-scroll: Automatically scrolls to show newest messages (toggleable)
  • Timestamp: Each message includes precise timestamp
  • Source tracking: Shows which component generated each log message

Log Level Filtering

  • Dropdown filter: Filter by specific log levels (TRACE, DEBUG, INFO, WARN, ERROR)
  • Visual indicators: Each level has distinct emoji icons and colours
  • Level-specific styling: Error and warning messages have highlighted backgrounds

Console Controls

  • Refresh: Manually refresh log messages from backend
  • Clear: Clear all log messages from display and backend storage
  • Auto-scroll toggle: Enable/disable automatic scrolling to newest messages

3. System Information

Real-time display of:

  • Build type: Always shows "Debug" in debug panel
  • Environment: Shows "Development"
  • Recording state: Live status of audio recording (Active/Inactive)

Technical Implementation

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend      │    │     Backend      │    │  Log Storage    │
│   (Leptos)      │    │    (Tauri)       │    │   (Memory)      │
├─────────────────┤    ├──────────────────┤    ├─────────────────┤
│ DebugPanel      │◄──►│ debug_* commands │◄──►│ DEBUG_LOG_      │
│ LoggingConsole  │    │ add_debug_log()  │    │ MESSAGES        │
│ Push-to-talk UI │    │ Log collection   │    │ (VecDeque)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Conditional Compilation

The debug panel uses Rust's conditional compilation to ensure it's only included in debug builds:

#[cfg(debug_assertions)]
mod debug;

#[cfg(debug_assertions)]
use crate::debug::DebugPanel;

Backend Commands

All debug commands are prefixed with debug_ and conditionally compiled:

  • debug_test_audio_recording() - Legacy audio test
  • debug_start_recording() - Start push-to-talk recording
  • debug_stop_recording() - Stop push-to-talk recording
  • debug_get_log_messages() - Retrieve stored log messages
  • debug_clear_log_messages() - Clear log message storage

Log Message Storage

Debug logs are stored in memory using a thread-safe circular buffer:

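use std::collections::VecDeque;
use std::sync::{Arc, LazyLock, Mutex};

// Shared in-memory log store, pre-allocated for 1000 entries.
static DEBUG_LOG_MESSAGES: LazyLock<Arc<Mutex<VecDeque<DebugLogMessage>>>> =
    LazyLock::new(|| Arc::new(Mutex::new(VecDeque::with_capacity(1000))));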

Key characteristics:

  • Capacity: Limited to 1000 messages (prevents memory bloat)
  • Thread-safe: Uses Arc<Mutex<>> for concurrent access
  • Circular buffer: Automatically removes old messages when capacity is reached
  • Structured data: Each message includes timestamp, level, target, and content
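Note that `VecDeque::with_capacity` only pre-allocates; it does not cap the length, so the eviction has to happen at insertion time. A minimal sketch of that step follows. `push_bounded` is a hypothetical helper and not necessarily how `add_debug_log` is written.

```rust
use std::collections::VecDeque;

/// Appends `item`, dropping the oldest entries so that the buffer never
/// holds more than `capacity` items.
fn push_bounded<T>(buffer: &mut VecDeque<T>, item: T, capacity: usize) {
    if capacity == 0 {
        return; // nothing can be stored
    }
    while buffer.len() >= capacity {
        buffer.pop_front(); // evict the oldest message first
    }
    buffer.push_back(item);
}
```

In the debug panel this would run while holding the mutex guard, with `capacity` set to 1000.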

Event Handling

Push-to-talk functionality uses multiple event handlers for robust interaction:

on:mousedown=move |_| start_recording()
on:mouseup=move |_| stop_recording()
on:mouseleave=move |_| stop_recording()  // Handles mouse leaving button area
on:touchstart=move |_| start_recording()
on:touchend=move |_| stop_recording()
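Because several of these events can fire for a single gesture (for example mouseleave followed by mouseup), the start and stop handlers should be idempotent. The sketch below models that with a small state holder; `PointerEvent` and `PushToTalk` are hypothetical names used only for illustration, and the counters stand in for the real backend calls.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PointerEvent {
    MouseDown,
    MouseUp,
    MouseLeave,
    TouchStart,
    TouchEnd,
}

#[derive(Debug, Default)]
struct PushToTalk {
    recording: bool,
    starts: u32, // how many times recording was actually started
    stops: u32,  // how many times recording was actually stopped
}

impl PushToTalk {
    fn handle(&mut self, event: PointerEvent) {
        match event {
            PointerEvent::MouseDown | PointerEvent::TouchStart => {
                if !self.recording {
                    self.recording = true;
                    self.starts += 1;
                }
            }
            PointerEvent::MouseUp | PointerEvent::MouseLeave | PointerEvent::TouchEnd => {
                // Ignored when not recording, so duplicate stop events are harmless.
                if self.recording {
                    self.recording = false;
                    self.stops += 1;
                }
            }
        }
    }
}
```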

Development Patterns

Adding New Debug Features

  1. Backend Command:

    #[cfg(debug_assertions)]
    #[tauri::command]
    async fn debug_your_feature() -> Result<String, AppError> {
        add_debug_log(DebugLogLevel::Info, "your-component", "Feature tested");
        // Your implementation
        Ok("Success message".to_string())
    }
  2. Frontend Integration:

    impl DebugManager {
        pub async fn test_your_feature() -> Result<String, String> {
            tauri_invoke_no_args("debug_your_feature")
                .await
                .map_err(|e| format!("Failed to test feature: {e}"))
        }
    }
  3. UI Component:

    <button
        class="debug-btn-primary"
        on:click=move |_| test_your_feature()
    >
        "Test Your Feature"
    </button>
  4. Register Command:

    // Add to debug build handler list
    debug_your_feature,

Logging Best Practices

  1. Use appropriate log levels:

    • Trace: Detailed execution flow
    • Debug: Development information
    • Info: General information
    • Warn: Potential issues
    • Error: Actual errors
  2. Include context:

    add_debug_log(
        DebugLogLevel::Info,
        "component-name",
        &format!("Action completed with result: {result}"),
    );
  3. Target naming:

    • Use consistent component names
    • Follow pattern: speakr-{component} (e.g., speakr-core, speakr-tauri)

Testing Debug Features

Debug features should be tested like any other code:

#[test]
fn test_debug_manager_methods_exist() {
    // Compile-time check: this only builds if the method exists and takes
    // no arguments, so no runtime assertion is needed.
    let _fn: fn() -> _ = DebugManager::test_your_feature;
}

Security Considerations

Build-time Exclusion

  • Debug panel code is completely removed from release builds
  • No performance impact on production builds
  • No security surface area in release builds

Development-only Data

  • Log messages are stored only in memory
  • No persistent storage of debug information
  • Automatic cleanup when application closes

Safe Defaults

  • Mock implementations prevent accidental system access
  • All debug commands return safe, predictable responses
  • Clear visual indicators remind developers of debug mode

Troubleshooting

Debug Panel Not Visible

  • Check build type: Ensure you're running cargo tauri dev, not a release build
  • Look for button: The toggle button appears in the header, not as a separate window
  • Browser cache: If using trunk serve, clear browser cache and reload

Log Messages Not Appearing

  • Click refresh: Use the "🔄 Refresh" button to manually fetch logs
  • Check backend: Ensure debug commands are registered in the invoke handler
  • Memory limit: Log storage is limited to 1000 messages; older messages are automatically removed

Push-to-Talk Not Working

  • Hold, don't click: The button requires holding down, not just clicking
  • Check events: Ensure mouse/touch events are properly handled
  • Visual feedback: Look for button colour change and pulsing animation during recording

Future Enhancements

Potential additions to the debug panel:

  1. Performance Monitoring: CPU, memory usage graphs
  2. Network Activity: Mock API call testing
  3. State Inspection: Real-time application state viewer
  4. Configuration Testing: Dynamic settings modification
  5. Export Functionality: Save debug logs to file
  6. Remote Debugging: WebSocket connection for external debugging tools
  • Frontend: speakr-ui/src/debug.rs - Main debug panel implementation
  • Backend: speakr-tauri/src/lib.rs - Debug commands and log storage
  • Styles: speakr-ui/styles.css - Debug panel CSS styles
  • Types: Log message types and enums
  • Tests: Unit tests for debug functionality

Contributing

When adding debug features:

  1. Follow the established patterns for conditional compilation
  2. Add appropriate logging with meaningful messages
  3. Include tests for new functionality
  4. Update this documentation with new features
  5. Ensure features work in both desktop and mobile layouts

The debug panel is a powerful development tool that should enhance the development experience while maintaining security and performance in production builds.

Tauri Plugins

The following plugins are of interest for this project:

Specifications

This directory contains all functional requirements (FR), non-functional requirements (NFR), and initialisation specifications (INIT) for the Speakr project.

Functional Requirements (FR)

| ID | Name | Report |
| --- | --- | --- |
| FR-1 | Global Hot-key | Implementation Summary |
| FR-2 | Audio Capture | Implementation Summary |
| FR-3 | Transcription | |
| FR-4 | Transcript Injection | |
| FR-5 | Injection Fallback | |
| FR-6 | Settings UI | Implementation Summary |
| FR-7 | Status Events | |
| FR-8 | Settings Persistence | |
| FR-9 | Auto-update | |

Non-Functional Requirements (NFR)

| ID | Name | Report |
| --- | --- | --- |
| NFR-accessibility | Accessibility | |
| NFR-compatibility | Compatibility | |
| NFR-footprint | Footprint | |
| NFR-latency | Latency | |
| NFR-reliability | Reliability | |
| NFR-security | Security | |

Initialisation Specifications (INIT)

| ID | Name | Report |
| --- | --- | --- |
| INIT-01 | Project Scaffold & Initial Structure | |

note

Implementation Reports contain detailed analysis of completed features, including technical decisions, challenges encountered, and verification steps. See reports/ for additional documentation.

FR-1: Global Hot-key

Registers a system-wide hot-key at application start that toggles the record → transcribe → inject flow.

Requirement

  1. The application must register a global hot-key (default ⌥ Option + ~).
  2. Must be active even when Speakr is running in the background.
  3. Pressing the hot-key initiates, in order:
    1. Audio recording
    2. Transcription
    3. Text injection into the current focused field.
  4. The hot-key must be configurable in Settings and warn on conflicts.

Rationale

A single keyboard shortcut lets users capture ideas without context-switching, maintaining focus and flow.

Acceptance Criteria

  • Hot-key can be triggered from any application on macOS 13+.
  • 95th percentile time-to-text ≤ 3 s for 5 s recordings on M-series Macs.
  • 99 % activation success rate in telemetry.
  • Changing the hot-key in Settings updates the registration immediately and prevents duplicates.
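The latency criterion can be checked mechanically once time-to-text samples are collected. The sketch below uses the nearest-rank definition of a percentile, which is an assumption (the specification does not say which definition applies); `percentile` and `meets_latency_budget` are hypothetical helper names.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the 95th percentile of the samples is within `budget`.
fn meets_latency_budget(samples: &[Duration], budget: Duration) -> bool {
    percentile(samples, 95).is_some_and(|p95| p95 <= budget)
}
```

With 20 samples, a single slow outlier does not break the budget under this definition, but two outliers in 21 samples do.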

Test-Driven Design

Follow TDD: write failing automated tests for every case in Test Cases (formerly Acceptance Criteria) before implementation. CI should pass only when the new tests turn green.

References

PRD §6 Functional Requirements – FR-1


date: 2025-07-23
requirement: FR-1-global-hotkey
status: PARTIALLY COMPLETE
prepared_by: o3

Implementation Report: FR-1 - Global Hot-key

Implementation Summary

The backend (speakr-tauri) integrates tauri-plugin-global-shortcut to register a system-wide shortcut at start-up. A default combination (CmdOrCtrl+Alt+Space) is attempted first; if registration fails (for example due to a conflict) a fallback (CmdOrCtrl+Alt+F2) is tried. The registration logic is implemented in GlobalHotkeyService (speakr-tauri/src/services/hotkey.rs) and invoked from speakr-tauri/src/lib.rs inside the setup callback. The service stores the active shortcut behind a mutex and emits a hotkey-triggered Tauri event each time the key is pressed.

Validation utilities (commands::validation::validate_hot_key_internal) together with the HotkeyConfig type (defined in speakr-types) provide parsing and serialisation support. A comprehensive suite of unit tests exercises many shortcut formats, as well as default configuration behaviour and placeholder Tauri integration scenarios.
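The default-then-fallback registration described above can be sketched independently of the plugin by passing the registration step in as a closure. Everything here is illustrative: this `HotkeyError` is a stand-in enum (the real type's variants and payloads are not shown in this report), and `register_with_fallback` is a hypothetical helper rather than the `GlobalHotkeyService` API.

```rust
/// Stand-in error type for this sketch only.
#[derive(Debug, PartialEq)]
enum HotkeyError {
    ConflictDetected(String),
    NoCandidates,
}

/// Tries each candidate shortcut in order and returns the first one that
/// registers successfully, or the last error if none does.
fn register_with_fallback<F>(candidates: &[&str], mut register: F) -> Result<String, HotkeyError>
where
    F: FnMut(&str) -> Result<(), HotkeyError>,
{
    let mut last_error = HotkeyError::NoCandidates;
    for &shortcut in candidates {
        match register(shortcut) {
            Ok(()) => return Ok(shortcut.to_string()),
            Err(error) => last_error = error,
        }
    }
    Err(last_error)
}
```

Returning the shortcut that was actually registered lets the caller store it and report it to the UI, which matters when the fallback rather than the default is active.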

Work Remaining

  1. Trigger pipeline – wire the hotkey-triggered event to the record → transcribe → inject flow (FR-2, FR-3, FR-4).
  2. Settings integration – load a user-defined shortcut from persisted settings at start-up and expose a Tauri command that re-registers it at runtime.
  3. Conflict feedback – propagate HotkeyError::ConflictDetected to the UI so users are warned instantly.
  4. Configurable modifier – change the default shortcut to match the PRD (⌥ Option + ~) and let users restore defaults easily.
  5. Cross-platform assurance – create integration tests with a mocked AppHandle or CI desktop harness to confirm registration works on macOS, Windows and Linux.
  6. Performance metric – measure and emit telemetry needed for the 95th-percentile time-to-text ≤ 3 s requirement (once the pipeline is complete).

Architecture

Sequence – current implementation

sequenceDiagram
    autonumber
    participant OS as Operating System
    participant Plugin as GlobalShortcut plugin<<components>>
    participant Service as GlobalHotkeyService<<process>>
    participant App as Speakr backend (Tauri)<<components>>

    App->Plugin: register(shortcut)
    Plugin->OS: register
    OS-->>Plugin: ok / fail
    Plugin-->>App: result
    OS->>Plugin: *User presses shortcut*
    Plugin->Service: on_shortcut callback
    Service->App: emit "hotkey-triggered" event

Target flow – requirement goal

flowchart TD
    Input["User presses global hot-key"]:::inputOutput --> Shortcut("Registered shortcut"):::components
    Shortcut --> |Tauri event| Record["Audio capture start"]:::process
    Record --> Transcribe["Whisper transcription"]:::process
    Transcribe --> Inject["Text injection into active field"]:::process
    classDef inputOutput fill:#FEE0D2,stroke:#E6550D,color:#E6550D
    classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
    classDef components fill:#E6E6FA,stroke:#756BB1,color:#756BB1

Noteworthy

  • The current default shortcut differs from the PRD specification. A TODO in code highlights the pending pipeline integration.
  • Unit tests follow TDD principles, yet integration tests with the real plugin are still placeholders.

References

FR-2: Audio Capture

Captures microphone input suitable for Whisper transcription.

Requirement

  1. Capture 16 kHz mono audio via the cpal crate.
  2. Default maximum duration 10 s; user-configurable up to 30 s.
  3. Recording stops automatically when the duration limit is reached or the user presses the hot-key again.
  4. Audio is buffered entirely in memory; no files are written to disk.
  5. Handle microphone permission prompts gracefully on first run.
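The duration limits translate directly into a bound on the in-memory buffer. The sketch below is one possible shape for the configuration, assuming out-of-range values are clamped to 1 to 30 seconds (the requirement only states the 10 s default and 30 s maximum); the constants and the `buffer_bytes` helper are illustrative.

```rust
const SAMPLE_RATE_HZ: u32 = 16_000;
const DEFAULT_MAX_SECS: u32 = 10;
const MAX_ALLOWED_SECS: u32 = 30;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RecordingConfig {
    max_duration_secs: u32,
}

impl RecordingConfig {
    /// Clamps the requested duration into the supported range (assumed 1..=30 s).
    fn new(duration: u32) -> Self {
        Self { max_duration_secs: duration.clamp(1, MAX_ALLOWED_SECS) }
    }

    fn max_duration_secs(&self) -> u32 {
        self.max_duration_secs
    }

    /// Upper bound on mono samples captured at 16 kHz.
    fn max_samples(&self) -> usize {
        (SAMPLE_RATE_HZ * self.max_duration_secs) as usize
    }

    /// Memory needed for 16-bit samples at the maximum duration.
    fn buffer_bytes(&self) -> usize {
        self.max_samples() * std::mem::size_of::<i16>()
    }
}

impl Default for RecordingConfig {
    fn default() -> Self {
        Self::new(DEFAULT_MAX_SECS)
    }
}
```

At the default 10 s this is 160,000 samples (320,000 bytes); even the 30 s maximum stays under 1 MB, which is why buffering entirely in memory is practical.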

Rationale

Lower sample-rate mono audio minimises processing cost while meeting Whisper's input requirements.

Acceptance Criteria

  • Recording initialises within 100 ms after hot-key press.
  • Audio stream conforms to 16 kHz, 16-bit, mono.
  • User can change max duration in Settings; value persists across restarts.
  • Recording stops cleanly at limit without crashing or clipping.
  • Permission dialog appears once and records decision.

Test-Driven Design

Adopt test-driven development: begin by writing failing unit/integration tests that assert each Acceptance Criterion. Only then implement capture logic until tests pass in CI.

References

PRD §6 Functional Requirements – FR-2


date: 2025-07-23
requirement: FR-2
status: PARTIALLY COMPLETE
prepared_by: o4-mini


Implementation Report: FR-2 - Audio Capture

Implementation Summary

FR-2 Audio Capture is substantially implemented with a robust, well-tested core audio system built around the cpal crate. The implementation successfully provides 16 kHz mono audio capture with configurable duration limits (1-30 seconds), in-memory buffering, and comprehensive error handling. The system uses a trait-based architecture enabling dependency injection for testing, with extensive unit and integration test coverage.

The core functionality is implemented in speakr-core/src/audio/mod.rs with the AudioRecorder struct providing the main API. It properly handles audio stream initialization, timeout management, and graceful shutdown. Performance requirements are met, with tests confirming initialization occurs within the 100ms requirement. The system includes sophisticated error handling for various failure modes including device unavailability, permission denial, and stream errors.

Work Remaining

  • Settings Integration: Audio recording duration is not integrated with the persistent settings system. Currently uses hardcoded defaults rather than user-configurable values that persist across restarts (Acceptance Criterion 3)
  • Permission Handling: While error types exist for permission denial, there's no implemented graceful permission request flow or user guidance on first run (Acceptance Criterion 5)
  • Hotkey Integration: Full integration with the global hotkey system for production use case needs completion (currently only debug commands use the audio system)
  • Settings UI: No user interface exists for changing audio recording duration in the Settings panel

Architecture

sequenceDiagram
    participant HK as Global Hotkey
    participant AR as AudioRecorder
    participant AS as AudioSystem (cpal)
    participant CS as CpalAudioStream
    participant TO as Timeout Task

    HK->>AR: start_recording()
    AR->>AR: Check if already recording
    AR->>AS: start_recording(config)
    AS->>CS: Create audio stream
    CS->>CS: Initialize cpal stream
    CS-->>AS: Return stream handle
    AS-->>AR: Return AudioStream
    AR->>TO: Spawn timeout task
    AR-->>HK: Recording started

    Note over CS: Continuously capture<br/>16kHz mono samples

    alt Manual Stop
        HK->>AR: stop_recording()
    else Timeout
        TO->>CS: stream.stop()
    end

    AR->>CS: get_samples()
    CS-->>AR: Vec<i16> samples
    AR-->>HK: RecordingResult

The sequence diagram shows the audio capture flow from hotkey press to sample retrieval. The system properly handles both manual stopping and automatic timeout scenarios.

classDiagram
    class AudioRecorder {
        -state: Arc~Mutex~Option~RecordingState~~~
        -audio_system: Box~dyn AudioSystem~
        +new(config: RecordingConfig) AudioRecorder
        +start_recording() Result~(), AudioCaptureError~
        +stop_recording() Result~RecordingResult, AudioCaptureError~
        +is_recording() bool
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }

    class AudioSystem {
        <<trait>>
        +start_recording(config: &RecordingConfig) Result~Box~dyn AudioStream~, AudioCaptureError~
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }

    class CpalAudioSystem {
        -host: cpal::Host
        +new() Result~Self, AudioCaptureError~
    }

    class AudioStream {
        <<trait>>
        +get_samples() Vec~i16~
        +stop()
        +is_active() bool
    }

    class CpalAudioStream {
        -samples: Arc~Mutex~Vec~i16~~~
        -is_recording: Arc~AtomicBool~
    }

    class RecordingConfig {
        -max_duration_secs: u32
        +new(duration: u32) Self
        +max_duration_secs() u32
        +max_samples() usize
    }

    AudioRecorder --> AudioSystem
    CpalAudioSystem ..|> AudioSystem
    CpalAudioSystem --> CpalAudioStream
    CpalAudioStream ..|> AudioStream
    AudioRecorder --> RecordingConfig

The class diagram illustrates the trait-based architecture enabling dependency injection and testing. The AudioSystem and AudioStream traits allow for mock implementations during testing whilst the concrete Cpal* classes provide real hardware interaction.

stateDiagram-v2
    [*] --> Idle
    Idle --> Initializing : start_recording()
    Initializing --> Recording : Stream created successfully
    Initializing --> Error : Device/Permission error
    Recording --> Stopping : Manual stop / Timeout
    Stopping --> Idle : Samples extracted
    Error --> Idle : Error handled

    state Recording {
        [*] --> Capturing
        Capturing --> Capturing : Accumulate samples
    }

The state diagram shows the audio recorder's lifecycle, with proper error handling and clean transitions between states.
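The transitions in the diagram can be written as a pure function, which makes illegal transitions explicit and easy to test. This is a sketch only: the enum and event names below are hypothetical, and the actual recorder keeps its state in an `Option<RecordingState>` behind a mutex rather than in an enum like this one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecorderState {
    Idle,
    Initializing,
    Recording,
    Stopping,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RecorderEvent {
    StartRequested,
    StreamCreated,
    StreamFailed,
    StopRequested,
    TimedOut,
    SamplesExtracted,
    ErrorHandled,
}

/// Returns the next state, or `None` when the event is not valid in `state`.
fn next_state(state: RecorderState, event: RecorderEvent) -> Option<RecorderState> {
    match (state, event) {
        (RecorderState::Idle, RecorderEvent::StartRequested) => Some(RecorderState::Initializing),
        (RecorderState::Initializing, RecorderEvent::StreamCreated) => Some(RecorderState::Recording),
        (RecorderState::Initializing, RecorderEvent::StreamFailed) => Some(RecorderState::Error),
        (RecorderState::Recording, RecorderEvent::StopRequested | RecorderEvent::TimedOut) => {
            Some(RecorderState::Stopping)
        }
        (RecorderState::Stopping, RecorderEvent::SamplesExtracted) => Some(RecorderState::Idle),
        (RecorderState::Error, RecorderEvent::ErrorHandled) => Some(RecorderState::Idle),
        _ => None,
    }
}
```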

Noteworthy

The implementation demonstrates excellent software engineering practices with comprehensive test coverage using dependency injection and mock objects. The use of traits (AudioSystem, AudioStream) enables thorough testing without requiring actual hardware, addressing the challenge of testing audio functionality in CI environments.

Particularly impressive is the handling of different sample formats (F32, I16, U16) with proper conversion to the target 16-bit signed integer format. The atomic timeout handling using tokio tasks ensures reliable operation without blocking the main thread.
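For reference, a conversion of this kind might look like the sketch below. It is not the project's code: the function names are hypothetical, and scaling floats by `i16::MAX` (rather than 32768) is one common convention among several.

```rust
/// Converts a float sample in [-1.0, 1.0] to signed 16-bit.
/// Out-of-range input is clamped first to avoid wrap-around.
fn f32_to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * f32::from(i16::MAX)).round() as i16
}

/// Converts an unsigned 16-bit sample (midpoint 32768) to signed 16-bit
/// by shifting the zero level.
fn u16_to_i16(sample: u16) -> i16 {
    (i32::from(sample) - 32_768) as i16
}
```

The subtraction for `u16` is done in `i32` so that the intermediate value cannot overflow before the final narrowing cast.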

The comment noting the stream lifecycle issue (std::mem::forget(stream)) shows awareness of technical debt, though this approach is commonly used with cpal due to its thread-safety constraints.

  • FR-3: Transcription (consumes audio samples from FR-2)
  • FR-8: Settings Persistence (should store audio duration preference)
  • FR-1: Global Hotkey (triggers audio capture)

References

FR-3: Transcription

Offline transcription of recorded audio to text using Whisper.

Requirement

  1. Use whisper-rs to run Whisper (GGUF) models entirely on-device.
  2. Default language: English (en). Allow user language selection in Settings.
  3. Transcription must complete within 3 s (95th percentile) for 5-second recordings on Apple Silicon with the small model.
  4. Support user-selectable model sizes for latency/accuracy trade-off.
  5. No external network calls during transcription.

Rationale

On-device inference preserves privacy and removes network latency, achieving the product's privacy-first promise.

Acceptance Criteria

  • Transcription completes within latency budget on M1 and Intel reference machines.
  • Selecting a different model in Settings updates the engine without restart.
  • No outbound network traffic observed via packet capture.
  • Errors (e.g. model missing) surface in UI overlay/log with actionable message.

Test-Driven Design

Begin with failing automated tests for latency, language selection, and network isolation. Implement transcription until all tests pass, following TDD.

References

PRD §6 Functional Requirements – FR-3

FR-4: Transcript Injection

Types the transcribed text into the currently focused input field.

Requirement

  1. Use the enigo crate to emit synthetic keystrokes that reproduce the transcription exactly as plain text.
  2. Injection must preserve line breaks and punctuation.
  3. Injection must run on the main UI thread to respect macOS accessibility APIs.
  4. Provide feedback event (e.g. Injected) to UI overlay/log once complete.

Rationale

Typing text directly avoids clipboard usage and works in most applications, maintaining the illusion of native typing.

Acceptance Criteria

  • For a 100-character transcript, injection latency ≤ 300 ms.
  • Typed characters match transcription byte-for-byte.
  • Works in common editors (VS Code, Xcode, Pages, Safari).
  • Emits completion event for downstream UI.

Test-Driven Design

Write failing integration tests measuring injection latency and correctness across target editors. Deliver code to satisfy the tests.

References

PRD §6 Functional Requirements – FR-4

FR-5: Injection Fallback

Clipboard-paste fallback when keystroke injection is blocked.

Requirement

  1. Detect secure text fields or injection failure (e.g. enigo error).
  2. Copy transcript to clipboard and simulate ⌘V paste as fallback.
  3. Display transient warning overlay: “Secure field detected – text pasted via clipboard.”
  4. Restore previous clipboard contents after paste to respect user data.
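
The fallback sequence in the requirement above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the `KeystrokeInjector` and `Clipboard` traits, the `InjectError` and `Outcome` types, and the in-memory test doubles are all hypothetical names introduced here, standing in for enigo and the macOS pasteboard.

```rust
/// Hypothetical error type for a failed injection attempt.
#[derive(Debug, PartialEq)]
pub enum InjectError {
    SecureField,
    Backend(String),
}

/// Hypothetical abstraction over the keystroke backend (e.g. a thin wrapper around enigo).
pub trait KeystrokeInjector {
    fn type_text(&mut self, text: &str) -> Result<(), InjectError>;
}

/// Hypothetical abstraction over the system clipboard and the paste shortcut.
pub trait Clipboard {
    fn get(&self) -> Option<String>;
    fn set(&mut self, contents: Option<String>);
    fn paste(&mut self) -> Result<(), InjectError>;
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Typed,
    PastedViaClipboard,
}

/// Try keystroke injection first; on any failure, paste via the clipboard
/// and put the previous clipboard contents back afterwards.
pub fn inject_with_fallback(
    injector: &mut dyn KeystrokeInjector,
    clipboard: &mut dyn Clipboard,
    transcript: &str,
) -> Result<Outcome, InjectError> {
    if injector.type_text(transcript).is_ok() {
        return Ok(Outcome::Typed);
    }
    let previous = clipboard.get();
    clipboard.set(Some(transcript.to_owned()));
    let pasted = clipboard.paste();
    // Restore even when the paste itself failed, so the transcript never lingers.
    clipboard.set(previous);
    pasted.map(|()| Outcome::PastedViaClipboard)
}

/// Test double: an injector that always reports a secure field.
pub struct BlockedInjector;

impl KeystrokeInjector for BlockedInjector {
    fn type_text(&mut self, _text: &str) -> Result<(), InjectError> {
        Err(InjectError::SecureField)
    }
}

/// Test double: a clipboard held in memory that records what was pasted.
pub struct MemoryClipboard {
    pub contents: Option<String>,
    pub pasted: Vec<String>,
}

impl Clipboard for MemoryClipboard {
    fn get(&self) -> Option<String> {
        self.contents.clone()
    }
    fn set(&mut self, contents: Option<String>) {
        self.contents = contents;
    }
    fn paste(&mut self) -> Result<(), InjectError> {
        if let Some(text) = &self.contents {
            self.pasted.push(text.clone());
        }
        Ok(())
    }
}
```

A real implementation would probably need a short delay between the paste and the restore so the target application can read the clipboard; the sketch omits it.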

Rationale

Some password or secure fields block synthetic keystrokes. A controlled clipboard fallback ensures functionality while informing the user.

Acceptance Criteria

  • 100 % success rate pasting into macOS secure text fields (Safari password prompt as test).
  • Previous clipboard restored within 500 ms after paste.
  • Warning overlay disappears automatically after 3 s.
  • No sensitive transcript retained on clipboard after restore.

Test-Driven Design

Craft failing tests for secure-field detection, clipboard restoration, and overlay timing. Implement fallback logic until tests succeed.

References

PRD §6 Functional Requirements – FR-5

FR-6: Settings UI

Provides a graphical interface (tray or window) for user configuration.

Requirement

  1. Expose configuration for:
    • Global hot-key picker
    • Model selector (small, medium, large GGUF)
    • Auto-launch on login toggle
  2. Implemented as a Tauri window accessible from the menu bar/tray.
  3. Validate hot-key conflicts and model availability.
  4. Preference changes take effect without restarting the app.

Rationale

A minimal settings UI keeps the main workflow keyboard-first while allowing deeper configuration when needed.

Acceptance Criteria

  • Opening Settings from tray displays window within 200 ms.
  • Changing options updates behaviour immediately (e.g. new hot-key active).
  • Invalid configurations (missing model file) display inline errors.
  • Settings persist after app restart.

Test-Driven Design

Define unit/UI tests for each settings control and validation rule before coding. Implementation is complete when all tests pass.

References

PRD §6 Functional Requirements – FR-6


date: 2025-07-23
requirement: FR-6
status: PARTIALLY COMPLETE
prepared_by: gpt-4.1

Implementation Report: FR-6 - Settings UI

Implementation Summary

The SettingsPanel Leptos component serves as the primary settings interface for Speakr. On launch, it invokes the Tauri load_settings command to retrieve persisted AppSettings and renders:

  • Global hot-key configuration: Real-time validation via the validate_hot_key Tauri command, un/registration through the global-shortcut plugin, and persistence via save_settings.
  • Model selection: Radio options for small, medium, and large Whisper models, availability checks using check_model_availability, disabling unavailable models, and immediate persistence.
  • Auto-launch toggle: Uses the set_auto_launch Tauri command and calls save_settings on change.

All changes trigger save_settings and display inline success or error messages. Backend persistence is handled atomically in settings/persistence.rs. Hot-key and auto-launch preferences apply at runtime without restarting the app.

Work Remaining

  • Add a system tray icon and a “Settings” menu item to open or focus the settings window.
  • Implement Tauri system_tray integration and event handling in run() to show/hide the settings window.
  • Enable dynamic transcription-model reload in the backend when model_size changes, without requiring a restart.
  • Develop unit/UI tests for each settings control and validation path (hot-key, model selection, auto-launch).
  • Measure and optimise settings window startup to meet the <200 ms opening requirement.
  • Enhance the hot-key picker with an interactive key-capture control instead of free-text input.
  • Display inline errors for model selection failures (e.g., missing or corrupt model files).
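
One way the outstanding dynamic model reload could work is to keep the active model behind a lock and swap it when `model_size` changes. The sketch below only illustrates that idea: `ModelSize`, `LoadedModel`, `EngineHandle`, and the `load` callback are hypothetical names, and the actual Whisper loading is left out.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical mirror of the three model options offered in Settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelSize {
    Small,
    Medium,
    Large,
}

/// Placeholder for a loaded Whisper model; a real one would own the weights.
#[derive(Debug)]
pub struct LoadedModel {
    pub size: ModelSize,
}

/// Holds the active model so it can be replaced while the app keeps running.
pub struct EngineHandle {
    current: RwLock<Arc<LoadedModel>>,
}

impl EngineHandle {
    pub fn new(initial: LoadedModel) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Cheap snapshot of the active model; in-flight work keeps its own Arc.
    pub fn current(&self) -> Arc<LoadedModel> {
        self.current.read().unwrap().clone()
    }

    /// Load the new model before taking the write lock, then swap it in.
    /// If loading fails, the previous model stays active.
    pub fn reload<E>(
        &self,
        size: ModelSize,
        load: impl FnOnce(ModelSize) -> Result<LoadedModel, E>,
    ) -> Result<(), E> {
        let fresh = Arc::new(load(size)?);
        *self.current.write().unwrap() = fresh;
        Ok(())
    }
}
```

Because the load happens outside the lock, a transcription that started with the old model can finish undisturbed, and a missing model file leaves the old model in place so the error can be reported inline.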

Architecture

Sequence Diagram

sequenceDiagram
  participant UI as "SettingsPanel"
  participant Backend as "Tauri Backend"
  participant FS as "File System"

  UI->>Backend: load_settings()
  Backend->>FS: load_settings_from_dir()
  FS-->>Backend: AppSettings
  Backend-->>UI: AppSettings
  UI->>UI: render settings

  UI->>Backend: validate_hot_key(newHotkey)
  Backend-->>UI: Ok
  UI->>UI: register_global_shortcut

  UI->>Backend: save_settings(AppSettings)
  Backend->>FS: save_settings_to_dir()
  FS-->>Backend: Ok
  Backend-->>UI: Ok

Flowchart

flowchart TD
  A["User modifies setting"] --> B["UI captures change"]
  B --> C{"Validate input"}
  C -->|Valid| D["Invoke Tauri command"]
  C -->|Invalid| E["Show validation error"]
  D --> F["Persist settings via backend"]
  F --> G["Display success or error message"]
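
The validate-then-persist flow in the flowchart can be illustrated with a small sketch. The validation rule used here (at least one modifier plus exactly one other key, written as `Modifier+Key`) is an assumption made for the example, as are the names `validate_hot_key_sketch`, `apply_hot_key_change`, and the `MODIFIERS` list; the real `validate_hot_key` command may apply different rules.

```rust
/// Modifier names accepted by this sketch (an assumption, not Speakr's actual list).
const MODIFIERS: [&str; 6] = ["cmd", "ctrl", "cmdorctrl", "alt", "option", "shift"];

/// Accepts strings such as "CmdOrCtrl+Alt+Space": one or more modifiers
/// followed by or mixed with exactly one non-modifier key.
pub fn validate_hot_key_sketch(hot_key: &str) -> Result<(), String> {
    let mut modifiers = 0;
    let mut keys = 0;
    for part in hot_key.split('+').map(str::trim) {
        if part.is_empty() {
            return Err("hot-key contains an empty segment".to_string());
        }
        if MODIFIERS.contains(&part.to_ascii_lowercase().as_str()) {
            modifiers += 1;
        } else {
            keys += 1;
        }
    }
    match (modifiers, keys) {
        (0, _) => Err("hot-key needs at least one modifier".to_string()),
        (_, 1) => Ok(()),
        _ => Err("hot-key needs exactly one non-modifier key".to_string()),
    }
}

/// Mirrors the flowchart: validate, persist through the supplied callback,
/// then return the message the UI would display inline.
pub fn apply_hot_key_change(
    new_hot_key: &str,
    persist: impl FnOnce(&str) -> Result<(), String>,
) -> String {
    match validate_hot_key_sketch(new_hot_key).and_then(|()| persist(new_hot_key)) {
        Ok(()) => format!("Hot-key set to {}", new_hot_key),
        Err(reason) => format!("Could not set hot-key: {}", reason),
    }
}
```

Passing persistence in as a callback keeps the validation logic testable without a Tauri runtime or a settings file.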

Noteworthy

N/A

Related Requirements

  • FR-1 Global Hotkey
  • FR-3 Transcription
  • FR-8 Settings Persistence

References

  • speakr-ui/src/settings.rs
  • speakr-tauri/src/lib.rs
  • speakr-tauri/tauri.conf.json

FR-7: Status Events

Emit real-time status updates for UI overlays and logging.

Requirement

  1. Broadcast status events: Recording, Transcribing, Injected, Error (variants).
  2. Events emitted over an internal async channel consumable by UI components and log subsystem.
  3. Include timestamp and optional payload (e.g. error message).
  4. Provide public Rust API subscribe_status() for other components.
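
A minimal sketch of such an event bus is shown below. It uses the standard library's blocking `mpsc` channels as a stand-in for the async channel the requirement calls for, and apart from `subscribe_status()` every name (`Status`, `StatusEvent`, `StatusBus`, `emit`) is hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::time::SystemTime;

/// The four status variants listed in the requirement; Error carries a payload.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Recording,
    Transcribing,
    Injected,
    Error(String),
}

/// A status plus the time it was emitted.
#[derive(Debug, Clone)]
pub struct StatusEvent {
    pub status: Status,
    pub timestamp: SystemTime,
}

/// Fan-out bus: every subscriber gets its own channel and sees every event.
#[derive(Clone, Default)]
pub struct StatusBus {
    subscribers: Arc<Mutex<Vec<Sender<StatusEvent>>>>,
}

impl StatusBus {
    /// Register a new consumer (overlay, logger, ...).
    pub fn subscribe_status(&self) -> Receiver<StatusEvent> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Stamp the event and deliver it to all live subscribers, in order.
    pub fn emit(&self, status: Status) {
        let event = StatusEvent { status, timestamp: SystemTime::now() };
        // Subscribers whose receiver was dropped are removed here.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

Giving each subscriber its own channel means a slow consumer cannot cause another one to miss events, which is relevant to the no-missed-events criterion; an async runtime's broadcast channel would be the closer match in the real application.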

Rationale

A decoupled event system lets the overlay and future extensions react without tight coupling to business logic.

Acceptance Criteria

  • Overlay reflects status within 50 ms of event emission.
  • Logs capture all events with accurate timestamps.
  • No missed or duplicated events observed in 1-hour monkey test (500 invocations).

Test-Driven Design

Start with failing tests subscribing to the event channel and asserting delivery guarantees (latency, ordering, no duplicates). Implement until green.

References

PRD §6 Functional Requirements – FR-7

FR-8: Settings Persistence

Persist user preferences locally without cloud sync.

Requirement

  1. Store settings in a JSON file located in the platform-appropriate app data directory
     ($HOME/Library/Application Support/Speakr/settings.json).
  2. Write changes atomically to avoid corruption.
  3. Migration framework supports future schema evolution with versioning.
  4. No data leaves the device.
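
Atomic writes are commonly done by writing to a temporary file in the same directory and renaming it over the target. The sketch below shows that technique together with a fall-back-to-defaults load. The function names `write_atomically` and `load_or_default` are hypothetical, and JSON validation is delegated to a caller-supplied closure because the standard library has no JSON parser.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to a sibling temp file, flush it to disk, then rename it
/// over `path`. Readers see either the old file or the new one, never a
/// half-written mix, as long as both paths are on the same filesystem.
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    // e.g. settings.json -> settings.json.tmp (naming is an assumption)
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}

/// Return the stored settings text, or `default` when the file is missing,
/// unreadable, or rejected by `is_valid` (for example a JSON parse check).
pub fn load_or_default(path: &Path, is_valid: impl Fn(&str) -> bool, default: &str) -> String {
    match fs::read_to_string(path) {
        Ok(text) if is_valid(&text) => text,
        _ => default.to_string(),
    }
}
```

Keeping the temp file next to the target matters because a rename across filesystems is not atomic.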

Rationale

Local persistence offers instant access, privacy, and offline capability.

Acceptance Criteria

  • Settings file created on first launch with defaults.
  • Modifying settings updates file within 100 ms.
  • Corrupt settings file triggers automatic recovery to defaults.
  • Unit tests cover load/save error paths.

Test-Driven Design

Write failing unit tests for load/save, corruption recovery, and migration before implementation; pass them in CI.

References

PRD §6 Functional Requirements – FR-8

FR-9: Auto-update

Provide optional self-update via GitHub Releases.

Requirement

  1. When enabled, periodically (daily) check GitHub Releases for a newer version tag.
  2. Use secure download (HTTPS) and verify code signature / hash before install.
  3. Prompt user with Release Notes and require confirmation before applying update.
  4. Allow users to disable auto-update in Settings.
  5. Feature optional in v1; must degrade gracefully when disabled.
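
Deciding whether a release tag is newer than the running version can be sketched as below. The example assumes tags follow a plain `vMAJOR.MINOR.PATCH` scheme, which the requirement does not state, and the function names are hypothetical; fetching the tag and verifying the download are out of scope here.

```rust
/// Parse "v1.2.3" or "1.2.3" into (major, minor, patch).
/// Anything else, including pre-release suffixes, yields None.
pub fn parse_version_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let trimmed = tag.trim();
    let trimmed = trimmed.strip_prefix('v').unwrap_or(trimmed);
    let mut parts = trimmed.split('.').map(|p| p.parse::<u64>().ok());
    // The first `?` handles a missing component, the second a non-numeric one.
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// True only when both versions parse and the release is strictly newer.
/// Unparseable tags are treated as "no update" so a bad tag cannot trigger one.
pub fn is_newer(current: &str, latest_tag: &str) -> bool {
    match (parse_version_tag(current), parse_version_tag(latest_tag)) {
        (Some(cur), Some(latest)) => latest > cur,
        _ => false,
    }
}
```

Comparing the parsed tuples rather than the raw strings avoids the classic mistake of ranking `1.10.0` below `1.2.3`.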

Rationale

Easy updates encourage users to stay on the latest version, reducing support burden and delivering security fixes.

Acceptance Criteria

  • Update check runs off main thread; no UI freeze.
  • Failed update check logs but does not crash application.
  • Downloaded binary passes macOS notarisation verification.
  • User can opt-out entirely; no network calls when disabled.

Test-Driven Design

Begin with failing integration tests that simulate update availability, download verification,

References

PRD §6 Functional Requirements – FR-9

INIT-01: Project Scaffold & Initial Structure

Define the baseline repository layout, build tooling, and development workflows for Speakr.

Requirement

  1. Workspace Layout (multi-crate)
    • speakr-core/ – pure Rust library (record → transcribe → inject).
    • speakr-tauri/ – Tauri desktop shell; contains src-tauri/ and embeds Leptos frontend by default.
    • speakr-ui/ – optional standalone Leptos UI crate (only if the UI is fully separated).
    • models/ – user-downloaded GGUF Whisper models (git-ignored).
    • docs/ – architecture, PRD, and spec docs (this folder).
    • nix/ – flakes, overlays, devenv.nix, CI helpers.
    • scripts/ – one-off dev scripts (lint, release, etc.).
    • Root-level Cargo.toml / Cargo.lock defining a [workspace] with members.
  2. Build Tooling
    • Use Cargo workspace to manage crates and enable incremental rebuilds.
    • Root-level Nix flake + devenv.nix for reproducible shells.
    • Trunk.toml (in speakr-tauri/) bundles static assets for the WebView.
  3. CI / CD
    • GitHub Actions workflow for: lint (rustfmt, clippy), test, macOS build, docs build.
    • Release workflow signs and notarises macOS DMG.
  4. Linters & Hooks
    • Pre-commit config: rustfmt, markdownlint, shellcheck, nixpkgs-fmt.
  5. Documentation Site
    • mdBook in docs/book/ published via GitHub Pages.
  6. Version Control Hygiene
    • .gitignore excludes target, model files, and local config overrides.

Rationale

A consistent scaffold accelerates onboarding, enforces build reproducibility, and aligns with the project’s privacy-first & cross-platform goals.

Acceptance Criteria

  • Fresh clone followed by devenv shell (or devenv up) yields a working shell with cargo,
    tauri, and mdbook available.
  • cargo test passes with placeholder tests.
  • npm run tauri dev (via Trunk) launches stub window.
  • GitHub Actions green on lint + test.
  • mdbook serve builds documentation without errors.

Migration Steps (from mono-crate → multi-crate)

  1. Create workspace file

    # At repo root (printf is used because echo does not expand \n in every shell)
    printf '[workspace]\nmembers = ["speakr-core", "speakr-tauri", "speakr-ui"]\n' > Cargo.toml
    
  2. Scaffold core crate

    cargo new --lib speakr-core
    mv src/*.rs speakr-core/src/          # move existing logic
    rm -rf src/
    
  3. Scaffold Tauri crate

    cargo tauri init --template leptos speakr-tauri
    # move existing src-tauri/ into speakr-tauri/
    mv src-tauri speakr-tauri/
    
  4. Wire dependency: in speakr-tauri/Cargo.toml add:

    speakr-core = { path = "../speakr-core" }
    
  5. (Optional) Separate UI crate

    cargo new --lib speakr-ui
    mv speakr-tauri/src-leptos/* speakr-ui/src/
    # then depend on speakr-ui from speakr-tauri via WASM asset pipeline
    
  6. Update paths in code & imports.

  7. Run tests & build

    cargo test --workspace
    cargo tauri dev -p speakr-tauri
    
  8. CI / Nix – update workflows and devenv.nix to use --workspace.

Completion of these steps should yield the new structure with all tests & tauri dev working.

NFR: Accessibility

Comply with macOS accessibility guidelines.

Requirement

  • UI elements (overlay, settings) must be VoiceOver readable.
  • Support high-contrast mode and respect user font scaling preferences.
  • Achieve Apple Accessibility Inspector score ≥ 85.

Rationale

Ensures inclusivity for users with visual impairments or other accessibility needs.

Acceptance Criteria

  • VoiceOver reads overlay status changes accurately.
  • High-contrast mode renders UI with sufficient contrast ratios (> 4.5:1).
  • Automated accessibility audit (axe-core) passes with no critical violations.

Test-Driven Design

Introduce automated accessibility audits (axe-core, VoiceOver scripts) in CI before fixing violations.

References

PRD §7 Non-Functional Requirements – Accessibility

NFR: Compatibility

Operate across supported macOS versions and CPU architectures.

Requirement

  • Support macOS 13+ on Apple Silicon and Intel Macs.
  • Intel Macs may experience doubled latency but must remain functional.

Rationale

Wider OS support increases addressable market while retaining acceptable performance.

Acceptance Criteria

  • Manual QA passes on Intel MBP 2020 (macOS 13).
  • Automated smoke test on GitHub Actions Intel runner passes.
  • Latency SLA documented separately for Intel.

Test-Driven Design

Add failing cross-arch smoke tests to CI runners before porting; success criteria met when tests pass on Intel and Apple Silicon.

References

PRD §7 Non-Functional Requirements – Compatibility

NFR: Footprint

Constrain binary size and runtime memory usage.

Requirement

  • Universal macOS binary size ≤ 20 MB (excluding model files).
  • Peak RSS ≤ 400 MB including model during standard transcription workload.

Rationale

A lightweight application reduces download size, disk usage and keeps memory pressure low on older devices.

Acceptance Criteria

  • du -h on release DMG shows ≤ 20 MB binary.
  • Runtime memory measured via Activity Monitor stays ≤ 400 MB during 30 s monkey test.

Test-Driven Design

Add failing size and memory regression tests into CI before implementation tweaks.
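
A size regression check can be as small as the sketch below. It interprets the 20 MB budget as 20 × 1024 × 1024 bytes, which is an assumption, and the constant and function names are hypothetical; a memory check would need platform-specific tooling and is not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The 20 MB budget from the requirement, read here as mebibytes (an assumption).
pub const BINARY_BUDGET_BYTES: u64 = 20 * 1024 * 1024;

/// Report whether the file at `path` fits within `budget_bytes`.
/// A missing or unreadable file is surfaced as an error rather than a pass.
pub fn within_size_budget(path: &Path, budget_bytes: u64) -> io::Result<bool> {
    Ok(fs::metadata(path)?.len() <= budget_bytes)
}
```

In CI this could run against the built release binary, for instance from a test that asserts `within_size_budget(binary_path, BINARY_BUDGET_BYTES)` is `Ok(true)`, where `binary_path` is wherever the build places the artifact.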

References

PRD §7 Non-Functional Requirements – Footprint

NFR: Latency

Ensure low end-to-end latency from hot-key activation to text injection.

Requirement

  • 95th percentile time-to-text ≤ 3 s for a 5-second audio clip on Apple Silicon (M1) using the
    small Whisper model.
  • Latency measured in release (optimised) builds with all background services running.

Rationale

Sub-3-second latency preserves conversational flow and competitive advantage over cloud dictation.

Acceptance Criteria

  • Automated telemetry logs latency for every invocation.
  • CI latency test passes on GitHub Actions M1 runner.
  • Performance regression test fails build if P95 > 3 s.

Test-Driven Design

Create automated performance tests that measure P95 latency; commit them before optimising the code.
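
The P95 check at the heart of such a test might look like the sketch below. It uses the nearest-rank definition of a percentile, which is one of several common definitions and an assumption here; the function names are hypothetical, and collecting the latency samples is left to the test harness.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a percentile above 100.
pub fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based rank = ceil(pct / 100 * n), computed in integers to avoid rounding drift.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True when the 95th percentile of `samples` is within `budget`.
/// An empty sample set fails, so a test that measured nothing cannot pass.
pub fn meets_latency_budget(samples: &[Duration], budget: Duration) -> bool {
    matches!(percentile(samples, 95), Some(p95) if p95 <= budget)
}
```

With 20 samples the 95th percentile is the 19th smallest value, so a single slow outlier is tolerated while two are not.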

References

PRD §7 Non-Functional Requirements – Latency

NFR: Reliability

Maintain stability across heavy usage.

Requirement

  • Application must run 1-hour monkey test (500 invocations) with zero crashes.
  • Recover gracefully from errors (audio device unavailable, model missing).

Rationale

High reliability builds user trust and reduces support overhead.

Acceptance Criteria

  • CI integration test simulates 500 sequential hot-key invocations without crash.
  • Error conditions logged and surfaced via Status Events.

Test-Driven Design

Introduce a failing soak-test (500 invocations) in CI first; stabilise code until it passes consistently.

References

PRD §7 Non-Functional Requirements – Reliability

NFR: Security

Prevent unintended data leakage and maintain user privacy.

Requirement

  • No outbound network connections except optional auto-update domain.
  • Hardened runtime & proper code-signing for macOS notarisation.
  • Microphone access prompt shown once and justification provided.

Rationale

Privacy-first positioning requires strict control over network activity and OS security policies.

Acceptance Criteria

  • Static analysis shows no runtime socket creation beyond update URL when enabled.
  • Application passes Apple notarisation & gatekeeper checks.
  • Firewall test (Little Snitch) reveals no unexpected traffic.

Test-Driven Design

Write security unit tests (e.g., socket mocks) and notarisation validation scripts before code changes; CI must enforce them.

References

PRD §7 Non-Functional Requirements – Security


date: {YYYY-MM-DD} requirement: {Requirement-ID} status:

Implementation Report: {Requirement-ID} -

Implementation Summary

For completed and partially completed requirements, 1-2 paragraphs explaining: - How the implementation works overall - Specific behaviours of note - Control and data flow(s) - Other significant details as appropriate

Work Remaining

(N/A for Complete requirements) Itemised list of specific work required for the requirement to be completed.

Architecture

One or more Mermaid diagrams, include ALL applicable to the requirement:

Each diagram should be preceded by a ### Title and a short summary of what the diagram shows, and any clarifying remarks (if anything is not self-evident from the diagram). Diagrams should be embedded using a mermaid code fence.

Noteworthy

(Discretionary section, N/A if not relevant) Discussion about any especially interesting details about the implementation, or insights related to it.

Related Requirements

  • REQ1-ID Related Requirement 1 Name
  • REQ2-ID Related Requirement 2 Name
  • ...

References

Speakr-Tauri lib.rs Refactoring Plan

Current State Analysis

The speakr-tauri/src/lib.rs file has grown to 1,913 lines and contains multiple responsibilities that should be separated for better maintainability.

Current File Composition

  • Lines 1-27: Imports and use statements
  • Lines 29-87: Debug-only types and static storage
  • Lines 89-255: Settings management utilities
  • Lines 256-456: GlobalHotkeyService implementation
  • Lines 457-600: Tauri command functions
  • Lines 601-950: Audio functionality helpers
  • Lines 951-1100: Additional utility functions
  • Lines 1732-1830: BackendStatusService implementation
  • Lines 1831-1913: Main run function and setup
  • Lines 1400+: Extensive test module (500+ lines)

Proposed Refactoring Structure

1. Move Tests to Separate Files

Target: Extract all tests from lib.rs into dedicated test files

  • Current: 500+ lines of tests in mod tests

  • New Structure:

    speakr-tauri/tests/
    ├── settings_tests.rs       # Settings save/load/migration tests
    ├── hotkey_tests.rs         # GlobalHotkeyService tests
    ├── status_tests.rs         # BackendStatusService tests
    ├── audio_tests.rs          # Audio recording/file tests
    ├── commands_tests.rs       # Tauri command tests
    └── integration_tests.rs    # Cross-module integration tests
    
  • Benefits: Reduces lib.rs by ~500 lines, improves test organization

  • Note: Integration tests can access internal modules via speakr_lib::module_name (speakr-tauri crate is named speakr_lib)

2. Extract Debug Functionality

Target: Move all debug-related code to separate module

  • Current: Debug types, static storage, debug commands scattered throughout

  • New Structure:

    speakr-tauri/src/debug/
    ├── mod.rs                  # Public interface, re-exports
    ├── types.rs                # DebugLogLevel, DebugLogMessage, DebugRecordingState
    ├── storage.rs              # Static storage (DEBUG_LOG_MESSAGES, DEBUG_RECORDING_STATE)
    └── commands.rs             # Debug Tauri commands
    
  • Files to Create:

    • src/debug/types.rs: ~50 lines
    • src/debug/storage.rs: ~30 lines
    • src/debug/commands.rs: ~200 lines
    • src/debug/mod.rs: ~20 lines
  • Benefits: Isolates debug code, easier to disable in release builds

3. Extract Settings Management

Target: Centralize all settings-related functionality

  • Current: Settings utilities and commands mixed in main file

  • New Structure:

    speakr-tauri/src/settings/
    ├── mod.rs                  # Public interface
    ├── persistence.rs          # File I/O, atomic writes, backups
    ├── migration.rs            # Version migration logic
    ├── validation.rs           # Directory permissions, data validation
    └── commands.rs             # Settings Tauri commands
    
  • Functions to Move:

    • get_settings_path(), get_settings_backup_path()
    • migrate_settings(), save_settings_to_dir(), load_settings_from_dir()
    • try_load_settings_file(), validate_settings_directory_permissions()
    • Commands: save_settings(), load_settings()
  • Files to Create:

    • src/settings/persistence.rs: ~150 lines
    • src/settings/migration.rs: ~50 lines
    • src/settings/validation.rs: ~40 lines
    • src/settings/commands.rs: ~60 lines
    • src/settings/mod.rs: ~30 lines
  • Benefits: Clear separation of concerns, easier testing of settings logic

4. Extract Service Implementations

Target: Move service structs to dedicated service modules

  • Current: GlobalHotkeyService and BackendStatusService in main file

  • New Structure:

    speakr-tauri/src/services/
    ├── mod.rs                  # Re-exports, common traits
    ├── hotkey.rs               # GlobalHotkeyService implementation
    ├── status.rs               # BackendStatusService implementation
    └── types.rs                # ServiceComponent enum, shared types
    
  • Content to Move:

    • GlobalHotkeyService struct (~200 lines)
    • BackendStatusService struct (~100 lines)
    • ServiceComponent enum
    • Related Tauri commands: register_global_hotkey(), unregister_global_hotkey()
  • Files to Create:

    • src/services/hotkey.rs: ~220 lines
    • src/services/status.rs: ~120 lines
    • src/services/types.rs: ~20 lines
    • src/services/mod.rs: ~30 lines
  • Benefits: Services become self-contained, easier to test and maintain

5. Extract Audio Functionality

Target: Isolate audio recording and file operations

  • Current: Audio functions scattered throughout main file

  • New Structure:

    speakr-tauri/src/audio/
    ├── mod.rs                  # Public interface
    ├── recording.rs            # Recording logic, real audio backend
    ├── files.rs                # WAV file operations, filename generation
    └── commands.rs             # Audio-related Tauri commands
    
  • Functions to Move:

    • generate_audio_filename_with_timestamp()
    • save_audio_samples_to_wav_file()
    • debug_record_audio_to_file(), debug_record_real_audio_to_file()
    • get_debug_recordings_directory()
    • Commands: debug_start_recording(), debug_stop_recording()
  • Files to Create:

    • src/audio/recording.rs: ~100 lines
    • src/audio/files.rs: ~80 lines
    • src/audio/commands.rs: ~150 lines
    • src/audio/mod.rs: ~25 lines
  • Benefits: Audio logic becomes testable in isolation

6. Extract General Tauri Commands

Target: Group remaining Tauri commands by domain

  • Current: Various commands mixed in main file

  • New Structure:

    speakr-tauri/src/commands/
    ├── mod.rs                  # Command registration, re-exports
    ├── validation.rs           # validate_hot_key, input validation
    ├── system.rs               # check_model_availability, set_auto_launch
    └── legacy.rs               # register_hot_key (backward compatibility)
    
  • Commands to Move:

    • validate_hot_key() → validation.rs
    • check_model_availability(), set_auto_launch() → system.rs
    • register_hot_key(), greet() → legacy.rs
    • get_backend_status() → (might stay in services/status.rs)
  • Files to Create:

    • src/commands/validation.rs: ~60 lines
    • src/commands/system.rs: ~80 lines
    • src/commands/legacy.rs: ~40 lines
    • src/commands/mod.rs: ~40 lines
  • Benefits: Commands grouped by domain, easier to find and maintain

7. Simplified lib.rs

Target: Reduce lib.rs to essential coordination code

  • Final Content:
    • Module declarations and re-exports
    • Main run() function with Tauri setup
    • Essential imports
    • Command registration (delegated to modules)
  • Estimated Size: ~150-200 lines (down from 1,913)

Implementation Strategy

  1. Phase 1: Extract Tests
  2. Phase 2: Extract Services
  3. Phase 3: Extract Settings
  4. Phase 4: Extract Debug & Audio
  5. Phase 5: Extract Commands & Finalize

Refactoring Process Overview

The following diagram illustrates the 5-phase refactoring approach and its progression from the current monolithic structure to a modular architecture:

graph TD
    A["Phase 1: Extract Tests<br/>Low Risk"] --> B["Phase 2: Extract Services<br/>Medium Risk"]
    B --> C["Phase 3: Extract Settings<br/>Medium Risk"]
    C --> D["Phase 4: Extract Debug & Audio<br/>Low Risk"]
    D --> E["Phase 5: Extract Commands & Finalize<br/>Low Risk"]

    A1["• Create test directory structure<br/>• Move 500+ lines of tests<br/>• Update imports & run tests"]
    B1["• Extract GlobalHotkeyService<br/>• Extract BackendStatusService<br/>• Move related Tauri commands"]
    C1["• Extract settings persistence<br/>• Extract migration logic<br/>• Extract validation functions"]
    D1["• Extract debug functionality<br/>• Extract audio operations<br/>• Update conditional compilation"]
    E1["• Group remaining commands<br/>• Finalize lib.rs cleanup<br/>• Run full test suite"]

    A -.-> A1
    B -.-> B1
    C -.-> C1
    D -.-> D1
    E -.-> E1

    F["lib.rs: 1,913 lines"] --> G["lib.rs: ~200 lines"]

    classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
    classDef decision fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
    classDef error fill:#FCBBA1,stroke:#FB6A4A,color:#CB181D
    classDef data fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD

    class A,D,E process
    class B,C decision
    class F error
    class G data

Risk Assessment

Low Risk Refactoring

  • ✅ Moving tests to separate files
  • ✅ Extracting debug functionality (conditional compilation)
  • ✅ Moving utility functions (no complex dependencies)

Medium Risk Refactoring

  • ⚠️ Service extraction (careful with state management)
  • ⚠️ Settings refactoring (critical for app functionality)
  • ⚠️ Tauri command reorganization (frontend depends on these)

Mitigation Strategies

  • Incremental Changes: One module at a time
  • Comprehensive Testing: Run full test suite after each phase
  • Feature Flags: Use conditional compilation during transition
  • Backup Strategy: Git branches for each refactoring phase

Success Criteria

  • lib.rs reduced to ~200 lines
  • All existing tests pass without modification
  • All Tauri commands remain accessible to frontend
  • Debug functionality preserved in debug builds
  • Settings persistence works identically
  • Global hotkey registration continues working
  • Build time remains similar or improves
  • New module structure is logical and discoverable

This refactoring will significantly improve the maintainability and organization of the Speakr Tauri backend while preserving all existing functionality.

Phase 1: Extract Tests (Low Risk)

Objective: Move all tests from lib.rs into separate files organized by domain

  • New Structure:

    speakr-tauri/tests/
    ├── settings_tests.rs       # Settings save/load/migration tests
    ├── hotkey_tests.rs         # GlobalHotkeyService tests
    ├── status_tests.rs         # BackendStatusService tests
    ├── audio_tests.rs          # Audio recording/file tests
    ├── commands_tests.rs       # Tauri command tests
    └── integration_tests.rs    # Cross-module integration tests
    
  • Note: Integration tests can access internal modules via speakr_lib::module_name

🎉 PHASE 1 COMPLETE - MAJOR SUCCESS!

Final Results: 27 of 35 tests migrated (77% migration rate)

✅ Breakthrough Strategy: Making Functions pub with Internal API Documentation

The key to success was making private functions pub (not pub(crate)) with clear internal API documentation. This allows external integration tests in the tests/ directory to access internal functions while maintaining clear API boundaries.

Example pattern used:

/// Internal hot-key validation logic.
///
/// # Internal API
/// This function is only intended for internal use and testing.
pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
    // implementation...
}

Task Checklist (Phase 1)

  • Create test directory structure

    • Create speakr-tauri/tests/ directory
    • Create settings_tests.rs file
    • Create hotkey_tests.rs file
    • Create status_tests.rs file
    • Create audio_tests.rs file
    • Create commands_tests.rs file
    • Create integration_tests.rs file
  • Move settings-related tests ✅ 11/13 tests migrated (85% success)

    • Extract test_app_settings_default() → settings_tests.rs
    • Extract test_save_and_load_settings() → settings_tests.rs
    • Extract test_settings_migration() → settings_tests.rs
    • [~] Extract test_atomic_write_creates_backup() → SKIPPED: Tests Tauri command
    • Extract test_corruption_recovery_from_backup() → settings_tests.rs
    • Extract test_corruption_recovery_fallback_to_defaults() → settings_tests.rs
    • Extract test_settings_serialization() → settings_tests.rs
    • [~] Extract test_save_settings_tauri_command() → SKIPPED: Tests Tauri command
    • Extract test_settings_performance() → settings_tests.rs
    • Extract test_settings_directory_permissions() → settings_tests.rs
    • Extract test_isolated_settings_save_and_load() → settings_tests.rs
    • Extract test_isolated_corruption_recovery() → settings_tests.rs
    • Extract debug_save_button_functionality() → settings_tests.rs
  • Move hotkey-related tests ✅ 2/3 tests migrated (67% success)

    • Extract test_validate_hot_key_success() → hotkey_tests.rs
    • Extract test_validate_hot_key_failures() → hotkey_tests.rs
    • [~] Extract test_register_hot_key() → SKIPPED: Tests Tauri command
  • Move status-related tests ✅ 9/12 tests migrated (75% success)

    • Extract test_backend_status_service_creation() → status_tests.rs
    • Extract test_backend_status_service_update_single_service() → status_tests.rs
    • Extract test_backend_status_service_all_services_ready() → status_tests.rs
    • Extract test_backend_status_service_error_handling() → status_tests.rs
    • Extract test_backend_status_timestamps() → status_tests.rs
    • [~] Extract test_get_backend_status_tauri_command() → SKIPPED: Tests Tauri command
    • Extract test_global_backend_service_initialization() → status_tests.rs
    • Extract test_global_backend_service_state_updates() → status_tests.rs
    • Extract test_global_backend_service_thread_safety() → status_tests.rs
    • [~] Extract test_get_backend_status_command_uses_real_service() → SKIPPED: Tests Tauri command
    • Extract test_backend_service_emits_events_on_state_change() → status_tests.rs
    • [~] Extract test_complete_status_communication_flow() → SKIPPED: Uses get_backend_status Tauri command
  • Move audio-related tests ✅ 5/5 tests migrated (100% success)

    • Extract test_debug_record_audio_to_file_saves_with_timestamp() → audio_tests.rs
    • Extract test_debug_record_audio_to_file_creates_unique_filenames() → audio_tests.rs
    • Extract test_save_audio_samples_to_wav_file() → audio_tests.rs
    • Extract test_generate_audio_filename_with_timestamp() → audio_tests.rs
    • Extract test_debug_real_audio_recording_integration() → audio_tests.rs (ignored, as expected)
  • [~] Move command-related tests ❌ 0/2 tests migrated (0% success)

    • [~] Extract test_check_model_availability() → SKIPPED: Tests Tauri command
    • [~] Extract test_set_auto_launch() → SKIPPED: Tests Tauri command
  • Update imports and run tests ✅ COMPLETED

    • Made internal functions pub with "Internal API" documentation:
      • Settings functions: get_settings_path, get_settings_backup_path, migrate_settings, try_load_settings_file, load_settings_from_dir, validate_settings_directory_permissions
      • Hotkey functions: validate_hot_key_internal (with Tauri command wrapper)
      • Status functions: get_global_backend_service, reset_global_backend_service
      • Audio functions: generate_audio_filename_with_timestamp, save_audio_samples_to_wav_file, debug_record_audio_to_file, debug_record_real_audio_to_file
    • Updated imports in all test files to use speakr_lib::
    • Fixed #[cfg(test)] β†’ #[cfg(any(test, debug_assertions))] for external test access
    • Verified all migrated tests pass: 27 tests across 4 files
      • settings_tests.rs: 11 tests ✅
      • status_tests.rs: 9 tests ✅
      • hotkey_tests.rs: 2 tests ✅
      • audio_tests.rs: 5 tests ✅ (4 + 1 ignored)
    • Removed successfully migrated test functions from lib.rs
    • Run cargo test --workspace - all tests pass ✅

📊 Final Migration Summary

| Test Category | Total Found | Successfully Migrated | Still in lib.rs | Success Rate |
|---|---|---|---|---|
| Settings Tests | 13 tests | ✅ 11 tests | 2 tests (Tauri commands) | 85% |
| Status Tests | 12 tests | ✅ 9 tests | 3 tests (Tauri commands) | 75% |
| Hotkey Tests | 3 tests | ✅ 2 tests | 1 test (Tauri command) | 67% |
| Audio Tests | 5 tests | ✅ 5 tests | 0 tests | 100% |
| Command Tests | 2 tests | 0 tests | 🔒 2 tests (All Tauri commands) | 0% |
| TOTALS | 35 tests | ✅ 27 tests | 🔒 8 tests | 🎉 77% |

🚀 Major Improvement Achieved:

  • Original attempt: 8 tests migrated (23%)
  • After making functions pub: 27 tests migrated (77%)
  • Improvement: +19 additional tests successfully migrated!

🔒 Remaining Tests in lib.rs (8 tests):

All remaining tests are Tauri commands that cannot be moved because:

  1. #[tauri::command] functions cannot be pub (causes macro conflicts)
  2. External tests cannot directly invoke Tauri commands
  3. It may be possible to migrate them by renaming the functions to *_internal, making them pub(crate), and moving #[tauri::command] to a wrapper function that keeps the original name.

Settings (2 tests):

  • test_atomic_write_creates_backup() - tests save_settings Tauri command
  • test_save_settings_tauri_command() - tests save_settings Tauri command

Status (3 tests):

  • test_get_backend_status_tauri_command() - tests get_backend_status Tauri command
  • test_get_backend_status_command_uses_real_service() - tests get_backend_status Tauri command
  • test_complete_status_communication_flow() - tests get_backend_status Tauri command

Hotkey (1 test):

  • test_register_hot_key() - tests register_hot_key Tauri command

Commands (2 tests):

  • test_check_model_availability() - tests check_model_availability Tauri command
  • test_set_auto_launch() - tests set_auto_launch Tauri command

✅ Phase 1 Complete - Ready for Phase 2

Phase 1 achieved a 77% migration rate and removed ~500 lines of test code from lib.rs. The modular test structure is in place and all migrated tests pass.

Next Steps: Proceed to Phase 2: Extract Services

Phase 2: Extract Services (Medium Risk)

Objective: Move service structs and related functionality to dedicated modules

Task Checklist (Phase 2)

  • Create services module structure

    • Create speakr-tauri/src/services/ directory
    • Create services/mod.rs with module declarations
    • Create services/types.rs for shared enums
    • Create services/hotkey.rs for GlobalHotkeyService
    • Create services/status.rs for BackendStatusService
  • Extract ServiceComponent enum

    • Move ServiceComponent enum → services/types.rs
    • Add appropriate derives and documentation
    • Re-export from services/mod.rs
  • Extract GlobalHotkeyService

    • Move entire GlobalHotkeyService struct → services/hotkey.rs
    • Move all impl blocks and methods
    • Add necessary imports (tauri, tracing, etc.)
    • Extract register_global_hotkey() implementation → services/hotkey.rs as register_global_hotkey_internal()
    • Extract unregister_global_hotkey() implementation → services/hotkey.rs as unregister_global_hotkey_internal()
    • Keep #[tauri::command] wrappers in lib.rs that call _internal functions
    • Make service and methods pub(crate) for module visibility
  • Extract BackendStatusService

    • Move BackendStatusService struct → services/status.rs
    • Move all impl blocks and methods
    • Move GLOBAL_BACKEND_SERVICE static → services/status.rs
    • Move get_global_backend_service() helper → services/status.rs
    • Move update_global_service_status() helper → services/status.rs
    • Extract get_backend_status() implementation → services/status.rs as get_backend_status_internal()
    • Extract update_service_status() implementation → services/status.rs as update_service_status_internal()
    • Keep #[tauri::command] wrappers in lib.rs that call _internal functions
    • Add necessary imports for Tauri AppHandle, etc.
    • Make all functions pub(crate) for module visibility
    • Add Default implementation
  • Update lib.rs imports and exports

    • Add mod services; to lib.rs
    • Add use services::*; or specific imports
    • Remove original service implementations from lib.rs
    • Update command registration in run() function
  • Test service extraction

    • Run cargo check to verify compilation
    • Run cargo test --workspace to ensure tests pass
    • Test hotkey registration functionality manually
    • Test status service functionality
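The status-service items in this checklist (a global static, a getter, an update helper, and a Default implementation) could fit together roughly as follows. This is a sketch under stated assumptions, not the project's code: the ServiceStatus enum, the status field, and both function signatures are hypothetical, and std::sync::OnceLock is simply one way to initialise a global lazily.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical status values; the real type lives in the project.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceStatus {
    Starting,
    Ready,
}

/// Stand-in for `BackendStatusService`; the single field is an assumption.
#[derive(Debug)]
pub struct BackendStatusService {
    status: ServiceStatus,
}

impl Default for BackendStatusService {
    fn default() -> Self {
        Self { status: ServiceStatus::Starting }
    }
}

impl BackendStatusService {
    pub fn status(&self) -> ServiceStatus {
        self.status.clone()
    }

    pub fn set_status(&mut self, status: ServiceStatus) {
        self.status = status;
    }
}

/// Process-wide instance, created on first access.
static GLOBAL_BACKEND_SERVICE: OnceLock<Mutex<BackendStatusService>> = OnceLock::new();

/// Assumed signature: returns the shared, lazily initialised service.
pub(crate) fn get_global_backend_service() -> &'static Mutex<BackendStatusService> {
    GLOBAL_BACKEND_SERVICE.get_or_init(|| Mutex::new(BackendStatusService::default()))
}

/// Assumed signature: updates the shared status, recovering from a poisoned lock.
pub(crate) fn update_global_service_status(status: ServiceStatus) {
    let mut service = get_global_backend_service()
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    service.set_status(status);
}
```

Keeping the static private to services/status.rs and exposing only the two helpers means the rest of the crate never touches the lock directly.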

Phase 3: Extract Settings (Medium Risk)

Objective: Centralize all settings management into dedicated module

Task Checklist (Phase 3)

  • Create settings module structure

    • Create speakr-tauri/src/settings/ directory
    • Create settings/mod.rs with module declarations
    • Create settings/persistence.rs for file I/O operations
    • Create settings/migration.rs for version migrations
    • Create settings/validation.rs for directory validation
    • Create settings/commands.rs for Tauri commands
  • Extract path and validation functions

    • Move get_settings_path() → settings/persistence.rs
    • Move get_settings_backup_path() → settings/persistence.rs
    • Move validate_settings_directory_permissions() → settings/validation.rs
    • Add proper error handling and documentation
    • Make functions pub(crate) for module visibility
  • Extract file I/O functions

    • Move try_load_settings_file() → settings/persistence.rs
    • Move save_settings_to_dir() → settings/persistence.rs
    • Move load_settings_from_dir() → settings/persistence.rs
    • Ensure all atomic write logic is preserved
    • Add proper error handling chains
    • Make private functions pub(crate) for module visibility
  • Extract migration logic

    • Move migrate_settings() → settings/migration.rs
    • Add version handling logic
    • Document migration strategy for future versions
    • Make function pub(crate) for module visibility
  • Extract Tauri commands

    • Extract save_settings() implementation → settings/commands.rs as save_settings_internal()
    • Extract load_settings() implementation → settings/commands.rs as load_settings_internal()
    • Keep #[tauri::command] wrappers in lib.rs that call _internal functions
    • Ensure internal functions use the extracted helper functions
    • Make internal functions pub(crate) for module visibility
    • Maintain same function signatures for compatibility
  • Update module exports and imports

    • Configure settings/mod.rs to re-export public functions
    • Add mod settings; to lib.rs
    • Update imports in lib.rs
    • Remove original settings functions from lib.rs
  • Test settings extraction thoroughly

    • Run isolated settings tests to ensure file I/O works
    • Test corruption recovery scenarios
    • Test migration scenarios with version 0 files
    • Verify atomic write behavior
    • Test with real application settings directory
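The atomic write and corruption recovery items above usually follow a write-to-temp, back-up, then rename sequence. The sketch below illustrates that sequence with std only. The function names, the .tmp extension, and the fallback rule are assumptions, not the behaviour of save_settings_to_dir() or load_settings_from_dir(). The rename step is only atomic when both paths are on the same filesystem.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `target` with `contents` without ever
/// leaving a half-written file behind.
pub fn atomic_write_with_backup(target: &Path, backup: &Path, contents: &[u8]) -> io::Result<()> {
    // 1. Write the new contents to a temporary sibling file.
    let tmp = target.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?; // flush to disk before the rename
    }
    // 2. Keep the previous version as a backup, if there is one.
    if target.exists() {
        fs::copy(target, backup)?;
    }
    // 3. Swap the fully written file into place.
    fs::rename(&tmp, target)
}

/// Hypothetical helper: read the primary file, falling back to the backup
/// when the primary is missing or unreadable.
pub fn load_with_fallback(target: &Path, backup: &Path) -> io::Result<String> {
    fs::read_to_string(target).or_else(|_| fs::read_to_string(backup))
}
```

A recovery test can then delete or corrupt the primary file and assert that the backup contents are returned.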

Phase 4: Extract Debug and Audio (Low Risk)

Objective: Isolate debug and audio functionality into separate modules

Task Checklist (Phase 4)

  • Create debug module structure

    • Create speakr-tauri/src/debug/ directory
    • Create debug/mod.rs with conditional compilation
    • Create debug/types.rs for debug data structures
    • Create debug/storage.rs for static storage
    • Create debug/commands.rs for debug Tauri commands
  • Extract debug types and storage

    • Move DebugLogLevel enum → debug/types.rs
    • Move DebugLogMessage struct → debug/types.rs
    • Move DebugRecordingState struct → debug/types.rs
    • Move DEBUG_LOG_MESSAGES static → debug/storage.rs
    • Move DEBUG_RECORDING_STATE static → debug/storage.rs
    • Move add_debug_log() function → debug/storage.rs
  • Extract debug commands

    • Extract debug_test_audio_recording() implementation → debug/commands.rs as debug_test_audio_recording_internal()
    • Extract debug_start_recording() implementation → debug/commands.rs as debug_start_recording_internal()
    • Extract debug_stop_recording() implementation → debug/commands.rs as debug_stop_recording_internal()
    • Extract debug_get_log_messages() implementation → debug/commands.rs as debug_get_log_messages_internal()
    • Extract debug_clear_log_messages() implementation → debug/commands.rs as debug_clear_log_messages_internal()
    • Keep #[tauri::command] wrappers in lib.rs that call _internal functions
    • Move get_debug_recordings_directory() → debug/commands.rs
    • Make all extracted functions pub(crate) for module visibility
  • Create audio module structure

    • Create speakr-tauri/src/audio/ directory
    • Create audio/mod.rs with public interface
    • Create audio/files.rs for WAV file operations
    • Create audio/recording.rs for recording logic
  • Extract audio file operations

    • Move generate_audio_filename_with_timestamp() → audio/files.rs
    • Move save_audio_samples_to_wav_file() → audio/files.rs
    • Make functions pub(crate) for module visibility
    • Add proper WAV spec configuration
    • Add file path validation
  • Extract audio recording functions

    • Move debug_record_audio_to_file() → audio/recording.rs
    • Move debug_record_real_audio_to_file() → audio/recording.rs
    • Make functions pub(crate) for module visibility
    • Ensure proper integration with speakr-core AudioRecorder
  • Update conditional compilation

    • Ensure #[cfg(debug_assertions)] is properly applied
    • Test that debug code is excluded from release builds (compilation successful)
    • Update command registration to handle debug commands conditionally
  • Update lib.rs and test functionality

    • Add mod debug; and mod audio; to lib.rs
    • Update imports and re-exports
    • Remove original debug and audio functions from lib.rs
    • Test debug panel functionality in development mode (24/27 tests passing)
    • Test audio recording and file saving (integration tests passing)
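To make the audio file operations above concrete, here is a std-only sketch of a timestamped file name and a mono 16-bit PCM WAV encoder. It is an illustration, not the project's implementation: the function names, the recording_ prefix, and the millisecond timestamp format are assumptions, and the real code may well delegate the WAV spec to a dedicated crate.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical naming scheme: Unix time in milliseconds plus a fixed prefix.
pub fn timestamped_wav_filename(now: SystemTime) -> String {
    let millis = now
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0);
    format!("recording_{millis}.wav")
}

/// Encode mono 16-bit PCM samples as a complete WAV file in memory
/// (44-byte RIFF header followed by little-endian sample data).
pub fn encode_wav_mono_i16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    const CHANNELS: u16 = 1;
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8;
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // remaining file size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    out.extend_from_slice(&CHANNELS.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

Passing the clock in as a parameter keeps the file-name function deterministic in tests, which is what a uniqueness or timestamp test needs.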

SPEAKR-TAURI_LIB-RS_PHASE_5

Migration Notes: Phase 5 Refactor - Command Organisation

Overview

Phase 5 of the Speakr Tauri backend refactor extracted remaining commands into dedicated modules and finalised the cleanup of lib.rs. This document provides guidance for developers working with the new structure.

What Changed

Before (Pre-Phase 5)

  • All command implementations lived in lib.rs
  • The file was over 1,000 lines with mixed concerns
  • Commands, services, and business logic were intermingled
  • Tests had to go through Tauri command wrappers

After (Phase 5 Complete)

  • Commands organised into functional modules under commands/
  • Each command has an *_internal() function with business logic
  • Tauri command wrappers remain in lib.rs for registration
  • lib.rs reduced to ~400 lines, focused on configuration and integration

New File Structure

speakr-tauri/src/
├── commands/
│   ├── mod.rs          # Command organisation and documentation
│   ├── validation.rs   # Input validation commands
│   ├── system.rs       # System integration commands
│   └── legacy.rs       # Backward compatibility commands
├── services/           # (From previous phases)
│   ├── mod.rs
│   ├── hotkey.rs
│   ├── status.rs
│   └── types.rs
├── settings/           # (From previous phases)
├── debug/              # (From previous phases)
├── audio/              # (From previous phases)
└── lib.rs              # Tauri integration and command registration

Command Implementation Pattern

// In commands/validation.rs
pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
    // Business logic here
    Ok(())
}

// In lib.rs
#[tauri::command]
async fn validate_hot_key(hot_key: String) -> Result<(), AppError> {
    validate_hot_key_internal(hot_key).await
}

Key Benefits

  1. Testability: Internal functions can be tested without Tauri overhead
  2. Modularity: Commands grouped by functional domain
  3. Maintainability: Business logic separated from framework concerns
  4. Documentation: Each module has focused documentation

Working with Commands

Adding a New Command

  1. Choose the appropriate module (validation, system, or legacy)

  2. Implement the internal function:

    // `T` is a placeholder for the command's return type
    pub async fn my_command_internal(param: String) -> Result<T, AppError> {
        // Implementation here
    }
  3. Add Tauri wrapper in lib.rs:

    #[tauri::command]
    async fn my_command(param: String) -> Result<T, AppError> {
        my_command_internal(param).await
    }
  4. Register in run() function:

    .invoke_handler(tauri::generate_handler![
        // ... existing commands,
        my_command
    ])
  5. Add comprehensive tests for the internal function

Command Module Guidelines

  • validation.rs: Input validation, sanitisation, format checking
  • system.rs: OS integration, file system, auto-launch, model availability
  • legacy.rs: Deprecated or backward-compatibility commands
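As an illustration of the kind of format checking that belongs in validation.rs, the sketch below validates a +-separated hot-key string. Everything in it is hypothetical: the function name, the accepted modifier list, and the rules themselves are assumptions for illustration, and AppError is a local stand-in for speakr_types::AppError. It is synchronous to stay dependency-free; an *_internal function would wrap the same logic.

```rust
/// Local stand-in for `speakr_types::AppError` (only the variant used here).
#[derive(Debug, PartialEq)]
pub enum AppError {
    HotKey(String),
}

/// Hypothetical rule set: `+`-separated parts, at least one known modifier,
/// and exactly one trailing non-modifier key.
pub fn validate_hot_key_format(hot_key: &str) -> Result<(), AppError> {
    // Assumed modifier names; the real accepted set may differ.
    const MODIFIERS: [&str; 5] = ["CmdOrCtrl", "Cmd", "Ctrl", "Alt", "Shift"];

    let parts: Vec<&str> = hot_key.split('+').map(str::trim).collect();
    if parts.iter().any(|part| part.is_empty()) {
        return Err(AppError::HotKey(format!("Empty key in hot-key: {hot_key:?}")));
    }
    // `split` always yields at least one part, so `split_last` cannot fail.
    let (key, modifiers) = parts.split_last().expect("at least one part");
    if modifiers.is_empty() {
        return Err(AppError::HotKey(format!("Hot-key needs a modifier: {hot_key:?}")));
    }
    if let Some(bad) = modifiers.iter().find(|m| !MODIFIERS.contains(*m)) {
        return Err(AppError::HotKey(format!("Unknown modifier: {bad}")));
    }
    if MODIFIERS.contains(key) {
        return Err(AppError::HotKey(format!(
            "Hot-key must end with a non-modifier key: {hot_key:?}"
        )));
    }
    Ok(())
}
```

Each failure carries the offending input in its message, which matches the error-context guidance later in this document.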

Testing Commands

// Test the internal function directly
#[tokio::test]
async fn test_my_command_internal() {
    let result = my_command_internal("test".to_string()).await;
    assert!(result.is_ok());
}

Breaking Changes

Import Changes

Commands moved from crate::* to crate::commands::*:

// Old (no longer works)
use crate::validate_hot_key_internal;

// New
use crate::commands::validation::validate_hot_key_internal;

Function Visibility

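Internal functions changed from pub(crate) to pub so that code outside the crate, such as integration tests and documentation examples, can call them (pub(crate) already permits access from other modules inside the crate):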

// Old
pub(crate) async fn validate_hot_key_internal(...) -> ...

// New
pub async fn validate_hot_key_internal(...) -> ...

Error Handling

Consistent Error Types

All commands use speakr_types::AppError for error handling:

pub enum AppError {
    HotKey(String),
    Settings(String),
    FileSystem(String),
    // ... other variants
}

Error Context

Add context to errors for better debugging:

Err(AppError::Settings(format!("Invalid model size: {model_size}")))
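The same idea applies when wrapping errors from other layers: map_err can attach the detail a reader needs (here, the file path) before the error leaves the function. A minimal sketch, assuming a hypothetical read_text_file helper and a local stand-in for speakr_types::AppError:

```rust
use std::fs;
use std::path::Path;

/// Local stand-in for `speakr_types::AppError` (only the variant used here).
#[derive(Debug)]
pub enum AppError {
    FileSystem(String),
}

/// Hypothetical helper: read a file and attach the path to any I/O error.
pub fn read_text_file(path: &Path) -> Result<String, AppError> {
    fs::read_to_string(path).map_err(|e| {
        // The bare io::Error does not say which file failed, so add it.
        AppError::FileSystem(format!("Failed to read {}: {e}", path.display()))
    })
}
```

Without the added path, a missing settings file and a missing model file would produce the same message.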

Documentation Standards

Function Documentation

All public functions must have rustdoc comments:

/// Brief description of what the function does.
///
/// # Arguments
///
/// * `param` - Description of the parameter
///
/// # Returns
///
/// Description of what is returned.
///
/// # Errors
///
/// Conditions that cause errors.
///
/// # Examples
///
/// ```rust,no_run
/// use speakr_lib::commands::validation::validate_hot_key_internal;
/// // Example usage
/// ```
pub async fn my_function_internal(param: String) -> Result<(), AppError> {
    // Implementation
    Ok(())
}

Module Documentation

Each module should have comprehensive documentation explaining its purpose and usage patterns.

Testing Strategy

Unit Tests

  • Test internal functions directly (not through Tauri wrappers)
  • Use test isolation patterns for file system operations
  • Mock external dependencies where possible
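One common test isolation pattern for file system operations is a scratch directory that is unique per test and removed when the test ends. The sketch below shows the idea with std only. The TestDir type and its naming scheme are hypothetical; crates such as tempfile provide a more robust version of the same thing.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical per-test scratch directory, removed again on drop.
pub struct TestDir {
    path: PathBuf,
}

impl TestDir {
    pub fn new(label: &str) -> std::io::Result<Self> {
        // Process id plus a counter keeps parallel tests from colliding.
        static COUNTER: AtomicU32 = AtomicU32::new(0);
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir()
            .join(format!("speakr_test_{label}_{}_{n}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TestDir {
    fn drop(&mut self) {
        // Best-effort cleanup; a failure here should not fail the test.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

A settings test would create a TestDir, pass its path to the function under test, and let the directory disappear when the value goes out of scope.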

Test Organisation

Tests live alongside code in mod tests blocks:

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_function_success() {
        // Test implementation
    }
}

Backward Compatibility

Legacy Support

Commands in legacy.rs maintain backward compatibility but should be considered deprecated for new development.

Deprecation Path

When deprecating commands:

  1. Move to legacy.rs
  2. Add deprecation notice in documentation
  3. Provide migration path in rustdoc
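The steps above can be backed by the compiler with the built-in #[deprecated] attribute, which makes every remaining caller emit a warning that names the replacement. A minimal sketch with hypothetical function names (none of these are Speakr commands):

```rust
/// Replacement logic (hypothetical name).
pub fn check_model_internal(model: &str) -> bool {
    !model.trim().is_empty()
}

/// Legacy entry point kept for older callers.
///
/// # Deprecated
/// Use [`check_model_internal`] instead; this wrapper only forwards to it.
#[deprecated(note = "use `check_model_internal` instead")]
pub fn check_model_legacy(model: String) -> bool {
    check_model_internal(&model)
}

/// Callers that must keep using the legacy function for now can silence
/// the warning locally instead of crate-wide.
#[allow(deprecated)]
pub fn call_legacy_for_compat(model: String) -> bool {
    check_model_legacy(model)
}
```

Keeping the legacy function as a thin forwarder means there is only one implementation to maintain while callers migrate.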

Performance Considerations

Command Overhead

The new pattern adds minimal overhead:

  • Internal functions: Direct function calls
  • Tauri wrappers: Thin delegation layer

Memory Usage

  • Internal functions can be tested in isolation without Tauri runtime
  • Reduced memory usage during testing
  • Better compiler optimisations due to cleaner module boundaries

Common Patterns

Input Validation

pub async fn validate_input_internal(input: String) -> Result<(), AppError> {
    let input = input.trim();
    if input.is_empty() {
        return Err(AppError::Settings("Input cannot be empty".to_string()));
    }
    // Additional validation...
    Ok(())
}

File System Operations

pub async fn check_file_internal(path: String) -> Result<bool, AppError> {
    let path = std::path::Path::new(&path);
    // `exists()` already returns the bool we want; no match needed
    Ok(path.exists())
}

Error Propagation

pub async fn complex_operation_internal(input: String, path: String) -> Result<bool, AppError> {
    // `?` returns early with the AppError from whichever step fails
    validate_input_internal(input).await?;
    let file_exists = check_file_internal(path).await?;
    // Process results...
    Ok(file_exists)
}

Future Development

Adding New Modules

If the commands/ directory grows too large, consider:

  1. Creating subdirectories for related commands
  2. Grouping by feature area rather than technical function
  3. Maintaining the *_internal + wrapper pattern

Architectural Evolution

The current pattern supports:

  • Easy migration to other frameworks (business logic is framework-agnostic)
  • Microservice extraction (internal functions are self-contained)
  • Enhanced testing strategies (direct function testing)

Troubleshooting

Common Issues

  1. Import errors: Check if function moved to new module
  2. Visibility errors: Internal functions are now pub, not pub(crate)
  3. Test failures: Update imports in test files
  4. Documentation tests: Use speakr_lib as crate name, not speakr_tauri

Migration Checklist

When updating code that depends on the old structure:

  • Update imports to new module paths
  • Change function visibility if needed
  • Update test imports and assertions
  • Fix documentation examples with correct crate name
  • Verify error handling uses AppError consistently

Last Updated: Phase 5 Complete

For questions about this refactor, see the original planning documents in docs/refactor/

Rust Documentation Tracking

Instructions

Original Prompt: Add detailed comments to all functions etc in the files and clean up each file, remove orphaned comments, group code logically (e.g. tauri::commands together) and add large comment signposts to help navigate the file easily.

Documentation Standards:

  • Add detailed rustdoc comments to all functions, commands, and relevant items
  • Remove orphaned or outdated comments
  • Group code logically with clear comment signposts for easy navigation
  • Ensure all public items are fully documented, including parameters, errors, and usage examples where appropriate
  • Use large comment blocks (e.g., // ============================================================================) for major sections
  • Use smaller comment dividers (e.g., // --------------------------------------------------------------------------) for individual functions
  • Follow Rust documentation best practices and project coding standards

Progress Tracking

  1. Select an UNCHECKED [ ] item from the list.
  2. IMMEDIATELY add a progress indicator to the item: [~]
  3. Comment the file following the instructions in this document.
  4. On COMPLETION, add a checkmark to the item in the list: [x]
  5. Verify your changes using precommit run ... (formats, lints and runs tests)
  6. Fix any errors or warnings and repeat step 5 until no errors or warnings remain
  7. Commit your changes to Git.
  8. Return to step 1 until all items are checked.

speakr-core/src/

  • lib.rs ✅ COMPLETED
  • audio/mod.rs
  • model/mod.rs
  • model/list.rs
  • model/list_updater.rs
  • model/list_tests.rs
  • model/metadata.rs
  • bin/update_models.rs
  • bin/update_models_tui.rs

speakr-tauri/src/

  • lib.rs ✅ COMPLETED
  • main.rs
  • audio/mod.rs
  • audio/files.rs
  • audio/recording.rs
  • commands/mod.rs
  • commands/legacy.rs
  • commands/system.rs
  • commands/validation.rs
  • debug/mod.rs
  • debug/commands.rs
  • debug/storage.rs
  • debug/types.rs
  • services/mod.rs
  • services/hotkey.rs
  • services/status.rs
  • services/types.rs
  • settings/mod.rs
  • settings/commands.rs
  • settings/migration.rs
  • settings/persistence.rs
  • settings/validation.rs

speakr-types/src/

  • lib.rs ✅ COMPLETED

speakr-ui/src/

  • lib.rs
  • app.rs
  • debug.rs
  • settings.rs

Test Files

speakr-core/tests/

  • audio_capture.rs

speakr-tauri/tests/

  • audio_tests.rs
  • commands_tests.rs
  • debug_save.rs
  • global_hotkey.rs
  • hotkey_tests.rs
  • integration_tests.rs
  • settings_tests.rs
  • status_tests.rs

Comment Style Examples

Use these exact patterns for consistency across all files:

Comment Hierarchy Structure

graph TD
    A["File Level<br/>============================================================================<br/>//! Module Documentation<br/>============================================================================"] --> B["Major Section<br/>============================================================================<br/>// Section Name<br/>============================================================================"]

    B --> C["Subsection<br/>// =========================<br/>// Subsection Name<br/>// ========================="]

    C --> D["Function/Item<br/>// --------------------------------------------------------------------------<br/>/// Function documentation<br/>/// # Arguments, # Returns, # Errors<br/>#[tauri::command]<br/>async fn function_name()"]

    D --> E["Implementation<br/>// Regular comments<br/>// explaining logic"]

    F["End of File<br/>// ==========================================================================="]

    B --> G["Module Declarations<br/>// =========================<br/>// Module Declarations<br/>// =========================<br/>pub mod commands;"]

    B --> H["External Imports<br/>// =========================<br/>// External Imports<br/>// =========================<br/>use tauri::AppHandle;"]

    classDef fileLevel fill:#E5F5E0,stroke:#31A354,color:#31A354
    classDef majorSection fill:#E6E6FA,stroke:#756BB1,color:#756BB1
    classDef subsection fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD
    classDef function fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
    classDef implementation fill:#F2F0F7,stroke:#BCBDDC,color:#756BB1
    classDef endFile fill:#E5E1F2,stroke:#C7C0DE,color:#8471BF
    classDef modules fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77

    class A fileLevel
    class B majorSection
    class C subsection
    class D function
    class E implementation
    class F endFile
    class G,H modules

File-Level Documentation

// ============================================================================
//! Module name and purpose.
//!
//! This module provides functionality for:
//! - Feature 1
//! - Feature 2
//! - Feature 3
// ============================================================================

Major Section Dividers

// ============================================================================
// Section Name (e.g., "Tauri Command Definitions")
// ============================================================================

Subsection Headers

// =========================
// Subsection Name (e.g., "Debug Commands (Debug Only)")
// =========================

Function/Item Dividers

// --------------------------------------------------------------------------
/// Function description with full rustdoc.
///
/// # Arguments
/// * `param` - Parameter description
///
/// # Returns
/// Returns description.
///
/// # Errors
/// Error conditions.
///
/// # Examples
/// ```no_run
/// // Usage example
/// ```
#[tauri::command]
async fn function_name() -> Result<(), AppError> {
    // Implementation
    Ok(())
}

Module Declarations Section

// =========================
// Module Declarations
// =========================
pub mod commands;
pub mod services;
// etc.

Import Section

// =========================
// External Imports
// =========================
use std::collections::HashMap;
use tauri::{AppHandle, Manager};
// etc.

Setup/Initialization Comments

// =========================
// Initial Setup (Description of what's being set up)
// =========================

End-of-File Marker

// ===========================================================================

Rustdoc Comment Patterns

Standard Function Documentation

/// Brief one-line description of what the function does.
///
/// More detailed explanation if needed, including behavior,
/// side effects, and important implementation details.
///
/// # Arguments
/// * `param1` - Description of first parameter
/// * `param2` - Description of second parameter
///
/// # Returns
/// Description of return value and what it represents.
///
/// # Errors
/// Description of when and why the function might return an error.
///
/// # Examples
/// ```no_run
/// let result = function_name(param1, param2)?;
/// assert_eq!(result, expected_value);
/// ```

Tauri Command Documentation

/// Brief description of the command's purpose.
///
/// # Arguments
/// * `param` - Parameter description
///
/// # Returns
/// Returns `Ok(())` on success.
///
/// # Errors
/// Returns `AppError` if the operation fails.
///
/// # Example
/// ```no_run
/// // In frontend: invoke('command_name', { param })
/// ```

Debug-Only Function Documentation

/// Debug: Brief description of debug functionality.
///
/// This function is only available in debug builds.

Module Documentation

//! Module name and purpose.
//!
//! This module provides [specific functionality] for the Speakr application:
//! - Feature/capability 1
//! - Feature/capability 2
//! - Feature/capability 3
//!
//! # Usage
//! Brief usage example or important notes.

Documentation Checklist Template

For each file, ensure:

  • File-level documentation: Module-level rustdoc comment explaining purpose and contents
  • Function documentation: All public functions have comprehensive rustdoc
    • Purpose and behavior description
    • Parameters documented with # Arguments
    • Return values documented with # Returns
    • Error conditions documented with # Errors
    • Usage examples where appropriate with # Examples
  • Type documentation: All public structs, enums, and traits documented
  • Large comment signposts: Major sections clearly marked
  • Code organization: Related code grouped logically
  • Orphaned comments: Removed outdated or irrelevant comments
  • Formatting: Consistent with rustfmt standards
  • Testing: Code compiles and tests pass after changes

Priority Order

  1. High Priority (Core functionality):

    • speakr-types/src/lib.rs (shared types)
    • speakr-core/src/lib.rs (core functionality)
    • speakr-tauri/src/main.rs (application entry)
  2. Medium Priority (Services and commands):

    • speakr-tauri/src/services/* (service modules)
    • speakr-tauri/src/commands/* (command modules)
    • speakr-tauri/src/settings/* (settings modules)
  3. Lower Priority (Supporting modules):

    • speakr-tauri/src/audio/* (audio modules)
    • speakr-tauri/src/debug/* (debug modules)
    • speakr-ui/src/* (UI modules)
    • speakr-core/src/model/* (model modules)
  4. Test Files (Documentation focused on test clarity):

    • All test files in tests/ directories

Notes

  • Completed: speakr-tauri/src/lib.rs - Comprehensive documentation added with clear sections and detailed rustdoc comments
  • Next Target: Recommend starting with speakr-types/src/lib.rs as it contains shared types used across the project
  • Testing: Always run cargo fmt, cargo clippy, and cargo test before committing changes
  • Commit Strategy: Document and commit files in logical groups (e.g., all service files together)