🎙️ Speakr Documentation

note

Speakr is a privacy-first, hot-key–driven dictation utility that turns your speech into typed text entirely on-device. No cloud, no network round-trips, no compromises.

✨ What is Speakr?

Speakr transforms the way you capture thoughts into text. With a single keystroke, record speech, transcribe it locally using Whisper models, and have the text instantly typed into any application. Perfect for developers, writers, and anyone who thinks faster than they type.

🔐 Privacy First

100% offline processing – your voice never leaves your device
No cloud dependencies – works in air-gapped environments
Minimal permissions – only microphone and accessibility access

⚡ Built for Speed

≤ 3 second end-to-end latency for 5-second recordings
Global hotkeys work across all applications
Lightweight universal macOS binary < 20 MB

🧭 Navigate the Documentation

tip

Use the search box (⌘/Ctrl + K) to quickly jump to any topic, or browse by your role below.

📋 Product & Planning

Document	Description	Audience
Product Requirements	Vision, goals, and feature specifications	Product owners, stakeholders
Implementation Plan	Development roadmap and milestones	Project managers, engineers

🏗️ Architecture & Engineering

Document	Description	Audience
Technical Architecture	System design and component overview	Engineers, architects
System Description	Detailed system behaviour and flows	Developers, maintainers
Development Overview	Getting started with development	New contributors

📝 Functional Specifications

Document	Description	Status
FR-1: Global Hotkey	Hot-key registration and handling	✅ Implemented
FR-2: Audio Capture	Microphone access and recording	✅ Implemented
FR-3: Transcription	Local Whisper integration	🔄 In Progress
FR-4: Text Injection	Cross-app text insertion	🔄 In Progress
FR-5: Injection Fallback	Clipboard fallback mechanism	📋 Planned
FR-6: Settings UI	Configuration interface	✅ Implemented

warning

See Specs Overview for the complete functional requirements including non-functional requirements (NFRs) for security, performance, and accessibility.

🔧 Development & Debugging

Document	Description	Audience
Debug Panel	Development and troubleshooting tools	Developers, QA
Pre-commit Hooks	Code quality and testing setup	Contributors
Tauri Plugins	Plugin architecture and integrations	Backend developers

🚀 Quick Start

note

New to the project? Start with the Development Overview for setup instructions.

For Product People

Read the Product Requirements to understand the vision
Check the Implementation Plan for current progress
Review Functional Specs for detailed features

For Engineers

Study the Technical Architecture for system design
Follow Development Setup to get coding
Reference System Description for implementation details

For Contributors

Set up pre-commit hooks for code quality
Browse functional requirements to find tasks
Use the Debug Panel for development workflow

📊 Project Status

tip

Current Focus: Core transcription engine and text injection reliability

Component	Status	Notes
Global Hotkeys	✅ Complete	Cross-app hotkey registration working
Audio Capture	✅ Complete	High-quality microphone input
Settings UI	✅ Complete	Leptos-based configuration interface
Transcription	🔄 Active	Whisper integration in progress
Text Injection	🔄 Active	Cross-app compatibility improvements
Model Management	📋 Planned	GGUF model download and validation

🤝 Contributing

note

This documentation is a living document. Found something unclear or outdated?

📂 Browse specs in the specs directory for implementation tasks
🐛 Report issues via GitHub Issues
📝 Improve docs by opening a pull request
💡 Suggest features in GitHub Discussions

Built with 🦀 Rust, ⚡ Tauri 2, and 🎨 Leptos

Privacy-first dictation for the modern developer

title: Product Requirements Document – Speakr
version: 2025-07-20
status: Draft
authors: David Jessup

Product Requirements Document – Speakr

1. Purpose / Vision

Speakr is a privacy-first dictation hot-key utility for macOS (Windows/Linux later). In a single keystroke, users can record speech, transcribe entirely on-device, and have the text typed directly into any active input field. Speakr aims to be the fastest way for developers, writers, and power-users to turn fleeting thoughts into code or prose without breaking flow, and without sending audio to the cloud.

2. Problem Statement

Switching to dedicated dictation apps breaks focus and incurs network latency.
Many corporate or offline environments forbid cloud speech services for privacy reasons.
OS-level dictation is unreliable for code, lacks custom hot-keys, and has high latency on older hardware.

Opportunity: A lightweight, keyboard-driven tool that works anywhere text can be typed, requires no network, and respects user privacy.

3. Goals & Non-Goals

3.1 Goals

≤ 3 s end-to-end latency for 5-second recordings on Apple Silicon (M-series).
100% offline – no external network calls.
Global hot-key works in background apps.
Support customisable models & hot-keys via UI.
Ship notarised universal macOS binary < 20 MB (excluding model).
Provide a clean upgrade path to Windows & Linux.

3.2 Non-Goals

Real-time streaming (v1 may paste only after stop).
Mobile platforms.
Full grammar / punctuation correction.
Server-side sync or accounts.

4. Personas

Persona	Needs / Pain-points
Dev Dana	Insert comments/code quickly without losing keyboard context.
Writer Will	Draft snippets into any text editor without toggling apps.
Privacy Peter	Dictate confidential material offline, no data leaves device.
Accessibility Ava	Replace or augment typing due to RSI, keep workflow keyboard-first.

5. User Stories

MoSCoW method: Must, Should, Could, Won’t (for now)

Priority	Description
Must	“As a user, I press `<Opt>` + `~` and my spoken words (≤30 s) are typed into the active field within ~3 s.”
Must	“As a user, the app asks for mic + Accessibility permissions on first run and explains why.”
Must	“As a user, I can change the hot-key in settings and be warned of conflicts.”
Should	“As a user, I can pick a smaller/faster model if my machine is slow.”
Should	“As a user, a subtle overlay shows ‘Recording… / Transcribing…’ states.”
Could	“As an advanced user, I can turn on auto-punctuation.”
Could	“As an advanced user, I can add bespoke words to the dictionary.”
Won’t (v1)	Live transcript shown word-by-word while speaking.

6. Functional Requirements

FR	Description
FR-1	Global hot-key registers at app start and triggers record/transcribe/inject flow.
FR-2	Audio capture uses 16 kHz mono via `cpal`, max configurable duration (default 10 s).
FR-3	Transcription runs through Whisper (GGUF) via `whisper-rs`; language default EN.
FR-4	Transcript is injected via synthetic keystrokes (`enigo`) into current focus.
FR-5	If injection fails (secure field), fallback to clipboard-paste with user warning.
FR-6	UI (tray or window) exposes: hot-key picker, model selector, auto-launch toggle.
FR-7	App emits status events for UI overlay and logs (Recording, Transcribing, Error).
FR-8	Settings persist locally (JSON in AppData, no cloud).
FR-9	App auto-updates via GitHub Releases (optional in v1).
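FR-2 asks for 16 kHz mono input, while many input devices deliver 44.1 or 48 kHz stereo. The following is a minimal sketch of the conversion step, using only the standard library. The function names `downmix_to_mono` and `resample_linear` are illustrative and not part of Speakr, and linear interpolation is used here for brevity; it applies no anti-aliasing filter, so a production build would likely swap in a proper resampler.

```rust
/// Average interleaved frames down to a single channel.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio with linear interpolation (no anti-alias filter).
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    // Number of output samples covering the same duration.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Fractional position of output sample `i` in the input.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}
```

Chaining `downmix_to_mono` and then `resample_linear(&mono, 48_000, 16_000)` turns one second of 48 kHz stereo into 16 000 mono samples.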

7. Non-Functional Requirements

Category	Requirement	Metric / Acceptance
Latency	End-to-end ≤ 3 s (M1, 5 s audio, small model)	95th percentile measured in telemetry log (local).
Footprint	Binary ≤ 20 MB; RAM ≤ 400 MB including model.	`du -sh` and Activity Monitor/smoke tests.
Reliability	No crashes in 1-hour monkey test (500 invocations).	CI integration test + manual QA.
Security	No outbound network sockets except auto-update domain (opt-out).	Static analysis + firewall test.
Compatibility	macOS 13+. Intel Macs may see doubled latency but remain functional.	QA on Intel MBP (2020) & M1.
Accessibility	Follows macOS VoiceOver / high-contrast guidelines.	Apple Accessibility Inspector score ≥ 85.
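The latency requirement is stated as a 95th percentile over locally logged timings. As a rough sketch of how that acceptance check could be computed, the snippet below uses the nearest-rank definition of a percentile. The names `percentile` and `meets_latency_target` are hypothetical, and the choice of nearest-rank (rather than interpolation) is an assumption, since the requirement does not name a method.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
pub fn percentile(samples: &[Duration], pct: u32) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Integer ceiling of pct * n / 100, so no floating-point rounding is involved.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True when the 95th percentile is within the 3-second budget.
pub fn meets_latency_target(samples: &[Duration]) -> bool {
    match percentile(samples, 95) {
        Some(p95) => p95 <= Duration::from_secs(3),
        None => false,
    }
}
```

With twenty samples, rank 19 is selected, so a single outlier does not fail the check but two do.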

8. Metrics / KPIs

Metric	Target
Time-to-text (P95)	≤ 3 s.
Activation success rate	≥ 99% (hot-key triggers & types).
Crash-free sessions	> 99.5%.
Daily active users (DAU)	post-launch target: 1 k.
% of transcripts requiring manual fix	< 15% (optional feedback prompt).

9. Milestones

Milestone	Scope
M0 – Prototype spike	Hot-key → record → transcribe → paste (CLI)
M1 – MVP macOS app	Tauri shell, settings window, notarised DMG
M2 – Public beta	Auto-update, error logs, model manager
M3 – Windows/Linux alpha	Replace injection backend, install bundles
M4 – v1.0 GA	Streaming (optional), website + docs

10. Open Questions

Should we bundle a small GGUF model or trigger a first-run download wizard?
How to handle non-Latin languages (auto-detect vs user-select)?
Do we sandbox the app on macOS or rely on hardened runtime?
Which licence (MIT vs GPL) given we embed Whisper weights?
Accept user telemetry opt-in for latency metrics?

11. Appendix – Stakeholders & Review

Product Lead – @PM
Engineering Lead – @TechLead
Design – @UX
Security – @Sec
QA – @QA

Reviews: Architecture (Tech), Security (Sec), Accessibility (UX).

System Description

Speakr – a Local Dictation Utility (Rust + Tauri + Leptos)

A tiny, privacy-first macOS desktop app that listens for a global hot-key, records a short audio clip, transcribes it locally with Whisper, then types the text into whatever currently has focus.

Everything runs on-device; no network calls (besides the initial model download).

1. System Overview

┌──────────────────────────────┐
│        Speakr (UI)           │  ← Leptos + Tauri WebView (optional window / tray)
└───────────────┬──────────────┘
                │ <invoke/emit>
        Global Shortcut   ▲    Settings (model path, hot-key, …)
                ▼         │
┌─────────────────────────┴──────────────────────────┐
│            speakr-core  (Rust lib)                 │
│                                                    │
│ 1. Audio capture  – **cpal**                       │
│ 2. Transcription  – **whisper-rs** (GGUF models)   │
│ 3. Text inject    – **enigo** (synthetic keys)     │
└────────────────────────────────────────────────────┘

Global shortcut, audio, and keystroke injection all live in the backend so Speakr continues to work when the UI window is hidden.

2. Key Crates & Decisions

Concern	Crate / Tool	Why it was chosen
Hot-key	`tauri-plugin-global-shortcut = "2"`	Official plugin, cross-platform, Tauri ≥ 2.0
Audio capture	`cpal = "0.15"`	Mature, async-friendly, works on macOS/Win/Linux
Speech-to-Text	`whisper-rs = "0.8"`	Safe Rust bindings to whisper.cpp; supports GGUF models
Keystroke injection	`enigo = "0.1"`	Simple cross-platform input simulation
UI	`leptos = "0.6"` + `trunk`	All-Rust reactive UI compiled to WASM
Async runtime	`tokio = "1"` (multi-thread)	Needed for non-blocking recording & transcription

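Tip: A quantised small.en model loads in ≈ 2 s on Apple Silicon and is usually accurate enough for notes & code comments. Expect roughly 180 MB on disk for a 5-bit build; the ≈ 30 MB figure quoted under Performance Levers applies to quantised tiny.en.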

3. Workspace Layout

/speakr
├─ speakr-core        # library crate (audio → text → inject)
├─ speakr-tauri       # Tauri shell (`src-tauri` here)
├─ speakr-ui          # Leptos front-end (optional window)
└─ models/ggml-small.en.gguf  # user-downloaded Whisper model

Use a Cargo workspace so all three crates share versions and CI.

4. Bootstrapping

4.1 Prerequisites

Rust 1.88.0+ (stable)
Node 18+ and pnpm, yarn, or npm (for Tauri/Trunk helpers)
Xcode Command-Line Tools (macOS)
Download a GGUF Whisper model → models/ggml-small.en.gguf

4.2 Create the workspace

cargo new --lib speakr-core
cargo tauri init --template leptos speakr-tauri   # generates src-tauri + Leptos wiring
cd speakr-tauri
pnpm tauri add global-shortcut                     # JavaScript guest bindings

(Add a sibling speakr-ui crate only if you want the UI separate from the template.)

5. Core Library (speakr-core)

Cargo.toml

[package]
name    = "speakr-core"
version = "0.1.0"
edition = "2021"

[dependencies]
cpal        = "0.15"
whisper-rs  = "0.8"   # CPU inference is the default build
enigo       = "0.1"
tokio       = { version = "1", features = ["rt-multi-thread", "macros"] }
anyhow      = "1"

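use anyhow::{Context, Result};
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use enigo::{Enigo, KeyboardControllable};
use std::sync::mpsc;
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext};

/// Whisper expects 16 kHz mono `f32` PCM.
const SAMPLE_RATE: u32 = 16_000;

pub struct Speakr {
    whisper: WhisperContext,
    enigo:   Enigo,
}

impl Speakr {
    pub fn new(model_path: &str) -> Result<Self> {
        Ok(Self {
            whisper: WhisperContext::new(model_path)?,
            enigo:   Enigo::new(),
        })
    }

    pub async fn capture_and_type(&mut self, seconds: u32) -> Result<()> {
        // 1️⃣  Capture PCM samples --------------------------------------------------
        let wanted = (seconds * SAMPLE_RATE) as usize;
        let (tx, rx) = mpsc::sync_channel(wanted);
        let host = cpal::default_host();
        let dev  = host.default_input_device().context("no input device")?;
        // Request 16 kHz mono explicitly; building the stream fails if the
        // device cannot deliver it (a resampling step would be needed then).
        let cfg = cpal::StreamConfig {
            channels:    1,
            sample_rate: cpal::SampleRate(SAMPLE_RATE),
            buffer_size: cpal::BufferSize::Default,
        };
        let stream = dev.build_input_stream(
            &cfg,
            // `try_send` never blocks the real-time audio thread; samples that
            // arrive while the buffer is full are dropped.
            move |data: &[f32], _: &cpal::InputCallbackInfo| {
                for &s in data { let _ = tx.try_send(s); }
            },
            move |e| eprintln!("cpal error: {e}"),
            None,
        )?;
        stream.play()?;
        let mut samples = Vec::with_capacity(wanted);
        for _ in 0..wanted {
            samples.push(rx.recv()?);
        }
        drop(stream);

        // 2️⃣  Transcribe -----------------------------------------------------------
        let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
        params.set_language(Some("en"));
        // whisper-rs 0.8 runs inference on a state object and returns the result
        // segment by segment; check these calls against the pinned version.
        let mut state = self.whisper.create_state()?;
        state.full(params, &samples)?;
        let mut text = String::new();
        for i in 0..state.full_n_segments()? {
            text.push_str(&state.full_get_segment_text(i)?);
        }

        // 3️⃣  Inject (enigo 0.1 API) -----------------------------------------------
        self.enigo.key_sequence(text.trim());
        Ok(())
    }
}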

6. Tauri Backend (speakr-tauri / `src-tauri`)

`src-tauri/Cargo.toml` extras

[dependencies]
speakr-core = { path = "../speakr-core" }
# Tauri 2: API access is granted through capability files, not Cargo features
tauri       = { version = "2", features = [] }
# Global hot-key plugin
tauri-plugin-global-shortcut = "2"
tokio       = "1"
anyhow      = "1"

#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]
use speakr_core::Speakr;
use tauri::async_runtime::Mutex; // async-aware: the guard may be held across `.await`
use tauri::{Manager, State};
use tauri_plugin_global_shortcut::{GlobalShortcutExt, ShortcutState};

struct AppState(Mutex<Option<Speakr>>);

/// Record, transcribe and type; shared by the command and the hot-key handler.
async fn run_transcription(state: &AppState) -> Result<(), String> {
    let mut guard = state.0.lock().await;
    guard
        .as_mut()
        .ok_or("model not ready")?
        .capture_and_type(10)        // 10 s max
        .await
        .map_err(|e| e.to_string())
}

#[tauri::command]
async fn transcribe(state: State<'_, AppState>) -> Result<(), String> {
    run_transcription(&state).await
}

fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_global_shortcut::Builder::new().build())
        .setup(|app| {
            // Pre-load Whisper model once at startup
            let model = Speakr::new("../models/ggml-small.en.gguf")?;
            app.manage(AppState(Mutex::new(Some(model))));

            // Register ⌘⌥Space. The handler fires on press and on release,
            // so react to the press only.
            #[cfg(desktop)]
            app.global_shortcut()
                .on_shortcut("CMD+OPTION+SPACE", |app, _shortcut, event| {
                    if matches!(event.state(), ShortcutState::Pressed) {
                        let handle = app.clone();
                        tauri::async_runtime::spawn(async move {
                            let state = handle.state::<AppState>();
                            let _ = run_transcription(&state).await;
                        });
                    }
                })?;
            Ok(())
        })
        .invoke_handler(tauri::generate_handler![transcribe])
        .run(tauri::generate_context!())
        .expect("error while running Speakr");
}

Capability JSON: add `global-shortcut:allow-register` to `src-tauri/capabilities/default.json` (see the Tauri docs for the full schema).

7. Leptos Front-End (optional)

The Tauri template already wires Trunk + Leptos. A minimal status UI:

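use leptos::*;
use wasm_bindgen::prelude::*;

// Binding to Tauri's global `invoke` (available because `withGlobalTauri` is
// true); needs the `wasm-bindgen` and `wasm-bindgen-futures` crates.
#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(catch, js_namespace = ["window", "__TAURI__", "core"])]
    async fn invoke(cmd: &str, args: JsValue) -> Result<JsValue, JsValue>;
}

#[component]
pub fn App() -> impl IntoView {
    let (status, set_status) = create_signal(String::from("Idle"));

    // Run the backend command and reflect its outcome in the status line.
    // Backend status events (`speakr-status`) would arrive through a similar
    // binding to `window.__TAURI__.event.listen` (not shown).
    let record = move || {
        set_status.set("Recording…".to_string());
        spawn_local(async move {
            let outcome = invoke("transcribe", JsValue::NULL).await;
            let label = if outcome.is_ok() { "Idle" } else { "Error" };
            set_status.set(label.to_string());
        });
    };

    view! {
        <div class="p-4">
            <h1 class="text-xl font-bold">Speakr</h1>
            <p>{move || format!("Status: {}", status.get())}</p>
            <button class="mt-4 bg-blue-600 text-white px-3 py-1 rounded"
                    on:click=move |_| record()>
                "Record & Type"
            </button>
        </div>
    }
}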

tauri.conf.json should already contain:

{
  "build": {
    "beforeDevCommand": "trunk serve",
    "beforeBuildCommand": "trunk build --release",
    "devUrl": "http://localhost:1420",
    "frontendDist": "../dist"
  },
  "app": { "withGlobalTauri": true }
}

8. macOS Permissions

Microphone – add NSMicrophoneUsageDescription to src-tauri/Info.plist; Tauri merges that file into the bundle's Info.plist. macOS will not grant microphone access without this key.
Accessibility – Ask the user to enable Speakr under System Settings → Privacy & Security → Accessibility so Enigo keystrokes reach other apps.
Codesign & Notarise – For distribution run:

cargo tauri build --target universal-apple-darwin   # produces .app bundle
# then codesign & notarise with `xcrun notarytool`

9. Dev & Release Workflow

# hot-reload UI + backend
trunk serve &              # terminal 1 – WASM
cargo tauri dev            # terminal 2 – desktop shell

# production
trunk build --release      # build UI assets
cargo tauri build          # build .app or MSI/DEB

10. Performance Levers

Lever	Effect	Hint
Model size	Latency vs accuracy	`tiny.en` ≈ 30 MB loads fastest
`params.set_*`	Threads / strategy	Call `set_n_threads(...)` with the number of CPU cores (e.g. from `num_cpus::get()`)
Audio chunk length	Turn-around time	Push-to-talk (≤ 10 s) keeps UI snappy
VAD (optional)	Trim silence & hallucination	Add `webrtc-vad` if needed
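For the thread-count lever, the standard library can supply the core count without an extra crate. The sketch below is one way to pick a value before handing it to the Whisper parameters; `inference_threads` and the idea of capping the count are assumptions for illustration, not something the table prescribes.

```rust
use std::thread;

/// Number of threads to give Whisper: the available parallelism,
/// limited to `cap` and never less than one.
pub fn inference_threads(cap: usize) -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1); // fall back to a single thread if the query fails
    cores.min(cap).max(1)
}
```

A caller would convert the result to the integer type the binding expects, for example `inference_threads(8) as i32`.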

11. Roadmap Ideas

Config window for model selection & hot-key change
Streaming, real-time transcription (partial results)
Windows/Linux support (replace Enigo backend where needed)
Auto-punctuation & language detection


title: Technical Architecture – Speakr
version: 2025-07-20
status: Draft

Speakr – Technical Architecture

1. Purpose

Speakr is a privacy-first hot-key dictation utility for macOS (with Windows/Linux on the roadmap). When the user presses a global shortcut, it records a short audio segment, runs an on-device Whisper model, and synthesises keystrokes to type the transcript into the currently-focused application – all in under a few seconds.

2. High-Level Architecture

flowchart TB
    subgraph Tauri Shell
        direction TB
        GlobalShortcut["Global Shortcut<br/><i>tauri-plugin-global-shortcut</i>"]
        IPC["IPC Bridge<br/><i>tauri invoke / emit</i>"]
        Tray["System Tray / UI<br/><i>Leptos + WASM</i>"]
    end

    subgraph Core Library
        direction TB
        Recorder["Audio Recorder<br/><i>cpal</i>"]
        STT["Speech-to-Text<br/><i>whisper-rs</i>"]
        Injector["Text Injector<br/><i>enigo</i>"]
    end

    GlobalShortcut -- "hot-key pressed" --> Recorder
    Recorder -- "PCM samples" --> STT
    STT -- "transcript" --> Injector
    Injector -- "keystrokes" --> FocusApp(["Focused Application"])

    %% UI flow
    Recorder -- "status events" --- IPC
    STT ---- IPC
    Injector --- IPC
    IPC ==> Tray

Key points:

All heavy-weight logic lives in pure Rust (speakr-core). The UI may be hidden without affecting functionality.
No network access – Whisper runs entirely on-device.
Plugin isolation – Optional features (auto-start, clipboard, etc.) are added via Tauri plugins with explicit capability JSON.

3. Crate & Directory Layout

Layer	Crate / Path	Main Responsibilities
Core	`speakr-core/`	Record audio (cpal) ➜ transcribe (whisper-rs) ➜ inject text (enigo)
Backend	`speakr-tauri/`	Registers global hot-key, exposes `#[tauri::command]` wrappers, persists settings
Frontend	`speakr-ui/` (optional)	Leptos WASM UI for tray, preferences, status overlay
Assets	`models/`	GGUF Whisper models downloaded post-install

All crates live in a single Cargo workspace to guarantee compatible dependency versions.

3.1 Speakr-Tauri Internal Structure

The speakr-tauri backend is organised into focused modules for maintainability and testability:

speakr-tauri/src/
├── commands/           # Tauri command implementations
│   ├── mod.rs         # Command organisation and documentation
│   ├── validation.rs  # Input validation (hotkey format, etc.)
│   ├── system.rs      # System integration (model availability, auto-launch)
│   └── legacy.rs      # Backward compatibility commands
├── services/          # Background services and state management
│   ├── mod.rs         # Service coordination
│   ├── hotkey.rs      # Global hotkey registration and management
│   ├── status.rs      # Backend service status tracking
│   └── types.rs       # Shared service types and enums
├── settings/          # Configuration persistence and validation
│   ├── mod.rs         # Settings management
│   ├── persistence.rs # File I/O for settings
│   ├── migration.rs   # Settings schema migration
│   └── validation.rs  # Settings validation logic
├── debug/             # Debug-only functionality
│   ├── mod.rs         # Debug command coordination
│   ├── commands.rs    # Debug-specific Tauri commands
│   ├── storage.rs     # Debug log storage
│   └── types.rs       # Debug-specific types
├── audio/             # Audio handling utilities
│   ├── mod.rs         # Audio module coordination
│   ├── files.rs       # Audio file operations
│   └── recording.rs   # Audio recording helpers
└── lib.rs             # Tauri app setup, command registration

Key architectural principles:

Separation of concerns: Business logic in *_internal() functions, Tauri integration in lib.rs
Testability: Internal functions can be tested without Tauri runtime overhead
Modularity: Commands grouped by functional domain rather than technical implementation
Documentation: Each module has comprehensive rustdoc explaining its purpose and usage
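To illustrate the `*_internal()` split described above, here is a minimal sketch of a hot-key format validator whose logic lives in a plain function and can be unit-tested without a Tauri runtime. The function names, the accepted modifier list, and the error messages are all hypothetical; the real `commands/validation.rs` may look quite different.

```rust
/// Pure validation logic: no Tauri types, so it can be tested directly.
pub fn validate_hot_key_internal(combo: &str) -> Result<(), String> {
    const MODIFIERS: [&str; 6] = ["CMD", "CTRL", "ALT", "OPTION", "SHIFT", "SUPER"];
    let parts: Vec<String> = combo
        .split('+')
        .map(|p| p.trim().to_uppercase())
        .collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err("empty segment in hot-key".to_string());
    }
    // `split` always yields at least one item, so `split_last` cannot fail.
    let (key, mods) = parts.split_last().expect("at least one segment");
    if MODIFIERS.contains(&key.as_str()) {
        return Err("hot-key must end with a non-modifier key".to_string());
    }
    if mods.is_empty() {
        return Err("hot-key needs at least one modifier".to_string());
    }
    if let Some(bad) = mods.iter().find(|m| !MODIFIERS.contains(&m.as_str())) {
        return Err(format!("unknown modifier: {}", bad));
    }
    Ok(())
}

/// Thin wrapper; in the backend this is where `#[tauri::command]` would go.
pub fn validate_hot_key(combo: String) -> Result<(), String> {
    validate_hot_key_internal(&combo)
}
```

Keeping the wrapper this thin means the test suite only needs to exercise `validate_hot_key_internal`.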

4. Runtime Flow (Happy Path)

Step	Thread/Task	Action	Typical Latency
1	Main (OS)	User presses ⌘⌥Space	–
2	Tauri shortcut handler	Spawns async task `transcribe()`	< 1 ms
3	Tokio worker	`cpal::Stream` captures 16-kHz mono PCM into ring-buffer	0–10 s (configurable)
4	Same task	PCM fed into `whisper_rs::full()`	~1 s per 10 s audio on M-series
5	Same task	Transcript returned → `enigo` synthesises keystrokes	≤ 300 ms
6	UI task	Frontend receives status events via `emit()` and updates overlay	realtime

Failure cases (no mic, model missing, permission denied) surface via error events and native notifications.
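The happy path and its failure cases can be modelled as a small state machine whose transitions are reported to the UI. The sketch below is an assumption-laden illustration: the `Status` enum, its `Idle` and `Injecting` variants, and `run_once` are invented here, and `emit` stands in for whatever event mechanism the backend uses.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Idle,
    Recording,
    Transcribing,
    Injecting,
    Error(String),
}

impl Status {
    /// Happy-path successor; finished or failed runs return to `Idle`.
    pub fn next(&self) -> Status {
        match self {
            Status::Idle => Status::Recording,
            Status::Recording => Status::Transcribing,
            Status::Transcribing => Status::Injecting,
            Status::Injecting | Status::Error(_) => Status::Idle,
        }
    }
}

/// Drive one dictation run. `stage` does the work for a state and may fail
/// with a message; every state change is reported through `emit`.
pub fn run_once(
    mut stage: impl FnMut(&Status) -> Result<(), String>,
    mut emit: impl FnMut(&Status),
) {
    let mut status = Status::Recording;
    while status != Status::Idle {
        emit(&status);
        status = match stage(&status) {
            Ok(()) => status.next(),
            Err(msg) => {
                // Surface the failure, then reset for the next hot-key press.
                emit(&Status::Error(msg));
                Status::Idle
            }
        };
    }
    emit(&Status::Idle);
}
```

A failing stage therefore produces the sequence up to the failure, one `Error` event, and a final `Idle`.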

5. Concurrency & Safety

Tokio multi-thread runtime drives asynchronous recording and Whisper inference.
The AppState(Mutex<Option<Speakr>>) guards the singleton Whisper context; loading occurs once at app start.
Hot-key handler offloads work to the runtime to keep the UI thread non-blocking.
Audio buffer uses a bounded sync_channel to avoid unbounded RAM growth.
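The bounded-channel point can be shown in isolation with the standard library. In this sketch the producer uses `try_send`, so a full buffer drops samples instead of blocking or growing; the helper names `bounded_audio_channel` and `push_samples` are illustrative only. At 16 kHz mono `f32`, a 10-second buffer holds 160 000 samples, about 640 kB.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Channel sized to hold exactly `seconds` of audio at `sample_rate`.
pub fn bounded_audio_channel(
    seconds: usize,
    sample_rate: usize,
) -> (SyncSender<f32>, Receiver<f32>) {
    sync_channel(seconds * sample_rate)
}

/// Push samples without ever blocking the producer.
/// Returns how many samples were dropped because the buffer was full.
pub fn push_samples(tx: &SyncSender<f32>, data: &[f32]) -> usize {
    let mut dropped = 0;
    for &s in data {
        match tx.try_send(s) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver is gone: nothing more can be delivered.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    dropped
}
```

Memory use is fixed at channel creation, which is the property the bullet above relies on.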

6. Security & Permissions

Platform	Permission	Why	Request Mechanism
macOS	Microphone access	Record audio	`NSMicrophoneUsageDescription` (Info.plist)
macOS	Accessibility	Send synthetic keystrokes	User enables app in System Settings ▸ Accessibility
All	Global shortcut	Register hot-key	`global-shortcut:allow-register` capability

The app runs offline; no data leaves the device.

7. Build & Packaging

Dev: trunk serve & (frontend) + cargo tauri dev (backend)
Release: trunk build --release ➜ cargo tauri build
macOS notarisation: xcrun notarytool submit --wait after codesign.
Universal binary size ≈ 15 MB (+ model).

8. Extensibility Points

Voice Activity Detection: plug-in webrtc-vad before Whisper to auto-stop on silence.
Streaming transcripts: transcribe short rolling chunks of audio and enqueue keystrokes incrementally.
Multi-language: set params.set_language(None) for auto-detect.
Cross-platform: replace enigo backend with send_input (Win) or xdo (X11) while keeping public API.
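As a stand-in for a real voice-activity detector, the following sketch trims leading and trailing silence using a plain RMS energy threshold. This is a much cruder technique than `webrtc-vad` and is shown only to make the idea concrete; `trim_silence`, the frame length, and the threshold are all assumptions.

```rust
/// Root-mean-square energy of one frame.
fn rms(frame: &[f32]) -> f32 {
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Drop leading and trailing frames whose RMS energy is below `threshold`.
/// Quiet frames between two voiced frames are kept.
pub fn trim_silence(samples: &[f32], frame_len: usize, threshold: f32) -> &[f32] {
    assert!(frame_len > 0, "frame length must be non-zero");
    // Indices of frames that carry enough energy to count as speech.
    let voiced: Vec<usize> = samples
        .chunks(frame_len)
        .enumerate()
        .filter(|(_, frame)| rms(frame) >= threshold)
        .map(|(i, _)| i)
        .collect();
    match (voiced.first(), voiced.last()) {
        (Some(&first), Some(&last)) => {
            let end = ((last + 1) * frame_len).min(samples.len());
            &samples[first * frame_len..end]
        }
        _ => &[], // nothing but silence
    }
}
```

Passing the trimmed slice to Whisper shortens inference and gives the model less silence to hallucinate over.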

9. Risks & Mitigations

Risk	Mitigation
Keystroke injection blocked in secure fields	Fallback to clipboard-paste mode with warning
Whisper latency on older CPUs	Offer `tiny.en.gguf` and shorter max record time
Shortcut clashes	UI lets user redefine hot-key and validates uniqueness
Model file missing/corrupt	Verify checksum on load and show error dialogue
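The first mitigation, falling back to clipboard paste when keystroke injection is blocked, can be sketched with a small trait. Everything named here (`TextSink`, `InjectError`, `Outcome`, `MockSink`, `inject_with_fallback`) is hypothetical; the real injector would wrap enigo and a clipboard API behind the same shape.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum InjectError {
    SecureField,
    Other(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Typed,
    PastedWithWarning,
}

/// Anything that can deliver text to the focused application.
pub trait TextSink {
    fn send(&mut self, text: &str) -> Result<(), InjectError>;
}

/// Try synthetic keystrokes first; on a secure-field rejection fall back to
/// the clipboard and tell the caller to warn the user.
pub fn inject_with_fallback(
    keys: &mut dyn TextSink,
    clipboard: &mut dyn TextSink,
    text: &str,
) -> Result<Outcome, InjectError> {
    match keys.send(text) {
        Ok(()) => Ok(Outcome::Typed),
        Err(InjectError::SecureField) => {
            clipboard.send(text)?;
            Ok(Outcome::PastedWithWarning)
        }
        Err(other) => Err(other),
    }
}

/// In-memory sink standing in for enigo or the clipboard in tests.
pub struct MockSink {
    pub reject: bool,
    pub received: Vec<String>,
}

impl TextSink for MockSink {
    fn send(&mut self, text: &str) -> Result<(), InjectError> {
        if self.reject {
            return Err(InjectError::SecureField);
        }
        self.received.push(text.to_string());
        Ok(())
    }
}
```

Only `SecureField` triggers the fallback; other errors propagate so they are not silently masked by a paste.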

10. Future Roadmap

Settings persistence via tauri-plugin-store (local JSON in AppData, no cloud sync).
Auto-start on login (tauri-plugin-autostart).
GPU inference when Whisper Metal backend stabilises.
Installer bundles (DMG/MSI/DEB) with model downloader.

This document replaces the previous placeholder docs/ARCHITECTURE.md and should be kept up-to-date with all architectural changes.

Development Overview

Pre-commit Setup and Optimization

"Quality is not an act, it's a habit." — Aristotle

This document describes Speakr's pre-commit hook configuration, optimization strategies, and future improvement opportunities.

Overview

Pre-commit hooks ensure code quality by running automated checks before each commit. This prevents broken code from entering the repository and maintains consistent coding standards across the team.

Why Pre-commit?

Early Detection: Catch issues before they reach CI/CD
Consistent Quality: Enforce formatting and linting standards
Fast Feedback: Immediate results during development
Team Alignment: Same standards for all contributors

Current Setup

Our optimized pre-commit configuration targets affected packages only, reducing execution time by ~70% for typical changes.

Configuration Files

.pre-commit-config.yaml: Main configuration
scripts/selective-tests.sh: Advanced selective testing script

Hook Categories

1. Package-Specific Rust Hooks

speakr-core (triggered by ^speakr-core/.*\.rs$):

cargo-fmt-core: Code formatting check
cargo-clippy-core: Linting with all warnings as errors
cargo-test-core: Unit and integration tests

speakr-tauri (triggered by ^speakr-tauri/.*\.rs$):

cargo-fmt-tauri: Code formatting check
cargo-clippy-tauri: Linting with all warnings as errors
cargo-test-tauri: Unit and integration tests

speakr-ui (triggered by ^speakr-ui/.*\.rs$):

cargo-fmt-ui: Code formatting check
cargo-clippy-ui: Linting with all warnings as errors
cargo-test-ui: Unit and integration tests

2. Workspace-Level Hooks

Workspace Changes (triggered by ^(Cargo\.(toml|lock)|\.cargo/.*)$):

cargo-fmt-workspace: Format all packages
cargo-clippy-workspace: Lint entire workspace

3. Smart Integration Hooks

Dependency Awareness:

cargo-test-integration: When speakr-core changes, also test speakr-tauri (dependency relationship)

4. General Quality Hooks

Trailing whitespace: Remove unnecessary whitespace
YAML/JSON/TOML validation: Syntax checking
Large file detection: Prevent accidental commits of large files
Merge conflict detection: Catch unresolved conflicts
Markdown linting: Documentation quality

Optimizations

🎯 Selective Package Testing

Problem: Previous setup ran all checks on all packages for any Rust file change.

Solution: File pattern matching to target only affected packages.

# Before: Always runs on ANY .rs file
files: \.rs$
entry: cargo test --all

# After: Only runs on speakr-core files
files: ^speakr-core/.*\.rs$
entry: cargo test --package speakr-core

🧠 Dependency-Aware Testing

Problem: Changes to speakr-core could break speakr-tauri without running its tests.

Solution: Smart integration testing when dependencies change.

# Integration test: core changes affect tauri
- id: cargo-test-integration
  name: Cargo Test (integration - core affects tauri)
  entry: cargo test --package speakr-tauri
  files: ^speakr-core/.*\.rs$  # Triggered by core changes
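The selection rules expressed by the `files:` patterns can also be written as a small function, which makes the dependency edge explicit and easy to test. This is a sketch of the logic only, not the contents of `scripts/selective-tests.sh`; the function name `affected_packages` is an assumption.

```rust
use std::collections::BTreeSet;

/// Map changed paths to the packages whose checks should run.
pub fn affected_packages(changed: &[&str]) -> BTreeSet<&'static str> {
    const ALL: [&str; 3] = ["speakr-core", "speakr-tauri", "speakr-ui"];
    let mut out = BTreeSet::new();
    for path in changed {
        // Workspace-level files affect every package.
        if *path == "Cargo.toml" || *path == "Cargo.lock" || path.starts_with(".cargo/") {
            out.extend(ALL);
            continue;
        }
        if !path.ends_with(".rs") {
            continue;
        }
        for pkg in ALL {
            if path.starts_with(&format!("{}/", pkg)) {
                out.insert(pkg);
                // speakr-tauri depends on speakr-core, so re-check it too.
                if pkg == "speakr-core" {
                    out.insert("speakr-tauri");
                }
            }
        }
    }
    out
}
```

A change under `speakr-ui/` selects only that package, while a change under `speakr-core/` selects both core and tauri.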

⚡ Performance Optimizations

Parallel Execution: Per-package hooks are independent of one another, so they are safe to run in parallel
Targeted Scoping: Only affected code gets checked
Smart Caching: Cargo's incremental compilation benefits
Early Exit: Hooks fail fast on first error

Usage Guide

Installation

# Install pre-commit (if not already installed)
pip install pre-commit

# Install hooks in repository
pre-commit install

# Optional: Install for push events too
pre-commit install -t pre-push

Daily Workflow

Automatic (Recommended):

git add .
git commit -m "feat: add new feature"
# Hooks run automatically, commit proceeds if all pass

Manual Testing:

# Run all hooks on all files
pre-commit run --all-files

# Run specific hook
pre-commit run cargo-fmt-core

# Run on specific files
pre-commit run --files speakr-core/src/lib.rs

Advanced Selective Testing

For maximum control, use our custom script:

# Test only packages affected by changes since last commit
./scripts/selective-tests.sh

# Compare against specific commit/branch
./scripts/selective-tests.sh main
./scripts/selective-tests.sh abc123def

# Get help
./scripts/selective-tests.sh --help

Bypassing Hooks (Emergency Only)

# Skip all hooks (use sparingly!)
git commit -m "hotfix: urgent fix" --no-verify

# Skip specific hook
SKIP=cargo-test-core git commit -m "fix: skip tests temporarily"

Performance Metrics

Before Optimization

Total packages checked: 3/3 (100%)
Average execution time: ~45 seconds
Parallel efficiency: Low (redundant work)

After Optimization

Typical single-package change: 1/3 packages (33%)
Average execution time: ~15 seconds (~67% reduction, from ~45 seconds)
Parallel efficiency: High (targeted work)
Smart dependencies: Core changes → Core + Tauri tests

Real-world Example

Scenario: Modify speakr-ui/src/app.rs

Before: ✗ Tests all 3 packages (~45s)
After: ✓ Tests only speakr-ui package (~12s)
Speedup: 3.75x faster 🚀
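The two figures quoted in this section measure different things: a speed-up factor (old time divided by new time) and a percentage of time saved. The helper names below are illustrative; the arithmetic is just the standard definitions applied to the timings above.

```rust
/// How many times faster the new run is.
pub fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

/// Percentage of the original time that was saved.
pub fn time_saved_pct(before_s: f64, after_s: f64) -> f64 {
    (before_s - after_s) / before_s * 100.0
}
```

For the single-package example, `speedup(45.0, 12.0)` is 3.75 and the time saved is about 73%; for the 15-second average, the time saved is about 67%.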

Future Improvements

🚀 Performance Enhancements

1. Incremental Testing with Coverage

Goal: Only run tests affected by specific code changes, not entire packages.

Implementation:

# Future: ultra-granular testing (hypothetical flag; cargo has no such option today)
cargo test --package speakr-core -- --test-affected-by src/audio.rs

Tools to explore:

cargo-difftests: Selective re-testing framework
LLVM coverage analysis for affected test discovery
determinator: Facebook's affected package detection

2. Caching and Memoization

Goal: Skip checks if code hasn't changed since last successful run.

Implementation:

# Cache test results based on content hash
- id: cargo-test-cached
  entry: cache-wrapper cargo test --package speakr-core
  cache_key: "hash:speakr-core/**/*.rs"

Benefits:

Near-instant results for unchanged code
Perfect for repeated CI runs on same commit
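The core of the caching idea is a content fingerprint plus a record of the last successful run. The sketch below shows that logic in memory; `fingerprint` and `CheckCache` are hypothetical names, and `DefaultHasher` is used only for brevity. Its output is not guaranteed to be stable across Rust releases, so a cache persisted to disk would need a fixed algorithm such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint a set of (path, contents) pairs, independent of input order.
pub fn fingerprint(files: &[(&str, &str)]) -> u64 {
    let mut sorted: Vec<(&str, &str)> = files.to_vec();
    sorted.sort();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

/// Remembers the fingerprint of the last successful run.
#[derive(Default)]
pub struct CheckCache {
    last_ok: Option<u64>,
}

impl CheckCache {
    /// Run `check` only if the inputs changed since the last success.
    /// Returns `None` when skipped, otherwise the check's result.
    pub fn run_if_changed(
        &mut self,
        files: &[(&str, &str)],
        check: impl FnOnce() -> bool,
    ) -> Option<bool> {
        let fp = fingerprint(files);
        if self.last_ok == Some(fp) {
            return None;
        }
        let ok = check();
        if ok {
            // Failures are never cached, so a broken tree is always re-checked.
            self.last_ok = Some(fp);
        }
        Some(ok)
    }
}
```

Only successes are remembered, which keeps the cache from hiding a failure that has not been fixed yet.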

3. Parallel Package Testing

Goal: Run different package tests truly in parallel.

Current: Sequential package testing
Future: Matrix-style parallel execution

# Run in parallel using job control
cargo test --package speakr-core &
cargo test --package speakr-tauri &
cargo test --package speakr-ui &
wait  # Wait for all to complete

🔍 Enhanced Feedback

1. Rich Diff Display

Goal: Show exactly what code caused failures.

Implementation:

# Future: Rich failure reporting (print file:line for every diagnostic span)
cargo clippy --message-format json \
  | jq -r 'select(.reason == "compiler-message") | .message.spans[] | "\(.file_name):\(.line_start)"'

Features:

Syntax-highlighted diffs
Click-to-fix suggestions
Context-aware error messages

2. Performance Profiling

Goal: Track and optimize hook execution time.

Metrics to collect:

Per-hook execution time
Cache hit/miss ratios
Package-level timing breakdown
Historical performance trends
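Per-hook timing needs little more than a stopwatch around each hook invocation. The sketch below collects those timings in memory; `HookTimings` and its methods are invented for illustration, and a real setup would persist the entries to build the historical trend mentioned above.

```rust
use std::time::{Duration, Instant};

/// Collects wall-clock timings for named hooks.
#[derive(Default)]
pub struct HookTimings {
    entries: Vec<(String, Duration)>,
}

impl HookTimings {
    /// Run `run`, record how long it took under `hook`, and pass its result through.
    pub fn time<T>(&mut self, hook: &str, run: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = run();
        self.entries.push((hook.to_string(), start.elapsed()));
        out
    }

    /// The hook that took longest, if any were recorded.
    pub fn slowest(&self) -> Option<&(String, Duration)> {
        self.entries.iter().max_by_key(|entry| entry.1)
    }

    /// Sum of all recorded durations.
    pub fn total(&self) -> Duration {
        self.entries.iter().map(|entry| entry.1).sum()
    }
}
```

Wrapping each hook in `time(...)` yields the per-hook and package-level breakdowns from a single data structure.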

3. Smart Notifications

Goal: Contextual feedback based on change type.

Examples:

# API changes detected
⚠️  Public API modified in speakr-core - consider semver impact

# Performance impact detected
🐌 Tests are 20% slower - check for performance regressions

# Security sensitive changes
🔒 Cryptographic code modified - extra security review recommended

🧪 Test Quality Improvements

1. Mutation Testing Integration

Goal: Ensure tests actually catch bugs.

Implementation:

# Run mutation tests on changed code (--in-diff expects a diff file, not a revision range)
git diff HEAD~1..HEAD > changes.diff
cargo mutants --package speakr-core --in-diff changes.diff

2. Dependency Impact Analysis

Goal: Understand full impact of changes across the dependency graph.

Visualization:

speakr-core change impact:
├── speakr-core (direct) ✓
├── speakr-tauri (depends on core) ✓
└── speakr-ui (independent) ⏭️ skipped
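
The impact set shown above amounts to a reverse walk over the dependency graph. The following sketch assumes the graph is already available as a map from each package to its direct dependencies (for example, extracted from `cargo metadata`). `affected_packages` is a hypothetical helper, not an existing tool.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns the changed package plus every package that depends on it,
/// directly or transitively. `deps` maps a package to its dependencies.
pub fn affected_packages(deps: &HashMap<&str, Vec<&str>>, changed: &str) -> BTreeSet<String> {
    let mut affected = BTreeSet::from([changed.to_string()]);
    let mut queue = vec![changed.to_string()];

    while let Some(current) = queue.pop() {
        for (package, dependencies) in deps {
            let depends_on_current = dependencies.iter().any(|dep| *dep == current);
            // `insert` returns false for packages already visited,
            // which also protects against cycles.
            if depends_on_current && affected.insert(package.to_string()) {
                queue.push(package.to_string());
            }
        }
    }
    affected
}
```

Packages outside the returned set, such as `speakr-ui` in the example above, are the ones whose hooks can be skipped.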

3. Flaky Test Detection

Goal: Identify and fix unreliable tests.

Implementation:

Run tests multiple times in CI
Track test success/failure rates
Auto-quarantine flaky tests
Generate flakiness reports
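
As a rough sketch of the tracking step, a test's history on the same commit can be classified from its pass/fail outcomes. The names `Verdict`, `classify`, and `pass_rate` are hypothetical, and the rule used (any mix of passes and failures counts as flaky) is an assumption; a production policy would probably add a minimum run count.

```rust
/// Outcome of classifying repeated runs of one test on unchanged code.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Stable,
    Flaky,
    Broken,
}

/// Classifies a run history where `true` means the test passed.
/// An empty history is treated as stable.
pub fn classify(runs: &[bool]) -> Verdict {
    let passes = runs.iter().filter(|&&passed| passed).count();
    if passes == runs.len() {
        Verdict::Stable
    } else if passes == 0 {
        Verdict::Broken
    } else {
        Verdict::Flaky
    }
}

/// Success rate in the range 0.0..=1.0, or `None` without any runs.
pub fn pass_rate(runs: &[bool]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    let passes = runs.iter().filter(|&&passed| passed).count();
    Some(passes as f64 / runs.len() as f64)
}
```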

🔧 Developer Experience

1. IDE Integration

Goal: Show pre-commit status in development environment.

Features:

Real-time hook status in VS Code/Cursor
Inline error highlighting
One-click fix suggestions

2. Hook Customization

Goal: Allow per-developer customization.

Implementation:

# .pre-commit-config.local.yaml (git-ignored)
hooks:
  - id: cargo-clippy-core
    args: ["--", "-A", "clippy::pedantic"]  # Less strict for local dev

3. Quick Fix Tools

Goal: Automated fixing of common issues.

Examples:

# Auto-fix formatting
pre-commit run cargo-fmt-core --hook-stage manual

# Auto-fix common clippy warnings
cargo clippy --fix --allow-dirty

# Auto-update dependencies
cargo update && pre-commit run cargo-test-all

Troubleshooting

Common Issues

Hook Fails with "Package not found"

Cause: Package name mismatch in hook configuration.
Solution: Verify package names match Cargo.toml files:

cargo metadata --format-version 1 | jq '.packages[].name'

Tests Pass Locally but Fail in CI

Cause: Different dependency versions or environment.
Solution: Use Cargo.lock and consistent Rust versions:

# CI configuration
rust-toolchain: "1.88.0"  # Pin exact version

Hooks Run on Wrong Files

Cause: Incorrect regex patterns in files: configuration.
Solution: Test patterns with realistic file paths:

# Test regex pattern
echo "speakr-core/src/lib.rs" | grep -E "^speakr-core/.*\.rs$"

Performance Issues

Slow Hook Execution

Check package scoping: Ensure hooks target specific packages
Review test suite: Look for slow integration tests
Enable caching: Reuse build artifacts between runs (for example, a shared CARGO_TARGET_DIR) or use a compiler cache such as sccache

Memory Issues

Limit parallel jobs: Set CARGO_BUILD_JOBS=2
Increase memory limits: Configure system swap
Use release mode for tests: cargo test --release (if appropriate)

Getting Help

Check configuration: Validate with pre-commit validate-config
Debug mode: Run with pre-commit run --verbose
Clean cache: Use pre-commit clean to reset
Manual testing: Test individual hooks in isolation

Implementation Plan – Speakr

A step-by-step roadmap to deliver the Speakr application using the test-driven, multi-crate approach defined in the specification set under docs/specs/.

1. Repository Scaffold

Reference: INIT-01 Project Scaffold

Execute the migration steps to create the Cargo workspace (speakr-core, speakr-tauri, optional speakr-ui).
Commit and open a draft PR; CI should fail until tests are added.
Add baseline CI workflows (lint, build, placeholder tests) that currently fail.

2. Core Library (`speakr-core`)

Order	Spec	Task
2.1	FR-2	Implement audio capture (`cpal`). Begin with failing unit test asserting 16-kHz mono stream & duration cap.
2.2	FR-3	Implement transcription (`whisper-rs`). Add latency test harness.
2.3	FR-4	Implement text injection (`enigo`). Integration tests across editors via mock window focus.
2.4	FR-5	Implement clipboard fallback; write secure-field simulation tests.
2.5	FR-7	Emit status events; test channel delivery & ordering.

Merge each sub-task when its tests pass and CI is green.

3. Tauri Backend (`speakr-tauri`)

Order	Spec	Task
3.1	FR-1	Register global hot-key via `tauri-plugin-global-shortcut`; write E2E test with headless Tauri window.
3.2	—	Wire hot-key → async call into `speakr-core` pipeline; ensure status events are forwarded via `emit`.
3.3	FR-8	Add settings persistence (JSON). Unit tests for load/save & corruption recovery.

4. Front-End (Leptos)

Order	Spec	Task
4.1	FR-6	Build Settings & Status overlay UI; write component tests with Leptos testing utilities.
4.2	NFR-accessibility	Add automated axe-core & VoiceOver tests.

5. Cross-Cutting Non-Functional Work

Spec	Focus
NFR-latency	Optimise model loading & thread usage; ensure performance tests pass.
NFR-footprint	Strip symbols, enable `lto`, audit memory.
NFR-reliability	Add monkey-test CI job (500 invocations).
NFR-security	Socket-mock tests, Hardened Runtime flags, notarisation script.
NFR-compatibility	Add Intel macOS runner to CI.

6. Auto-Update

Reference: FR-9 Auto-update

Integrate update check using tauri-plugin-updater (or custom).
Write integration tests mocking GitHub Releases API & download validation.

7. Documentation & Release

Update docs/book/ with usage & contribution guide.
Ensure mdbook build passes in CI.
Produce signed DMG via CI; attach to GitHub Release.

Progress Checklist

1. Preparation complete
  - Status: Preparation tasks completed.
1. Repository scaffold merged (INIT-01)
  - Status: Repository scaffold implemented with 4 crates: speakr-core (backend processing), speakr-tauri (Tauri backend), speakr-ui (Leptos front-end), and speakr-types (shared types).
2.1 Audio capture (FR-2) implemented & tested - Status: Audio capture tested via debug UI; verified that the WAV file is written to disk and contains the expected audio.
2.2 Transcription (FR-3) implemented & tested - Status: Not started
2.3 Text injection (FR-4) implemented & tested - Status: Not started
2.4 Injection fallback (FR-5) implemented & tested - Status: Not started
2.5 Status events (FR-7) implemented & tested - Status: Not started
3.1 Global hot-key (FR-1) registered & tested - Status: Not started
3.2 Backend pipeline wired - Status: Not started
3.3 Settings persistence (FR-8) implemented & tested - Status:
[~] 4.1 Settings UI (FR-6) implemented & tested - Status: Preparation tasks completed.
4.2 Accessibility audits (NFR-accessibility) passing - Status: Preparation tasks completed.
Non-functional targets (Latency, Footprint, Reliability, Security, Compatibility) met
Auto-update (FR-9) implemented & tested
Docs & Release pipeline finished

Tick each box as the corresponding PR merges with passing CI.

Recent Progress (2025-07-20)

Scaffolded speakr-core library crate and added it to the workspace manifest.
Added stub implementation (record_to_vec) and constants in speakr-core::audio.
Committed failing unit test audio_capture.rs verifying 16 kHz mono stream and placeholders.
Workspace compiles; test fails as expected, ready for implementation phase.

Debug Panel Documentation

The Speakr debug panel is a development-only interface that provides debugging tools and testing capabilities. It's designed to help developers test features, monitor system behaviour, and troubleshoot issues during development.

Overview

The debug panel is only available in debug builds (cargo tauri dev) and is completely excluded from release builds for security and performance reasons. It provides a comprehensive debugging interface with real-time logging, feature testing, and system monitoring capabilities.

Accessing the Debug Panel

Availability

Debug builds only: The panel is conditionally compiled using #[cfg(debug_assertions)]
Toggle button: A red "🛠️ Debug" button appears in the header (debug builds only)
Visual indicator: The panel shows a "DEBUG BUILD" badge to remind developers of the build type

Start the application in debug mode: cargo tauri dev
Look for the "🛠️ Debug" button in the top-right corner of the header
Click to toggle between the settings panel and debug panel
The button text changes to "🛠️ Hide Debug" when the panel is active

Features

1. Audio Testing

Legacy Test Button

Purpose: Basic audio system testing
Behaviour: Click to run a simple audio recording test
Feedback: Shows progress in the debug output area

Push-to-Talk Recording

Purpose: Test real-time audio recording with push-to-talk interaction
Behaviour:
- Hold the button to start recording
- Release to stop recording
- Supports both mouse and touch events
Visual feedback:
- Button changes colour and shows pulsing animation when recording
- Text updates to show current state
- Recording state is displayed in system info

2. Logging Console

Real-time Log Display

Scrolling console: Shows recent log messages from the backend
Auto-scroll: Automatically scrolls to show newest messages (toggleable)
Timestamp: Each message includes precise timestamp
Source tracking: Shows which component generated each log message

Log Level Filtering

Dropdown filter: Filter by specific log levels (TRACE, DEBUG, INFO, WARN, ERROR)
Visual indicators: Each level has distinct emoji icons and colours
Level-specific styling: Error and warning messages have highlighted backgrounds

Console Controls

Refresh: Manually refresh log messages from backend
Clear: Clear all log messages from display and backend storage
Auto-scroll toggle: Enable/disable automatic scrolling to newest messages

3. System Information

Real-time display of:

Build type: Always shows "Debug" in debug panel
Environment: Shows "Development"
Recording state: Live status of audio recording (Active/Inactive)

Technical Implementation

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend      │    │     Backend      │    │  Log Storage    │
│   (Leptos)      │    │    (Tauri)       │    │   (Memory)      │
├─────────────────┤    ├──────────────────┤    ├─────────────────┤
│ DebugPanel      │◄──►│ debug_* commands │◄──►│ DEBUG_LOG_      │
│ LoggingConsole  │    │ add_debug_log()  │    │ MESSAGES        │
│ Push-to-talk UI │    │ Log collection   │    │ (VecDeque)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Conditional Compilation

The debug panel uses Rust's conditional compilation to ensure it's only included in debug builds:

#[cfg(debug_assertions)]
mod debug;

#[cfg(debug_assertions)]
use crate::debug::DebugPanel;

Backend Commands

All debug commands are prefixed with debug_ and conditionally compiled:

debug_test_audio_recording() - Legacy audio test
debug_start_recording() - Start push-to-talk recording
debug_stop_recording() - Stop push-to-talk recording
debug_get_log_messages() - Retrieve stored log messages
debug_clear_log_messages() - Clear log message storage

Log Message Storage

Debug logs are stored in memory using a thread-safe circular buffer:

use std::collections::VecDeque;
use std::sync::{Arc, LazyLock, Mutex};

static DEBUG_LOG_MESSAGES: LazyLock<Arc<Mutex<VecDeque<DebugLogMessage>>>> =
    LazyLock::new(|| Arc::new(Mutex::new(VecDeque::with_capacity(1000))));

Key characteristics:

Capacity: Limited to 1000 messages (prevents memory bloat)
Thread-safe: Uses Arc<Mutex<>> for concurrent access
Circular buffer: Automatically removes old messages when capacity is reached
Structured data: Each message includes timestamp, level, target, and content
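
Note that `VecDeque::with_capacity` only pre-allocates space and does not cap the length, so the circular behaviour has to be enforced when a message is pushed. The source does not show how `add_debug_log` does this. The sketch below is one plausible approach, with `push_bounded` and `MAX_LOG_MESSAGES` as hypothetical names.

```rust
use std::collections::VecDeque;

/// Assumed limit, matching the capacity used for DEBUG_LOG_MESSAGES.
pub const MAX_LOG_MESSAGES: usize = 1000;

/// Appends `message`, evicting the oldest entries so that the buffer
/// never holds more than `max_len` items (hypothetical helper).
pub fn push_bounded<T>(buffer: &mut VecDeque<T>, message: T, max_len: usize) {
    if max_len == 0 {
        return; // Nothing can be stored in a zero-length buffer.
    }
    while buffer.len() >= max_len {
        buffer.pop_front(); // Drop the oldest message first.
    }
    buffer.push_back(message);
}
```

With the shared static, a caller would lock the mutex and pass the guarded deque, for example `push_bounded(&mut guard, msg, MAX_LOG_MESSAGES)`.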

Event Handling

Push-to-talk functionality uses multiple event handlers for robust interaction:

on:mousedown=move |_| start_recording()
on:mouseup=move |_| stop_recording()
on:mouseleave=move |_| stop_recording()  // Handles mouse leaving button area
on:touchstart=move |_| start_recording()
on:touchend=move |_| stop_recording()

Development Patterns

Adding New Debug Features

Backend Command:

#[cfg(debug_assertions)]
#[tauri::command]
async fn debug_your_feature() -> Result<String, AppError> {
    add_debug_log(DebugLogLevel::Info, "your-component", "Feature tested");
    // Your implementation
    Ok("Success message".to_string())
}

Frontend Integration:

impl DebugManager {
    pub async fn test_your_feature() -> Result<String, String> {
        tauri_invoke_no_args("debug_your_feature")
            .await
            .map_err(|e| format!("Failed to test feature: {e}"))
    }
}

UI Component:

<button
    class="debug-btn-primary"
    on:click=move |_| test_your_feature()
>
    "Test Your Feature"
</button>

Register Command:

// Add to debug build handler list
debug_your_feature,

Logging Best Practices

Use appropriate log levels:
- Trace: Detailed execution flow
- Debug: Development information
- Info: General information
- Warn: Potential issues
- Error: Actual errors

Include context:

add_debug_log(
    DebugLogLevel::Info,
    "component-name",
    &format!("Action completed with result: {}", result)
);

Target naming:
- Use consistent component names
- Follow pattern: speakr-{component} (e.g., speakr-core, speakr-tauri)

Testing Debug Features

Debug features should be tested like any other code:

#[test]
fn test_debug_manager_methods_exist() {
    // Compile-time check: this binding only type-checks if the method
    // exists and takes no arguments, so no runtime assertion is needed.
    let _fn: fn() -> _ = DebugManager::test_your_feature;
}

Security Considerations

Build-time Exclusion

Debug panel code is completely removed from release builds
No performance impact on production builds
No security surface area in release builds

Development-only Data

Log messages are stored only in memory
No persistent storage of debug information
Automatic cleanup when application closes

Safe Defaults

Mock implementations prevent accidental system access
All debug commands return safe, predictable responses
Clear visual indicators remind developers of debug mode

Troubleshooting

Debug Panel Not Visible

Check build type: Ensure you're running cargo tauri dev, not a release build
Look for button: The toggle button appears in the header, not as a separate window
Browser cache: If using trunk serve, clear browser cache and reload

Log Messages Not Appearing

Click refresh: Use the "🔄 Refresh" button to manually fetch logs
Check backend: Ensure debug commands are registered in the invoke handler
Memory limit: Log storage is limited to 1000 messages; older messages are automatically removed

Push-to-Talk Not Working

Hold, don't click: The button requires holding down, not just clicking
Check events: Ensure mouse/touch events are properly handled
Visual feedback: Look for button colour change and pulsing animation during recording

Future Enhancements

Potential additions to the debug panel:

Performance Monitoring: CPU, memory usage graphs
Network Activity: Mock API call testing
State Inspection: Real-time application state viewer
Configuration Testing: Dynamic settings modification
Export Functionality: Save debug logs to file
Remote Debugging: WebSocket connection for external debugging tools

Frontend: speakr-ui/src/debug.rs - Main debug panel implementation
Backend: speakr-tauri/src/lib.rs - Debug commands and log storage
Styles: speakr-ui/styles.css - Debug panel CSS styles
Types: Log message types and enums
Tests: Unit tests for debug functionality

Contributing

When adding debug features:

Follow the established patterns for conditional compilation
Add appropriate logging with meaningful messages
Include tests for new functionality
Update this documentation with new features
Ensure features work in both desktop and mobile layouts

The debug panel is a powerful development tool that should enhance the development experience while maintaining security and performance in production builds.

Tauri Plugins

The following plugins are of interest for this project:

Specifications

This directory contains all functional requirements (FR), non-functional requirements (NFR), and initialisation specifications (INIT) for the Speakr project.

Functional Requirements (FR)

ID	Name	Report
FR-1	Global Hot-key	Implementation Summary
FR-2	Audio Capture	Implementation Summary
FR-3	Transcription
FR-4	Transcript Injection
FR-5	Injection Fallback
FR-6	Settings UI	Implementation Summary
FR-7	Status Events
FR-8	Settings Persistence
FR-9	Auto-update

Non-Functional Requirements (NFR)

ID	Name	Report
NFR-accessibility	Accessibility
NFR-compatibility	Compatibility
NFR-footprint	Footprint
NFR-latency	Latency
NFR-reliability	Reliability
NFR-security	Security

Initialisation Specifications (INIT)

ID	Name	Report
INIT-01	Project Scaffold & Initial Structure

note

Implementation Reports contain detailed analysis of completed features, including technical decisions, challenges encountered, and verification steps. See reports/ for additional documentation.

FR-1: Global Hot-key

Registers a system-wide hot-key at application start that toggles the record → transcribe → inject flow.

Requirement

The application must register a global hot-key (default ⌥ Option + ~).
Must be active even when Speakr is running in the background.
Pressing the hot-key initiates, in order:
1. Audio recording
2. Transcription
3. Text injection into the current focused field.
The hot-key must be configurable in Settings and warn on conflicts.

Rationale

A single keyboard shortcut lets users capture ideas without context-switching, maintaining focus and flow.

Acceptance Criteria

Hot-key can be triggered from any application on macOS 13+.
95th percentile time-to-text ≤ 3 s for 5 s recordings on M-series Macs.
99 % activation success rate in telemetry.
Changing the hot-key in Settings updates the registration immediately and prevents duplicates.
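
The 95th-percentile criterion can be checked mechanically once latency samples exist. The sketch below uses the nearest-rank method, which is an assumption since the specification does not name a percentile definition. `percentile` and `meets_latency_budget` are hypothetical helpers.

```rust
/// Nearest-rank percentile of latency samples in milliseconds.
/// Returns `None` for empty input or a percentile outside 1..=100.
pub fn percentile(samples_ms: &[u64], pct: u32) -> Option<u64> {
    if samples_ms.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    // Nearest rank is ceil(pct / 100 * n), counted from 1.
    let rank = (pct as usize * sorted.len()).div_ceil(100);
    Some(sorted[rank - 1])
}

/// True when the 95th percentile is within the budget (3 s in FR-1).
pub fn meets_latency_budget(samples_ms: &[u64], budget_ms: u64) -> bool {
    percentile(samples_ms, 95).is_some_and(|p95| p95 <= budget_ms)
}
```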

Test-Driven Design

Follow TDD: write failing automated tests for every case in the Acceptance Criteria before implementation. CI should pass only when the new tests turn green.

References

PRD §6 Functional Requirements – FR-1

date: 2025-07-23
requirement: FR-1-global-hotkey
status: PARTIALLY COMPLETE
prepared_by: o3

Implementation Report: FR-1 - Global Hot-key

Implementation Summary

The backend (speakr-tauri) integrates tauri-plugin-global-shortcut to register a system-wide shortcut at start-up. A default combination (CmdOrCtrl+Alt+Space) is attempted first; if registration fails (for example due to a conflict) a fallback (CmdOrCtrl+Alt+F2) is tried. The registration logic is implemented in GlobalHotkeyService (speakr-tauri/src/services/hotkey.rs) and invoked from speakr-tauri/src/lib.rs inside the setup callback. The service stores the active shortcut behind a mutex and emits a hotkey-triggered Tauri event each time the key is pressed.

Validation utilities (commands::validation::validate_hot_key_internal) together with the HotkeyConfig type (defined in speakr-types) provide parsing and serialisation support. A comprehensive suite of unit tests exercises many shortcut formats, as well as default configuration behaviour and placeholder Tauri integration scenarios.
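
The default-then-fallback registration described above can be sketched independently of the plugin. The code below is not taken from GlobalHotkeyService. It abstracts the actual registration call behind a closure (an assumption made so the example stays self-contained), and `register_with_fallback` is a hypothetical name.

```rust
pub const DEFAULT_SHORTCUT: &str = "CmdOrCtrl+Alt+Space";
pub const FALLBACK_SHORTCUT: &str = "CmdOrCtrl+Alt+F2";

/// Tries the default shortcut first and the fallback second.
/// Returns the shortcut that was registered, or the fallback's error
/// when both attempts fail.
pub fn register_with_fallback<E>(
    mut register: impl FnMut(&str) -> Result<(), E>,
) -> Result<&'static str, E> {
    match register(DEFAULT_SHORTCUT) {
        Ok(()) => Ok(DEFAULT_SHORTCUT),
        // The first error is dropped here; a real service would likely log it.
        Err(_) => register(FALLBACK_SHORTCUT).map(|()| FALLBACK_SHORTCUT),
    }
}
```

Returning the shortcut that succeeded lets the caller store the active combination, which matches the summary's note that the service keeps it behind a mutex.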

Work Remaining

Trigger pipeline – wire the hotkey-triggered event to the record → transcribe → inject flow (FR-2, FR-3, FR-4).
Settings integration – load a user-defined shortcut from persisted settings at start-up and expose a Tauri command that re-registers it at runtime.
Conflict feedback – propagate HotkeyError::ConflictDetected to the UI so users are warned instantly.
Configurable modifier – change the default shortcut to match the PRD (⌥ Option + ~) and let users restore defaults easily.
Cross-platform assurance – create integration tests with a mocked AppHandle or CI desktop harness to confirm registration works on macOS, Windows and Linux.
Performance metric – measure and emit telemetry needed for the 95th-percentile time-to-text ≤ 3 s requirement (once the pipeline is complete).

Architecture

Sequence – current implementation

sequenceDiagram
    autonumber
    participant OS as Operating System
    participant Plugin as GlobalShortcut plugin
    participant Service as GlobalHotkeyService
    participant App as Speakr backend (Tauri)

    App->>Plugin: register(shortcut)
    Plugin->>OS: register
    OS-->>Plugin: ok / fail
    Plugin-->>App: result
    OS->>Plugin: *User presses shortcut*
    Plugin->>Service: on_shortcut callback
    Service->>App: emit "hotkey-triggered" event

Target flow – requirement goal

flowchart TD
    Input["User presses global hot-key"]:::inputOutput --> Shortcut(Registered shortcut):::components
    Shortcut --> |Tauri event| Record["Audio capture start"]:::process
    Record --> Transcribe["Whisper transcription"]:::process
    Transcribe --> Inject["Text injection into active field"]:::process
    classDef inputOutput fill:#FEE0D2,stroke:#E6550D,color:#E6550D
    classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
    classDef components fill:#E6E6FA,stroke:#756BB1,color:#756BB1

Noteworthy

The current default shortcut differs from the PRD specification. A TODO in code highlights the pending pipeline integration.
Unit tests follow TDD principles, yet integration tests with the real plugin are still placeholders.

FR-2: Audio Capture

Captures microphone input suitable for Whisper transcription.

Requirement

Capture 16 kHz mono audio via the cpal crate.
Default maximum duration 10 s; user-configurable up to 30 s.
Recording stops automatically when the duration limit is reached or the user presses the hot-key again.
Audio is buffered entirely in memory; no files are written to disk.
Handle microphone permission prompts gracefully on first run.
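
The duration limits translate directly into a sample budget for the in-memory buffer. The sketch below mirrors the `RecordingConfig` shape that appears in the FR-2 implementation report, but the constants and the clamping behaviour are assumptions made for illustration, not the crate's actual code.

```rust
pub const SAMPLE_RATE_HZ: u32 = 16_000;
pub const DEFAULT_MAX_SECS: u32 = 10;
pub const MAX_ALLOWED_SECS: u32 = 30;

/// Recording limits for one capture session (illustrative sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RecordingConfig {
    max_duration_secs: u32,
}

impl RecordingConfig {
    /// Clamps the requested duration into 1..=30 seconds.
    pub fn new(duration_secs: u32) -> Self {
        Self {
            max_duration_secs: duration_secs.clamp(1, MAX_ALLOWED_SECS),
        }
    }

    pub fn max_duration_secs(&self) -> u32 {
        self.max_duration_secs
    }

    /// Number of mono samples the buffer must hold at 16 kHz.
    pub fn max_samples(&self) -> usize {
        (SAMPLE_RATE_HZ * self.max_duration_secs) as usize
    }
}

impl Default for RecordingConfig {
    fn default() -> Self {
        Self::new(DEFAULT_MAX_SECS)
    }
}
```

At the 30 s maximum this is 480,000 samples, or 960,000 bytes of 16-bit audio, so buffering entirely in memory stays cheap.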

Rationale

Lower sample-rate mono audio minimises processing cost while meeting Whisper’s input requirements.

Acceptance Criteria

Recording initialises within 100 ms after hot-key press.
Audio stream conforms to 16 kHz, 16-bit, mono.
User can change max duration in Settings; value persists across restarts.
Recording stops cleanly at limit without crashing or clipping.
Permission dialog appears once and records decision.

Test-Driven Design

Adopt test-driven development: begin by writing failing unit/integration tests that assert each Acceptance Criterion. Only then implement capture logic until tests pass in CI.

References

PRD §6 Functional Requirements – FR-2

date: 2025-07-23
requirement: FR-2
status: PARTIALLY COMPLETE
prepared_by: o4-mini

Implementation Report: FR-2 - Audio Capture

Implementation Summary

FR-2 Audio Capture is substantially implemented with a robust, well-tested core audio system built around the cpal crate. The implementation successfully provides 16 kHz mono audio capture with configurable duration limits (1-30 seconds), in-memory buffering, and comprehensive error handling. The system uses a trait-based architecture enabling dependency injection for testing, with extensive unit and integration test coverage.

The core functionality is implemented in speakr-core/src/audio/mod.rs with the AudioRecorder struct providing the main API. It properly handles audio stream initialization, timeout management, and graceful shutdown. Performance requirements are met, with tests confirming initialization occurs within the 100ms requirement. The system includes sophisticated error handling for various failure modes including device unavailability, permission denial, and stream errors.

Work Remaining

Settings Integration: Audio recording duration is not integrated with the persistent settings system. Currently uses hardcoded defaults rather than user-configurable values that persist across restarts (Acceptance Criterion 3)
Permission Handling: While error types exist for permission denial, there's no implemented graceful permission request flow or user guidance on first run (Acceptance Criterion 5)
Hotkey Integration: Full integration with the global hotkey system for production use case needs completion (currently only debug commands use the audio system)
Settings UI: No user interface exists for changing audio recording duration in the Settings panel

Architecture

sequenceDiagram
    participant HK as Global Hotkey
    participant AR as AudioRecorder
    participant AS as AudioSystem (cpal)
    participant CS as CpalAudioStream
    participant TO as Timeout Task

    HK->>AR: start_recording()
    AR->>AR: Check if already recording
    AR->>AS: start_recording(config)
    AS->>CS: Create audio stream
    CS->>CS: Initialize cpal stream
    CS-->>AS: Return stream handle
    AS-->>AR: Return AudioStream
    AR->>TO: Spawn timeout task
    AR-->>HK: Recording started

    Note over CS: Continuously capture<br/>16kHz mono samples

    alt Manual Stop
        HK->>AR: stop_recording()
    else Timeout
        TO->>CS: stream.stop()
    end

    AR->>CS: get_samples()
    CS-->>AR: Vec<i16> samples
    AR-->>HK: RecordingResult

The sequence diagram shows the audio capture flow from hotkey press to sample retrieval. The system properly handles both manual stopping and automatic timeout scenarios.

classDiagram
    class AudioRecorder {
        -state: Arc~Mutex~Option~RecordingState~~~
        -audio_system: Box~dyn AudioSystem~
        +new(config: RecordingConfig) AudioRecorder
        +start_recording() Result~(), AudioCaptureError~
        +stop_recording() Result~RecordingResult, AudioCaptureError~
        +is_recording() bool
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }

    class AudioSystem {
        <<trait>>
        +start_recording(config: &RecordingConfig) Result~Box~dyn AudioStream~, AudioCaptureError~
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }

    class CpalAudioSystem {
        -host: cpal::Host
        +new() Result~Self, AudioCaptureError~
    }

    class AudioStream {
        <<trait>>
        +get_samples() Vec~i16~
        +stop()
        +is_active() bool
    }

    class CpalAudioStream {
        -samples: Arc~Mutex~Vec~i16~~~
        -is_recording: Arc~AtomicBool~
    }

    class RecordingConfig {
        -max_duration_secs: u32
        +new(duration: u32) Self
        +max_duration_secs() u32
        +max_samples() usize
    }

    AudioRecorder --> AudioSystem
    CpalAudioSystem ..|> AudioSystem
    CpalAudioSystem --> CpalAudioStream
    CpalAudioStream ..|> AudioStream
    AudioRecorder --> RecordingConfig

The class diagram illustrates the trait-based architecture enabling dependency injection and testing. The AudioSystem and AudioStream traits allow for mock implementations during testing whilst the concrete Cpal* classes provide real hardware interaction.

stateDiagram-v2
    [*] --> Idle
    Idle --> Initializing : start_recording()
    Initializing --> Recording : Stream created successfully
    Initializing --> Error : Device/Permission error
    Recording --> Stopping : Manual stop / Timeout
    Stopping --> Idle : Samples extracted
    Error --> Idle : Error handled

    state Recording {
        [*] --> Capturing
        Capturing --> Capturing : Accumulate samples
    }

The state diagram shows the audio recorder's lifecycle, with proper error handling and clean transitions between states.
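
The lifecycle in the state diagram can be expressed as a small transition function. This is a sketch of the diagram only: the real AudioRecorder tracks state through `Arc<Mutex<Option<RecordingState>>>`, and the `State`, `Event`, and `next_state` names here are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Initializing,
    Recording,
    Stopping,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    StartRequested,
    StreamCreated,
    DeviceOrPermissionError,
    StopOrTimeout,
    SamplesExtracted,
    ErrorHandled,
}

/// Returns the next state, or `None` when the diagram has no such edge.
pub fn next_state(state: State, event: Event) -> Option<State> {
    match (state, event) {
        (State::Idle, Event::StartRequested) => Some(State::Initializing),
        (State::Initializing, Event::StreamCreated) => Some(State::Recording),
        (State::Initializing, Event::DeviceOrPermissionError) => Some(State::Error),
        (State::Recording, Event::StopOrTimeout) => Some(State::Stopping),
        (State::Stopping, Event::SamplesExtracted) => Some(State::Idle),
        (State::Error, Event::ErrorHandled) => Some(State::Idle),
        _ => None,
    }
}
```

Rejecting undefined edges with `None` makes cases such as a second start request during recording explicit instead of silently ignored.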

Noteworthy

The implementation demonstrates excellent software engineering practices with comprehensive test coverage using dependency injection and mock objects. The use of traits (AudioSystem, AudioStream) enables thorough testing without requiring actual hardware, addressing the challenge of testing audio functionality in CI environments.

Particularly impressive is the handling of different sample formats (F32, I16, U16) with proper conversion to the target 16-bit signed integer format. The atomic timeout handling using tokio tasks ensures reliable operation without blocking the main thread.
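
For reference, the conversions to 16-bit signed samples can be written as below. These are common formulas, not code copied from speakr-core. The function names are hypothetical, and scaling by `i16::MAX` (rather than 32768) is one of several accepted conventions.

```rust
/// Converts a float sample in [-1.0, 1.0] to signed 16-bit.
/// Out-of-range input is clamped instead of wrapping around.
pub fn f32_to_i16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * f32::from(i16::MAX)) as i16
}

/// Converts an unsigned 16-bit sample (midpoint 32768) to signed 16-bit
/// by shifting the range down so that the midpoint maps to zero.
pub fn u16_to_i16(sample: u16) -> i16 {
    (i32::from(sample) - 32_768) as i16
}
```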

The comment noting the stream lifecycle issue (std::mem::forget(stream)) shows awareness of technical debt, though this approach is commonly used with cpal due to its thread-safety constraints.

FR-3: Transcription (consumes audio samples from FR-2)
FR-8: Settings Persistence (should store audio duration preference)
FR-1: Global Hotkey (triggers audio capture)

FR-3: Transcription

Offline transcription of recorded audio to text using Whisper.

Requirement

Use whisper-rs to run Whisper (GGUF) models entirely on-device.
Default language: English (en). Allow user language selection in Settings.
Transcription must complete in ≤ 3 s (95th percentile) for 5-second recordings on Apple Silicon with the small model.
Support user-selectable model sizes for latency/accuracy trade-off.
No external network calls during transcription.

Rationale

On-device inference preserves privacy and removes network latency, achieving the product’s privacy-first promise.

Acceptance Criteria

Transcription completes within latency budget on M1 and Intel reference machines.
Selecting a different model in Settings updates the engine without restart.
No outbound network traffic observed via packet capture.
Errors (e.g. model missing) surface in UI overlay/log with actionable message.

Test-Driven Design

Begin with failing automated tests for latency, language selection, and network isolation. Implement transcription until all tests pass, following TDD.

References

PRD §6 Functional Requirements – FR-3

FR-4: Transcript Injection

Types the transcribed text into the currently focused input field.

Requirement

Use the enigo crate to emit synthetic keystrokes that reproduce the transcription exactly as plain text.
Injection must preserve line breaks and punctuation.
Injection must run on the main UI thread to respect macOS accessibility APIs.
Provide feedback event (e.g. Injected) to UI overlay/log once complete.

Rationale

Typing text directly avoids clipboard usage and works in most applications, maintaining the illusion of native typing.

Acceptance Criteria

For a 100-character transcript, injection latency ≤ 300 ms.
Typed characters match transcription byte-for-byte.
Works in common editors (VS Code, Xcode, Pages, Safari).
Emits completion event for downstream UI.

Test-Driven Design

Write failing integration tests measuring injection latency and correctness across target editors. Deliver code to satisfy the tests.

References

PRD §6 Functional Requirements – FR-4

FR-5: Injection Fallback

Clipboard-paste fallback when keystroke injection is blocked.

Requirement

Detect secure text fields or injection failure (e.g. enigo error).
Copy transcript to clipboard and simulate ⌘V paste as fallback.
Display transient warning overlay: “Secure field detected – text pasted via clipboard.”
Restore previous clipboard contents after paste to respect user data.
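
The save, paste, restore sequence can be sketched against an abstract clipboard so that it is testable without touching the system pasteboard. The `Clipboard` trait, `InMemoryClipboard`, and `paste_with_restore` are all hypothetical names, and the paste action is passed in as a closure standing in for the simulated ⌘V.

```rust
/// Minimal clipboard abstraction (hypothetical); `None` means empty.
pub trait Clipboard {
    fn get(&self) -> Option<String>;
    fn set(&mut self, contents: Option<String>);
}

/// In-memory stand-in for tests.
pub struct InMemoryClipboard {
    pub contents: Option<String>,
}

impl Clipboard for InMemoryClipboard {
    fn get(&self) -> Option<String> {
        self.contents.clone()
    }
    fn set(&mut self, contents: Option<String>) {
        self.contents = contents;
    }
}

/// Places the transcript on the clipboard, runs the paste action, then
/// restores the previous contents whether or not the paste succeeded.
pub fn paste_with_restore<C: Clipboard>(
    clipboard: &mut C,
    transcript: &str,
    paste: impl FnOnce(&C) -> bool,
) -> bool {
    let previous = clipboard.get();
    clipboard.set(Some(transcript.to_string()));
    let pasted = paste(&*clipboard);
    clipboard.set(previous);
    pasted
}
```

Restoring unconditionally is what keeps the transcript from lingering on the clipboard after a failed paste.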

Rationale

Some password or secure fields block synthetic keystrokes. A controlled clipboard fallback ensures functionality while informing the user.

Acceptance Criteria

100 % success rate pasting into macOS secure text fields (Safari password prompt as test).
Previous clipboard restored within 500 ms after paste.
Warning overlay disappears automatically after 3 s.
No sensitive transcript retained on clipboard after restore.

Test-Driven Design

Craft failing tests for secure-field detection, clipboard restoration, and overlay timing. Implement fallback logic until tests succeed.

References

PRD §6 Functional Requirements – FR-5

FR-6: Settings UI

Provides a graphical interface (tray or window) for user configuration.

Requirement

Expose configuration for:
- Global hot-key picker
- Model selector (small, medium, large GGUF)
- Auto-launch on login toggle
Implemented as a Tauri window accessible from the menu bar/tray.
Validate hot-key conflicts and model availability.
Preference changes take effect without restarting the app.

Rationale

A minimal settings UI keeps the main workflow keyboard-first while allowing deeper configuration when needed.

Acceptance Criteria

Opening Settings from tray displays window within 200 ms.
Changing options updates behaviour immediately (e.g. new hot-key active).
Invalid configurations (missing model file) display inline errors.
Settings persist after app restart.

Test-Driven Design

Define unit/UI tests for each settings control and validation rule before coding. Implementation is complete when all tests pass.

References

PRD §6 Functional Requirements – FR-6

date: 2025-07-23
requirement: FR-6
status: PARTIALLY COMPLETE
prepared_by: gpt-4.1

Implementation Report: FR-6 - Settings UI

Implementation Summary

The SettingsPanel Leptos component serves as the primary settings interface for Speakr. On launch, it invokes the Tauri load_settings command to retrieve persisted AppSettings and renders:

Global hot-key configuration: Real-time validation via the validate_hot_key Tauri command, registration and unregistration through the global-shortcut plugin, and persistence via save_settings.
Model selection: Radio options for small, medium, and large Whisper models, availability checks using check_model_availability, disabling unavailable models, and immediate persistence.
Auto-launch toggle: Uses the set_auto_launch Tauri command and calls save_settings on change.

All changes trigger save_settings and display inline success or error messages. Backend persistence is handled atomically in settings/persistence.rs. Hot-key and auto-launch preferences apply at runtime without restarting the app.

Work Remaining

Add a system tray icon and a “Settings” menu item to open or focus the settings window.
Implement Tauri system_tray integration and event handling in run() to show/hide the settings window.
Enable dynamic transcription-model reload in the backend when model_size changes, without requiring a restart.
Develop unit/UI tests for each settings control and validation path (hot-key, model selection, auto-launch).
Measure and optimise settings window startup to meet the <200 ms opening requirement.
Enhance the hot-key picker with an interactive key-capture control instead of free-text input.
Display inline errors for model selection failures (e.g., missing or corrupt model files).

Architecture

Sequence Diagram

sequenceDiagram
  participant UI as "SettingsPanel"
  participant Backend as "Tauri Backend"
  participant FS as "File System"

  UI->>Backend: load_settings()
  Backend->>FS: load_settings_from_dir()
  FS-->>Backend: AppSettings
  Backend-->>UI: AppSettings
  UI->>UI: render settings

  UI->>Backend: validate_hot_key(newHotkey)
  Backend-->>UI: Ok
  UI->>UI: register_global_shortcut

  UI->>Backend: save_settings(AppSettings)
  Backend->>FS: save_settings_to_dir()
  FS-->>Backend: Ok
  Backend-->>UI: Ok

Flowchart

flowchart TD
  A["User modifies setting"] --> B["UI captures change"]
  B --> C{"Validate input"}
  C -->|Valid| D["Invoke Tauri command"]
  C -->|Invalid| E["Show validation error"]
  D --> F["Persist settings via backend"]
  F --> G["Display success or error message"]
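The validate, invoke, and persist path in the flowchart can be sketched in plain Rust. Everything here is an assumption made for illustration: the field names on `AppSettings`, the `SettingsError` variants, the modifier list, and the `apply_change` helper are hypothetical, and the `persist` closure stands in for the real save_settings command.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct AppSettings {
    pub hot_key: String,
    pub model_size: String,
    pub auto_launch: bool,
}

#[derive(Debug, PartialEq)]
pub enum SettingsError {
    InvalidHotKey(String),
    ModelUnavailable(String),
    PersistFailed(String),
}

/// Assumed modifier names; the real validator may accept a different set.
const MODIFIERS: [&str; 5] = ["Cmd", "Ctrl", "CmdOrCtrl", "Alt", "Shift"];

/// Accepts combinations such as "CmdOrCtrl+Alt+Space":
/// at least one modifier, at least one non-modifier key, no empty parts.
pub fn validate_hot_key(hot_key: &str) -> Result<(), SettingsError> {
    let parts: Vec<&str> = hot_key.split('+').map(str::trim).collect();
    let no_empty_parts = parts.iter().all(|p| !p.is_empty());
    let has_modifier = parts.iter().any(|p| MODIFIERS.contains(p));
    let has_key = parts.iter().any(|p| !MODIFIERS.contains(p));
    if no_empty_parts && has_modifier && has_key {
        Ok(())
    } else {
        Err(SettingsError::InvalidHotKey(hot_key.to_owned()))
    }
}

/// Validate first, persist second; the caller turns the result into the
/// inline success or error message.
pub fn apply_change(
    proposed: AppSettings,
    available_models: &[&str],
    persist: impl FnOnce(&AppSettings) -> Result<(), String>,
) -> Result<AppSettings, SettingsError> {
    validate_hot_key(&proposed.hot_key)?;
    if !available_models.contains(&proposed.model_size.as_str()) {
        return Err(SettingsError::ModelUnavailable(proposed.model_size));
    }
    persist(&proposed).map_err(SettingsError::PersistFailed)?;
    Ok(proposed)
}
```

Keeping validation ahead of persistence means an invalid change never reaches disk, which matches the "Invalid" branch of the flowchart.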

Noteworthy

N/A

Related Requirements

FR-1 Global Hotkey
FR-3 Transcription
FR-8 Settings Persistence

References

speakr-ui/src/settings.rs
speakr-tauri/src/lib.rs
speakr-tauri/tauri.conf.json

FR-7: Status Events

Emit real-time status updates for UI overlays and logging.

Requirement

Broadcast status events: Recording, Transcribing, Injected, Error (variants).
Events emitted over an internal async channel consumable by UI components and log subsystem.
Include timestamp and optional payload (e.g. error message).
Provide public Rust API subscribe_status() for other components.

Rationale

A decoupled event system lets the overlay and future extensions react without tight coupling to business logic.

Acceptance Criteria

Overlay reflects status within 50 ms of event emission.
Logs capture all events with accurate timestamps.
No missed or duplicated events observed in 1-hour monkey test (500 invocations).

Test-Driven Design

Start with failing tests subscribing to the event channel and asserting delivery guarantees (latency, ordering, no duplicates). Implement until green.
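A delivery test of this kind needs something to subscribe to. The sketch below uses the standard library's synchronous `mpsc` channel as a stand-in for the async channel the requirement calls for, so it illustrates the subscription shape only. `StatusBus`, `StatusEvent`, and `emit` are hypothetical names; only `subscribe_status` and the four status variants come from the requirement.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;
use std::time::SystemTime;

#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Recording,
    Transcribing,
    Injected,
    Error(String), // optional payload, e.g. an error message
}

#[derive(Debug, Clone)]
pub struct StatusEvent {
    pub status: Status,
    pub timestamp: SystemTime,
}

/// Hypothetical fan-out bus: every subscriber gets its own channel.
#[derive(Default)]
pub struct StatusBus {
    subscribers: Mutex<Vec<Sender<StatusEvent>>>,
}

impl StatusBus {
    /// Registers a new subscriber and returns its receiving end.
    pub fn subscribe_status(&self) -> Receiver<StatusEvent> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Stamps the event and sends it to every live subscriber, in order.
    pub fn emit(&self, status: Status) {
        let event = StatusEvent { status, timestamp: SystemTime::now() };
        // A failed send means the receiver was dropped, so forget that subscriber.
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

With one channel per subscriber, the overlay and the log each see every event exactly once and in emission order, which is what the ordering and no-duplicates assertions would check.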

References

PRD §6 Functional Requirements – FR-7

FR-8: Settings Persistence

Persist user preferences locally without cloud sync.

Requirement

Store settings in a JSON file located in the platform-appropriate app data directory
($HOME/Library/Application Support/Speakr/settings.json).
Write changes atomically to avoid corruption.
Migration framework supports future schema evolution with versioning.
No data leaves the device.

Rationale

Local persistence offers instant access, privacy, and offline capability.

Acceptance Criteria

Settings file created on first launch with defaults.
Modifying settings updates file within 100 ms.
Corrupt settings file triggers automatic recovery to defaults.
Unit tests cover load/save error paths.

Test-Driven Design

Write failing unit tests for load/save, corruption recovery, and migration before implementation; pass them in CI.
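One common way to satisfy the atomic-write and corruption-recovery criteria is to write to a temporary file and rename it over the target, falling back to defaults when the existing file cannot be read. The sketch below assumes that approach; `save_atomically`, `load_or_default`, the `DEFAULT_SETTINGS` values, and the `is_valid` callback (a stand-in for real JSON parsing) are all hypothetical.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Hypothetical defaults; the real schema is defined elsewhere.
pub const DEFAULT_SETTINGS: &str =
    r#"{"version":1,"hot_key":"CmdOrCtrl+Alt+Space","model_size":"small","auto_launch":false}"#;

/// Writes to a sibling temp file, flushes it, then renames it over `path`.
/// A crash mid-write leaves the previous file untouched.
pub fn save_atomically(path: &Path, json: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(json.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}

/// Returns the stored settings, or rewrites and returns the defaults when the
/// file is missing, unreadable, or rejected by `is_valid`.
pub fn load_or_default(path: &Path, is_valid: impl Fn(&str) -> bool) -> std::io::Result<String> {
    match fs::read_to_string(path) {
        Ok(text) if is_valid(&text) => Ok(text),
        _ => {
            save_atomically(path, DEFAULT_SETTINGS)?;
            Ok(DEFAULT_SETTINGS.to_owned())
        }
    }
}
```

The same `load_or_default` call covers two acceptance criteria: first launch (no file yet) and a corrupt file both end with a valid defaults file on disk.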

References

PRD §6 Functional Requirements – FR-8

FR-9: Auto-update

Provide optional self-update via GitHub Releases.

Requirement

When enabled, periodically (daily) check GitHub Releases for a newer version tag.
Use secure download (HTTPS) and verify code signature / hash before install.
Prompt user with Release Notes and require confirmation before applying update.
Allow users to disable auto-update in Settings.
Feature optional in v1; must degrade gracefully when disabled.
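The "newer version tag" check can be sketched as a pure function that is easy to test without any network access. This assumes release tags of the form `v1.2.3` with no pre-release suffix, which the requirement does not specify; `parse_version` and `update_available` are hypothetical names.

```rust
/// Parses "v1.4.0" or "1.4.0" into (major, minor, patch).
/// Assumption: tags carry exactly three numeric components.
pub fn parse_version(tag: &str) -> Option<(u64, u64, u64)> {
    let tag = tag.trim();
    let bare = tag.strip_prefix('v').unwrap_or(tag);
    let mut parts = bare.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Returns true only when auto-update is enabled and the release tag is
/// strictly newer. Malformed tags are treated as "no update" so that a bad
/// release name cannot trigger a prompt.
pub fn update_available(enabled: bool, current: &str, latest_tag: &str) -> bool {
    if !enabled {
        return false;
    }
    match (parse_version(current), parse_version(latest_tag)) {
        (Some(current), Some(latest)) => latest > current, // tuples compare field by field
        _ => false,
    }
}
```

Comparing parsed tuples rather than strings avoids ordering `1.10.0` before `1.9.0`.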

Rationale

Easy updates encourage users to stay on latest version, reducing support burden and delivering security fixes.

Acceptance Criteria

Update check runs off main thread; no UI freeze.
Failed update check logs but does not crash application.
Downloaded binary passes macOS notarisation verification.
User can opt-out entirely; no network calls when disabled.

Test-Driven Design

Begin with failing integration tests that simulate update availability, download verification,

References

PRD §6 Functional Requirements – FR-9

INIT-01: Project Scaffold & Initial Structure

Define the baseline repository layout, build tooling, and development workflows for Speakr.

Requirement

Workspace Layout (multi-crate)
- speakr-core/ – pure Rust library (record → transcribe → inject).
- speakr-tauri/ – Tauri desktop shell; contains src-tauri/ and embeds Leptos frontend by default.
- speakr-ui/ – optional standalone Leptos UI crate (only if the UI is fully separated).
- models/ – user-downloaded GGUF Whisper models (git-ignored).
- docs/ – architecture, PRD, and spec docs (this folder).
- nix/ – flakes, overlays, devenv.nix, CI helpers.
- scripts/ – one-off dev scripts (lint, release, etc.).
- Root-level Cargo.toml / Cargo.lock defining a [workspace] with members.
Build Tooling
- Use Cargo workspace to manage crates and enable incremental rebuilds.
- Root-level Nix flake + devenv.nix for reproducible shells.
- Trunk.toml (in speakr-tauri/) bundles static assets for the WebView.
CI / CD
- GitHub Actions workflow for: lint (rustfmt, clippy), test, macOS build, docs build.
- Release workflow signs and notarises macOS DMG.
Linters & Hooks
- Pre-commit config: rustfmt, markdownlint, shellcheck, nixpkgs-fmt.
Documentation Site
- mdBook in docs/book/ published via GitHub Pages.
Version Control Hygiene
- .gitignore excludes target/, model files, and local config overrides.

Rationale

A consistent scaffold accelerates onboarding, enforces build reproducibility, and aligns with the project’s privacy-first & cross-platform goals.

Acceptance Criteria

Fresh clone followed by devenv shell (or devenv up) yields a working shell with cargo,
tauri, and mdbook available.
cargo test passes with placeholder tests.
npm run tauri dev (via Trunk) launches stub window.
GitHub Actions green on lint + test.
mdbook serve builds documentation without errors.

Migration Steps (from mono-crate → multi-crate)

Create workspace file

# At repo root (printf is used because echo does not expand \n portably)
printf '[workspace]\nmembers = ["speakr-core", "speakr-tauri", "speakr-ui"]\n' > Cargo.toml

Scaffold core crate

cargo new --lib speakr-core
mv src/*.rs speakr-core/src/          # move existing logic
rm -rf src/

Scaffold Tauri crate

cargo tauri init --template leptos speakr-tauri
# move existing src-tauri/ into speakr-tauri/
mv src-tauri speakr-tauri/

Wire dependency

In speakr-tauri/Cargo.toml add:
```
speakr-core = { path = "../speakr-core" }
```

(Optional) Separate UI crate

cargo new --lib speakr-ui
mv speakr-tauri/src-leptos/* speakr-ui/src/
# then depend on speakr-ui from speakr-tauri via WASM asset pipeline

Update paths in code & imports.

Run tests & build

cargo test --workspace
cargo tauri dev -p speakr-tauri

CI / Nix – update workflows and devenv.nix to use --workspace.

Completion of these steps should yield the new structure with all tests & tauri dev working.

NFR: Accessibility

Comply with macOS accessibility guidelines.

Requirement

UI elements (overlay, settings) must be VoiceOver readable.
Support high-contrast mode and respect user font scaling preferences.
Achieve Apple Accessibility Inspector score ≥ 85.

Rationale

Ensures inclusivity for users with visual impairments or other accessibility needs.

Acceptance Criteria

VoiceOver reads overlay status changes accurately.
High-contrast mode renders UI with sufficient contrast ratios (> 4.5:1).
Automated accessibility audit (axe-core) passes with no critical violations.
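Assuming the 4.5:1 threshold refers to the WCAG contrast-ratio definition (the one used by audit tools such as axe-core), the check itself is a short calculation. The function names below are hypothetical; the formula is the standard sRGB relative-luminance one.

```rust
/// WCAG relative luminance of an sRGB colour given as [r, g, b].
pub fn relative_luminance(rgb: [u8; 3]) -> f64 {
    // Convert one 8-bit channel to linear light.
    let linear = |channel: u8| {
        let c = f64::from(channel) / 255.0;
        if c <= 0.03928 {
            c / 12.92
        } else {
            ((c + 0.055) / 1.055).powf(2.4)
        }
    };
    0.2126 * linear(rgb[0]) + 0.7152 * linear(rgb[1]) + 0.0722 * linear(rgb[2])
}

/// Contrast ratio between two colours, always >= 1.0.
pub fn contrast_ratio(a: [u8; 3], b: [u8; 3]) -> f64 {
    let (la, lb) = (relative_luminance(a), relative_luminance(b));
    let (lighter, darker) = if la >= lb { (la, lb) } else { (lb, la) };
    (lighter + 0.05) / (darker + 0.05)
}

/// Mirrors the acceptance criterion: ratio must exceed 4.5:1.
pub fn meets_contrast_requirement(foreground: [u8; 3], background: [u8; 3]) -> bool {
    contrast_ratio(foreground, background) > 4.5
}
```

Such a helper could back a unit test over the overlay's colour palette, complementing the automated audit rather than replacing it.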

Test-Driven Design

Introduce automated accessibility audits (axe-core, VoiceOver scripts) in CI before fixing violations.

References

PRD §7 Non-Functional Requirements – Accessibility

NFR: Compatibility

Operate across supported macOS versions and CPU architectures.

Requirement

Support macOS 13+ on Apple Silicon and Intel Macs.
Intel Macs may experience doubled latency but must remain functional.

Rationale

Wider OS support increases addressable market while retaining acceptable performance.

Acceptance Criteria

Manual QA passes on Intel MBP 2020 (macOS 13).
Automated smoke test on GitHub Actions Intel runner passes.
Latency SLA documented separately for Intel.

Test-Driven Design

Add failing cross-arch smoke tests to CI runners before porting; success criteria met when tests pass on Intel and Apple Silicon.

References

PRD §7 Non-Functional Requirements – Compatibility

NFR: Footprint

Constrain binary size and runtime memory usage.

Requirement

Universal macOS binary size ≤ 20 MB (excluding model files).
Peak RSS ≤ 400 MB including model during standard transcription workload.

Rationale

A lightweight application reduces download size, disk usage and keeps memory pressure low on older devices.

Acceptance Criteria

du -h on release DMG shows ≤ 20 MB binary.
Runtime memory measured via Activity Monitor stays ≤ 400 MB during 30 s monkey test.

Test-Driven Design

Add failing size and memory regression tests into CI before implementation tweaks.

References

PRD §7 Non-Functional Requirements – Footprint

NFR: Latency

Ensure low end-to-end latency from hot-key activation to text injection.

Requirement

95th percentile time-to-text ≤ 3 s for a 5-second audio clip on Apple Silicon (M1) using the
small Whisper model.
Latency measured in release (optimised) builds with all background services running.

Rationale

Sub-3-second latency preserves conversational flow and competitive advantage over cloud dictation.

Acceptance Criteria

Automated telemetry logs latency for every invocation.
CI latency test passes on GitHub Actions M1 runner.
Performance regression test fails build if P95 > 3 s.

Test-Driven Design

Create automated performance tests that measure P95 latency; commit them before optimising the code.
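A P95 gate needs an agreed definition of the percentile. The sketch below uses the nearest-rank method, which is one reasonable choice but not one the requirement mandates; `percentile` and `check_p95` are hypothetical helpers.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
pub fn percentile(samples: &[Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&pct) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let rank = (pct * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1]) // ranks are 1-based
}

/// Fails when the 95th percentile exceeds the budget, so a CI job can turn
/// the error into a failed build.
pub fn check_p95(samples: &[Duration], budget: Duration) -> Result<Duration, String> {
    let p95 = percentile(samples, 95.0)
        .ok_or_else(|| "no latency samples recorded".to_string())?;
    if p95 > budget {
        Err(format!("P95 {:?} exceeds budget {:?}", p95, budget))
    } else {
        Ok(p95)
    }
}
```

With 20 samples from 100 ms to 2,000 ms in 100 ms steps, the nearest-rank P95 is the 19th sample, 1,900 ms, which passes a 3 s budget.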

References

PRD §7 Non-Functional Requirements – Latency

NFR: Reliability

Maintain stability across heavy usage.

Requirement

Application must run 1-hour monkey test (500 invocations) with zero crashes.
Recover gracefully from errors (audio device unavailable, model missing).

Rationale

High reliability builds user trust and reduces support overhead.

Acceptance Criteria

CI integration test simulates 500 sequential hot-key invocations without crash.
Error conditions logged and surfaced via Status Events.

Test-Driven Design

Introduce a failing soak-test (500 invocations) in CI first; stabilise code until it passes consistently.

References

PRD §7 Non-Functional Requirements – Reliability

NFR: Security

Prevent unintended data leakage and maintain user privacy.

Requirement

No outbound network connections except optional auto-update domain.
Hardened runtime & proper code-signing for macOS notarisation.
Microphone access prompt shown once and justification provided.

Rationale

Privacy-first positioning requires strict control over network activity and OS security policies.

Acceptance Criteria

Static analysis shows no runtime socket creation beyond update URL when enabled.
Application passes Apple notarisation & gatekeeper checks.
Firewall test (Little Snitch) reveals no unexpected traffic.

Test-Driven Design

Write security unit tests (e.g., socket mocks) and notarisation validation scripts before code changes; CI must enforce them.

References

PRD §7 Non-Functional Requirements – Security

date: {YYYY-MM-DD}
requirement: {Requirement-ID}
status:

Implementation Report: {Requirement-ID} -

Implementation Summary

For completed and partially completed requirements, 1-2 paragraphs explaining:
- How the implementation works overall
- Specific behaviours of note
- Control and data flow(s)
- Other significant details as appropriate

Work Remaining

(N/A for Complete requirements) Itemised list of specific work required for the requirement to be completed.

Architecture

One or more Mermaid diagrams, include ALL applicable to the requirement:

Sequence diagrams (e.g. IPC, user interactions)
State diagrams (e.g. system state transitions)
Entity relationships (e.g. data entities)
Class diagrams
Flowcharts (e.g. process/control flows)
Any other diagram type that best describes the information

Each diagram should be preceded by a ### Title and a short summary of what the diagram shows, and any clarifying remarks (if anything is not self-evident from the diagram). Diagrams should be embedded using a mermaid code fence.

Noteworthy

(Discretionary section, N/A if not relevant) Discussion about any especially interesting details about the implementation, or insights related to it.

Related Requirements

REQ1-ID Related Requirement 1 Name
REQ2-ID Related Requirement 2 Name
...

References

Speakr-Tauri lib.rs Refactoring Plan

Current State Analysis

The speakr-tauri/src/lib.rs file has grown to 1,913 lines and mixes multiple responsibilities that should be separated for better maintainability.

Current File Composition

Lines 1-27: Imports and use statements
Lines 29-87: Debug-only types and static storage
Lines 89-255: Settings management utilities
Lines 256-456: GlobalHotkeyService implementation
Lines 457-600: Tauri command functions
Lines 601-950: Audio functionality helpers
Lines 951-1100: Additional utility functions
Lines 1732-1830: BackendStatusService implementation
Lines 1831-1913: Main run function and setup
Lines 1400+: Extensive test module (500+ lines)

Proposed Refactoring Structure

1. Move Tests to Separate Files

Target: Extract all tests from lib.rs into dedicated test files

Current: 500+ lines of tests in mod tests

New Structure:

speakr-tauri/tests/
├── settings_tests.rs       # Settings save/load/migration tests
├── hotkey_tests.rs         # GlobalHotkeyService tests
├── status_tests.rs         # BackendStatusService tests
├── audio_tests.rs          # Audio recording/file tests
├── commands_tests.rs       # Tauri command tests
└── integration_tests.rs    # Cross-module integration tests

Benefits: Reduces lib.rs by ~500 lines, improves test organization
Note: Integration tests can access internal modules via speakr_lib::module_name (speakr-tauri crate is named speakr_lib)

2. Extract Debug Functionality

Target: Move all debug-related code to separate module

Current: Debug types, static storage, debug commands scattered throughout

New Structure:

speakr-tauri/src/debug/
├── mod.rs                  # Public interface, re-exports
├── types.rs                # DebugLogLevel, DebugLogMessage, DebugRecordingState
├── storage.rs              # Static storage (DEBUG_LOG_MESSAGES, DEBUG_RECORDING_STATE)
└── commands.rs             # Debug Tauri commands

Files to Create:
- src/debug/types.rs: ~50 lines
- src/debug/storage.rs: ~30 lines
- src/debug/commands.rs: ~200 lines
- src/debug/mod.rs: ~20 lines
Benefits: Isolates debug code, easier to disable in release builds

3. Extract Settings Management

Target: Centralize all settings-related functionality

Current: Settings utilities and commands mixed in main file

New Structure:

speakr-tauri/src/settings/
├── mod.rs                  # Public interface
├── persistence.rs          # File I/O, atomic writes, backups
├── migration.rs            # Version migration logic
├── validation.rs           # Directory permissions, data validation
└── commands.rs             # Settings Tauri commands

Functions to Move:
- get_settings_path(), get_settings_backup_path()
- migrate_settings(), save_settings_to_dir(), load_settings_from_dir()
- try_load_settings_file(), validate_settings_directory_permissions()
- Commands: save_settings(), load_settings()
Files to Create:
- src/settings/persistence.rs: ~150 lines
- src/settings/migration.rs: ~50 lines
- src/settings/validation.rs: ~40 lines
- src/settings/commands.rs: ~60 lines
- src/settings/mod.rs: ~30 lines
Benefits: Clear separation of concerns, easier testing of settings logic
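Once migration.rs exists, version handling could follow a step-by-step upgrade loop like the one below. This is a sketch under stated assumptions: the real signature of migrate_settings is not shown in this plan, and the `AppSettings` fields, `CURRENT_VERSION`, and both migration steps are invented purely to illustrate the pattern.

```rust
/// Hypothetical current schema version.
pub const CURRENT_VERSION: u32 = 2;

#[derive(Debug, Clone, PartialEq)]
pub struct AppSettings {
    pub version: u32,
    pub hot_key: String,
    pub model_size: String,
    pub auto_launch: bool,
}

/// Upgrades settings one version at a time until they reach CURRENT_VERSION.
/// Settings written by a newer build are rejected rather than guessed at.
pub fn migrate_settings(mut settings: AppSettings) -> Result<AppSettings, String> {
    if settings.version > CURRENT_VERSION {
        return Err(format!(
            "settings version {} is newer than supported version {}",
            settings.version, CURRENT_VERSION
        ));
    }
    while settings.version < CURRENT_VERSION {
        match settings.version {
            // Illustrative step 0 -> 1: fill in a model that used to be optional.
            0 => {
                if settings.model_size.is_empty() {
                    settings.model_size = "small".to_owned();
                }
            }
            // Illustrative step 1 -> 2: rename a modifier in stored hot-keys.
            1 => {
                settings.hot_key = settings.hot_key.replace("Cmd+", "CmdOrCtrl+");
            }
            other => return Err(format!("no migration from version {other}")),
        }
        settings.version += 1;
    }
    Ok(settings)
}
```

Because each arm handles exactly one version bump, adding a future schema version means adding one arm and raising the constant, and the steps stay individually testable.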

4. Extract Service Implementations

Target: Move service structs to dedicated service modules

Current: GlobalHotkeyService and BackendStatusService in main file

New Structure:

speakr-tauri/src/services/
├── mod.rs                  # Re-exports, common traits
├── hotkey.rs              # GlobalHotkeyService implementation
├── status.rs              # BackendStatusService implementation
└── types.rs               # ServiceComponent enum, shared types

Content to Move:
- GlobalHotkeyService struct (~200 lines)
- BackendStatusService struct (~100 lines)
- ServiceComponent enum
- Related Tauri commands: register_global_hotkey(), unregister_global_hotkey()
Files to Create:
- src/services/hotkey.rs: ~220 lines
- src/services/status.rs: ~120 lines
- src/services/types.rs: ~20 lines
- src/services/mod.rs: ~30 lines
Benefits: Services become self-contained, easier to test and maintain

5. Extract Audio Functionality

Target: Isolate audio recording and file operations

Current: Audio functions scattered throughout main file

New Structure:

speakr-tauri/src/audio/
├── mod.rs                  # Public interface
├── recording.rs           # Recording logic, real audio backend
├── files.rs               # WAV file operations, filename generation
└── commands.rs            # Audio-related Tauri commands

Functions to Move:
- generate_audio_filename_with_timestamp()
- save_audio_samples_to_wav_file()
- debug_record_audio_to_file(), debug_record_real_audio_to_file()
- get_debug_recordings_directory()
- Commands: debug_start_recording(), debug_stop_recording()
Files to Create:
- src/audio/recording.rs: ~100 lines
- src/audio/files.rs: ~80 lines
- src/audio/commands.rs: ~150 lines
- src/audio/mod.rs: ~25 lines
Benefits: Audio logic becomes testable in isolation

6. Extract General Tauri Commands

Target: Group remaining Tauri commands by domain

Current: Various commands mixed in main file

New Structure:

speakr-tauri/src/commands/
├── mod.rs                  # Command registration, re-exports
├── validation.rs          # validate_hot_key, input validation
├── system.rs              # check_model_availability, set_auto_launch
└── legacy.rs              # register_hot_key (backward compatibility)

Commands to Move:
- validate_hot_key() → validation.rs
- check_model_availability(), set_auto_launch() → system.rs
- register_hot_key(), greet() → legacy.rs
- get_backend_status() → (might stay in services/status.rs)
Files to Create:
- src/commands/validation.rs: ~60 lines
- src/commands/system.rs: ~80 lines
- src/commands/legacy.rs: ~40 lines
- src/commands/mod.rs: ~40 lines
Benefits: Commands grouped by domain, easier to find and maintain

7. Simplified lib.rs

Target: Reduce lib.rs to essential coordination code

Final Content:
- Module declarations and re-exports
- Main run() function with Tauri setup
- Essential imports
- Command registration (delegated to modules)
Estimated Size: ~150-200 lines (down from 1,913)

Implementation Strategy

Refactoring Process Overview

The following diagram illustrates the 5-phase refactoring approach and its progression from the current monolithic structure to a modular architecture:

graph TD
    A["Phase 1: Extract Tests<br/>Low Risk"] --> B["Phase 2: Extract Services<br/>Medium Risk"]
    B --> C["Phase 3: Extract Settings<br/>Medium Risk"]
    C --> D["Phase 4: Extract Debug & Audio<br/>Low Risk"]
    D --> E["Phase 5: Extract Commands & Finalize<br/>Low Risk"]

    A1["• Create test directory structure<br/>• Move 500+ lines of tests<br/>• Update imports & run tests"]
    B1["• Extract GlobalHotkeyService<br/>• Extract BackendStatusService<br/>• Move related Tauri commands"]
    C1["• Extract settings persistence<br/>• Extract migration logic<br/>• Extract validation functions"]
    D1["• Extract debug functionality<br/>• Extract audio operations<br/>• Update conditional compilation"]
    E1["• Group remaining commands<br/>• Finalize lib.rs cleanup<br/>• Run full test suite"]

    A -.-> A1
    B -.-> B1
    C -.-> C1
    D -.-> D1
    E -.-> E1

    F["lib.rs: 1,913 lines"] --> G["lib.rs: ~200 lines"]

    classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
    classDef decision fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
    classDef error fill:#FCBBA1,stroke:#FB6A4A,color:#CB181D
    classDef data fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD

    class A,D,E process
    class B,C decision
    class F error
    class G data

Risk Assessment

Low Risk Refactoring

✅ Moving tests to separate files
✅ Extracting debug functionality (conditional compilation)
✅ Moving utility functions (no complex dependencies)

Medium Risk Refactoring

⚠️ Service extraction (careful with state management)
⚠️ Settings refactoring (critical for app functionality)
⚠️ Tauri command reorganization (frontend depends on these)

Mitigation Strategies

Incremental Changes: One module at a time
Comprehensive Testing: Run full test suite after each phase
Feature Flags: Use conditional compilation during transition
Backup Strategy: Git branches for each refactoring phase

Success Criteria

lib.rs reduced to ~200 lines
All existing tests pass without modification
All Tauri commands remain accessible to frontend
Debug functionality preserved in debug builds
Settings persistence works identically
Global hotkey registration continues working
Build time remains similar or improves
New module structure is logical and discoverable

This refactoring will significantly improve the maintainability and organization of the Speakr Tauri backend while preserving all existing functionality.

Phase 1: Extract Tests (Low Risk)

Objective: Move all tests from lib.rs into separate files organized by domain

New Structure:

speakr-tauri/tests/
├── settings_tests.rs       # Settings save/load/migration tests
├── hotkey_tests.rs         # GlobalHotkeyService tests
├── status_tests.rs         # BackendStatusService tests
├── audio_tests.rs          # Audio recording/file tests
├── commands_tests.rs       # Tauri command tests
└── integration_tests.rs    # Cross-module integration tests

Note: Integration tests can access internal modules via speakr_lib::module_name

🎉 PHASE 1 COMPLETE - MAJOR SUCCESS!

Final Results: 27 tests migrated out of 35 total tests (77% success rate)

✅ Breakthrough Strategy: Making Functions `pub` with Internal API Documentation

The key to success was making private functions pub (not pub(crate)) with clear internal API documentation. This allows external integration tests in the tests/ directory to access internal functions while maintaining clear API boundaries.

Example pattern used:

/// Internal hot-key validation logic.
///
/// # Internal API
/// This function is only intended for internal use and testing.
pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
    // implementation...
}

Task Checklist (Phase 1)

Create test directory structure
- Create speakr-tauri/tests/ directory
- Create settings_tests.rs file
- Create hotkey_tests.rs file
- Create status_tests.rs file
- Create audio_tests.rs file
- Create commands_tests.rs file
- Create integration_tests.rs file
Move settings-related tests ✅ 11/13 tests migrated (85% success)
- Extract test_app_settings_default() → settings_tests.rs
- Extract test_save_and_load_settings() → settings_tests.rs
- Extract test_settings_migration() → settings_tests.rs
- [~] Extract test_atomic_write_creates_backup() → ~~SKIPPED: Tests Tauri command~~
- Extract test_corruption_recovery_from_backup() → settings_tests.rs
- Extract test_corruption_recovery_fallback_to_defaults() → settings_tests.rs
- Extract test_settings_serialization() → settings_tests.rs
- [~] Extract test_save_settings_tauri_command() → ~~SKIPPED: Tests Tauri command~~
- Extract test_settings_performance() → settings_tests.rs
- Extract test_settings_directory_permissions() → settings_tests.rs
- Extract test_isolated_settings_save_and_load() → settings_tests.rs
- Extract test_isolated_corruption_recovery() → settings_tests.rs
- Extract debug_save_button_functionality() → settings_tests.rs
Move hotkey-related tests ✅ 2/3 tests migrated (67% success)
- Extract test_validate_hot_key_success() → hotkey_tests.rs
- Extract test_validate_hot_key_failures() → hotkey_tests.rs
- [~] Extract test_register_hot_key() → ~~SKIPPED: Tests Tauri command~~
Move status-related tests ✅ 9/12 tests migrated (75% success)
- Extract test_backend_status_service_creation() → status_tests.rs
- Extract test_backend_status_service_update_single_service() → status_tests.rs
- Extract test_backend_status_service_all_services_ready() → status_tests.rs
- Extract test_backend_status_service_error_handling() → status_tests.rs
- Extract test_backend_status_timestamps() → status_tests.rs
- [~] Extract test_get_backend_status_tauri_command() → ~~SKIPPED: Tests Tauri command~~
- Extract test_global_backend_service_initialization() → status_tests.rs
- Extract test_global_backend_service_state_updates() → status_tests.rs
- Extract test_global_backend_service_thread_safety() → status_tests.rs
- [~] Extract test_get_backend_status_command_uses_real_service() → ~~SKIPPED: Tests Tauri command~~
- Extract test_backend_service_emits_events_on_state_change() → status_tests.rs
- [~] Extract test_complete_status_communication_flow() → ~~SKIPPED: Uses get_backend_status Tauri command~~
Move audio-related tests ✅ 5/5 tests migrated (100% success)
- Extract test_debug_record_audio_to_file_saves_with_timestamp() → audio_tests.rs
- Extract test_debug_record_audio_to_file_creates_unique_filenames() → audio_tests.rs
- Extract test_save_audio_samples_to_wav_file() → audio_tests.rs
- Extract test_generate_audio_filename_with_timestamp() → audio_tests.rs
- Extract test_debug_real_audio_recording_integration() → audio_tests.rs (ignored, as expected)
[~] Move command-related tests ❌ 0/2 tests migrated (0% success)
- [~] Extract test_check_model_availability() → ~~SKIPPED: Tests Tauri command~~
- [~] Extract test_set_auto_launch() → ~~SKIPPED: Tests Tauri command~~
Update imports and run tests ✅ COMPLETED
- Made internal functions pub with "Internal API" documentation:
  - Settings functions: get_settings_path, get_settings_backup_path, migrate_settings, try_load_settings_file, load_settings_from_dir, validate_settings_directory_permissions
  - Hotkey functions: validate_hot_key_internal (with Tauri command wrapper)
  - Status functions: get_global_backend_service, reset_global_backend_service
  - Audio functions: generate_audio_filename_with_timestamp, save_audio_samples_to_wav_file, debug_record_audio_to_file, debug_record_real_audio_to_file
- Updated imports in all test files to use speakr_lib::
- Fixed #[cfg(test)] → #[cfg(any(test, debug_assertions))] for external test access
- Verified all migrated tests pass: 27 tests across 4 files
  - settings_tests.rs: 11 tests ✅
  - status_tests.rs: 9 tests ✅
  - hotkey_tests.rs: 2 tests ✅
  - audio_tests.rs: 5 tests ✅ (4 + 1 ignored)
- Removed successfully migrated test functions from lib.rs
- Run cargo test --workspace - all tests pass ✅

📊 Final Migration Summary

Test Category	Total Found	Successfully Migrated	Still in lib.rs	Success Rate
Settings Tests	13 tests	✅ 11 tests	2 tests (Tauri commands)	85%
Status Tests	12 tests	✅ 9 tests	3 tests (Tauri commands)	75%
Hotkey Tests	3 tests	✅ 2 tests	1 test (Tauri command)	67%
Audio Tests	5 tests	✅ 5 tests	0 tests	100%
Command Tests	2 tests	0 tests	🔒 2 tests (All Tauri commands)	0%
TOTALS	35 tests	✅ 27 tests	🔒 8 tests	🎉 77%

🚀 Major Improvement Achieved:

Original attempt: 8 tests migrated (23%)
After making functions pub: 27 tests migrated (77%)
Improvement: +19 additional tests successfully migrated!

🔒 Remaining Tests in lib.rs (8 tests):

All remaining tests exercise Tauri commands and cannot be moved because:

#[tauri::command] functions defined in lib.rs cannot be pub (causes macro conflicts)
External tests cannot directly invoke Tauri commands
It may be possible to migrate them by renaming the functions to *_internal and making them pub (pub(crate) items are not visible from tests/), then moving #[tauri::command] to a wrapper function with the original function name.
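The wrapper idea can be sketched without Tauri itself: the logic lives in a plain `pub` function that an external test can call, and a thin private wrapper keeps the original command name. In the real crate the wrapper would carry `#[tauri::command]`; that attribute is shown only as a comment here so the example compiles on its own. The `models` directory and the `<size>.gguf` file naming are assumptions, not the project's actual layout.

```rust
use std::path::Path;

/// Internal API: plain Rust, callable from tests in the tests/ directory.
/// Assumption: each model is stored as `<size>.gguf` inside `models_dir`.
pub fn check_model_availability_internal(
    models_dir: &Path,
    model_size: &str,
) -> Result<bool, String> {
    match model_size {
        "small" | "medium" | "large" => {
            Ok(models_dir.join(format!("{model_size}.gguf")).is_file())
        }
        other => Err(format!("unknown model size: {other}")),
    }
}

// #[tauri::command]  <- would be applied here in the real crate
/// Thin wrapper that keeps the original command name for the frontend and
/// delegates to the testable internal function.
fn check_model_availability(model_size: String) -> Result<bool, String> {
    check_model_availability_internal(Path::new("models"), &model_size)
}
```

Passing the directory into the internal function also lets tests point it at a temporary folder instead of the user's real model directory.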

Settings (2 tests):

test_atomic_write_creates_backup() - tests save_settings Tauri command
test_save_settings_tauri_command() - tests save_settings Tauri command

Status (3 tests):

test_get_backend_status_tauri_command() - tests get_backend_status Tauri command
test_get_backend_status_command_uses_real_service() - tests get_backend_status Tauri command
test_complete_status_communication_flow() - tests get_backend_status Tauri command

Hotkey (1 test):

test_register_hot_key() - tests register_hot_key Tauri command

Commands (2 tests):

test_check_model_availability() - tests check_model_availability Tauri command
test_set_auto_launch() - tests set_auto_launch Tauri command

✅ Phase 1 Complete - Ready for Phase 2

Phase 1 has been tremendously successful, achieving a 77% migration rate and reducing the lib.rs file by ~500 lines of test code. The modular test structure is now in place and working perfectly.

Next Steps: Proceed to Phase 2: Extract Services

Phase 2: Extract Services (Medium Risk)

Objective: Move service structs and related functionality to dedicated modules

Task Checklist (Phase 2)

Create services module structure
- Create speakr-tauri/src/services/ directory
- Create services/mod.rs with module declarations
- Create services/types.rs for shared enums
- Create services/hotkey.rs for GlobalHotkeyService
- Create services/status.rs for BackendStatusService
Extract ServiceComponent enum
- Move ServiceComponent enum → services/types.rs
- Add appropriate derives and documentation
- Re-export from services/mod.rs
Extract GlobalHotkeyService
- Move entire GlobalHotkeyService struct → services/hotkey.rs
- Move all impl blocks and methods
- Add necessary imports (tauri, tracing, etc.)
- Extract register_global_hotkey() implementation → services/hotkey.rs as register_global_hotkey_internal()
- Extract unregister_global_hotkey() implementation → services/hotkey.rs as unregister_global_hotkey_internal()
- Keep #[tauri::command] wrappers in lib.rs that call _internal functions
- Make service and methods pub(crate) for module visibility
Extract BackendStatusService
- Move BackendStatusService struct → services/status.rs
- Move all impl blocks and methods
- Move GLOBAL_BACKEND_SERVICE static → services/status.rs
- Move get_global_backend_service() helper → services/status.rs
- Move update_global_service_status() helper → services/status.rs
- Extract get_backend_status() implementation → services/status.rs as get_backend_status_internal()
- Extract update_service_status() implementation → services/status.rs as update_service_status_internal()
- Keep #[tauri::command] wrappers in lib.rs that call _internal functions
- Add necessary imports for Tauri AppHandle, etc.
- Make all functions pub(crate) for module visibility
- Add Default implementation
Update lib.rs imports and exports
- Add mod services; to lib.rs
- Add use services::*; or specific imports
- Remove original service implementations from lib.rs
- Update command registration in run() function
Test service extraction
- Run cargo check to verify compilation
- Run cargo test --workspace to ensure tests pass
- Test hotkey registration functionality manually
- Test status service functionality
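After extraction, services/status.rs might look roughly like the sketch below. The names `ServiceComponent`, `BackendStatusService`, `GLOBAL_BACKEND_SERVICE`, and `get_global_backend_service` come from the checklist; the enum variants, the `ServiceStatus` type, the method names, and the use of `OnceLock` are assumptions made for illustration. Items are marked `pub` here so the snippet works standalone, whereas the checklist calls for `pub(crate)`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical variants; the real enum lives in services/types.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ServiceComponent {
    AudioCapture,
    Transcription,
    TextInjection,
}

const ALL_COMPONENTS: [ServiceComponent; 3] = [
    ServiceComponent::AudioCapture,
    ServiceComponent::Transcription,
    ServiceComponent::TextInjection,
];

/// Hypothetical per-service state.
#[derive(Debug, Clone, PartialEq)]
pub enum ServiceStatus {
    Starting,
    Ready,
    Error(String),
}

#[derive(Debug)]
pub struct BackendStatusService {
    statuses: HashMap<ServiceComponent, ServiceStatus>,
}

impl Default for BackendStatusService {
    /// Every component starts in `Starting` until it reports otherwise.
    fn default() -> Self {
        let mut statuses = HashMap::new();
        for component in ALL_COMPONENTS {
            statuses.insert(component, ServiceStatus::Starting);
        }
        Self { statuses }
    }
}

impl BackendStatusService {
    pub fn update(&mut self, component: ServiceComponent, status: ServiceStatus) {
        self.statuses.insert(component, status);
    }

    pub fn all_ready(&self) -> bool {
        self.statuses.values().all(|s| *s == ServiceStatus::Ready)
    }
}

static GLOBAL_BACKEND_SERVICE: OnceLock<Mutex<BackendStatusService>> = OnceLock::new();

/// Lazily creates the shared service on first access.
pub fn get_global_backend_service() -> &'static Mutex<BackendStatusService> {
    GLOBAL_BACKEND_SERVICE.get_or_init(|| Mutex::new(BackendStatusService::default()))
}
```

Keeping the static and its accessor in the same module means lib.rs only needs the `#[tauri::command]` wrappers, which lock the mutex and delegate.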

Phase 3: Extract Settings (Medium Risk)

Objective: Centralize all settings management into a dedicated module

Task Checklist (Phase 3)

Create settings module structure
- Create speakr-tauri/src/settings/ directory
- Create settings/mod.rs with module declarations
- Create settings/persistence.rs for file I/O operations
- Create settings/migration.rs for version migrations
- Create settings/validation.rs for directory validation
- Create settings/commands.rs for Tauri commands
Extract path and validation functions
- Move get_settings_path() → settings/persistence.rs
- Move get_settings_backup_path() → settings/persistence.rs
- Move validate_settings_directory_permissions() → settings/validation.rs
- Add proper error handling and documentation
- Make functions pub(crate) for module visibility
Extract file I/O functions
- Move try_load_settings_file() → settings/persistence.rs
- Move save_settings_to_dir() → settings/persistence.rs
- Move load_settings_from_dir() → settings/persistence.rs
- Ensure all atomic write logic is preserved
- Add proper error handling chains
- Make private functions pub(crate) for module visibility
Extract migration logic
- Move migrate_settings() → settings/migration.rs
- Add version handling logic
- Document migration strategy for future versions
- Make function pub(crate) for module visibility
Extract Tauri commands
- Extract save_settings() implementation → settings/commands.rs as save_settings_internal()
- Extract load_settings() implementation → settings/commands.rs as load_settings_internal()
- Keep #[tauri::command] wrappers in lib.rs that call _internal functions
- Ensure internal functions use the extracted helper functions
- Make internal functions pub(crate) for module visibility
- Maintain same function signatures for compatibility
Update module exports and imports
- Configure settings/mod.rs to re-export public functions
- Add mod settings; to lib.rs
- Update imports in lib.rs
- Remove original settings functions from lib.rs
Test settings extraction thoroughly
- Run isolated settings tests to ensure file I/O works
- Test corruption recovery scenarios
- Test migration scenarios with version 0 files
- Verify atomic write behavior
- Test with real application settings directory
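
The atomic write behaviour that this phase must preserve is commonly implemented as "write to a temporary file, then rename". The sketch below shows that technique under stated assumptions: the function names, the `settings.json` and `settings.json.tmp` file names, and the use of a plain string instead of a serialised settings struct are all hypothetical and not taken from the Speakr source.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical location of the settings file inside `dir`.
pub(crate) fn settings_path(dir: &Path) -> PathBuf {
    dir.join("settings.json")
}

/// Writes `contents` so that readers see either the old file or the new one,
/// never a half-written file.
pub(crate) fn save_settings_atomically(dir: &Path, contents: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let final_path = settings_path(dir);
    let tmp_path = dir.join("settings.json.tmp");

    {
        // Write the complete payload to a sibling temporary file first.
        let mut tmp = fs::File::create(&tmp_path)?;
        tmp.write_all(contents.as_bytes())?;
        // Flush to disk before the rename makes the file visible.
        tmp.sync_all()?;
    }

    // Renaming within one directory replaces the target in a single step.
    fs::rename(&tmp_path, &final_path)
}
```

A test for the extracted module can call the function twice and check that the second payload wins and that no temporary file is left behind.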

Phase 4: Extract Debug and Audio (Low Risk)

Objective: Isolate debug and audio functionality into separate modules

Task Checklist (Phase 4)

Create debug module structure
- Create speakr-tauri/src/debug/ directory
- Create debug/mod.rs with conditional compilation
- Create debug/types.rs for debug data structures
- Create debug/storage.rs for static storage
- Create debug/commands.rs for debug Tauri commands
Extract debug types and storage
- Move DebugLogLevel enum → debug/types.rs
- Move DebugLogMessage struct → debug/types.rs
- Move DebugRecordingState struct → debug/types.rs
- Move DEBUG_LOG_MESSAGES static → debug/storage.rs
- Move DEBUG_RECORDING_STATE static → debug/storage.rs
- Move add_debug_log() function → debug/storage.rs
Extract debug commands
- Extract debug_test_audio_recording() implementation → debug/commands.rs as debug_test_audio_recording_internal()
- Extract debug_start_recording() implementation → debug/commands.rs as debug_start_recording_internal()
- Extract debug_stop_recording() implementation → debug/commands.rs as debug_stop_recording_internal()
- Extract debug_get_log_messages() implementation → debug/commands.rs as debug_get_log_messages_internal()
- Extract debug_clear_log_messages() implementation → debug/commands.rs as debug_clear_log_messages_internal()
- Keep #[tauri::command] wrappers in lib.rs that call _internal functions
- Move get_debug_recordings_directory() → debug/commands.rs
- Make all extracted functions pub(crate) for module visibility
Create audio module structure
- Create speakr-tauri/src/audio/ directory
- Create audio/mod.rs with public interface
- Create audio/files.rs for WAV file operations
- Create audio/recording.rs for recording logic
Extract audio file operations
- Move generate_audio_filename_with_timestamp() → audio/files.rs
- Move save_audio_samples_to_wav_file() → audio/files.rs
- Make functions pub(crate) for module visibility
- Add proper WAV spec configuration
- Add file path validation
Extract audio recording functions
- Move debug_record_audio_to_file() → audio/recording.rs
- Move debug_record_real_audio_to_file() → audio/recording.rs
- Make functions pub(crate) for module visibility
- Ensure proper integration with speakr-core AudioRecorder
Update conditional compilation
- Ensure #[cfg(debug_assertions)] is properly applied
- Test that debug code is excluded from release builds (compilation successful)
- Update command registration to handle debug commands conditionally
Update lib.rs and test functionality
- Add mod debug; and mod audio; to lib.rs
- Update imports and re-exports
- Remove original debug and audio functions from lib.rs
- Test debug panel functionality in development mode (24/27 tests passing)
- Test audio recording and file saving (integration tests passing)
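
The conditional-compilation step can be illustrated without Tauri. In this sketch a plain vector of command names stands in for the real handler registration; the function name `registered_commands` is hypothetical, and the command names are taken from the checklists in this document.

```rust
/// Returns the names of the commands that would be registered.
pub fn registered_commands() -> Vec<&'static str> {
    // `mut` is only needed when the debug block below is compiled in.
    #[allow(unused_mut)]
    let mut commands = vec!["save_settings", "load_settings"];

    // This block is compiled only when debug assertions are enabled,
    // which is the default for builds without `--release`.
    #[cfg(debug_assertions)]
    {
        commands.extend(["debug_start_recording", "debug_stop_recording"]);
    }

    commands
}
```

Comparing the result against `cfg!(debug_assertions)` gives a check that holds in both debug and release builds.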

SPEAKR-TAURI_LIB-RS_PHASE_5

Migration Notes: Phase 5 Refactor - Command Organisation

Overview

Phase 5 of the Speakr Tauri backend refactor extracted remaining commands into dedicated modules and finalised the cleanup of lib.rs. This document provides guidance for developers working with the new structure.

What Changed

Before (Pre-Phase 5)

All command implementations lived in lib.rs
File was over 1,000 lines with mixed concerns
Commands, services, and business logic were intermingled
Tests had to go through the Tauri command wrappers

After (Phase 5 Complete)

Commands organised into functional modules under commands/
Each command has an *_internal() function with business logic
Tauri command wrappers remain in lib.rs for registration
lib.rs reduced to ~400 lines, focused on configuration and integration

New File Structure

speakr-tauri/src/
├── commands/
│   ├── mod.rs          # Command organisation and documentation
│   ├── validation.rs   # Input validation commands
│   ├── system.rs       # System integration commands
│   └── legacy.rs       # Backward compatibility commands
├── services/           # (From previous phases)
│   ├── mod.rs
│   ├── hotkey.rs
│   ├── status.rs
│   └── types.rs
├── settings/           # (From previous phases)
├── debug/              # (From previous phases)
├── audio/              # (From previous phases)
└── lib.rs              # Tauri integration and command registration

Command Implementation Pattern

New Pattern (Recommended)

// In commands/validation.rs
pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
    // Business logic here
    Ok(())
}

// In lib.rs
#[tauri::command]
async fn validate_hot_key(hot_key: String) -> Result<(), AppError> {
    validate_hot_key_internal(hot_key).await
}

Key Benefits

Testability: Internal functions can be tested without Tauri overhead
Modularity: Commands grouped by functional domain
Maintainability: Business logic separated from framework concerns
Documentation: Each module has focused documentation

Working with Commands

Adding a New Command

Choose the appropriate module (validation, system, or legacy)

Implement the internal function:

// `T` stands for the command's return type
pub async fn my_command_internal(param: String) -> Result<T, AppError> {
    // Implementation here
}

Add Tauri wrapper in lib.rs:

#[tauri::command]
async fn my_command(param: String) -> Result<T, AppError> {
    my_command_internal(param).await
}

Register in run() function:

.invoke_handler(tauri::generate_handler![
    // ... existing commands,
    my_command
])

Add comprehensive tests for the internal function

Command Module Guidelines

validation.rs: Input validation, sanitisation, format checking
system.rs: OS integration, file system, auto-launch, model availability
legacy.rs: Deprecated or backward-compatibility commands
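
As an illustration of the kind of format checking that belongs in validation.rs, here is a minimal synchronous sketch. The rules it enforces (at least one modifier, exactly one non-modifier key, `+` as separator), the modifier names, the function name `validate_hot_key_format`, and the local one-variant `AppError` are all assumptions for the example; the real validation rules may differ.

```rust
/// Local stand-in for `speakr_types::AppError`, reduced to one variant.
#[derive(Debug, PartialEq)]
pub enum AppError {
    HotKey(String),
}

/// Hypothetical set of accepted modifier names.
const MODIFIERS: [&str; 4] = ["Cmd", "Ctrl", "Alt", "Shift"];

/// Accepts strings such as "Cmd+Shift+Space": one or more modifiers
/// followed by exactly one other key, separated by `+`.
pub fn validate_hot_key_format(hot_key: &str) -> Result<(), AppError> {
    let parts: Vec<&str> = hot_key.split('+').map(str::trim).collect();

    // Catches "", "Cmd+", and "Cmd++Space".
    if parts.iter().any(|part| part.is_empty()) {
        return Err(AppError::HotKey(format!("Empty segment in hot-key {hot_key:?}")));
    }

    let modifier_count = parts.iter().filter(|part| MODIFIERS.contains(*part)).count();
    let key_count = parts.len() - modifier_count;

    if modifier_count == 0 {
        return Err(AppError::HotKey(format!("Hot-key {hot_key:?} needs a modifier")));
    }
    if key_count != 1 {
        return Err(AppError::HotKey(format!(
            "Hot-key {hot_key:?} needs exactly one non-modifier key"
        )));
    }
    Ok(())
}
```

Because the function takes no framework types, it can be exercised with plain assertions for both the accepting and the rejecting paths.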

Testing Commands

// Test the internal function directly
#[tokio::test]
async fn test_my_command_internal() {
    let result = my_command_internal("test".to_string()).await;
    assert!(result.is_ok());
}

Breaking Changes

Import Changes

Commands moved from crate::* to crate::commands::*:

// Old (no longer works)
use crate::validate_hot_key_internal;

// New
use crate::commands::validation::validate_hot_key_internal;

Function Visibility

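Internal functions changed from pub(crate) to pub. pub(crate) already allows access from other modules in the same crate; pub additionally exposes the functions outside the crate, for example to integration tests and documentation tests: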

// Old
pub(crate) async fn validate_hot_key_internal(...) -> ...

// New
pub async fn validate_hot_key_internal(...) -> ...

Error Handling

Consistent Error Types

All commands use speakr_types::AppError for error handling:

pub enum AppError {
    HotKey(String),
    Settings(String),
    FileSystem(String),
    // ... other variants
}

Error Context

Add context to errors for better debugging:

Err(AppError::Settings(format!("Invalid model size: {model_size}")))
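
Context is most useful when a low-level error is converted into an `AppError`. The following is a minimal sketch of that conversion using `map_err`. The local two-variant `AppError`, its `Display` wording, and the helper name `read_settings_text` are assumptions for illustration; the real type lives in speakr_types and may be defined differently.

```rust
use std::fmt;
use std::fs;
use std::path::Path;

/// Local stand-in for `speakr_types::AppError` with two of its variants.
#[derive(Debug)]
pub enum AppError {
    Settings(String),
    FileSystem(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Settings(msg) => write!(f, "Settings error: {msg}"),
            AppError::FileSystem(msg) => write!(f, "File system error: {msg}"),
        }
    }
}

/// Reads a file and, on failure, records which path was involved
/// alongside the underlying I/O error.
pub fn read_settings_text(path: &Path) -> Result<String, AppError> {
    fs::read_to_string(path).map_err(|err| {
        AppError::FileSystem(format!("Failed to read {}: {err}", path.display()))
    })
}
```

With the path embedded in the message, a failure reported to the frontend identifies the file without needing a backtrace.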

Documentation Standards

Function Documentation

All public functions must have rustdoc comments:

/// Brief description of what the function does.
///
/// # Arguments
///
/// * `param` - Description of the parameter
///
/// # Returns
///
/// Description of what is returned.
///
/// # Errors
///
/// Conditions that cause errors.
///
/// # Examples
///
/// ```rust,no_run
/// use speakr_lib::commands::validation::validate_hot_key_internal;
/// // Example usage
/// ```
pub async fn my_function_internal(param: String) -> Result<(), AppError> {
    // Implementation
    Ok(())
}

Module Documentation

Each module should have comprehensive documentation explaining its purpose and usage patterns.

Testing Strategy

Unit Tests

Test internal functions directly (not through Tauri wrappers)
Use test isolation patterns for file system operations
Mock external dependencies where possible
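
One possible test isolation pattern for file system operations is a guard type that creates a unique temporary directory and removes it when dropped. The sketch below uses only the standard library; the `TestDir` type and its naming scheme are hypothetical and not part of the Speakr codebase.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

// Distinguishes directories created by tests running in the same process.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Owns a unique directory under the system temp dir for one test.
pub struct TestDir {
    path: PathBuf,
}

impl TestDir {
    pub fn new(label: &str) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("speakr-test-{label}-{}-{id}", std::process::id());
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TestDir {
    fn drop(&mut self) {
        // Best-effort cleanup; a failure here should not mask the test result.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Each test passes `dir.path()` to the function under test, so tests never share files and nothing is written to the real settings directory.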

Test Organisation

Tests live alongside code in mod tests blocks:

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_function_success() {
        // Test implementation
    }
}

Backward Compatibility

Legacy Support

Commands in legacy.rs maintain backward compatibility but should be considered deprecated for new development.

Deprecation Path

When deprecating commands:

Move to legacy.rs
Add deprecation notice in documentation
Provide migration path in rustdoc
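
Rust's built-in `#[deprecated]` attribute can carry the deprecation notice and the migration hint together. The sketch below assumes two hypothetical function names and a hypothetical set of model sizes; only the attribute itself is standard Rust.

```rust
/// Replacement entry point. The accepted sizes are placeholders.
pub fn check_model_availability_internal(model_size: &str) -> bool {
    matches!(model_size, "small" | "medium" | "large")
}

/// Legacy name kept for existing callers.
///
/// # Migration
///
/// Call [`check_model_availability_internal`] instead; the arguments and
/// return value are identical.
#[deprecated(note = "use `check_model_availability_internal` instead")]
pub fn model_exists_legacy(model_size: &str) -> bool {
    // Delegate so the two entry points cannot drift apart.
    check_model_availability_internal(model_size)
}
```

Callers of the legacy function get a compiler warning that includes the note, while the code keeps compiling until the function is removed.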

Performance Considerations

Command Overhead

The new pattern adds minimal overhead:

Internal functions: Direct function calls
Tauri wrappers: Thin delegation layer

Memory Usage

Internal functions can be tested in isolation without Tauri runtime
Reduced memory usage during testing
Better compiler optimisations due to cleaner module boundaries

Common Patterns

Input Validation

pub async fn validate_input_internal(input: String) -> Result<(), AppError> {
    let input = input.trim();
    if input.is_empty() {
        return Err(AppError::Settings("Input cannot be empty".to_string()));
    }
    // Additional validation...
    Ok(())
}

File System Operations

pub async fn check_file_internal(path: String) -> Result<bool, AppError> {
    let path = std::path::Path::new(&path);
    // `exists()` already returns the bool we need
    Ok(path.exists())
}

Error Propagation

pub async fn complex_operation_internal(input: String, path: String) -> Result<bool, AppError> {
    // `?` returns early with the AppError from whichever step fails
    validate_input_internal(input).await?;
    let file_exists = check_file_internal(path).await?;
    // Process results...
    Ok(file_exists)
}

Future Development

Adding New Modules

If the commands/ directory grows too large, consider:

Creating subdirectories for related commands
Grouping by feature area rather than technical function
Maintaining the *_internal + wrapper pattern

Architectural Evolution

The current pattern supports:

Easy migration to other frameworks (business logic is framework-agnostic)
Microservice extraction (internal functions are self-contained)
Enhanced testing strategies (direct function testing)

Troubleshooting

Common Issues

Import errors: Check if function moved to new module
Visibility errors: Internal functions are now pub, not pub(crate)
Test failures: Update imports in test files
Documentation tests: Use speakr_lib as crate name, not speakr_tauri

Migration Checklist

When updating code that depends on the old structure:

Update imports to new module paths
Change function visibility if needed
Update test imports and assertions
Fix documentation examples with correct crate name
Verify error handling uses AppError consistently

Last Updated: Phase 5 Complete

For questions about this refactor, see the original planning documents in docs/refactor/

Rust Documentation Tracking

Instructions

Original Prompt: Add detailed comments to all functions etc in the files and clean up each file, remove orphaned comments, group code logically (e.g. tauri::commands together) and add large comment signposts to help navigate the file easily.

Documentation Standards:

Add detailed rustdoc comments to all functions, commands, and relevant items
Remove orphaned or outdated comments
Group code logically with clear comment signposts for easy navigation
Ensure all public items are fully documented, including parameters, errors, and usage examples where appropriate
Use large comment blocks (e.g., // ============================================================================) for major sections
Use smaller comment dividers (e.g., // --------------------------------------------------------------------------) for individual functions
Follow Rust documentation best practices and project coding standards

Progress Tracking

1. Select an UNCHECKED [ ] item from the list.
2. IMMEDIATELY add a progress indicator to the item: [~]
3. Comment the file following the instructions in this document.
4. On COMPLETION, add a checkmark to the item in the list: [x]
5. Verify your changes using precommit run ... (formats, lints and runs tests)
6. Fix any errors or warnings and repeat step 5 until no errors or warnings remain.
7. Commit your changes to Git.
8. Return to step 1 until all items are checked.

speakr-core/src/

lib.rs ✅ COMPLETED
audio/mod.rs
model/mod.rs
model/list.rs
model/list_updater.rs
model/list_tests.rs
model/metadata.rs
bin/update_models.rs
bin/update_models_tui.rs

speakr-tauri/src/

lib.rs ✅ COMPLETED
main.rs
audio/mod.rs
audio/files.rs
audio/recording.rs
commands/mod.rs
commands/legacy.rs
commands/system.rs
commands/validation.rs
debug/mod.rs
debug/commands.rs
debug/storage.rs
debug/types.rs
services/mod.rs
services/hotkey.rs
services/status.rs
services/types.rs
settings/mod.rs
settings/commands.rs
settings/migration.rs
settings/persistence.rs
settings/validation.rs

speakr-types/src/

lib.rs ✅ COMPLETED

speakr-ui/src/

lib.rs
app.rs
debug.rs
settings.rs

Test Files

speakr-core/tests/

audio_capture.rs

speakr-tauri/tests/

audio_tests.rs
commands_tests.rs
debug_save.rs
global_hotkey.rs
hotkey_tests.rs
integration_tests.rs
settings_tests.rs
status_tests.rs

Comment Style Examples

Use these exact patterns for consistency across all files:

Comment Hierarchy Structure

graph TD
    A["File Level<br/>============================================================================<br/>//! Module Documentation<br/>============================================================================"] --> B["Major Section<br/>============================================================================<br/>// Section Name<br/>============================================================================"]

    B --> C["Subsection<br/>// =========================<br/>// Subsection Name<br/>// ========================="]

    C --> D["Function/Item<br/>// --------------------------------------------------------------------------<br/>/// Function documentation<br/>/// # Arguments, # Returns, # Errors<br/>#[tauri::command]<br/>async fn function_name()"]

    D --> E["Implementation<br/>// Regular comments<br/>// explaining logic"]

    F["End of File<br/>// ==========================================================================="]

    B --> G["Module Declarations<br/>// =========================<br/>// Module Declarations<br/>// =========================<br/>pub mod commands;"]

    B --> H["External Imports<br/>// =========================<br/>// External Imports<br/>// =========================<br/>use tauri::AppHandle;"]

    classDef fileLevel fill:#E5F5E0,stroke:#31A354,color:#31A354
    classDef majorSection fill:#E6E6FA,stroke:#756BB1,color:#756BB1
    classDef subsection fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD
    classDef function fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
    classDef implementation fill:#F2F0F7,stroke:#BCBDDC,color:#756BB1
    classDef endFile fill:#E5E1F2,stroke:#C7C0DE,color:#8471BF
    classDef modules fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77

    class A fileLevel
    class B majorSection
    class C subsection
    class D function
    class E implementation
    class F endFile
    class G,H modules

File-Level Documentation

// ============================================================================
//! Module name and purpose.
//!
//! This module provides functionality for:
//! - Feature 1
//! - Feature 2
//! - Feature 3
// ============================================================================

Major Section Dividers

// ============================================================================
// Section Name (e.g., "Tauri Command Definitions")
// ============================================================================

Subsection Headers

// =========================
// Subsection Name (e.g., "Debug Commands (Debug Only)")
// =========================

Function/Item Dividers

// --------------------------------------------------------------------------
/// Function description with full rustdoc.
///
/// # Arguments
/// * `param` - Parameter description
///
/// # Returns
/// Returns description.
///
/// # Errors
/// Error conditions.
///
/// # Examples
/// ```no_run
/// // Usage example
/// ```
#[tauri::command]
async fn function_name(param: String) -> Result<(), AppError> {
    // Implementation
    Ok(())
}

Module Declarations Section

// =========================
// Module Declarations
// =========================
pub mod commands;
pub mod services;
// etc.

Import Section

// =========================
// External Imports
// =========================
use std::collections::HashMap;
use tauri::{AppHandle, Manager};
// etc.

Setup/Initialization Comments

// =========================
// Initial Setup (Description of what's being set up)
// =========================

End-of-File Marker

// ===========================================================================

Rustdoc Comment Patterns

Standard Function Documentation

/// Brief one-line description of what the function does.
///
/// More detailed explanation if needed, including behavior,
/// side effects, and important implementation details.
///
/// # Arguments
/// * `param1` - Description of first parameter
/// * `param2` - Description of second parameter
///
/// # Returns
/// Description of return value and what it represents.
///
/// # Errors
/// Description of when and why the function might return an error.
///
/// # Examples
/// ```no_run
/// let result = function_name(param1, param2)?;
/// assert_eq!(result, expected_value);
/// ```

Tauri Command Documentation

/// Brief description of the command's purpose.
///
/// # Arguments
/// * `param` - Parameter description
///
/// # Returns
/// Returns `Ok(())` on success.
///
/// # Errors
/// Returns `AppError` if the operation fails.
///
/// # Example
/// ```no_run
/// // In frontend: invoke('command_name', { param })
/// ```

Debug-Only Function Documentation

/// Debug: Brief description of debug functionality.
///
/// This function is only available in debug builds.

Module Documentation

//! Module name and purpose.
//!
//! This module provides [specific functionality] for the Speakr application:
//! - Feature/capability 1
//! - Feature/capability 2
//! - Feature/capability 3
//!
//! # Usage
//! Brief usage example or important notes.

Documentation Checklist Template

For each file, ensure:

File-level documentation: Module-level rustdoc comment explaining purpose and contents
Function documentation: All public functions have comprehensive rustdoc
- Purpose and behavior description
- Parameters documented with # Arguments
- Return values documented with # Returns
- Error conditions documented with # Errors
- Usage examples where appropriate with # Examples
Type documentation: All public structs, enums, and traits documented
Large comment signposts: Major sections clearly marked
Code organization: Related code grouped logically
Orphaned comments: Removed outdated or irrelevant comments
Formatting: Consistent with rustfmt standards
Testing: Code compiles and tests pass after changes
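
Part of this checklist can be enforced mechanically. rustc's `missing_docs` lint reports public API items that lack rustdoc, most usefully in a library crate. The sketch below applies it to a single module; the module name `documented` and the function `normalise_hot_key` are hypothetical, and whether the project enables this lint is not stated in this document.

```rust
/// Example module with the `missing_docs` lint raised to an error.
#[deny(missing_docs)]
pub mod documented {
    /// Trims surrounding whitespace from a hot-key string.
    ///
    /// # Arguments
    /// * `hot_key` - Raw hot-key text as entered by the user
    ///
    /// # Returns
    /// The trimmed hot-key as an owned `String`.
    pub fn normalise_hot_key(hot_key: &str) -> String {
        hot_key.trim().to_string()
    }
}
```

Using `warn` instead of `deny` while files are still being worked through keeps the build green and still lists the items that need attention.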

Priority Order

High Priority (Core functionality):
- speakr-types/src/lib.rs (shared types)
- speakr-core/src/lib.rs (core functionality)
- speakr-tauri/src/main.rs (application entry)
Medium Priority (Services and commands):
- speakr-tauri/src/services/* (service modules)
- speakr-tauri/src/commands/* (command modules)
- speakr-tauri/src/settings/* (settings modules)
Lower Priority (Supporting modules):
- speakr-tauri/src/audio/* (audio modules)
- speakr-tauri/src/debug/* (debug modules)
- speakr-ui/src/* (UI modules)
- speakr-core/src/model/* (model modules)
Test Files (Documentation focused on test clarity):
- All test files in tests/ directories

Notes

Completed: speakr-tauri/src/lib.rs - Comprehensive documentation added with clear sections and detailed rustdoc comments
Next Target: Recommend starting with speakr-types/src/lib.rs as it contains shared types used across the project
Testing: Always run cargo fmt, cargo clippy, and cargo test before committing changes
Commit Strategy: Document and commit files in logical groups (e.g., all service files together)

title: Product Requirements Document – Speakr version: 2025-07-20 status: Draft authors: David Jessup

title: Technical Architecture – Speakr version: 2025-07-20 status: Draft