Speakr Documentation
note
Speakr is a privacy-first, hot-key-driven dictation utility that turns your speech into typed text entirely on-device. No cloud, no latency, no compromises.
What is Speakr?
Speakr transforms the way you capture thoughts into text. With a single keystroke, record speech, transcribe it locally using Whisper models, and have the text instantly typed into any application. Perfect for developers, writers, and anyone who thinks faster than they type.
Privacy First
- 100% offline processing: your voice never leaves your device
- No cloud dependencies: works in air-gapped environments
- Minimal permissions: only microphone and accessibility access
Built for Speed
- ≤ 3 s end-to-end latency for 5-second recordings
- Global hotkeys work across all applications
- Lightweight universal macOS binary < 20 MB
Navigate the Documentation
tip
Use the search box (⌘/Ctrl + K) to quickly jump to any topic, or browse by your role below.
Product & Planning
Document | Description | Audience |
---|---|---|
Product Requirements | Vision, goals, and feature specifications | Product owners, stakeholders |
Implementation Plan | Development roadmap and milestones | Project managers, engineers |
Architecture & Engineering
Document | Description | Audience |
---|---|---|
Technical Architecture | System design and component overview | Engineers, architects |
System Description | Detailed system behaviour and flows | Developers, maintainers |
Development Overview | Getting started with development | New contributors |
Functional Specifications
Document | Description | Status |
---|---|---|
FR-1: Global Hotkey | Hot-key registration and handling | Implemented |
FR-2: Audio Capture | Microphone access and recording | Implemented |
FR-3: Transcription | Local Whisper integration | In Progress |
FR-4: Text Injection | Cross-app text insertion | In Progress |
FR-5: Injection Fallback | Clipboard fallback mechanism | Planned |
FR-6: Settings UI | Configuration interface | Implemented |
warning
See Specs Overview for the complete functional requirements including non-functional requirements (NFRs) for security, performance, and accessibility.
Development & Debugging
Document | Description | Audience |
---|---|---|
Debug Panel | Development and troubleshooting tools | Developers, QA |
Pre-commit Hooks | Code quality and testing setup | Contributors |
Tauri Plugins | Plugin architecture and integrations | Backend developers |
Quick Start
note
New to the project? Start with the Development Overview for setup instructions.
For Product People
- Read the Product Requirements to understand the vision
- Check the Implementation Plan for current progress
- Review Functional Specs for detailed features
For Engineers
- Study the Technical Architecture for system design
- Follow Development Setup to get coding
- Reference System Description for implementation details
For Contributors
- Set up pre-commit hooks for code quality
- Browse functional requirements to find tasks
- Use the Debug Panel for development workflow
Project Status
tip
Current Focus: Core transcription engine and text injection reliability
Component | Status | Notes |
---|---|---|
Global Hotkeys | Complete | Cross-app hotkey registration working |
Audio Capture | Complete | High-quality microphone input |
Settings UI | Complete | Leptos-based configuration interface |
Transcription | Active | Whisper integration in progress |
Text Injection | Active | Cross-app compatibility improvements |
Model Management | Planned | GGUF model download and validation |
Contributing
note
This documentation is a living document. Found something unclear or outdated?
- Browse specs in the specs directory for implementation tasks
- Report issues via GitHub Issues
- Improve docs by opening a pull request
- Suggest features in GitHub Discussions
Built with Rust, Tauri 2, and Leptos
Privacy-first dictation for the modern developer
title: Product Requirements Document – Speakr
version: 2025-07-20
status: Draft
authors: David Jessup
Product Requirements Document – Speakr
- 1. Purpose / Vision
- 2. Problem Statement
- 3. Goals & Non-Goals
- 4. Personas
- 5. User Stories
- 6. Functional Requirements
- 7. Non-Functional Requirements
- 8. Metrics / KPIs
- 9. Milestones
- 10. Open Questions
- 11. Appendix – Stakeholders & Review
1. Purpose / Vision
Speakr is a privacy-first dictation hot-key utility for macOS (Windows/Linux later). In a single keystroke, users can record speech, transcribe entirely on-device, and have the text typed directly into any active input field. Speakr aims to be the fastest way for developers, writers, and power-users to turn fleeting thoughts into code or prose without breaking flow, and without sending audio to the cloud.
2. Problem Statement
- Switching to dedicated dictation apps breaks focus and incurs network latency.
- Many corporate or offline environments forbid cloud speech services for privacy reasons.
- OS-level dictation is unreliable for code, lacks custom hot-keys, and has high latency on older hardware.
Opportunity: A lightweight, keyboard-driven tool that works anywhere text can be typed, requires no network, and respects user privacy.
3. Goals & Non-Goals
3.1 Goals
- <= 3 s end-to-end latency for 5-second recordings on Apple Silicon (M-series).
- 100% offline: no external network calls.
- Global hot-key works in background apps.
- Support customisable models & hot-keys via UI.
- Ship notarised universal macOS binary < 20 MB (excluding model).
- Provide a clean upgrade path to Windows & Linux.
3.2 Non-Goals
- Real-time streaming (v1 may paste only after stop).
- Mobile platforms.
- Full grammar / punctuation correction.
- Server-side sync or accounts.
4. Personas
Persona | Needs / Pain-points |
---|---|
Dev Dana | Insert comments/code quickly without losing keyboard context. |
Writer Will | Draft snippets into any text editor without toggling apps. |
Privacy Peter | Dictate confidential material offline, no data leaves device. |
Accessibility Ava | Replace or augment typing due to RSI, keep workflow keyboard-first. |
5. User Stories
MoSCoW method: Must, Should, Could, Won’t (for now)
Priority | Description |
---|---|
Must | “As a user, I press <Opt> + ~ and my spoken words (≤ 30 s) are typed into the active field within ~3 s.” |
Must | “As a user, the app asks for mic + Accessibility permissions on first run and explains why.” |
Must | “As a user, I can change the hot-key in settings and be warned of conflicts.” |
Should | “As a user, I can pick a smaller/faster model if my machine is slow.” |
Should | “As a user, a subtle overlay shows ‘Recording… / Transcribing…’ states.” |
Could | “As an advanced user, I can turn on auto-punctuation.” |
Could | “As an advanced user, I can add bespoke words to the dictionary.” |
Won’t (v1) | Live transcript shown word-by-word while speaking. |
6. Functional Requirements
FR | Description |
---|---|
FR-1 | Global hot-key registers at app start and triggers record/transcribe/inject flow. |
FR-2 | Audio capture uses 16 kHz mono via cpal, max configurable duration (default 10 s). |
FR-3 | Transcription runs through Whisper (GGUF) via whisper-rs; language default EN. |
FR-4 | Transcript is injected via synthetic keystrokes (enigo) into current focus. |
FR-5 | If injection fails (secure field), fallback to clipboard-paste with user warning. |
FR-6 | UI (tray or window) exposes: hot-key picker, model selector, auto-launch toggle. |
FR-7 | App emits status events for UI overlay and logs (Recording, Transcribing, Error). |
FR-8 | Settings persist locally (JSON in AppData, no cloud). |
FR-9 | App auto-updates via GitHub Releases (optional in v1). |
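The FR-4 / FR-5 relationship (type the transcript, fall back to the clipboard when typing fails) can be sketched in plain Rust. This is a minimal illustration, not Speakr's actual code: the `Injector` and `Clipboard` traits, the `Delivery` enum, and the two test doubles are hypothetical names introduced here so the fallback rule can be exercised without enigo or a real clipboard.

```rust
/// Outcome of delivering a transcript (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// Keystroke injection succeeded (FR-4).
    Typed,
    /// Injection failed; the text is on the clipboard and the UI should warn the user (FR-5).
    ClipboardFallback,
}

/// Hypothetical abstraction over synthetic keystroke injection.
pub trait Injector {
    fn type_text(&mut self, text: &str) -> Result<(), String>;
}

/// Hypothetical abstraction over the system clipboard.
pub trait Clipboard {
    fn set_text(&mut self, text: &str) -> Result<(), String>;
}

/// Try to type the text; if that fails, copy it to the clipboard instead.
/// An error is returned only when both paths fail.
pub fn deliver(
    injector: &mut dyn Injector,
    clipboard: &mut dyn Clipboard,
    text: &str,
) -> Result<Delivery, String> {
    match injector.type_text(text) {
        Ok(()) => Ok(Delivery::Typed),
        Err(_) => {
            clipboard.set_text(text)?;
            Ok(Delivery::ClipboardFallback)
        }
    }
}

/// Test double that always fails, standing in for a secure input field.
pub struct FailingInjector;

impl Injector for FailingInjector {
    fn type_text(&mut self, _text: &str) -> Result<(), String> {
        Err("secure field".to_string())
    }
}

/// Test double that stores the last clipboard value in memory.
pub struct MemoryClipboard(pub Option<String>);

impl Clipboard for MemoryClipboard {
    fn set_text(&mut self, text: &str) -> Result<(), String> {
        self.0 = Some(text.to_owned());
        Ok(())
    }
}
```

Keeping the decision in a function that only sees traits means the fallback can be unit-tested without Accessibility permissions.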
7. Non-Functional Requirements
Category | Requirement | Metric / Acceptance |
---|---|---|
Latency | End-to-end ≤ 3 s (M1, 5 s audio, small model) | 95th percentile measured in telemetry log (local). |
Footprint | Binary ≤ 20 MB; RAM ≤ 400 MB including model. | du -sh and Activity Monitor/smoke tests. |
Reliability | No crashes in 1-hour monkey test (500 invocations). | CI integration test + manual QA. |
Security | No outbound network sockets except auto-update domain (opt-out). | Static analysis + firewall test. |
Compatibility | macOS 13+. Intel Macs may see doubled latency but remain functional. | QA on Intel MBP (2020) & M1. |
Accessibility | Follows macOS VoiceOver / high-contrast guidelines. | Apple Accessibility Inspector score ≥ 85. |
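The latency row is judged on a 95th percentile taken from a local log. As a rough sketch of how that check could be computed, the function below uses the nearest-rank percentile definition. The names `percentile_ms` and `meets_latency_target` are assumptions for illustration, as is the choice of nearest-rank; the document does not say which percentile method is used.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns `None` for empty input or a percentile outside 1..=100.
pub fn percentile_ms(samples: &[u64], pct: u32) -> Option<u64> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: rank = ceil(pct / 100 * n), counted from 1.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// True when the P95 of the samples is within the 3 s target from the table above.
pub fn meets_latency_target(samples: &[u64]) -> bool {
    match percentile_ms(samples, 95) {
        Some(p95) => p95 <= 3_000,
        None => false,
    }
}
```

With 20 samples the nearest-rank P95 is the 19th smallest value, so a single slow outlier does not fail the target but two do.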
8. Metrics / KPIs
Metric | Target |
---|---|
Time-to-text (P95) | ≤ 3 s. |
Activation success rate | ≥ 99% (hot-key triggers & types). |
Crash-free sessions | > 99.5%. |
Daily active users (DAU) | post-launch target: 1 k. |
% of transcripts requiring manual fix | < 15% (optional feedback prompt). |
9. Milestones
Milestone | Scope |
---|---|
M0 – Prototype spike | Hot-key → record → transcribe → paste (CLI) |
M1 – MVP macOS app | Tauri shell, settings window, notarised DMG |
M2 – Public beta | Auto-update, error logs, model manager |
M3 – Windows/Linux alpha | Replace injection backend, install bundles |
M4 – v1.0 GA | Streaming (optional), website + docs |
10. Open Questions
- Should we bundle a small GGUF model or trigger a first-run download wizard?
- How to handle non-Latin languages (auto-detect vs user-select)?
- Do we sandbox the app on macOS or rely on hardened runtime?
- Which licence (MIT vs GPL) given we embed Whisper weights?
- Accept user telemetry opt-in for latency metrics?
11. Appendix – Stakeholders & Review
- Product Lead: @PM
- Engineering Lead: @TechLead
- Design: @UX
- Security: @Sec
- QA: @QA
Reviews: Architecture (Tech), Security (Sec), Accessibility (UX).
System Description
Speakr β a Local Dictation Utility (Rust + Tauri + Leptos)
A tiny, privacy-first macOS desktop app that listens for a global hot-key, records a short audio clip, transcribes it locally with Whisper, then types the text into whatever currently has focus.
Everything runs on-device; no network calls (besides the initial model download).
1. System Overview
          ┌──────────────────────────────┐
          │         Speakr (UI)          │  ← Leptos + Tauri WebView (optional window / tray)
          └───────────────┬──────────────┘
                          │ <invoke/emit>
    Global Shortcut       ▲  Settings (model path, hot-key, …)
           ▼              │
┌─────────────────────────┴──────────────────────────┐
│               speakr-core (Rust lib)               │
│                                                    │
│  1. Audio capture → **cpal**                       │
│  2. Transcription → **whisper-rs** (GGUF models)   │
│  3. Text inject   → **enigo** (synthetic keys)     │
└────────────────────────────────────────────────────┘
Global shortcut, audio, and keystroke injection all live in the backend so Speakr continues to work when the UI window is hidden.
2. Key Crates & Decisions
Concern | Crate / Tool | Why it was chosen |
---|---|---|
Hot-key | tauri-plugin-global-shortcut = "2" | Official plugin, cross-platform, Tauri ≥ 2.0 |
Audio capture | cpal = "0.15" | Mature, async-friendly, works on macOS/Win/Linux |
Speech-to-Text | whisper-rs = "0.8" | Safe Rust bindings to whisper.cpp; supports GGUF models |
Keystroke injection | enigo = "0.1" | Simple cross-platform input simulation |
UI | leptos = "0.6" + trunk | All-Rust reactive UI compiled to WASM |
Async runtime | tokio = "1" (multi-thread) | Needed for non-blocking recording & transcription |
Tip: Quantised small.en.gguf (~30 MB) loads in ≈ 2 s on Apple Silicon and is usually accurate enough for notes & code comments.
3. Workspace Layout
/speakr
├─ speakr-core # library crate (audio → text → inject)
├─ speakr-tauri # Tauri shell (`src-tauri` here)
├─ speakr-ui # Leptos front-end (optional window)
└─ models/ggml-small.en.gguf # user-downloaded Whisper model
Use a Cargo workspace so all three crates share versions and CI.
4. Bootstrapping
4.1 Prerequisites
- Rust 1.88.0+ (stable)
- Node 18+ & pnpm/yarn/npm (for Tauri/Trunk helpers)
- Xcode Command-Line Tools (macOS)
- Download a GGUF Whisper model → models/ggml-small.en.gguf
4.2 Create the workspace
cargo new --lib speakr-core
cargo tauri init --template leptos speakr-tauri # generates src-tauri + Leptos wiring
cd speakr-tauri
pnpm tauri add global-shortcut # JavaScript guest bindings
(Add a sibling speakr-ui crate only if you want the UI separate from the template.)
5. Core Library (speakr-core)
Cargo.toml
[package]
name = "speakr-core"
version = "0.1.0"
edition = "2021"
[dependencies]
cpal = "0.15"
whisper-rs = { version = "0.8", features = ["whisper-runtime-cpu"] }
enigo = "0.1"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
anyhow = "1"
use anyhow::{Context, Result};
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
use enigo::{Enigo, KeyboardControllable};
use std::sync::mpsc;
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext};

pub struct Speakr {
    whisper: WhisperContext,
    enigo: Enigo,
}

impl Speakr {
    pub fn new(model_path: &str) -> Result<Self> {
        Ok(Self {
            whisper: WhisperContext::new(model_path)?,
            enigo: Enigo::new(),
        })
    }

    pub async fn capture_and_type(&mut self, seconds: u32) -> Result<()> {
        // 1. Capture PCM samples
        // NOTE: this assumes the default input config delivers 16 kHz mono f32;
        // on most devices the stream must be resampled/down-mixed first.
        let (tx, rx) = mpsc::sync_channel(seconds as usize * 16_000);
        let host = cpal::default_host();
        let dev = host.default_input_device().context("no input device")?;
        let cfg = dev.default_input_config()?.into();
        let stream = dev.build_input_stream(
            &cfg,
            move |data: &[f32], _: &cpal::InputCallbackInfo| {
                for &s in data {
                    let _ = tx.send(s);
                }
            },
            move |e| eprintln!("cpal error: {e}"),
            None,
        )?;
        stream.play()?;
        let mut samples = Vec::with_capacity(seconds as usize * 16_000);
        for _ in 0..seconds * 16_000 {
            samples.push(rx.recv()?);
        }
        drop(stream);

        // 2. Transcribe
        // In whisper-rs 0.8 inference runs on a state created from the context,
        // and the text is read back segment by segment.
        let mut params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
        params.set_language(Some("en"));
        let mut state = self.whisper.create_state()?;
        state.full(params, &samples)?;
        let mut text = String::new();
        for i in 0..state.full_n_segments()? {
            text.push_str(&state.full_get_segment_text(i)?);
        }

        // 3. Inject
        // enigo 0.1 types a string with `key_sequence`; `text()` arrives in 0.2.
        self.enigo.key_sequence(&text);
        Ok(())
    }
}
6. Tauri Backend (speakr-tauri / src-tauri)
`src-tauri/Cargo.toml` extras
[dependencies]
speakr-core = { path = "../speakr-core" }
# Tauri ≥ 2.0 (the v1 "api-all" feature is gone; access is granted through capability files)
tauri = { version = "2", features = [] }
# Global hot-key plugin
tauri-plugin-global-shortcut = "2"
tokio = "1"
anyhow = "1"
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

use speakr_core::Speakr;
use std::sync::Mutex;
use tauri::{Manager, State};

struct AppState(Mutex<Option<Speakr>>);

#[tauri::command]
async fn transcribe(state: State<'_, AppState>) -> Result<(), String> {
    let mut guard = state.0.lock().unwrap();
    guard
        .as_mut()
        .ok_or("model not ready")?
        .capture_and_type(10) // 10 s max
        .await
        .map_err(|e| e.to_string())
}

fn main() {
    tauri::Builder::default()
        .plugin(tauri_plugin_global_shortcut::init())
        .manage(AppState(Mutex::new(None)))
        .setup(|app| {
            // Pre-load Whisper model once at startup
            let model = Speakr::new("../models/ggml-small.en.gguf")?;
            *app.state::<AppState>().0.lock().unwrap() = Some(model);

            // Register ⌘⌥Space
            #[cfg(desktop)]
            app.global_shortcut().register("CMD+OPTION+SPACE", move || {
                let handle = app.app_handle();
                tauri::async_runtime::spawn(async move {
                    let _ = handle.invoke("transcribe", &()).await;
                });
            })?;
            Ok(())
        })
        .invoke_handler(tauri::generate_handler![transcribe])
        .run(tauri::generate_context!())
        .expect("error while running Speakr");
}
Capability JSON: Add global-shortcut:allow-register to src-tauri/capabilities/default.json (see Tauri docs for full schema).
7. Leptos Front-End (optional)
The Tauri template already wires Trunk + Leptos. A minimal status UI:
use leptos::*;
use tauri_use::{use_invoke, UseTauri}; // helper hooks

#[component]
pub fn App() -> impl IntoView {
    let UseTauri { trigger: transcribe, .. } = use_invoke::<()>(&"transcribe");
    let (status, set_status) = create_signal("Idle");

    // Listen for status updates from backend
    leptos::window_event_listener("speakr-status", move |evt: String| set_status(evt));

    view! {
        <div class="p-4">
            <h1 class="text-xl font-bold">Speakr</h1>
            <p>{move || format!("Status: {status()}")}</p>
            <button class="mt-4 bg-blue-600 text-white px-3 py-1 rounded"
                    on:click=move |_| transcribe()>
                "Record & Type"
            </button>
        </div>
    }
}
tauri.conf.json should already contain:
{
"build": {
"beforeDevCommand": "trunk serve",
"beforeBuildCommand": "trunk build --release",
"devUrl": "http://localhost:1420",
"frontendDist": "../dist"
},
"app": { "withGlobalTauri": true }
}
8. macOS Permissions
- Microphone: Tauri adds NSMicrophoneUsageDescription automatically when you enable audio.
- Accessibility: Ask the user to enable Speakr under System Settings → Privacy & Security → Accessibility so Enigo keystrokes reach other apps.
- Codesign & Notarise: For distribution run:
cargo tauri build --target universal-apple-darwin # produces .app bundle
# then codesign & notarise with `xcrun notarytool`
9. Dev & Release Workflow
# hot-reload UI + backend
trunk serve & # terminal 1: WASM
cargo tauri dev # terminal 2: desktop shell
# production
trunk build --release # build UI assets
cargo tauri build # build .app or MSI/DEB
10. Performance Levers
Lever | Effect | Hint |
---|---|---|
Model size | Latency vs accuracy | tiny.en ≈ 30 MB loads fastest |
params.set_* | Threads / strategy | Call params.set_n_threads(num_cpus::get() as i32) |
Audio chunk length | Turn-around time | Push-to-talk (≤ 10 s) keeps UI snappy |
VAD (optional) | Trim silence & hallucination | Add webrtc-vad if needed |
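To make the "trim silence" lever concrete, here is a deliberately simple stand-in: an energy-threshold trimmer that drops leading and trailing frames whose RMS is below a cut-off. This is not webrtc-vad and is far cruder than a real voice-activity detector; the function name `trim_silence`, the frame length, and the threshold are all assumptions chosen for illustration.

```rust
/// Return the sub-slice of `samples` between the first and last frame whose
/// RMS level reaches `threshold`. Frames are `frame_len` samples long
/// (the final frame may be shorter). Returns an empty slice if nothing is voiced.
pub fn trim_silence(samples: &[f32], frame_len: usize, threshold: f32) -> &[f32] {
    if frame_len == 0 {
        return samples;
    }
    // A frame counts as voiced when its root-mean-square level reaches the threshold.
    let is_voiced = |frame: &[f32]| -> bool {
        let mean_sq = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        mean_sq.sqrt() >= threshold
    };
    let frames: Vec<&[f32]> = samples.chunks(frame_len).collect();
    let first = frames.iter().position(|&f| is_voiced(f));
    let last = frames.iter().rposition(|&f| is_voiced(f));
    match (first, last) {
        (Some(a), Some(b)) => {
            let start = a * frame_len;
            let end = ((b + 1) * frame_len).min(samples.len());
            &samples[start..end]
        }
        _ => &samples[0..0],
    }
}
```

At 16 kHz a `frame_len` of 160 corresponds to 10 ms frames. Shortening the buffer before inference reduces the audio Whisper has to process, which is the latency effect the table refers to.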
11. Roadmap Ideas
- Config window for model selection & hot-key change
- Streaming, real-time transcription (partial results)
- Windows/Linux support (replace Enigo backend where needed)
- Auto-punctuation & language detection
title: Technical Architecture – Speakr
version: 2025-07-20
status: Draft
Speakr – Technical Architecture
- 1. Purpose
- 2. High-Level Architecture
- 3. Crate & Directory Layout
- 4. Runtime Flow (Happy Path)
- 5. Concurrency & Safety
- 6. Security & Permissions
- 7. Build & Packaging
- 8. Extensibility Points
- 9. Risks & Mitigations
- 10. Future Roadmap
1. Purpose
Speakr is a privacy-first hot-key dictation utility for macOS (with Windows/Linux on the roadmap). When the user presses a global shortcut, it records a short audio segment, runs an on-device Whisper model, and synthesises keystrokes to type the transcript into the currently-focused application, all within a few seconds.
2. High-Level Architecture
flowchart TB
    subgraph Tauri Shell
        direction TB
        GlobalShortcut["Global Shortcut<br/><i>tauri-plugin-global-shortcut</i>"]
        IPC["IPC Bridge<br/><i>tauri invoke / emit</i>"]
        Tray["System Tray / UI<br/><i>Leptos + WASM</i>"]
    end
    subgraph Core Library
        direction TB
        Recorder["Audio Recorder<br/><i>cpal</i>"]
        STT["Speech-to-Text<br/><i>whisper-rs</i>"]
        Injector["Text Injector<br/><i>enigo</i>"]
    end
    GlobalShortcut -- "hot-key pressed" --> Recorder
    Recorder -- "PCM samples" --> STT
    STT -- "transcript" --> Injector
    Injector -- "keystrokes" --> FocusApp(["Focused Application"])
    %% UI flow
    Recorder -- "status events" --- IPC
    STT ---- IPC
    Injector --- IPC
    IPC ==> Tray
Key points:
- All heavy-weight logic lives in pure Rust (speakr-core). The UI may be hidden without affecting functionality.
- No network access: Whisper runs entirely on-device.
- Plugin isolation: Optional features (auto-start, clipboard, etc.) are added via Tauri plugins with explicit capability JSON.
3. Crate & Directory Layout
Layer | Crate / Path | Main Responsibilities |
---|---|---|
Core | speakr-core/ | Record audio (cpal) → transcribe (whisper-rs) → inject text (enigo) |
Backend | speakr-tauri/ | Registers global hot-key, exposes #[tauri::command] wrappers, persists settings |
Frontend | speakr-ui/ (optional) | Leptos WASM UI for tray, preferences, status overlay |
Assets | models/ | GGUF Whisper models downloaded post-install |
All crates live in a single Cargo workspace to guarantee compatible dependency versions.
3.1 Speakr-Tauri Internal Structure
The speakr-tauri backend is organised into focused modules for maintainability and testability:
speakr-tauri/src/
├── commands/            # Tauri command implementations
│   ├── mod.rs           # Command organisation and documentation
│   ├── validation.rs    # Input validation (hotkey format, etc.)
│   ├── system.rs        # System integration (model availability, auto-launch)
│   └── legacy.rs        # Backward compatibility commands
├── services/            # Background services and state management
│   ├── mod.rs           # Service coordination
│   ├── hotkey.rs        # Global hotkey registration and management
│   ├── status.rs        # Backend service status tracking
│   └── types.rs         # Shared service types and enums
├── settings/            # Configuration persistence and validation
│   ├── mod.rs           # Settings management
│   ├── persistence.rs   # File I/O for settings
│   ├── migration.rs     # Settings schema migration
│   └── validation.rs    # Settings validation logic
├── debug/               # Debug-only functionality
│   ├── mod.rs           # Debug command coordination
│   ├── commands.rs      # Debug-specific Tauri commands
│   ├── storage.rs       # Debug log storage
│   └── types.rs         # Debug-specific types
├── audio/               # Audio handling utilities
│   ├── mod.rs           # Audio module coordination
│   ├── files.rs         # Audio file operations
│   └── recording.rs     # Audio recording helpers
└── lib.rs               # Tauri app setup, command registration
Key architectural principles:
- Separation of concerns: Business logic in *_internal() functions, Tauri integration in lib.rs
- Testability: Internal functions can be tested without Tauri runtime overhead
- Modularity: Commands grouped by functional domain rather than technical implementation
- Documentation: Each module has comprehensive rustdoc explaining its purpose and usage
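As an illustration of the `*_internal()` convention, the sketch below shows a hot-key format check written as a pure function that a `#[tauri::command]` wrapper could delegate to. The function name `validate_hotkey_internal`, the modifier list, and the accepted grammar (one or more modifiers joined by `+`, followed by exactly one non-modifier key) are assumptions made for this example; the real rules in `commands/validation.rs` may differ.

```rust
/// Modifier names accepted by this sketch (an assumed list, compared case-insensitively).
const MODIFIERS: [&str; 5] = ["CMD", "CTRL", "ALT", "OPTION", "SHIFT"];

/// Validate a hot-key string such as "CMD+OPTION+SPACE".
/// Pure logic with no Tauri types, so it can be unit-tested directly.
pub fn validate_hotkey_internal(hotkey: &str) -> Result<(), String> {
    let parts: Vec<String> = hotkey
        .split('+')
        .map(|p| p.trim().to_ascii_uppercase())
        .collect();
    if parts.iter().any(|p| p.is_empty()) {
        return Err(format!("empty segment in hot-key '{}'", hotkey));
    }
    let (key, modifiers) = match parts.split_last() {
        Some(pair) => pair,
        None => return Err("hot-key is empty".to_string()),
    };
    if modifiers.is_empty() {
        return Err("hot-key needs at least one modifier".to_string());
    }
    if MODIFIERS.contains(&key.as_str()) {
        return Err("hot-key must end with a non-modifier key".to_string());
    }
    for m in modifiers {
        if !MODIFIERS.contains(&m.as_str()) {
            return Err(format!("unknown modifier '{}'", m));
        }
    }
    Ok(())
}
```

The Tauri-facing command would then only forward the string and convert the error for the frontend, which keeps tests free of runtime setup.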
4. Runtime Flow (Happy Path)
Step | Thread/Task | Action | Typical Latency |
---|---|---|---|
1 | Main (OS) | User presses ⌘⌥Space | n/a |
2 | Tauri shortcut handler | Spawns async task transcribe() | < 1 ms |
3 | Tokio worker | cpal::Stream captures 16-kHz mono PCM into ring-buffer | 0–10 s (configurable) |
4 | Same task | PCM fed into whisper_rs::full() | ~1 s per 10 s audio on M-series |
5 | Same task | Transcript returned → enigo synthesises keystrokes | ≤ 300 ms |
6 | UI task | Frontend receives status events via emit() and updates overlay | realtime |
Failure cases (no mic, model missing, permission denied) surface via error events and native notifications.
5. Concurrency & Safety
- Tokio multi-thread runtime drives asynchronous recording and Whisper inference.
- The AppState(Mutex<Option<Speakr>>) guards the singleton Whisper context; loading occurs once at app start.
- Hot-key handler offloads work to the runtime to keep the UI thread non-blocking.
- Audio buffer uses a bounded sync_channel to avoid unbounded RAM growth.
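The bounded-channel point can be demonstrated with the standard library alone. The sketch below is an assumption-laden illustration rather than the project's recorder: `push_samples` is a hypothetical helper, and it uses `try_send` so that a full buffer drops samples instead of blocking the producer, which is one possible policy for a real-time audio callback.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Push `input` through a channel holding at most `capacity` samples.
/// Returns the samples that were buffered and the number dropped because
/// the buffer was full. Memory use is capped by `capacity`.
pub fn push_samples(capacity: usize, input: &[f32]) -> (Vec<f32>, usize) {
    let (tx, rx) = sync_channel::<f32>(capacity);
    let mut dropped = 0;
    for &s in input {
        match tx.try_send(s) {
            Ok(()) => {}
            // Buffer full: drop the sample rather than block the producer.
            Err(TrySendError::Full(_)) => dropped += 1,
            // Receiver gone: stop producing.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    // Close the sending side so draining the receiver terminates.
    drop(tx);
    (rx.iter().collect(), dropped)
}
```

With a capacity of `seconds * 16_000`, as in the core library, the buffer can never hold more than the configured maximum recording.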
6. Security & Permissions
Platform | Permission | Why | Request Mechanism |
---|---|---|---|
macOS | Microphone access | Record audio | NSMicrophoneUsageDescription (Info.plist) |
macOS | Accessibility | Send synthetic keystrokes | User enables app in System Settings ▸ Accessibility |
All | Global shortcut | Register hot-key | global-shortcut:allow-register capability |
The app runs offline; no data leaves the device.
7. Build & Packaging
- Dev: trunk serve & (frontend) + cargo tauri dev (backend)
- Release: trunk build --release → cargo tauri build
- macOS notarisation: xcrun notarytool submit --wait after codesign.
- Universal binary size ≈ 15 MB (+ model).
8. Extensibility Points
- Voice Activity Detection: plug-in webrtc-vad before Whisper to auto-stop on silence.
- Streaming transcripts: call whisper_rs::full_partial() and enqueue keystrokes incrementally.
- Multi-language: set params.set_language(None) for auto-detect.
- Cross-platform: replace enigo backend with send_input (Win) or xdo (X11) while keeping public API.
9. Risks & Mitigations
Risk | Mitigation |
---|---|
Keystroke injection blocked in secure fields | Fallback to clipboard-paste mode with warning |
Whisper latency on older CPUs | Offer tiny.en.gguf and shorter max record time |
Shortcut clashes | UI lets user redefine hot-key and validates uniqueness |
Model file missing/corrupt | Verify checksum on load and show error dialogue |
10. Future Roadmap
- Settings sync via tauri-plugin-store (JSON in AppData).
- Auto-start on login (tauri-plugin-autostart).
- GPU inference when Whisper Metal backend stabilises.
- Installer bundles (DMG/MSI/DEB) with model downloader.
This document replaces the previous placeholder docs/ARCHITECTURE.md and should be kept up-to-date with all architectural changes.
Development Overview
Pre-commit Setup and Optimization
"Quality is not an act, it's a habit." – Aristotle
This document describes Speakr's pre-commit hook configuration, optimization strategies, and future improvement opportunities.
Table of Contents
- Overview
- Current Setup
- Optimizations
- Usage Guide
- Performance Metrics
- Future Improvements
- Troubleshooting
Overview
Pre-commit hooks ensure code quality by running automated checks before each commit. This prevents broken code from entering the repository and maintains consistent coding standards across the team.
Why Pre-commit?
- Early Detection: Catch issues before they reach CI/CD
- Consistent Quality: Enforce formatting and linting standards
- Fast Feedback: Immediate results during development
- Team Alignment: Same standards for all contributors
Current Setup
Our optimized pre-commit configuration targets affected packages only, reducing execution time by ~70% for typical changes.
Configuration Files
- .pre-commit-config.yaml: Main configuration
- scripts/selective-tests.sh: Advanced selective testing script
Hook Categories
1. Package-Specific Rust Hooks
speakr-core (triggered by ^speakr-core/.*\.rs$):
- cargo-fmt-core: Code formatting check
- cargo-clippy-core: Linting with all warnings as errors
- cargo-test-core: Unit and integration tests
speakr-tauri (triggered by ^speakr-tauri/.*\.rs$):
- cargo-fmt-tauri: Code formatting check
- cargo-clippy-tauri: Linting with all warnings as errors
- cargo-test-tauri: Unit and integration tests
speakr-ui (triggered by ^speakr-ui/.*\.rs$):
- cargo-fmt-ui: Code formatting check
- cargo-clippy-ui: Linting with all warnings as errors
- cargo-test-ui: Unit and integration tests
2. Workspace-Level Hooks
Workspace Changes (triggered by ^(Cargo\.(toml|lock)|\.cargo/.*)$):
- cargo-fmt-workspace: Format all packages
- cargo-clippy-workspace: Lint entire workspace
3. Smart Integration Hooks
Dependency Awareness:
- cargo-test-integration: When speakr-core changes, also test speakr-tauri (dependency relationship)
4. General Quality Hooks
- Trailing whitespace: Remove unnecessary whitespace
- YAML/JSON/TOML validation: Syntax checking
- Large file detection: Prevent accidental commits of large files
- Merge conflict detection: Catch unresolved conflicts
- Markdown linting: Documentation quality
Optimizations
Selective Package Testing
Problem: Previous setup ran all checks on all packages for any Rust file change.
Solution: File pattern matching to target only affected packages.
# Before: Always runs on ANY .rs file
files: \.rs$
entry: cargo test --all
# After: Only runs on speakr-core files
files: ^speakr-core/.*\.rs$
entry: cargo test --package speakr-core
Dependency-Aware Testing
Problem: Changes to speakr-core could break speakr-tauri without running its tests.
Solution: Smart integration testing when dependencies change.
# Integration test: core changes affect tauri
- id: cargo-test-integration
name: Cargo Test (integration - core affects tauri)
entry: cargo test --package speakr-tauri
files: ^speakr-core/.*\.rs$ # Triggered by core changes
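The selection rule behind these hooks (match changed paths to packages, then add dependants) can be written out as a small function. This is a sketch of the idea, not the contents of scripts/selective-tests.sh: the name `affected_packages` is hypothetical, the dependency edge core → tauri is taken from the text above, and treating workspace-level files as affecting every package is a simplifying assumption.

```rust
use std::collections::BTreeSet;

/// Given changed file paths (relative to the repository root), return the
/// packages whose tests should run.
pub fn affected_packages(changed: &[&str]) -> BTreeSet<&'static str> {
    const PACKAGES: [&str; 3] = ["speakr-core", "speakr-tauri", "speakr-ui"];
    let mut out = BTreeSet::new();
    for path in changed {
        // Workspace-level files: assume every package is affected.
        if *path == "Cargo.toml" || *path == "Cargo.lock" || path.starts_with(".cargo/") {
            out.extend(PACKAGES);
            continue;
        }
        // Package hooks only fire on Rust sources.
        if !path.ends_with(".rs") {
            continue;
        }
        for pkg in PACKAGES {
            // Mirrors the `^<package>/.*\.rs$` file patterns.
            let inside = match path.strip_prefix(pkg) {
                Some(rest) => rest.starts_with('/'),
                None => false,
            };
            if inside {
                out.insert(pkg);
            }
        }
    }
    // Dependency awareness: speakr-tauri depends on speakr-core.
    if out.contains("speakr-core") {
        out.insert("speakr-tauri");
    }
    out
}
```

A change under speakr-ui selects only that package, while a change under speakr-core selects both the core and the Tauri backend.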
Performance Optimizations
- Parallel Execution: Each package's hooks can run in parallel
- Targeted Scoping: Only affected code gets checked
- Smart Caching: Cargo's incremental compilation benefits
- Early Exit: Hooks fail fast on first error
Usage Guide
Installation
# Install pre-commit (if not already installed)
pip install pre-commit
# Install hooks in repository
pre-commit install
# Optional: Install for push events too
pre-commit install -t pre-push
Daily Workflow
Automatic (Recommended):
git add .
git commit -m "feat: add new feature"
# Hooks run automatically, commit proceeds if all pass
Manual Testing:
# Run all hooks on all files
pre-commit run --all-files
# Run specific hook
pre-commit run cargo-fmt-core
# Run on specific files
pre-commit run --files speakr-core/src/lib.rs
Advanced Selective Testing
For maximum control, use our custom script:
# Test only packages affected by changes since last commit
./scripts/selective-tests.sh
# Compare against specific commit/branch
./scripts/selective-tests.sh main
./scripts/selective-tests.sh abc123def
# Get help
./scripts/selective-tests.sh --help
Bypassing Hooks (Emergency Only)
# Skip all hooks (use sparingly!)
git commit -m "hotfix: urgent fix" --no-verify
# Skip specific hook
SKIP=cargo-test-core git commit -m "fix: skip tests temporarily"
Performance Metrics
Before Optimization
- Total packages checked: 3/3 (100%)
- Average execution time: ~45 seconds
- Parallel efficiency: Low (redundant work)
After Optimization
- Typical single-package change: 1/3 packages (33%)
- Average execution time: ~15 seconds (about 67% faster than ~45 seconds)
- Parallel efficiency: High (targeted work)
- Smart dependencies: Core changes → Core + Tauri tests
Real-world Example
Scenario: Modify speakr-ui/src/app.rs
Before: Tests all 3 packages (~45s)
After: Tests only the speakr-ui package (~12s)
Speedup: 3.75x faster
Future Improvements
Performance Enhancements
1. Incremental Testing with Coverage
Goal: Only run tests affected by specific code changes, not entire packages.
Implementation:
# Future: Ultra-granular testing
cargo test --package speakr-core -- --test-affected-by src/audio.rs
Tools to explore:
- cargo-difftests: Selective re-testing framework
- LLVM coverage analysis for affected test discovery
- determinator: Facebook's affected package detection
2. Caching and Memoization
Goal: Skip checks if code hasn't changed since last successful run.
Implementation:
# Cache test results based on content hash
- id: cargo-test-cached
entry: cache-wrapper cargo test --package speakr-core
cache_key: "hash:speakr-core/**/*.rs"
Benefits:
- Near-instant results for unchanged code
- Perfect for repeated CI runs on same commit
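One way such a cache key could be derived is by hashing every file's path and contents in a stable order. The sketch below is illustrative only: `cache_key` is a hypothetical helper, and it uses the standard library's `DefaultHasher`, whose algorithm is not guaranteed to stay the same between Rust releases, so a persistent cache would want a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a cache key from (path, contents) pairs.
/// Files are sorted by path first so the key does not depend on input order.
pub fn cache_key(files: &[(&str, &[u8])]) -> u64 {
    let mut sorted = files.to_vec();
    sorted.sort_by_key(|f| f.0);
    let mut hasher = DefaultHasher::new();
    for (path, bytes) in sorted {
        path.hash(&mut hasher);
        bytes.hash(&mut hasher);
    }
    hasher.finish()
}
```

A wrapper would compare the key against the one stored after the last successful run and skip the hook when they match.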
3. Parallel Package Testing
Goal: Run different package tests truly in parallel.
Current: Sequential package testing
Future: Matrix-style parallel execution
# Run in parallel using job control
cargo test --package speakr-core &
cargo test --package speakr-tauri &
cargo test --package speakr-ui &
wait # Wait for all to complete
Enhanced Feedback
1. Rich Diff Display
Goal: Show exactly what code caused failures.
Implementation:
# Future: Rich failure reporting
cargo clippy --message-format=json | jq -r 'select(.reason == "compiler-message") | .message.spans[] | "\(.file_name):\(.line_start)"'
Features:
- Syntax-highlighted diffs
- Click-to-fix suggestions
- Context-aware error messages
2. Performance Profiling
Goal: Track and optimize hook execution time.
Metrics to collect:
- Per-hook execution time
- Cache hit/miss ratios
- Package-level timing breakdown
- Historical performance trends
3. Smart Notifications
Goal: Contextual feedback based on change type.
Examples:
# API changes detected
Public API modified in speakr-core - consider semver impact
# Performance impact detected
Tests are 20% slower - check for performance regressions
# Security sensitive changes
Cryptographic code modified - extra security review recommended
Test Quality Improvements
1. Mutation Testing Integration
Goal: Ensure tests actually catch bugs.
Implementation:
# Run mutation tests on changed code
git diff HEAD~1..HEAD > changes.diff # --in-diff expects a diff file
cargo mutants --package speakr-core --in-diff changes.diff
2. Dependency Impact Analysis
Goal: Understand full impact of changes across the dependency graph.
Visualization:
speakr-core change impact:
├── speakr-core (direct): tested
├── speakr-tauri (depends on core): tested
└── speakr-ui (independent): skipped
3. Flaky Test Detection
Goal: Identify and fix unreliable tests.
Implementation:
- Run tests multiple times in CI
- Track test success/failure rates
- Auto-quarantine flaky tests
- Generate flakiness reports
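As a rough illustration of the tracking step, a test can be classified from its pass/fail history across repeated runs. The `Stability` enum and both function names are hypothetical; the quarantine threshold a project would apply on top of `failure_rate` is left open.

```rust
/// Outcome classification for a test across repeated runs.
#[derive(Debug, PartialEq)]
pub enum Stability {
    Stable,
    Flaky,
    Failing,
}

/// Classifies a test from its history (`true` = passed).
/// An empty history is treated as `Stable`.
pub fn classify(history: &[bool]) -> Stability {
    let passes = history.iter().filter(|passed| **passed).count();
    if passes == history.len() {
        Stability::Stable
    } else if passes == 0 {
        Stability::Failing
    } else {
        Stability::Flaky
    }
}

/// Fraction of runs that failed, in the range 0.0..=1.0.
pub fn failure_rate(history: &[bool]) -> f64 {
    if history.is_empty() {
        return 0.0;
    }
    let failures = history.iter().filter(|passed| !**passed).count();
    failures as f64 / history.len() as f64
}
```

Only tests classified as `Flaky` are candidates for quarantine; consistently failing tests point to a real regression instead.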
🔧 Developer Experience
1. IDE Integration
Goal: Show pre-commit status in development environment.
Features:
- Real-time hook status in VS Code/Cursor
- Inline error highlighting
- One-click fix suggestions
2. Hook Customization
Goal: Allow per-developer customization.
Implementation:
# .pre-commit-config.local.yaml (git-ignored)
hooks:
- id: cargo-clippy-core
args: ["--", "-A", "clippy::pedantic"] # Less strict for local dev
3. Quick Fix Tools
Goal: Automated fixing of common issues.
Examples:
# Auto-fix formatting
pre-commit run cargo-fmt-core --hook-stage manual
# Auto-fix common clippy warnings
cargo clippy --fix --allow-dirty
# Auto-update dependencies
cargo update && pre-commit run cargo-test-all
Troubleshooting
Common Issues
Hook Fails with "Package not found"
Cause: Package name mismatch in hook configuration.
Solution: Verify that package names match the Cargo.toml files:
cargo metadata --format-version 1 | jq '.packages[].name'
Tests Pass Locally but Fail in CI
Cause: Different dependency versions or environment.
Solution: Commit Cargo.lock and pin a consistent Rust version:
# CI configuration
rust-toolchain: "1.88.0" # Pin exact version
Hooks Run on Wrong Files
Cause: Incorrect regex patterns in the files: configuration.
Solution: Test patterns with realistic file paths:
# Test regex pattern
echo "speakr-core/src/lib.rs" | grep -E "^speakr-core/.*\.rs$"
Performance Issues
Slow Hook Execution
- Check package scoping: Ensure hooks target specific packages
- Review test suite: Look for slow integration tests
- Enable caching: Reuse Cargo build artifacts between runs, for example by setting CARGO_TARGET_DIR to a persistent directory
Memory Issues
- Limit parallel jobs: Set CARGO_BUILD_JOBS=2
- Increase memory limits: Configure system swap
- Use release mode for tests: cargo test --release (if appropriate)
Getting Help
- Check configuration: Validate with pre-commit validate-config
- Debug mode: Run with pre-commit run --verbose
- Clean cache: Use pre-commit clean to reset
- Manual testing: Test individual hooks in isolation
References
- Pre-commit Documentation
- Cargo Book - Workspaces
- Rust RFC - Cargo Selective Testing
- Speakr Development Guide
Implementation Plan - Speakr
A step-by-step roadmap to deliver the Speakr application using the test-driven, multi-crate approach defined in the specification set under docs/specs/.
1. Repository Scaffold
Reference: INIT-01 Project Scaffold
- Execute the migration steps to create the Cargo workspace (speakr-core, speakr-tauri, optional speakr-ui).
- Commit and open a draft PR; CI should fail until tests are added.
- Add baseline CI workflows (lint, build, placeholder tests) that currently fail.
2. Core Library (speakr-core)
Order | Spec | Task |
---|---|---|
2.1 | FR-2 | Implement audio capture (cpal). Begin with failing unit test asserting 16-kHz mono stream & duration cap. |
2.2 | FR-3 | Implement transcription (whisper-rs). Add latency test harness. |
2.3 | FR-4 | Implement text injection (enigo). Integration tests across editors via mock window focus. |
2.4 | FR-5 | Implement clipboard fallback; write secure-field simulation tests. |
2.5 | FR-7 | Emit status events; test channel delivery & ordering. |
Merge each sub-task when its tests pass and CI is green.
3. Tauri Backend (speakr-tauri)
Order | Spec | Task |
---|---|---|
3.1 | FR-1 | Register global hot-key via tauri-plugin-global-shortcut; write E2E test with headless Tauri window. |
3.2 | - | Wire hot-key → async call into speakr-core pipeline; ensure status events are forwarded via emit. |
3.3 | FR-8 | Add settings persistence (JSON). Unit tests for load/save & corruption recovery. |
4. Front-End (Leptos)
Order | Spec | Task |
---|---|---|
4.1 | FR-6 | Build Settings & Status overlay UI; write component tests with Leptos testing utilities. |
4.2 | NFR-accessibility | Add automated axe-core & VoiceOver tests. |
5. Cross-Cutting Non-Functional Work
Spec | Focus |
---|---|
NFR-latency | Optimise model loading & thread usage; ensure performance tests pass. |
NFR-footprint | Strip symbols, enable lto, audit memory. |
NFR-reliability | Add monkey-test CI job (500 invocations). |
NFR-security | Socket-mock tests, Hardened Runtime flags, notarisation script. |
NFR-compatibility | Add Intel macOS runner to CI. |
6. Auto-Update
Reference: FR-9 Auto-update
- Integrate update check using tauri-plugin-updater (or custom).
- Write integration tests mocking GitHub Releases API & download validation.
7. Documentation & Release
- Update docs/book/ with usage & contribution guide.
- Ensure mdbook build passes in CI.
- Produce signed DMG via CI; attach to GitHub Release.
Progress Checklist
- Preparation complete
  - Status: Preparation tasks completed.
- Repository scaffold merged (INIT-01)
  - Status: Repository scaffold implemented (4 crates: speakr-core (backend processing), speakr-tauri (Tauri backend), speakr-ui (Leptos front-end) and speakr-types (shared types)).
- 2.1 Audio capture (FR-2) implemented & tested - Status: Audio capture tested via debug UI, verified WAV file is written to disk and contains the expected audio.
- 2.2 Transcription (FR-3) implemented & tested - Status: Not started
- 2.3 Text injection (FR-4) implemented & tested - Status: Not started
- 2.4 Injection fallback (FR-5) implemented & tested - Status: Not started
- 2.5 Status events (FR-7) implemented & tested - Status: Not started
- 3.1 Global hot-key (FR-1) registered & tested - Status: Not started
- 3.2 Backend pipeline wired - Status: Not started
- 3.3 Settings persistence (FR-8) implemented & tested - Status:
- [~] 4.1 Settings UI (FR-6) implemented & tested - Status: Preparation tasks completed.
- 4.2 Accessibility audits (NFR-accessibility) passing - Status: Preparation tasks completed.
- Non-functional targets (Latency, Footprint, Reliability, Security, Compatibility) met
- Auto-update (FR-9) implemented & tested
- Docs & Release pipeline finished
Tick each box as the corresponding PR merges with passing CI.
Recent Progress (2025-07-20)
- Scaffolded speakr-core library crate and added it to the workspace manifest.
- Added stub implementation (record_to_vec) and constants in speakr-core::audio.
- Committed failing unit test audio_capture.rs verifying 16 kHz mono stream and placeholders.
- Workspace compiles; test fails as expected, ready for implementation phase.
Debug Panel Documentation
The Speakr debug panel is a development-only interface that provides debugging tools and testing capabilities. It's designed to help developers test features, monitor system behaviour, and troubleshoot issues during development.
Overview
The debug panel is only available in debug builds (cargo tauri dev) and is completely excluded from release builds for security and performance reasons. It provides a debugging interface with real-time logging, feature testing, and system monitoring capabilities.
Accessing the Debug Panel
Availability
- Debug builds only: The panel is conditionally compiled using #[cfg(debug_assertions)]
- Toggle button: A red "🛠️ Debug" button appears in the header (debug builds only)
- Visual indicator: The panel shows a "DEBUG BUILD" badge to remind developers of the build type
Navigation
- Start the application in debug mode: cargo tauri dev
- Look for the "🛠️ Debug" button in the top-right corner of the header
- Click to toggle between the settings panel and debug panel
- The button text changes to "🛠️ Hide Debug" when the panel is active
Features
1. Audio Testing
Legacy Test Button
- Purpose: Basic audio system testing
- Behaviour: Click to run a simple audio recording test
- Feedback: Shows progress in the debug output area
Push-to-Talk Recording
- Purpose: Test real-time audio recording with push-to-talk interaction
- Behaviour:
  - Hold the button to start recording
  - Release to stop recording
  - Supports both mouse and touch events
- Visual feedback:
  - Button changes colour and shows pulsing animation when recording
  - Text updates to show current state
  - Recording state is displayed in system info
2. Logging Console
Real-time Log Display
- Scrolling console: Shows recent log messages from the backend
- Auto-scroll: Automatically scrolls to show newest messages (toggleable)
- Timestamp: Each message includes precise timestamp
- Source tracking: Shows which component generated each log message
Log Level Filtering
- Dropdown filter: Filter by specific log levels (TRACE, DEBUG, INFO, WARN, ERROR)
- Visual indicators: Each level has distinct emoji icons and colours
- Level-specific styling: Error and warning messages have highlighted backgrounds
Console Controls
- Refresh: Manually refresh log messages from backend
- Clear: Clear all log messages from display and backend storage
- Auto-scroll toggle: Enable/disable automatic scrolling to newest messages
3. System Information
Real-time display of:
- Build type: Always shows "Debug" in debug panel
- Environment: Shows "Development"
- Recording state: Live status of audio recording (Active/Inactive)
Technical Implementation
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │   Log Storage   │
│    (Leptos)     │    │     (Tauri)      │    │    (Memory)     │
├─────────────────┤    ├──────────────────┤    ├─────────────────┤
│ DebugPanel      │◄──►│ debug_* commands │◄──►│ DEBUG_LOG_      │
│ LoggingConsole  │    │ add_debug_log()  │    │ MESSAGES        │
│ Push-to-talk UI │    │ Log collection   │    │ (VecDeque)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
Conditional Compilation
The debug panel uses Rust's conditional compilation to ensure it's only included in debug builds:
#[cfg(debug_assertions)]
mod debug;

#[cfg(debug_assertions)]
use crate::debug::DebugPanel;
Backend Commands
All debug commands are prefixed with debug_ and conditionally compiled:
- debug_test_audio_recording() - Legacy audio test
- debug_start_recording() - Start push-to-talk recording
- debug_stop_recording() - Stop push-to-talk recording
- debug_get_log_messages() - Retrieve stored log messages
- debug_clear_log_messages() - Clear log message storage
Log Message Storage
Debug logs are stored in memory using a thread-safe circular buffer:
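use std::collections::VecDeque;
use std::sync::{Arc, LazyLock, Mutex};

// Shared in-memory log store, pre-allocated for 1000 messages
static DEBUG_LOG_MESSAGES: LazyLock<Arc<Mutex<VecDeque<DebugLogMessage>>>> =
    LazyLock::new(|| Arc::new(Mutex::new(VecDeque::with_capacity(1000))));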
Key characteristics:
- Capacity: Limited to 1000 messages (prevents memory bloat)
- Thread-safe: Uses Arc<Mutex<>> for concurrent access
- Circular buffer: Automatically removes old messages when capacity is reached
- Structured data: Each message includes timestamp, level, target, and content
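Note that `VecDeque::with_capacity` only pre-allocates storage and does not cap the length, so the circular behaviour has to be enforced when pushing. A minimal sketch of that eviction step, assuming a hypothetical `push_bounded` helper (the actual code in speakr-tauri may be structured differently):

```rust
use std::collections::VecDeque;

/// Maximum number of retained messages, matching the capacity described above.
pub const MAX_LOG_MESSAGES: usize = 1000;

/// Appends `message`, evicting the oldest entries so that at most `max`
/// messages remain. With `max == 0` nothing is stored.
pub fn push_bounded<T>(buffer: &mut VecDeque<T>, message: T, max: usize) {
    if max == 0 {
        return;
    }
    while buffer.len() >= max {
        buffer.pop_front();
    }
    buffer.push_back(message);
}
```

In the debug backend this would be called while holding the mutex guard, for example `push_bounded(&mut guard, entry, MAX_LOG_MESSAGES)`.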
Event Handling
Push-to-talk functionality uses multiple event handlers for robust interaction:
on:mousedown=move |_| start_recording()
on:mouseup=move |_| stop_recording()
on:mouseleave=move |_| stop_recording() // Handles mouse leaving button area
on:touchstart=move |_| start_recording()
on:touchend=move |_| stop_recording()
Development Patterns
Adding New Debug Features
- Backend Command:
#[cfg(debug_assertions)]
#[tauri::command]
async fn debug_your_feature() -> Result<String, AppError> {
    add_debug_log(DebugLogLevel::Info, "your-component", "Feature tested");
    // Your implementation
    Ok("Success message".to_string())
}
- Frontend Integration:
impl DebugManager {
    pub async fn test_your_feature() -> Result<String, String> {
        tauri_invoke_no_args("debug_your_feature")
            .await
            .map_err(|e| format!("Failed to test feature: {e}"))
    }
}
- UI Component:
<button
    class="debug-btn-primary"
    on:click=move |_| test_your_feature()
>
    "Test Your Feature"
</button>
- Register Command:
// Add to debug build handler list
debug_your_feature,
Logging Best Practices
- Use appropriate log levels:
  - Trace: Detailed execution flow
  - Debug: Development information
  - Info: General information
  - Warn: Potential issues
  - Error: Actual errors
- Include context:
add_debug_log(
    DebugLogLevel::Info,
    "component-name",
    &format!("Action completed with result: {}", result)
);
- Target naming:
  - Use consistent component names
  - Follow pattern: speakr-{component} (e.g., speakr-core, speakr-tauri)
Testing Debug Features
Debug features should be tested like any other code:
#[test]
fn test_debug_manager_methods_exist() {
    // Compile-time test for method signatures
    let _fn: fn() -> _ = DebugManager::test_your_feature;
    assert!(true, "Debug method exists and compiles");
}
Security Considerations
Build-time Exclusion
- Debug panel code is completely removed from release builds
- No performance impact on production builds
- No security surface area in release builds
Development-only Data
- Log messages are stored only in memory
- No persistent storage of debug information
- Automatic cleanup when application closes
Safe Defaults
- Mock implementations prevent accidental system access
- All debug commands return safe, predictable responses
- Clear visual indicators remind developers of debug mode
Troubleshooting
Debug Panel Not Visible
- Check build type: Ensure you're running cargo tauri dev, not a release build
- Look for button: The toggle button appears in the header, not as a separate window
- Browser cache: If using trunk serve, clear browser cache and reload
Log Messages Not Appearing
- Click refresh: Use the "🔄 Refresh" button to manually fetch logs
- Check backend: Ensure debug commands are registered in the invoke handler
- Memory limit: Log storage is limited to 1000 messages; older messages are automatically removed
Push-to-Talk Not Working
- Hold, don't click: The button requires holding down, not just clicking
- Check events: Ensure mouse/touch events are properly handled
- Visual feedback: Look for button colour change and pulsing animation during recording
Future Enhancements
Potential additions to the debug panel:
- Performance Monitoring: CPU, memory usage graphs
- Network Activity: Mock API call testing
- State Inspection: Real-time application state viewer
- Configuration Testing: Dynamic settings modification
- Export Functionality: Save debug logs to file
- Remote Debugging: WebSocket connection for external debugging tools
Related Files
- Frontend: speakr-ui/src/debug.rs - Main debug panel implementation
- Backend: speakr-tauri/src/lib.rs - Debug commands and log storage
- Styles: speakr-ui/styles.css - Debug panel CSS styles
- Types: Log message types and enums
- Tests: Unit tests for debug functionality
Contributing
When adding debug features:
- Follow the established patterns for conditional compilation
- Add appropriate logging with meaningful messages
- Include tests for new functionality
- Update this documentation with new features
- Ensure features work in both desktop and mobile layouts
The debug panel is a powerful development tool that should enhance the development experience while maintaining security and performance in production builds.
Tauri Plugins
The following plugins are of interest for this project:
- https://github.com/freethinkel/tauri-nspopover-plugin
- https://v2.tauri.app/plugin/global-shortcut/
- https://v2.tauri.app/plugin/autostart/
- https://v2.tauri.app/plugin/single-instance/
- https://github.com/ayangweb/tauri-plugin-macos-permissions
- https://github.com/ahkohd/tauri-macos-menubar-app-example/tree/v2-popover
Specifications
This directory contains all functional requirements (FR), non-functional requirements (NFR), and initialisation specifications (INIT) for the Speakr project.
Functional Requirements (FR)
ID | Name | Report |
---|---|---|
FR-1 | Global Hot-key | Implementation Summary |
FR-2 | Audio Capture | Implementation Summary |
FR-3 | Transcription | |
FR-4 | Transcript Injection | |
FR-5 | Injection Fallback | |
FR-6 | Settings UI | Implementation Summary |
FR-7 | Status Events | |
FR-8 | Settings Persistence | |
FR-9 | Auto-update |
Non-Functional Requirements (NFR)
ID | Name | Report |
---|---|---|
NFR-accessibility | Accessibility | |
NFR-compatibility | Compatibility | |
NFR-footprint | Footprint | |
NFR-latency | Latency | |
NFR-reliability | Reliability | |
NFR-security | Security |
Initialisation Specifications (INIT)
ID | Name | Report |
---|---|---|
INIT-01 | Project Scaffold & Initial Structure |
note
Implementation Reports contain detailed analysis of completed features, including technical decisions, challenges encountered, and verification steps. See reports/ for additional documentation.
FR-1: Global Hot-key
Registers a system-wide hot-key at application start that toggles the record → transcribe → inject flow.
Requirement
- The application must register a global hot-key (default ⌥ Option + ~).
- Must be active even when Speakr is running in the background.
- Pressing the hot-key initiates, in order:
  - Audio recording
  - Transcription
  - Text injection into the currently focused field.
- The hot-key must be configurable in Settings and warn on conflicts.
Rationale
A single keyboard shortcut lets users capture ideas without context-switching, maintaining focus and flow.
Acceptance Criteria
- Hot-key can be triggered from any application on macOS 13+.
- 95th percentile time-to-text ≤ 3 s for 5 s recordings on M-series Macs.
- 99 % activation success rate in telemetry.
- Changing the hot-key in Settings updates the registration immediately and prevents duplicates.
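The 95th-percentile criterion above can be checked from collected timings with the nearest-rank method. This is an illustrative sketch only: `percentile` is a hypothetical helper, and the source does not say how Speakr's telemetry aggregates its samples.

```rust
/// Returns the p-th percentile (0 < p <= 100) of `samples_ms` using the
/// nearest-rank method, or `None` for an empty set or an out-of-range `p`.
pub fn percentile(samples_ms: &[u64], p: f64) -> Option<u64> {
    if samples_ms.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest rank covering at least p percent of samples.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

A latency test could then assert that `percentile(&timings, 95.0)` does not exceed 3000 ms.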
Test-Driven Design
Follow TDD: write failing automated tests for every case in Test Cases (formerly Acceptance Criteria) before implementation. CI should pass only when the new tests turn green.
References
PRD §6 Functional Requirements - FR-1
date: 2025-07-23
requirement: FR-1-global-hotkey
status: PARTIALLY COMPLETE
prepared_by: o3
Implementation Report: FR-1 - Global Hot-key
Implementation Summary
The backend (speakr-tauri) integrates tauri-plugin-global-shortcut to register a system-wide shortcut at start-up. A default combination (CmdOrCtrl+Alt+Space) is attempted first; if registration fails (for example due to a conflict) a fallback (CmdOrCtrl+Alt+F2) is tried. The registration logic is implemented in GlobalHotkeyService (speakr-tauri/src/services/hotkey.rs) and invoked from speakr-tauri/src/lib.rs inside the setup callback. The service stores the active shortcut behind a mutex and emits a hotkey-triggered Tauri event each time the key is pressed.
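The default-then-fallback behaviour described above can be sketched independently of Tauri. Here `register` is a stand-in closure for the real plugin call, and `register_with_fallback` is a hypothetical name; the actual GlobalHotkeyService may be organised differently.

```rust
/// Tries each candidate shortcut in order and returns the first one that
/// `register` accepts. If every candidate fails, returns all failure reasons.
pub fn register_with_fallback<'a, F>(
    candidates: &[&'a str],
    mut register: F,
) -> Result<&'a str, Vec<String>>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut errors = Vec::new();
    for &shortcut in candidates {
        match register(shortcut) {
            Ok(()) => return Ok(shortcut),
            Err(reason) => errors.push(format!("{shortcut}: {reason}")),
        }
    }
    Err(errors)
}
```

Returning the accumulated reasons leaves room for the conflict feedback listed under Work Remaining.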
Validation utilities (commands::validation::validate_hot_key_internal) together with the HotkeyConfig type (defined in speakr-types) provide parsing and serialisation support. A suite of unit tests exercises many shortcut formats, as well as default configuration behaviour and placeholder Tauri integration scenarios.
Work Remaining
- Trigger pipeline: wire the hotkey-triggered event to the record → transcribe → inject flow (FR-2, FR-3, FR-4).
- Settings integration: load a user-defined shortcut from persisted settings at start-up and expose a Tauri command that re-registers it at runtime.
- Conflict feedback: propagate HotkeyError::ConflictDetected to the UI so users are warned instantly.
- Configurable modifier: change the default shortcut to match the PRD (⌥ Option + ~) and let users restore defaults easily.
- Cross-platform assurance: create integration tests with a mocked AppHandle or CI desktop harness to confirm registration works on macOS, Windows and Linux.
- Performance metric: measure and emit telemetry needed for the 95th-percentile time-to-text ≤ 3 s requirement (once the pipeline is complete).
Architecture
Sequence β current implementation
sequenceDiagram
    autonumber
    participant OS as Operating System
    participant Plugin as GlobalShortcut plugin<<components>>
    participant Service as GlobalHotkeyService<<process>>
    participant App as Speakr backend (Tauri)<<components>>
    App->Plugin: register(shortcut)
    Plugin->OS: register
    OS-->>Plugin: ok / fail
    Plugin-->>App: result
    OS->>Plugin: *User presses shortcut*
    Plugin->Service: on_shortcut callback
    Service->App: emit "hotkey-triggered" event
Target flow β requirement goal
flowchart TD
    Input["User presses global hot-key"]:::inputOutput --> Shortcut(Registered shortcut):::components
    Shortcut --> |Tauri event| Record["Audio capture start"]:::process
    Record --> Transcribe["Whisper transcription"]:::process
    Transcribe --> Inject["Text injection into active field"]:::process
    classDef inputOutput fill:#FEE0D2,stroke:#E6550D,color:#E6550D
    classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
    classDef components fill:#E6E6FA,stroke:#756BB1,color:#756BB1
Noteworthy
- The current default shortcut differs from the PRD specification. A TODO in code highlights the pending pipeline integration.
- Unit tests follow TDD principles, yet integration tests with the real plugin are still placeholders.
Related Requirements
References
FR-2: Audio Capture
Captures microphone input suitable for Whisper transcription.
Requirement
- Capture 16 kHz mono audio via the cpal crate.
- Default maximum duration 10 s; user-configurable up to 30 s.
- Recording stops automatically when the duration limit is reached or the user presses the hot-key again.
- Audio is buffered entirely in memory; no files are written to disk.
- Handle microphone permission prompts gracefully on first run.
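The duration limits above translate directly into an in-memory buffer bound. The sketch below mirrors the RecordingConfig method names that appear in the FR-2 class diagram, but the constants, the clamping behaviour, and the field layout are assumptions rather than the actual speakr-core code.

```rust
/// Sample rate required by the specification (16 kHz mono).
pub const SAMPLE_RATE_HZ: u32 = 16_000;
/// Default and upper bound for the recording duration, in seconds.
pub const DEFAULT_MAX_DURATION_SECS: u32 = 10;
pub const MAX_ALLOWED_DURATION_SECS: u32 = 30;

/// Illustrative recording configuration with a clamped duration.
pub struct RecordingConfig {
    max_duration_secs: u32,
}

impl RecordingConfig {
    /// Clamps the requested duration into 1..=30 seconds (assumed policy).
    pub fn new(duration: u32) -> Self {
        Self {
            max_duration_secs: duration.clamp(1, MAX_ALLOWED_DURATION_SECS),
        }
    }

    pub fn max_duration_secs(&self) -> u32 {
        self.max_duration_secs
    }

    /// Number of mono samples that fit in the configured duration.
    pub fn max_samples(&self) -> usize {
        SAMPLE_RATE_HZ as usize * self.max_duration_secs as usize
    }
}
```

At the default 10 s this bounds the buffer at 160,000 samples, or roughly 320 KB of 16-bit audio.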
Rationale
Lower sample-rate mono audio minimises processing cost while meeting Whisper's input requirements.
Acceptance Criteria
- Recording initialises within 100 ms after hot-key press.
- Audio stream conforms to 16 kHz, 16-bit, mono.
- User can change max duration in Settings; value persists across restarts.
- Recording stops cleanly at limit without crashing or clipping.
- Permission dialog appears once and records decision.
Test-Driven Design
Adopt test-driven development: begin by writing failing unit/integration tests that assert each Acceptance Criterion. Only then implement capture logic until tests pass in CI.
References
PRD §6 Functional Requirements - FR-2
date: 2025-07-23
requirement: FR-2
status: PARTIALLY COMPLETE
prepared_by: o4-mini
Implementation Report: FR-2 - Audio Capture
Implementation Summary
FR-2 Audio Capture is substantially implemented with a well-tested core audio system built around the cpal crate. The implementation provides 16 kHz mono audio capture with configurable duration limits (1-30 seconds), in-memory buffering, and error handling. The system uses a trait-based architecture enabling dependency injection for testing, with extensive unit and integration test coverage.
The core functionality is implemented in speakr-core/src/audio/mod.rs with the AudioRecorder struct providing the main API. It handles audio stream initialization, timeout management, and graceful shutdown. Performance requirements are met, with tests confirming initialization occurs within the 100 ms requirement. The system includes error handling for failure modes including device unavailability, permission denial, and stream errors.
Work Remaining
- Settings Integration: Audio recording duration is not integrated with the persistent settings system. Currently uses hardcoded defaults rather than user-configurable values that persist across restarts (Acceptance Criterion 3)
- Permission Handling: While error types exist for permission denial, there's no implemented graceful permission request flow or user guidance on first run (Acceptance Criterion 5)
- Hotkey Integration: Full integration with the global hotkey system for production use case needs completion (currently only debug commands use the audio system)
- Settings UI: No user interface exists for changing audio recording duration in the Settings panel
Architecture
sequenceDiagram
    participant HK as Global Hotkey
    participant AR as AudioRecorder
    participant AS as AudioSystem (cpal)
    participant CS as CpalAudioStream
    participant TO as Timeout Task
    HK->>AR: start_recording()
    AR->>AR: Check if already recording
    AR->>AS: start_recording(config)
    AS->>CS: Create audio stream
    CS->>CS: Initialize cpal stream
    CS-->>AS: Return stream handle
    AS-->>AR: Return AudioStream
    AR->>TO: Spawn timeout task
    AR-->>HK: Recording started
    Note over CS: Continuously capture<br/>16kHz mono samples
    alt Manual Stop
        HK->>AR: stop_recording()
    else Timeout
        TO->>CS: stream.stop()
    end
    AR->>CS: get_samples()
    CS-->>AR: Vec<i16> samples
    AR-->>HK: RecordingResult
The sequence diagram shows the audio capture flow from hotkey press to sample retrieval. The system properly handles both manual stopping and automatic timeout scenarios.
classDiagram
    class AudioRecorder {
        -state: Arc~Mutex~Option~RecordingState~~~
        -audio_system: Box~dyn AudioSystem~
        +new(config: RecordingConfig) AudioRecorder
        +start_recording() Result~(), AudioCaptureError~
        +stop_recording() Result~RecordingResult, AudioCaptureError~
        +is_recording() bool
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }
    class AudioSystem {
        <<trait>>
        +start_recording(config: &RecordingConfig) Result~Box~dyn AudioStream~, AudioCaptureError~
        +list_input_devices() Result~Vec~AudioDevice~, AudioCaptureError~
    }
    class CpalAudioSystem {
        -host: cpal::Host
        +new() Result~Self, AudioCaptureError~
    }
    class AudioStream {
        <<trait>>
        +get_samples() Vec~i16~
        +stop()
        +is_active() bool
    }
    class CpalAudioStream {
        -samples: Arc~Mutex~Vec~i16~~~
        -is_recording: Arc~AtomicBool~
    }
    class RecordingConfig {
        -max_duration_secs: u32
        +new(duration: u32) Self
        +max_duration_secs() u32
        +max_samples() usize
    }
    AudioRecorder --> AudioSystem
    CpalAudioSystem ..|> AudioSystem
    CpalAudioSystem --> CpalAudioStream
    CpalAudioStream ..|> AudioStream
    AudioRecorder --> RecordingConfig
The class diagram illustrates the trait-based architecture enabling dependency injection and testing. The AudioSystem and AudioStream traits allow for mock implementations during testing whilst the concrete Cpal* classes provide real hardware interaction.
stateDiagram-v2
    [*] --> Idle
    Idle --> Initializing : start_recording()
    Initializing --> Recording : Stream created successfully
    Initializing --> Error : Device/Permission error
    Recording --> Stopping : Manual stop / Timeout
    Stopping --> Idle : Samples extracted
    Error --> Idle : Error handled
    state Recording {
        [*] --> Capturing
        Capturing --> Capturing : Accumulate samples
    }
The state diagram shows the audio recorder's lifecycle, with proper error handling and clean transitions between states.
Noteworthy
The implementation has broad test coverage built on dependency injection and mock objects. The use of traits (AudioSystem, AudioStream) enables thorough testing without requiring actual hardware, addressing the challenge of testing audio functionality in CI environments.
Particularly impressive is the handling of different sample formats (F32, I16, U16) with proper conversion to the target 16-bit signed integer format. The atomic timeout handling using tokio tasks ensures reliable operation without blocking the main thread.
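The sample-format handling mentioned above comes down to two small conversions into signed 16-bit. The functions below are a hand-written sketch for illustration; the names are hypothetical, and the real implementation may rely on cpal's own conversion utilities instead.

```rust
/// Converts a normalised f32 sample (nominally -1.0..=1.0) to i16.
/// Out-of-range input is clamped rather than wrapped.
pub fn f32_to_i16(sample: f32) -> i16 {
    let clamped = sample.clamp(-1.0, 1.0);
    (clamped * i16::MAX as f32).round() as i16
}

/// Converts an unsigned 16-bit sample (midpoint 32768) to signed 16-bit
/// by shifting the zero level.
pub fn u16_to_i16(sample: u16) -> i16 {
    (sample as i32 - 32_768) as i16
}
```

Scaling by `i16::MAX` keeps the f32 mapping symmetric, so -1.0 becomes -32767 rather than -32768.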
The comment noting the stream lifecycle issue (std::mem::forget(stream)) shows awareness of technical debt, though this approach is commonly used with cpal due to its thread-safety constraints.
Related Requirements
- FR-3: Transcription (consumes audio samples from FR-2)
- FR-8: Settings Persistence (should store audio duration preference)
- FR-1: Global Hotkey (triggers audio capture)
References
FR-3: Transcription
Offline transcription of recorded audio to text using Whisper.
Requirement
- Use whisper-rs to run Whisper (GGUF) models entirely on-device.
- Default language: English (en). Allow user language selection in Settings.
- Transcription must complete within 3 s (95th percentile) for 5-second recordings on Apple Silicon with the small model.
- Support user-selectable model sizes for latency/accuracy trade-off.
- No external network calls during transcription.
Rationale
On-device inference preserves privacy and removes network latency, achieving the product's privacy-first promise.
Acceptance Criteria
- Transcription completes within latency budget on M1 and Intel reference machines.
- Selecting a different model in Settings updates the engine without restart.
- No outbound network traffic observed via packet capture.
- Errors (e.g. model missing) surface in UI overlay/log with actionable message.
Test-Driven Design
Begin with failing automated tests for latency, language selection, and network isolation. Implement transcription until all tests pass, following TDD.
References
PRD §6 Functional Requirements - FR-3
FR-4: Transcript Injection
Types the transcribed text into the currently focused input field.
Requirement
- Use the enigo crate to emit synthetic keystrokes that reproduce the transcription exactly as plain text.
- Injection must preserve line breaks and punctuation.
- Injection must run on the main UI thread to respect macOS accessibility APIs.
- Provide feedback event (e.g. Injected) to UI overlay/log once complete.
Rationale
Typing text directly avoids clipboard usage and works in most applications, maintaining the illusion of native typing.
Acceptance Criteria
- For a 100-character transcript, injection latency β€ 300 ms.
- Typed characters match transcription byte-for-byte.
- Works in common editors (VS Code, Xcode, Pages, Safari).
- Emits completion event for downstream UI.
Test-Driven Design
Write failing integration tests measuring injection latency and correctness across target editors. Deliver code to satisfy the tests.
References
PRD §6 Functional Requirements - FR-4
FR-5: Injection Fallback
Clipboard-paste fallback when keystroke injection is blocked.
Requirement
- Detect secure text fields or injection failure (e.g. enigo error).
- Copy transcript to clipboard and simulate ⌘V paste as fallback.
- Display transient warning overlay: "Secure field detected - text pasted via clipboard."
- Restore previous clipboard contents after paste to respect user data.
Rationale
Some password or secure fields block synthetic keystrokes. A controlled clipboard fallback ensures functionality while informing the user.
Acceptance Criteria
- 100 % success rate pasting into macOS secure text fields (Safari password prompt as test).
- Previous clipboard restored within 500 ms after paste.
- Warning overlay disappears automatically after 3 s.
- No sensitive transcript retained on clipboard after restore.
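The save, paste, restore sequence required above can be sketched against an abstract clipboard. The `Clipboard` trait, `MemoryClipboard`, and `paste_with_restore` are hypothetical names introduced for illustration; a real implementation would wrap the macOS pasteboard and a synthetic ⌘V keystroke, and would also need to handle non-text clipboard contents.

```rust
/// Minimal clipboard abstraction (assumed interface, text only).
pub trait Clipboard {
    fn get_text(&mut self) -> Option<String>;
    fn set_text(&mut self, text: &str);
}

/// Places `transcript` on the clipboard, runs `paste`, then restores the
/// previous contents whether or not the paste succeeded.
pub fn paste_with_restore<C, P>(clipboard: &mut C, transcript: &str, mut paste: P) -> Result<(), String>
where
    C: Clipboard,
    P: FnMut(&mut C) -> Result<(), String>,
{
    let previous = clipboard.get_text();
    clipboard.set_text(transcript);
    let outcome = paste(clipboard);
    // An originally empty clipboard is restored as an empty string, so the
    // transcript never remains behind.
    clipboard.set_text(previous.as_deref().unwrap_or(""));
    outcome
}

/// In-memory clipboard, useful for the secure-field simulation tests.
pub struct MemoryClipboard(pub Option<String>);

impl Clipboard for MemoryClipboard {
    fn get_text(&mut self) -> Option<String> {
        self.0.clone()
    }
    fn set_text(&mut self, text: &str) {
        self.0 = Some(text.to_string());
    }
}
```

Restoring on the error path as well is what keeps the transcript from lingering on the clipboard.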
Test-Driven Design
Craft failing tests for secure-field detection, clipboard restoration, and overlay timing. Implement fallback logic until tests succeed.
References
PRD §6 Functional Requirements - FR-5
FR-6: Settings UI
Provides a graphical interface (tray or window) for user configuration.
Requirement
- Expose configuration for:
  - Global hot-key picker
  - Model selector (small, medium, large GGUF)
  - Auto-launch on login toggle
- Implemented as a Tauri window accessible from the menu bar/tray.
- Validate hot-key conflicts and model availability.
- Preference changes take effect without restarting the app.
Rationale
A minimal settings UI keeps the main workflow keyboard-first while allowing deeper configuration when needed.
Acceptance Criteria
- Opening Settings from tray displays window within 200 ms.
- Changing options updates behaviour immediately (e.g. new hot-key active).
- Invalid configurations (missing model file) display inline errors.
- Settings persist after app restart.
Test-Driven Design
Define unit/UI tests for each settings control and validation rule before coding. Implementation is complete when all tests pass.
References
PRD §6 Functional Requirements - FR-6
date: 2025-07-23
requirement: FR-6
status: PARTIALLY COMPLETE
prepared_by: gpt-4.1
Implementation Report: FR-6 - Settings UI
Implementation Summary
The `SettingsPanel` Leptos component serves as the primary settings interface for Speakr. On launch, it invokes the Tauri `load_settings` command to retrieve persisted `AppSettings` and renders:
- Global hot-key configuration: Real-time validation via the `validate_hot_key` Tauri command, un/registration through the global-shortcut plugin, and persistence via `save_settings`.
- Model selection: Radio options for small, medium, and large Whisper models, availability checks using `check_model_availability`, disabling unavailable models, and immediate persistence.
- Auto-launch toggle: Uses the `set_auto_launch` Tauri command and calls `save_settings` on change.
All changes trigger `save_settings` and display inline success or error messages. Backend persistence is handled atomically in `settings/persistence.rs`. Hot-key and auto-launch preferences apply at runtime without restarting the app.
Work Remaining
- Add a system tray icon and a "Settings" menu item to open or focus the settings window.
- Implement Tauri `system_tray` integration and event handling in `run()` to show/hide the settings window.
- Enable dynamic transcription-model reload in the backend when `model_size` changes, without requiring a restart.
- Develop unit/UI tests for each settings control and validation path (hot-key, model selection, auto-launch).
- Measure and optimise settings window startup to meet the <200 ms opening requirement.
- Enhance the hot-key picker with an interactive key-capture control instead of free-text input.
- Display inline errors for model selection failures (e.g., missing or corrupt model files).
Architecture
Sequence Diagram

    sequenceDiagram
        participant UI as "SettingsPanel"
        participant Backend as "Tauri Backend"
        participant FS as "File System"
        UI->>Backend: load_settings()
        Backend->>FS: load_settings_from_dir()
        FS-->>Backend: AppSettings
        Backend-->>UI: AppSettings
        UI->>UI: render settings
        UI->>Backend: validate_hot_key(newHotkey)
        Backend-->>UI: Ok
        UI->>UI: register_global_shortcut
        UI->>Backend: save_settings(AppSettings)
        Backend->>FS: save_settings_to_dir()
        FS-->>Backend: Ok
        Backend-->>UI: Ok

Flowchart

    flowchart TD
        A["User modifies setting"] --> B["UI captures change"]
        B --> C{"Validate input"}
        C -->|Valid| D["Invoke Tauri command"]
        C -->|Invalid| E["Show validation error"]
        D --> F["Persist settings via backend"]
        F --> G["Display success or error message"]

Noteworthy
N/A
Related Requirements
References
speakr-ui/src/settings.rs
speakr-tauri/src/lib.rs
speakr-tauri/tauri.conf.json
FR-7: Status Events
Emit real-time status updates for UI overlays and logging.
Requirement
- Broadcast status events: `Recording`, `Transcribing`, `Injected`, `Error` (variants).
- Events emitted over an internal async channel consumable by UI components and log subsystem.
- Include timestamp and optional payload (e.g. error message).
- Provide public Rust API `subscribe_status()` for other components.
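One way to picture the event fan-out is the sketch below. It is an assumption-laden illustration: it uses synchronous `std::sync::mpsc` channels in place of the async channel the requirement names, and `StatusBus`, `StatusEvent`, and `emit` are hypothetical names. Only the four variants and `subscribe_status()` come from the requirement.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::time::SystemTime;

/// The four status variants from the requirement; `Error` carries the
/// optional payload (an error message).
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Recording,
    Transcribing,
    Injected,
    Error(String),
}

/// A status plus the time it was emitted.
#[derive(Debug, Clone)]
pub struct StatusEvent {
    pub status: Status,
    pub timestamp: SystemTime,
}

/// Hypothetical broadcaster: each subscriber gets its own channel, so the
/// overlay and the logger each see every event exactly once, in order.
#[derive(Default)]
pub struct StatusBus {
    subscribers: Vec<Sender<StatusEvent>>,
}

impl StatusBus {
    /// Register a new consumer and hand back its receiving end.
    pub fn subscribe_status(&mut self) -> Receiver<StatusEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Stamp the status and deliver it to every live subscriber.
    pub fn emit(&mut self, status: Status) {
        let event = StatusEvent {
            status,
            timestamp: SystemTime::now(),
        };
        // A failed send means the receiver was dropped; forget that subscriber.
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

A per-subscriber channel keeps ordering and the no-duplicates guarantee simple to test. An async broadcast channel would be the closer match to the requirement, at the cost of an extra dependency.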
Rationale
A decoupled event system lets the overlay and future extensions react without tight coupling to business logic.
Acceptance Criteria
- Overlay reflects status within 50 ms of event emission.
- Logs capture all events with accurate timestamps.
- No missed or duplicated events observed in 1-hour monkey test (500 invocations).
Test-Driven Design
Start with failing tests subscribing to the event channel and asserting delivery guarantees (latency, ordering, no duplicates). Implement until green.
References
PRD §6 Functional Requirements - FR-7
FR-8: Settings Persistence
Persist user preferences locally without cloud sync.
Requirement
- Store settings in a JSON file located in the platform-appropriate app data directory (`$HOME/Library/Application Support/Speakr/settings.json`).
- Write changes atomically to avoid corruption.
- Migration framework supports future schema evolution with versioning.
- No data leaves the device.
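The atomic-write and recover-to-defaults behaviour can be sketched with the standard library alone. The function names and the `is_valid` callback are illustrative assumptions, and the sketch treats settings as opaque text rather than parsing JSON.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path` atomically: write a sibling temp file, flush it
/// to disk, then rename it over the target. A crash mid-write leaves the old
/// file untouched. (Hypothetical helper, not Speakr's actual function.)
pub fn write_atomically(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}

/// Load the settings text, falling back to `defaults` when the file is
/// missing, unreadable, or rejected by `is_valid` (a stand-in for real
/// JSON deserialisation).
pub fn load_or_default(
    path: &Path,
    defaults: &str,
    is_valid: impl Fn(&str) -> bool,
) -> String {
    match fs::read_to_string(path) {
        Ok(text) if is_valid(&text) => text,
        _ => defaults.to_owned(),
    }
}
```

The rename is atomic only when the temp file lives on the same filesystem as the target, which is why the sketch places it next to `settings.json` rather than in a system temp directory.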
Rationale
Local persistence offers instant access, privacy, and offline capability.
Acceptance Criteria
- Settings file created on first launch with defaults.
- Modifying settings updates file within 100 ms.
- Corrupt settings file triggers automatic recovery to defaults.
- Unit tests cover load/save error paths.
Test-Driven Design
Write failing unit tests for load/save, corruption recovery, and migration before implementation; pass them in CI.
References
PRD §6 Functional Requirements - FR-8
FR-9: Auto-update
Provide optional self-update via GitHub Releases.
Requirement
- When enabled, periodically (daily) check GitHub Releases for a newer version tag.
- Use secure download (HTTPS) and verify code signature / hash before install.
- Prompt user with Release Notes and require confirmation before applying update.
- Allow users to disable auto-update in Settings.
- Feature optional in v1; must degrade gracefully when disabled.
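The "newer version tag" decision can be sketched as a pure function, which keeps it testable without any network access. The sketch assumes tags of the form `v1.2.3` or `1.2.3`; `parse_version` and `should_offer_update` are hypothetical names, and fetching the tag, verifying the download, and prompting the user are out of scope here.

```rust
/// Parse a release tag such as "v1.2.3" or "1.2.3" into (major, minor, patch).
/// Anything else (pre-release suffixes, missing parts) yields `None`.
pub fn parse_version(tag: &str) -> Option<(u64, u64, u64)> {
    let mut parts = tag.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Decide whether to prompt the user. When auto-update is disabled this
/// returns `false` without looking at the tag, mirroring the rule that a
/// disabled updater must stay inert.
pub fn should_offer_update(enabled: bool, current: &str, latest_tag: &str) -> bool {
    if !enabled {
        return false;
    }
    match (parse_version(current), parse_version(latest_tag)) {
        // Tuples compare lexicographically: major first, then minor, then patch.
        (Some(cur), Some(latest)) => latest > cur,
        // Unparseable versions are treated as "no update" rather than an error.
        _ => false,
    }
}
```

Comparing numeric tuples rather than strings matters: as strings, `"1.10.0"` sorts before `"1.9.0"`.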
Rationale
Easy updates encourage users to stay on latest version, reducing support burden and delivering security fixes.
Acceptance Criteria
- Update check runs off main thread; no UI freeze.
- Failed update check logs but does not crash application.
- Downloaded binary passes macOS notarisation verification.
- User can opt-out entirely; no network calls when disabled.
Test-Driven Design
Begin with failing integration tests that simulate update availability, download verification,
References
PRD §6 Functional Requirements - FR-9
INIT-01: Project Scaffold & Initial Structure
Define the baseline repository layout, build tooling, and development workflows for Speakr.
Requirement
- Workspace Layout (multi-crate)
  - `speakr-core/`: pure Rust library (record → transcribe → inject).
  - `speakr-tauri/`: Tauri desktop shell; contains `src-tauri/` and embeds Leptos frontend by default.
  - `speakr-ui/`: optional standalone Leptos UI crate (only if the UI is fully separated).
  - `models/`: user-downloaded GGUF Whisper models (git-ignored).
  - `docs/`: architecture, PRD, and spec docs (this folder).
  - `nix/`: flakes, overlays, `devenv.nix`, CI helpers.
  - `scripts/`: one-off dev scripts (lint, release, etc.).
  - Root-level `Cargo.toml` / `Cargo.lock` defining a `[workspace]` with `members`.
- Build Tooling
  - Use Cargo workspace to manage crates and enable incremental rebuilds.
  - Root-level Nix flake + `devenv.nix` for reproducible shells.
  - `Trunk.toml` (in `speakr-tauri/`) bundles static assets for the WebView.
- CI / CD
  - GitHub Actions workflow for: lint (`rustfmt`, `clippy`), test, macOS build, docs build.
  - Release workflow signs and notarises macOS DMG.
- Linters & Hooks
  - Pre-commit config: `rustfmt`, `markdownlint`, `shellcheck`, `nixpkgs-fmt`.
- Documentation Site
  - mdBook in `docs/book/` published via GitHub Pages.
- Version Control Hygiene
  - `.gitignore` excludes `target/`, model files, and local config overrides.
Rationale
A consistent scaffold accelerates onboarding, enforces build reproducibility, and aligns with the project's privacy-first & cross-platform goals.
Acceptance Criteria
- Fresh clone followed by `devenv shell` (or `devenv up`) yields a working shell with `cargo`, `tauri`, and `mdbook` available.
- `cargo test` passes with placeholder tests.
- `npm run tauri dev` (via Trunk) launches stub window.
- GitHub Actions green on lint + test.
- `mdbook serve` builds documentation without errors.
Migration Steps (from mono-crate β multi-crate)
- Create workspace file

      # At repo root (printf is used because echo does not portably expand \n)
      printf '[workspace]\nmembers = [ "speakr-core", "speakr-tauri", "speakr-ui" ]\n' > Cargo.toml

- Scaffold core crate

      cargo new --lib speakr-core
      mv src/*.rs speakr-core/src/   # move existing logic
      rm -rf src/

- Scaffold Tauri crate

      cargo tauri init --template leptos speakr-tauri
      # move existing src-tauri/ into speakr-tauri/
      mv src-tauri speakr-tauri/

- Wire dependency: in `speakr-tauri/Cargo.toml` add:

      speakr-core = { path = "../speakr-core" }

- (Optional) Separate UI crate

      cargo new --lib speakr-ui
      mv speakr-tauri/src-leptos/* speakr-ui/src/
      # then depend on speakr-ui from speakr-tauri via WASM asset pipeline

- Update paths in code & imports.
- Run tests & build

      cargo test --workspace
      cargo tauri dev -p speakr-tauri

- CI / Nix: update workflows and `devenv.nix` to use `--workspace`.

Completion of these steps should yield the new structure with all tests and `tauri dev` working.
NFR: Accessibility
Comply with macOS accessibility guidelines.
Requirement
- UI elements (overlay, settings) must be VoiceOver readable.
- Support high-contrast mode and respect user font scaling preferences.
- Achieve Apple Accessibility Inspector score ≥ 85.
Rationale
Ensures inclusivity for users with visual impairments or other accessibility needs.
Acceptance Criteria
- VoiceOver reads overlay status changes accurately.
- High-contrast mode renders UI with sufficient contrast ratios (> 4.5:1).
- Automated accessibility audit (axe-core) passes with no critical violations.
Test-Driven Design
Introduce automated accessibility audits (axe-core, VoiceOver scripts) in CI before fixing violations.
References
PRD §7 Non-Functional Requirements - Accessibility
NFR: Compatibility
Operate across supported macOS versions and CPU architectures.
Requirement
- Support macOS 13+ on Apple Silicon and Intel Macs.
- Intel Macs may experience doubled latency but must remain functional.
Rationale
Wider OS support increases addressable market while retaining acceptable performance.
Acceptance Criteria
- Manual QA passes on Intel MBP 2020 (macOS 13).
- Automated smoke test on GitHub Actions Intel runner passes.
- Latency SLA documented separately for Intel.
Test-Driven Design
Add failing cross-arch smoke tests to CI runners before porting; success criteria met when tests pass on Intel and Apple Silicon.
References
PRD §7 Non-Functional Requirements - Compatibility
NFR: Footprint
Constrain binary size and runtime memory usage.
Requirement
- Universal macOS binary size ≤ 20 MB (excluding model files).
- Peak RSS ≤ 400 MB including model during standard transcription workload.
Rationale
A lightweight application reduces download size, disk usage and keeps memory pressure low on older devices.
Acceptance Criteria
- `du -h` on release DMG shows ≤ 20 MB binary.
- Runtime memory measured via Activity Monitor stays ≤ 400 MB during 30 s monkey test.
Test-Driven Design
Add failing size and memory regression tests into CI before implementation tweaks.
References
PRD §7 Non-Functional Requirements - Footprint
NFR: Latency
Ensure low end-to-end latency from hot-key activation to text injection.
Requirement
- 95th percentile time-to-text ≤ 3 s for a 5-second audio clip on Apple Silicon (M1) using the small Whisper model.
- Latency measured in release (optimised) builds with all background services running.
Rationale
Sub-3-second latency preserves conversational flow and competitive advantage over cloud dictation.
Acceptance Criteria
- Automated telemetry logs latency for every invocation.
- CI latency test passes on GitHub Actions M1 runner.
- Performance regression test fails build if P95 > 3 s.
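A regression gate on P95 could be computed roughly as below. The nearest-rank percentile definition is an assumption (the requirement does not say which definition applies), and `percentile` and `meets_latency_budget` are hypothetical helper names.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns `None` for empty input or `pct` above 100.
pub fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Integer ceiling of n * pct / 100 avoids floating-point rounding surprises.
    let rank = (sorted.len() * pct + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True when the measured P95 is within `budget` (3 s in the requirement).
/// An empty sample set fails the gate rather than passing vacuously.
pub fn meets_latency_budget(samples: &[Duration], budget: Duration) -> bool {
    percentile(samples, 95).map_or(false, |p95| p95 <= budget)
}
```

With 20 samples, a single slow outlier still passes because rank 19 of 20 is the P95. A second outlier in 21 samples pushes the P95 onto the slow value and fails the build.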
Test-Driven Design
Create automated performance tests that measure P95 latency; commit them before optimising the code.
References
PRD §7 Non-Functional Requirements - Latency
NFR: Reliability
Maintain stability across heavy usage.
Requirement
- Application must run 1-hour monkey test (500 invocations) with zero crashes.
- Recover gracefully from errors (audio device unavailable, model missing).
Rationale
High reliability builds user trust and reduces support overhead.
Acceptance Criteria
- CI integration test simulates 500 sequential hot-key invocations without crash.
- Error conditions logged and surfaced via Status Events.
Test-Driven Design
Introduce a failing soak-test (500 invocations) in CI first; stabilise code until it passes consistently.
References
PRD §7 Non-Functional Requirements - Reliability
NFR: Security
Prevent unintended data leakage and maintain user privacy.
Requirement
- No outbound network connections except optional auto-update domain.
- Hardened runtime & proper code-signing for macOS notarisation.
- Microphone access prompt shown once and justification provided.
Rationale
Privacy-first positioning requires strict control over network activity and OS security policies.
Acceptance Criteria
- Static analysis shows no runtime socket creation beyond update URL when enabled.
- Application passes Apple notarisation & gatekeeper checks.
- Firewall test (Little Snitch) reveals no unexpected traffic.
Test-Driven Design
Write security unit tests (e.g., socket mocks) and notarisation validation scripts before code changes; CI must enforce them.
References
PRD §7 Non-Functional Requirements - Security
date: {YYYY-MM-DD} requirement: {Requirement-ID} status:
Implementation Report: {Requirement-ID} -
Implementation Summary
For completed and partially completed requirements, 1-2 paragraphs explaining:
- How the implementation works overall
- Specific behaviours of note
- Control and data flow(s)
- Other significant details as appropriate
Work Remaining
(`N/A` for `Complete` requirements) Itemised list of specific work required for the requirement to be completed.
Architecture
One or more Mermaid diagrams, include ALL applicable to the requirement:
- Sequence diagrams (e.g. IPC, user interactions)
- State diagrams (e.g. system state transitions)
- Entity relationships (e.g. data entities)
- Class diagrams
- Flowcharts (e.g. process/control flows)
- Any other diagram type that best describes the information
Each diagram should be preceded by a `### Title` and a short summary of what the diagram shows, and any clarifying remarks (if anything is not self-evident from the diagram). Diagrams should be embedded using a `mermaid` code fence.
Noteworthy
(Discretionary section, `N/A` if not relevant) Discussion about any especially interesting details about the implementation, or insights related to it.
Related Requirements
References
Speakr-Tauri lib.rs Refactoring Plan
Current State Analysis
The `speakr-tauri/src/lib.rs` file has grown to roughly 2,000 lines (1,913 at the time of writing) and contains multiple responsibilities that should be separated for better maintainability.
Current File Composition
- Lines 1-27: Imports and use statements
- Lines 29-87: Debug-only types and static storage
- Lines 89-255: Settings management utilities
- Lines 256-456: GlobalHotkeyService implementation
- Lines 457-600: Tauri command functions
- Lines 601-950: Audio functionality helpers
- Lines 951-1100: Additional utility functions
- Lines 1732-1830: BackendStatusService implementation
- Lines 1831-1913: Main run function and setup
- Lines 1400+: Extensive test module (500+ lines)
Proposed Refactoring Structure
1. Move Tests to Separate Files
Target: Extract all tests from `lib.rs` into dedicated test files
- Current: 500+ lines of tests in `mod tests`
- New Structure:

      speakr-tauri/tests/
      ├── settings_tests.rs      # Settings save/load/migration tests
      ├── hotkey_tests.rs        # GlobalHotkeyService tests
      ├── status_tests.rs        # BackendStatusService tests
      ├── audio_tests.rs         # Audio recording/file tests
      ├── commands_tests.rs      # Tauri command tests
      └── integration_tests.rs   # Cross-module integration tests

- Benefits: Reduces `lib.rs` by ~500 lines, improves test organization
- Note: Integration tests can access internal modules via `speakr_lib::module_name` (the `speakr-tauri` crate is named `speakr_lib`)
2. Extract Debug Functionality
Target: Move all debug-related code to separate module
- Current: Debug types, static storage, debug commands scattered throughout
- New Structure:

      speakr-tauri/src/debug/
      ├── mod.rs        # Public interface, re-exports
      ├── types.rs      # DebugLogLevel, DebugLogMessage, DebugRecordingState
      ├── storage.rs    # Static storage (DEBUG_LOG_MESSAGES, DEBUG_RECORDING_STATE)
      └── commands.rs   # Debug Tauri commands

- Files to Create:
  - `src/debug/types.rs`: ~50 lines
  - `src/debug/storage.rs`: ~30 lines
  - `src/debug/commands.rs`: ~200 lines
  - `src/debug/mod.rs`: ~20 lines
- Benefits: Isolates debug code, easier to disable in release builds
3. Extract Settings Management
Target: Centralize all settings-related functionality
- Current: Settings utilities and commands mixed in main file
- New Structure:

      speakr-tauri/src/settings/
      ├── mod.rs           # Public interface
      ├── persistence.rs   # File I/O, atomic writes, backups
      ├── migration.rs     # Version migration logic
      ├── validation.rs    # Directory permissions, data validation
      └── commands.rs      # Settings Tauri commands

- Functions to Move:
  - `get_settings_path()`, `get_settings_backup_path()`
  - `migrate_settings()`, `save_settings_to_dir()`, `load_settings_from_dir()`
  - `try_load_settings_file()`, `validate_settings_directory_permissions()`
  - Commands: `save_settings()`, `load_settings()`
- Files to Create:
  - `src/settings/persistence.rs`: ~150 lines
  - `src/settings/migration.rs`: ~50 lines
  - `src/settings/validation.rs`: ~40 lines
  - `src/settings/commands.rs`: ~60 lines
  - `src/settings/mod.rs`: ~30 lines
- Benefits: Clear separation of concerns, easier testing of settings logic
4. Extract Service Implementations
Target: Move service structs to dedicated service modules
- Current: GlobalHotkeyService and BackendStatusService in main file
- New Structure:

      speakr-tauri/src/services/
      ├── mod.rs      # Re-exports, common traits
      ├── hotkey.rs   # GlobalHotkeyService implementation
      ├── status.rs   # BackendStatusService implementation
      └── types.rs    # ServiceComponent enum, shared types

- Content to Move:
  - `GlobalHotkeyService` struct (~200 lines)
  - `BackendStatusService` struct (~100 lines)
  - `ServiceComponent` enum
  - Related Tauri commands: `register_global_hotkey()`, `unregister_global_hotkey()`
- Files to Create:
  - `src/services/hotkey.rs`: ~220 lines
  - `src/services/status.rs`: ~120 lines
  - `src/services/types.rs`: ~20 lines
  - `src/services/mod.rs`: ~30 lines
- Benefits: Services become self-contained, easier to test and maintain
5. Extract Audio Functionality
Target: Isolate audio recording and file operations
- Current: Audio functions scattered throughout main file
- New Structure:

      speakr-tauri/src/audio/
      ├── mod.rs         # Public interface
      ├── recording.rs   # Recording logic, real audio backend
      ├── files.rs       # WAV file operations, filename generation
      └── commands.rs    # Audio-related Tauri commands

- Functions to Move:
  - `generate_audio_filename_with_timestamp()`
  - `save_audio_samples_to_wav_file()`
  - `debug_record_audio_to_file()`, `debug_record_real_audio_to_file()`
  - `get_debug_recordings_directory()`
  - Commands: `debug_start_recording()`, `debug_stop_recording()`
- Files to Create:
  - `src/audio/recording.rs`: ~100 lines
  - `src/audio/files.rs`: ~80 lines
  - `src/audio/commands.rs`: ~150 lines
  - `src/audio/mod.rs`: ~25 lines
- Benefits: Audio logic becomes testable in isolation
6. Extract General Tauri Commands
Target: Group remaining Tauri commands by domain
- Current: Various commands mixed in main file
- New Structure:

      speakr-tauri/src/commands/
      ├── mod.rs          # Command registration, re-exports
      ├── validation.rs   # validate_hot_key, input validation
      ├── system.rs       # check_model_availability, set_auto_launch
      └── legacy.rs       # register_hot_key (backward compatibility)

- Commands to Move:
  - `validate_hot_key()` → validation.rs
  - `check_model_availability()`, `set_auto_launch()` → system.rs
  - `register_hot_key()`, `greet()` → legacy.rs
  - `get_backend_status()` → (might stay in services/status.rs)
- Files to Create:
  - `src/commands/validation.rs`: ~60 lines
  - `src/commands/system.rs`: ~80 lines
  - `src/commands/legacy.rs`: ~40 lines
  - `src/commands/mod.rs`: ~40 lines
- Benefits: Commands grouped by domain, easier to find and maintain
7. Simplified lib.rs
Target: Reduce `lib.rs` to essential coordination code
- Final Content:
  - Module declarations and re-exports
  - Main `run()` function with Tauri setup
  - Essential imports
  - Command registration (delegated to modules)
- Estimated Size: ~150-200 lines (down from 1,913)
Implementation Strategy
- Phase 1: Extract Tests
- Phase 2: Extract Services
- Phase 3: Extract Settings
- Phase 4: Extract Debug & Audio
- Phase 5: Extract Commands & Finalize
Refactoring Process Overview
The following diagram illustrates the 5-phase refactoring approach and its progression from the current monolithic structure to a modular architecture:

    graph TD
        A["Phase 1: Extract Tests<br/>Low Risk"] --> B["Phase 2: Extract Services<br/>Medium Risk"]
        B --> C["Phase 3: Extract Settings<br/>Medium Risk"]
        C --> D["Phase 4: Extract Debug & Audio<br/>Low Risk"]
        D --> E["Phase 5: Extract Commands & Finalize<br/>Low Risk"]
        A1["• Create test directory structure<br/>• Move 500+ lines of tests<br/>• Update imports & run tests"]
        B1["• Extract GlobalHotkeyService<br/>• Extract BackendStatusService<br/>• Move related Tauri commands"]
        C1["• Extract settings persistence<br/>• Extract migration logic<br/>• Extract validation functions"]
        D1["• Extract debug functionality<br/>• Extract audio operations<br/>• Update conditional compilation"]
        E1["• Group remaining commands<br/>• Finalize lib.rs cleanup<br/>• Run full test suite"]
        A -.-> A1
        B -.-> B1
        C -.-> C1
        D -.-> D1
        E -.-> E1
        F["lib.rs: 1,913 lines"] --> G["lib.rs: ~200 lines"]
        classDef process fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77
        classDef decision fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
        classDef error fill:#FCBBA1,stroke:#FB6A4A,color:#CB181D
        classDef data fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD
        class A,D,E process
        class B,C decision
        class F error
        class G data

Risk Assessment
Low Risk Refactoring
- ✅ Moving tests to separate files
- ✅ Extracting debug functionality (conditional compilation)
- ✅ Moving utility functions (no complex dependencies)
Medium Risk Refactoring
- ⚠️ Service extraction (careful with state management)
- ⚠️ Settings refactoring (critical for app functionality)
- ⚠️ Tauri command reorganization (frontend depends on these)
Mitigation Strategies
- Incremental Changes: One module at a time
- Comprehensive Testing: Run full test suite after each phase
- Feature Flags: Use conditional compilation during transition
- Backup Strategy: Git branches for each refactoring phase
Success Criteria
- `lib.rs` reduced to ~200 lines
- All existing tests pass without modification
- All Tauri commands remain accessible to frontend
- Debug functionality preserved in debug builds
- Settings persistence works identically
- Global hotkey registration continues working
- Build time remains similar or improves
- New module structure is logical and discoverable
This refactoring will significantly improve the maintainability and organization of the Speakr Tauri backend while preserving all existing functionality.
Phase 1: Extract Tests (Low Risk)
Objective: Move all tests from `lib.rs` into separate files organized by domain
- New Structure:

      speakr-tauri/tests/
      ├── settings_tests.rs      # Settings save/load/migration tests
      ├── hotkey_tests.rs        # GlobalHotkeyService tests
      ├── status_tests.rs        # BackendStatusService tests
      ├── audio_tests.rs         # Audio recording/file tests
      ├── commands_tests.rs      # Tauri command tests
      └── integration_tests.rs   # Cross-module integration tests

- Note: Integration tests can access internal modules via `speakr_lib::module_name`
Phase 1 Complete
Final Results: 27 of 35 tests migrated (77%)
Breakthrough Strategy: Making Functions `pub` with Internal API Documentation
The key to success was making private functions `pub` (not `pub(crate)`) with clear internal API documentation. Tests in the `tests/` directory are compiled as separate crates, so they can only see `pub` items. Marking the functions as internal in their doc comments lets those tests reach them while keeping the API boundary explicit.
Example pattern used:

    /// Internal hot-key validation logic.
    ///
    /// # Internal API
    /// This function is only intended for internal use and testing.
    pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
        // implementation...
    }

Task Checklist (Phase 1)
- Create test directory structure
  - Create `speakr-tauri/tests/` directory
  - Create `settings_tests.rs` file
  - Create `hotkey_tests.rs` file
  - Create `status_tests.rs` file
  - Create `audio_tests.rs` file
  - Create `commands_tests.rs` file
  - Create `integration_tests.rs` file
- Move settings-related tests: 11/13 tests migrated (85% success)
  - Extract `test_app_settings_default()` → `settings_tests.rs`
  - Extract `test_save_and_load_settings()` → `settings_tests.rs`
  - Extract `test_settings_migration()` → `settings_tests.rs`
  - [~] Extract `test_atomic_write_creates_backup()` → SKIPPED: Tests Tauri command
  - Extract `test_corruption_recovery_from_backup()` → `settings_tests.rs`
  - Extract `test_corruption_recovery_fallback_to_defaults()` → `settings_tests.rs`
  - Extract `test_settings_serialization()` → `settings_tests.rs`
  - [~] Extract `test_save_settings_tauri_command()` → SKIPPED: Tests Tauri command
  - Extract `test_settings_performance()` → `settings_tests.rs`
  - Extract `test_settings_directory_permissions()` → `settings_tests.rs`
  - Extract `test_isolated_settings_save_and_load()` → `settings_tests.rs`
  - Extract `test_isolated_corruption_recovery()` → `settings_tests.rs`
  - Extract `debug_save_button_functionality()` → `settings_tests.rs`
- Move hotkey-related tests: 2/3 tests migrated (67% success)
  - Extract `test_validate_hot_key_success()` → `hotkey_tests.rs`
  - Extract `test_validate_hot_key_failures()` → `hotkey_tests.rs`
  - [~] Extract `test_register_hot_key()` → SKIPPED: Tests Tauri command
- Move status-related tests: 9/12 tests migrated (75% success)
  - Extract `test_backend_status_service_creation()` → `status_tests.rs`
  - Extract `test_backend_status_service_update_single_service()` → `status_tests.rs`
  - Extract `test_backend_status_service_all_services_ready()` → `status_tests.rs`
  - Extract `test_backend_status_service_error_handling()` → `status_tests.rs`
  - Extract `test_backend_status_timestamps()` → `status_tests.rs`
  - [~] Extract `test_get_backend_status_tauri_command()` → SKIPPED: Tests Tauri command
  - Extract `test_global_backend_service_initialization()` → `status_tests.rs`
  - Extract `test_global_backend_service_state_updates()` → `status_tests.rs`
  - Extract `test_global_backend_service_thread_safety()` → `status_tests.rs`
  - [~] Extract `test_get_backend_status_command_uses_real_service()` → SKIPPED: Tests Tauri command
  - Extract `test_backend_service_emits_events_on_state_change()` → `status_tests.rs`
  - [~] Extract `test_complete_status_communication_flow()` → SKIPPED: Uses `get_backend_status` Tauri command
- Move audio-related tests: 5/5 tests migrated (100% success)
  - Extract `test_debug_record_audio_to_file_saves_with_timestamp()` → `audio_tests.rs`
  - Extract `test_debug_record_audio_to_file_creates_unique_filenames()` → `audio_tests.rs`
  - Extract `test_save_audio_samples_to_wav_file()` → `audio_tests.rs`
  - Extract `test_generate_audio_filename_with_timestamp()` → `audio_tests.rs`
  - Extract `test_debug_real_audio_recording_integration()` → `audio_tests.rs` (ignored, as expected)
- [~] Move command-related tests: 0/2 tests migrated (0% success)
  - [~] Extract `test_check_model_availability()` → SKIPPED: Tests Tauri command
  - [~] Extract `test_set_auto_launch()` → SKIPPED: Tests Tauri command
- Update imports and run tests: COMPLETED
  - Made internal functions `pub` with "Internal API" documentation:
    - Settings functions: `get_settings_path`, `get_settings_backup_path`, `migrate_settings`, `try_load_settings_file`, `load_settings_from_dir`, `validate_settings_directory_permissions`
    - Hotkey functions: `validate_hot_key_internal` (with Tauri command wrapper)
    - Status functions: `get_global_backend_service`, `reset_global_backend_service`
    - Audio functions: `generate_audio_filename_with_timestamp`, `save_audio_samples_to_wav_file`, `debug_record_audio_to_file`, `debug_record_real_audio_to_file`
  - Updated imports in all test files to use `speakr_lib::`
  - Fixed `#[cfg(test)]` → `#[cfg(any(test, debug_assertions))]` for external test access
  - Verified all migrated tests pass: 27 tests across 4 files
    - `settings_tests.rs`: 11 tests
    - `status_tests.rs`: 9 tests
    - `hotkey_tests.rs`: 2 tests
    - `audio_tests.rs`: 5 tests (4 + 1 ignored)
  - Removed successfully migrated test functions from `lib.rs`
  - Run `cargo test --workspace`: all tests pass
Final Migration Summary
Test Category | Total Found | Successfully Migrated | Still in lib.rs | Success Rate |
---|---|---|---|---|
Settings Tests | 13 tests | 11 tests | 2 tests (Tauri commands) | 85% |
Status Tests | 12 tests | 9 tests | 3 tests (Tauri commands) | 75% |
Hotkey Tests | 3 tests | 2 tests | 1 test (Tauri command) | 67% |
Audio Tests | 5 tests | 5 tests | 0 tests | 100% |
Command Tests | 2 tests | 0 tests | 2 tests (all Tauri commands) | 0% |
TOTALS | 35 tests | 27 tests | 8 tests | 77% |
Major Improvement Achieved:
- Original attempt: 8 tests migrated (23%)
- After making functions `pub`: 27 tests migrated (77%)
- Improvement: +19 additional tests successfully migrated
Remaining Tests in lib.rs (8 tests):
All remaining tests exercise Tauri commands and cannot be moved as they stand because:
- `#[tauri::command]` functions defined in `lib.rs` cannot be `pub` (causes macro conflicts)
- External tests cannot directly invoke Tauri commands

They may be possible to migrate by renaming the functions to `*_internal` and making them `pub(crate)` (or `pub`, if the tests are to live in `tests/`), and moving the `#[tauri::command]` to a wrapper function with the original function name.
Settings (2 tests):
- `test_atomic_write_creates_backup()` - tests `save_settings` Tauri command
- `test_save_settings_tauri_command()` - tests `save_settings` Tauri command
Status (3 tests):
- `test_get_backend_status_tauri_command()` - tests `get_backend_status` Tauri command
- `test_get_backend_status_command_uses_real_service()` - tests `get_backend_status` Tauri command
- `test_complete_status_communication_flow()` - tests `get_backend_status` Tauri command
Hotkey (1 test):
- `test_register_hot_key()` - tests `register_hot_key` Tauri command
Commands (2 tests):
- `test_check_model_availability()` - tests `check_model_availability` Tauri command
- `test_set_auto_launch()` - tests `set_auto_launch` Tauri command
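The wrapper approach suggested for these remaining tests might look like the sketch below. It is illustrative only: the `AppError` stand-in, the validation rules, and the synchronous signatures are assumptions, and the `#[tauri::command]` attribute is left as a comment so the snippet compiles without the `tauri` crate.

```rust
/// Hypothetical stand-in for the crate's real `AppError` type.
#[derive(Debug, PartialEq)]
pub enum AppError {
    InvalidHotKey(String),
}

/// Illustrative modifier names; the real validation rules may differ.
const MODIFIERS: [&str; 4] = ["Cmd", "Ctrl", "Alt", "Shift"];

/// Internal logic with no Tauri dependency, so a plain unit test can call it.
/// Accepts "<modifier>+...+<key>" with at least one modifier and exactly one key.
pub(crate) fn validate_hot_key_internal(hot_key: &str) -> Result<(), AppError> {
    let mut modifiers = 0;
    let mut keys = 0;
    for part in hot_key.split('+').map(str::trim) {
        if MODIFIERS.contains(&part) {
            modifiers += 1;
        } else if !part.is_empty() {
            keys += 1;
        } else {
            // An empty segment means a dangling or doubled '+'.
            return Err(AppError::InvalidHotKey(hot_key.to_owned()));
        }
    }
    if modifiers >= 1 && keys == 1 {
        Ok(())
    } else {
        Err(AppError::InvalidHotKey(hot_key.to_owned()))
    }
}

// #[tauri::command]  // in the real crate the attribute would sit here
/// Thin wrapper that keeps the original command name for the frontend.
fn validate_hot_key(hot_key: String) -> Result<(), AppError> {
    validate_hot_key_internal(&hot_key)
}
```

Because the wrapper only forwards its argument, testing the `_internal` function covers the command's behaviour without having to invoke the command through Tauri.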
✅ Phase 1 Complete - Ready for Phase 2
Phase 1 achieved a 77% migration rate and reduced `lib.rs` by ~500 lines of test code. The modular test structure is now in place, and all migrated tests pass.
Next Steps: Proceed to Phase 2: Extract Services
Phase 2: Extract Services (Medium Risk)
Objective: Move service structs and related functionality to dedicated modules
Task Checklist (Phase 2)
-
Create services module structure
-
Create
speakr-tauri/src/services/
directory -
Create
services/mod.rs
with module declarations -
Create
services/types.rs
for shared enums -
Create
services/hotkey.rs
for GlobalHotkeyService -
Create
services/status.rs
for BackendStatusService
-
Create
-
Extract ServiceComponent enum
-
Move
ServiceComponent
enum βservices/types.rs
- Add appropriate derives and documentation
-
Re-export from
services/mod.rs
-
Move
-
Extract GlobalHotkeyService
-
Move entire
GlobalHotkeyService
struct βservices/hotkey.rs
- Move all impl blocks and methods
- Add necessary imports (tauri, tracing, etc.)
-
Extract
register_global_hotkey()
implementation βservices/hotkey.rs
asregister_global_hotkey_internal()
-
Extract
unregister_global_hotkey()
implementation βservices/hotkey.rs
asunregister_global_hotkey_internal()
-
Keep
#[tauri::command]
wrappers inlib.rs
that call_internal
functions -
Make service and methods
pub(crate)
for module visibility
-
Move entire
-
Extract BackendStatusService
-
Move
BackendStatusService
struct βservices/status.rs
- Move all impl blocks and methods
-
Move
GLOBAL_BACKEND_SERVICE
static βservices/status.rs
-
Move
get_global_backend_service()
helper βservices/status.rs
-
Move
update_global_service_status()
helper βservices/status.rs
-
Extract
get_backend_status()
implementation βservices/status.rs
asget_backend_status_internal()
-
Extract
update_service_status()
implementation βservices/status.rs
asupdate_service_status_internal()
-
Keep
#[tauri::command]
wrappers inlib.rs
that call_internal
functions - Add necessary imports for Tauri AppHandle, etc.
-
Make all functions
pub(crate)
for module visibility -
Add
Default
implementation
-
Move
-
Update lib.rs imports and exports
-
Add
mod services;
tolib.rs
-
Add
use services::*;
or specific imports -
Remove original service implementations from
lib.rs
-
Update command registration in
run()
function
-
Add
- Test service extraction
  - Run `cargo check` to verify compilation
  - Run `cargo test --workspace` to ensure tests pass
  - Test hotkey registration functionality manually
  - Test status service functionality
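The status-service items above (a `Default` implementation, the `GLOBAL_BACKEND_SERVICE` static, and the two helper functions) could fit together roughly as follows. This is a minimal sketch using only the standard library: the `ServiceComponent` variants, the `ServiceStatus` enum, and the `update`/`get` methods are hypothetical stand-ins, since the real definitions are not shown in this checklist.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical component names; the real `ServiceComponent` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub(crate) enum ServiceComponent {
    AudioCapture,
    Transcription,
    TextInjection,
}

/// Hypothetical status values, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum ServiceStatus {
    Starting,
    Ready,
}

/// Tracks the latest status reported for each component.
#[derive(Debug, Default)]
pub(crate) struct BackendStatusService {
    statuses: HashMap<ServiceComponent, ServiceStatus>,
}

impl BackendStatusService {
    pub(crate) fn update(&mut self, component: ServiceComponent, status: ServiceStatus) {
        self.statuses.insert(component, status);
    }

    pub(crate) fn get(&self, component: ServiceComponent) -> Option<ServiceStatus> {
        self.statuses.get(&component).copied()
    }
}

/// Process-wide instance, created lazily on first access.
static GLOBAL_BACKEND_SERVICE: OnceLock<Mutex<BackendStatusService>> = OnceLock::new();

pub(crate) fn get_global_backend_service() -> &'static Mutex<BackendStatusService> {
    GLOBAL_BACKEND_SERVICE.get_or_init(|| Mutex::new(BackendStatusService::default()))
}

pub(crate) fn update_global_service_status(component: ServiceComponent, status: ServiceStatus) {
    let mut service = get_global_backend_service()
        .lock()
        .expect("status lock poisoned");
    service.update(component, status);
}
```

Keeping everything `pub(crate)` matches the visibility called for in this phase; a `#[tauri::command]` wrapper in `lib.rs` would only lock the mutex and delegate.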
Phase 3: Extract Settings (Medium Risk)
Objective: Centralize all settings management into dedicated module
Task Checklist (Phase 3)
- Create settings module structure
  - Create `speakr-tauri/src/settings/` directory
  - Create `settings/mod.rs` with module declarations
  - Create `settings/persistence.rs` for file I/O operations
  - Create `settings/migration.rs` for version migrations
  - Create `settings/validation.rs` for directory validation
  - Create `settings/commands.rs` for Tauri commands
- Extract path and validation functions
  - Move `get_settings_path()` → `settings/persistence.rs`
  - Move `get_settings_backup_path()` → `settings/persistence.rs`
  - Move `validate_settings_directory_permissions()` → `settings/validation.rs`
  - Add proper error handling and documentation
  - Make functions `pub(crate)` for module visibility
- Extract file I/O functions
  - Move `try_load_settings_file()` → `settings/persistence.rs`
  - Move `save_settings_to_dir()` → `settings/persistence.rs`
  - Move `load_settings_from_dir()` → `settings/persistence.rs`
  - Ensure all atomic write logic is preserved
  - Add proper error handling chains
  - Make private functions `pub(crate)` for module visibility
- Extract migration logic
  - Move `migrate_settings()` → `settings/migration.rs`
  - Add version handling logic
  - Document migration strategy for future versions
  - Make function `pub(crate)` for module visibility
- Extract Tauri commands
  - Extract `save_settings()` implementation → `settings/commands.rs` as `save_settings_internal()`
  - Extract `load_settings()` implementation → `settings/commands.rs` as `load_settings_internal()`
  - Keep `#[tauri::command]` wrappers in `lib.rs` that call `_internal` functions
  - Ensure internal functions use the extracted helper functions
  - Make internal functions `pub(crate)` for module visibility
  - Maintain same function signatures for compatibility
- Update module exports and imports
  - Configure `settings/mod.rs` to re-export public functions
  - Add `mod settings;` to `lib.rs`
  - Update imports in `lib.rs`
  - Remove original settings functions from `lib.rs`
- Test settings extraction thoroughly
  - Run isolated settings tests to ensure file I/O works
  - Test corruption recovery scenarios
  - Test migration scenarios with version 0 files
  - Verify atomic write behavior
  - Test with real application settings directory
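For reference when verifying atomic write behavior, a common way to get it is to write a sibling temporary file and rename it over the target. The sketch below assumes that approach; the function name `save_atomically` and the `.tmp` suffix are illustrative and not taken from the Speakr sources, whose actual logic lives in `save_settings_to_dir()`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `dir/file_name` by first writing a sibling temp file
/// and then renaming it over the target. On the same filesystem the rename
/// replaces the old file in one step, so a crash mid-write leaves the
/// previous settings file untouched.
pub fn save_atomically(dir: &Path, file_name: &str, contents: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let target = dir.join(file_name);
    let temp = dir.join(format!("{file_name}.tmp"));

    {
        let mut file = fs::File::create(&temp)?;
        file.write_all(contents.as_bytes())?;
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
    }

    fs::rename(&temp, &target)
}
```

A test for this behavior can save twice and check that only the final contents remain and that no `.tmp` file is left behind.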
Phase 4: Extract Debug and Audio (Low Risk)
Objective: Isolate debug and audio functionality into separate modules
Task Checklist (Phase 4)
- Create debug module structure
  - Create `speakr-tauri/src/debug/` directory
  - Create `debug/mod.rs` with conditional compilation
  - Create `debug/types.rs` for debug data structures
  - Create `debug/storage.rs` for static storage
  - Create `debug/commands.rs` for debug Tauri commands
- Extract debug types and storage
  - Move `DebugLogLevel` enum → `debug/types.rs`
  - Move `DebugLogMessage` struct → `debug/types.rs`
  - Move `DebugRecordingState` struct → `debug/types.rs`
  - Move `DEBUG_LOG_MESSAGES` static → `debug/storage.rs`
  - Move `DEBUG_RECORDING_STATE` static → `debug/storage.rs`
  - Move `add_debug_log()` function → `debug/storage.rs`
- Extract debug commands
  - Extract `debug_test_audio_recording()` implementation → `debug/commands.rs` as `debug_test_audio_recording_internal()`
  - Extract `debug_start_recording()` implementation → `debug/commands.rs` as `debug_start_recording_internal()`
  - Extract `debug_stop_recording()` implementation → `debug/commands.rs` as `debug_stop_recording_internal()`
  - Extract `debug_get_log_messages()` implementation → `debug/commands.rs` as `debug_get_log_messages_internal()`
  - Extract `debug_clear_log_messages()` implementation → `debug/commands.rs` as `debug_clear_log_messages_internal()`
  - Keep `#[tauri::command]` wrappers in `lib.rs` that call `_internal` functions
  - Move `get_debug_recordings_directory()` → `debug/commands.rs`
  - Make all extracted functions `pub(crate)` for module visibility
- Create audio module structure
  - Create `speakr-tauri/src/audio/` directory
  - Create `audio/mod.rs` with public interface
  - Create `audio/files.rs` for WAV file operations
  - Create `audio/recording.rs` for recording logic
- Extract audio file operations
  - Move `generate_audio_filename_with_timestamp()` → `audio/files.rs`
  - Move `save_audio_samples_to_wav_file()` → `audio/files.rs`
  - Make functions `pub(crate)` for module visibility
  - Add proper WAV spec configuration
  - Add file path validation
- Extract audio recording functions
  - Move `debug_record_audio_to_file()` → `audio/recording.rs`
  - Move `debug_record_real_audio_to_file()` → `audio/recording.rs`
  - Make functions `pub(crate)` for module visibility
  - Ensure proper integration with speakr-core AudioRecorder
- Update conditional compilation
  - Ensure `#[cfg(debug_assertions)]` is properly applied
  - Test that debug code is excluded from release builds (compilation successful)
  - Update command registration to handle debug commands conditionally
- Update lib.rs and test functionality
  - Add `mod debug;` and `mod audio;` to `lib.rs`
  - Update imports and re-exports
  - Remove original debug and audio functions from `lib.rs`
  - Test debug panel functionality in development mode (24/27 tests passing)
  - Test audio recording and file saving (integration tests passing)
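The conditional-compilation items in this phase rely on `#[cfg(debug_assertions)]`, which is set for debug builds and unset for `--release` builds. The following sketch shows one way to keep callers free of `cfg` attributes by providing a release-build twin of the debug function. The helper names are hypothetical; only the command names come from the checklists above.

```rust
/// Debug builds: expose the debug-only command names.
#[cfg(debug_assertions)]
pub fn debug_command_names() -> Vec<&'static str> {
    vec!["debug_start_recording", "debug_stop_recording"]
}

/// Release builds: the debug variant above is not compiled at all,
/// and this empty twin takes its place.
#[cfg(not(debug_assertions))]
pub fn debug_command_names() -> Vec<&'static str> {
    Vec::new()
}

/// Lists the command names this build would register.
pub fn registered_command_names() -> Vec<&'static str> {
    let mut names = vec!["save_settings", "load_settings"];
    names.extend(debug_command_names());
    names
}
```

Running the same check under `cargo test` and `cargo test --release` is a quick way to confirm that the debug entries disappear from release builds.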
SPEAKR-TAURI_LIB-RS_PHASE_5
Migration Notes: Phase 5 Refactor - Command Organisation
Overview
Phase 5 of the Speakr Tauri backend refactor extracted remaining commands into dedicated
modules and finalised the cleanup of `lib.rs`. This document provides guidance for developers
working with the new structure.
What Changed
Before (Pre-Phase 5)
- All command implementations lived in `lib.rs`
- The file exceeded 1,000 lines and mixed several concerns
- Commands, services, and business logic were intermingled
- Business logic could only be tested through the Tauri command wrappers
After (Phase 5 Complete)
- Commands organised into functional modules under `commands/`
- Each command has an `*_internal()` function with business logic
- Tauri command wrappers remain in `lib.rs` for registration
- `lib.rs` reduced to ~400 lines, focused on configuration and integration
New File Structure
    speakr-tauri/src/
    ├── commands/
    │   ├── mod.rs          # Command organisation and documentation
    │   ├── validation.rs   # Input validation commands
    │   ├── system.rs       # System integration commands
    │   └── legacy.rs       # Backward compatibility commands
    ├── services/           # (From previous phases)
    │   ├── mod.rs
    │   ├── hotkey.rs
    │   ├── status.rs
    │   └── types.rs
    ├── settings/           # (From previous phases)
    ├── debug/              # (From previous phases)
    ├── audio/              # (From previous phases)
    └── lib.rs              # Tauri integration and command registration
Command Implementation Pattern
New Pattern (Recommended)
    // In commands/validation.rs
    pub async fn validate_hot_key_internal(hot_key: String) -> Result<(), AppError> {
        // Business logic here
        Ok(())
    }

    // In lib.rs
    #[tauri::command]
    async fn validate_hot_key(hot_key: String) -> Result<(), AppError> {
        validate_hot_key_internal(hot_key).await
    }
Key Benefits
- Testability: Internal functions can be tested without Tauri overhead
- Modularity: Commands grouped by functional domain
- Maintainability: Business logic separated from framework concerns
- Documentation: Each module has focused documentation
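To make the testability benefit concrete, here is a rough, synchronous sketch of what the business logic behind `validate_hot_key_internal` might look like with no Tauri types involved. The validation rule (one or more modifiers plus exactly one key, joined by `+`), the modifier list, and the single-variant `AppError` are assumptions for illustration; the real function is `async`, takes a `String`, and uses `speakr_types::AppError`.

```rust
/// Local stand-in for `speakr_types::AppError`, reduced to one variant.
#[derive(Debug, PartialEq)]
pub enum AppError {
    HotKey(String),
}

/// Hypothetical rule: at least one modifier plus exactly one non-modifier key,
/// joined with `+` (for example `Ctrl+Shift+Space`).
pub fn validate_hot_key_internal(hot_key: &str) -> Result<(), AppError> {
    const MODIFIERS: [&str; 5] = ["Ctrl", "Alt", "Shift", "Cmd", "CmdOrCtrl"];

    let parts: Vec<&str> = hot_key.split('+').map(str::trim).collect();
    if parts.iter().any(|part| part.is_empty()) {
        return Err(AppError::HotKey(format!("Malformed hot-key: {hot_key:?}")));
    }

    let modifier_count = parts
        .iter()
        .filter(|part| MODIFIERS.contains(*part))
        .count();
    let key_count = parts.len() - modifier_count;

    if modifier_count == 0 || key_count != 1 {
        return Err(AppError::HotKey(format!(
            "Expected modifier(s) plus one key, got {hot_key:?}"
        )));
    }
    Ok(())
}
```

Because the function depends only on its input, a plain unit test can exercise every branch without starting a Tauri runtime.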
Working with Commands
Adding a New Command
1. Choose the appropriate module (`validation`, `system`, or `legacy`).
2. Implement the internal function:

       pub async fn my_command_internal(param: String) -> Result<T, AppError> {
           // Implementation here
       }

3. Add a Tauri wrapper in `lib.rs`:

       #[tauri::command]
       async fn my_command(param: String) -> Result<T, AppError> {
           my_command_internal(param).await
       }

4. Register it in the `run()` function:

       .invoke_handler(tauri::generate_handler![
           // ... existing commands,
           my_command
       ])

5. Add comprehensive tests for the internal function.
Command Module Guidelines
- `validation.rs`: Input validation, sanitisation, format checking
- `system.rs`: OS integration, file system, auto-launch, model availability
- `legacy.rs`: Deprecated or backward-compatibility commands
Testing Commands
    // Test the internal function directly
    #[tokio::test]
    async fn test_my_command_internal() {
        let result = my_command_internal("test".to_string()).await;
        assert!(result.is_ok());
    }
Breaking Changes
Import Changes
Commands moved from `crate::*` to `crate::commands::*`:

    // Old (no longer works)
    use crate::validate_hot_key_internal;

    // New
    use crate::commands::validation::validate_hot_key_internal;
Function Visibility
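Internal functions changed from `pub(crate)` to `pub` so they can be reached from outside the crate, for example from integration tests and documentation examples (`pub(crate)` already permits cross-module access within the crate):

    // Old
    pub(crate) async fn validate_hot_key_internal(...) -> ...

    // New
    pub async fn validate_hot_key_internal(...) -> ...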
Error Handling
Consistent Error Types
All commands use `speakr_types::AppError` for error handling:

    pub enum AppError {
        HotKey(String),
        Settings(String),
        FileSystem(String),
        // ... other variants
    }
Error Context
Add context to errors for better debugging:
    Err(AppError::Settings(format!("Invalid model size: {model_size}")))
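The same idea applies when wrapping lower-level errors: `map_err` can attach the failing path before the error is converted into an `AppError`. The sketch below uses a local two-variant stand-in for `AppError` and a hypothetical helper named `read_settings_text`; neither is taken from the Speakr sources.

```rust
use std::fs;
use std::path::Path;

/// Local stand-in for two of the `AppError` variants.
#[derive(Debug)]
pub enum AppError {
    Settings(String),
    FileSystem(String),
}

/// Reads a settings file, attaching the path to any failure.
pub fn read_settings_text(path: &Path) -> Result<String, AppError> {
    // Convert the I/O error into `AppError::FileSystem`, keeping the path
    // and the original error message for debugging.
    let text = fs::read_to_string(path).map_err(|err| {
        AppError::FileSystem(format!("Failed to read {}: {err}", path.display()))
    })?;

    if text.trim().is_empty() {
        return Err(AppError::Settings(format!(
            "Settings file {} is empty",
            path.display()
        )));
    }
    Ok(text)
}
```

With the path in the message, a failure report identifies which file was involved without needing a debugger.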
Documentation Standards
Function Documentation
All public functions must have rustdoc comments:
    /// Brief description of what the function does.
    ///
    /// # Arguments
    ///
    /// * `param` - Description of the parameter
    ///
    /// # Returns
    ///
    /// Description of what is returned.
    ///
    /// # Errors
    ///
    /// Conditions that cause errors.
    ///
    /// # Examples
    ///
    /// ```rust,no_run
    /// use speakr_lib::commands::validation::validate_hot_key_internal;
    /// // Example usage
    /// ```
    pub async fn my_function_internal(param: String) -> Result<(), AppError> {
        // Implementation
    }
Module Documentation
Each module should have comprehensive documentation explaining its purpose and usage patterns.
Testing Strategy
Unit Tests
- Test internal functions directly (not through Tauri wrappers)
- Use test isolation patterns for file system operations
- Mock external dependencies where possible
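One possible test isolation pattern for file system operations is a scratch directory that is unique per test and removed on drop. The `TestDir` type below is a hypothetical helper built on the standard library only; the project may use a different mechanism, such as the `tempfile` crate.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// A uniquely named scratch directory that is deleted when dropped.
pub struct TestDir {
    path: PathBuf,
}

impl TestDir {
    pub fn new(label: &str) -> std::io::Result<Self> {
        // Process id plus a counter keeps parallel tests from colliding.
        static COUNTER: AtomicU64 = AtomicU64::new(0);
        let unique = COUNTER.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir()
            .join(format!("speakr_{label}_{}_{unique}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TestDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so that a failed delete
        // cannot mask the actual test failure.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

A test creates a `TestDir`, passes `dir.path()` to the function under test, and lets the value fall out of scope to clean up.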
Test Organisation
Tests live alongside code in `mod tests` blocks:

    #[cfg(test)]
    mod tests {
        use super::*;

        #[tokio::test]
        async fn test_function_success() {
            // Test implementation
        }
    }
Backward Compatibility
Legacy Support
Commands in `legacy.rs` maintain backward compatibility but should be considered deprecated
for new development.
Deprecation Path
When deprecating commands:
1. Move to `legacy.rs`
2. Add deprecation notice in documentation
3. Provide migration path in rustdoc
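Rust's built-in `#[deprecated]` attribute can carry the deprecation notice and the migration hint together, and the compiler then warns at each call site. The function names and model sizes below are hypothetical examples, not actual Speakr commands.

```rust
/// Current entry point (hypothetical name and rule).
pub fn check_model_availability_internal(model_size: &str) -> bool {
    matches!(model_size, "small" | "medium" | "large")
}

/// Legacy entry point kept for older callers.
///
/// # Migration
///
/// Call `check_model_availability_internal` instead; the behaviour is identical.
#[deprecated(note = "use `check_model_availability_internal` instead")]
pub fn check_model_internal(model_size: &str) -> bool {
    check_model_availability_internal(model_size)
}
```

Callers that cannot migrate yet can silence the warning locally with `#[allow(deprecated)]`.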
Performance Considerations
Command Overhead
The new pattern adds minimal overhead:
- Internal functions: Direct function calls
- Tauri wrappers: Thin delegation layer
Memory Usage
- Internal functions can be tested in isolation without Tauri runtime
- Reduced memory usage during testing
- Better compiler optimisations due to cleaner module boundaries
Common Patterns
Input Validation
    pub async fn validate_input_internal(input: String) -> Result<(), AppError> {
        let input = input.trim();
        if input.is_empty() {
            return Err(AppError::Settings("Input cannot be empty".to_string()));
        }
        // Additional validation...
        Ok(())
    }
File System Operations
    pub async fn check_file_internal(path: String) -> Result<bool, AppError> {
        // `Path::exists` already returns a bool, so no `match` is needed.
        Ok(std::path::Path::new(&path).exists())
    }
Error Propagation
    pub async fn complex_operation_internal(input: String, path: String) -> Result<bool, AppError> {
        // `?` returns early with the first error encountered.
        validate_input_internal(input).await?;
        let file_exists = check_file_internal(path).await?;
        // Process results...
        Ok(file_exists)
    }
Future Development
Adding New Modules
If the `commands/` directory grows too large, consider:
- Creating subdirectories for related commands
- Grouping by feature area rather than technical function
- Maintaining the `*_internal` + wrapper pattern
Architectural Evolution
The current pattern supports:
- Easy migration to other frameworks (business logic is framework-agnostic)
- Microservice extraction (internal functions are self-contained)
- Enhanced testing strategies (direct function testing)
Troubleshooting
Common Issues
- Import errors: Check if function moved to new module
- Visibility errors: Internal functions are now `pub`, not `pub(crate)`
- Test failures: Update imports in test files
- Documentation tests: Use `speakr_lib` as crate name, not `speakr_tauri`
Migration Checklist
When updating code that depends on the old structure:
- Update imports to new module paths
- Change function visibility if needed
- Update test imports and assertions
- Fix documentation examples with correct crate name
- Verify error handling uses `AppError` consistently
Last Updated: Phase 5 Complete
For questions about this refactor, see the original planning documents in docs/refactor/
Rust Documentation Tracking
Instructions
Original Prompt: Add detailed comments to all functions etc in the files and clean up each file, remove orphaned comments, group code logically (e.g. tauri::commands together) and add large comment signposts to help navigate the file easily.
Documentation Standards:
- Add detailed rustdoc comments to all functions, commands, and relevant items
- Remove orphaned or outdated comments
- Group code logically with clear comment signposts for easy navigation
- Ensure all public items are fully documented, including parameters, errors, and usage examples where appropriate
- Use large comment blocks (e.g., `// ============================================================================`) for major sections
- Use smaller comment dividers (e.g., `// --------------------------------------------------------------------------`) for individual functions
- Follow Rust documentation best practices and project coding standards
Progress Tracking
1. Select an UNCHECKED `[ ]` item from the list.
2. IMMEDIATELY add a progress indicator to the item: `[~]`
3. Comment the file following the instructions in this document.
4. On COMPLETION, add a checkmark to the item in the list: `[x]`
5. Verify your changes using `precommit run ...` (formats, lints and runs tests)
6. Fix any errors or warnings and repeat step 5 until no errors or warnings remain
7. Commit your changes to Git.
8. Return to step 1 until all items are checked.
speakr-core/src/
- `lib.rs` ✅ COMPLETED
- `audio/mod.rs`
- `model/mod.rs`
- `model/list.rs`
- `model/list_updater.rs`
- `model/list_tests.rs`
- `model/metadata.rs`
- `bin/update_models.rs`
- `bin/update_models_tui.rs`
speakr-tauri/src/
- `lib.rs` ✅ COMPLETED
- `main.rs`
- `audio/mod.rs`
- `audio/files.rs`
- `audio/recording.rs`
- `commands/mod.rs`
- `commands/legacy.rs`
- `commands/system.rs`
- `commands/validation.rs`
- `debug/mod.rs`
- `debug/commands.rs`
- `debug/storage.rs`
- `debug/types.rs`
- `services/mod.rs`
- `services/hotkey.rs`
- `services/status.rs`
- `services/types.rs`
- `settings/mod.rs`
- `settings/commands.rs`
- `settings/migration.rs`
- `settings/persistence.rs`
- `settings/validation.rs`
speakr-types/src/
- `lib.rs` ✅ COMPLETED
speakr-ui/src/
- `lib.rs`
- `app.rs`
- `debug.rs`
- `settings.rs`
Test Files
speakr-core/tests/
- `audio_capture.rs`
speakr-tauri/tests/
- `audio_tests.rs`
- `commands_tests.rs`
- `debug_save.rs`
- `global_hotkey.rs`
- `hotkey_tests.rs`
- `integration_tests.rs`
- `settings_tests.rs`
- `status_tests.rs`
Comment Style Examples
Use these exact patterns for consistency across all files:
Comment Hierarchy Structure
    graph TD
        A["File Level<br/>============================================================================<br/>//! Module Documentation<br/>============================================================================"] --> B["Major Section<br/>============================================================================<br/>// Section Name<br/>============================================================================"]
        B --> C["Subsection<br/>// =========================<br/>// Subsection Name<br/>// ========================="]
        C --> D["Function/Item<br/>// --------------------------------------------------------------------------<br/>/// Function documentation<br/>/// # Arguments, # Returns, # Errors<br/>#[tauri::command]<br/>async fn function_name()"]
        D --> E["Implementation<br/>// Regular comments<br/>// explaining logic"]
        F["End of File<br/>// ==========================================================================="]
        B --> G["Module Declarations<br/>// =========================<br/>// Module Declarations<br/>// =========================<br/>pub mod commands;"]
        B --> H["External Imports<br/>// =========================<br/>// External Imports<br/>// =========================<br/>use tauri::AppHandle;"]

        classDef fileLevel fill:#E5F5E0,stroke:#31A354,color:#31A354
        classDef majorSection fill:#E6E6FA,stroke:#756BB1,color:#756BB1
        classDef subsection fill:#EFF3FF,stroke:#9ECAE1,color:#3182BD
        classDef function fill:#FFF5EB,stroke:#FD8D3C,color:#E6550D
        classDef implementation fill:#F2F0F7,stroke:#BCBDDC,color:#756BB1
        classDef endFile fill:#E5E1F2,stroke:#C7C0DE,color:#8471BF
        classDef modules fill:#EAF5EA,stroke:#C6E7C6,color:#77AD77

        class A fileLevel
        class B majorSection
        class C subsection
        class D function
        class E implementation
        class F endFile
        class G,H modules
File-Level Documentation
    // ============================================================================
    //! Module name and purpose.
    //!
    //! This module provides functionality for:
    //! - Feature 1
    //! - Feature 2
    //! - Feature 3
    // ============================================================================
Major Section Dividers
    // ============================================================================
    // Section Name (e.g., "Tauri Command Definitions")
    // ============================================================================
Subsection Headers
    // =========================
    // Subsection Name (e.g., "Debug Commands (Debug Only)")
    // =========================
Function/Item Dividers
    // --------------------------------------------------------------------------
    /// Function description with full rustdoc.
    ///
    /// # Arguments
    /// * `param` - Parameter description
    ///
    /// # Returns
    /// Returns description.
    ///
    /// # Errors
    /// Error conditions.
    ///
    /// # Examples
    /// ```no_run
    /// // Usage example
    /// ```
    #[tauri::command]
    async fn function_name() -> Result<(), AppError> {
        // Implementation
    }
Module Declarations Section
    // =========================
    // Module Declarations
    // =========================
    pub mod commands;
    pub mod services;
    // etc.
Import Section
    // =========================
    // External Imports
    // =========================
    use std::collections::HashMap;
    use tauri::{AppHandle, Manager};
    // etc.
Setup/Initialization Comments
    // =========================
    // Initial Setup (Description of what's being set up)
    // =========================
End-of-File Marker
    // ===========================================================================
Rustdoc Comment Patterns
Standard Function Documentation
    /// Brief one-line description of what the function does.
    ///
    /// More detailed explanation if needed, including behavior,
    /// side effects, and important implementation details.
    ///
    /// # Arguments
    /// * `param1` - Description of first parameter
    /// * `param2` - Description of second parameter
    ///
    /// # Returns
    /// Description of return value and what it represents.
    ///
    /// # Errors
    /// Description of when and why the function might return an error.
    ///
    /// # Examples
    /// ```no_run
    /// let result = function_name(param1, param2)?;
    /// assert_eq!(result, expected_value);
    /// ```
Tauri Command Documentation
    /// Brief description of the command's purpose.
    ///
    /// # Arguments
    /// * `param` - Parameter description
    ///
    /// # Returns
    /// Returns `Ok(())` on success.
    ///
    /// # Errors
    /// Returns `AppError` if the operation fails.
    ///
    /// # Example
    /// ```no_run
    /// // In frontend: invoke('command_name', { param })
    /// ```
Debug-Only Function Documentation
    /// Debug: Brief description of debug functionality.
    ///
    /// This function is only available in debug builds.
Module Documentation
    //! Module name and purpose.
    //!
    //! This module provides [specific functionality] for the Speakr application:
    //! - Feature/capability 1
    //! - Feature/capability 2
    //! - Feature/capability 3
    //!
    //! # Usage
    //! Brief usage example or important notes.
Documentation Checklist Template
For each file, ensure:
- File-level documentation: Module-level rustdoc comment explaining purpose and contents
- Function documentation: All public functions have comprehensive rustdoc
  - Purpose and behavior description
  - Parameters documented with `# Arguments`
  - Return values documented with `# Returns`
  - Error conditions documented with `# Errors`
  - Usage examples where appropriate with `# Examples`
- Type documentation: All public structs, enums, and traits documented
- Large comment signposts: Major sections clearly marked
- Code organization: Related code grouped logically
- Orphaned comments: Removed outdated or irrelevant comments
- Formatting: Consistent with rustfmt standards
- Testing: Code compiles and tests pass after changes
Priority Order
1. High Priority (Core functionality):
   - `speakr-types/src/lib.rs` (shared types)
   - `speakr-core/src/lib.rs` (core functionality)
   - `speakr-tauri/src/main.rs` (application entry)
2. Medium Priority (Services and commands):
   - `speakr-tauri/src/services/*` (service modules)
   - `speakr-tauri/src/commands/*` (command modules)
   - `speakr-tauri/src/settings/*` (settings modules)
3. Lower Priority (Supporting modules):
   - `speakr-tauri/src/audio/*` (audio modules)
   - `speakr-tauri/src/debug/*` (debug modules)
   - `speakr-ui/src/*` (UI modules)
   - `speakr-core/src/model/*` (model modules)
4. Test Files (Documentation focused on test clarity):
   - All test files in `tests/` directories
Notes
- Completed: `speakr-tauri/src/lib.rs` - Comprehensive documentation added with clear sections and detailed rustdoc comments
- Next Target: Recommend starting with `speakr-types/src/lib.rs` as it contains shared types used across the project
- Testing: Always run `cargo fmt`, `cargo clippy`, and `cargo test` before committing changes
- Commit Strategy: Document and commit files in logical groups (e.g., all service files together)