FR-3: Transcription
Offline transcription of recorded audio to text using Whisper.
Requirement
- Use whisper-rs to run Whisper models (GGML format, as loaded by whisper.cpp) entirely on-device.
- Default language: English (en). Allow user language selection in Settings.
- Transcription must complete within 3 s (95th percentile) for 5-second recordings on Apple Silicon with the small model.
- Support user-selectable model sizes for latency/accuracy trade-off.
- No external network calls during transcription.
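The language and model-size settings above could be modeled as in the Rust sketch below. `ModelSize`, `TranscriptionSettings`, and `model_path` are hypothetical names. The `ggml-<size>.bin` file names follow whisper.cpp's usual naming but are an assumption here. Defaulting to the `small` model, the one named in the latency target, is also an assumption, because the spec states no default model.

```rust
use std::path::{Path, PathBuf};

/// Model sizes offered in Settings (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelSize {
    Tiny,
    Base,
    Small,
    Medium,
}

impl ModelSize {
    /// File name under an assumed whisper.cpp-style naming convention.
    pub fn file_name(self) -> &'static str {
        match self {
            ModelSize::Tiny => "ggml-tiny.bin",
            ModelSize::Base => "ggml-base.bin",
            ModelSize::Small => "ggml-small.bin",
            ModelSize::Medium => "ggml-medium.bin",
        }
    }
}

/// User-facing transcription settings (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TranscriptionSettings {
    /// Language code handed to the decoder, e.g. "en".
    pub language: String,
    /// Selected model size (latency/accuracy trade-off).
    pub model: ModelSize,
}

impl Default for TranscriptionSettings {
    fn default() -> Self {
        // English per FR-3; `Small` is an assumed default, not stated in the spec.
        Self { language: "en".to_string(), model: ModelSize::Small }
    }
}

impl TranscriptionSettings {
    /// Resolves the model file inside a local directory; no network access involved.
    pub fn model_path(&self, models_dir: &Path) -> PathBuf {
        models_dir.join(self.model.file_name())
    }
}
```

Resolving models from a local directory keeps model selection offline. For example, `TranscriptionSettings::default().model_path(Path::new("/models"))` yields `/models/ggml-small.bin`.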
Rationale
On-device inference preserves privacy and eliminates network latency, fulfilling the product's privacy-first promise.
Acceptance Criteria
- Transcription completes within the latency budget on M1 and Intel reference machines.
- Selecting a different model in Settings updates the engine without restart.
- Packet capture shows no outbound network traffic during transcription.
- Errors (e.g., a missing model) surface in the UI overlay/log with an actionable message.
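One way to meet the hot-swap and error-reporting criteria is to keep the active engine behind a shared, swappable handle. The sketch below uses only the standard library and is hypothetical throughout. `Engine` is a stand-in for a loaded whisper-rs context. `EngineSlot`, `switch_model`, `TranscriptionError`, and the error text are illustrative names, not part of whisper-rs.

```rust
use std::fmt;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// Errors meant for the UI overlay and log (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum TranscriptionError {
    ModelMissing(PathBuf),
}

impl fmt::Display for TranscriptionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Actionable: says what is wrong and what the user can do.
            TranscriptionError::ModelMissing(p) => write!(
                f,
                "Model file not found at {}. Download it or choose another model in Settings.",
                p.display()
            ),
        }
    }
}

/// Stand-in for a loaded Whisper context (a real build would wrap whisper-rs here).
#[derive(Debug)]
pub struct Engine {
    pub model_path: PathBuf,
}

impl Engine {
    pub fn load(model_path: &Path) -> Result<Self, TranscriptionError> {
        if !model_path.is_file() {
            return Err(TranscriptionError::ModelMissing(model_path.to_path_buf()));
        }
        // Actual model loading would happen here.
        Ok(Engine { model_path: model_path.to_path_buf() })
    }
}

/// Shared handle so a Settings change can replace the engine without a restart.
#[derive(Clone, Default)]
pub struct EngineSlot(Arc<RwLock<Option<Arc<Engine>>>>);

impl EngineSlot {
    /// Loads the new model first and swaps it in only on success,
    /// so a failed switch leaves the previous engine in place.
    pub fn switch_model(&self, model_path: &Path) -> Result<(), TranscriptionError> {
        let engine = Arc::new(Engine::load(model_path)?);
        *self.0.write().unwrap() = Some(engine);
        Ok(())
    }

    /// Returns the current engine; in-flight jobs keep their own `Arc` clone.
    pub fn current(&self) -> Option<Arc<Engine>> {
        self.0.read().unwrap().clone()
    }
}
```

Loading before swapping means a failed switch (for example, a missing file) keeps the previous model active. The error's `Display` text can be shown in the overlay and written to the log unchanged.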
Test-Driven Design
Begin with failing automated tests for latency, language selection, and network isolation, then implement transcription until all tests pass.
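A test-first latency check could start from a helper like the one below. This sketch rests on two assumptions. First, the spec does not say how the 95th percentile is computed, so the nearest-rank method is used here. Second, `transcribe` is a hypothetical closure that would wrap the real engine. `percentile` and `meets_latency_budget` are illustrative names.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile (`p` in 0..=100) over measured durations.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Rank = ceil(p * n / 100), at least 1; computed so integer p gives exact ranks.
    let rank = (p * sorted.len() as f64 / 100.0).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}

/// Times `transcribe` on each clip and checks that p95 latency does not exceed `budget`.
pub fn meets_latency_budget<F>(clips: &[Vec<f32>], budget: Duration, mut transcribe: F) -> bool
where
    F: FnMut(&[f32]) -> String,
{
    let samples: Vec<Duration> = clips
        .iter()
        .map(|clip| {
            let start = Instant::now();
            let _text = transcribe(clip.as_slice());
            start.elapsed()
        })
        .collect();
    percentile(&samples, 95.0).map_or(false, |p95| p95 <= budget)
}
```

In the real test, `clips` would be recorded 5-second samples (16 kHz mono `f32` PCM, the input whisper.cpp expects). The closure would call the engine, and the assertion would use `Duration::from_secs(3)` on each reference machine.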
References
PRD §6 Functional Requirements – FR-3