Talk into your phone; get back a clean transcript, a short summary, and auto-extracted to-dos. A voice memo that files itself.
Next to-dos
- Decide on-device vs cloud transcription default — privacy vs accuracy on long notes
- Solve the iOS background-audio permission flow
- Multi-speaker memos — diarization or punt for v1?
- Pick the default destination for extracted to-dos (Reminders vs in-app)
- Answer the existential question: how is this better than Otter?
Recent activity
- To-do added — Whisper.cpp transcription working on-device · 5 hours ago
- To-do added — Extraction prompt returning clean JSON (summary + action items) · 5 hours ago
- To-do added — Decide on-device vs cloud transcription default — privacy vs accuracy... · 5 hours ago
- To-do added — Solve the iOS background-audio permission flow · 5 hours ago
- To-do added — Multi-speaker memos — diarization or punt for v1? · 5 hours ago
- To-do added — Pick the default destination for extracted to-dos (Reminders vs in-app... · 5 hours ago
- To-do added — Answer the existential question: how is this better than Otter? · 5 hours ago
- Created project · 5 hours ago
Design doc
AI Voice Notes — design doc
What it is: A voice-memo app that turns rambling out-loud thinking into something useful — a clean transcript, a 3-line summary, and a list of action items — without you touching the keyboard.
The problem it solves
Voice memos pile up and never get listened to again. The value is trapped in audio. This extracts it: record → transcribe → summarize → pull out the to-dos → drop them somewhere you'll actually see them.
User flow
- Tap record, talk, tap stop.
- Transcript appears in seconds (on-device first, cloud fallback for long notes).
- A summary + extracted action items render below.
- One tap sends the action items to your task app (or this Latrop project, naturally).
Approach
- Transcription: Whisper (small or distil variant) on-device where possible; cloud for recordings longer than 2 minutes.
- Summary + to-dos: one LLM pass with a strict JSON schema (summary, bullets, action_items[]).
- Storage: audio + transcript + extraction stored together; searchable.
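The on-device vs cloud split in the transcription step could be sketched roughly like this. It is a language-neutral illustration in Python, not the app's Swift code. The 2-minute cutoff comes from the list above; the `Memo` type, the `choose_backend` function, and the `cloud_allowed` flag (a stand-in for whatever the privacy default ends up being) are hypothetical names.

```python
from dataclasses import dataclass

# The 2-minute cutoff is from the Approach list; all names here are hypothetical.
CLOUD_THRESHOLD_SECONDS = 120


@dataclass(frozen=True)
class Memo:
    duration_seconds: float


def choose_backend(memo: Memo, cloud_allowed: bool = True) -> str:
    """Return "cloud" for long recordings when cloud use is allowed, else "on_device"."""
    if memo.duration_seconds > CLOUD_THRESHOLD_SECONDS and cloud_allowed:
        return "cloud"
    # Short notes, or any note when the user has opted out of cloud, stay local.
    return "on_device"
```

Keeping the decision in one small function means the still-open privacy vs accuracy default only changes the value passed as `cloud_allowed`, not the call sites.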
Stack
Swift / SwiftUI (iOS first) · whisper.cpp · an LLM endpoint for extraction · local SQLite.
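A minimal sketch of the "stored together; searchable" idea on local SQLite, written in Python for brevity rather than in the app's Swift. The table name, column names, and helper functions are all assumptions made for illustration, and the search is a plain `LIKE` match rather than a real full-text index.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the memo store. One row holds audio path, transcript, and extraction."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS memos (
            id INTEGER PRIMARY KEY,
            audio_path TEXT NOT NULL,
            transcript TEXT NOT NULL,
            extraction_json TEXT NOT NULL
        )
        """
    )
    return conn


def save_memo(conn: sqlite3.Connection, audio_path: str, transcript: str, extraction: dict) -> int:
    """Insert one memo and return its row id. The extraction is stored as a JSON string."""
    cur = conn.execute(
        "INSERT INTO memos (audio_path, transcript, extraction_json) VALUES (?, ?, ?)",
        (audio_path, transcript, json.dumps(extraction)),
    )
    conn.commit()
    return cur.lastrowid


def search_memos(conn: sqlite3.Connection, term: str) -> list:
    """Return (id, transcript) rows whose transcript contains the term (substring match)."""
    rows = conn.execute(
        "SELECT id, transcript FROM memos WHERE transcript LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

Storing the audio as a file path and the extraction as a JSON string keeps each memo to a single row. If substring search turns out to be too slow or too crude, a full-text index would be the natural next step.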
Open questions
- On-device vs cloud transcription default — privacy vs. accuracy on long notes?
- How to handle multi-speaker memos — diarization, or ignore for v1?
- Where do extracted to-dos go by default — Reminders, or stay in-app?
Why it's parked
Whisper on-device works; the extraction prompt is solid. Stalled on the iOS background-audio permissions dance and a lingering "is this just a worse Otter?" existential question. Shelved, not dead.
Decision log
- 2026-04-12 — Extraction must return strict JSON (summary / bullets / action_items) so the UI is dumb.
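One way to enforce that decision is to validate the LLM output before the UI ever sees it. Here is a hedged sketch in Python, not the app's Swift. The three key names come from the decision above; treating `bullets` and `action_items` as lists of strings is an assumption, and `parse_extraction` is a hypothetical name.

```python
import json

# Key names are from the decision log; the value types are assumptions.
REQUIRED_FIELDS = {"summary": str, "bullets": list, "action_items": list}


def parse_extraction(raw: str) -> dict:
    """Parse and validate the extraction payload, raising ValueError on any deviation."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        diff = sorted(set(data) ^ set(REQUIRED_FIELDS))
        raise ValueError(f"missing or unexpected keys: {diff}")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be of type {expected.__name__}")
    for key in ("bullets", "action_items"):
        if not all(isinstance(item, str) for item in data[key]):
            raise ValueError(f"{key} must contain only strings")
    return data
```

Rejecting anything off-schema at this boundary is what lets the UI stay dumb: it can render `summary`, `bullets`, and `action_items` without defensive checks of its own.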