Next to-dos

Decide on-device vs cloud transcription default — privacy vs accuracy on long notes
Solve the iOS background-audio permission flow
Multi-speaker memos — diarization or punt for v1?
Pick the default destination for extracted to-dos (Reminders vs in-app)
Answer the existential question: how is this better than Otter?

Recent activity

To-do added — Whisper.cpp transcription working on-device · 5 hours ago
To-do added — Extraction prompt returning clean JSON (summary + action items) · 5 hours ago
To-do added — Decide on-device vs cloud transcription default — privacy vs accuracy... · 5 hours ago
To-do added — Solve the iOS background-audio permission flow · 5 hours ago
To-do added — Multi-speaker memos — diarization or punt for v1? · 5 hours ago
To-do added — Pick the default destination for extracted to-dos (Reminders vs in-app... · 5 hours ago
To-do added — Answer the existential question: how is this better than Otter? · 5 hours ago
Created project · 5 hours ago

Design doc

AI Voice Notes — design doc

What it is: A voice-memo app that turns rambling out-loud thinking into something useful — a clean transcript, a 3-line summary, and a list of action items — without you touching the keyboard.

The problem it solves

Voice memos pile up and never get listened to again. The value is trapped in audio. This extracts it: record → transcribe → summarize → pull out the to-dos → drop them somewhere you'll actually see them.

User flow

Tap record, talk, tap stop.
Transcript appears in seconds (on-device first, cloud fallback for notes longer than 2 min).
A summary + extracted action items render below.
One tap sends the action items to your task app (or this Latrop project, naturally).
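
The flow above can be sketched as two injected stages, transcribe then extract. This is only an illustration under stated assumptions: the type names (MemoPipeline, MemoExtraction, ProcessedMemo) are hypothetical, and the closures stand in for whisper.cpp and the LLM endpoint, neither of which is called here.

```swift
import Foundation

// Hypothetical result types; the names are illustrative, not from the design doc.
struct MemoExtraction: Equatable {
    let summary: String
    let actionItems: [String]
}

struct ProcessedMemo: Equatable {
    let transcript: String
    let extraction: MemoExtraction
}

// Each stage is injected as a closure, so the real transcriber and the
// real extraction call can be replaced by stubs in tests.
struct MemoPipeline {
    let transcribe: (URL) throws -> String
    let extract: (String) throws -> MemoExtraction

    // Runs the stages in order: audio file -> transcript -> summary + action items.
    func process(audioFile: URL) throws -> ProcessedMemo {
        let transcript = try transcribe(audioFile)
        let extraction = try extract(transcript)
        return ProcessedMemo(transcript: transcript, extraction: extraction)
    }
}
```

Keeping the stages as plain closures means the on-device vs cloud choice and the task-app destination can change without touching the pipeline itself.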

Approach

Transcription: Whisper (small/distil) on-device where possible; cloud for recordings longer than 2 min.
Summary + to-dos: one LLM pass with a strict JSON schema (summary, bullets, action_items[]).
Storage: audio, transcript, and extraction stored together in local SQLite; searchable.
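
A minimal sketch of the transcription routing rule, assuming ">2 min" means strictly longer than 120 seconds. The names chooseBackend and cloudThresholdSeconds are hypothetical, and the cloudAllowed flag is an assumed privacy switch; the on-device vs cloud default is still an open question in this doc.

```swift
import Foundation

enum TranscriptionBackend: Equatable {
    case onDevice   // e.g. whisper.cpp with a small/distil model
    case cloud
}

// Assumption: the ">2 min" rule is read as "strictly longer than 120 seconds".
let cloudThresholdSeconds: TimeInterval = 120

// Picks a backend from the recording length.
// cloudAllowed is a hypothetical privacy switch: when false, audio never leaves the device.
func chooseBackend(durationSeconds: TimeInterval,
                   cloudAllowed: Bool = true) -> TranscriptionBackend {
    guard cloudAllowed else { return .onDevice }
    return durationSeconds > cloudThresholdSeconds ? .cloud : .onDevice
}
```

With this rule a 45-second memo stays on-device, a 10-minute memo goes to the cloud, and turning cloudAllowed off keeps everything local regardless of length.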

Stack

Swift / SwiftUI (iOS first) · whisper.cpp · an LLM endpoint for extraction · local SQLite.

Open questions

On-device vs cloud transcription default — privacy vs. accuracy on long notes?
How to handle multi-speaker memos — diarization, or ignore for v1?
Where do extracted to-dos go by default — Reminders, or stay in-app?

Why it's parked

Whisper on-device works and the extraction prompt is solid. Work stalled on the iOS background-audio permission flow and a lingering existential question: "is this just a worse Otter?" Shelved, not dead.

Decision log

2026-04-12 — Extraction must return strict JSON (summary / bullets / action_items) so the UI is dumb.
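
One way the strict-JSON decision could look on the client, as a sketch rather than the actual implementation. The three keys come from the decision above; the field types (a string and two string arrays) and the names ExtractionResult and decodeExtraction are assumptions. Note that JSONDecoder rejects missing keys and wrong types but silently ignores extra keys, so "strict" here covers only the required fields.

```swift
import Foundation

// Mirrors the three keys named in the decision log; the field types are assumptions.
struct ExtractionResult: Codable, Equatable {
    let summary: String
    let bullets: [String]
    let actionItems: [String]

    // Map the snake_case JSON key to a Swift-style property name.
    enum CodingKeys: String, CodingKey {
        case summary
        case bullets
        case actionItems = "action_items"
    }
}

// Throws DecodingError if a required key is missing or has the wrong type,
// so the UI only ever receives a fully formed value.
func decodeExtraction(_ json: String) throws -> ExtractionResult {
    try JSONDecoder().decode(ExtractionResult.self, from: Data(json.utf8))
}
```

Because decoding either yields a complete ExtractionResult or throws, the view layer can render summary, bullets, and action items directly and leave malformed responses to a single error path.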