Stop typing your todos. Start talking to them.

Here's a thing nobody talks about: typing a task into a todo app takes about 15 seconds. Unlock phone, find app, tap new-task button, wait for keyboard, type, assign a project, maybe set a date, hit save. Fifteen seconds per task.

Fifteen seconds sounds like nothing. It's actually a long time. It's longer than most people hold onto a passing thought before it evaporates. It's longer than a colleague will wait through an interruption before moving on. It's longer than the gap between "oh, I should remember to do that" and "what was I thinking about again?"

The result: by my rough estimate, something like 40% of the things you mean to add to your todo list never make it there. Not because they weren't worth capturing, but because the capture friction was too high. You thought of the thing, you started reaching for your phone, and by the time you unlocked it the thought was gone.

This is the single biggest problem with every todo app built before 2024. And it's finally, properly solved.

Dictation is not voice capture

Before we get into the interesting part, let's clear up a common confusion. Every iOS todo app has "dictation" — you tap a text field, tap the mic key on the keyboard, and speak your task. Apple's on-device speech recognition converts your words to text and pastes them into the field. You tap save.

This is not voice capture. This is typing with your mouth. It's slightly faster than using the keyboard, but:

You still have to open the app and tap the right text field
You still have to say the task in "writing" form, with punctuation and structure
You still have to manually pick a project, a date, a priority
It only works for one task at a time

Real voice capture — the thing that actually removes the friction — lets you just say what you're thinking, in natural messy human sentences, and the app figures out everything else.

What real voice capture looks like

Here's what I actually said into goals. while I was making coffee this morning, verbatim:

"Ok so I need to pick up dry cleaning before five, I want to meal prep every Sunday going forward, my new goal this year is to run a half marathon by fall, and I should probably schedule a dentist appointment at some point."

One sentence, spoken out loud while my hands were full of coffee beans. Takes about six seconds to say. Here's what appeared in the app a second later:

Todos tab, "This week" section

Pick up dry cleaning (before 5pm)

Todos tab, "Every week" section

Meal prep (recurring weekly)

Todos tab, "Ongoing" section

Schedule dentist appointment

Goals tab

Run a half marathon by fall (category: health)

Four items, correctly sorted into the right tabs, the right scopes, the right cadences. The dry cleaning became a this-week one-off. Meal prep became a weekly recurring habit because I used the word "every." The dentist appointment got slotted into the Ongoing backlog because I said "at some point." And the half marathon got filed as a proper goal in the health category, not as a todo, because it's long-horizon and aspirational.

I didn't tell the app any of this. It figured out all four classifications from the single sentence I said.
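To make the cues above concrete (the word "every" signalling a weekly cadence, "at some point" signalling the Ongoing backlog), here is a deliberately naive keyword heuristic. It is only an illustration: goals. hands this judgment to Claude, as described in the next section, not to rules like these. The function name guessFromCues and the cadence/scope labels are hypothetical.

```typescript
// Hypothetical labels; the post does not show the app's real schema.
type Cadence = "one-off" | "weekly";
type Scope = "this-week" | "ongoing";

interface CueGuess {
  cadence: Cadence;
  scope: Scope;
}

// A deliberately naive keyword heuristic, for illustration only.
// goals. itself hands this judgment to Claude rather than to rules like these.
function guessFromCues(phrase: string): CueGuess {
  const text = phrase.toLowerCase();

  // "every Sunday", "every week": treat as a weekly recurring item.
  const cadence: Cadence = /\bevery\b/.test(text) ? "weekly" : "one-off";

  // Vague timing sends an item to the Ongoing backlog; anything else,
  // including a deadline like "before five", defaults to this week.
  const isVague = /\b(at some point|someday|eventually)\b/.test(text);
  const scope: Scope = isVague ? "ongoing" : "this-week";

  return { cadence, scope };
}

for (const phrase of [
  "pick up dry cleaning before five",
  "meal prep every Sunday going forward",
  "schedule a dentist appointment at some point",
]) {
  console.log(`${phrase} ->`, guessFromCues(phrase));
}
```

A rule list like this breaks quickly: "every now and then" would be filed as weekly, and it has no way to tell a goal from a todo. That brittleness is the argument for letting a language model make the call.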

How it works under the hood

There are three pieces.

1. On-device speech-to-text (Apple Speech framework)

The raw audio is transcribed on your device using Apple's SFSpeechRecognizer. This is the same framework that powers iOS dictation, but the nice thing is that for our use case the audio never has to leave the phone. In our experience, Apple's on-device models in iOS 17+ are genuinely good: close to cloud-based transcription for short utterances in a quiet environment.

This matters for privacy. Your voice is transcribed on-device before anything is sent to our servers. You can also configure goals. to keep recordings local-only, in which case the audio file is never uploaded and the transcript is all we ever receive.

2. Intent parsing via Claude

The transcript gets sent to a Supabase edge function we call parse-voice-intent. That function wraps Anthropic's Claude API with a careful system prompt that looks something like:

The system prompt (roughly)

You parse a spoken voice memo into structured todos and goals. Users talk in natural, messy sentences. Sort each thing they said into a todo (actionable, checkable) or a goal (long-horizon, aspirational). For todos, pick a cadence (one-off vs weekly recurring) based on explicit language like "every Sunday." Pick a scope (this week vs ongoing) based on urgency cues like "before Friday" or "at some point." For goals, pick a category from {health, career, finance, learning, relationships, personal, mindfulness, creative, other}. Never invent items the user didn't mention. Return pure JSON.
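Here is a minimal sketch of what an edge function like parse-voice-intent might do around that prompt, assuming Claude is asked to return a JSON array of items. The item shape (kind, title, cadence, scope, category), the 1024-token limit, and the helper names buildClaudeRequest, extractText and parseIntent are all hypothetical; the endpoint, headers, and text content blocks follow Anthropic's public Messages API. Even with "Return pure JSON" in the prompt, validating the output before anything is written to the database is cheap insurance.

```typescript
// Goal categories copied from the system prompt above.
const GOAL_CATEGORIES = [
  "health", "career", "finance", "learning", "relationships",
  "personal", "mindfulness", "creative", "other",
] as const;
type GoalCategory = (typeof GOAL_CATEGORIES)[number];

// Hypothetical output shape: the post only says Claude returns "pure JSON".
type ParsedItem =
  | { kind: "todo"; title: string; cadence: "one-off" | "weekly"; scope: "this-week" | "ongoing" }
  | { kind: "goal"; title: string; category: GoalCategory };

interface ClaudeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Request for Anthropic's Messages API; send it with fetch(req.url, req).
// The model name is a parameter so nothing here hard-codes a specific model.
function buildClaudeRequest(transcript: string, systemPrompt: string, apiKey: string, model: string): ClaudeRequest {
  return {
    url: "https://api.anthropic.com/v1/messages",
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      system: systemPrompt,
      messages: [{ role: "user", content: transcript }],
    }),
  };
}

// The Messages API replies with a list of content blocks; join the text ones.
function extractText(body: { content: Array<{ type: string; text?: string }> }): string {
  return body.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
}

// Validate Claude's JSON before it touches the database.
// Entries with an unknown kind, cadence, scope, or category are dropped.
function parseIntent(raw: string): ParsedItem[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("expected a JSON array of items");
  const items: ParsedItem[] = [];
  for (const entry of data) {
    if (typeof entry !== "object" || entry === null) continue;
    const { kind, title, cadence, scope, category } = entry as Record<string, unknown>;
    if (typeof title !== "string" || title.trim() === "") continue;
    if (
      kind === "todo" &&
      (cadence === "one-off" || cadence === "weekly") &&
      (scope === "this-week" || scope === "ongoing")
    ) {
      items.push({
        kind: "todo",
        title,
        cadence: cadence as "one-off" | "weekly",
        scope: scope as "this-week" | "ongoing",
      });
    } else if (kind === "goal" && GOAL_CATEGORIES.includes(category as GoalCategory)) {
      items.push({ kind: "goal", title, category: category as GoalCategory });
    }
  }
  return items;
}
```

Dropping malformed entries rather than failing the whole request means one odd item doesn't lose the rest of what the user said; whether to surface the dropped ones is a product decision.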

Claude is exceptionally good at this kind of structured extraction. We've run thousands of test phrases through it, and accuracy on cadence, scope, and category classification is above 95% for natural spoken English. The misclassifications are usually genuinely ambiguous phrases, where a human would also ask a clarifying question.
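Per-field accuracy like this is straightforward to compute once each test phrase has human labels. The sketch below assumes a hypothetical eval format in which todos are labeled with cadence and scope and goals with category, and a field only counts toward accuracy when it is labeled. The four cases and "predictions" in the usage are made up for illustration; they are not the real test set or real model output.

```typescript
// Hypothetical eval format. A missing field means "not applicable to this case".
interface Fields {
  cadence?: string;
  scope?: string;
  category?: string;
}

interface LabeledCase {
  phrase: string;
  expected: Fields;
}

const FIELD_NAMES: (keyof Fields)[] = ["cadence", "scope", "category"];

// Accuracy per field, over only the cases where that field is labeled.
// A field that no case labels gets null rather than a misleading 0 or 1.
function fieldAccuracy(cases: LabeledCase[], predicted: Fields[]): Record<keyof Fields, number | null> {
  if (cases.length !== predicted.length) {
    throw new Error("need exactly one prediction per labeled case");
  }
  const result: Record<keyof Fields, number | null> = { cadence: null, scope: null, category: null };
  for (const field of FIELD_NAMES) {
    let labeled = 0;
    let correct = 0;
    cases.forEach((c, i) => {
      const want = c.expected[field];
      if (want === undefined) return; // field not applicable to this case
      labeled++;
      if (predicted[i][field] === want) correct++;
    });
    result[field] = labeled === 0 ? null : correct / labeled;
  }
  return result;
}

// Illustrative data only.
const cases: LabeledCase[] = [
  { phrase: "pick up dry cleaning before five", expected: { cadence: "one-off", scope: "this-week" } },
  { phrase: "meal prep every Sunday", expected: { cadence: "weekly" } },
  { phrase: "schedule a dentist appointment at some point", expected: { cadence: "one-off", scope: "ongoing" } },
  { phrase: "run a half marathon by fall", expected: { category: "health" } },
];
const predicted: Fields[] = [
  { cadence: "one-off", scope: "this-week" },
  { cadence: "weekly", scope: "this-week" },
  { cadence: "one-off", scope: "this-week" },
  { category: "personal" },
];
console.log(fieldAccuracy(cases, predicted));
```

Scoring each field separately makes it easy to see which kind of mistake dominates, for example scope errors on vague phrases, instead of hiding them in a single overall number.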

3. Database writes + immediate refetch

The function writes the extracted items directly into the todos and goals tables on Supabase, then returns a summary to the client. The iOS app then refetches both collections in parallel, and the new items show up in the right tabs immediately, with no manual refresh or sync.
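The write-then-refetch flow can be sketched as follows. TableWriter is a hypothetical stand-in for a database client (with supabase-js, insert might wrap something like supabase.from(table).insert(rows) and check the returned error), and the row columns are assumptions. The real client is an iOS app; the refetch half is written in TypeScript here only to keep a single language.

```typescript
// Hypothetical item and row shapes; the real todos/goals schemas are not in the post.
type ParsedItem =
  | { kind: "todo"; title: string; cadence: "one-off" | "weekly"; scope: "this-week" | "ongoing" }
  | { kind: "goal"; title: string; category: string };

type Row = Record<string, string>;

// Hypothetical stand-in for a database client; insert should reject on failure.
// supabase-js reports errors in its result rather than throwing, so a real
// wrapper would check that error and throw it.
interface TableWriter {
  insert(table: "todos" | "goals", rows: Row[]): Promise<void>;
}

// Server side: split items by table, write both, return a summary for the client.
async function writeItems(db: TableWriter, userId: string, items: ParsedItem[]): Promise<string> {
  const todoRows: Row[] = [];
  const goalRows: Row[] = [];
  for (const item of items) {
    if (item.kind === "todo") {
      todoRows.push({ user_id: userId, title: item.title, cadence: item.cadence, scope: item.scope });
    } else {
      goalRows.push({ user_id: userId, title: item.title, category: item.category });
    }
  }
  // The two inserts don't depend on each other, so run them concurrently.
  await Promise.all([
    todoRows.length > 0 ? db.insert("todos", todoRows) : Promise.resolve(),
    goalRows.length > 0 ? db.insert("goals", goalRows) : Promise.resolve(),
  ]);
  return `Added ${todoRows.length} todo(s) and ${goalRows.length} goal(s)`;
}

// Client side: once the function returns, refetch both collections in parallel.
async function refetchBoth<T, G>(
  fetchTodos: () => Promise<T[]>,
  fetchGoals: () => Promise<G[]>,
): Promise<{ todos: T[]; goals: G[] }> {
  const [todos, goals] = await Promise.all([fetchTodos(), fetchGoals()]);
  return { todos, goals };
}
```

One caveat with this shape: two separate inserts are not atomic, so if one fails the other may already have been written. Moving both writes into a single Postgres function would make them succeed or fail together, at the cost of a little more setup.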

From the user's perspective: you tap a mic, say a sentence, and a couple of seconds later the items are just... there. In the right place. Classified correctly. Ready to be checked off.

Why this changes how you use a todo app

Here's the thing that surprised us when we started using this internally. The value isn't really "I save time typing." The value is that you start capturing things you used to skip.

That flash of "oh, I should remember to..." that used to evaporate in the 15-second friction gap? Now you can capture it in about 3 seconds, without typing, while you're walking to your next meeting. The mental load of remembering-to-remember drops dramatically. Your todo list becomes a more honest reflection of what's actually in your head, because the things in your head can get in without a fight.

David Allen, author of Getting Things Done (GTD), calls this the "mind like water" state: trusting your external system enough that you stop trying to hold things in your head. Everyone who's read GTD wants this. Almost nobody achieves it, because the capture step is too slow. Voice-first capture is the first thing I've ever used that actually makes it feel reachable.

The "just talk to it" test

If you want to know whether a productivity app is genuinely voice-first or just has dictation bolted on, try this: open it, tap the voice button, and say "I need to call my mom, schedule a gym session for Wednesday morning, and meal prep every Sunday."

A dictation-based app will transcribe that into one text field as a single run-on task. You'll have to manually split it, assign dates, set the recurrence, and file each piece into the right project: four separate chores.

A real voice-first app will create three distinct tasks in one shot, each with the right date, scope, and cadence.

Right now, the only app I know of that passes this test is goals., which is what we built. I'd love to be wrong — if you know of another one, please email me at hello@trygoals.app. I think voice-first capture is going to be the single biggest shift in productivity software over the next two years, and I don't want us to be the only ones working on it.

Try voice capture yourself

goals. is live on the App Store for iPhone and Mac. 30-day free trial, no credit card — tap the mic and go.