We built a Claude Code bridge into our todo app

A few weeks ago I assigned a todo to Claude Code and it opened a pull request before I finished my coffee. The todo lived in our own app — goals. — on a goal called "Ship the website refresh." I'd added a Claude-Code-paired agent to that goal the night before, named it CodeBot, and then forgotten about it. The next morning I tapped assign → CodeBot on the first todo in the list, locked my phone, and went to make coffee. By the time I got back, my phone had buzzed three times: tool-call notifications, then a chat reply with a branch name and a commit hash.

That moment felt like the future bending toward me. Getting there took us about six weeks of tearing down assumptions about how an AI assistant should plug into a real product. This post covers both halves, the why (founder) and the how (engineering), because we think more apps should do this and we want to make it less mysterious.

The problem we kept hitting

For the first eight months of goals., our AI assistant Keen lived entirely inside the app. You could talk to Keen, get a Monday briefing written by Keen, ask Keen to suggest todos for a goal — but Keen was bound to whatever surface we built for it. Every new capability meant a new view, a new sheet, a new prompt template, a new round trip through the App Review queue.

Meanwhile, half our power users were already paying for Claude Code. They'd open it in a side window, paste in a goal description, ask Claude to draft something, and then manually copy the result back into our app. That's fine. It also misses the entire point. The valuable thing about an AI assistant isn't its prose — it's what it can do when it has live context. Pasting a screenshot of a todo into a chat window is the assistant version of printing an email so you can put it on your desk.

So we kept asking the same question: what would it look like if Claude Code could just join the goal? Read the open todos directly. See what's been completed this week. Check the chat history. Notice that the user has scheduled a meeting on Tuesday. Push results back into the same chat the human is reading. Be a teammate, not a panel.

The answer turned out to be: build an MCP server.

Why MCP is the right shape for this

The Model Context Protocol is Anthropic's open spec for connecting AI tools to external data and capabilities. Think of it as the USB-C port for AI assistants: any client that speaks MCP (Claude Code, claude.ai, Claude Desktop, several IDEs now) can talk to any server that speaks MCP, with no per-integration glue code.

For us this matters for three reasons:

One server, many clients. We didn't have to write a Claude Code plugin and then a separate Claude Desktop plugin and then a separate API. We wrote one MCP server, and every Anthropic surface that adds MCP support inherits it for free.
The protocol carries auth. MCP supports OAuth 2.1 (with dynamic client registration, which is the actual technical wonder of the spec) as well as plain bearer tokens. Both map cleanly onto the two ways our users actually want to set this up: OAuth for the connector UX in claude.ai, personal access tokens (PATs) for headless or scripted setups.
Tools are first-class. The whole point of the protocol is structured tool calls with typed arguments and structured responses. We could expose complete_todo, assign_todo, edit_goal as actual capabilities the model can introspect, instead of stuffing them into a prose system prompt and praying.

If you're building anything that wants to be reachable by AI assistants and you're still serializing your domain into a chat-prompt-shaped pancake, look at MCP. It is genuinely a better way.

What we actually built

Three Supabase edge functions, four database tables, and one new concept inside the app.

The new concept: agents

Before any of the wiring, we had to decide what an "AI teammate" actually is inside our data model. The shape we landed on:

An agent is an addressable identity owned by a human.

Not a row in auth.users. Not a magic system actor. Just a row in a new agents table with a name, a config blob (model id, system prompt, tool allowlist), an optional repo_path, and an owner_user_id. The owner is responsible for what the agent does. The agent can be a member of any shared goal. It can be assigned todos. It can post in chat. It looks like a teammate everywhere a teammate would normally appear, because that's what it is.

This shape made the rest of the work fall out cleanly. We already had a sharing model — goals, todos, and notes can be shared with collaborators via shared_item_members. We added an agent_id column to that table (XOR with user_id) and suddenly agents could be members of any shared goal. We added assigned_to_user_id and assigned_to_agent_id columns to todos (also XOR), with a Postgres trigger validating that whoever you're assigning to actually belongs to that goal's team. The chat thread we'd already built for human-to-human messaging worked unchanged for human-to-agent and agent-to-human.
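
To make the XOR idea concrete, here is a minimal TypeScript sketch of the two checks involved. In our case the enforcement lives in Postgres; this version only mirrors the logic in application code. The column names come from the schema described above. The row interfaces, the helper names, and the rule that a todo with neither assignee column set simply counts as unassigned are assumptions made for illustration.

```typescript
// Sketch only: mirrors in TypeScript the XOR rules we enforce in Postgres.
// Column names match the post; the interfaces and helpers are hypothetical.
interface SharedItemMember {
  goal_id: string;
  user_id: string | null;
  agent_id: string | null;
}

interface Todo {
  goal_id: string;
  assigned_to_user_id: string | null;
  assigned_to_agent_id: string | null;
}

// A membership row must point at exactly one of: a human user or an agent.
function isValidMember(m: SharedItemMember): boolean {
  return (m.user_id === null) !== (m.agent_id === null);
}

// Returns an error message, or null if the assignment is acceptable.
// Assumption: a todo with neither column set is simply unassigned.
function checkAssignment(todo: Todo, members: SharedItemMember[]): string | null {
  const userId = todo.assigned_to_user_id;
  const agentId = todo.assigned_to_agent_id;
  if (userId !== null && agentId !== null) {
    return "cannot assign to both a user and an agent";
  }
  if (userId === null && agentId === null) return null;

  // Like the assignment trigger: the assignee must be on this goal's team.
  const onTeam = members.some(
    (m) =>
      isValidMember(m) &&
      m.goal_id === todo.goal_id &&
      (userId !== null ? m.user_id === userId : m.agent_id === agentId),
  );
  return onTeam ? null : "assignee is not a member of this goal";
}
```

Because an agent is just another kind of member, the membership check is one expression with two branches rather than a separate code path for AI teammates.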

About 80% of "support AI agents as teammates" turned out to be just modeling them as proper teammates. The remaining 20% was the MCP wiring.

Edge function 1: the MCP server itself

An edge function called mcp implements the protocol. It accepts JSON-RPC requests over HTTP, advertises a tool list on tools/list, and dispatches tools/call to the appropriate handler. Tools talk directly to Postgres via the service role, with a row-level filter that restricts each call to the goal + agent the session is paired to.
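
For readers who haven't looked at the wire format, the dispatch at the heart of an MCP server is small. Below is a minimal sketch, assuming a JSON-RPC 2.0 request object has already been parsed from the HTTP body. The method names tools/list and tools/call and the result shapes (a tools array; a content array of text blocks) follow the MCP spec. The in-memory tool registry and the stubbed get_goal handler are hypothetical, and the initialize handshake and notifications a real server must also handle are left out.

```typescript
// Minimal JSON-RPC 2.0 dispatch in the shape MCP uses. Method names and
// result shapes follow the MCP spec; the registry and stub are hypothetical.
// A real server also handles initialize and notifications.
type JsonRpcId = number | string;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: JsonRpcId;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: JsonRpcId; result: unknown }
  | { jsonrpc: "2.0"; id: JsonRpcId; error: { code: number; message: string } };

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// Hypothetical registry. Real handlers would query Postgres, scoped to the paired goal.
const tools: Record<string, { description: string; handler: ToolHandler }> = {
  get_goal: {
    description: "Read the goal this session is paired to",
    handler: async () => JSON.stringify({ title: "Ship the website refresh" }),
  },
};

async function dispatch(req: JsonRpcRequest): Promise<JsonRpcResponse> {
  const fail = (code: number, message: string): JsonRpcResponse => ({
    jsonrpc: "2.0",
    id: req.id,
    error: { code, message },
  });

  switch (req.method) {
    case "tools/list":
      return {
        jsonrpc: "2.0",
        id: req.id,
        result: {
          tools: Object.entries(tools).map(([name, t]) => ({
            name,
            description: t.description,
            inputSchema: { type: "object", properties: {} },
          })),
        },
      };
    case "tools/call": {
      const name = req.params?.name;
      // hasOwnProperty guards against names like "toString" on the prototype.
      if (name === undefined || !Object.prototype.hasOwnProperty.call(tools, name)) {
        return fail(-32602, `Unknown tool: ${name}`);
      }
      const text = await tools[name].handler(req.params?.arguments ?? {});
      // MCP tool results are a list of content blocks.
      return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
    }
    default:
      return fail(-32601, `Method not found: ${req.method}`);
  }
}
```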

The tool surface, roughly:

pair_session, get_paired_context, get_goal, list_goals — read
append_message — chat
complete_todo, add_sub_todo, schedule_todo, edit_todo, edit_goal — work
assign_todo — reassign across the team
add_memory — propose a long-term memory the user can approve
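
Part of what makes these tools introspectable is that MCP tool definitions carry a JSON Schema for their arguments in an inputSchema field. The sketch here shows how two of them might be declared, plus a deliberately small argument check. The argument names (code, todo_id), the descriptions, and the six-digit pattern are illustrative assumptions, not our production schema; a real server would more likely hand the schema to a full JSON Schema validator.

```typescript
// How two tools might be declared for tools/list. Tool names come from the
// post; argument names, descriptions, and the pattern are hypothetical.
interface PropertySchema {
  type: "string";
  description?: string;
  pattern?: string;
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, PropertySchema>;
    required?: string[];
  };
}

const toolDefinitions: ToolDefinition[] = [
  {
    name: "pair_session",
    description: "Bind this session to a goal using the six-digit code shown in the app.",
    inputSchema: {
      type: "object",
      properties: { code: { type: "string", pattern: "^[0-9]{6}$" } },
      required: ["code"],
    },
  },
  {
    name: "complete_todo",
    description: "Mark a todo on the paired goal as done.",
    inputSchema: {
      type: "object",
      properties: { todo_id: { type: "string", description: "ID of the todo to complete" } },
      required: ["todo_id"],
    },
  },
];

// Deliberately small: required keys, unknown keys, string type, pattern.
function argumentProblems(def: ToolDefinition, args: Record<string, unknown>): string[] {
  const props = def.inputSchema.properties;
  const problems: string[] = [];
  for (const key of def.inputSchema.required ?? []) {
    if (!(key in args)) problems.push(`missing ${key}`);
  }
  for (const [key, value] of Object.entries(args)) {
    if (!Object.prototype.hasOwnProperty.call(props, key)) {
      problems.push(`unexpected ${key}`);
    } else if (typeof value !== "string") {
      problems.push(`${key} must be a string`);
    } else {
      const pattern = props[key].pattern;
      if (pattern !== undefined && !new RegExp(pattern).test(value)) {
        problems.push(`${key} is malformed`);
      }
    }
  }
  return problems;
}
```

Returning the problems as text lets the server hand them back to the model, which can usually correct its own call on the next try.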

You can see the full live tool list on the Claude Code page, and our project's CLAUDE.md doc tells paired sessions which tools to call when (e.g. always refresh get_paired_context before answering a "what's open?" question, because the chat is moving while the session is running).

Edge function 2: the OAuth dance

The OAuth side took the longest. mcp-oauth implements just enough of OAuth 2.1 + Dynamic Client Registration to make claude.ai's "Add connector" flow work end-to-end. The first time a user adds https://trygoals.app/mcp as a connector, Claude Code's MCP client hits our DCR endpoint and registers itself. Then it bounces the user through our authorize page (which lives in-app via a deep link — they tap "approve," we issue an auth code, Claude Code exchanges it for an access token).
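
The registration step itself is less exotic than it sounds: Dynamic Client Registration (RFC 7591) is a POST of client metadata that returns a client_id. This is a minimal sketch of that endpoint's core logic. It assumes an in-memory Map in place of a database table and a hypothetical policy: every client is registered as a public client (token_endpoint_auth_method "none", so no secret is issued), redirect URIs must be HTTPS or loopback, and only the authorization_code and refresh_token grants are accepted. The error codes invalid_redirect_uri and invalid_client_metadata are the ones RFC 7591 defines.

```typescript
import { randomUUID } from "node:crypto";

// Sketch of a Dynamic Client Registration (RFC 7591) endpoint's core logic.
// The storage (a Map) and the registration policy are hypothetical.
interface RegistrationRequest {
  redirect_uris?: unknown;
  client_name?: string;
  grant_types?: string[];
}

interface RegisteredClient {
  client_id: string;
  client_id_issued_at: number; // seconds since the epoch
  client_name?: string;
  redirect_uris: string[];
  grant_types: string[];
  token_endpoint_auth_method: "none";
}

type RegistrationResult =
  | { status: 201; body: RegisteredClient }
  | { status: 400; body: { error: string; error_description: string } };

const clients = new Map<string, RegisteredClient>(); // stand-in for a table

const reject = (error: string, error_description: string): RegistrationResult => ({
  status: 400,
  body: { error, error_description },
});

// Policy assumption: HTTPS redirect URIs, or plain HTTP only on loopback.
function isAllowedRedirect(uri: string): boolean {
  let url: URL;
  try {
    url = new URL(uri);
  } catch {
    return false;
  }
  const loopback = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  return url.protocol === "https:" || (url.protocol === "http:" && loopback);
}

function registerClient(req: RegistrationRequest, now: number = Date.now()): RegistrationResult {
  const uris = req.redirect_uris;
  if (!Array.isArray(uris) || uris.length === 0 || !uris.every((u) => typeof u === "string" && isAllowedRedirect(u))) {
    return reject("invalid_redirect_uri", "redirect_uris must be a non-empty list of HTTPS or loopback URIs");
  }
  const grantTypes = req.grant_types ?? ["authorization_code"];
  if (!grantTypes.every((g) => g === "authorization_code" || g === "refresh_token")) {
    return reject("invalid_client_metadata", "only authorization_code and refresh_token are supported");
  }
  const client: RegisteredClient = {
    client_id: randomUUID(),
    client_id_issued_at: Math.floor(now / 1000),
    client_name: req.client_name,
    redirect_uris: uris as string[],
    grant_types: grantTypes,
    // Public client: no secret is issued; PKCE protects the code exchange.
    token_endpoint_auth_method: "none",
  };
  clients.set(client.client_id, client);
  return { status: 201, body: client };
}
```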

The thing nobody tells you about implementing OAuth 2.1 with DCR for the first time: the spec is cleanly written, but the implicit grant flow you'll be tempted to support is not the one Claude Code actually uses. OAuth 2.1 drops the implicit grant entirely, and MCP clients use the authorization code grant with PKCE instead. Read the MCP authorization spec carefully and implement exactly what it says, in the order it says. We had two false starts before we accepted this.
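
The authorization code flow that MCP clients use relies on PKCE to protect the code exchange. The client sends a code_challenge when it starts the flow, then sends the original code_verifier when it exchanges the auth code; the server checks that the base64url-encoded SHA-256 of the verifier matches the challenge (the S256 method from RFC 7636). A minimal sketch of that token-endpoint check, assuming a hypothetical PendingAuthCode record saved when the user taps "approve":

```typescript
import { createHash } from "node:crypto";

// RFC 7636 S256: code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))).
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// RFC 7636 verifier syntax: 43 to 128 characters from [A-Za-z0-9-._~].
const VERIFIER_RE = /^[A-Za-z0-9\-._~]{43,128}$/;

// Hypothetical record saved when the user approved the authorize request.
interface PendingAuthCode {
  clientId: string;
  redirectUri: string;
  codeChallenge: string;
  codeChallengeMethod: string;
  expiresAt: number; // ms since the epoch
}

interface TokenRequest {
  client_id?: string;
  redirect_uri?: string;
  code_verifier?: string;
}

// True only if the code exchange should be allowed to proceed.
function verifyCodeExchange(pending: PendingAuthCode, req: TokenRequest, now: number = Date.now()): boolean {
  if (now >= pending.expiresAt) return false;
  if (pending.codeChallengeMethod !== "S256") return false; // refuse "plain"
  if (req.client_id !== pending.clientId || req.redirect_uri !== pending.redirectUri) return false;
  const verifier = req.code_verifier;
  if (verifier === undefined || !VERIFIER_RE.test(verifier)) return false;
  return s256Challenge(verifier) === pending.codeChallenge;
}
```

A handy sanity check is the test vector in RFC 7636's Appendix B: the verifier dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk should produce the challenge E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM.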

Edge function 3: the PAT path

For users who'd rather paste a token into claude mcp add ... than do the OAuth dance, mcp-pats mints personal access tokens. The flow is the standard one: a random secret, hashed with SHA-256 at rest, with the plaintext shown to the user exactly once. The first ~12 characters stay in plaintext as a token_prefix column purely so the Settings UI can render gap_live_a8f3… in the device list. (That's a cosmetic concession, not a credential: the prefix is too short to meaningfully narrow a brute-force search for the rest of the token.)
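
A minimal sketch of that minting step, assuming Node-compatible crypto (node:crypto, which Deno also supports). The gap_live_ prefix is inferred from the Settings example above; the 32-byte secret and the 12-character display prefix are assumptions. Only the hash and the display prefix are meant to be persisted.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Assumptions: the "gap_live_" prefix (from the Settings example), a 32-byte
// secret, and a 12-character display prefix.
const TOKEN_PREFIX = "gap_live_";
const DISPLAY_PREFIX_LENGTH = 12;

interface StoredPat {
  token_prefix: string; // display only, e.g. for the device list
  token_hash: string;   // hex SHA-256 of the full token
}

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function mintPat(): { plaintext: string; stored: StoredPat } {
  // 32 random bytes (256 bits) encode to 43 base64url characters.
  const plaintext = TOKEN_PREFIX + randomBytes(32).toString("base64url");
  return {
    plaintext, // returned to the user once and never persisted
    stored: {
      token_prefix: plaintext.slice(0, DISPLAY_PREFIX_LENGTH),
      token_hash: hashToken(plaintext),
    },
  };
}
```

With these numbers the stored prefix exposes only 3 of the 43 random characters (18 bits), leaving roughly 238 bits of the secret that never touch the database in plaintext.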

Both auth paths converge on the same mcp_sessions table. The MCP function doesn't care whether you came in via OAuth or PAT; it just looks up the session, validates the agent + goal binding, and dispatches the tool call.
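
One way that convergence could look, as a sketch. It assumes the bearer token (an OAuth access token or a PAT) is hashed and looked up to find its owner, and that the individual session is identified separately, for example by the Mcp-Session-Id header from MCP's Streamable HTTP transport, so one token can back several paired sessions. The record shapes, the set of tools allowed before pairing, and the error strings are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes. Tokens from either auth path are stored hashed.
interface TokenRecord {
  user_id: string;
  kind: "oauth" | "pat";
}

interface McpSession {
  id: string;
  user_id: string;
  goal_id: string | null; // set by pair_session
  agent_id: string | null;
}

type Resolution =
  | { ok: true; session: McpSession }
  | { ok: false; status: 401 | 403 | 404; reason: string };

// Assumption: the only tools an unpaired session may call.
const ALLOWED_BEFORE_PAIRING = new Set(["pair_session", "list_goals"]);

const sha256Hex = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

function resolveSession(
  authorization: string | undefined,
  sessionId: string | undefined, // e.g. from the Mcp-Session-Id header
  toolName: string,
  tokens: Map<string, TokenRecord>,
  sessions: Map<string, McpSession>,
): Resolution {
  const match = /^Bearer\s+(\S+)$/i.exec(authorization ?? "");
  const token = match ? tokens.get(sha256Hex(match[1])) : undefined;
  if (!token) return { ok: false, status: 401, reason: "invalid or missing token" };

  // From here on, OAuth and PAT callers are treated identically.
  const session = sessionId ? sessions.get(sessionId) : undefined;
  if (!session || session.user_id !== token.user_id) {
    return { ok: false, status: 404, reason: "unknown session" };
  }
  const paired = session.goal_id !== null && session.agent_id !== null;
  if (!paired && !ALLOWED_BEFORE_PAIRING.has(toolName)) {
    return { ok: false, status: 403, reason: "call pair_session first" };
  }
  return { ok: true, session };
}
```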

The pairing flow we landed on (after rejecting two others)

Here's where the design got interesting.

Our first instinct was to bind a Claude Code session to a single goal at OAuth time. The user picks the goal during the consent screen, the access token is scoped to that goal, done.

That fell over the first time someone wanted to use Claude Code on two goals. They'd have to disconnect, re-authorize, and pick a different goal — which is awful UX, especially because OAuth re-auth flows in claude.ai aren't designed to be done casually.

Our second instinct was to bind sessions to the user, and have the model pick which goal to act on per tool call. The model would see all the user's goals and pick "the right one" each time.

That fell over the first time the model picked the wrong one. (Of course it did. The mental model of "every tool call could be for any of my 12 goals" is not one any AI assistant — or human — should be expected to maintain.)

What we actually shipped: session-scoped pairing, with a six-digit code as the binding step.

You add the goals-app server in Claude Code (one-time, OAuth or PAT).
In the iPhone app you open a goal, tap Pair Claude Code as <agent>, and the app shows you a six-digit code with a 5-minute TTL.
From inside Claude Code, you call the pair_session tool with that code.
That session — and only that session — is now bound to (this goal, this agent) for its lifetime.
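
The code-issuing and pairing steps fit in a couple of functions. This sketch assumes an in-memory Map in place of a table, treats codes as single-use, and requires the session's user to be the one who generated the code; those rules, the names, and the record shapes are illustrative assumptions. Only the six digits and the 5-minute TTL come from the flow described above.

```typescript
import { randomInt } from "node:crypto";

const CODE_TTL_MS = 5 * 60 * 1000; // the 5-minute TTL shown in the app

// Hypothetical record behind a code the app has displayed.
interface PendingPairing {
  userId: string;
  goalId: string;
  agentId: string;
  expiresAt: number;
}

interface Session {
  id: string;
  userId: string;
  goalId: string | null;
  agentId: string | null;
}

const pendingCodes = new Map<string, PendingPairing>(); // stand-in for a table

// Called when the user taps "Pair Claude Code as <agent>" on a goal.
function issuePairingCode(userId: string, goalId: string, agentId: string, now: number = Date.now()): string {
  let code: string;
  do {
    // randomInt's upper bound is exclusive: 000000 through 999999.
    code = randomInt(0, 1000000).toString().padStart(6, "0");
  } while ((pendingCodes.get(code)?.expiresAt ?? 0) > now); // skip codes still live
  pendingCodes.set(code, { userId, goalId, agentId, expiresAt: now + CODE_TTL_MS });
  return code;
}

// Body of the pair_session tool. Returns an error message, or null on success.
function pairSession(session: Session, code: string, now: number = Date.now()): string | null {
  const pending = pendingCodes.get(code);
  if (!pending || pending.expiresAt <= now) return "invalid or expired code";
  if (pending.userId !== session.userId) return "invalid or expired code"; // don't reveal ownership
  if (session.goalId !== null) return "session is already paired";
  pendingCodes.delete(code); // assumption: codes are single-use
  session.goalId = pending.goalId;
  session.agentId = pending.agentId;
  return null;
}
```

A six-digit code has only a million possible values, so a production pair_session would also want rate limiting on failed attempts. Tying each code to the user who generated it means a code is useless to anyone holding a different token.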

If you want a second Claude Code session on a second goal, open a second Claude Code window and pair it to a different code. Sessions are cheap. The model never has to ask "which goal did you mean?" because the answer is fixed for the duration of the conversation.

We kept circling back to this design, and almost rejected it, because it asks the user to learn one extra concept (paired sessions). We finally accepted it after watching three different people use the failed designs and each independently invent the "let me pair this session to one specific goal" mental model. Sometimes the right design is the one users intuit before you build it.

What it feels like to use

Here's a transcript from a real session yesterday, lightly edited for length. The goal was "Refresh marketing site for the collab features." The agent was paired in via OAuth that morning.

Me: @CodeBot can you tackle the new "Solo on Sunday. A team by Tuesday." section on the homepage? Three cards mirroring the Four-signals layout. Repo's set on the goal.

CodeBot (Claude Code): On it. I'll branch off main, add the section between #how and .features-section, ship a 3-up grid that reuses the .features card style with a .collab-grid modifier, and push to a new branch. Want me to also bump the featureList JSON-LD to match?

Me: Yes please.

...

CodeBot: Pushed 8756980 to claude/website-collab-section-GIZtt. Section + matching FAQ + JSON-LD all in. Marked the todo done. Want me to keep going on Tier 2 or pause for review?

The thing that surprises people the first time they see this is how unsurprising it is. There's no chat-prompt magic happening. The agent isn't pretending to remember context across messages. It's just reading the goal's actual state from the database between messages and then acting on it through tool calls. The conversation works because the conversation is a real artifact of the team — chat thread, todo list, repo path, all stored as rows the agent can re-fetch any time it needs them.

It also tells you when it's done. When the agent finishes, an APNs notification fires from our send-push function, the same path that handles human teammate assignments. So you find out from your phone that the work is done, the same way you'd find out a colleague had finished their part.
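
The detail worth copying is that the push builder doesn't care whether the sender is a human or an agent. Here is a hypothetical sketch of the payload side of send-push. The event shape, the body wording, and the 120-character preview cap are assumptions; aps.alert, sound, and thread-id are standard APNs payload keys. Actually delivering the payload (APNs authentication and the HTTP/2 request) is out of scope.

```typescript
// Hypothetical event shape: the sender may be a human or an agent, and the
// payload builder treats both the same way.
interface Sender {
  kind: "user" | "agent";
  name: string;
}

type PushEvent =
  | { type: "assigned"; goalId: string; goalTitle: string; from: Sender; todoTitle: string }
  | { type: "chat_reply"; goalId: string; goalTitle: string; from: Sender; text: string };

const PREVIEW_LIMIT = 120; // assumption

function preview(text: string): string {
  return text.length <= PREVIEW_LIMIT ? text : text.slice(0, PREVIEW_LIMIT - 1) + "…";
}

function buildApnsPayload(event: PushEvent) {
  const body =
    event.type === "assigned"
      ? `${event.from.name} assigned you "${event.todoTitle}"`
      : `${event.from.name}: ${preview(event.text)}`;
  return {
    aps: {
      alert: { title: event.goalTitle, body },
      sound: "default",
      "thread-id": event.goalId, // groups a goal's notifications together
    },
    goal_id: event.goalId, // custom key the app can use to open the goal
  };
}
```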

The honest list of things we got wrong

We're not going to pretend this was clean.

1. We built the OAuth flow before we'd understood it. We had two false starts on the auth dance because we tried to be clever with token scoping before we'd really internalized that MCP sessions are the natural scope, not OAuth tokens. The third try — session-scoped pairing on top of user-scoped tokens — is the one that worked.

2. We almost made agents authenticate as users. The first design had agents living in auth.users with a kind = 'agent' flag. This would have been a disaster. Every auth-aware piece of the codebase would have needed a "but is it actually an agent?" check. Splitting agents into their own table and doing the XOR-on-foreign-keys trick on shared_item_members and todos was the design that let us add agents without touching auth at all.

3. We initially didn't push. The first version of the bridge was silent — the agent would post in the chat, but nothing notified you. We thought the in-app realtime sync was enough. It wasn't. Half the value of an AI teammate is "I can hand this off and walk away." Without push, you have to keep checking. Wiring APNs into the agent reply path was a one-day fix that doubled the felt value of the whole feature.

4. We tried to make the agent omniscient. Early on we'd dump every goal, every todo, every chat message into the system prompt at session start. This burned tokens, slowed everything down, and didn't help. Switching to get_paired_context as a tool the model calls explicitly when it needs context made everything faster and better. Models, like people, do their best work when they go look something up at the moment they need it instead of trying to remember it from earlier.

What we want to build next

Three things on the roadmap that the bridge unlocks:

PR-completion → todo done. When a Claude-Code-paired agent merges a PR for a todo, automatically mark the todo complete and APNs-notify the human assignee. The plumbing exists; we just need to listen for the right webhook.
Cross-goal handoff. Right now an agent can only act on its paired goal. We'd like to let a human say "hand this todo to the agent on the Marketing goal" and have the system handle re-assignment cleanly. This is mostly UX work; the data model already supports it.
Agent templates. Pre-configured agent recipes so you don't have to write a system prompt from scratch. ResearchBot, EditorBot, CodeBot. We already have the table; we just haven't filled it.
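
For the first item, the webhook side might look roughly like this, assuming GitHub hosts the repo. GitHub signs each delivery with an HMAC-SHA256 of the raw body in the X-Hub-Signature-256 header, and a merged PR arrives as a pull_request event with action "closed" and pull_request.merged set to true. The branch-to-todo map is a hypothetical stand-in for however the agent records which branch belongs to which todo; this is a sketch, not shipped code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// GitHub signs each delivery: X-Hub-Signature-256 = "sha256=" + hex HMAC-SHA256(body).
function verifyGitHubSignature(secret: string, rawBody: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = Buffer.from("sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex"));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Only the fields this sketch reads from a pull_request event.
interface PullRequestEvent {
  action: string;
  pull_request: { merged: boolean; head: { ref: string } };
}

// Hypothetical: todoByBranch maps an agent's branch name to the todo it was working on.
function todoToComplete(event: PullRequestEvent, todoByBranch: Map<string, string>): string | null {
  // A merge arrives as action "closed" with merged === true.
  if (event.action !== "closed" || !event.pull_request.merged) return null;
  return todoByBranch.get(event.pull_request.head.ref) ?? null;
}
```

Once todoToComplete returns an id, the rest is the existing path: mark the todo complete and let send-push notify the human assignee.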

The bigger argument

We didn't build this because it's cool that AI can write code. (It's cool. That's not new.) We built it because the next era of productivity software is going to be defined by how cleanly you can delegate to an AI — not how cleanly you can chat with one.

Every productivity app I've used in the last decade is a single-player game where AI has been bolted onto the side as a chat panel. You ask the chat panel something, it responds, you copy the response somewhere it can actually do something. That's a step up from no AI at all, but it's a small step. The big step is making AI a peer inside the app's own collaboration model — same chat thread, same task list, same notification rail.

MCP is what makes this newly possible. The previous generation of "AI plus your data" required either an enterprise contract with a vendor, a brittle screen-scraping browser extension, or an entirely new product category. Now any app with a half-decent backend can stand up a server in a long weekend and let any MCP-speaking client become a teammate.

If you're a builder reading this and you're tempted to just bolt a chat panel onto your app: please don't. Build the bridge. The first time someone you've never met assigns a todo to your AI agent and the agent ships it before lunch, you'll know what we mean.

Try the bridge yourself

The Claude Code MCP integration is included with every goals. account. See the Claude Code page for setup instructions (OAuth and PAT both supported), or grab the app on the App Store and try it on a goal of your own.
