Architecture
The core idea
Clip is a thin orchestration layer between three things that already exist: a workspace database (Supabase), an agent runtime (Cursor SDK + bridge + pool), and a content directory (skills.sh / companies.sh). Everything else is glue and UI.
If you read this page and remember nothing else, remember the shape:
```
          humans                  agents
            │                        │
            ▼                        ▼
┌─────────────────────────────────────────┐
│         Next.js workspace shell         │
│     (auth, RBAC, board, run viewer)     │
└───────────────────┬─────────────────────┘
                    │
      ┌─────────────┼──────────────┐
      ▼             ▼              ▼
  Supabase    RuntimeProvider  Skills/Companies
 (DB + Realtime  (cloud / local /   (skills.sh,
  + Auth + RLS)  pool — same iface)  companies.sh)
```
The pieces
Supabase is the source of truth. Workspaces, members, roles, agents, goals, tickets, runs, run_events, audit. RLS keeps tenants apart; Realtime pushes ticket and run updates to subscribed clients.
RuntimeProvider is a single TypeScript interface with three implementations. Cloud calls `Agent.create()` from `@cursor/sdk` against Cursor-hosted VMs. Local talks to a tiny bridge that piggybacks on the user's already-running Cursor. Pool is "any URL that speaks the same shape." All three emit `SDKMessage` events; we persist them to `run_events` and stream to the client over Supabase Realtime.
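The shared contract can be sketched like this. All names below are illustrative assumptions, not Clip's actual signatures; the point is that cloud, local, and pool are interchangeable behind one interface:

```typescript
// Hypothetical sketch of the RuntimeProvider contract (names assumed).
type SDKMessage = { type: "status" | "tool" | "text"; payload: unknown };

interface Run {
  id: string;
  events(): AsyncIterable<SDKMessage>; // consumed and persisted to run_events
  dispose(): Promise<void>;
}

interface RuntimeProvider {
  summon(opts: { ticketId: string; skillRefs: string[] }): Promise<Run>;
}

// A toy in-memory provider standing in for cloud/local/pool; all three
// implement the same interface, so callers never branch on the runtime.
class FakeProvider implements RuntimeProvider {
  async summon(opts: { ticketId: string; skillRefs: string[] }): Promise<Run> {
    async function* events(): AsyncIterable<SDKMessage> {
      yield { type: "status", payload: { state: "started", ticket: opts.ticketId } };
      yield { type: "status", payload: { state: "done" } };
    }
    return { id: "run_1", events, dispose: async () => {} };
  }
}
```

Because `events()` is an async iterable, persisting to `run_events` and streaming over Realtime are just two consumers of the same stream.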
Skills are installable units pinned by owner/repo and a version. We bind them to roles, not directly to agents — that way "CTO" is a portable description that includes its capabilities, and changing a role propagates.
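The binding rule can be sketched as follows; the types and field names are assumptions for illustration, not Clip's schema:

```typescript
// Illustrative types only; field names are assumed.
type SkillRef = { repo: string; version: string };    // pinned by owner/repo@version
type Role = { name: string; skillRefs: SkillRef[] };  // e.g. "CTO"
type Agent = { name: string; role: Role; skillRefs: SkillRef[] };

// Effective capabilities = the agent's own refs plus whatever its role
// carries. Because agents reference a Role (a row), editing the role
// propagates to every agent bound to it.
function effectiveSkills(agent: Agent): SkillRef[] {
  const seen = new Map<string, SkillRef>();
  for (const ref of [...agent.role.skillRefs, ...agent.skillRefs]) {
    seen.set(`${ref.repo}@${ref.version}`, ref); // dedupe on the pin
  }
  return [...seen.values()];
}
```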
Companies are manifests imported from companies.sh: roles, default agents, goals template, MCP servers, skill bindings. Forking creates rows in your workspace. We don't publish back yet.
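A hypothetical sketch of what forking materializes, with assumed field names; the real manifest from companies.sh will differ:

```typescript
// Assumed manifest shape for a companies.sh import.
type CompanyManifest = {
  name: string;
  roles: string[];
  defaultAgents: { name: string; role: string }[];
  mcpServers: string[];
};

// Forking just materializes the manifest as rows scoped to one workspace;
// nothing is published back (import-only for now).
function forkCompany(manifest: CompanyManifest, workspaceId: string) {
  return {
    roles: manifest.roles.map((name) => ({ workspaceId, name })),
    agents: manifest.defaultAgents.map((a) => ({ workspaceId, ...a })),
  };
}
```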
Request flow for a single run
- User clicks "Summon agent on this ticket" in the UI.
- Server checks RBAC — does this member's role have `summon_agent`?
- We compose the agent's skill bundle: `agent.skill_refs ∪ role.skill_refs`.
- We pick a runtime (UI default `agent.runtime_pref`, overridable per summon) and call the matching `RuntimeProvider`.
- The provider returns a `Run`. We insert into `runs` and start consuming the event stream.
- Each `SDKMessage` becomes a `run_events` row. Subscribed clients see the run page light up live.
- On completion, we record final status, cost, git metadata, and dispose the agent handle.
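The steps above can be condensed into one hypothetical handler. Every name here is illustrative (`summonOnTicket`, `insertEvent` standing in for the `run_events` write); the real logic lives behind the Next.js server layer:

```typescript
// Sketch of the summon flow: RBAC gate, skill union, runtime call,
// event persistence, disposal. All names are assumptions.
type Member = { role: { permissions: string[] } };
type RunEvent = { runId: string; seq: number; payload: unknown };
type Provider = {
  summon(o: { ticketId: string; skillRefs: string[] }): Promise<{
    id: string;
    events(): AsyncIterable<unknown>;
    dispose(): Promise<void>;
  }>;
};

async function summonOnTicket(
  member: Member,
  ticketId: string,
  agentSkills: string[],
  roleSkills: string[],
  provider: Provider,
  insertEvent: (e: RunEvent) => Promise<void>, // stands in for the run_events insert
): Promise<string> {
  // 1. RBAC gate
  if (!member.role.permissions.includes("summon_agent")) {
    throw new Error("forbidden");
  }
  // 2. Skill bundle = agent.skill_refs ∪ role.skill_refs
  const bundle = [...new Set([...agentSkills, ...roleSkills])];
  // 3. Summon on the chosen runtime and persist each message as a row
  const run = await provider.summon({ ticketId, skillRefs: bundle });
  let seq = 0;
  for await (const payload of run.events()) {
    await insertEvent({ runId: run.id, seq: seq++, payload });
  }
  // 4. Record completion elsewhere, then dispose the agent handle
  await run.dispose();
  return run.id;
}
```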
What we explicitly don't do
We don't sandbox skills (the runtime is the boundary). We don't ship a Docker image for the pool (the HTTP contract is the artifact). We don't run our own queue (Vercel cron suffices for v1). Each of these has a decision page; see /wiki/decisions/*.
Why this works
The hard part of an agent platform isn't talking to one model — it's giving teams a shared, audited, role-aware surface where humans and agents see the same world. Clip pushes that into the boring tier (Postgres + RLS + a thin UI) and treats the actual model execution as a swappable backend. Adding a new runtime is one file. Adding a new skill is one row.