Architecture

The core idea

Clip is a thin orchestration layer between three things that already exist: a workspace database (Supabase), an agent runtime (Cursor SDK + bridge + pool), and a content directory (skills.sh / companies.sh). Everything else is glue and UI.

If you read this page and remember nothing else, remember the shape:

              humans                       agents
                 │                            │
                 ▼                            ▼
            ┌─────────────────────────────────────┐
            │     Next.js workspace shell         │
            │   (auth, RBAC, board, run viewer)   │
            └────────────┬────────────────────────┘
                         │
        ┌────────────────┼────────────────────┐
        ▼                ▼                    ▼
   Supabase         RuntimeProvider     Skills/Companies
   (DB + Realtime  (cloud / local /     (skills.sh,
    + Auth + RLS)   pool — same iface)   companies.sh)

The pieces

Supabase is the source of truth: workspaces, members, roles, agents, goals, tickets, runs, run_events, audit. RLS keeps tenants apart; Realtime pushes ticket and run updates to subscribed clients.
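
Client-side, that push is an ordinary supabase-js v2 subscription. A minimal sketch, assuming run_events carries a run_id column (the helper names are illustrative):

    import { createClient } from '@supabase/supabase-js'

    const supabase = createClient(
      process.env.NEXT_PUBLIC_SUPABASE_URL!,
      process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    )

    // Live-tail one run. RLS decides what this client may see;
    // Realtime pushes each new run_events row as it lands.
    function subscribeToRun(runId: string, onEvent: (row: unknown) => void) {
      return supabase
        .channel(`run-${runId}`)
        .on(
          'postgres_changes',
          { event: 'INSERT', schema: 'public', table: 'run_events', filter: `run_id=eq.${runId}` },
          (payload) => onEvent(payload.new),
        )
        .subscribe()
    }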

RuntimeProvider is a single TypeScript interface with three implementations. Cloud calls Agent.create() from @cursor/sdk against Cursor-hosted VMs. Local talks to a tiny bridge that piggybacks on the user's already-running Cursor. Pool is "any URL that speaks the same shape." All three emit SDKMessage events; we persist them to run_events and stream to the client over Supabase Realtime.
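
A sketch of that shape, assuming SDKMessage is the type exported by @cursor/sdk; the member names are illustrative, not the exact interface:

    import type { SDKMessage } from '@cursor/sdk' // assumed export name

    type SkillRef = { repo: string; version: string } // pinned owner/repo plus version

    interface RuntimeProvider {
      // Start an agent; resolves once the runtime accepts the job.
      summon(opts: { prompt: string; skills: SkillRef[] }): Promise<RunHandle>
    }

    interface RunHandle {
      id: string
      // One event stream shape whether the backing runtime is cloud, bridge, or pool.
      events(): AsyncIterable<SDKMessage>
      dispose(): Promise<void>
    }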

Skills are installable units pinned by owner/repo and a version. We bind them to roles, not directly to agents — that way "CTO" is a portable description that includes its capabilities, and changing a role's skill set propagates to every agent that holds the role.
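
In sketch form, reusing SkillRef from above (type names illustrative): roles carry bindings, agents carry a role plus extras, and the effective bundle is the union used in step 3 of the flow below.

    interface Role {
      name: string                        // e.g. 'CTO'
      skill_refs: SkillRef[]
    }

    interface Agent {
      name: string
      role: Role
      skill_refs: SkillRef[]              // agent-specific extras
      runtime_pref: 'cloud' | 'local' | 'pool'
    }

    // agent.skill_refs ∪ role.skill_refs, deduped by pin.
    function skillBundle(agent: Agent): SkillRef[] {
      const byPin = new Map<string, SkillRef>()
      for (const ref of [...agent.skill_refs, ...agent.role.skill_refs]) {
        byPin.set(`${ref.repo}@${ref.version}`, ref)
      }
      return [...byPin.values()]
    }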

Companies are manifests imported from companies.sh: roles, default agents, goals template, MCP servers, skill bindings. Forking creates rows in your workspace. We don't publish back yet.
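
Roughly the shape a fork imports, mirroring that list; every field name here is a guess at the manifest, not its spec (SkillRef as above):

    interface CompanyManifest {
      name: string
      roles: string[]                                  // e.g. ['CTO', 'Support']
      default_agents: { name: string; role: string }[]
      goals_template: { title: string }[]
      mcp_servers: { name: string; url: string }[]
      skill_bindings: Record<string, SkillRef[]>       // role name → pinned skills
    }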

Request flow for a single run

  1. User clicks "Summon agent on this ticket" in the UI.
  2. Server checks RBAC — does this member's role have summon_agent?
  3. We compose the agent's skill bundle: agent.skill_refs ∪ role.skill_refs.
  4. We pick a runtime (the UI defaults to agent.runtime_pref; each summon can override it) and call the matching RuntimeProvider.
  5. The provider returns a Run. We insert into runs and start consuming the event stream.
  6. Each SDKMessage becomes a run_events row. Subscribed clients see the run page light up live.
  7. On completion, we record final status, cost, git metadata, and dispose the agent handle.
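
The same seven steps, condensed into one hypothetical handler. Each declare stands in for the real data layer, every name is a placeholder, and error handling is omitted:

    // Placeholders for the data layer; the real versions hit Supabase.
    declare const providers: Record<'cloud' | 'local' | 'pool', RuntimeProvider>
    declare function getMember(id: string): Promise<{ role: { permissions: string[] } }>
    declare function getAgent(ticketId: string): Promise<Agent>
    declare function ticketPrompt(ticketId: string): Promise<string>
    declare function insertRun(ticketId: string, handleId: string): Promise<string>
    declare function insertRunEvent(runId: string, msg: SDKMessage): Promise<void>
    declare function finalizeRun(runId: string): Promise<void>

    async function summonOnTicket(memberId: string, ticketId: string, override?: 'cloud' | 'local' | 'pool') {
      const member = await getMember(memberId)
      if (!member.role.permissions.includes('summon_agent')) throw new Error('forbidden') // 2. RBAC
      const agent = await getAgent(ticketId)
      const skills = skillBundle(agent)                          // 3. agent ∪ role refs
      const runtime = providers[override ?? agent.runtime_pref]  // 4. runtime choice
      const handle = await runtime.summon({ prompt: await ticketPrompt(ticketId), skills })
      const runId = await insertRun(ticketId, handle.id)         // 5. persist the run
      for await (const msg of handle.events()) {
        await insertRunEvent(runId, msg)                         // 6. Realtime fans out live
      }
      await finalizeRun(runId)                                   // 7. status, cost, git metadata
      await handle.dispose()
    }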

What we explicitly don't do

We don't sandbox skills (the runtime is the boundary). We don't ship a Docker image for the pool (the HTTP contract is the artifact). We don't run our own queue (Vercel cron suffices for v1). Each of these has a decision page; see /wiki/decisions/*.
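
As an illustration of "any URL that speaks the same shape," here is a pool provider sketched over plain HTTP against the RuntimeProvider above. The paths and the newline-delimited JSON framing are assumptions, not the documented contract:

    function poolProvider(baseUrl: string): RuntimeProvider {
      return {
        async summon(opts) {
          const res = await fetch(`${baseUrl}/runs`, {
            method: 'POST',
            headers: { 'content-type': 'application/json' },
            body: JSON.stringify(opts),
          })
          if (!res.ok) throw new Error(`pool refused run: ${res.status}`)
          const { id } = (await res.json()) as { id: string }
          return {
            id,
            // Assumes the pool streams one SDKMessage JSON object per line.
            async *events() {
              const reader = (await fetch(`${baseUrl}/runs/${id}/events`)).body!.getReader()
              const decoder = new TextDecoder()
              let buf = ''
              for (;;) {
                const { done, value } = await reader.read()
                if (done) break
                buf += decoder.decode(value, { stream: true })
                let nl
                while ((nl = buf.indexOf('\n')) >= 0) {
                  const line = buf.slice(0, nl)
                  buf = buf.slice(nl + 1)
                  if (line.trim()) yield JSON.parse(line)
                }
              }
            },
            async dispose() {
              await fetch(`${baseUrl}/runs/${id}`, { method: 'DELETE' })
            },
          }
        },
      }
    }

Note that the whole provider fits in one function, which is what "adding a new runtime is one file" means below.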

Why this works

The hard part of an agent platform isn't talking to one model — it's giving teams a shared, audited, role-aware surface where humans and agents see the same world. Clip pushes that into the boring tier (Postgres + RLS + a thin UI) and treats the actual model execution as a swappable backend. Adding a new runtime is one file. Adding a new skill is one row.