---
summary: "Clawnet refactor: unify network protocol, roles, auth, approvals, identity"
read_when:
  - Planning a unified network protocol for nodes + operator clients
  - Reworking approvals, pairing, TLS, and presence across devices
---
# Clawnet refactor (protocol + auth unification)

## Hi
Hi Peter, great direction; this unlocks simpler UX + stronger security.

## Purpose
Single, rigorous document for:
- Current state: protocols, flows, trust boundaries.
- Pain points: approvals, multi‑hop routing, UI duplication.
- Proposed new state: one protocol, scoped roles, unified auth/pairing, TLS pinning.
- Identity model: stable IDs + cute slugs.
- Migration plan, risks, open questions.

## Goals (from discussion)
- One protocol for all clients (mac app, CLI, iOS, Android, headless node).
- Every network participant authenticated + paired.
- Role clarity: nodes vs operators.
- Central approvals routed to where the user is.
- TLS encryption + optional pinning for all remote traffic.
- Minimal code duplication.
- Single machine should appear once (no UI/node duplicate entry).

## Non‑goals (explicit)
- Remove capability separation (still need least‑privilege).
- Expose full gateway control plane without scope checks.
- Make auth depend on human labels (slugs remain non‑security).

---

# Current state (as‑is)

## Two protocols

### 1) Gateway WebSocket (control plane)
- Full API surface: config, channels, models, sessions, agent runs, logs, nodes, etc.
- Default bind: loopback. Remote access via SSH/Tailscale.
- Auth: token/password via `connect`.
- No TLS pinning (relies on loopback/tunnel).
- Code:
  - `src/gateway/server/ws-connection/message-handler.ts`
  - `src/gateway/client.ts`
  - `docs/gateway/protocol.md`

### 2) Bridge (node transport)
- Narrow allowlist surface, node identity + pairing.
- JSONL over TCP; optional TLS + cert fingerprint pinning.
- When TLS is enabled, the cert fingerprint is advertised in the discovery TXT record.
- Code:
  - `src/infra/bridge/server/connection.ts`
  - `src/gateway/server-bridge.ts`
  - `src/node-host/bridge-client.ts`
  - `docs/gateway/bridge-protocol.md`

## Control plane clients today
- CLI → Gateway WS via `callGateway` (`src/gateway/call.ts`).
- macOS app UI → Gateway WS (`GatewayConnection`).
- Web Control UI → Gateway WS.
- ACP → Gateway WS.
- Browser control uses its own HTTP control server.

## Nodes today
- macOS app in node mode connects to Gateway bridge (`MacNodeBridgeSession`).
- iOS/Android apps connect to Gateway bridge.
- Pairing + per‑node token stored on gateway.

## Current approval flow (exec)
- Agent uses `system.run` via Gateway.
- Gateway invokes node over bridge.
- Node runtime decides approval.
- UI prompt shown by mac app (when node == mac app).
- Node returns `invoke-res` to Gateway.
- Result: multi‑hop flow, with the prompt UI tied to the node host.

## Presence + identity today
- Gateway presence entries from WS clients.
- Node presence entries from bridge.
- mac app can show two entries for same machine (UI + node).
- Node identity stored in pairing store; UI identity separate.

---

# Problems / pain points

- Two protocol stacks to maintain (WS + Bridge).
- Approvals on remote nodes: prompt appears on node host, not where user is.
- TLS pinning only exists for bridge; WS depends on SSH/Tailscale.
- Identity duplication: same machine shows as multiple instances.
- Ambiguous roles: UI + node + CLI capabilities not clearly separated.

---

# Proposed new state (Clawnet)

## One protocol, two roles
Single WS protocol with role + scope.
- **Role: node** (capability host)
- **Role: operator** (control plane)
- Optional **scope** for operator:
  - `operator.read` (status + viewing)
  - `operator.write` (agent run, sends)
  - `operator.admin` (config, channels, models)

### Role behaviors

**Node**
- Can register capabilities (`caps`, `commands`, permissions).
- Can receive `invoke` commands (`system.run`, `camera.*`, `canvas.*`, `screen.record`, etc).
- Can send events: `voice.transcript`, `agent.request`, `chat.subscribe`.
- Cannot call config/models/channels/sessions/agent control plane APIs.

**Operator**
- Full control plane API, gated by scope.
- Receives all approvals.
- Does not directly execute OS actions; routes to nodes.
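
The role rules above could be enforced with a single gate at the gateway boundary. The following is a minimal sketch, not the gateway's actual code: `Connection`, `METHOD_SCOPES`, `NODE_ALLOWLIST`, `authorize`, and the method names `status.get`, `agent.run`, and `config.set` are all hypothetical. It also assumes scopes are matched exactly (admin does not imply read or write), which this document does not specify.

```typescript
// Hypothetical types and names; only the role/scope strings and node events
// come from this document.
type Role = "node" | "operator";
type OperatorScope = "operator.read" | "operator.write" | "operator.admin";

interface Connection {
  role: Role;
  scopes: ReadonlySet<OperatorScope>; // ignored for node connections
}

// Illustrative control-plane methods mapped to the scope they require.
const METHOD_SCOPES = new Map<string, OperatorScope>([
  ["status.get", "operator.read"],
  ["agent.run", "operator.write"],
  ["config.set", "operator.admin"],
]);

// Events a node connection may send, taken from the Node list above.
const NODE_ALLOWLIST: ReadonlySet<string> = new Set([
  "voice.transcript",
  "agent.request",
  "chat.subscribe",
]);

function authorize(conn: Connection, method: string): boolean {
  if (conn.role === "node") {
    // Nodes only get the narrow allowlist, never control-plane methods.
    return NODE_ALLOWLIST.has(method);
  }
  const required = METHOD_SCOPES.get(method);
  // Unknown methods are denied rather than falling through.
  return required !== undefined && conn.scopes.has(required);
}
```

Denying unknown methods by default keeps the least-privilege property intact when new API calls are added before their scope mapping is.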

### Key rule
Role is per‑connection, not per device. A device may open both roles, separately.

---

# Unified authentication + pairing

## Client identity
Every client provides:
- `deviceId` (stable, derived from device key).
- `displayName` (human name).
- `role` + `scope` + `caps` + `commands`.

## Pairing flow (unified)
- Client connects unauthenticated.
- Gateway creates a **pairing request** for that `deviceId`.
- Operator receives prompt; approves/denies.
- Gateway issues credentials bound to:
  - device public key
  - role(s)
  - scope(s)
  - capabilities/commands
- Client persists token, reconnects authenticated.
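
The pairing flow above can be modeled as a small state machine on the gateway. This is a sketch under stated assumptions: `PairingRegistry`, `PairingRecord`, and the random hex token format are hypothetical, and persistence and the device public key binding are left out.

```typescript
import { randomBytes } from "node:crypto";

type PairingState = "pending" | "approved" | "denied";

// Hypothetical record shape; the fields mirror what credentials are bound to.
interface PairingRecord {
  deviceId: string;
  roles: string[];
  scopes: string[];
  state: PairingState;
  token?: string;
}

class PairingRegistry {
  private readonly records = new Map<string, PairingRecord>();

  // Unauthenticated connect: create (or reuse) the request for this deviceId.
  request(deviceId: string, roles: string[], scopes: string[]): PairingRecord {
    const existing = this.records.get(deviceId);
    if (existing) return existing; // do not stack duplicate prompts
    const record: PairingRecord = { deviceId, roles, scopes, state: "pending" };
    this.records.set(deviceId, record);
    return record;
  }

  // Operator decision: only a pending request can be resolved.
  resolve(deviceId: string, approve: boolean): PairingRecord {
    const record = this.records.get(deviceId);
    if (!record || record.state !== "pending") {
      throw new Error(`no pending pairing request for ${deviceId}`);
    }
    record.state = approve ? "approved" : "denied";
    if (approve) {
      // Assumed token format: 32 random bytes, hex encoded.
      record.token = randomBytes(32).toString("hex");
    }
    return record;
  }
}
```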

## Device‑bound auth (avoid bearer token replay)
Preferred: device keypairs.
- Device generates keypair once.
- `deviceId = fingerprint(publicKey)`.
- Gateway sends nonce; device signs; gateway verifies.
- Tokens are issued to a public key (proof‑of‑possession), not a string.

Alternatives:
- mTLS (client certs): strongest, more ops complexity.
- Short‑lived bearer tokens only as a temporary phase (rotate + revoke early).
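
The preferred keypair flow in this section can be sketched with Node's built-in `crypto` module. The key type (Ed25519), the fingerprint definition (SHA-256 over the DER-encoded public key), and the helper names `deviceIdFor`, `signChallenge`, and `verifyChallenge` are assumptions for illustration; this document does not fix any of them.

```typescript
import { createHash, generateKeyPairSync, randomBytes, sign, verify } from "node:crypto";
import type { KeyObject } from "node:crypto";

// deviceId = fingerprint(publicKey). Assumed here: SHA-256 of the SPKI DER bytes.
function deviceIdFor(publicKey: KeyObject): string {
  const der = publicKey.export({ type: "spki", format: "der" });
  return createHash("sha256").update(der).digest("hex");
}

// Device side: sign the gateway's nonce. Ed25519 takes no digest name, hence null.
function signChallenge(nonce: Buffer, privateKey: KeyObject): Buffer {
  return sign(null, nonce, privateKey);
}

// Gateway side: verify against the public key recorded at pairing time.
function verifyChallenge(nonce: Buffer, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, nonce, publicKey, signature);
}

// Device generates the keypair once and persists it (persistence omitted).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const deviceId = deviceIdFor(publicKey);

// Gateway issues a fresh nonce per connection attempt to prevent replay.
const nonce = randomBytes(32);
const signature = signChallenge(nonce, privateKey);
const accepted = verifyChallenge(nonce, signature, publicKey);
```

Because the nonce is fresh per connection, a captured signature is useless for a later connect, which is the property bearer tokens lack.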

## Silent approval (SSH heuristic)
Define it precisely to avoid a weak link. Prefer one:
- **Local‑only**: auto‑pair when client connects via loopback/Unix socket.
- **Challenge via SSH**: gateway issues nonce; client proves SSH access by fetching it.
- **Physical presence window**: after a local approval on gateway host UI, allow auto‑pair for a short window (e.g. 10 minutes).

Always log + record auto‑approvals.

---

# TLS everywhere (dev + prod)

## Reuse existing bridge TLS
Use current TLS runtime + fingerprint pinning:
- `src/infra/bridge/server/tls.ts`
- fingerprint verification logic in `src/node-host/bridge-client.ts`

## Apply to WS
- WS server supports TLS with same cert/key + fingerprint.
- WS clients can pin fingerprint (optional).
- Discovery advertises TLS + fingerprint for all endpoints.
- Discovery is locator hints only; never a trust anchor.
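
Client-side pinning reduces to comparing a hash of the server certificate against a fingerprint obtained at pairing time. The sketch below is not the logic in `src/node-host/bridge-client.ts`; `normalizeFingerprint` and `matchesPin` are hypothetical helpers, and SHA-256 over the DER certificate is an assumed fingerprint format.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Accept "AA:BB:..." or "aabb..." and reduce both to lowercase hex.
function normalizeFingerprint(fingerprint: string): string {
  return fingerprint.replace(/[^0-9a-fA-F]/g, "").toLowerCase();
}

// certDer is the peer certificate in DER form, for example the `raw` field
// of the object returned by tls.TLSSocket#getPeerCertificate().
function matchesPin(certDer: Uint8Array, pinnedFingerprint: string): boolean {
  const actual = createHash("sha256").update(certDer).digest();
  const expected = Buffer.from(normalizeFingerprint(pinnedFingerprint), "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return expected.length === actual.length && timingSafeEqual(actual, expected);
}
```

The pinned value should come from the pairing store, never from the discovery record, since discovery is only a locator hint.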

## Why
- Reduce reliance on SSH/Tailscale for confidentiality.
- Make remote mobile connections safe by default.

---

# Approvals redesign (centralized)

## Current
Approval happens on node host (mac app node runtime). Prompt appears where node runs.

## Proposed
Approval is **gateway‑hosted**, UI delivered to operator clients.

### New flow
1) Gateway receives `system.run` intent (agent).
2) Gateway creates approval record: `approval.requested`.
3) Operator UI(s) show prompt.
4) Approval decision sent to gateway: `approval.resolve`.
5) Gateway invokes node command if approved.
6) Node executes, returns `invoke-res`.

### Approval semantics (hardening)
- Broadcast to all operators; only the active UI shows a modal (others get a toast).
- First resolution wins; gateway rejects subsequent resolves as already settled.
- Default timeout: deny after N seconds (e.g. 60s), log reason.
- Resolution requires `operator.approvals` scope.
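
The hardening rules above (first resolution wins, default deny on timeout, scope check) fit in a small gateway-side store. This is a sketch only: `ApprovalStore` and `ApprovalRecord` are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and broadcasting to operator UIs is omitted.

```typescript
type Decision = "approved" | "denied";

interface ApprovalRecord {
  id: string;
  command: string;
  expiresAt: number; // epoch milliseconds
  decision?: Decision;
  reason?: string;
}

class ApprovalStore {
  private readonly records = new Map<string, ApprovalRecord>();
  private readonly timeoutMs: number;

  constructor(timeoutMs: number = 60_000) {
    this.timeoutMs = timeoutMs;
  }

  // Step 2 of the flow: create the record behind `approval.requested`.
  request(id: string, command: string, now: number = Date.now()): ApprovalRecord {
    const record: ApprovalRecord = { id, command, expiresAt: now + this.timeoutMs };
    this.records.set(id, record);
    return record;
  }

  // Step 4: handle `approval.resolve`. Returns true only for the winning resolve.
  resolve(id: string, decision: Decision, scopes: ReadonlySet<string>, now: number = Date.now()): boolean {
    if (!scopes.has("operator.approvals")) return false;
    const record = this.records.get(id);
    if (!record || record.decision !== undefined) return false; // unknown or already settled
    if (now >= record.expiresAt) {
      record.decision = "denied"; // default deny on timeout, with a logged reason
      record.reason = "timeout";
      return false;
    }
    record.decision = decision;
    record.reason = "operator";
    return true;
  }

  get(id: string): ApprovalRecord | undefined {
    return this.records.get(id);
  }
}
```

The check and the write in `resolve` run in one synchronous step, so in a single-process gateway two competing operators cannot both win.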

## Benefits
- Prompt appears where user is (mac/phone).
- Consistent approvals for remote nodes.
- Node runtime stays headless; no UI dependency.

---

# Role clarity examples

## iPhone app
- **Node role** for: mic, camera, voice chat, location, push‑to‑talk.
- Optional **operator.read** for status and chat view.
- Optional **operator.write/admin** only when explicitly enabled.

## macOS app
- Operator role by default (control UI).
- Node role when “Mac node” enabled (system.run, screen, camera).
- Same deviceId for both connections → merged UI entry.

## CLI
- Operator role always.
- Scope derived by subcommand:
  - `status`, `logs` → read
  - `agent`, `message` → write
  - `config`, `channels` → admin
  - approvals + pairing → `operator.approvals` / `operator.pairing`
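
The CLI mapping above is essentially a lookup table. A minimal sketch follows; `SUBCOMMAND_SCOPES` and `scopeForSubcommand` are hypothetical names, and the subcommand names `approvals` and `pairing` are assumed, since the list above names only the scopes for those two.

```typescript
type Scope =
  | "operator.read"
  | "operator.write"
  | "operator.admin"
  | "operator.approvals"
  | "operator.pairing";

const SUBCOMMAND_SCOPES = new Map<string, Scope>([
  ["status", "operator.read"],
  ["logs", "operator.read"],
  ["agent", "operator.write"],
  ["message", "operator.write"],
  ["config", "operator.admin"],
  ["channels", "operator.admin"],
  ["approvals", "operator.approvals"], // assumed subcommand name
  ["pairing", "operator.pairing"], // assumed subcommand name
]);

// Request only the scope the subcommand needs; unknown subcommands fail loudly.
function scopeForSubcommand(subcommand: string): Scope {
  const scope = SUBCOMMAND_SCOPES.get(subcommand);
  if (scope === undefined) {
    throw new Error(`no scope mapping for subcommand: ${subcommand}`);
  }
  return scope;
}
```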

---

# Identity + slugs

## Stable ID
Required for auth; never changes.

Preferred:
- Keypair fingerprint (public key hash).

## Cute slug (lobster‑themed)
Human label only.
- Example: `scarlet-claw`, `saltwave`, `mantis-pinch`.
- Stored in gateway registry, editable.
- Collision handling: `-2`, `-3`.
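
Collision handling with numeric suffixes can be sketched as below. `uniqueSlug` is a hypothetical helper, and starting the suffix at `-2` follows the bullet above.

```typescript
// Return `base` if it is free, otherwise the first free `base-N` with N >= 2.
function uniqueSlug(base: string, taken: ReadonlySet<string>): string {
  if (!taken.has(base)) return base;
  for (let n = 2; ; n++) {
    const candidate = `${base}-${n}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```

Since slugs are labels only, a collision never affects auth; the suffix just keeps the registry readable.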

## UI grouping
Same `deviceId` across roles → single “Instance” row:
- Badge: `operator`, `node`.
- Shows capabilities + last seen.
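
Merging presence entries by `deviceId` could look like the sketch below. The `PresenceEntry` and `InstanceRow` shapes and the `groupByDevice` helper are hypothetical; only the grouping key and the displayed fields come from the list above.

```typescript
interface PresenceEntry {
  deviceId: string;
  role: "operator" | "node";
  caps: string[];
  lastSeen: number; // epoch milliseconds
}

interface InstanceRow {
  deviceId: string;
  roles: string[]; // rendered as badges
  caps: string[];
  lastSeen: number;
}

// Collapse all connections from one device into a single row.
function groupByDevice(entries: readonly PresenceEntry[]): InstanceRow[] {
  const rows = new Map<string, InstanceRow>();
  for (const entry of entries) {
    const row: InstanceRow =
      rows.get(entry.deviceId) ?? { deviceId: entry.deviceId, roles: [], caps: [], lastSeen: 0 };
    if (!row.roles.includes(entry.role)) row.roles.push(entry.role);
    for (const cap of entry.caps) {
      if (!row.caps.includes(cap)) row.caps.push(cap);
    }
    row.lastSeen = Math.max(row.lastSeen, entry.lastSeen);
    rows.set(entry.deviceId, row);
  }
  return Array.from(rows.values());
}
```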

---

# Migration strategy

## Phase 0: Document + align
- Publish this doc.
- Inventory all protocol calls + approval flows.

## Phase 1: Add roles/scopes to WS
- Extend `connect` params with `role`, `scope`, `deviceId`.
- Add allowlist gating for node role.

## Phase 2: Bridge compatibility
- Keep bridge running.
- Add WS node support in parallel.
- Gate features behind config flag.

## Phase 3: Central approvals
- Add approval request + resolve events in WS.
- Update mac app UI to prompt + respond.
- Node runtime stops prompting UI.

## Phase 4: TLS unification
- Add TLS config for WS using bridge TLS runtime.
- Add pinning to clients.

## Phase 5: Deprecate bridge
- Migrate iOS/Android/mac node to WS.
- Keep bridge as fallback; remove once stable.

## Phase 6: Device‑bound auth
- Require key‑based identity for all non‑local connections.
- Add revocation + rotation UI.

---

# Execution checklist (ship order)
- [x] **Device‑bound auth (PoP):** nonce challenge + signature verify on connect; remove bearer‑only for non‑local.
- [ ] **Role‑scoped creds:** issue per‑role tokens, rotate, revoke, list; UI/CLI surfaced; audit log entries.
- [ ] **Scope enforcement:** keep paired scopes in sync on rotation; reject/upgrade flows explicit; tests.
- [ ] **Approvals routing:** gateway‑hosted approvals; operator UI prompt/resolve; node stops prompting.
- [ ] **TLS pinning for WS:** reuse bridge TLS runtime; discovery advertises fingerprint; client validation.
- [ ] **Discovery + allowlist:** WS discovery TXT includes TLS fingerprint + role hints; node commands filtered by server allowlist.
- [ ] **Presence unification:** dedupe deviceId across roles; include role/scope metadata; “single instance row”.
- [ ] **Docs + examples:** protocol doc, CLI docs, onboarding + security notes; no personal hostnames.
- [ ] **Test coverage:** connect auth paths, rotation/revoke, approvals, TLS fingerprint mismatch, presence.

Process per item:
- Do implementation.
- Fresh‑eyes review (scan for regressions + missing tests).
- Fix issues.
- Commit with Conventional Commit.
- Move to next item.

---

# Security notes

- Role/allowlist enforced at gateway boundary.
- No client gets “full” API without operator scope.
- Pairing required for *all* connections.
- TLS + pinning reduces MITM risk for mobile.
- SSH silent approval is a convenience; still recorded + revocable.
- Discovery is never a trust anchor.
- Capability claims are verified against server allowlists by platform/type.

# Streaming + large payloads (node media)
WS control plane is fine for small messages, but nodes also do:
- camera clips
- screen recordings
- audio streams

Options:
1) WS binary frames + chunking + backpressure rules.
2) Separate streaming endpoint (still TLS + auth).
3) Keep bridge longer for media‑heavy commands, migrate last.

Pick one before implementation to avoid drift.

# Capability + command policy
- Node‑reported caps/commands are treated as **claims**.
- Gateway enforces per‑platform allowlists.
- Any new command requires operator approval or explicit allowlist change.
- Audit changes with timestamps.
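
Treating node-reported commands as claims means filtering them through a server-side allowlist before registering anything. The sketch below assumes a platform list, an allowlist table, and a `camera.snap` command that are all invented for illustration; `matches` and `filterClaims` are hypothetical helpers.

```typescript
type Platform = "macos" | "ios" | "android" | "headless";

// Assumed per-platform allowlist; a trailing ".*" matches a whole command family.
const COMMAND_ALLOWLIST: Record<Platform, readonly string[]> = {
  macos: ["system.run", "screen.record", "camera.*", "canvas.*"],
  ios: ["camera.*", "canvas.*"],
  android: ["camera.*", "canvas.*"],
  headless: ["system.run"],
};

function matches(pattern: string, command: string): boolean {
  return pattern.endsWith(".*")
    ? command.startsWith(pattern.slice(0, -1)) // "camera.*" covers "camera.snap"
    : command === pattern;
}

interface ClaimResult {
  accepted: string[];
  rejected: string[];
}

// Split a node's claimed commands into those the gateway will honor and the rest.
function filterClaims(platform: Platform, claimed: readonly string[]): ClaimResult {
  const allow = COMMAND_ALLOWLIST[platform];
  const result: ClaimResult = { accepted: [], rejected: [] };
  for (const command of claimed) {
    const ok = allow.some((pattern) => matches(pattern, command));
    (ok ? result.accepted : result.rejected).push(command);
  }
  return result;
}
```

Rejected claims are worth logging, since a node claiming commands outside its platform allowlist is either outdated or misbehaving.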

# Audit + rate limiting
- Log: pairing requests, approvals/denials, token issuance/rotation/revocation.
- Rate‑limit pairing spam and approval prompts.
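
One possible shape for the pairing and prompt rate limit is a sliding window keyed by `deviceId` or remote address. The algorithm choice, the `SlidingWindowLimiter` name, and any limits are assumptions; this document only states that rate limiting is needed.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the event is allowed and records it; false if over the limit.
  allow(key: string, now: number = Date.now()): boolean {
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(key, recent);
    return allowed;
  }
}
```

For example, `new SlidingWindowLimiter(3, 60_000)` would allow at most three pairing requests per key per minute.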

# Protocol hygiene
- Explicit protocol version + error codes.
- Reconnect rules + heartbeat policy.
- Presence TTL and last‑seen semantics.

---

# Open questions

1) Single device running both roles: token model
   - Recommend separate tokens per role (node vs operator).
   - Same deviceId; different scopes; clearer revocation.

2) Operator scope granularity
   - read/write/admin + approvals + pairing (minimum viable).
   - Consider per‑feature scopes later.

3) Token rotation + revocation UX
   - Auto‑rotate on role change.
   - UI to revoke by deviceId + role.

4) Discovery
   - Extend current Bonjour TXT to include WS TLS fingerprint + role hints.
   - Treat as locator hints only.

5) Cross‑network approval
   - Broadcast to all operator clients; active UI shows modal.
   - First response wins; gateway enforces atomicity.

---

# Summary (TL;DR)

- Today: WS control plane + Bridge node transport.
- Pain: approvals + duplication + two stacks.
- Proposal: one WS protocol with explicit roles + scopes, unified pairing + TLS pinning, gateway‑hosted approvals, stable device IDs + cute slugs.
- Outcome: simpler UX, stronger security, less duplication, better mobile routing.