| summary | read_when |
|---|---|
| WebSocket gateway architecture, components, and client flows | |
Gateway Architecture
Last updated: 2026-01-05
Overview
- A single long-lived Gateway process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Slack via Bolt, Discord via discord.js, Signal via signal-cli, iMessage via imsg, WebChat) and the control/event plane.
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: WebSocket on the configured bind host (default `127.0.0.1:18789`; tunnel or VPN for remote).
- One Gateway per host; it is the only place that is allowed to open a WhatsApp session. All sends/agent runs go through it.
- By default, the Gateway exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
Implementation snapshot (current code)
TypeScript Gateway (src/gateway/server.ts)
- Single HTTP + WebSocket server (default `18789`); bind policy `loopback|lan|tailnet|auto`. Refuses non-loopback binds without auth; Tailscale serve/funnel requires loopback.
- Handshake: first frame must be a `connect` request; AJV validates request + params against TypeBox schemas; protocol negotiated via `minProtocol`/`maxProtocol`.
- `hello-ok` includes snapshot (presence/health/stateVersion/uptime/configPath/stateDir), features (methods/events), policy (max payload/buffer/tick), and `canvasHostUrl` when available.
- Events emitted: `agent`, `chat`, `presence`, `tick`, `health`, `heartbeat`, `cron`, `talk.mode`, `node.pair.requested`, `node.pair.resolved`, `voicewake.changed`, `shutdown`.
- Idempotency keys are required for `send`, `agent`, `chat.send`, and node invokes; the dedupe cache avoids double-sends on reconnects. Payload sizes are capped per connection.
- Optional node bridge (`src/infra/bridge/server.ts`): TCP JSONL frames (`hello`, `pair-request`, `req/res`, `event`, `invoke`, `ping`). Node connect/disconnect updates presence and flows into the session bus.
- Control UI + Canvas host: HTTP serves Control UI (base path configurable) and can host the A2UI canvas via `src/canvas-host/server.ts` (live reload). Canvas host URL is advertised to nodes + clients.
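
The dedupe behaviour behind idempotency keys can be illustrated with a minimal sketch. This is not the Gateway's actual implementation: the class name `DedupeCache`, the TTL value, and the injected clock are all assumptions made for illustration. The idea is only that a retried request carrying the same key returns the earlier result instead of repeating the side effect.

```typescript
// Hypothetical sketch: a short-lived dedupe cache keyed by idempotency key.
// Names, TTL handling, and result shape are assumptions, not the Gateway's code.
type CachedResult<T> = { result: T; expiresAt: number };

class DedupeCache<T> {
  private entries = new Map<string, CachedResult<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Runs `sideEffect` at most once per key within the TTL;
  // a replay with the same key returns the cached result.
  run(key: string, sideEffect: () => T): T {
    const t = this.now();
    // Evict expired entries so the cache stays short-lived.
    this.entries.forEach((entry, k) => {
      if (entry.expiresAt <= t) this.entries.delete(k);
    });
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit.result;
    const result = sideEffect();
    this.entries.set(key, { result, expiresAt: t + this.ttlMs });
    return result;
  }
}
```

A client that reconnects and resends a `send` with the same key would hit the cached entry, so the message goes out once.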
iOS node (apps/ios)
- Discovery + pairing: `BridgeDiscoveryModel` uses `NWBrowser` Bonjour discovery and reads TXT fields for LAN/tailnet host hints plus gateway/bridge/canvas ports.
- Auto-connect: `BridgeConnectionController` uses stored `node.instanceId` + Keychain token; supports manual host/port; performs `pair-and-hello`.
- Bridge runtime: `BridgeSession` actor owns an `NWConnection`, JSONL frames, `hello`/`hello-ok`, ping/pong, `req/res`, server `event`s, and `invoke` callbacks; stores `canvasHostUrl`.
- Commands: `NodeAppModel` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, `screen.record`, `location.get`. Canvas/camera/screen are blocked when backgrounded.
- Canvas + actions: `WKWebView` with A2UI action bridge; accepts actions from local-network or trusted file URLs; intercepts `clawdbot://` deep links and forwards `agent.request` to the bridge.
- Voice/talk: voice wake sends `voice.transcript` events and syncs triggers via `voicewake.get` + `voicewake.changed`; Talk Mode attaches to the bridge.
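
The JSONL framing used on the bridge connection can be sketched as follows, in TypeScript to match the Gateway side. This is a minimal illustration, not the bridge's actual code: `JsonlDecoder`, `encodeFrame`, and the `BridgeFrame` shape (an object with a string `type`) are assumed names. It shows how frames split across TCP chunks are reassembled.

```typescript
// Hypothetical sketch of JSONL framing: one JSON object per newline-terminated line.
// The decoder name and frame shape are assumptions, not the bridge's actual code.
type BridgeFrame = { type: string; [key: string]: unknown };

class JsonlDecoder {
  private buffer = "";

  // Feed a raw chunk; returns every frame completed by this chunk.
  push(chunk: string): BridgeFrame[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on a newline).
    this.buffer = lines.pop() ?? "";
    const frames: BridgeFrame[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // tolerate blank lines
      const parsed: unknown = JSON.parse(trimmed); // throws on malformed JSON
      if (
        typeof parsed !== "object" ||
        parsed === null ||
        typeof (parsed as { type?: unknown }).type !== "string"
      ) {
        throw new Error("frame must be a JSON object with a string `type`");
      }
      frames.push(parsed as BridgeFrame);
    }
    return frames;
  }
}

// Serialize one frame as a single newline-terminated line.
function encodeFrame(frame: BridgeFrame): string {
  return JSON.stringify(frame) + "\n";
}
```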
Components and flows
- Gateway (daemon)
  - Maintains WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Signal (signal-cli), and iMessage (imsg) connections.
  - Exposes a typed WS API (req/resp + server push events).
  - Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`.
- Clients (mac app / CLI / web admin)
  - One WS connection per client.
  - Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
  - On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see `docs/clawdbot-mac.md`).
- Agent process (Pi)
  - Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection.
- WebChat
  - Serves static assets locally.
  - Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS.
  - Remote use goes through the same SSH/Tailscale tunnel as other clients.
Connection lifecycle (single client)
Client Gateway
| |
|---- req:connect -------->|
|<------ res (ok) ---------| (or res error + close)
| (payload=hello-ok carries snapshot: presence + health)
| |
|<------ event:presence ---| (deltas)
|<------ event:tick -------| (keepalive/no-op)
| |
|------- req:agent ------->|
|<------ res:agent --------| (ack: {runId,status:"accepted"})
|<------ event:agent ------| (streaming)
|<------ res:agent --------| (final: {runId,status,summary})
| |
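
The agent portion of the diagram (ack, streamed events, final result) can be tracked on the client with a small state holder. The sketch below is an assumption-laden illustration: `AgentRunTracker`, the `kind` discriminator, and the `chunk` field on events are hypothetical; only `runId`, `status`, and `summary` come from the flow above.

```typescript
// Hypothetical client-side tracker for the agent flow in the diagram:
// ack res -> streamed events -> final res. Field names other than
// runId/status/summary are assumptions made for illustration.
type AgentRes = { kind: "res"; runId: string; status: string; summary?: string };
type AgentEvent = { kind: "event"; runId: string; chunk: string };
type RunState = { status: string; chunks: string[]; done: boolean; summary?: string };

class AgentRunTracker {
  private runs = new Map<string, RunState>();

  handle(frame: AgentRes | AgentEvent): RunState {
    const existing = this.runs.get(frame.runId);
    const run: RunState = existing ?? { status: "unknown", chunks: [], done: false };
    if (frame.kind === "event") {
      // Streamed output arrives between the ack and the final res.
      run.chunks.push(frame.chunk);
    } else if (frame.status === "accepted") {
      // First res is only an acknowledgement; the run is still in flight.
      run.status = "accepted";
    } else {
      // Any later res is treated as the final result.
      run.status = frame.status;
      if (frame.summary !== undefined) run.summary = frame.summary;
      run.done = true;
    }
    this.runs.set(frame.runId, run);
    return run;
  }
}
```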
Wire protocol (summary)
- Transport: WebSocket, text frames with JSON payloads.
- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes.
- After handshake:
  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
  - Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}`
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation.
- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`.
- Protocol versions are bumped on breaking changes; clients must match `minClient`; Gateway chooses within client’s min/max.
- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache.
- Policy in `hello-ok` communicates payload/queue limits and tick interval.
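
The handshake rules above (first frame must be `connect`, token must match when configured, protocol chosen within the client's range) can be sketched as a single pure function. This is an illustration only: `handleFirstFrame`, the error strings, and the choice of the highest mutually supported version are assumptions, and the real Gateway validates with AJV rather than hand-written checks.

```typescript
// Hypothetical sketch of the connect handshake. The function name, error
// strings, and "pick the highest common version" rule are assumptions; only
// the frame fields come from the wire protocol summary.
type ConnectParams = { minProtocol: number; maxProtocol: number; auth?: { token?: string } };
type HandshakeResult = { ok: true; protocol: number } | { ok: false; error: string };

function handleFirstFrame(
  raw: string,
  server: { minProtocol: number; maxProtocol: number; token?: string },
): HandshakeResult {
  let frame: unknown;
  try {
    frame = JSON.parse(raw);
  } catch {
    return { ok: false, error: "first frame is not JSON" };
  }
  const f = frame as { type?: unknown; method?: unknown; params?: Partial<ConnectParams> } | null;
  if (!f || typeof f !== "object" || f.type !== "req" || f.method !== "connect") {
    return { ok: false, error: "first frame must be a connect request" };
  }
  const p = f.params;
  if (!p || typeof p.minProtocol !== "number" || typeof p.maxProtocol !== "number") {
    return { ok: false, error: "invalid connect params" };
  }
  // When a shared token is configured, the client must present the same token.
  if (server.token !== undefined && p.auth?.token !== server.token) {
    return { ok: false, error: "auth token mismatch" };
  }
  // Choose the highest version that lies inside both ranges.
  const protocol = Math.min(p.maxProtocol, server.maxProtocol);
  if (protocol < Math.max(p.minProtocol, server.minProtocol)) {
    return { ok: false, error: "no compatible protocol version" };
  }
  return { ok: true, protocol };
}
```

On any `ok: false` result the server would reply with an error `res` and close the socket, as described above.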
Type system and codegen
- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server.
- Build step emits JSON Schema.
- Clients:
  - TypeScript: uses the same TypeBox types directly.
  - Swift: generated `Codable` models via quicktype from the JSON Schema.
- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming.
Invariants
- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends.
- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close.
- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`.
- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries.
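
The "no event replay" invariant puts gap detection on the client. A minimal sketch follows, assuming `seq` increases by exactly 1 per event on a connection (the document does not state the increment explicitly) and using a hypothetical `SeqTracker` class.

```typescript
// Hypothetical client-side gap detector. It assumes `seq` increases by exactly 1
// per event on a connection, which the source does not state explicitly.
class SeqTracker {
  private last: number | undefined = undefined;

  // Returns true when the caller should refresh (health + system-presence).
  observe(seq: number | undefined): boolean {
    if (seq === undefined) return false; // seq is optional on events
    const prev = this.last;
    this.last = seq; // continue from the new position after a gap
    return prev !== undefined && seq !== prev + 1;
  }
}
```

When `observe` returns true, the client would re-request `health` and `system-presence` and then keep processing events from the new sequence number.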
Remote access
- Preferred: Tailscale or VPN; alternate: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`.
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
Operations snapshot
- Start: `clawdbot gateway` (foreground, logs to stdout). Supervise with launchd/systemd for restarts.
- Health: request `health` over WS; also surfaced in `hello-ok.health`.
- Metrics/logging: keep outside this spec; gateway should expose Prometheus text or structured logs separately.
Migration notes
- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped.