---
summary: "Runbook for the Gateway daemon, lifecycle, and operations"
read_when:
- Running or debugging the gateway process
---
# Gateway (daemon) runbook
Last updated: 2025-12-09
## What it is
- The always-on process that owns the single Baileys/Telegram connection and the control/event plane.
- Replaces the legacy `gateway` command. CLI entry point: `clawdis gateway`.
- Runs until stopped; exits non-zero on fatal errors so the supervisor restarts it.
## How to run (local)
```bash
pnpm clawdis gateway --port 18789
# for full debug/trace logs in stdio:
pnpm clawdis gateway --port 18789 --verbose
# if the port is busy, terminate listeners then start:
pnpm clawdis gateway --force
# dev loop (auto-reload on TS changes):
pnpm gateway:watch
```
- Binds WebSocket control plane to `127.0.0.1:<port>` (default 18789).
- Starts a Canvas file server by default on `canvasHost.port` (default `18793`), serving `http://<gateway-host>:18793/__clawdis__/canvas/` from `~/clawd/canvas`. Disable with `canvasHost.enabled=false` or `CLAWDIS_SKIP_CANVAS_HOST=1`.
- Logs to stdout; use launchd/systemd to keep it alive and rotate logs.
- Pass `--verbose` to mirror debug logging (handshakes, req/res, events) from the log file into stdio when troubleshooting.
- `--force` uses `lsof` to find listeners on the chosen port, sends SIGTERM, logs what it killed, then starts the gateway (fails fast if `lsof` is missing).
- If you run under a supervisor (launchd/systemd/mac app child-process mode), a stop/restart typically sends **SIGTERM**; older builds may surface this as `pnpm` `ELIFECYCLE` exit code **143** (SIGTERM), which is a normal shutdown, not a crash.
- **SIGUSR1** triggers an in-process restart (no external supervisor required). This is what the `clawdis_gateway` agent tool uses.
- Optional shared secret: pass `--token <value>` or set `CLAWDIS_GATEWAY_TOKEN` to require clients to send `connect.params.auth.token`.
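
The exit code 143 mentioned above is the shell convention of 128 plus the signal number (SIGTERM is 15). A minimal sketch of how a wrapper script might tell a normal stop from a crash; `classifyExit` and `ExitKind` are hypothetical names, not part of Clawdis:

```typescript
// Hypothetical helper: classify a gateway process exit code for a wrapper script.
const SIGTERM_NUMBER = 15; // POSIX signal number for SIGTERM

type ExitKind = "clean" | "sigterm-shutdown" | "failure";

function classifyExit(exitCode: number): ExitKind {
  if (exitCode === 0) return "clean";
  // Shells report "terminated by signal N" as 128 + N, so SIGTERM appears as 143.
  if (exitCode === 128 + SIGTERM_NUMBER) return "sigterm-shutdown";
  // Anything else is treated as a real failure the supervisor should restart from.
  return "failure";
}
```

With this, `classifyExit(143)` is treated as a normal shutdown rather than something to alert on.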
## Remote access
- Tailscale/VPN preferred; otherwise SSH tunnel:
```bash
ssh -N -L 18789:127.0.0.1:18789 user@host
```
- Clients then connect to `ws://127.0.0.1:18789` through the tunnel.
- If a token is configured, clients must include it in `connect.params.auth.token` even over the tunnel.
## Protocol (operator view)
- Mandatory first frame from client: `req {type:"req", id, method:"connect", params:{minProtocol,maxProtocol,client:{name,version,platform,deviceFamily?,modelIdentifier?,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
- Gateway replies `res {type:"res", id, ok:true, payload:hello-ok }` (or `ok:false` with an error, then closes).
- After handshake:
  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
  - Events: `{type:"event", event, payload, seq?, stateVersion?}`
- Structured presence entries: `{host, ip, version, platform?, deviceFamily?, modelIdentifier?, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
- `agent` responses are two-stage: first `res` ack `{runId,status:"accepted"}`, then a final `res` `{runId,status:"ok"|"error",summary}` after the run finishes; streamed output arrives as `event:"agent"`.
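
The frame shapes above can be sketched as TypeScript types plus a builder for the mandatory first frame. This is an illustration, not the generated protocol types: the `id` type (string), the protocol version numbers, and the shape of `caps` (an empty array) are assumptions, and `buildConnectFrame` and `parseFrame` are hypothetical helpers.

```typescript
// Shapes transcribed from the bullets above; `?` marks optional fields.
interface ClientInfo {
  name: string;
  version: string;
  platform: string;
  deviceFamily?: string;
  modelIdentifier?: string;
  mode: string;
  instanceId: string;
}

interface ReqFrame { type: "req"; id: string; method: string; params?: unknown }
interface ResFrame { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown }
interface EventFrame { type: "event"; event: string; payload?: unknown; seq?: number; stateVersion?: unknown }
type Frame = ReqFrame | ResFrame | EventFrame;

// Build the mandatory first frame. Protocol versions and `caps` are placeholders (assumed).
function buildConnectFrame(id: string, client: ClientInfo, token?: string): ReqFrame {
  return {
    type: "req",
    id,
    method: "connect",
    params: {
      minProtocol: 1, // assumed value
      maxProtocol: 1, // assumed value
      client,
      caps: [], // assumed shape
      // Only include auth when a shared secret is configured.
      ...(token ? { auth: { token } } : {}),
    },
  };
}

// Parse one inbound text message; returns null for malformed JSON or unknown frame types.
function parseFrame(raw: string): Frame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (type === "req" || type === "res" || type === "event") return value as Frame;
  return null;
}
```

A client would send `JSON.stringify(buildConnectFrame(...))` as its first message and pass every received message through `parseFrame` before dispatching on `type`.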
## Methods (initial set)
- `health` — full health snapshot (same shape as `clawdis health --json`).
- `status` — short summary.
- `system-presence` — current presence list.
- `system-event` — post a presence/system note (structured).
- `send` — send a message via the active provider(s).
- `agent` — run an agent turn (streams events back on same connection).
- `node.list` — list paired + currently-connected bridge nodes (includes `caps`, `deviceFamily`, `modelIdentifier`, `paired`, `connected`, and advertised `commands`).
- `node.describe` — describe a node (capabilities + supported `node.invoke` commands; works for paired nodes and for currently-connected unpaired nodes).
- `node.invoke` — invoke a command on a node (e.g. `canvas.*`, `camera.*`).
- `node.pair.*` — pairing lifecycle (`request`, `list`, `approve`, `reject`, `verify`).
See also: `docs/presence.md` for how presence is produced/deduped and why `instanceId` matters.
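
Since `agent` answers twice (an `accepted` ack, then a final result), a client that matches responses to requests needs to keep the pending entry alive after the first `res`. A minimal sketch, assuming both responses carry the original request `id`; `PendingCalls` is a hypothetical name:

```typescript
interface ResLike { id: string; ok: boolean; payload?: unknown }

type ResponseStage = "unknown" | "interim" | "final";

class PendingCalls {
  // Maps request id to the method that was called.
  private readonly methods = new Map<string, string>();

  track(id: string, method: string): void {
    this.methods.set(id, method);
  }

  // Classify a response and drop the pending entry once the call is finished.
  onResponse(res: ResLike): ResponseStage {
    const method = this.methods.get(res.id);
    if (method === undefined) return "unknown";
    const status = (res.payload as { status?: unknown } | undefined)?.status;
    // The agent ack keeps the call pending; the later ok/error response completes it.
    if (method === "agent" && res.ok && status === "accepted") return "interim";
    this.methods.delete(res.id);
    return "final";
  }

  get size(): number {
    return this.methods.size;
  }
}
```

Single-response methods such as `health` or `status` resolve on their first `res`; only `agent` goes through the interim stage.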
## Events
- `agent` — streamed tool/output events from the agent run (seq-tagged).
- `presence` — presence updates (deltas with stateVersion) pushed to all connected clients.
- `tick` — periodic keepalive/no-op to confirm liveness.
- `shutdown` — Gateway is exiting; payload includes `reason` and optional `restartExpectedMs`. Clients should reconnect.
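
One way a client might use the `shutdown` payload is to wait roughly `restartExpectedMs` before reconnecting. The sketch below is an assumption about client behavior, not documented Gateway policy; the fallback and cap values are arbitrary and `reconnectDelayMs` is a hypothetical helper.

```typescript
interface ShutdownPayload { reason: string; restartExpectedMs?: number }

// Pick a reconnect delay: honor the hint when present and sane, otherwise use a fallback.
function reconnectDelayMs(payload: ShutdownPayload, fallbackMs = 1000, maxMs = 30000): number {
  const hinted = payload.restartExpectedMs;
  const usable = typeof hinted === "number" && Number.isFinite(hinted) && hinted >= 0;
  const base = usable ? hinted : fallbackMs;
  // Cap the wait so a bad hint cannot stall the client indefinitely.
  return Math.min(base, maxMs);
}
```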
## WebChat integration
- WebChat is a native SwiftUI UI that talks directly to the Gateway WebSocket for history, sends, abort, and events.
- Remote use goes through the same SSH/Tailscale tunnel; if a gateway token is configured, the client includes it during `connect`.
- macOS app connects via a single WS (shared connection); it hydrates presence from the initial snapshot and listens for `presence` events to update the UI.
## Typing and validation
- Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
- Clients (TS/Swift) consume generated types (TS directly; Swift via the repo's generator).
- Types live in `src/gateway/protocol/*.ts`; regenerate schemas/models with `pnpm protocol:gen` (writes `dist/protocol.schema.json`) and `pnpm protocol:gen:swift` (writes `apps/macos/Sources/ClawdisProtocol/GatewayModels.swift`).
## Connection snapshot
- `hello-ok` includes a `snapshot` with `presence`, `health`, `stateVersion`, and `uptimeMs` plus `policy {maxPayload,maxBufferedBytes,tickIntervalMs}` so clients can render immediately without extra requests.
- `health`/`system-presence` remain available for manual refresh, but are not required at connect time.
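
A sketch of seeding client state from `hello-ok` so the UI can render without further requests. The exact nesting is an assumption: here `policy` is treated as a sibling of `snapshot`, and the presence, health, and `stateVersion` payloads are left as `unknown`. `stateFromHello` is a hypothetical helper.

```typescript
interface Policy { maxPayload: number; maxBufferedBytes: number; tickIntervalMs: number }

interface HelloOk {
  type: "hello-ok";
  snapshot: { presence: unknown[]; health: unknown; stateVersion: unknown; uptimeMs: number };
  policy: Policy; // assumed to sit next to `snapshot`
}

interface ClientState {
  presence: unknown[];
  health: unknown;
  stateVersion: unknown;
  tickIntervalMs: number;
  gatewayStartedAtMs: number;
}

// Derive initial client state from the handshake payload.
function stateFromHello(hello: HelloOk, nowMs: number): ClientState {
  return {
    presence: [...hello.snapshot.presence], // copy so later deltas do not mutate the payload
    health: hello.snapshot.health,
    stateVersion: hello.snapshot.stateVersion,
    tickIntervalMs: hello.policy.tickIntervalMs,
    // Approximate Gateway start time from the reported uptime.
    gatewayStartedAtMs: nowMs - hello.snapshot.uptimeMs,
  };
}
```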
## Error codes (res.error shape)
- Errors use `{ code, message, details?, retryable?, retryAfterMs? }`.
- Standard codes:
- `NOT_LINKED` — WhatsApp not authenticated.
- `AGENT_TIMEOUT` — agent did not respond within the configured deadline.
- `INVALID_REQUEST` — schema/param validation failed.
- `UNAVAILABLE` — Gateway is shutting down or a dependency is unavailable.
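
The `retryable` and `retryAfterMs` fields lend themselves to a small client-side retry decision. The sketch below assumes that a missing `retryable` means "do not retry" and that the default delay is 1000 ms; both are illustrative choices, and `retryPlan` is a hypothetical helper.

```typescript
interface GatewayError {
  code: string;
  message: string;
  details?: unknown;
  retryable?: boolean;
  retryAfterMs?: number;
}

interface RetryPlan { retry: boolean; delayMs: number }

// Decide whether and when to retry a failed request.
function retryPlan(err: GatewayError, defaultDelayMs = 1000): RetryPlan {
  // Assumption: only an explicit `retryable: true` permits a retry.
  if (err.retryable !== true) return { retry: false, delayMs: 0 };
  const hinted = err.retryAfterMs;
  const delayMs = typeof hinted === "number" && hinted >= 0 ? hinted : defaultDelayMs;
  return { retry: true, delayMs };
}
```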
## Keepalive behavior
- `tick` events (or WS ping/pong) are emitted periodically so clients know the Gateway is alive even when no traffic occurs.
- Send/agent acknowledgements remain separate responses; do not overload ticks for sends.
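
A client can turn ticks into a liveness signal by recording when it last saw any traffic and comparing that against `tickIntervalMs` from the connect policy. A minimal sketch; the tolerance of two missed ticks is an arbitrary assumption and `TickWatchdog` is a hypothetical name:

```typescript
class TickWatchdog {
  private readonly tickIntervalMs: number;
  private readonly missedTicksAllowed: number;
  private lastSeenMs: number;

  constructor(tickIntervalMs: number, startMs: number, missedTicksAllowed = 2) {
    this.tickIntervalMs = tickIntervalMs;
    this.missedTicksAllowed = missedTicksAllowed;
    this.lastSeenMs = startMs;
  }

  // Call on every tick (or any other inbound frame).
  noteTraffic(nowMs: number): void {
    this.lastSeenMs = nowMs;
  }

  // True once more than the allowed number of tick intervals passed in silence.
  isStale(nowMs: number): boolean {
    return nowMs - this.lastSeenMs > this.tickIntervalMs * this.missedTicksAllowed;
  }
}
```

When `isStale` returns true, the client would typically close the socket and reconnect.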
## Replay / gaps
- Events are not replayed. Clients detect seq gaps and should refresh (`health` + `system-presence`) before continuing. WebChat and macOS clients now auto-refresh on gap.
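
Gap detection can be sketched as a small tracker over the `seq` field. It assumes `seq` is a single counter that increases by one per event on a connection, which the text above does not state explicitly; `SeqTracker` is a hypothetical name.

```typescript
class SeqTracker {
  private last: number | null = null;

  // Returns true when events were missed and the client should refresh.
  observe(seq: number | undefined): boolean {
    if (seq === undefined) return false; // events without seq carry no ordering information
    if (this.last !== null && seq <= this.last) return false; // duplicate or stale; ignore
    const gap = this.last !== null && seq > this.last + 1;
    this.last = seq;
    return gap;
  }
}
```

On a `true` result the client would call `health` and `system-presence` before applying further events.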
## Supervision (macOS example)
- Use launchd to keep the daemon alive:
  - Program: path to `clawdis`
  - Arguments: `gateway`
  - KeepAlive: true
  - StandardOut/Err: file paths or `syslog`
- On failure, launchd restarts; fatal misconfig should keep exiting so the operator notices.
Bundled mac app:
- Clawdis.app can bundle a bun-compiled gateway binary and install a per-user LaunchAgent labeled `com.steipete.clawdis.gateway`.
## Supervision (systemd example)
```ini
[Unit]
Description=Clawdis Gateway
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/usr/local/bin/clawdis gateway --port 18789
Restart=on-failure
RestartSec=5
User=clawdis
Environment=CLAWDIS_GATEWAY_TOKEN=
WorkingDirectory=/home/clawdis
[Install]
WantedBy=multi-user.target
```
Enable with `systemctl enable --now clawdis-gateway.service`.
## Operational checks
- Liveness: open WS and send `req:connect` → expect `res` with `payload.type="hello-ok"` (with snapshot).
- Readiness: call `health` → expect `ok: true` and `web.linked=true`.
- Debug: subscribe to `tick` and `presence` events; ensure `status` shows linked/auth age; presence entries show Gateway host and connected clients.
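
The liveness and readiness checks can be expressed as predicates over the response frames. The sketch assumes `ok` is the flag on the `res` frame and that `web.linked` sits inside the `health` payload; adjust if the real shapes differ. `isLive` and `isReady` are hypothetical helpers.

```typescript
interface ConnectRes { ok: boolean; payload?: { type?: string } }
interface HealthRes { ok: boolean; payload?: { web?: { linked?: boolean } } }

// Liveness: the connect handshake succeeded and returned hello-ok.
function isLive(res: ConnectRes): boolean {
  return res.ok === true && res.payload?.type === "hello-ok";
}

// Readiness: health succeeded and the web provider reports a linked session.
function isReady(res: HealthRes): boolean {
  return res.ok === true && res.payload?.web?.linked === true;
}
```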
## Safety guarantees
- Only one Gateway per host; all sends/agent calls must go through it.
- No fallback to direct Baileys connections; if the Gateway is down, sends fail fast.
- Non-connect first frames or malformed JSON are rejected and the socket is closed.
- Graceful shutdown: emit `shutdown` event before closing; clients must handle close + reconnect.
## CLI helpers
- `clawdis gateway health|status` — request health/status over the Gateway WS.
- `clawdis gateway send --to <num> --message "hi" [--media-url ...]` — send via Gateway (idempotent).
- `clawdis gateway agent --message "hi" [--to ...]` — run an agent turn (waits for final by default).
- `clawdis gateway call <method> --params '{"k":"v"}'` — raw method invoker for debugging.
- Gateway helper subcommands assume a running gateway on `--url`; they no longer auto-spawn one.
## Migration guidance
- Retire uses of the legacy `gateway` command and the legacy TCP control port.
- Update clients to speak the WS protocol with mandatory connect and structured presence.