Files
clawdbot/docs/gateway.md
2025-12-09 16:28:26 +00:00

6.9 KiB

Gateway (daemon) runbook

Last updated: 2025-12-09

What it is

  • The always-on process that owns the single Baileys/Telegram connection and the control/event plane.
  • Replaces the legacy relay command. CLI entry point: clawdis gateway.
  • Runs until stopped; exits non-zero on fatal errors so the supervisor restarts it.

How to run (local)

pnpm clawdis gateway --port 18789
# for full debug/trace logs in stdio:
pnpm clawdis gateway --port 18789 --verbose
# if the port is busy, terminate listeners then start:
pnpm clawdis gateway --force
  • Binds WebSocket control plane to 127.0.0.1:<port> (default 18789).
  • Logs to stdout; use launchd/systemd to keep it alive and rotate logs.
  • Pass --verbose to mirror debug logging from the log file into stdio when troubleshooting.
  • --force uses lsof to find listeners on the chosen port, sends SIGTERM, logs what it killed, then starts the gateway (fails fast if lsof is missing).
  • Optional shared secret: pass --token <value> or set CLAWDIS_GATEWAY_TOKEN to require clients to send hello.auth.token.

Remote access

  • Tailscale/VPN preferred; otherwise SSH tunnel:
    ssh -N -L 18789:127.0.0.1:18789 user@host
    
  • Clients then connect to ws://127.0.0.1:18789 through the tunnel.
  • If a token is configured, clients must include it in hello.auth.token even over the tunnel.

Protocol (operator view)

  • Mandatory first frame from client: hello {type:"hello", minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? }.
  • Gateway replies hello-ok {type:"hello-ok", protocol:<chosen>, server:{version,commit,host,connId}, features:{methods,events}, snapshot:{presence[], health, stateVersion, uptimeMs}, policy:{maxPayload,maxBufferedBytes,tickIntervalMs} } or hello-error.
  • After handshake:
    • Requests: {type:"req", id, method, params}{type:"res", id, ok, payload|error}
    • Events: {type:"event", event, payload, seq?, stateVersion?}
  • Structured presence entries: {host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }.
  • agent responses are two-stage: first res ack {runId,status:"accepted"}, then a final res {runId,status:"ok"|"error",summary} after the run finishes; streamed output arrives as event:"agent".

Methods (initial set)

  • health — full health snapshot (same shape as clawdis health --json).
  • status — short summary.
  • system-presence — current presence list.
  • system-event — post a presence/system note (structured).
  • send — send a message via the active provider(s).
  • agent — run an agent turn (streams events back on same connection).

Events

  • agent — streamed tool/output events from the agent run (seq-tagged).
  • presence — presence updates (deltas with stateVersion) pushed to all connected clients.
  • tick — periodic keepalive/no-op to confirm liveness.
  • shutdown — Gateway is exiting; payload includes reason and optional restartExpectedMs. Clients should reconnect.

WebChat integration

  • WebChat serves static assets locally (default port 18788, configurable).
  • The WebChat backend keeps a single WS connection to the Gateway for control/data; all sends and agent runs flow through that connection.
  • Remote use goes through the same SSH/Tailscale tunnel; if a gateway token is configured, WebChat must include it during hello.
  • macOS app also connects via this WS (one socket); it hydrates presence from the initial snapshot and listens for presence events to update the UI.

Typing and validation

  • Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
  • Clients (TS/Swift) consume generated types (TS directly; Swift via quicktype from the JSON Schema).
  • Types live in src/gateway/protocol/*.ts; regenerate schemas/models with pnpm protocol:gen (writes dist/protocol.schema.json and apps/macos/Sources/ClawdisProtocol/Protocol.swift).

Connection snapshot

  • hello-ok includes a snapshot with presence, health, stateVersion, and uptimeMs plus policy {maxPayload,maxBufferedBytes,tickIntervalMs} so clients can render immediately without extra requests.
  • health/system-presence remain available for manual refresh, but are not required at connect time.

Error codes (res.error shape)

  • Errors use { code, message, details?, retryable?, retryAfterMs? }.
  • Standard codes:
    • NOT_LINKED — WhatsApp not authenticated.
    • AGENT_TIMEOUT — agent did not respond within the configured deadline.
    • INVALID_REQUEST — schema/param validation failed.
    • UNAVAILABLE — Gateway is shutting down or a dependency is unavailable.

Keepalive behavior

  • tick events (or WS ping/pong) are emitted periodically so clients know the Gateway is alive even when no traffic occurs.
  • Send/agent acknowledgements remain separate responses; do not overload ticks for sends.

Replay / gaps

  • Events are not replayed. Clients detect seq gaps and should refresh (health + system-presence) before continuing. WebChat and macOS clients now auto-refresh on gap.

Supervision (macOS example)

  • Use launchd to keep the daemon alive:
    • Program: path to clawdis
    • Arguments: gateway
    • KeepAlive: true
    • StandardOut/Err: file paths or syslog
  • On failure, launchd restarts; fatal misconfig should keep exiting so the operator notices.

Supervision (systemd example)

[Unit]
Description=Clawdis Gateway
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/clawdis gateway --port 18789
Restart=on-failure
RestartSec=5
User=clawdis
Environment=CLAWDIS_GATEWAY_TOKEN=
WorkingDirectory=/home/clawdis

[Install]
WantedBy=multi-user.target

Enable with systemctl enable --now clawdis-gateway.service.

Operational checks

  • Liveness: open WS and send hello → expect hello-ok (with snapshot).
  • Readiness: call health → expect ok: true and web.linked=true.
  • Debug: subscribe to tick and presence events; ensure status shows linked/auth age; presence entries show Gateway host and connected clients.

Safety guarantees

  • Only one Gateway per host; all sends/agent calls must go through it.
  • No fallback to direct Baileys connections; if the Gateway is down, sends fail fast.
  • Non-hello first frames or malformed JSON are rejected and the socket is closed.
  • Graceful shutdown: emit shutdown event before closing; clients must handle close + reconnect.

CLI helpers

  • clawdis gw:health / gw:status — request health/status over the Gateway WS.
  • clawdis gw:send --to <num> --message "hi" [--media-url ...] — send via Gateway (idempotent).
  • clawdis gw:agent --message "hi" [--to ...] — run an agent turn (waits for final by default).
  • clawdis gw:call <method> --params '{"k":"v"}' — raw method invoker for debugging.

Migration guidance

  • Retire uses of clawdis relay and the legacy TCP control port.
  • Update clients to speak the WS protocol with mandatory hello and structured presence.