| summary | read_when |
|---|---|
| Runbook for the Gateway daemon, lifecycle, and operations | |
Gateway (daemon) runbook
Last updated: 2025-12-09
What it is
- The always-on process that owns the single Baileys/Telegram connection and the control/event plane.
- Replaces the legacy `gateway` command. CLI entry point: `clawdbot gateway`.
- Runs until stopped; exits non-zero on fatal errors so the supervisor restarts it.
How to run (local)
```bash
pnpm clawdbot gateway --port 18789
# for full debug/trace logs in stdio:
pnpm clawdbot gateway --port 18789 --verbose
# if the port is busy, terminate listeners then start:
pnpm clawdbot gateway --force
# dev loop (auto-reload on TS changes):
pnpm gateway:watch
```
- Config hot reload watches `~/.clawdbot/clawdbot.json` (or `CLAWDBOT_CONFIG_PATH`).
  - Default mode: `gateway.reload.mode="hybrid"` (hot-apply safe changes, restart on critical ones).
  - Hot reload uses an in-process restart via SIGUSR1 when needed.
  - Disable with `gateway.reload.mode="off"`.
- Binds the WebSocket control plane to `127.0.0.1:<port>` (default 18789).
- The same port also serves HTTP (control UI, hooks, A2UI); WebSocket and HTTP are multiplexed on a single port.
- Starts a Canvas file server by default on `canvasHost.port` (default `18793`), serving `http://<gateway-host>:18793/__clawdbot__/canvas/` from `~/clawd/canvas`. Disable with `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
- Logs to stdout; use launchd/systemd to keep it alive and rotate logs.
- Pass `--verbose` to mirror debug logging (handshakes, req/res, events) from the log file into stdio when troubleshooting.
- `--force` uses `lsof` to find listeners on the chosen port, sends SIGTERM, logs what it killed, then starts the gateway (fails fast if `lsof` is missing).
- If you run under a supervisor (launchd/systemd/mac app child-process mode), a stop/restart typically sends SIGTERM; older builds may surface this as `pnpm` `ELIFECYCLE` exit code 143 (SIGTERM), which is a normal shutdown, not a crash.
- SIGUSR1 triggers an in-process restart (no external supervisor required). This is what the `gateway` agent tool uses.
- Optional shared secret: pass `--token <value>` or set `CLAWDBOT_GATEWAY_TOKEN` to require clients to send `connect.params.auth.token`.
- Port precedence: `--port` > `CLAWDBOT_GATEWAY_PORT` > `gateway.port` > default `18789`.
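The port precedence above can be expressed as a small resolver. This is a minimal sketch, not the CLI's actual code: `PortSources`, `parsePort`, and `resolveGatewayPort` are hypothetical names, and letting a malformed value fall through to the next source (instead of failing) is an assumption.

```typescript
// Sketch only: mirrors the documented order
// --port > CLAWDBOT_GATEWAY_PORT > gateway.port > default 18789.
const DEFAULT_GATEWAY_PORT = 18789;

// Hypothetical input shape; the real CLI gathers these values itself.
interface PortSources {
  cliPort?: number; // value passed via --port
  envPort?: string; // raw CLAWDBOT_GATEWAY_PORT environment value
  configPort?: number; // gateway.port from the config file
}

// Returns a valid TCP port number, or undefined for missing/malformed input.
function parsePort(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value.trim()) : value;
  if (typeof n !== "number" || !Number.isInteger(n) || n < 1 || n > 65535) {
    return undefined;
  }
  return n;
}

function resolveGatewayPort(sources: PortSources): number {
  // Assumption: a malformed value falls through to the next source.
  return (
    parsePort(sources.cliPort) ??
    parsePort(sources.envPort) ??
    parsePort(sources.configPort) ??
    DEFAULT_GATEWAY_PORT
  );
}

console.log(resolveGatewayPort({ envPort: "19001", configPort: 19002 })); // 19001
```

Falling through keeps the sketch short; a stricter resolver would reject an invalid `--port` outright so the operator notices the mistake.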
Remote access
- Tailscale/VPN is preferred; otherwise use an SSH tunnel:
  ```bash
  ssh -N -L 18789:127.0.0.1:18789 user@host
  ```
- Clients then connect to `ws://127.0.0.1:18789` through the tunnel.
- If a token is configured, clients must include it in `connect.params.auth.token`, even over the tunnel.
Multiple gateways (same host)
Supported if you isolate state + config and use unique ports.
Checklist per instance:
- unique `gateway.port`
- unique `CLAWDBOT_CONFIG_PATH`
- unique `CLAWDBOT_STATE_DIR`
- unique `agent.workspace`
- separate WhatsApp numbers (if using WhatsApp)
Example:
```bash
CLAWDBOT_CONFIG_PATH=~/.clawdbot/a.json CLAWDBOT_STATE_DIR=~/.clawdbot-a clawdbot gateway --port 19001
CLAWDBOT_CONFIG_PATH=~/.clawdbot/b.json CLAWDBOT_STATE_DIR=~/.clawdbot-b clawdbot gateway --port 19002
```
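The per-instance checklist can be verified mechanically before starting anything. A minimal sketch, assuming each instance is described by a hypothetical `GatewayInstance` record; `findCollisions` is an illustrative helper, not part of the CLI.

```typescript
// Hypothetical description of one gateway instance on the host.
interface GatewayInstance {
  name: string;
  port: number; // gateway.port / --port
  configPath: string; // CLAWDBOT_CONFIG_PATH
  stateDir: string; // CLAWDBOT_STATE_DIR
  workspace: string; // agent.workspace
  whatsappNumber?: string; // only when the instance uses WhatsApp
}

type CheckedField = "port" | "configPath" | "stateDir" | "workspace" | "whatsappNumber";
const CHECKED_FIELDS: CheckedField[] = ["port", "configPath", "stateDir", "workspace", "whatsappNumber"];

// Reports every checklist value that two or more instances share.
// Paths are compared as literal strings (no ~ expansion or normalization).
function findCollisions(instances: GatewayInstance[]): string[] {
  const problems: string[] = [];
  for (const field of CHECKED_FIELDS) {
    const owners = new Map<string, string[]>();
    for (const inst of instances) {
      const value = inst[field];
      if (value === undefined) continue; // e.g. an instance without WhatsApp
      const key = String(value);
      owners.set(key, [...(owners.get(key) ?? []), inst.name]);
    }
    owners.forEach((names, value) => {
      if (names.length > 1) problems.push(`${field}=${value} shared by ${names.join(", ")}`);
    });
  }
  return problems;
}
```

An empty result means every checklist value is distinct; because paths are not normalized, `~/x` and its absolute form would not be detected as the same directory.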
Protocol (operator view)
- Mandatory first frame from the client: `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name, version, platform, deviceFamily?, modelIdentifier?, mode, instanceId}, caps, auth?, locale?, userAgent?}}`.
- The Gateway replies `res {type:"res", id, ok:true, payload:hello-ok}` (or `ok:false` with an error, then closes the connection).
- After the handshake:
  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
  - Events: `{type:"event", event, payload, seq?, stateVersion?}`
- Structured presence entries: `{host, ip, version, platform?, deviceFamily?, modelIdentifier?, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId?}`.
- `agent` responses are two-stage: first a `res` ack `{runId, status:"accepted"}`, then a final `res` `{runId, status:"ok"|"error", summary}` after the run finishes; streamed output arrives as `event:"agent"`.
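The frame shapes above translate directly into TypeScript types. This is a sketch transcribed from the operator view, not the generated types in `src/gateway/protocol/*.ts` (those are authoritative). `buildConnectFrame` and `parseFrame` are hypothetical helpers; string request ids and `caps` as a list of names are assumptions, and no protocol version numbers are implied.

```typescript
// Frame shapes transcribed from the operator view; a sketch, not generated types.
interface ReqFrame { type: "req"; id: string; method: string; params?: unknown }
interface ResFrame { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown }
interface EventFrame { type: "event"; event: string; payload?: unknown; seq?: number; stateVersion?: number }
type Frame = ReqFrame | ResFrame | EventFrame;

interface ConnectParams {
  minProtocol: number;
  maxProtocol: number;
  client: {
    name: string;
    version: string;
    platform: string;
    deviceFamily?: string;
    modelIdentifier?: string;
    mode: string;
    instanceId: string;
  };
  caps: string[]; // assumption: capabilities are a list of names
  auth?: { token: string };
  locale?: string;
  userAgent?: string;
}

// The mandatory first frame; `id` is chosen by the client (string ids assumed).
function buildConnectFrame(id: string, params: ConnectParams): ReqFrame {
  return { type: "req", id, method: "connect", params };
}

// Parses one inbound WebSocket message; returns null for malformed JSON or an
// unknown `type` tag. Field-level validation is left to the generated schema.
function parseFrame(raw: string): Frame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const tag = (value as { type?: unknown }).type;
  return tag === "req" || tag === "res" || tag === "event" ? (value as Frame) : null;
}
```

A client would send `JSON.stringify(buildConnectFrame(...))` as its very first message, setting `auth.token` when the Gateway was started with `--token` or `CLAWDBOT_GATEWAY_TOKEN`, and wait for the matching `res` before sending anything else.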
Methods (initial set)
- `health`: full health snapshot (same shape as `clawdbot health --json`).
- `status`: short summary.
- `system-presence`: current presence list.
- `system-event`: post a presence/system note (structured).
- `send`: send a message via the active provider(s).
- `agent`: run an agent turn (streams events back on the same connection).
- `node.list`: list paired and currently connected bridge nodes (includes `caps`, `deviceFamily`, `modelIdentifier`, `paired`, `connected`, and advertised `commands`).
- `node.describe`: describe a node (capabilities and supported `node.invoke` commands; works for paired nodes and for currently connected unpaired nodes).
- `node.invoke`: invoke a command on a node (e.g. `canvas.*`, `camera.*`).
- `node.pair.*`: pairing lifecycle (`request`, `list`, `approve`, `reject`, `verify`).
See also: `docs/presence.md` for how presence is produced and deduplicated, and why `instanceId` matters.
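The `agent` method answers with two `res` frames: the `accepted` ack, then the final result. Below is a hypothetical client-side tracker for that pattern, assuming both frames carry the id of the originating request and the payload shape documented under Protocol; `AgentRunTracker` is an illustrative name.

```typescript
// Assumption: both `res` frames for one `agent` call carry the originating request id.
interface AgentResPayload {
  runId: string;
  status: "accepted" | "ok" | "error";
  summary?: string;
}

interface AgentOutcome {
  runId: string;
  status: "ok" | "error";
  summary?: string;
}

// Hypothetical client-side tracker for the two-stage agent response.
class AgentRunTracker {
  private pending = new Map<string, string>(); // request id -> runId

  // Feed every `res` payload belonging to an `agent` request.
  // Returns undefined for the "accepted" ack and the outcome for the final res.
  onResponse(requestId: string, payload: AgentResPayload): AgentOutcome | undefined {
    if (payload.status === "accepted") {
      this.pending.set(requestId, payload.runId);
      return undefined;
    }
    this.pending.delete(requestId);
    return { runId: payload.runId, status: payload.status, summary: payload.summary };
  }

  // True between the ack and the final response.
  isRunning(requestId: string): boolean {
    return this.pending.has(requestId);
  }
}
```

Keeping the ack and the final result as separate responses lets a caller confirm quickly that the run was accepted, while the long-running work reports back later on the same connection.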
Events
- `agent`: streamed tool/output events from the agent run (`seq`-tagged).
- `presence`: presence updates (deltas with `stateVersion`) pushed to all connected clients.
- `tick`: periodic keepalive/no-op to confirm liveness.
- `shutdown`: the Gateway is exiting; the payload includes `reason` and an optional `restartExpectedMs`. Clients should reconnect.
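A client can route these four events through one dispatcher. This is a sketch with hypothetical handler names; the 1000 ms fallback used when `restartExpectedMs` is absent is an arbitrary assumption, not a documented default.

```typescript
interface EventFrame {
  type: "event";
  event: string;
  payload?: unknown;
  seq?: number;
  stateVersion?: number;
}

interface ShutdownPayload {
  reason: string;
  restartExpectedMs?: number;
}

interface EventHandlers {
  onAgent(payload: unknown, seq?: number): void;
  onPresence(payload: unknown, stateVersion?: number): void;
  onTick(): void;
  onShutdown(reason: string, reconnectInMs: number): void;
}

// Hypothetical fallback used when the Gateway gives no restart estimate.
const DEFAULT_RECONNECT_DELAY_MS = 1000;

function reconnectDelayMs(payload: ShutdownPayload): number {
  const hint = payload.restartExpectedMs;
  return typeof hint === "number" && hint >= 0 ? hint : DEFAULT_RECONNECT_DELAY_MS;
}

// Routes the four documented events; unknown events are ignored so a newer
// Gateway can add events without breaking older clients.
function dispatchEvent(frame: EventFrame, handlers: EventHandlers): void {
  switch (frame.event) {
    case "agent":
      handlers.onAgent(frame.payload, frame.seq);
      break;
    case "presence":
      handlers.onPresence(frame.payload, frame.stateVersion);
      break;
    case "tick":
      handlers.onTick();
      break;
    case "shutdown": {
      const payload = frame.payload as ShutdownPayload;
      handlers.onShutdown(payload.reason, reconnectDelayMs(payload));
      break;
    }
    default:
      break;
  }
}
```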
WebChat integration
- WebChat is a native SwiftUI interface that talks directly to the Gateway WebSocket for history, sends, aborts, and events.
- Remote use goes through the same SSH/Tailscale tunnel; if a gateway token is configured, the client includes it during `connect`.
- The macOS app connects via a single WS (shared connection); it hydrates presence from the initial snapshot and listens for `presence` events to update the UI.
Typing and validation
- Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
- Clients (TS/Swift) consume generated types (TS directly; Swift via the repo’s generator).
- Types live in `src/gateway/protocol/*.ts`; regenerate schemas/models with `pnpm protocol:gen` (writes `dist/protocol.schema.json`) and `pnpm protocol:gen:swift` (writes `apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`).
Connection snapshot
- `hello-ok` includes a `snapshot` with `presence`, `health`, `stateVersion`, and `uptimeMs`, plus `policy {maxPayload, maxBufferedBytes, tickIntervalMs}`, so clients can render immediately without extra requests.
- `health`/`system-presence` remain available for manual refresh but are not required at connect time.
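A client can render from the handshake alone. The sketch below assumes `snapshot` and `policy` are sibling fields of the `hello-ok` payload and that `stateVersion` only increases; `GatewayView` and `advanceStateVersion` are hypothetical names.

```typescript
// Assumption: `snapshot` and `policy` are sibling fields of the hello-ok payload.
interface HelloOkPayload {
  type: "hello-ok";
  snapshot: { presence: unknown[]; health: unknown; stateVersion: number; uptimeMs: number };
  policy: { maxPayload: number; maxBufferedBytes: number; tickIntervalMs: number };
}

// Hypothetical client-side view seeded from the handshake.
class GatewayView {
  presence: unknown[] = [];
  health: unknown = null;
  stateVersion = 0;
  tickIntervalMs = 0;

  // Everything needed for the first render comes from hello-ok, so no
  // health or system-presence request is sent here.
  hydrate(hello: HelloOkPayload): void {
    this.presence = hello.snapshot.presence;
    this.health = hello.snapshot.health;
    this.stateVersion = hello.snapshot.stateVersion;
    this.tickIntervalMs = hello.policy.tickIntervalMs;
  }

  // Assumption: stateVersion only increases, so a presence update carrying an
  // older or equal version is stale and can be skipped.
  advanceStateVersion(version: number): boolean {
    if (version <= this.stateVersion) return false;
    this.stateVersion = version;
    return true;
  }
}
```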
Error codes (res.error shape)
- Errors use `{ code, message, details?, retryable?, retryAfterMs? }`.
- Standard codes:
  - `NOT_LINKED`: WhatsApp not authenticated.
  - `AGENT_TIMEOUT`: agent did not respond within the configured deadline.
  - `INVALID_REQUEST`: schema/param validation failed.
  - `UNAVAILABLE`: Gateway is shutting down or a dependency is unavailable.
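One way a client might act on this error shape, under a hypothetical policy: honor `retryable` and `retryAfterMs` when the Gateway sends them, and otherwise retry only `UNAVAILABLE` with exponential backoff. `retryDelayMs` and the 500 ms base delay are illustrative.

```typescript
// Error shape as documented; the code stays a plain string so an unknown
// code from a newer Gateway cannot break the client.
interface GatewayError {
  code: string; // NOT_LINKED | AGENT_TIMEOUT | INVALID_REQUEST | UNAVAILABLE | ...
  message: string;
  details?: unknown;
  retryable?: boolean;
  retryAfterMs?: number;
}

// Hypothetical retry policy: returns a delay in ms, or null for "do not retry".
// Trusts `retryable` when present; otherwise retries only UNAVAILABLE.
function retryDelayMs(err: GatewayError, attempt: number, baseMs = 500): number | null {
  const retryable = err.retryable ?? err.code === "UNAVAILABLE";
  if (!retryable) return null; // e.g. INVALID_REQUEST needs a fix, not a retry
  if (typeof err.retryAfterMs === "number") return err.retryAfterMs;
  return baseMs * 2 ** attempt; // exponential backoff when no hint is given
}
```

Not retrying `NOT_LINKED` by default matches its meaning: the operator has to re-authenticate WhatsApp before any send can succeed.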
Keepalive behavior
- `tick` events (or WS ping/pong) are emitted periodically so clients know the Gateway is alive even when no traffic occurs.
- Send/agent acknowledgements remain separate responses; do not overload ticks for sends.
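A client-side liveness check can be built on ticks. This sketch assumes the interval comes from `policy.tickIntervalMs` in `hello-ok` and treats two silent intervals as stale; `TickWatchdog` and that threshold are hypothetical.

```typescript
// Hypothetical client-side watchdog. Any inbound frame counts as proof of life;
// the connection is considered stale after `missedTicks` silent intervals.
class TickWatchdog {
  private lastSeenMs: number;
  private readonly staleAfterMs: number;

  constructor(tickIntervalMs: number, missedTicks = 2, nowMs: number = Date.now()) {
    this.staleAfterMs = tickIntervalMs * missedTicks;
    this.lastSeenMs = nowMs;
  }

  // Call on every inbound frame (ticks, events, and responses alike).
  touch(nowMs: number = Date.now()): void {
    this.lastSeenMs = nowMs;
  }

  // When true, a client would close the socket and reconnect.
  isStale(nowMs: number = Date.now()): boolean {
    return nowMs - this.lastSeenMs > this.staleAfterMs;
  }
}
```

The reverse inference does not hold: a tick says nothing about whether a particular `send` succeeded, which is why acknowledgements stay separate responses.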
Replay / gaps
- Events are not replayed. Clients detect `seq` gaps and should refresh (`health` + `system-presence`) before continuing. WebChat and macOS clients now auto-refresh on a gap.
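Gap detection can be sketched as a small tracker, assuming `seq` increases by exactly one per event on a connection (if `seq` turns out to be scoped per agent run, keep one tracker per `runId`). `SeqTracker` is a hypothetical name.

```typescript
type SeqResult = "ok" | "gap" | "duplicate";

// Hypothetical tracker for event sequence numbers on one connection.
class SeqTracker {
  private last: number | undefined = undefined;

  observe(seq: number | undefined): SeqResult {
    if (seq === undefined) return "ok"; // seq is optional on events
    if (this.last === undefined || seq === this.last + 1) {
      this.last = seq;
      return "ok";
    }
    if (seq <= this.last) return "duplicate"; // already seen or out of order
    this.last = seq; // resume from the newest seq once the gap is reported
    return "gap";
  }

  // Forget history, e.g. after reconnecting.
  reset(): void {
    this.last = undefined;
  }
}
```

On `"gap"`, a client would pause event handling, call `health` and `system-presence`, and replace its local state before continuing. Calling `reset()` after every reconnect avoids a false gap if numbering restarts on a new connection (an assumption).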
Supervision (macOS example)
- Use launchd to keep the daemon alive:
  - Program: path to `clawdbot`
  - Arguments: `gateway`
  - KeepAlive: true
  - StandardOut/Err: file paths or `syslog`
- On failure, launchd restarts it; a fatal misconfiguration should keep exiting so the operator notices.
Bundled mac app:
- Clawdbot.app can bundle a bun-compiled gateway binary and install a per-user LaunchAgent labeled `com.clawdbot.gateway`.
Supervision (systemd example)
```ini
[Unit]
Description=Clawdbot Gateway
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/clawdbot gateway --port 18789
Restart=on-failure
RestartSec=5
User=clawdbot
# Optional shared secret; when set, clients must send it as connect.params.auth.token
Environment=CLAWDBOT_GATEWAY_TOKEN=
WorkingDirectory=/home/clawdbot

[Install]
WantedBy=multi-user.target
```
Save the unit as `/etc/systemd/system/clawdbot-gateway.service`, then enable it with `systemctl enable --now clawdbot-gateway.service`.
Operational checks
- Liveness: open a WS and send `req:connect` → expect a `res` with `payload.type="hello-ok"` (with snapshot).
- Readiness: call `health` → expect `ok: true` and `web.linked=true`.
- Debug: subscribe to `tick` and `presence` events; ensure `status` shows linked/auth age and that presence entries show the Gateway host and connected clients.
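The liveness and readiness checks reduce to two predicates over parsed `res` frames. A sketch assuming the health payload exposes `web.linked` as a boolean; `isLive` and `isReady` are hypothetical names that a monitoring script could wrap around any WebSocket client.

```typescript
interface ResFrame { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown }

// Liveness: the connect response succeeded and carried a hello-ok with a snapshot.
function isLive(connectRes: ResFrame): boolean {
  const payload = connectRes.payload as { type?: unknown; snapshot?: unknown } | undefined;
  return connectRes.ok && payload?.type === "hello-ok" && payload?.snapshot !== undefined;
}

// Readiness: the health call succeeded and reports the web provider as linked.
function isReady(healthRes: ResFrame): boolean {
  const payload = healthRes.payload as { web?: { linked?: unknown } } | undefined;
  return healthRes.ok && payload?.web?.linked === true;
}
```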
Safety guarantees
- Only one Gateway per host (unless instances are isolated as described under Multiple gateways); all sends/agent calls must go through it.
- No fallback to direct Baileys connections; if the Gateway is down, sends fail fast.
- Non-connect first frames or malformed JSON are rejected and the socket is closed.
- Graceful shutdown: emit a `shutdown` event before closing; clients must handle close + reconnect.
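The first-frame rule can be sketched as a per-connection gate. This is a hypothetical illustration, not the Gateway's implementation: `HandshakeGate` only enforces ordering and JSON well-formedness, leaving schema validation to the AJV step described under Typing and validation.

```typescript
type GateDecision = { action: "accept" } | { action: "close"; reason: string };

// Hypothetical per-connection gate: malformed JSON always closes the socket,
// and nothing but req:connect is accepted before the handshake.
class HandshakeGate {
  private connected = false;

  onMessage(raw: string): GateDecision {
    let frame: unknown;
    try {
      frame = JSON.parse(raw);
    } catch {
      return { action: "close", reason: "malformed JSON" };
    }
    if (this.connected) return { action: "accept" };
    const f = frame as { type?: unknown; method?: unknown } | null;
    if (typeof f !== "object" || f === null || f.type !== "req" || f.method !== "connect") {
      return { action: "close", reason: "first frame must be req:connect" };
    }
    this.connected = true;
    return { action: "accept" };
  }
}
```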
CLI helpers
- `clawdbot gateway health|status`: request health/status over the Gateway WS.
- `clawdbot gateway send --to <num> --message "hi" [--media-url ...]`: send via the Gateway (idempotent).
- `clawdbot gateway agent --message "hi" [--to ...]`: run an agent turn (waits for the final response by default).
- `clawdbot gateway call <method> --params '{"k":"v"}'`: raw method invoker for debugging.
- Gateway helper subcommands assume a running gateway on `--url`; they no longer auto-spawn one.
Migration guidance
- Retire uses of `clawdbot gateway` and the legacy TCP control port.
- Update clients to speak the WS protocol with mandatory connect and structured presence.