6.9 KiB
6.9 KiB
Gateway (daemon) runbook
Last updated: 2025-12-09
What it is
- The always-on process that owns the single Baileys/Telegram connection and the control/event plane.
- Replaces the legacy
relaycommand. CLI entry point:clawdis gateway. - Runs until stopped; exits non-zero on fatal errors so the supervisor restarts it.
How to run (local)
pnpm clawdis gateway --port 18789
# for full debug/trace logs in stdio:
pnpm clawdis gateway --port 18789 --verbose
# if the port is busy, terminate listeners then start:
pnpm clawdis gateway --force
- Binds WebSocket control plane to
127.0.0.1:<port>(default 18789). - Logs to stdout; use launchd/systemd to keep it alive and rotate logs.
- Pass
--verboseto mirror debug logging from the log file into stdio when troubleshooting. --forceuseslsofto find listeners on the chosen port, sends SIGTERM, logs what it killed, then starts the gateway (fails fast iflsofis missing).- Optional shared secret: pass
--token <value>or setCLAWDIS_GATEWAY_TOKENto require clients to sendhello.auth.token.
Remote access
- Tailscale/VPN preferred; otherwise SSH tunnel:
ssh -N -L 18789:127.0.0.1:18789 user@host - Clients then connect to
ws://127.0.0.1:18789through the tunnel. - If a token is configured, clients must include it in
hello.auth.tokeneven over the tunnel.
Protocol (operator view)
- Mandatory first frame from client:
hello {type:"hello", minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? }. - Gateway replies
hello-ok {type:"hello-ok", protocol:<chosen>, server:{version,commit,host,connId}, features:{methods,events}, snapshot:{presence[], health, stateVersion, uptimeMs}, policy:{maxPayload,maxBufferedBytes,tickIntervalMs} }orhello-error. - After handshake:
- Requests:
{type:"req", id, method, params}→{type:"res", id, ok, payload|error} - Events:
{type:"event", event, payload, seq?, stateVersion?}
- Requests:
- Structured presence entries:
{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }. agentresponses are two-stage: firstresack{runId,status:"accepted"}, then a finalres{runId,status:"ok"|"error",summary}after the run finishes; streamed output arrives asevent:"agent".
Methods (initial set)
health— full health snapshot (same shape asclawdis health --json).status— short summary.system-presence— current presence list.system-event— post a presence/system note (structured).send— send a message via the active provider(s).agent— run an agent turn (streams events back on same connection).
Events
agent— streamed tool/output events from the agent run (seq-tagged).presence— presence updates (deltas with stateVersion) pushed to all connected clients.tick— periodic keepalive/no-op to confirm liveness.shutdown— Gateway is exiting; payload includesreasonand optionalrestartExpectedMs. Clients should reconnect.
WebChat integration
- WebChat serves static assets locally (default port 18788, configurable).
- The WebChat backend keeps a single WS connection to the Gateway for control/data; all sends and agent runs flow through that connection.
- Remote use goes through the same SSH/Tailscale tunnel; if a gateway token is configured, WebChat must include it during hello.
- macOS app also connects via this WS (one socket); it hydrates presence from the initial snapshot and listens for
presenceevents to update the UI.
Typing and validation
- Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
- Clients (TS/Swift) consume generated types (TS directly; Swift via quicktype from the JSON Schema).
- Types live in
src/gateway/protocol/*.ts; regenerate schemas/models withpnpm protocol:gen(writesdist/protocol.schema.jsonandapps/macos/Sources/ClawdisProtocol/Protocol.swift).
Connection snapshot
hello-okincludes asnapshotwithpresence,health,stateVersion, anduptimeMspluspolicy {maxPayload,maxBufferedBytes,tickIntervalMs}so clients can render immediately without extra requests.health/system-presenceremain available for manual refresh, but are not required at connect time.
Error codes (res.error shape)
- Errors use
{ code, message, details?, retryable?, retryAfterMs? }. - Standard codes:
NOT_LINKED— WhatsApp not authenticated.AGENT_TIMEOUT— agent did not respond within the configured deadline.INVALID_REQUEST— schema/param validation failed.UNAVAILABLE— Gateway is shutting down or a dependency is unavailable.
Keepalive behavior
tickevents (or WS ping/pong) are emitted periodically so clients know the Gateway is alive even when no traffic occurs.- Send/agent acknowledgements remain separate responses; do not overload ticks for sends.
Replay / gaps
- Events are not replayed. Clients detect seq gaps and should refresh (
health+system-presence) before continuing. WebChat and macOS clients now auto-refresh on gap.
Supervision (macOS example)
- Use launchd to keep the daemon alive:
- Program: path to
clawdis - Arguments:
gateway - KeepAlive: true
- StandardOut/Err: file paths or
syslog
- Program: path to
- On failure, launchd restarts; fatal misconfig should keep exiting so the operator notices.
Supervision (systemd example)
[Unit]
Description=Clawdis Gateway
After=network-online.target
Wants=network-online.target
[Service]
ExecStart=/usr/local/bin/clawdis gateway --port 18789
Restart=on-failure
RestartSec=5
User=clawdis
Environment=CLAWDIS_GATEWAY_TOKEN=
WorkingDirectory=/home/clawdis
[Install]
WantedBy=multi-user.target
Enable with systemctl enable --now clawdis-gateway.service.
Operational checks
- Liveness: open WS and send
hello→ expecthello-ok(with snapshot). - Readiness: call
health→ expectok: trueandweb.linked=true. - Debug: subscribe to
tickandpresenceevents; ensurestatusshows linked/auth age; presence entries show Gateway host and connected clients.
Safety guarantees
- Only one Gateway per host; all sends/agent calls must go through it.
- No fallback to direct Baileys connections; if the Gateway is down, sends fail fast.
- Non-hello first frames or malformed JSON are rejected and the socket is closed.
- Graceful shutdown: emit
shutdownevent before closing; clients must handle close + reconnect.
CLI helpers
clawdis gw:health/gw:status— request health/status over the Gateway WS.clawdis gw:send --to <num> --message "hi" [--media-url ...]— send via Gateway (idempotent).clawdis gw:agent --message "hi" [--to ...]— run an agent turn (waits for final by default).clawdis gw:call <method> --params '{"k":"v"}'— raw method invoker for debugging.
Migration guidance
- Retire uses of
clawdis relayand the legacy TCP control port. - Update clients to speak the WS protocol with mandatory hello and structured presence.