docs: refresh and simplify docs

This commit is contained in:
Peter Steinberger
2026-01-08 23:06:56 +01:00
parent 88dca1afdf
commit a6c309824e
46 changed files with 1117 additions and 2155 deletions

View File

@@ -171,9 +171,9 @@ Notes:
Recommended: `clawdbot hooks gmail run` wraps the same flow and auto-renews the watch.
## Expose the handler (dev, unsupported hack)
## Expose the handler (advanced, unsupported)
If you insist on a non-Tailscale tunnel, wire it manually and use the public URL in the push
If you need a non-Tailscale tunnel, wire it manually and use the public URL in the push
subscription (unsupported, no guardrails):
```bash

View File

@@ -7,8 +7,7 @@ read_when:
# CLI reference
This page mirrors `src/cli/*` and is the source of truth for CLI behavior.
If you change the CLI code, update this doc.
This page describes the current CLI behavior. If commands change, update this doc.
## Global flags
@@ -25,7 +24,7 @@ If you change the CLI code, update this doc.
## Color palette
Clawdbot uses a lobster palette for CLI output. Source of truth: `src/terminal/theme.ts`.
Clawdbot uses a lobster palette for CLI output.
- `accent` (#FF5A2D): headings, provider labels, primary highlights.
- `accentBright` (#FF7A3D): command names, emphasis.

View File

@@ -5,11 +5,11 @@ read_when:
---
# Agent Loop (Clawdis)
Short, exact flow of one agent run. Source of truth: current code in `src/`.
Short, exact flow of one agent run.
## Entry points
- Gateway RPC: `agent` and `agent.wait` in [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts).
- CLI: `agentCommand` in [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts).
- Gateway RPC: `agent` and `agent.wait`.
- CLI: `agent` command.
## High-level flow
1) `agent` RPC validates params, resolves session (sessionKey/sessionId), persists session metadata, returns `{ runId, acceptedAt }` immediately.
@@ -37,10 +37,8 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`.
- `tool`: streamed tool events from pi-agent-core
## Chat provider handling
- `createAgentEventHandler` in [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts):
- buffers assistant deltas
- emits chat `delta` messages
- emits chat `final` when **lifecycle end/error** arrives
- Assistant deltas are buffered into chat `delta` messages.
- A chat `final` is emitted on **lifecycle end/error**.
## Timeouts
- `agent.wait` default: 30s (just the wait). `timeoutMs` param overrides.
@@ -51,11 +49,3 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`.
- AbortSignal (cancel)
- Gateway disconnect or RPC timeout
- `agent.wait` timeout (wait-only, does not stop agent)
## Files
- [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts)
- [`src/gateway/server-methods/agent-job.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent-job.ts)
- [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts)
- [`src/agents/pi-embedded-runner.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-runner.ts)
- [`src/agents/pi-embedded-subscribe.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-subscribe.ts)
- [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts)

View File

@@ -5,7 +5,7 @@ read_when:
---
# Agent Runtime 🤖
CLAWDBOT runs a single embedded agent runtime derived from **p-mono** (internal name: **p**).
CLAWDBOT runs a single embedded agent runtime derived from **p-mono**.
## Workspace (required)
@@ -43,9 +43,9 @@ To disable bootstrap file creation entirely (for pre-seeded workspaces), set:
{ agent: { skipBootstrap: true } }
```
## Built-in tools (internal)
## Built-in tools
ps embedded core tools (read/bash/edit/write and related internals) are defined in code and always available. `TOOLS.md` does **not** control which tools exist; its guidance for how *you* want them used.
Core tools (read/bash/edit/write and related system tools) are always available. `TOOLS.md` does **not** control which tools exist; its guidance for how *you* want them used.
## Skills
@@ -63,18 +63,6 @@ Clawdbot reuses pieces of the p-mono codebase (models/tools), but **session mana
- No p-coding agent runtime.
- No `~/.pi/agent` or `<workspace>/.pi` settings are consulted.
## Peter @ steipete (only)
Apply these notes **only** when the user is Peter Steinberger at steipete.
- Gateway runs on the **Mac Studio in London**.
- Primary work computer: **MacBook Pro**.
- Peter travels between **Vienna** and **London**; there are two networks bridged via **Tailscale**.
- For debugging, connect to the Mac Studio (London) or MacBook Pro (primary).
- There is also an **M1 MacBook Pro** on the Vienna tailnet you can use to access the Vienna network.
- Nodes can be accessed via the `clawdbot` binary (`pnpm clawdbot` in `~/Projects/clawdbot`).
- See also `skills/clawdbot*` for node/browser/canvas/cron usage.
## Sessions
Session transcripts are stored as JSONL at:

View File

@@ -3,67 +3,55 @@ summary: "WebSocket gateway architecture, components, and client flows"
read_when:
- Working on gateway protocol, clients, or transports
---
# Gateway Architecture
# Gateway architecture
Last updated: 2026-01-05
## Overview
- A single long-lived **Gateway** process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Slack via Bolt, Discord via discord.js, Signal via signal-cli, iMessage via imsg, WebChat) and the control/event plane.
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: **WebSocket on the configured bind host** (default `127.0.0.1:18789`; tunnel or VPN for remote).
- One Gateway per host; it is the only place that is allowed to open a WhatsApp session. All sends/agent runs go through it.
- By default: the Gateway exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
## Implementation snapshot (current code)
### TypeScript Gateway ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts))
- Single HTTP + WebSocket server (default `18789`); bind policy `loopback|lan|tailnet|auto`. Refuses non-loopback binds without auth; Tailscale serve/funnel requires loopback.
- Handshake: first frame must be a `connect` request; AJV validates request + params against TypeBox schemas; protocol negotiated via `minProtocol`/`maxProtocol`.
- `hello-ok` includes snapshot (presence/health/stateVersion/uptime/configPath/stateDir), features (methods/events), policy (max payload/buffer/tick), and `canvasHostUrl` when available.
- Events emitted: `agent`, `chat`, `presence`, `tick`, `health`, `heartbeat`, `cron`, `talk.mode`, `node.pair.requested`, `node.pair.resolved`, `voicewake.changed`, `shutdown`.
- Idempotency keys are required for `send`, `agent`, `chat.send`, and node invokes; the dedupe cache avoids double-sends on reconnects. Payload sizes are capped per connection.
- Optional node bridge ([`src/infra/bridge/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bridge/server.ts)): TCP JSONL frames (`hello`, `pair-request`, `req/res`, `event`, `invoke`, `ping`). Node connect/disconnect updates presence and flows into the session bus.
- Control UI + Canvas host: HTTP serves Control UI (base path configurable) and can host the A2UI canvas via [`src/canvas-host/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/canvas-host/server.ts) (live reload). Canvas host URL is advertised to nodes + clients.
### iOS node (`apps/ios`)
- Discovery + pairing: `BridgeDiscoveryModel` uses `NWBrowser` Bonjour discovery and reads TXT fields for LAN/tailnet host hints plus gateway/bridge/canvas ports.
- Auto-connect: `BridgeConnectionController` uses stored `node.instanceId` + Keychain token; supports manual host/port; performs `pair-and-hello`.
- Bridge runtime: `BridgeSession` actor owns an `NWConnection`, JSONL frames, `hello`/`hello-ok`, ping/pong, `req/res`, server `event`s, and `invoke` callbacks; stores `canvasHostUrl`.
- Commands: `NodeAppModel` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, `screen.record`, `location.get`. Canvas/camera/screen are blocked when backgrounded.
- Canvas + actions: `WKWebView` with A2UI action bridge; accepts actions from local-network or trusted file URLs; intercepts `clawdbot://` deep links and forwards `agent.request` to the bridge.
- Voice/talk: voice wake sends `voice.transcript` events and syncs triggers via `voicewake.get` + `voicewake.changed`; Talk Mode attaches to the bridge.
### Android node (`apps/android`)
- Discovery + pairing: `BridgeDiscovery` uses mDNS/NSD to find `_clawdbot-bridge._tcp`, with manual host/port fallback.
- Auto-connect: `NodeRuntime` restores a stored token, performs `pair-and-hello`, and reconnects to the last discovered or manual bridge.
- Bridge runtime: `BridgeSession` owns the TCP JSONL session (`hello`/`hello-ok`, ping/pong, `req/res`, `event`, `invoke`); stores `canvasHostUrl`.
- Commands: `NodeRuntime` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, and chat/session events; foreground-only for canvas/camera.
- A single longlived **Gateway** owns all messaging surfaces (WhatsApp via
Baileys, Telegram via grammY, Slack, Discord, Signal, iMessage, WebChat).
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over
**one transport: WebSocket** on the configured bind host (default
`127.0.0.1:18789`).
- One Gateway per host; it is the only place that opens a WhatsApp session.
- A **bridge** (default `18790`) is used for nodes (macOS/iOS/Android).
- A **canvas host** (default `18793`) serves agenteditable HTML and A2UI.
## Components and flows
- **Gateway (daemon)**
- Maintains WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Signal (signal-cli), and iMessage (imsg) connections.
- Exposes a typed WS API (req/resp + server push events).
- Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`.
- **Clients (mac app / CLI / web admin)**
- One WS connection per client.
- Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
- On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see [`docs/macos.md`](/platforms/macos)).
- **Agent process (Pi)**
- Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection.
- **WebChat**
- Serves static assets locally.
- Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS.
- Remote use goes through the same SSH/Tailscale tunnel as other clients.
### Gateway (daemon)
- Maintains provider connections.
- Exposes a typed WS API (requests, responses, serverpush events).
- Validates inbound frames against JSON Schema.
- Emits events like `agent`, `chat`, `presence`, `health`, `heartbeat`, `cron`.
### Clients (mac app / CLI / web admin)
- One WS connection per client.
- Send requests (`health`, `status`, `send`, `agent`, `system-presence`).
- Subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
### Nodes (macOS / iOS / Android)
- Connect to the **bridge** (TCP JSONL) rather than the WS server.
- Pair with the Gateway to receive a token.
- Expose commands like `canvas.*`, `camera.*`, `screen.record`, `location.get`.
### WebChat
- Static UI that uses the Gateway WS API for chat history and sends.
- In remote setups, connects through the same SSH/Tailscale tunnel as other
clients.
## Connection lifecycle (single client)
```
Client Gateway
| |
|---- req:connect -------->|
|<------ res (ok) ---------| (or res error + close)
| (payload=hello-ok carries snapshot: presence + health)
| (payload=hello-ok carries snapshot: presence + health)
| |
|<------ event:presence ---| (deltas)
|<------ event:tick -------| (keepalive/no-op)
|<------ event:presence ---|
|<------ event:tick -------|
| |
|------- req:agent ------->|
|<------ res:agent --------| (ack: {runId,status:"accepted"})
@@ -71,44 +59,42 @@ Client Gateway
|<------ res:agent --------| (final: {runId,status,summary})
| |
```
## Wire protocol (summary)
- Transport: WebSocket, text frames with JSON payloads.
- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes.
- After handshake:
- Requests: `{type:"req", id, method, params}``{type:"res", id, ok, payload|error}`
- Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}`
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation.
- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`.
- Protocol versions are bumped on breaking changes; clients must match `minClient`; Gateway chooses within clients min/max.
- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache.
- Policy in `hello-ok` communicates payload/queue limits and tick interval.
- First frame **must** be `connect`.
- After handshake:
- Requests: `{type:"req", id, method, params}``{type:"res", id, ok, payload|error}`
- Events: `{type:"event", event, payload, seq?, stateVersion?}`
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token`
must match or the socket closes.
- Idempotency keys are required for sideeffecting methods (`send`, `agent`) to
safely retry; the server keeps a shortlived dedupe cache.
## Type system and codegen
- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server.
- Build step emits JSON Schema.
- Clients:
- TypeScript: uses the same TypeBox types directly.
- Swift: generated `Codable` models via quicktype from the JSON Schema.
- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming.
## Protocol typing and codegen
## Invariants
- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends.
- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close.
- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`.
- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries.
- TypeBox schemas define the protocol.
- JSON Schema is generated from those schemas.
- Swift models are generated from the JSON Schema.
## Remote access
- Preferred: Tailscale or VPN; alternate: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`.
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
- Preferred: Tailscale or VPN.
- Alternative: SSH tunnel
```bash
ssh -N -L 18789:127.0.0.1:18789 user@host
```
- The same handshake + auth token apply over the tunnel.
## Operations snapshot
- Start: `clawdbot gateway` (foreground, logs to stdout).
Supervise with launchd/systemd for restarts.
- Health: request `health` over WS; also surfaced in `hello-ok.health`.
- Metrics/logging: keep outside this spec; gateway should expose Prometheus text or structured logs separately.
## Migration notes
- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped.
- Start: `clawdbot gateway` (foreground, logs to stdout).
- Health: `health` over WS (also included in `hello-ok`).
- Supervision: launchd/systemd for autorestart.
## Invariants
- Exactly one Gateway controls a single Baileys session per host.
- Handshake is mandatory; any nonJSON or nonconnect first frame is a hard close.
- Events are not replayed; clients must refresh on gaps.

View File

@@ -61,7 +61,6 @@ Only the owner number (from `whatsapp.allowFrom`, or the bots own E.164 when
4) Session-level directives (`/verbose on`, `/think high`, `/new` or `/reset`, `/compact`) apply only to that groups session; send them as standalone messages so they register. Your personal DM session remains independent.
## Testing / verification
- Automated: `pnpm test -- src/web/auto-reply.test.ts --runInBand` (covers mention gating, history injection, sender suffix).
- Manual smoke:
- Send an `@clawd` ping in the group and confirm a reply that references the sender name.
- Send a second ping and verify the history block is included then cleared on the next turn.

View File

@@ -1,157 +1,114 @@
---
summary: "Plan for models CLI: scan, list, aliases, fallbacks, status"
summary: "Models CLI: list, set, aliases, fallbacks, scan, status"
read_when:
- Adding or modifying models CLI (models list/set/scan/aliases/fallbacks)
- Changing model fallback behavior or selection UX
- Updating model scan probes (tools/images)
---
# Models CLI plan
# Models CLI
See [`docs/model-failover.md`](/concepts/model-failover) for how auth profiles rotate (OAuth vs API keys), cooldowns, and how that interacts with model fallbacks.
See [/concepts/model-failover](/concepts/model-failover) for auth profile
rotation, cooldowns, and how that interacts with fallbacks.
Goal: give clear model visibility + control (configured vs available), plus scan tooling
that prefers tool-call + image-capable models and maintains ordered fallbacks.
## How Clawdbot models work (quick explainer)
## How model selection works
Clawdbot selects models in this order:
1) The configured **primary** model (`agent.model.primary`).
2) If it fails, fallbacks in `agent.model.fallbacks` (in order).
3) Auth failover happens **inside** the provider first (see [/concepts/model-failover](/concepts/model-failover)).
Key pieces:
- `provider/model` is the canonical model id (e.g. `anthropic/claude-opus-4-5`).
- `agent.models` is the **allowlist/catalog** of models Clawdbot can use, with optional aliases and provider params.
- `agent.imageModel` is only used when the primary model **cant** accept images.
- `models.providers` lets you add custom providers + models (written to `models.json`).
- `/model <id>` switches the active model for the current session; `/model list` shows whats allowed.
1) **Primary** model (`agent.model.primary` or `agent.model`).
2) **Fallbacks** in `agent.model.fallbacks` (in order).
3) **Provider auth failover** happens inside a provider before moving to the
next model.
Related:
- Context limits are model-specific; long sessions may trigger compaction. See [/concepts/compaction](/concepts/compaction).
- `agent.models` is the allowlist/catalog of models Clawdbot can use (plus aliases).
- `agent.imageModel` is used **only when** the primary model cant accept images.
## Model recommendations
## Config keys (overview)
- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): default primary for assistant + general work. Its pricey and cap-prone, so consider the [Claude Max $200 subscription](https://www.anthropic.com/pricing/) if you live here.
- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): default fallback when Opus caps out. Similar behavior with fewer limit headaches.
- [GPT-5.2-Codex](https://developers.openai.com/codex/models): recommended for coding and sub-agents. Prefer the [Codex CLI](https://developers.openai.com/codex/cli) if you want the strongest feel.
- `agent.model.primary` and `agent.model.fallbacks`
- `agent.imageModel.primary` and `agent.imageModel.fallbacks`
- `agent.models` (allowlist + aliases + provider params)
- `models.providers` (custom providers written into `models.json`)
Suggested stacks:
- Assistant-first: Opus 4.5 primary → Sonnet 4.5 fallback.
- Agentic coding: Opus 4.5 primary → GPT-5.2-Codex for sub-agents → Sonnet 4.5 fallback.
Model refs are normalized to lowercase. Provider aliases like `z.ai/*` normalize
to `zai/*`.
## Model discussions (community notes)
## CLI commands
Anecdotal notes from the Discord thread on January 45, 2026. Treat as “reported by users,” not a benchmark.
```bash
clawdbot models list
clawdbot models status
clawdbot models set <provider/model>
clawdbot models set-image <provider/model>
**Reported working well**
- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): best overall quality in Clawdbot, especially for “assistant” work. Tradeoff is cost and hitting usage limits quickly.
- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): common fallback when Opus caps out. Similar behavior with fewer limit headaches.
- [Gemini 3 Pro](https://deepmind.google/en/models/gemini/pro/): some users felt it maps well to Clawdbots structure. Vibe was “fits the framework” more than “best at everything.”
- [GLM](https://www.zhipuai.cn/en/): used successfully as a worker model under orchestration. Seen as strong for delegated/secondary tasks, not the primary brain.
- [MiniMax M2.1](https://platform.minimax.io/docs/guides/models-intro): “good enough” for grunt work or a cheap fallback. Community nickname was “Temu-Sonnet,” i.e. usable but not Sonnet-level polish.
clawdbot models aliases list
clawdbot models aliases add <alias> <provider/model>
clawdbot models aliases remove <alias>
**Mixed / unclear**
- [Antigravity](https://blog.google/technology/ai/google-ai-updates-november-2025/) (Claude Opus access): some reported extra Opus quota. Pricing/limits were unclear, so the value is hard to predict.
clawdbot models fallbacks list
clawdbot models fallbacks add <provider/model>
clawdbot models fallbacks remove <provider/model>
clawdbot models fallbacks clear
**Reported weak in Clawdbot**
- [GPT-5.2-Codex](https://developers.openai.com/codex/models) inside Clawdbot: reported as rough for conversation/assistant tasks when embedded. Same notes said Codex felt stronger via the [Codex CLI](https://developers.openai.com/codex/cli) than embedded use.
- [Grok](https://docs.x.ai/docs/models/grok-4): people tried it and then abandoned it. No strong upside showed up in the notes.
clawdbot models image-fallbacks list
clawdbot models image-fallbacks add <provider/model>
clawdbot models image-fallbacks remove <provider/model>
clawdbot models image-fallbacks clear
```
**Theme**
- Token burn feels higher than expected in long sessions; people suspect context buildup + tool outputs. Pruning/compaction helps. Check session logs before blaming providers. See [/concepts/session](/concepts/session) and [/concepts/model-failover](/concepts/model-failover).
`clawdbot models` (no subcommand) is a shortcut for `models status`.
Want a tailored stack? Share whether youre using Clawdbot or Clawdis and your main workload (agentic coding vs “assistant” work), and we can suggest a primary + fallback set based on these reports.
### `models list`
## Models CLI
Shows configured models by default. Useful flags:
See [/cli](/cli) for the full command tree and CLI flags.
- `--all`: full catalog
- `--local`: local providers only
- `--provider <name>`: filter by provider
- `--plain`: one model per line
- `--json`: machinereadable output
### CLI output (list + status)
### `models status`
`clawdbot models list` (default) prints a table with these columns:
- `Model`: `provider/model` key (truncated in TTY).
- `Input`: `text` or `text+image`.
- `Ctx`: context window in K tokens (from the model registry).
- `Local`: `yes/no` when the provider base URL is local.
- `Auth`: `yes/no` when the provider has usable auth.
- `Tags`: origin + role hints.
Shows the resolved primary model, fallbacks, image model, and an auth overview
of configured providers. `--plain` prints only the resolved primary model.
Common tags:
- `default` — resolved default model.
- `fallback#N``agent.model.fallbacks` order.
- `image``agent.imageModel.primary`.
- `img-fallback#N``agent.imageModel.fallbacks` order.
- `configured` — present in `agent.models`.
- `alias:<name>` — alias from `agent.models.*.alias`.
- `missing` — referenced in config but not found in the registry.
## Scanning (OpenRouter free models)
Output formats:
- `--plain`: prints only `provider/model` keys (one per line).
- `--json`: `{ count, models: [{ key, name, input, contextWindow, local, available, tags, missing }] }`.
`clawdbot models scan` inspects OpenRouters **free model catalog** and can
optionally probe models for tool and image support.
`clawdbot models status` prints the resolved defaults, fallbacks, image model, aliases,
and an **Auth overview** section showing which providers have profiles/env/models.json keys.
`--plain` prints the resolved default model only; `--json` returns a structured object for tooling.
Key flags:
## Config changes
- `--no-probe`: skip live probes (metadata only)
- `--min-params <b>`: minimum parameter size (billions)
- `--max-age-days <days>`: skip older models
- `--provider <name>`: provider prefix filter
- `--max-candidates <n>`: fallback list size
- `--set-default`: set `agent.model.primary` to the first selection
- `--set-image`: set `agent.imageModel.primary` to the first image selection
- `agent.models` (configured model catalog + aliases).
- `agent.models.*.params` (provider-specific API params passed through to requests).
- `agent.model.primary` + `agent.model.fallbacks`.
- `agent.imageModel.primary` + `agent.imageModel.fallbacks` (optional).
- `auth.profiles` + `auth.order` for per-provider auth failover.
Probing requires an OpenRouter API key (from auth profiles or
`OPENROUTER_API_KEY`). Without a key, use `--no-probe` to list candidates only.
## Scan behavior (models scan)
Scan results are ranked by:
1) Image support
2) Tool latency
3) Context size
4) Parameter count
<<<<<<< HEAD
Input
- OpenRouter `/models` list (filter `:free`)
- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY` (see [/environment](/environment))
- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates`
- Probe controls: `--timeout`, `--concurrency`
Probes (direct pi-ai complete)
- Tool-call probe (required):
- Provide a dummy tool, verify tool call emitted.
- Image probe (preferred):
- Prompt includes 1x1 PNG; success if no "unsupported image" error.
When run in a TTY, you can select fallbacks interactively. In noninteractive
mode, pass `--yes` to accept defaults.
Scoring/selection
- Prefer models passing tool + image for text/tool fallbacks.
- Prefer image-only models for image tool fallback (even if tool probe fails).
- Rank by: image ok, then lower tool latency, then larger context, then params.
## Models registry (`models.json`)
Interactive selection (TTY)
- Multiselect list with per-model stats:
- model id, tool ok, image ok, median latency, context, inferred params.
- Pre-select top N (default 6).
- Non-TTY: auto-select; require `--yes`/`--no-input` to apply.
Output
- Writes `agent.model.fallbacks` ordered.
- Writes `agent.imageModel.fallbacks` ordered (image-capable models).
- Ensures `agent.models` entries exist for selected models.
- Optional `--set-default` to set `agent.model.primary`.
- Optional `--set-image` to set `agent.imageModel.primary`.
## Runtime fallback
- On model failure: try `agent.model.fallbacks` in order.
- Per-provider auth failover uses `auth.order` (or stored profile order) **before**
moving to the next model.
- Image routing uses `agent.imageModel` **only when configured** and the primary
model lacks image input.
- Persist last successful provider/model to session entry; auth profile success is global.
- See [`docs/model-failover.md`](/concepts/model-failover) for auth profile rotation, cooldowns, and timeout handling.
## Tests
- Unit: scan selection ordering + probe classification.
- CLI: list/aliases/fallbacks add/remove + scan writes config.
- Status: shows last used model + fallbacks.
## Docs
- Update [`docs/configuration.md`](/gateway/configuration) with `agent.models` + `agent.model` + `agent.imageModel`.
- Keep this doc current when CLI surface or scan logic changes.
- Note provider aliases like `z.ai/*` -> `zai/*` when relevant.
- Provider ids in model refs are normalized to lowercase.
Custom providers in `models.providers` are written into `models.json` under the
agent directory (default `~/.clawdbot/agents/<agentId>/models.json`). This file
is merged by default unless `models.mode` is set to `replace`.

View File

@@ -97,7 +97,7 @@ At runtime:
- if `expires` is in the future → use the stored access token
- if expired → refresh (under a file lock) and overwrite the stored credentials
See implementation: `src/agents/auth-profiles.ts`.
The refresh flow is automatic; you generally dont need to manage tokens manually.
## Multiple accounts (profiles) + routing

View File

@@ -7,127 +7,93 @@ read_when:
---
# Presence
Clawdbot “presence” is a lightweight, best-effort view of:
- The **Gateway** itself (one per host), and
- The **clients connected to the Gateway** (mac app, WebChat, CLI, etc.).
Clawdbot “presence” is a lightweight, besteffort view of:
- the **Gateway** itself, and
- **clients connected to the Gateway** (mac app, WebChat, CLI, etc.)
Presence is used primarily to render the mac apps **Instances** tab and to provide quick operator visibility.
Presence is used primarily to render the macOS apps **Instances** tab and to
provide quick operator visibility.
## The data model
## Presence fields (what shows up)
Presence entries are structured objects with (some) fields:
- `instanceId` (optional but strongly recommended): stable client identity used for dedupe
- `host`: a human-readable name (often the machine name)
- `ip`: best-effort IP address (may be missing or stale)
Presence entries are structured objects with fields like:
- `instanceId` (optional but strongly recommended): stable client identity
- `host`: humanfriendly host name
- `ip`: besteffort IP address
- `version`: client version string
- `deviceFamily` (optional): hardware family like `iPad`, `iPhone`, `Mac`
- `modelIdentifier` (optional): hardware model identifier like `iPad16,6` or `Mac16,6`
- `mode`: e.g. `gateway`, `app`, `webchat`, `cli`
- `lastInputSeconds` (optional): “seconds since last user input” for that client machine
- `reason`: a short marker like `self`, `connect`, `node-connected`, `node-disconnected`, `periodic`, `instances-refresh`
- `text`: legacy/debug summary string (kept for backwards compatibility and UI display)
- `deviceFamily` / `modelIdentifier`: hardware hints
- `mode`: `gateway`, `app`, `webchat`, `cli`, `node`, ...
- `lastInputSeconds`: “seconds since last user input” (if known)
- `reason`: `self`, `connect`, `node-connected`, `periodic`, ...
- `ts`: last update timestamp (ms since epoch)
## Producers (where presence comes from)
Presence entries are produced by multiple sources and then **merged**.
Presence entries are produced by multiple sources and **merged**.
### 1) Gateway self entry
The Gateway seeds a “self” entry at startup so UIs always show at least the current gateway host.
The Gateway always seeds a “self” entry at startup so UIs show the gateway host
even before any clients connect.
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`initSelfPresence()`).
### 2) WebSocket connect
### 2) WebSocket connect (connection-derived presence)
Every WS client begins with a `connect` request. On successful handshake the
Gateway upserts a presence entry for that connection.
Every WS client must begin with a `connect` request. On successful handshake, the Gateway upserts a presence entry for that connection.
#### Why oneoff CLI commands dont show up
This is meant to answer: “Which clients are currently connected?”
The CLI often connects for short, oneoff commands. To avoid spamming the
Instances list, `client.mode === "cli"` is **not** turned into a presence entry.
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (connect handling uses `connect.params.client.instanceId` when provided; otherwise falls back to `connId`).
### 3) `system-event` beacons
#### Why one-off CLI commands do not show up
Clients can send richer periodic beacons via the `system-event` method. The mac
app uses this to report host name, IP, and `lastInputSeconds`.
The CLI connects to the Gateway to execute one-off commands (health/status/send/agent/etc.). These are not “nodes” and would spam the Instances list, so the Gateway does not create presence entries for clients with `client.mode === "cli"`.
### 3) `system-event` beacons (client-reported presence)
Clients can publish richer periodic beacons via the `system-event` method. The mac app uses this to report:
- a human-friendly host name
- its best-known IP address
- `lastInputSeconds`
Implementation:
- Gateway: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) handles method `system-event` by calling `updateSystemPresence(...)`.
- mac app beaconing: [`apps/macos/Sources/Clawdbot/PresenceReporter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/PresenceReporter.swift).
### 4) Node bridge beacons (gateway-owned presence)
### 4) Node bridge beacons
When a node bridge connection authenticates, the Gateway emits a presence entry
for that node and starts periodic refresh beacons so it does not expire.
- Connect/disconnect markers: `node-connected`, `node-disconnected`
- Periodic heartbeat: every 3 minutes (`reason: periodic`)
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (node bridge handlers + timer beacons).
for that node and refreshes it periodically so it doesnt expire.
## Merge + dedupe rules (why `instanceId` matters)
All producers write into a single in-memory presence map.
Presence entries are stored in a single inmemory map:
Key points:
- Entries are **keyed** by a “presence key”. If two producers use the same key, they update the same entry.
- The best key is a stable, opaque `instanceId` that does not change across restarts.
- Keys are treated case-insensitively.
- Entries are keyed by a **presence key**.
- The best key is a stable `instanceId` that survives restarts.
- Keys are caseinsensitive.
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`normalizePresenceKey()`).
If a client reconnects without a stable `instanceId`, it may show up as a
**duplicate** row.
### mac app identity (stable UUID)
## TTL and bounded size
The mac app uses a persisted UUID as `instanceId` so:
- restarts/reconnects do not create duplicates
- renaming the Mac does not create a new “instance”
- debug/release builds can share the same identity
Presence is intentionally ephemeral:
Implementation: [`apps/macos/Sources/Clawdbot/InstanceIdentity.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstanceIdentity.swift).
- **TTL:** entries older than 5 minutes are pruned
- **Max entries:** 200 (oldest dropped first)
`displayName` (machine name) is used for UI, while `instanceId` is used for dedupe.
## TTL and bounded size (why stale rows disappear)
Presence entries are not permanent:
- TTL: entries older than 5 minutes are pruned
- Max: map is capped at 200 entries (LRU by `ts`)
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`TTL_MS`, `MAX_ENTRIES`, pruning in `listSystemPresence()`).
This keeps the list fresh and avoids unbounded memory growth.
## Remote/tunnel caveat (loopback IPs)
When a client connects over an SSH tunnel / local port forward, the Gateway may see the remote address as loopback (`127.0.0.1`).
When a client connects over an SSH tunnel / local port forward, the Gateway may
see the remote address as `127.0.0.1`. To avoid overwriting a good clientreported
IP, loopback remote addresses are ignored.
To avoid degrading an otherwise-correct client beacon IP, the Gateway avoids writing loopback remote addresses into presence entries.
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (`isLoopbackAddress()`).
## Consumers (who reads presence)
## Consumers
### macOS Instances tab
The mac apps Instances tab renders the result of `system-presence`.
Implementation:
- View: [`apps/macos/Sources/Clawdbot/InstancesSettings.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesSettings.swift)
- Store: [`apps/macos/Sources/Clawdbot/InstancesStore.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesStore.swift)
The Instances rows show a small presence indicator (Active/Idle/Stale) based on
the last beacon age. The label is derived from the entry timestamp (`ts`).
The store refreshes periodically and also applies `presence` WS events.
The macOS app renders the output of `system-presence` and applies a small status
indicator (Active/Idle/Stale) based on the age of the last update.
## Debugging tips
- To see the raw list, call `system-presence` against the gateway.
- To see the raw list, call `system-presence` against the Gateway.
- If you see duplicates:
- confirm clients send a stable `instanceId` in the handshake (`connect.params.client.instanceId`)
- confirm beaconing uses the same `instanceId`
- check whether the connection-derived entry is missing `instanceId` (then it will be keyed by `connId` and duplicates are expected on reconnect)
- confirm clients send a stable `instanceId` in the handshake
- confirm periodic beacons use the same `instanceId`
- check whether the connectionderived entry is missing `instanceId` (duplicates are expected)

View File

@@ -3,24 +3,97 @@ summary: "Routing rules per provider (WhatsApp, Telegram, Discord, web) and shar
read_when:
- Changing provider routing or inbox behavior
---
# Providers & Routing
# Providers & routing
Updated: 2026-01-06
Goal: deterministic replies per provider, while supporting multi-agent + multi-account routing.
Clawdbot routes replies **back to the provider where a message came from**. The
model does not choose a provider; routing is deterministic and controlled by the
host configuration.
- **Provider**: provider label (`whatsapp`, `webchat`, `telegram`, `discord`, `signal`, `imessage`, …). Routing is fixed: replies go back to the origin provider; the model doesnt choose.
- **AccountId**: provider account instance (e.g. WhatsApp account `"default"` vs `"work"`). Not every provider supports multi-account yet.
- **AgentId**: one isolated “brain” (workspace + per-agent agentDir + per-agent session store).
- **Reply context:** inbound replies include `ReplyToId`, `ReplyToBody`, and `ReplyToSender`, and the quoted context is appended to `Body` as a `[Replying to ...]` block.
- **Canonical direct session (per agent):** direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`). Groups/channels stay isolated per agent:
- group: `agent:<agentId>:<provider>:group:<id>`
- channel/room: `agent:<agentId>:<provider>:channel:<id>`
- Telegram forum topics: `agent:<agentId>:telegram:group:<chatId>:topic:<threadId>`
- **Session store:** per-agent store lives under `~/.clawdbot/agents/<agentId>/sessions/sessions.json` (override via `session.store` with `{agentId}` templating). JSONL transcripts live next to it.
- **WebChat:** attaches to the selected agents main session (so desktop reflects cross-provider history for that agent).
- **Implementation hints:**
- Set `Provider` + `AccountId` in each ingress.
- Route inbound to an agent via `routing.bindings` (match on `provider`, `accountId`, plus optional peer/guild/team).
- Keep routing deterministic: originate → same provider. Use the gateway WebSocket for sends; avoid side channels.
- Do not let the agent emit “send to X” decisions; keep that policy in the host code.
## Key terms
- **Provider**: `whatsapp`, `telegram`, `discord`, `slack`, `signal`, `imessage`, `webchat`.
- **AccountId**: perprovider account instance (when supported).
- **AgentId**: an isolated workspace + session store (“brain”).
- **SessionKey**: the internal bucket key used to store context and control concurrency.
## Session key shapes (examples)
Direct messages collapse to the agents **main** session:
- `agent:<agentId>:<mainKey>` (default: `agent:main:main`)
Groups and channels remain isolated per provider:
- Groups: `agent:<agentId>:<provider>:group:<id>`
- Channels/rooms: `agent:<agentId>:<provider>:channel:<id>`
Threads:
- Slack/Discord threads append `:thread:<threadId>` to the base key.
- Telegram forum topics embed `:topic:<topicId>` in the group key.
Examples:
- `agent:main:telegram:group:-1001234567890:topic:42`
- `agent:main:discord:channel:123456:thread:987654`
## Routing rules (how an agent is chosen)
Routing picks **one agent** for each inbound message:
1. **Exact peer match** (`routing.bindings` with `peer.kind` + `peer.id`).
2. **Guild match** (Discord) via `guildId`.
3. **Team match** (Slack) via `teamId`.
4. **Account match** (`accountId` on the provider).
5. **Provider match** (any account on that provider).
6. **Default agent** (`routing.defaultAgentId`, fallback to `main`).
The matched agent determines which workspace and session store are used.
## Config overview
- `routing.defaultAgentId`: default agent when no binding matches.
- `routing.agents`: named agent definitions (workspace, model, etc.).
- `routing.bindings`: map inbound providers/accounts/peers to agents.
Example:
```json5
{
routing: {
defaultAgentId: "main",
agents: {
support: { name: "Support", workspace: "~/clawd-support" }
},
bindings: [
{ match: { provider: "slack", teamId: "T123" }, agentId: "support" },
{ match: { provider: "telegram", peer: { kind: "group", id: "-100123" } }, agentId: "support" }
]
}
}
```
## Session storage
Session stores live under the state directory (default `~/.clawdbot`):
- `~/.clawdbot/agents/<agentId>/sessions/sessions.json`
- JSONL transcripts live alongside the store
You can override the store path via `session.store` and `{agentId}` templating.
## WebChat behavior
WebChat attaches to the **selected agent** and defaults to the agents main
session. Because of this, WebChat lets you see crossprovider context for that
agent in one place.
## Reply context
Inbound replies include:
- `ReplyToId`, `ReplyToBody`, and `ReplyToSender` when available.
- Quoted context is appended to `Body` as a `[Replying to ...]` block.
This is consistent across providers.

View File

@@ -12,7 +12,7 @@ We now serialize command-based auto-replies (WhatsApp Web listener) through a ti
- Serializing avoids competing for terminal/stdin, keeps logs readable, and reduces the chance of rate limits from upstream tools.
## How it works
- [`src/process/command-queue.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/process/command-queue.ts) holds a lane-aware FIFO queue and drains each lane synchronously.
- A lane-aware FIFO queue drains each lane synchronously.
- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
- Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agent.maxConcurrent`.
- When verbose logging is enabled, queued commands emit a short notice if they waited more than ~2s before starting.
@@ -74,4 +74,4 @@ Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.
## Troubleshooting
- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
- `enqueueCommand` exposes a lightweight `getQueueSize()` helper if you need to surface queue depth in future diagnostics.
- If you need queue depth, enable verbose logs and watch for queue timing lines.

View File

@@ -21,7 +21,7 @@ Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history,
- Hooks use `hook:<uuid>` unless explicitly set.
- Node bridge uses `node-<nodeId>` unless explicitly set.
`global` and `unknown` are internal-only and never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
`global` and `unknown` are reserved values and are never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
## sessions_list
List sessions as an array of rows.

View File

@@ -8,7 +8,7 @@ read_when:
ClaudeBot builds a custom system prompt for every agent run. The prompt is **Clawdbot-owned** and does not use the p-coding-agent default prompt.
The prompt is assembled in `src/agents/system-prompt.ts` and injected by `src/agents/pi-embedded-runner.ts`.
The prompt is assembled by Clawdbot and injected into each agent run.
## Structure
@@ -56,9 +56,3 @@ Skills are **not** auto-injected. Instead, the prompt instructs the model to use
```
This keeps the base prompt small while still enabling targeted skill usage.
## Code references
- Prompt text: `src/agents/system-prompt.ts`
- Prompt assembly + injection: `src/agents/pi-embedded-runner.ts`
- Bootstrap trimming: `src/agents/pi-embedded-helpers.ts`

View File

@@ -3,40 +3,34 @@ summary: "TypeBox schemas as the single source of truth for the gateway protocol
read_when:
- Updating protocol schemas or codegen
---
# TypeBox as Protocol Source of Truth
# TypeBox as protocol source of truth
Last updated: 2025-12-09
Last updated: 2026-01-08
We use TypeBox schemas in [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) as the single source of truth for the Gateway control plane (connect/req/res/event frames and payloads). All derived artifacts should be generated from these schemas, not edited by hand.
TypeBox schemas define the Gateway control plane (connect/req/res/event frames and
payloads). All generated artifacts must come from these schemas.
## Current pipeline
- **TypeBox → JSON Schema**: `pnpm protocol:gen` writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json) (draft-07) and runs AJV in the server tests.
- **TypeBox → Swift**: `pnpm protocol:gen:swift` generates [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift).
- `pnpm protocol:gen`
- writes the JSON Schema output (draft07)
- `pnpm protocol:gen:swift`
- generates Swift gateway models
- `pnpm protocol:check`
- runs both generators and verifies the output is committed
## Problem
## Swift codegen behavior
- We want strong typing in Swift, including a sealed `GatewayFrame` enum with a discriminator and a forward-compatible `unknown` case.
The Swift generator emits:
## Preferred plan (next step)
- `GatewayFrame` enum with `req`, `res`, `event`, and `unknown` cases
- Strongly typed payload structs/enums
- `ErrorCode` values and `GATEWAY_PROTOCOL_VERSION`
- Add a small, custom Swift generator driven directly by the TypeBox schemas:
- Emit a sealed `enum GatewayFrame: Codable { case req(RequestFrame), res(ResponseFrame), event(EventFrame) }`.
- Emit strongly typed payload structs/enums (`ConnectParams`, `HelloOk`, `RequestFrame`, `ResponseFrame`, `EventFrame`, `PresenceEntry`, `Snapshot`, `StateVersion`, `ErrorShape`, `AgentEvent`, `TickEvent`, `ShutdownEvent`, `SendParams`, `AgentParams`, `ErrorCode`, `PROTOCOL_VERSION`).
- Custom `init(from:)` / `encode(to:)` enforces the `type` discriminator and can include an `unknown` case for forward compatibility.
- Wire a new script (e.g., `pnpm protocol:gen:swift`) into `protocol:check` so CI fails if the generated Swift is stale.
Unknown frame types are preserved as raw payloads for forward compatibility.
Why this path:
- Single source of truth stays TypeBox; no new IDL to maintain.
- Predictable, strongly typed Swift (no optional soup).
- Small deterministic codegen (~150200 LOC script) we control.
## When you change schemas
## Alternative (if we want off-the-shelf codegen)
- Wrap the existing JSON Schema into an OpenAPI 3.1 doc (auto-generated) and use **swift-openapi-generator** or **openapi-generator swift5**. More moving parts, but also yields enums with discriminator support. Keep this as a fallback if we dont want a custom emitter.
## Action items
- Implement `protocol:gen:swift` that reads the TypeBox schemas and emits the sealed Swift enum + payload structs.
- Update `protocol:check` to include the Swift generator output in the diff check.
- Remove quicktype output once the custom generator is in place (or keep it for docs only).
1) Update the TypeBox schemas.
2) Run `pnpm protocol:check`.
3) Commit the regenerated schema + Swift models.

View File

@@ -8,11 +8,11 @@ read_when: "Changing onboarding wizard steps or config schema endpoints"
Purpose: shared onboarding + config surfaces across CLI, macOS app, and Web UI.
## Components
- Wizard engine: `src/wizard` (session + prompts + onboarding state).
- CLI: [`src/commands/onboard-*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/onboard-*.ts) uses the wizard with the CLI prompter.
- Gateway RPC: wizard + config schema endpoints serve UI clients.
- macOS: SwiftUI onboarding uses the wizard step model.
- Web UI: config form renders from JSON Schema + hints.
- Wizard engine (shared session + prompts + onboarding state).
- CLI onboarding uses the same wizard flow as the UI clients.
- Gateway RPC exposes wizard + config schema endpoints.
- macOS onboarding uses the wizard step model.
- Web UI renders config forms from JSON Schema + UI hints.
## Gateway RPC
- `wizard.start` params: `{ mode?: "local"|"remote", workspace?: string }`

View File

@@ -28,44 +28,29 @@ Recent gateway logs show repeated `cron.add` failures with invalid parameters (m
- Agent cron tool schema allows arbitrary `job` objects, enabling malformed inputs.
- Gateway strictly validates `cron.add` with no normalization, so wrapped payloads fail.
## Proposed Approach
1. **Normalize** incoming `cron.add` payloads (unwrap `data`/`job`, infer `schedule.kind` and `payload.kind`, default `wakeMode` + `sessionTarget` when safe).
2. **Harden** the agent cron tool schema using the canonical gateway `CronAddParamsSchema` and normalize before sending to the gateway.
3. **Align** provider enums and cron status fields across gateway schema, TS types, CLI descriptions, and UI form controls.
4. **Test** normalization in gateway tests and tool behavior in agent tests.
## What changed
## Multi-phase Execution Plan
- `cron.add` and `cron.update` now normalize common wrapper shapes and infer missing `kind` fields.
- Agent cron tool schema matches the gateway schema, which reduces invalid payloads.
- Provider enums are aligned across gateway, CLI, UI, and macOS picker.
- Control UI uses the gateways `jobs` count field for status.
### Phase 1 — Schema + type alignment
- [x] Expand gateway `CronPayloadSchema` provider enum to include `signal` and `imessage`.
- [x] Update CLI `--provider` descriptions to include `slack` (already supported by gateway).
- [x] Update UI Cron payload/provider union types to include all supported providers.
- [x] Fix UI CronStatus type to match gateway (`jobs` instead of `jobCount`).
- [x] Update cron UI provider select to include Discord/Slack/Signal/iMessage.
- [x] Update macOS CronJobEditor provider picker + enum to include Slack/Signal/iMessage.
- [x] Document cron compatibility normalization policy in [`docs/cron-jobs.md`](/automation/cron-jobs).
## Current behavior
### Phase 2 — Input normalization + tooling hardening
- [x] Add shared cron input normalization helpers (`normalizeCronJobCreate`/`normalizeCronJobPatch`).
- [x] Apply normalization in gateway `cron.add` (and patch normalization in `cron.update`).
- [x] Tighten agent cron tool schema to `CronAddParamsSchema` and normalize job/patch before sending.
- **Normalization:** wrapped `data`/`job` payloads are unwrapped; `schedule.kind` and `payload.kind` are inferred when safe.
- **Defaults:** safe defaults are applied for `wakeMode` and `sessionTarget` when missing.
- **Providers:** Discord/Slack/Signal/iMessage are now consistently surfaced across CLI/UI.
### Phase 3 — Tests
- [x] Add gateway test covering wrapped `cron.add` payload normalization.
- [x] Add cron tool test to assert normalization and defaulting for `cron.add`.
- [x] Add gateway test covering `cron.update` normalization.
- [x] Add UI + Swift conformance test for cron channels + status fields.
See [`docs/cron-jobs.md`](/automation/cron-jobs) for the normalized shape and examples.
### Phase 4 — Verification
- [x] Run tests (full suite executed via `pnpm test -- cron-tool`).
## Verification
## Rollout/Monitoring
- Watch gateway logs for reduced `cron.add` INVALID_REQUEST errors.
- Confirm Control UI cron status shows job count after refresh.
- If errors persist, extend normalization for additional common shapes (e.g., `schedule.at`, `payload.message` without `kind`).
## Optional Follow-ups
- Manual Control UI smoke: add cron job per provider + verify status job count.
- Manual Control UI smoke: add a cron job per provider + verify status job count.
## Open Questions
- Should `cron.add` accept explicit `state` from clients (currently disallowed by schema)?

View File

@@ -1,126 +1,38 @@
---
summary: "Spec: groupPolicy hardening for Telegram allowlist parity"
summary: "Telegram allowlist hardening: prefix + whitespace normalization"
read_when:
- Reviewing historical Telegram allowlist normalization changes
- Reviewing historical Telegram allowlist changes
---
# Engineering Execution Spec: groupPolicy Hardening (Telegram Allowlist Parity)
# Telegram Allowlist Hardening
**Date**: 2026-01-05
**Status**: Complete
**PR**: #216 (feat/whatsapp-group-policy)
**PR**: #216
---
## Summary
## Executive Summary
Telegram allowlists now accept `telegram:` and `tg:` prefixes case-insensitively, and tolerate
accidental whitespace. This aligns inbound allowlist checks with outbound send normalization.
Follow-up hardening work ensures Telegram allowlists behave consistently across inbound group/DM filtering and outbound send normalization. The focus is on prefix parity (`telegram:` / `tg:`), case-insensitive matching for prefixes, and resilience to accidental whitespace in config entries. Documentation and tests were updated to reflect and lock in this behavior.
## What changed
---
- Prefixes `telegram:` and `tg:` are treated the same (case-insensitive).
- Allowlist entries are trimmed; empty entries are ignored.
## Findings Analysis
## Examples
### [MED] F1: Telegram Allowlist Prefix Handling Is Case-Sensitive and Excludes `tg:`
All of these are accepted for the same ID:
**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
- `telegram:123456`
- `TG:123456`
- ` tg:123456 `
**Problem**: Inbound allowlist normalization only stripped a lowercase `telegram:` prefix. This rejected `TG:123` / `Telegram:123` and did not accept the `tg:` shorthand even though outbound send normalization already accepts `tg:` and case-insensitive prefixes.
## Why it matters
**Impact**:
- DMs and group allowlists fail when users copy/paste prefixed IDs from logs or existing send format.
- Behavior is inconsistent between inbound filtering and outbound send normalization.
Copy/paste from logs or chat IDs often includes prefixes and whitespace. Normalizing avoids
false negatives when deciding whether to respond in DMs or groups.
**Fix**: Normalize allowlist entries by trimming whitespace and stripping `telegram:` / `tg:` prefixes case-insensitively at pre-compute time.
## Related docs
---
### [LOW] F2: Allowlist Entries Are Not Trimmed
**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
**Problem**: Allowlist entries are not trimmed; accidental whitespace causes mismatches.
**Fix**: Trim and drop empty entries while normalizing allowlist inputs.
---
## Implementation Phases
### Phase 1: Normalize Telegram Allowlist Inputs
**File**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
**Changes**:
1. Trim allowlist entries and drop empty values.
2. Strip `telegram:` / `tg:` prefixes case-insensitively.
3. Simplify DM allowlist check to rely on normalized values.
---
### Phase 2: Add Coverage for Prefix + Whitespace
**File**: [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts)
**Add Tests**:
- DM allowlist accepts `TG:` prefix with surrounding whitespace.
- Group allowlist accepts `TG:` prefix case-insensitively.
---
### Phase 3: Documentation Updates
**Files**:
- [`docs/groups.md`](/concepts/groups)
- [`docs/telegram.md`](/providers/telegram)
**Changes**:
- Document `tg:` alias and case-insensitive prefixes for Telegram allowlists.
---
### Phase 4: Verification
1. Run targeted Telegram tests (`pnpm test -- src/telegram/bot.test.ts`).
2. If time allows, run full suite (`pnpm test`).
---
## Files Modified
| File | Change Type | Description |
|------|-------------|-------------|
| [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) | Fix | Trim allowlist values; strip `telegram:` / `tg:` prefixes case-insensitively |
| [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts) | Test | Add DM + group allowlist coverage for `TG:` prefix + whitespace |
| [`docs/groups.md`](/concepts/groups) | Docs | Mention `tg:` alias + case-insensitive prefixes |
| [`docs/telegram.md`](/providers/telegram) | Docs | Mention `tg:` alias + case-insensitive prefixes |
---
## Success Criteria
- [x] Telegram allowlist accepts `telegram:` / `tg:` prefixes case-insensitively.
- [x] Telegram allowlist tolerates whitespace in config entries.
- [x] DM and group allowlist tests cover prefixed cases.
- [x] Docs updated to reflect allowlist formats.
- [x] Targeted tests pass.
- [x] Full test suite passes.
---
## Risk Assessment
| Risk | Severity | Mitigation |
|------|----------|------------|
| Behavior change for malformed entries | Low | Normalization is additive and trims only whitespace |
| Test fragility | Low | Isolated unit tests; no external dependencies |
| Doc drift | Low | Updated docs alongside code |
---
## Estimated Complexity
- **Phase 1**: Low (normalization helpers)
- **Phase 2**: Low (2 new tests)
- **Phase 3**: Low (doc edits)
- **Phase 4**: Low (verification)
**Total**: ~20 minutes
- [Group Chats](/concepts/groups)
- [Telegram Provider](/providers/telegram)

View File

@@ -1,147 +1,32 @@
---
summary: "Proposal: model config, auth profiles, and fallback behavior"
summary: "Exploration: model config, auth profiles, and fallback behavior"
read_when:
- Designing model selection, auth profiles, or fallback behavior
- Migrating model config schema
- Exploring future model selection + auth profile ideas
---
# Model Config (Exploration)
# Model config proposal
This document captures **ideas** for future model configuration. It is not a
shipping spec. For current behavior, see:
- [Models](/concepts/models)
- [Model failover](/concepts/model-failover)
- [OAuth + profiles](/concepts/oauth)
Goals
- Multi OAuth + multi API key per provider
- Model selection via `/model` with sensible fallback
- Global (not per-session) fallback logic
- Keep last-known-good auth profile when switching models
- Profile override only when explicitly requested
- Image routing override only when explicitly configured
## Motivation
Non-goals (v1)
- Auto-discovery of provider capabilities beyond catalog input tags
- Per-model auth profile order (see open questions)
Operators want:
- Multiple auth profiles per provider (personal vs work).
- Simple `/model` selection with predictable fallbacks.
- Clear separation between text models and image-capable models.
## Proposed config shape
## Possible direction (high level)
```json
{
"auth": {
"profiles": {
"anthropic:default": {
"provider": "anthropic",
"mode": "oauth"
},
"anthropic:work": {
"provider": "anthropic",
"mode": "api_key"
},
"openai:default": {
"provider": "openai",
"mode": "oauth"
}
},
"order": {
"anthropic": ["anthropic:default", "anthropic:work"],
"openai": ["openai:default"]
}
},
"agent": {
"models": {
"anthropic/claude-opus-4-5": {
"alias": "Opus"
},
"openai/gpt-5.2": {
"alias": "gpt52"
}
},
"model": {
"primary": "anthropic/claude-opus-4-5",
"fallbacks": ["openai/gpt-5.2"]
},
"imageModel": {
"primary": "openai/gpt-5.2",
"fallbacks": ["anthropic/claude-opus-4-5"]
}
}
}
```
- Keep model selection simple: `provider/model` with optional aliases.
- Let providers have multiple auth profiles, with an explicit order.
- Use a global fallback list so all sessions fail over consistently.
- Only override image routing when explicitly configured.
Notes
- Canonical model keys are full `provider/model`.
- `alias` optional; used by `/model` resolution.
- `auth.profiles` is keyed. Default CLI login creates `provider:default`.
- `auth.order[provider]` controls rotation order for that provider.
## Open questions
## CLI / UX
Login
- `clawdbot login anthropic` → create/replace `anthropic:default`.
- `clawdbot login anthropic --profile work` → create/replace `anthropic:work`.
Model selection
- `/model Opus` → resolve alias to full id.
- `/model anthropic/claude-opus-4-5` → explicit.
- Optional: `/model Opus@anthropic:work` (explicit profile override for session only).
Model listing
- `/model` list shows:
- model id
- alias
- provider
- auth order (from `auth.order`)
- auth source for the current provider (auth-profiles.json/env/shell env/models.json)
## Fallback behavior (global)
Fallback list
- Use `agent.model.fallbacks` globally.
- No per-session fallback list; last-known-good is per-session but uses global ordering.
Auth profile rotation
- If provider auth error (401/403/invalid refresh):
- advance to next profile in `auth.order[provider]`.
- if all fail, fall back to next model.
Rate limiting
- If rate limit / quota error:
- rotate auth profile first (same provider)
- if still failing, fall back to next model.
Model not found / capability mismatch
- immediate model fallback.
## Image routing
Rule
- Only use `agent.imageModel` when explicitly configured.
- If `agent.imageModel` is configured and the current text model lacks image input, use it.
Support detection
- From model catalog `input` tags when available (e.g. `image` in models.json).
- If unknown: treat as text-only and use `agent.imageModel`.
## Migration (doctor + gateway auto-run)
Inputs
- Legacy keys (pre-migration):
- `agent.model` (string)
- `agent.modelFallbacks` (string[])
- `agent.imageModel` (string)
- `agent.imageModelFallbacks` (string[])
- `agent.allowedModels` (string[])
- `agent.modelAliases` (record)
Outputs
- `agent.models` map with keys for all referenced models
- `agent.model.primary/fallbacks`
- `agent.imageModel.primary/fallbacks`
- Auth profile store seeded from current auth-profiles.json/auth.json + oauth.json + env (as `provider:default`)
- `auth.order` seeded with `["provider:default"]` when config is updated
Auto-run
- Gateway start detects legacy keys and runs doctor migration.
## Decisions
- Auth order is per-provider (`auth.order`).
- Doctor migration is required; gateway will auto-run on startup when legacy keys detected.
- `/model Opus@profile` is explicit session override only.
- Image routing override only when `agent.imageModel` is explicitly configured.
- Should profile rotation be per-provider or per-model?
- How should the UI surface profile selection for a session?
- What is the safest migration path from legacy config keys?

View File

@@ -1,12 +1,12 @@
---
summary: "Proposal + research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
summary: "Research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
read_when:
- Designing workspace memory (~/clawd) beyond daily Markdown logs
- Deciding: standalone CLI vs deep Clawdbot integration
- Adding offline recall + reflection (retain/recall/reflect)
---
# Workspace Memory v2 (offline): proposal + research
# Workspace Memory v2 (offline): research notes
Target: Clawd-style workspace (`agent.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`).
@@ -171,8 +171,7 @@ Recommendation: **deep integration in Clawdbot**, but keep a separable core libr
- reuse from other contexts (local scripts, future desktop app, etc.)
Shape:
- `src/memory/*` (library-ish core; pure functions + sqlite adapter)
- [`src/commands/memory/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/memory/*.ts) (CLI glue)
The memory tooling is intended to be a small CLI + library layer, but this is exploratory only.
## “S-Collide” / SuCo: when to use it (research)
@@ -196,29 +195,13 @@ Open question:
- whats the **best** offline embedding model for “personal assistant memory” on your machines (MacBook + Castle)?
- if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain.
## Implementation plan (phased, shippable)
## Smallest useful pilot
### Phase 0: workspace conventions (no code)
- add `bank/` files + entity pages
- add `## Retain` convention to daily logs
If you want a minimal, still-useful version:
### Phase 1: `clawdbot memory index|recall` (FTS-only)
- parse Markdown (`memory/*.md`, `bank/*.md`) into chunks
- write to SQLite: `facts`, `entities`, `fact_entities`, `opinions`
- FTS5 table over `facts.content`
- `recall` returns citations (path + line) + trimmed content budget
### Phase 2: entity summaries + opinion tracking
- `reflect` updates `bank/entities/*.md`
- opinion confidence updates with evidence pointers (no embeddings required yet)
### Phase 3: semantic recall (offline embeddings)
- compute embeddings during indexing (incremental)
- retrieval = `hybrid(FTS, vector)` with simple fusion
### Phase 4: “graph-ish” traversal (still simple)
- entity links enable multi-hop: “related to Peter via warelay”
- optional: “topic” nodes, lightweight edges (not a full KG)
- Add `bank/` entity pages and a `## Retain` section in daily logs.
- Use SQLite FTS for recall with citations (path + line numbers).
- Add embeddings only if recall quality or scale demands it.
## References

View File

@@ -6,24 +6,29 @@ read_when:
---
# Bonjour / mDNS discovery
Clawdbot uses Bonjour (mDNS / DNS-SD) as a **LAN-only convenience** to discover a running Gateway bridge transport. It is best-effort and does **not** replace SSH or Tailnet-based connectivity.
Clawdbot uses Bonjour (mDNS / DNSSD) as a **LANonly convenience** to discover
an active Gateway bridge. It is besteffort and does **not** replace SSH or
Tailnet-based connectivity.
## Wide-Area Bonjour (Unicast DNS-SD) over Tailscale
## Widearea Bonjour (Unicast DNSSD) over Tailscale
If you want iOS node auto-discovery while the Gateway is on another network (e.g. Vienna ⇄ London), you can keep the `NWBrowser` UX but switch discovery from multicast mDNS (`local.`) to **unicast DNS-SD** (“Wide-Area Bonjour”) over Tailscale.
If the node and gateway are on different networks, multicast mDNS wont cross the
boundary. You can keep the same discovery UX by switching to **unicast DNSSD**
("WideArea Bonjour") over Tailscale.
High level:
Highlevel steps:
1) Run a DNS server on the gateway host (reachable via tailnet IP).
2) Publish DNS-SD records for `_clawdbot-bridge._tcp` in a dedicated zone (example: `clawdbot.internal.`).
3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that DNS server for clients (including iOS).
1) Run a DNS server on the gateway host (reachable over Tailnet).
2) Publish DNSSD records for `_clawdbot-bridge._tcp` under a dedicated zone
(example: `clawdbot.internal.`).
3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that
DNS server for clients (including iOS).
Clawdbot standardizes on the discovery domain `clawdbot.internal.` for this mode. iOS/Android nodes browse both `local.` and `clawdbot.internal.` automatically (no per-device knob).
Clawdbot standardizes on `clawdbot.internal.` for this mode. iOS/Android nodes
browse both `local.` and `clawdbot.internal.` automatically.
### Gateway config (recommended)
On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbot/clawdbot.json` (JSON5):
```json5
{
bridge: { bind: "tailnet" }, // tailnet-only (recommended)
@@ -31,21 +36,17 @@ On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbo
}
```
### One-time DNS server setup (gateway host)
On the gateway host (macOS), run:
### Onetime DNS server setup (gateway host)
```bash
clawdbot dns setup --apply
```
This installs CoreDNS and configures it to:
- listen on port 53 **only** on the gateways Tailscale interface IPs
- serve the zone `clawdbot.internal.` from the gateway-owned zone file `~/.clawdbot/dns/clawdbot.internal.db`
- listen on port 53 only on the gateways Tailscale interfaces
- serve `clawdbot.internal.` from `~/.clawdbot/dns/clawdbot.internal.db`
The Gateway writes/updates that zone file when `discovery.wideArea.enabled` is true.
Validate from any tailnet-connected machine:
Validate from a tailnetconnected machine:
```bash
dns-sd -B _clawdbot-bridge._tcp clawdbot.internal.
@@ -59,99 +60,102 @@ In the Tailscale admin console:
- Add a nameserver pointing at the gateways tailnet IP (UDP/TCP 53).
- Add split DNS so the domain `clawdbot.internal` uses that nameserver.
Once clients accept tailnet DNS, iOS nodes can browse `_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast.
Wide-area beacons also include `tailnetDns` (when available) so the macOS app can auto-fill SSH targets off-LAN.
Once clients accept tailnet DNS, iOS nodes can browse
`_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast.
### Bridge listener security (recommended)
The bridge port (default `18790`) is a plain TCP service. By default it binds to `0.0.0.0`, which makes it reachable from *any* interface on the gateway machine (LAN/WiFi/Tailscale).
For a tailnet-only setup, bind it to the Tailscale IP instead:
The bridge port (default `18790`) is a plain TCP service. By default it binds to
`0.0.0.0`, which makes it reachable from any interface on the gateway host.
For tailnetonly setups:
- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json`.
- Restart the Gateway (or restart the macOS menubar app via [`./scripts/restart-mac.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/restart-mac.sh) on that machine).
This keeps the bridge reachable only from devices on your tailnet (while still listening on loopback for local/SSH port-forwards).
- Restart the Gateway (or restart the macOS menubar app).
## What advertises
Only the **Node Gateway** (`clawd` / `clawdbot gateway`) advertises Bonjour beacons.
- Implementation: [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts)
- Gateway wiring: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)
Only the Gateway (when the **bridge is enabled**) advertises `_clawdbot-bridge._tcp`.
## Service types
- `_clawdbot-bridge._tcp` — bridge transport beacon (used by macOS/iOS/Android nodes).
## TXT keys (non-secret hints)
## TXT keys (nonsecret hints)
The Gateway advertises small non-secret hints to make UI flows convenient:
The Gateway advertises small nonsecret hints to make UI flows convenient:
- `role=gateway`
- `displayName=<friendly name>`
- `lanHost=<hostname>.local`
- `sshPort=<port>` (defaults to 22 when not overridden)
- `gatewayPort=<port>` (informational; the Gateway WS is typically loopback-only)
- `gatewayPort=<port>` (informational; Gateway WS is usually loopbackonly)
- `bridgePort=<port>` (only when bridge is enabled)
- `canvasPort=<port>` (only when the canvas host is enabled + reachable; default `18793`; serves `/__clawdbot__/canvas/`)
- `cliPath=<path>` (optional; absolute path to a runnable `clawdbot` entrypoint or binary)
- `tailnetDns=<magicdns>` (optional hint; auto-detected from Tailscale when available; may be absent)
- `canvasPort=<port>` (only when the canvas host is enabled; default `18793`)
- `sshPort=<port>` (defaults to 22 when not overridden)
- `transport=bridge`
- `cliPath=<path>` (optional; absolute path to a runnable `clawdbot` entrypoint)
- `tailnetDns=<magicdns>` (optional hint when Tailnet is available)
## Debugging on macOS
Useful built-in tools:
Useful builtin tools:
- Browse instances:
- `dns-sd -B _clawdbot-bridge._tcp local.`
```bash
dns-sd -B _clawdbot-bridge._tcp local.
```
- Resolve one instance (replace `<instance>`):
- `dns-sd -L "<instance>" _clawdbot-bridge._tcp local.`
```bash
dns-sd -L "<instance>" _clawdbot-bridge._tcp local.
```
If browsing shows instances but resolving fails, youre usually hitting a LAN policy / multicast issue.
If browsing works but resolving fails, youre usually hitting a LAN policy or
mDNS resolver issue.
## Debugging in Gateway logs
The Gateway writes a rolling log file (printed on startup as `gateway log file: ...`).
The Gateway writes a rolling log file (printed on startup as
`gateway log file: ...`). Look for `bonjour:` lines, especially:
Look for `bonjour:` lines, especially:
- `bonjour: advertise failed ...` (probing/announce failure)
- `bonjour: advertise failed ...`
- `bonjour: ... name conflict resolved` / `hostname conflict resolved`
- `bonjour: watchdog detected non-announced service; attempting re-advertise ...` (self-heal attempt after sleep/interface churn)
- `bonjour: watchdog detected non-announced service ...`
## Debugging on iOS node
The iOS node app discovers bridges via `NWBrowser` browsing `_clawdbot-bridge._tcp`.
The iOS node uses `NWBrowser` to discover `_clawdbot-bridge._tcp`.
To capture what the browser is doing:
To capture logs:
- Settings → Bridge → Advanced → **Discovery Debug Logs**
- Settings → Bridge → Advanced → **Discovery Logs** → reproduce → **Copy**
- Settings → Bridge → Advanced → enable **Discovery Debug Logs**
- Settings → Bridge → Advanced → open **Discovery Logs** → reproduce the “Searching…” / “No bridges found” case → **Copy**
The log includes browser state transitions (`ready`, `waiting`, `failed`, `cancelled`) and result-set changes (added/removed counts).
The log includes browser state transitions and resultset changes.
## Common failure modes
- **Bonjour doesnt cross networks**: London/Vienna style setups require Tailnet (MagicDNS/IP) or SSH.
- **Multicast blocked**: some WiFi networks (enterprise/hotels) disable mDNS; expect “no results”.
- **Sleep / interface churn**: macOS may temporarily drop mDNS results when switching networks; retry.
- **Browse works but resolve fails (iOS “NoSuchRecord”)**: make sure the advertiser publishes a valid SRV target hostname.
- Implementation detail: `@homebridge/ciao` defaults `hostname` to the *service instance name* when `hostname` is omitted. If your instance name contains spaces/parentheses, some resolvers can fail to resolve the implied A/AAAA record.
- Fix: set an explicit DNS-safe `hostname` (single label; no `.local`) in [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts).
- **Bonjour doesnt cross networks**: use Tailnet or SSH.
- **Multicast blocked**: some WiFi networks disable mDNS.
- **Sleep / interface churn**: macOS may temporarily drop mDNS results; retry.
- **Browse works but resolve fails**: keep machine names simple (avoid emojis or
punctuation), then restart the Gateway. The bridge instance name derives from
the host name, so overly complex names can confuse some resolvers.
## Escaped instance names (`\\032`)
Bonjour/DNS-SD often escapes bytes in service instance names as decimal `\\DDD` sequences (e.g. spaces become `\\032`).
## Escaped instance names (`\032`)
Bonjour/DNSSD often escapes bytes in service instance names as decimal `\DDD`
sequences (e.g. spaces become `\032`).
- This is normal at the protocol level.
- UIs should decode for display (iOS uses `BonjourEscapes.decode` in `apps/shared/ClawdbotKit`).
- UIs should decode for display (iOS uses `BonjourEscapes.decode`).
## Disabling / configuration
- `CLAWDBOT_DISABLE_BONJOUR=1` disables advertising.
- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and therefore the bridge beacon).
- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port (preferred).
- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as a back-compat override when `bridge.bind` / `bridge.port` are not set.
- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in `_clawdbot-bridge._tcp`.
- `CLAWDBOT_TAILNET_DNS` publishes a `tailnetDns` hint (MagicDNS) in `_clawdbot-bridge._tcp`. If unset, the gateway auto-detects Tailscale and publishes the MagicDNS name when possible.
- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and the bridge beacon).
- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port.
- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as backcompat overrides.
- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in TXT.
- `CLAWDBOT_TAILNET_DNS` publishes a MagicDNS hint in TXT.
- `CLAWDBOT_CLI_PATH` overrides the advertised CLI path.
## Related docs

View File

@@ -1383,8 +1383,8 @@ Notes:
- `z.ai/*` and `z-ai/*` are accepted aliases and normalize to `zai/*`.
- If `ZAI_API_KEY` is missing, requests to `zai/*` will fail with an auth error at runtime.
- Example error: `No API key found for provider "zai".`
- Z.AIs general API endpoint is `https://api.z.ai/api/paas/v4`. The GLM Coding
Plan uses the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`.
- Z.AIs general API endpoint is `https://api.z.ai/api/paas/v4`. GLM coding
requests use the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`.
The built-in `zai` provider uses the Coding endpoint. If you need the general
endpoint, define a custom provider in `models.providers` with the base URL
override (see the custom providers section above).

View File

@@ -44,7 +44,7 @@ Target direction:
Troubleshooting and beacon details: [`docs/bonjour.md`](/gateway/bonjour).
#### Current implementation
#### Service beacon details
- Service types:
- `_clawdbot-bridge._tcp` (bridge transport beacon)
@@ -98,15 +98,8 @@ The gateway is the source of truth for node/client admission.
- scopes/ACLs (bridge is not a raw proxy to every gateway method)
- rate limits
## Where the code lives (target architecture)
## Responsibilities by component
- Node gateway:
- advertises discovery beacons (Bonjour)
- owns pairing storage + decisions
- runs the bridge listener (direct transport)
- macOS app:
- UI for picking a gateway, showing pairing prompts, and troubleshooting
- SSH tunneling only for the fallback path
- iOS node:
- browses Bonjour (LAN) as a convenience only
- uses direct transport + pairing to connect to the gateway
- **Gateway**: advertises discovery beacons, owns pairing decisions, runs the bridge listener.
- **macOS app**: helps you pick a gateway, shows pairing prompts, and uses SSH only as a fallback.
- **iOS/Android nodes**: browse Bonjour as a convenience and connect via the paired bridge.

View File

@@ -1,47 +1,33 @@
---
summary: "Plan for heartbeat polling messages and notification rules"
summary: "Heartbeat polling messages and notification rules"
read_when:
- Adjusting heartbeat cadence or messaging
---
# Heartbeat (Gateway)
Heartbeat runs periodic agent turns in the **main session** so the model can
surface anything that needs attention without spamming the user.
Heartbeat runs **periodic agent turns** in the main session so the model can
surface anything that needs attention without spamming you.
## Defaults
- Interval: `30m` (set `agent.heartbeat.every` to change, `0m` disables).
- Interval: `30m` (set `agent.heartbeat.every`; use `0m` to disable).
- Prompt body (configurable via `agent.heartbeat.prompt`):
`Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.`
- Heartbeat prompt text is sent **verbatim** as the user message. Clawdbot does
not append extra body text. The system prompt includes a Heartbeats section
and the run is flagged as a heartbeat internally.
- The heartbeat prompt is sent **verbatim** as the user message. The system
prompt includes a Heartbeat section and the run is flagged internally.
## Prompt contract
- If nothing needs attention, the model should reply `HEARTBEAT_OK`.
- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears at
the **start or end** of the reply. Clawdbot strips the token and discards the
reply if the remaining content is **`ackMaxChars`** (default: 30).
- If `HEARTBEAT_OK` is in the **middle** of a reply, it is not treated specially.
- For alerts, do **not** include `HEARTBEAT_OK`; return only the alert text.
## Response contract
## Prompt overrides
- Overriding `agent.heartbeat.prompt` **replaces** the default body. Nothing is
merged for you.
- If you still want `HEARTBEAT.md` instructions, keep a line like
`Read HEARTBEAT.md if exists` in your custom prompt.
- `HEARTBEAT_OK` handling stays the same; changing the prompt wont break acks.
- If nothing needs attention, reply with **`HEARTBEAT_OK`**.
- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears
at the **start or end** of the reply. The token is stripped and the reply is
dropped if the remaining content is **`ackMaxChars`** (default: 30).
- If `HEARTBEAT_OK` appears in the **middle** of a reply, it is not treated
specially.
- For alerts, **do not** include `HEARTBEAT_OK`; return only the alert text.
### Stray `HEARTBEAT_OK` outside heartbeats
If the model accidentally includes `HEARTBEAT_OK` at the start or end of a
normal (non-heartbeat) reply, Clawdbot strips the token and logs a verbose
message. If the reply is only `HEARTBEAT_OK`, it is dropped.
### Outbound normalization (all providers)
For **all providers** (WhatsApp/Web, Telegram, Slack, Discord, Signal, iMessage),
Clawdbot applies the same filtering to tool summaries, streaming block replies,
and final replies:
- drop payloads that are only `HEARTBEAT_OK` with no media
- strip `HEARTBEAT_OK` at the edges when mixed with other text
Outside heartbeats, stray `HEARTBEAT_OK` at the start/end of a message is stripped
and logged; a message that is only `HEARTBEAT_OK` is dropped.
## Config
@@ -51,8 +37,8 @@ and final replies:
heartbeat: {
every: "30m", // default: 30m (0m disables)
model: "anthropic/claude-opus-4-5",
target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none
to: "+15551234567", // optional provider-specific override (e.g. E.164 or chat id)
target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none
to: "+15551234567", // optional provider-specific override
prompt: "Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.",
ackMaxChars: 30 // max chars allowed after HEARTBEAT_OK
}
@@ -60,47 +46,45 @@ and final replies:
}
```
### Fields
- `every`: heartbeat interval (duration string; default unit minutes). Default:
`30m`. Set to `0m` to disable.
- `model`: optional model override for heartbeat runs (`provider/model`).
- `target`: where heartbeat output is delivered.
- `last` (default): send to the last used external provider.
- `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`: force the provider (optionally set `to`).
- `none`: do not deliver externally; output stays in the session (WebChat-visible).
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram).
- `prompt`: optional override for the heartbeat body (default shown above). Safe to
change; heartbeat acks are still keyed off `HEARTBEAT_OK`.
- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery (default: 30).
### Field notes
## Cost awareness
Heartbeats run full agent turns. Shorter intervals burn more tokens. Be
intentional about `every`, keep `HEARTBEAT.md` tiny, and consider a cheaper
`model` or `target: "none"` if you only want internal state updates.
- `every`: heartbeat interval (duration string; default unit = minutes).
- `model`: optional model override for heartbeat runs (`provider/model`).
- `target`:
- `last` (default): deliver to the last used external provider.
- explicit provider: `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`.
- `none`: run the heartbeat but **do not deliver** externally.
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram, etc.).
- `prompt`: overrides the default prompt body (not merged).
- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery.
## Delivery behavior
- Heartbeats run in the **main session** (`main`, or `global` when scope is global).
- If the main queue is busy, the heartbeat is skipped and retried later.
- If `target` resolves to no external destination, the run still happens but no
outbound message is sent.
- Heartbeat-only replies do **not** keep the session alive; the last `updatedAt`
is restored so idle expiry behaves normally.
## HEARTBEAT.md (optional)
If a `HEARTBEAT.md` file exists in the workspace, the default prompt tells the
agent to read it. Keep it tiny (short checklist or reminders) to avoid prompt
bloat.
## Behavior
- Runs in the main session (`main`, or `global` when scope is global).
- Uses the main lane queue; if requests are in flight, the wake is retried.
- Empty output or `HEARTBEAT_OK` is treated as “ok” and does **not** keep the
session alive (`updatedAt` is restored).
- If `target` resolves to no external destination (no last route or `none`), the
heartbeat still runs but no outbound message is sent.
## Manual wake (on-demand)
## Ideas for use
- Check up on the user (light, respectful pings during daytime).
- Handle mundane tasks (triage inboxes, summarize queues, refresh notes).
- Nudge on open loops or reminders.
- Background monitoring (health checks, status polling, low-priority alerts).
- Scheduled routines (use [Cron jobs](/automation/cron-jobs) when you
need exact schedules or isolated runs).
You can enqueue a system event and trigger an immediate heartbeat with:
## Wake hook
- The gateway exposes a heartbeat wake hook so cron/jobs/webhooks can request an
immediate run (`requestHeartbeatNow`).
- `wake` endpoints should enqueue system events and optionally trigger a wake; the
heartbeat runner picks those up on the next tick or immediately.
```bash
clawdbot wake --text "Check for urgent follow-ups" --mode now
```
Use `--mode next-heartbeat` to wait for the next scheduled tick.
## Cost awareness
Heartbeats run full agent turns. Shorter intervals burn more tokens. Keep
`HEARTBEAT.md` small and consider a cheaper `model` or `target: "none"` if you
only want internal state updates.

View File

@@ -127,7 +127,9 @@ See also: [`docs/presence.md`](/concepts/presence) for how presence is produced/
## Typing and validation
- Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
- Clients (TS/Swift) consume generated types (TS directly; Swift via the repos generator).
- Types live in [`src/gateway/protocol/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/*.ts); regenerate schemas/models with `pnpm protocol:gen` (writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json)) and `pnpm protocol:gen:swift` (writes [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift)).
- Protocol definitions are the source of truth; regenerate schema/models with:
- `pnpm protocol:gen`
- `pnpm protocol:gen:swift`
## Connection snapshot
- `hello-ok` includes a `snapshot` with `presence`, `health`, `stateVersion`, and `uptimeMs` plus `policy {maxPayload,maxBufferedBytes,tickIntervalMs}` so clients can render immediately without extra requests.

View File

@@ -10,12 +10,10 @@ read_when:
Clawdbot has two log “surfaces”:
- **Console output** (what you see in the terminal / Debug UI).
- **File logs** (JSON lines) written by the internal logger.
- **File logs** (JSON lines) written by the gateway logger.
## File-based logger
Clawdbot uses a file logger backed by `tslog` ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts)).
- Default rolling log file is under `/tmp/clawdbot/` (one file per day): `clawdbot-YYYY-MM-DD.log`
- The log file path and level can be configured via `~/.clawdbot/clawdbot.json`:
- `logging.file`
@@ -40,9 +38,8 @@ clawdbot logs --follow
## Console capture
The CLI entrypoint enables console capture ([`src/index.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/index.ts) calls `enableConsoleCapture()`).
That means every `console.log/info/warn/error/debug/trace` is also written into the file logs,
while still behaving normally on stdout/stderr.
The CLI captures `console.log/info/warn/error/debug/trace` and writes them to file logs,
while still printing to stdout/stderr.
You can tune console verbosity independently via:
@@ -94,13 +91,8 @@ clawdbot gateway --verbose --ws-log full
## Console formatting (subsystem logging)
Clawdbot formats console logs via a small wrapper on top of the existing stack:
- **tslog** for structured file logs ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts))
- **chalk** for colors ([`src/globals.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/globals.ts))
The console formatter is **TTY-aware** and prints consistent, prefixed lines.
Subsystem loggers are created via `createSubsystemLogger("gateway")`.
Subsystem loggers keep output grouped and scannable.
Behavior:

View File

@@ -7,103 +7,83 @@ read_when:
---
# Gateway-owned pairing (Option B)
Goal: The Gateway (`clawd`) is the **source of truth** for which nodes are allowed to join the network.
This enables:
- Headless approval via terminal/CLI (no Swift UI required).
- Optional macOS UI approval (Swift app is just a frontend).
- One consistent membership store for iOS, mac nodes, future hardware nodes.
In Gateway-owned pairing, the **Gateway** is the source of truth for which nodes
are allowed to join. UIs (macOS app, future clients) are just frontends that
approve or reject pending requests.
## Concepts
- **Pending request**: a node asked to join; requires explicit approve/reject.
- **Paired node**: node is allowed; gateway returns an auth token for subsequent connects.
- **Bridge**: direct transport endpoint owned by the gateway. The bridge does not decide membership.
- **Pending request**: a node asked to join; requires approval.
- **Paired node**: approved node with an issued auth token.
- **Bridge**: transport endpoint only; it forwards requests but does not decide
membership.
## How pairing works
1. A node connects to the bridge and requests pairing.
2. The Gateway stores a **pending request** and emits `node.pair.requested`.
3. You approve or reject the request (CLI or UI).
4. On approval, the Gateway issues a **new token** (tokens are rotated on repair).
5. The node reconnects using the token and is now “paired”.
Pending requests expire automatically after **5 minutes**.
## CLI workflow (headless friendly)
```bash
clawdbot nodes pending
clawdbot nodes approve <requestId>
clawdbot nodes reject <requestId>
clawdbot nodes status
clawdbot nodes rename --node <id|name|ip> --name "Living Room iPad"
```
`nodes status` shows paired/connected nodes and their capabilities.
## API surface (gateway protocol)
These are conceptual method names; wire them into [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) and regenerate Swift types.
### Events
- `node.pair.requested`
- Emitted whenever a new pending pairing request is created.
- Payload:
- `requestId` (string)
- `nodeId` (string)
- `displayName?` (string)
- `platform?` (string)
- `version?` (string)
- `remoteIp?` (string)
- `silent?` (boolean) — hint that the UI may attempt auto-approval
- `ts` (ms since epoch)
- `node.pair.resolved`
- Emitted when a pending request is approved/rejected.
- Payload:
- `requestId` (string)
- `nodeId` (string)
- `decision` ("approved" | "rejected" | "expired")
- `ts` (ms since epoch)
Events:
- `node.pair.requested` — emitted when a new pending request is created.
- `node.pair.resolved` — emitted when a request is approved/rejected/expired.
### Methods
- `node.pair.request`
- Creates (or returns) a pending request.
- Params: node metadata (same shape as `node.pair.requested` payload, minus `requestId`/`ts`).
- Optional `silent` flag hints that the UI can attempt an SSH auto-approve before showing an alert.
- Result:
- `status` ("pending")
- `created` (boolean) — whether this call created the pending request
- `request` (pending request object), including `isRepair` when the node was already paired
- Security: **never returns an existing token**. If a paired node “lost” its token, it must be approved again (token rotation).
- `node.pair.list`
- Returns:
- `pending[]` (pending requests)
- `paired[]` (paired node records)
- `node.pair.approve`
- Params: `{ requestId }`
- Result: `{ requestId, node: { nodeId, token, ... } }`
- Must be idempotent (first decision wins).
- `node.pair.reject`
- Params: `{ requestId }`
- Result: `{ requestId, nodeId }`
- `node.pair.verify`
- Params: `{ nodeId, token }`
- Result: `{ ok: boolean, node?: { nodeId, ... } }`
## CLI flows
CLI must be able to fully operate without any GUI:
- `clawdbot nodes pending`
- `clawdbot nodes approve <requestId>`
- `clawdbot nodes reject <requestId>`
- `clawdbot nodes status` (paired nodes + connection status/capabilities)
Optional interactive helper:
- `clawdbot nodes watch` (subscribe to `node.pair.requested` and prompt in-place)
Implementation pointers:
- CLI commands: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
- Gateway handlers + events: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) + [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts)
- Pairing store: [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts) (under `~/.clawdbot/nodes/`)
- Optional macOS UI prompt (frontend only): [`apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift)
- Push-first: listens to `node.pair.requested`/`node.pair.resolved`, does a `node.pair.list` on startup/reconnect,
and only runs a slow safety poll while a request is pending/visible.
## Storage (private, local)
Gateway stores the authoritative state under `~/.clawdbot/`:
- `~/.clawdbot/nodes/paired.json`
- `~/.clawdbot/nodes/pending.json` (or `~/.clawdbot/nodes/pending/*.json`)
Methods:
- `node.pair.request` — create or reuse a pending request.
- `node.pair.list` — list pending + paired nodes.
- `node.pair.approve` — approve a pending request (issues token).
- `node.pair.reject` — reject a pending request.
- `node.pair.verify` — verify `{ nodeId, token }`.
Notes:
- Tokens are secrets. Treat `paired.json` as sensitive.
- Pending entries should have a TTL (e.g. 5 minutes) and expire automatically.
- `node.pair.request` is idempotent per node: repeated calls return the same
pending request.
- Approval **always** generates a fresh token; no token is ever returned from
`node.pair.request`.
- Requests may include `silent: true` as a hint for auto-approval flows.
## Bridge integration
Target direction:
- The gateway runs the bridge listener (LAN/tailnet-facing) and advertises discovery beacons (Bonjour).
- The bridge is transport only; it forwards/scopes requests and enforces ACLs, but pairing decisions are made by the gateway.
## Auto-approval (macOS app)
The macOS UI (Swift) can:
- Subscribe to `node.pair.requested`, show an alert (including `remoteIp`), and call `node.pair.approve` or `node.pair.reject`.
- Or ignore/dismiss (“Later”) and let CLI handle it.
- When `silent` is set, it can try a short SSH probe (same user) and auto-approve if reachable; otherwise fall back to the normal alert.
The macOS app can optionally attempt a **silent approval** when:
- the request is marked `silent`, and
- the app can verify an SSH connection to the gateway host using the same user.
## Implementation note
If the bridge is only provided by the macOS app, then “no Swift app running” cannot work end-to-end.
The long-term goal is to move bridge hosting + Bonjour advertising into the Node gateway so headless pairing works by default.
If silent approval fails, it falls back to the normal “Approve/Reject” prompt.
## Storage (local, private)
Pairing state is stored under the Gateway state directory (default `~/.clawdbot`):
- `~/.clawdbot/nodes/paired.json`
- `~/.clawdbot/nodes/pending.json`
If you override `CLAWDBOT_STATE_DIR`, the `nodes/` folder moves with it.
Security notes:
- Tokens are secrets; treat `paired.json` as sensitive.
- Rotating a token requires re-approval (or deleting the node entry).
## Bridge behavior
- The bridge is **transport only**; it does not store membership.
- If the Gateway is offline or pairing is disabled, nodes cannot pair.
- If the bridge is running but the Gateway is in remote mode, pairing still
happens against the remote Gateways store.

View File

@@ -140,11 +140,3 @@ Nodes may include a `permissions` map in `node.list` / `node.describe`, keyed by
- The macOS menubar app connects to the Gateway bridge as a node (so `clawdbot nodes …` works against this Mac).
- In remote mode, the app opens an SSH tunnel for the bridge port and connects to `localhost`.
## Where to look in code
- CLI wiring: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
- Canvas snapshot decoding/temp paths: [`src/cli/nodes-canvas.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-canvas.ts)
- Duration parsing for CLI: [`src/cli/parse-duration.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/parse-duration.ts)
- iOS node commands: [`apps/ios/Sources/Model/NodeAppModel.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/ios/Sources/Model/NodeAppModel.swift)
- Android node commands: `apps/android/app/src/main/java/com/clawdbot/android/node/*`

View File

@@ -76,7 +76,7 @@ Goal: model can request location even when node is backgrounded, but only when:
Push-triggered flow (future):
1) Gateway sends a push to the node (silent push or FCM data).
2) Node wakes briefly and calls `location.get` internally.
2) Node wakes briefly and requests location from the device.
3) Node forwards payload to Gateway.
Notes:

View File

@@ -1,381 +1,105 @@
---
summary: "iOS app (node): architecture + connection runbook"
summary: "iOS node app: connect to the Gateway, pairing, canvas, and troubleshooting"
read_when:
- Pairing or reconnecting the iOS node
- Debugging iOS bridge discovery or auth
- Sending screen/canvas commands to iOS
- Designing iOS node + gateway integration
- Extending the Gateway protocol for node/canvas commands
- Implementing Bonjour pairing or transport security
- Running the iOS app from source
- Debugging bridge discovery or canvas commands
---
# iOS App (Node)
Status: prototype implemented (internal) · Date: 2025-12-13
Availability: internal preview. The iOS app is not publicly distributed yet.
## Support snapshot
- Role: companion node app (iOS does not host the Gateway).
- Gateway required: yes (run it on macOS, Linux, or Windows via WSL2).
- Install: [Getting Started](/start/getting-started) + [Pairing](/gateway/pairing).
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
## What it does
## System control
System control (launchd/systemd) lives on the Gateway host. See [Gateway](/gateway).
- Connects to a Gateway over the bridge (LAN or tailnet).
- Exposes node capabilities: Canvas, Screen snapshot, Camera capture, Location, Talk mode, Voice wake.
- Receives `node.invoke` commands and reports node status events.
## Connection Runbook
## Requirements
This is the practical “how do I connect the iOS node” guide:
- Gateway running on another device (macOS, Linux, or Windows via WSL2).
- Bridge enabled (default).
- Network path:
- Same LAN via Bonjour, **or**
- Tailnet via unicast DNS-SD (`clawdbot.internal.`), **or**
- Manual host/port (fallback).
**iOS app** ⇄ (Bonjour + TCP bridge) ⇄ **Gateway bridge** ⇄ (loopback WS) ⇄ **Gateway**
## Quick start (pair + connect)
The Gateway WebSocket stays loopback-only (`ws://127.0.0.1:18789`). The iOS node talks to the LAN-facing **bridge** (default `tcp://0.0.0.0:18790`) and uses Gateway-owned pairing.
### Prerequisites
- You can run the Gateway on the “master” machine.
- iOS node app can reach the gateway bridge:
- Same LAN with Bonjour/mDNS, **or**
- Same Tailscale tailnet using Wide-Area Bonjour / unicast DNS-SD (see below), **or**
- Manual bridge host/port (fallback)
- You can run the CLI (`clawdbot`) on the gateway machine (or via SSH).
### 1) Start the Gateway (with bridge enabled)
Bridge is enabled by default (disable via `CLAWDBOT_BRIDGE_ENABLED=0`).
1) Start the Gateway (bridge enabled by default):
```bash
clawdbot gateway --port 18789 --verbose
clawdbot gateway --port 18789
```
Confirm in logs you see something like:
- `bridge listening on tcp://0.0.0.0:18790 (node)`
2) In the iOS app, open Settings and pick a discovered gateway (or enable Manual Bridge and enter host/port).
For tailnet-only setups (recommended for Vienna ⇄ London), bind the bridge to the gateway machines Tailscale IP instead:
- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json` on the gateway host.
- Restart the Gateway / macOS menubar app.
### 2) Verify Bonjour discovery (optional but recommended)
From the gateway machine:
```bash
dns-sd -B _clawdbot-bridge._tcp local.
```
You should see your gateway advertising `_clawdbot-bridge._tcp`.
If browse works, but the iOS node cant connect, try resolving one instance:
```bash
dns-sd -L "<instance name>" _clawdbot-bridge._tcp local.
```
More debugging notes: [`docs/bonjour.md`](/gateway/bonjour).
#### Tailnet (Vienna ⇄ London) discovery via unicast DNS-SD
If the iOS node and the gateway are on different networks but connected via Tailscale, multicast mDNS wont cross the boundary. Use Wide-Area Bonjour / unicast DNS-SD instead:
1) Set up a DNS-SD zone (example `clawdbot.internal.`) on the gateway host and publish `_clawdbot-bridge._tcp` records.
2) Configure Tailscale split DNS for `clawdbot.internal` pointing at that DNS server.
Details and example CoreDNS config: [`docs/bonjour.md`](/gateway/bonjour).
### 3) Connect from the iOS node app
In the iOS node app:
- Pick the discovered bridge (or hit refresh).
- If not paired yet, it will initiate pairing automatically.
- After the first successful pairing, it will auto-reconnect **strictly to the last discovered gateway** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
#### Connection indicator (always visible)
The Settings tab icon shows a small status dot:
- **Green**: connected to the bridge
- **Yellow**: connecting (subtle pulse)
- **Red**: not connected / error
### 4) Approve pairing (CLI)
On the gateway machine:
3) Approve the pairing request on the gateway host:
```bash
clawdbot nodes pending
```
Approve the request:
```bash
clawdbot nodes approve <requestId>
```
After approval, the iOS node receives/stores the token and reconnects authenticated.
Pairing details: [`docs/gateway/pairing.md`](/gateway/pairing).
### 5) Verify the node is connected
- In the macOS app: **Instances** tab should show something like `iOS Node (...)` with a green “Active” presence dot shortly after connect.
- Via nodes status (paired + connected):
```bash
clawdbot nodes status
```
- Via Gateway (paired + connected):
```bash
clawdbot gateway call node.list --params "{}"
```
- Via Gateway presence (legacy-ish, still useful):
```bash
clawdbot gateway call system-presence --params "{}"
```
Look for the node `instanceId` (often a UUID).
### 6) Drive the iOS Canvas (draw / snapshot)
The iOS node runs a WKWebView “Canvas” scaffold which exposes:
- `window.__clawdbot.canvas`
- `window.__clawdbot.ctx` (2D context)
- `window.__clawdbot.setStatus(title, subtitle)`
#### Gateway Canvas Host (recommended for web content)
If you want the node to show real HTML/CSS/JS that the agent can edit on disk, point it at the Gateway canvas host.
Note: nodes always use the standalone canvas host on `canvasHost.port` (default `18793`), bound to the bridge interface.
1) Create `~/clawd/canvas/index.html` on the gateway host.
2) Navigate the node to it (LAN):
4) Verify connection:
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-hostname>.local:18793/__clawdbot__/canvas/"}'
clawdbot nodes status
clawdbot gateway call node.list --params "{}"
```
## Discovery paths
### Bonjour (LAN)
The Gateway advertises `_clawdbot-bridge._tcp` on `local.`. The iOS app lists these automatically.
### Tailnet (cross-network)
If mDNS is blocked, use a unicast DNS-SD zone (recommended domain: `clawdbot.internal.`) and Tailscale split DNS.
See [`docs/bonjour.md`](/gateway/bonjour) for the CoreDNS example.
### Manual host/port
In Settings, enable **Manual Bridge** and enter the gateway host + port (default `18790`).
## Canvas + A2UI
The iOS node renders a WKWebView canvas. Use `node.invoke` to drive it:
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-host>:18793/__clawdbot__/canvas/"}'
```
Notes:
- The server injects a live-reload client into HTML and reloads on file changes.
- A2UI is hosted on the same canvas host at `http://<gateway-host>:18793/__clawdbot__/a2ui/`.
- Tailnet (optional): if both devices are on Tailscale, use a MagicDNS name or tailnet IP instead of `.local`, e.g. `http://<gateway-magicdns>:18793/__clawdbot__/canvas/`.
- iOS may require App Transport Security allowances to load plain `http://` URLs; if it fails to load, prefer HTTPS or adjust the iOS apps ATS config.
- The Gateway canvas host serves `/__clawdbot__/canvas/` and `/__clawdbot__/a2ui/`.
- The iOS node auto-navigates to A2UI on connect when a canvas host URL is advertised.
- Return to the built-in scaffold with `canvas.navigate` and `{"url":""}`.
#### Draw with `canvas.eval`
### Canvas eval / snapshot
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params "$(cat <<'JSON'
{"javaScript":"(() => { const {ctx,setStatus} = window.__clawdbot; setStatus('Drawing','…'); ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle='#ff2d55'; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); setStatus(null,null); return 'ok'; })()"}
JSON
)"
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
```
#### Snapshot with `canvas.snapshot`
```bash
clawdbot nodes invoke --node 192.168.0.88 --command canvas.snapshot --params '{"maxWidth":900}'
clawdbot nodes invoke --node "iOS Node" --command canvas.snapshot --params '{"maxWidth":900,"format":"jpeg"}'
```
The response includes `{ format, base64 }` image data (default `format="jpeg"`; pass `{"format":"png"}` when you specifically need lossless PNG).
## Voice wake + talk mode
### Common gotchas
- Voice wake and talk mode are available in Settings.
- iOS may suspend background audio; treat voice features as best-effort when the app is not active.
- **iOS in background:** all `canvas.*` commands fail fast with `NODE_BACKGROUND_UNAVAILABLE` (bring the iOS node app to foreground).
- **Return to default scaffold:** `canvas.navigate` with `{"url":""}` or `{"url":"/"}` returns to the built-in scaffold page.
- **mDNS blocked:** some networks block multicast; use a different LAN or plan a tailnet-capable bridge (see [`docs/discovery.md`](/gateway/discovery)).
- **Wrong node selector:** `--node` can be the node id (UUID), display name (e.g. `iOS Node`), IP, or an unambiguous prefix. If its ambiguous, the CLI will tell you.
- **Stale pairing / Keychain cleared:** if the pairing token is missing (or iOS Keychain was wiped), the node must pair again; approve a new pending request.
- **App reinstall but no reconnect:** the node restores `instanceId` + last bridge preference from Keychain; if it still comes up “unpaired”, verify Keychain persistence on your device/simulator and re-pair once.
## Common errors
## Design + Architecture
### Goals
- Build an **iOS app** that acts as a **remote node** for Clawdbot:
- **Voice trigger** (wake-word / always-listening intent) that forwards transcripts to the Gateway `agent` method.
- **Canvas** surface that the agent can control: navigate, draw/render, evaluate JS, snapshot.
- **Dead-simple setup**:
- Auto-discover the host on the local network via **Bonjour**.
- One-tap pairing with an approval prompt on the Mac.
- iOS is **never** a local gateway; it is always a remote node.
- Operational clarity:
- When iOS is backgrounded, voice may still run; **canvas commands must fail fast** with a structured error.
- Provide **settings**: node display name, enable/disable voice wake, pairing status.
Non-goals (v1):
- Exposing the Node Gateway directly on the LAN.
- Supporting arbitrary third-party “plugins” on iOS.
- Perfect App Store compliance; this is **internal-only** initially.
### Current repo reality (constraints we respect)
- The Gateway WebSocket server binds to `127.0.0.1:18789` ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)) with an optional `CLAWDBOT_GATEWAY_TOKEN`.
- The Gateway exposes a Canvas file server (`canvasHost`) on `canvasHost.port` (default `18793`), so nodes can `canvas.navigate` to `http://<lanHost>:18793/__clawdbot__/canvas/` and auto-reload on file changes ([`docs/configuration.md`](/gateway/configuration)).
- macOS “Canvas” is controlled via the Gateway node protocol (`canvas.*`), matching iOS/Android ([`docs/mac/canvas.md`](/platforms/mac/canvas)).
- Voice wake forwards via `GatewayChannel` to Gateway `agent` (mac app: `VoiceWakeForwarder` → `GatewayConnection.sendAgent`).
### Recommended topology (B): Gateway-owned Bridge + loopback Gateway
Keep the Node gateway loopback-only; expose a dedicated **gateway-owned bridge** to the LAN/tailnet.
**iOS App** ⇄ (TLS + pairing) ⇄ **Bridge (in gateway)** ⇄ (loopback) ⇄ **Gateway WS** (`ws://127.0.0.1:18789`)
Why:
- Preserves current threat model: Gateway remains local-only.
- Centralizes auth, rate limiting, and allowlisting in the bridge.
- Lets us unify “canvas node” semantics across mac + iOS without exposing raw gateway methods.
### Security plan (internal, but still robust)
#### Transport
- **Current (v0):** bridge is a LAN-facing **TCP** listener with token-based auth after pairing.
- **Next:** wrap the bridge in **TLS** and prefer key-pinned or mTLS-like auth after pairing.
#### Pairing
- Bonjour discovery shows a candidate “Clawdbot Bridge” on the LAN.
- First connection:
1) iOS generates a keypair (Secure Enclave if available).
2) iOS connects to the bridge and requests pairing.
3) The bridge forwards the pairing request to the **Gateway** as a *pending request*.
4) Approval can happen via:
- **macOS UI** (Clawdbot shows an alert with Approve/Reject/Later, including the node IP), or
- **Terminal/CLI** (headless flows).
5) Once approved, the bridge returns a token to iOS; iOS stores it in Keychain.
- Subsequent connections:
- The bridge requires the paired identity. Unpaired clients get a structured “not paired” error and no access.
##### Gateway-owned pairing (Option B details)
Pairing decisions must be owned by the Gateway (`clawd` / Node) so nodes can be approved without the macOS app running.
Key idea:
- The Swift app may still show an alert, but it is only a **frontend** for pending requests stored in the Gateway.
Desired behavior:
- If the Swift UI is present: show alert with Approve/Reject/Later.
- If the Swift UI is not present: `clawdbot` CLI can list pending requests and approve/reject.
See [`docs/gateway/pairing.md`](/gateway/pairing) for the API/events and storage.
CLI (headless approvals):
- `clawdbot nodes pending`
- `clawdbot nodes approve <requestId>`
- `clawdbot nodes reject <requestId>`
#### Authorization / scope control (bridge-side ACL)
The bridge must not be a raw proxy to every gateway method.
- Allow by default:
- `agent` (with guardrails; idempotency required)
- minimal `system-event` beacons (presence updates for the node)
- node/canvas methods defined below (new protocol surface)
- Deny by default:
- anything that widens control without explicit intent (future “shell”, “files”, etc.)
- Rate limit:
- handshake attempts
- voice forwards per minute
- snapshot frequency / payload size
### Protocol unification: add “node/canvas” to Gateway protocol
#### Principle
Unify mac Canvas + iOS Canvas under a single conceptual surface:
- The agent talks to the Gateway using a stable method set (typed protocol).
- The Gateway routes node-targeted requests to:
- local mac Canvas implementation, or
- remote iOS node via the bridge
#### Minimal protocol additions (v1)
Add to [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) (and regenerate Swift models):
**Identity**
- Node identity comes from `connect.params.client.instanceId` (stable), and `connect.params.client.mode = "node"` (or `"ios-node"`).
**Methods**
- `node.list` → list paired/connected nodes + capabilities
- `node.describe` → describe a node (capabilities + supported `node.invoke` commands)
- `node.invoke` → send a command to a specific node
- Params: `{ nodeId, command, params?, timeoutMs? }`
**Events**
- `node.event` → async node status/errors
- e.g. background/foreground transitions, voice availability, canvas availability
#### Node command set (canvas)
These are values for `node.invoke.command`:
- `canvas.present` / `canvas.hide`
- `canvas.navigate` with `{ url }` (loads a URL; use `""` or `"/"` to return to the default scaffold)
- `canvas.eval` with `{ javaScript }`
- `canvas.snapshot` with `{ maxWidth?, quality?, format? }`
- A2UI (mobile + macOS canvas):
- `canvas.a2ui.push` with `{ messages: [...] }` (A2UI v0.8 server→client messages)
- `canvas.a2ui.pushJSONL` with `{ jsonl: "..." }` (legacy alias)
- `canvas.a2ui.reset`
- A2UI is hosted by the Gateway canvas host (`/__clawdbot__/a2ui/`) on `canvasHost.port`. Commands fail if the host is unreachable.
Result pattern:
- Request is a standard `req/res` with `ok` / `error`.
- Long operations (loads, streaming drawing, etc.) may also emit `node.event` progress.
##### Current (implemented)
As of 2025-12-13, the Gateway supports `node.invoke` for bridge-connected nodes.
Example: draw a diagonal line on the iOS Canvas:
```bash
clawdbot nodes invoke --node ios-node --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
```
### Background behavior requirement
When iOS is backgrounded:
- Voice may still be active (subject to iOS suspension).
- **All `canvas.*` commands must fail** with a stable error code, e.g.:
- `NODE_BACKGROUND_UNAVAILABLE`
- Include `retryable: true` and `retryAfterMs` if we want the agent to wait.
## iOS app architecture (SwiftUI)
### App structure
- Single fullscreen Canvas surface (WKWebView).
- One settings entry point: a **gear button** that opens a settings sheet.
- All navigation is **agent-driven** (no local URL bar).
### Components
- `BridgeDiscovery`: Bonjour browse + resolve (Network.framework `NWBrowser`)
- `BridgeConnection`: TCP session + pairing handshake + reconnect (TLS planned)
- `NodeRuntime`:
- Voice pipeline (wake-word + capture + forward)
- Canvas pipeline (WKWebView controller + snapshot + eval)
- Background state tracking; enforces “canvas unavailable in background”
### Voice in background (internal)
- Enable background audio mode (and required session configuration) so the mic pipeline can keep running when the user switches apps.
- If iOS suspends the app anyway, surface a clear node status (`node.event`) so operators can see voice is unavailable.
## Code sharing (macOS + iOS)
Create/expand SwiftPM targets so both apps share:
- `ClawdbotProtocol` (generated models; platform-neutral)
- `ClawdbotGatewayClient` (shared WS framing + connect/req/res + seq-gap handling)
- `ClawdbotKit` (node/canvas command types + deep links + shared utilities)
macOS continues to own:
- local Canvas implementation details (custom scheme handler serving on-disk HTML, window/panel presentation)
iOS owns:
- iOS-specific audio/speech + WKWebView presentation and lifecycle
## Repo layout
- iOS app: `apps/ios/` (XcodeGen `project.yml`)
- Shared Swift packages: `apps/shared/`
- Lint/format: iOS target runs `swiftformat --lint` + `swiftlint lint` using repo configs (`.swiftformat`, `.swiftlint.yml`).
Generate the Xcode project:
```bash
cd apps/ios
xcodegen generate
open Clawdbot.xcodeproj
```
## Storage plan (private by default)
### iOS
- Canvas/workspace files (persistent, private):
- `Application Support/Clawdbot/canvas/<sessionKey>/...`
- Snapshots / temp exports (evictable):
- `Library/Caches/Clawdbot/canvas-snapshots/<sessionKey>/...`
- Credentials:
- Keychain (paired identity + bridge trust anchor)
- `NODE_BACKGROUND_UNAVAILABLE`: bring the iOS app to the foreground (canvas/camera/screen commands require it).
- `A2UI_HOST_NOT_CONFIGURED`: the Gateway did not advertise a canvas host URL; check `canvasHost` in [`docs/configuration.md`](/gateway/configuration).
- Pairing prompt never appears: run `clawdbot nodes pending` and approve manually.
- Reconnect fails after reinstall: the Keychain pairing token was cleared; re-pair the node.
## Related docs
- [`docs/gateway.md`](/gateway) (gateway runbook)
- [`docs/gateway/pairing.md`](/gateway/pairing) (approval + storage)
- [`docs/bonjour.md`](/gateway/bonjour) (discovery debugging)
- [`docs/discovery.md`](/gateway/discovery) (LAN vs tailnet vs SSH)
- [Pairing](/gateway/pairing)
- [Discovery](/gateway/discovery)
- [Bonjour](/gateway/bonjour)

View File

@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both
App bundle layout:
- `Clawdbot.app/Contents/Resources/Relay/clawdbot`
- bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
- bun `--compile` relay executable built from `dist/macos/relay.js`
- Supports:
- `clawdbot …` (CLI)
- `clawdbot gateway …` (LaunchAgent daemon)
@@ -47,7 +47,7 @@ Important bundler flags:
Version injection:
- `--define "__CLAWDBOT_VERSION__=\"<pkg version>\""`
- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesnt depend on reading `package.json` at runtime.
- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesnt depend on reading `package.json` at runtime.
## Launchd (Gateway as LaunchAgent)
@@ -58,7 +58,7 @@ Plist location (per-user):
- `~/Library/LaunchAgents/com.clawdbot.gateway.plist`
Manager:
- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
- The macOS app owns LaunchAgent install/update for the bundled gateway.
Behavior:
- “Clawdbot Active” enables/disables the LaunchAgent.
@@ -79,7 +79,7 @@ Symptom (when mis-signed):
Fix:
- The bun executable needs JIT-ish permissions under hardened runtime.
- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
- `com.apple.security.cs.allow-jit`
- `com.apple.security.cs.allow-unsigned-executable-memory`
@@ -89,18 +89,14 @@ Problem:
- bun cant load some native Node addons like `sharp` (and we dont want to ship native addon trees for the gateway).
Solution:
- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
- Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
- Falls back to `sharp` when available (Node/dev)
- Used by:
- [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
- [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
- [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
- When running in Node/dev, `sharp` is used when available.
- This affects inbound/outbound media, screenshots, and tool image sanitization.
## Browser control server
The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
Its started from the relay daemon process, so the relay binary includes Playwright deps.
The Gateway starts the browser control server (loopback only) from the relay daemon process,
so the relay binary includes Playwright deps.
## Tests / smoke checks
@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.
## DMG styling (human installer)
[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.
Rules of thumb:
- Use a **72dpi** background image that matches the Finder window size in points.

View File

@@ -5,157 +5,117 @@ read_when:
- Adding agent controls for visual workspace
- Debugging WKWebView canvas loads
---
# Canvas (macOS app)
Status: draft spec · Date: 2025-12-12
The macOS app embeds an agentcontrolled **Canvas panel** using `WKWebView`. It
is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
UI surfaces.
Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
## Where Canvas lives
Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
Canvas state is stored under Application Support:
This is designed for:
- Agent-written HTML/CSS/JS on disk (per-session directory).
- A real browser engine for layout, rendering, and basic interactivity.
- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
- `~/Library/Application Support/Clawdbot/canvas/<session>/...`
## Why a custom scheme (vs. loopback HTTP)
The Canvas panel serves those files via a **custom URL scheme**:
Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
- No port conflicts and no extra local server lifecycle.
- Easier to sandbox: only serve files we explicitly map.
- Works offline and can use an ephemeral data store (no persistent cookies/cache).
If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
## URL ↔ directory mapping
The Canvas scheme is:
- `clawdbot-canvas://<session>/<path>`
Routing model:
- `clawdbot-canvas://main/``<canvasRoot>/main/index.html` (or `index.htm`)
- `clawdbot-canvas://main/yolo``<canvasRoot>/main/yolo/index.html` (or `index.htm`)
Examples:
- `clawdbot-canvas://main/``<canvasRoot>/main/index.html`
- `clawdbot-canvas://main/assets/app.css``<canvasRoot>/main/assets/app.css`
- `clawdbot-canvas://main/widgets/todo/``<canvasRoot>/main/widgets/todo/index.html`
Directory listings are not served.
If no `index.html` exists at the root, the app shows a **builtin scaffold page**.
When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
This is a visual placeholder only (no A2UI renderer).
## Panel behavior
### Suggested on-disk location
- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
- Remembers size/position per session.
- Autoreloads when local canvas files change.
- Only one Canvas panel is visible at a time (session is switched as needed).
Store Canvas state under the app support directory:
- `~/Library/Application Support/Clawdbot/canvas/<session>/…`
Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
node commands return `CANVAS_DISABLED`.
This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
## Agent API surface
## Panel behavior (agent-controlled)
Canvas is exposed via the **node bridge**, so the agent can:
Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
- Can be shown/hidden at any time by the agent.
- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
- Default position is the **top-right corner** of the current screens visible frame (unless the user moved/resized it previously).
- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
- show/hide the panel
- navigate to a path or URL
- evaluate JavaScript
- capture a snapshot image
### Hover-only chrome
CLI examples:
Implementation notes:
- Keep the window borderless at all times (dont toggle `styleMask`).
- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
- Optionally show close/drag affordances only while hovered.
```bash
clawdbot nodes canvas present --node <id>
clawdbot nodes canvas navigate --node <id> --url "/"
clawdbot nodes canvas eval --node <id> --js "document.title"
clawdbot nodes canvas snapshot --node <id>
```
## Agent API surface (current)
Notes:
- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
Canvas is exposed via the Gateway **node bridge**, so the agent can:
- Show/hide the panel.
- Navigate to a path (relative to the session root).
- Evaluate JavaScript and optionally return results.
- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
- Capture a snapshot image of the current canvas view.
- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
## A2UI in Canvas
This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
When the Gateway advertises a Canvas host, the macOS app autonavigates to the
A2UI host page on first open.
Related:
- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
## Agent commands (current)
Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
- `clawdbot nodes canvas present --node <id> [--target <...>] [--x/--y/--width/--height]`
- Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
- If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
- `clawdbot nodes canvas hide --node <id>`
- `clawdbot nodes canvas eval --js <code> --node <id>`
- `clawdbot nodes canvas snapshot --node <id>`
### Canvas A2UI
Canvas A2UI is hosted by the **Gateway canvas host** at:
Default A2UI host URL:
```
http://<gateway-host>:18793/__clawdbot__/a2ui/
```
The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
### A2UI commands (v0.8)
- `clawdbot nodes canvas a2ui push --jsonl <path> --node <id>`
- `clawdbot nodes canvas a2ui reset --node <id>`
Canvas currently accepts **A2UI v0.8** server→client messages:
`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
- `beginRendering`
- `surfaceUpdate`
- `dataModelUpdate`
- `deleteSurface`
Minimal example (v0.8):
`createSurface` (v0.9) is not supported.
CLI example:
```bash
cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
{"beginRendering":{"surfaceId":"main","root":"root"}}
EOF
EOFA2
clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node <id>
```
Notes:
- This does **not** support the A2UI v0.9 examples using `createSurface`.
- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"` renders a minimal v0.8 view.
Quick smoke:
## Triggering agent runs from Canvas (deep links)
```bash
clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"
```
## Triggering agent runs from Canvas
Canvas can trigger new agent runs via deep links:
Canvas can trigger new agent runs via the macOS app deep-link scheme:
- `clawdbot://agent?...`
This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
Example (in JS):
Suggested patterns:
- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
```js
window.location.href = "clawdbot://agent?message=Review%20this%20design";
```
Implementation note (important):
- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
The app prompts for confirmation unless a valid key is provided.
Safety:
- Deep links (`clawdbot://agent?...`) are always enabled.
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
## Security notes
## Security / guardrails
Recommended defaults:
- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
- Scheme handler must prevent directory traversal: resolved file paths must stay under `<canvasRoot>/<session>/`.
- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
## Debugging
Suggested debugging hooks:
- Enable Web Inspector for Canvas builds (same approach as WebChat).
- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
- Canvas scheme blocks directory traversal; files must live under the session root.
- Local Canvas content uses a custom scheme (no loopback server required).
- External `http(s)` URLs are allowed only when explicitly navigated.

View File

@@ -1,72 +1,56 @@
---
summary: "Running the gateway as a child process of the macOS app and why"
summary: "Gateway lifecycle on macOS (launchd + attach-only)"
read_when:
- Integrating the mac app with the gateway lifecycle
---
# Clawdbot gateway as a child process of the macOS app
# Gateway lifecycle on macOS
Date: 2025-12-06 · Status: draft · Owner: steipete
The macOS app **manages the Gateway via launchd** by default. This gives you
reliable autostart at login and restart on crashes.
Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
Childprocess mode (Gateway spawned directly by the app) is **not in use** today.
If you need tighter coupling to the UI, use **Attachonly** and run the Gateway
manually in a terminal.
## Goal
Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
## Default behavior (launchd)
## When to prefer the child-process mode
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
- Youre okay giving up login persistence/auto-restart that launchd provides, or youll add your own backoff loop.
- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
- The app installs a peruser LaunchAgent labeled `com.clawdbot.gateway`.
- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
starts the Gateway if needed.
- Logs are written to the launchd gateway log path (visible in Debug Settings).
## Tradeoffs vs. launchd
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
- **TCC:** behaviorally, child processes often inherit the parent apps “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
Common commands:
## TCC guardrails (must keep)
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the apps node commands (via Gateway `node.invoke`) for:
- `system.notify`
- `system.run` (including `needsScreenRecording`)
- `screen.record` / `camera.*`
- PeekabooBridge UI automation (`peekaboo …`)
- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app targets Info.plist; a bare Node binary has none and would fail.
- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
## Process manager design (Swift Subprocess)
- Add a small `GatewayProcessManager` (Swift) that owns:
- `execution: Execution?` from `Swift Subprocess` to track the child.
- `start(config)` called when “Clawdbot Active” flips ON:
- binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
- args: current clawdbot entrypoint and flags
- cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
- output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
- restart: optional linear/backoff restart if exit was non-zero and Active is still true
- `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
- Wire SwiftUI toggle:
- ON: `GatewayProcessManager.start(...)`
- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
## Attachonly (developer mode)
## Packaging and signing
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
Attachonly tells the app to **connect to an existing Gateway** without spawning
one. This is ideal for local dev (hotreload, custom flags).
## Logging and observability
- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
Steps:
## Failure/edge cases
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
1) Start the Gateway yourself:
```bash
pnpm gateway:watch
```
2) In the macOS app: Debug Settings → Gateway → **Attach only**.
## Open questions / follow-ups
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
The UI should show “Using existing gateway …” once connected.
## Decision snapshot (current recommendation)
- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
## Remote mode
Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
remote host and connects over that tunnel.
## Why we prefer launchd
- Autostart at login.
- Builtin restart/KeepAlive semantics.
- Predictable logs and supervision.
If a true childprocess mode is ever needed again, it should be documented as a
separate, explicit devonly mode.

View File

@@ -1,170 +1,62 @@
---
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
summary: "PeekabooBridge integration for macOS UI automation"
read_when:
- Hosting PeekabooBridge in Clawdbot.app
- Integrating Peekaboo as a submodule
- Changing PeekabooBridge protocol/paths
---
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
# Peekaboo Bridge (macOS UI automation)
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
Clawdbot can host **PeekabooBridge** as a local, permissionaware UI automation
broker. This lets the `peekaboo` CLI drive UI automation while reusing the
macOS apps TCC permissions.
Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
## What this is (and isnt)
## Big refactor (Dec 2025): XPC → Bridge
Peekaboos privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on **either** Peekaboo.apps permissions **or** Clawdbot.apps permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
- **Host**: Clawdbot.app can act as a PeekabooBridge host.
- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
## Enable the bridge
## Architecture
### Processes
- **Bridge hosts** (provide TCC-backed automation):
- **Peekaboo.app** (preferred; also provides visualizations + controls)
- **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktops granted permissions)
- **Clawdbot.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
- `peekaboo …` (preferred; humans + agents)
- Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
In the macOS app:
- Settings → **Enable Peekaboo Bridge**
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Claude.app host (piggyback on Claude Desktop permissions)
3. Clawdbot.app host (piggyback on Clawdbot permissions)
When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
is stopped and `peekaboo` will fall back to other available hosts.
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Claude: `~/Library/Application Support/Claude/bridge.sock`
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
## Client discovery order
No auto-launch: if a host isnt reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
Peekaboo clients typically try hosts in this order:
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
1. Peekaboo.app (full UX)
2. Claude.app (if installed)
3. Clawdbot.app (thin broker)
### Protocol shape
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- **Errors**: human-readable string by default; structured envelope in `--json`.
Use `peekaboo bridge status --verbose` to see which host is active and which
socket path is in use. You can override with:
## Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
```bash
export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
```
Path in Clawdbot repo:
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps dont churn).
## Security & permissions
What Clawdbot should use:
- **Client side**: `PeekabooBridge` (socket client + protocol models).
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
allowed by default (Peekaboos signing team), plus the Clawdbot apps TeamID.
- Requests time out after ~10 seconds.
- If required permissions are missing, the bridge returns a clear error message
rather than launching System Settings.
What Clawdbot should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: dont reintroduce helper targets; use the bridge.
## Snapshot behavior (automation)
## IPC / CLI surface
### No `clawdbot ui …`
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isnt running.
Snapshots are stored in memory and expire automatically after a short window.
If you need longer retention, recapture from the client.
### Diagnostics
Use Peekaboos built-in diagnostics to see which host would be used:
- `peekaboo bridge status`
- `peekaboo bridge status --verbose`
- `peekaboo bridge status --json`
## Troubleshooting
### Output format
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screens local coordinate space.
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
- The resolved `screenIndex`
- The local `{x, y}` and bounds
- Optionally the global `{x, y}` for debugging
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
## Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
Peekaboo CLI targeting (agent-friendly):
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
## “See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
`peekaboo see` should:
- capture (optionally targeted) window/screen
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
## Visualizer integration
Keep visualizations in **Peekaboo.app** for now.
- Clawdbot hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdbot should not grow a separate screenshot CLI surface.
Migration plan:
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
- Once Clawdbot legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
## Permissions behavior
If required permissions are missing:
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
## Security (socket auth)
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
- require the callers code signature TeamID to be `Y5PE65HELJ`
- optional bundle-id allowlist for tighter scoping
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
## Next integration steps (after this doc)
1. Add Peekaboo as a git submodule (nested submodules OK).
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
in **debug** mode only.
- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
and confirm permissions are granted.

View File

@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
read_when:
- Debugging mac WebChat view or loopback port
---
# Web Chat (macOS app)
# WebChat (macOS app)
The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
connects to the Gateway and defaults to the **main session** for the selected
agent (with a session switcher for other sessions).
- **Local mode**: connects directly to the local Gateway WebSocket.
- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
- **Remote mode**: forwards the Gateway control port over SSH and uses that
tunnel as the data plane.
## Launch & debugging
- Manual: Lobster menu → “Open Chat”.
- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
- Autoopen for testing:
```bash
dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
```
- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
## How its wired
- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
## Security / surface area
- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: defaults to the primary session (`main`, or `global` when scope is
global). The UI can switch between sessions.
- Onboarding uses a dedicated session to keep firstrun setup separate.
## Security surface
- Remote mode forwards only the Gateway WebSocket control port over SSH.
## Known limitations
- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
- The UI is optimized for chat sessions (not a full browser sandbox).

View File

@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
read_when:
- Editing IPC contracts or menu bar app IPC
---
# Clawdbot macOS IPC architecture (Dec 2025)
# Clawdbot macOS IPC architecture
**Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
@@ -21,10 +21,10 @@ read_when:
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
- Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
### Mach/XPC (future direction)
- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
### Mach/XPC
- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
## Operational flows
- Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
@@ -37,4 +37,4 @@ read_when:
- Prefer requiring a TeamID match for all privileged surfaces.
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
- All communication remains local-only; no network sockets are exposed.
- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable.
- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds.

View File

@@ -1,123 +1,97 @@
---
summary: "Spec for the Clawdbot macOS companion menu bar app (gateway + node broker)"
summary: "Clawdbot macOS companion app (menu bar + gateway broker)"
read_when:
- Implementing macOS app features
- Changing gateway lifecycle or node bridging on macOS
---
# Clawdbot macOS Companion (menu bar + gateway broker)
Author: steipete · Status: draft spec · Date: 2025-12-20
The macOS app is the **menubar companion** for Clawdbot. It owns permissions,
manages the Gateway locally, and exposes macOS capabilities to the agent as a
node.
## Support snapshot
- Core Gateway: supported (TypeScript on Node/Bun).
- Companion app: macOS menu bar app with permissions + node bridge.
- Install: [Getting Started](/start/getting-started) or [Install & updates](/install/updating).
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
## What it does
## System control (launchd)
If you run the bundled macOS app, it installs a per-user LaunchAgent labeled `com.clawdbot.gateway`.
CLI-only installs can use `clawdbot onboard --install-daemon`, `clawdbot daemon install`, or `clawdbot configure`**Gateway daemon**.
- Shows native notifications and status in the menu bar.
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Microphone,
Speech Recognition, Automation/AppleScript).
- Runs or connects to the Gateway (local or remote).
- Exposes macOSonly tools (Canvas, Camera, Screen Recording, `system.run`).
- Optionally hosts **PeekabooBridge** for UI automation.
- Installs a helper CLI (`clawdbot`) into `/usr/local/bin` and
`/opt/homebrew/bin` on request.
## Local vs remote mode
- **Local** (default): the app ensures a local Gateway is running via launchd.
- **Remote**: the app connects to a Gateway over SSH/Tailscale and never starts
a local process.
- **Attachonly** (debug): the app connects to an alreadyrunning local Gateway
and never spawns its own.
## Launchd control
The app manages a peruser LaunchAgent labeled `com.clawdbot.gateway`.
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
`launchctl` only works if the LaunchAgent is installed; otherwise run `clawdbot daemon install` first.
If the LaunchAgent isnt installed, enable it from the app or run
`clawdbot daemon install`.
Details: [Gateway runbook](/gateway) and [Bundled bun Gateway](/platforms/mac/bun).
## Node capabilities (mac)
## Purpose
- Single macOS menu-bar app named **Clawdbot** that:
- Shows native notifications for Clawdbot/clawdbot events.
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Automation/AppleScript, Microphone, Speech Recognition).
- Runs (or connects to) the **Gateway** and exposes itself as a **node** so agents can reach macOSonly features.
- Hosts **PeekabooBridge** for UI automation (consumed by `peekaboo`; see [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo)).
- Installs a single CLI (`clawdbot`) by symlinking the bundled binary.
The macOS app presents itself as a node. Common commands:
## High-level design
- SwiftPM package in `apps/macos/` (macOS 15+, Swift 6).
- Targets:
- `ClawdbotIPC` (shared Codable types + helpers for appinternal actions).
- `Clawdbot` (LSUIElement MenuBarExtra app; hosts Gateway + node bridge + PeekabooBridgeHost).
- Bundle ID: `com.clawdbot.mac`.
- Bundled runtime binaries live under `Contents/Resources/Relay/`:
- `clawdbot` (buncompiled relay: CLI + gateway)
- The app symlinks `clawdbot` into `/usr/local/bin` and `/opt/homebrew/bin`.
## Gateway + node bridge
- The mac app runs the Gateway in **local** mode (unless configured remote).
- The gateway port is configurable via `gateway.port` or `CLAWDBOT_GATEWAY_PORT` (default 18789). The mac app reads that value for launchd, probes, and remote SSH tunnels.
- The mac app connects to the bridge as a **node** and advertises capabilities/commands.
- Agentfacing actions are exposed via `node.invoke` (no local control socket).
- The mac app watches `~/.clawdbot/clawdbot.json` and switches modes live when `gateway.mode` or `gateway.remote.url` changes.
- If `gateway.mode` is unset but `gateway.remote.url` is set, the mac app treats it as remote mode.
- Changing connection mode in the mac app writes `gateway.mode` (and `gateway.remote.url` in remote mode) back to the config file.
### Node commands (mac)
- Canvas: `canvas.present|navigate|eval|snapshot|a2ui.*`
- Camera: `camera.snap|camera.clip`
- Canvas: `canvas.present`, `canvas.navigate`, `canvas.eval`, `canvas.snapshot`, `canvas.a2ui.*`
- Camera: `camera.snap`, `camera.clip`
- Screen: `screen.record`
- System: `system.run` (shell) and `system.notify`
- System: `system.run`, `system.notify`
### Permission advertising
- Nodes include a `permissions` map in hello/pairing.
- The Gateway surfaces it via `node.list` / `node.describe` so agents can decide what to run.
The node reports a `permissions` map so agents can decide whats allowed.
## CLI (`clawdbot`)
- The **only** CLI is `clawdbot` (TS/bun). There is no `clawdbot-mac` helper.
- For macspecific actions, the CLI uses `node.invoke`:
- `clawdbot nodes canvas present|navigate|eval|snapshot|a2ui push|a2ui reset`
- `clawdbot nodes run --node <id> -- <command...>`
- `clawdbot nodes notify --node <id> --title ...`
## Deep links
## Onboarding
- Install CLI (symlink) → Permissions checklist → Test notification → Done.
- Remote mode skips local gateway/CLI steps.
- Selecting Local auto-enables the bundled Gateway via launchd (unless “Attach only” debug mode is enabled).
## Deep links (URL scheme)
Clawdbot (the macOS app) registers a URL scheme for triggering local actions from anywhere (browser, Shortcuts, CLI, etc.).
Scheme:
- `clawdbot://…`
The app registers the `clawdbot://` URL scheme for local actions.
### `clawdbot://agent`
Triggers a Gateway `agent` request (same machinery as WebChat/agent runs).
Example:
Triggers a Gateway `agent` request.
```bash
open 'clawdbot://agent?message=Hello%20from%20deep%20link'
```
Query parameters:
- `message` (required): the agent prompt (URL-encoded).
- `sessionKey` (optional): explicit session key to use.
- `thinking` (optional): thinking hint (e.g. `low`; omit for default).
- `deliver` (optional): `true|false` (default: false).
- `to` / `provider` (optional): forwarded to the Gateway `agent` method (only meaningful with `deliver=true`).
- `timeoutSeconds` (optional): timeout hint forwarded to the Gateway.
- `key` (optional): unattended mode key (see below).
- `message` (required)
- `sessionKey` (optional)
- `thinking` (optional)
- `deliver` / `to` / `provider` (optional)
- `timeoutSeconds` (optional)
- `key` (optional unattended mode key)
Safety/guardrails:
- Always enabled.
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
- With `key=<value>`, Clawdbot runs without prompting (intended for personal automations).
- The current key is shown in Debug Settings and stored locally in UserDefaults.
Safety:
- Without `key`, the app prompts for confirmation.
- With a valid `key`, the run is unattended (intended for personal automations).
Notes:
- In local mode, Clawdbot will start the local Gateway if needed before issuing the request.
- In remote mode, Clawdbot will use the configured remote tunnel/endpoint.
## Onboarding flow (typical)
1) Install and launch **Clawdbot.app**.
2) Complete the permissions checklist (TCC prompts).
3) Ensure **Local** mode is active and the Gateway is running.
4) Install the CLI helper if you want terminal access.
## Build & dev workflow (native)
- `cd apps/macos && swift build` (debug) / `swift build -c release`.
- Run app for dev: `swift run Clawdbot` (or Xcode scheme).
- Package app + CLI: [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) (builds bun CLI + gateway).
- Tests: add Swift Testing suites under `apps/macos/Tests`.
## Open questions / decisions
- Should `system.run` support streaming stdout/stderr or keep buffered responses only?
- Should we allow nodeside permission prompts, or always require explicit app UI action?
- `cd apps/macos && swift build`
- `swift run Clawdbot` (or Xcode)
- Package app + CLI: `scripts/package-mac-app.sh`
## Related docs
- [Gateway runbook](/gateway)
- [Bundled bun Gateway](/platforms/mac/bun)
- [macOS permissions](/platforms/mac/permissions)
- [Canvas](/platforms/mac/canvas)

View File

@@ -107,18 +107,16 @@ Slack's Conversations API is type-scoped: you only need the scopes for the
conversation types you actually touch (channels, groups, im, mpim). See
https://api.slack.com/docs/conversations-api for the overview.
### Required by current code
### Required scopes
- `chat:write` (send/update/delete messages via `chat.postMessage`)
https://api.slack.com/methods/chat.postMessage
- `im:write` (open DMs via `conversations.open` for user DMs)
https://api.slack.com/methods/conversations.open
- `channels:history`, `groups:history`, `im:history`, `mpim:history`
(`conversations.history` in [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts))
https://api.slack.com/methods/conversations.history
- `channels:read`, `groups:read`, `im:read`, `mpim:read`
(`conversations.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts))
https://api.slack.com/methods/conversations.info
- `users:read` (`users.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts) + [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts))
- `users:read` (user lookup)
https://api.slack.com/methods/users.info
- `reactions:read`, `reactions:write` (`reactions.get` / `reactions.add`)
https://api.slack.com/methods/reactions.get

View File

@@ -188,8 +188,3 @@ Recommended for personal numbers:
- Subsystems: `whatsapp/inbound`, `whatsapp/outbound`, `web-heartbeat`, `web-reconnect`.
- Log file: `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log` (configurable).
- Troubleshooting guide: [`docs/troubleshooting.md`](/gateway/troubleshooting).
## Tests
- [`src/web/auto-reply.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/auto-reply.test.ts) (mention gating, history injection, reply flow)
- [`src/web/monitor-inbox.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/monitor-inbox.test.ts) (inbound parsing + reply context)
- [`src/web/outbound.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/outbound.test.ts) (send mapping + media)

View File

@@ -125,7 +125,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
- [Linux](https://docs.clawd.bot/platforms/linux)
- [Web surfaces](https://docs.clawd.bot/web)
## macOS companion app (internals)
## macOS companion app (advanced)
- [macOS dev setup](https://docs.clawd.bot/platforms/mac/dev-setup)
- [macOS menu bar](https://docs.clawd.bot/platforms/mac/menu-bar)
@@ -144,7 +144,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
- [macOS bun gateway](https://docs.clawd.bot/platforms/mac/bun)
- [macOS XPC](https://docs.clawd.bot/platforms/mac/xpc)
- [macOS skills](https://docs.clawd.bot/platforms/mac/skills)
- [macOS Peekaboo plan](https://docs.clawd.bot/platforms/mac/peekaboo)
- [macOS Peekaboo](https://docs.clawd.bot/platforms/mac/peekaboo)
## Workspace + templates
@@ -160,13 +160,13 @@ Use these hubs to discover every page, including deep dives and reference docs t
- [Templates: TOOLS](https://docs.clawd.bot/reference/templates/TOOLS)
- [Templates: USER](https://docs.clawd.bot/reference/templates/USER)
## Experiments + proposals
## Experiments (exploratory)
- [Onboarding config protocol](https://docs.clawd.bot/experiments/onboarding-config-protocol)
- [Plan: cron hardening](https://docs.clawd.bot/experiments/plans/cron-add-hardening)
- [Plan: group policy hardening](https://docs.clawd.bot/experiments/plans/group-policy-hardening)
- [Cron hardening notes](https://docs.clawd.bot/experiments/plans/cron-add-hardening)
- [Group policy hardening notes](https://docs.clawd.bot/experiments/plans/group-policy-hardening)
- [Research: memory](https://docs.clawd.bot/experiments/research/memory)
- [Proposal: model config](https://docs.clawd.bot/experiments/proposals/model-config)
- [Model config exploration](https://docs.clawd.bot/experiments/proposals/model-config)
## Testing + release

View File

@@ -1,207 +1,102 @@
---
summary: "Planned first-run onboarding flow for Clawdbot (local vs remote, OAuth auth, workspace bootstrap ritual)"
summary: "First-run onboarding flow for Clawdbot (macOS app)"
read_when:
- Designing the macOS onboarding assistant
- Implementing Anthropic/OpenAI auth or identity setup
- Implementing auth or identity setup
---
# Onboarding (macOS app)
This doc describes the intended **first-run onboarding** for Clawdbot. The goal is a good “day 0” experience: pick where the Gateway runs, bind subscription auth (Anthropic or OpenAI) for the embedded agent runtime, and then let the **agent bootstrap itself** via a first-run ritual in the workspace.
This doc describes the **current** firstrun onboarding flow. The goal is a
smooth “day 0” experience: pick where the Gateway runs, connect auth, run the
wizard, and let the agent bootstrap itself.
## Page order (high level)
## Page order (current)
1) **Local vs Remote**
2) **(Local only)** Connect subscription auth (Anthropic / OpenAI OAuth) — optional, but recommended
3) **Connect Gmail (optional)** — run `clawdbot hooks gmail setup` to configure Pub/Sub hooks
4) **Onboarding chat** — dedicated session where the agent introduces itself and guides setup
1) Welcome + security notice
2) **Gateway selection** (Local / Remote / Configure later)
3) **Auth (Anthropic OAuth)** — local only
4) **Setup Wizard** (Gatewaydriven)
5) **Permissions** (TCC prompts)
6) **CLI helper** (optional)
7) **Onboarding chat** (dedicated session)
8) Ready
## 1) Local vs Remote
First question: where does the **Gateway** run?
Where does the **Gateway** run?
- **Local (this Mac):** onboarding can run OAuth flows and write OAuth credentials locally.
- **Remote (over SSH/tailnet):** onboarding must not run OAuth locally, because credentials must exist on the **gateway host**.
- **Local (this Mac):** onboarding can run OAuth flows and write credentials
locally.
- **Remote (over SSH/Tailnet):** onboarding does **not** run OAuth locally;
credentials must exist on the gateway host.
- **Configure later:** skip setup and leave the app unconfigured.
Gateway auth tip:
- If you only use Clawdbot on this Mac (loopback gateway), keep auth **Off**.
- Use **Token** for multi-machine access or non-loopback binds.
- If you only use Clawdbot locally (loopback), auth can be **Off**.
- Use a **token** for multimachine access or nonloopback binds.
Implementation note (2025-12-19): in local mode, the macOS app bundles the Gateway and enables it via a per-user launchd LaunchAgent (no global npm install/Node requirement for the user).
## 2) Local-only auth (Anthropic OAuth)
## 2) Local-only: Connect subscription auth (Anthropic / OpenAI OAuth)
The macOS app supports Anthropic OAuth (Claude Pro/Max). The flow:
This is the “bind Clawdbot to subscription auth” step. It is explicitly the **Anthropic (Claude Pro/Max)** or **OpenAI (ChatGPT/Codex)** OAuth flow, not a generic “login”.
- Opens the browser for OAuth (PKCE)
- Asks the user to paste the `code#state` value
- Writes credentials to `~/.clawdbot/credentials/oauth.json`
More detail: [/concepts/oauth](/concepts/oauth)
Other providers (OpenAI, custom APIs) are configured via environment variables
or config files for now.
### Recommended: OAuth (Anthropic)
## 3) Setup Wizard (Gatewaydriven)
The macOS app should:
- Start the Anthropic OAuth (PKCE) flow in the users browser.
- Ask the user to paste the `code#state` value.
- Exchange it for tokens and write credentials to:
- `~/.clawdbot/credentials/oauth.json` (file mode `0600`, directory mode `0700`)
The app can run the same setup wizard as the CLI. This keeps onboarding in sync
with Gatewayside behavior and avoids duplicating logic in SwiftUI.
Why this location matters: its the Clawdbot-owned OAuth store.
Clawdbot also imports `oauth.json` into the agent auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on first use.
## 4) Permissions
### Recommended: OAuth (OpenAI Codex)
Onboarding requests TCC permissions needed for:
The macOS app should:
- Start the OpenAI Codex OAuth (PKCE) flow in the users browser.
- Auto-capture the callback on `http://127.0.0.1:1455/auth/callback` when possible.
- If the callback fails, prompt the user to paste the redirect URL or code.
- Store credentials in `~/.clawdbot/credentials/oauth.json` (same OAuth store as Anthropic).
- Set `agent.model` to `openai-codex/gpt-5.2` when the model is unset or `openai/*`.
- Notifications
- Accessibility
- Screen Recording
- Microphone / Speech Recognition
- Automation (AppleScript)
### Alternative: API key (instructions only)
## 5) CLI helper (optional)
Offer an “API key” option, but for now it is **instructions only**:
- Get an Anthropic API key.
- Provide it to Clawdbot via your preferred mechanism (env/config).
The app can symlink the bundled `clawdbot` CLI into `/usr/local/bin` and
`/opt/homebrew/bin` so terminal workflows work out of the box.
Note: environment variables are often confusing when the Gateway is launched by a GUI app (launchd environment != your shell).
## 6) Onboarding chat (dedicated session)
### Model safety rule
After setup, the app opens a dedicated onboarding chat session so the agent can
introduce itself and guide next steps. This keeps firstrun guidance separate
from your normal conversation.
Clawdbot should **always pass** `--model` when invoking the embedded agent (dont rely on defaults).
## Agent bootstrap ritual
Example (CLI):
On the first agent run, Clawdbot bootstraps a workspace (default `~/clawd`):
```bash
clawdbot agent --mode rpc --model anthropic/claude-opus-4-5 "<message>"
```
- Seeds `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md`
- Runs a short Q&A ritual (one question at a time)
- Writes identity + preferences to `IDENTITY.md`, `USER.md`, `SOUL.md`
- Removes `BOOTSTRAP.md` when finished so it only runs once
If the user skips auth, onboarding should be clear: the agent likely wont respond until auth is configured.
## Optional: Gmail hooks (manual)
## 4) Onboarding chat (dedicated session)
The onboarding flow now embeds the SwiftUI chat view directly. It uses a **special session key**
(`onboarding`) so the “newborn agent” ritual stays separate from the main chat.
This onboarding chat is where the agent:
- does the BOOTSTRAP.md identity ritual (one question at a time)
- visits **soul.md** with the user and writes `SOUL.md` (values, tone, boundaries)
- asks how the user wants to talk (web-only / Telegram / WhatsApp)
- guides linking steps (including showing a QR inline for WhatsApp via the `whatsapp_login` tool)
If the workspace bootstrap is already complete (BOOTSTRAP.md removed), the onboarding chat step is skipped.
## 2.5) Optional: Connect Gmail
The macOS onboarding includes an optional Gmail step. It runs:
Gmail Pub/Sub setup is currently a manual step. Use:
```bash
clawdbot hooks gmail setup --account you@gmail.com
```
This writes the full `hooks.gmail` config, installs `gcloud` / `gog` / `tailscale`
via Homebrew if needed, and configures the Pub/Sub push endpoint. After setup,
restart the gateway so the internal Gmail watcher starts.
See [/automation/gmail-pubsub](/automation/gmail-pubsub) for details.
Once setup is complete, the user can switch to the normal chat (`main`) via the menu bar panel.
## Remote mode notes
## 5) Agent bootstrap ritual (outside onboarding)
When the Gateway runs on another machine, credentials and workspace files live
**on that host**. If you need OAuth in remote mode, create:
We no longer collect identity in the onboarding wizard. Instead, the **first agent run** performs a playful bootstrap ritual using files in the workspace:
- `~/.clawdbot/credentials/oauth.json`
- `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`
- Workspace is created implicitly (default `~/clawd`, configurable via `agent.workspace`) when local is selected,
but only if the folder is empty or already contains `AGENTS.md`.
- Files are seeded: `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md`.
- `BOOTSTRAP.md` tells the agent to keep it conversational:
- open with a cute hello
- ask **one question at a time** (no multi-question bombardment)
- offer a small set of suggestions where helpful (name, creature, emoji)
- wait for the users reply before asking the next question
- The agent writes results to:
- `IDENTITY.md` (agent name, vibe/creature, emoji)
- `USER.md` (who the user is + how they want to be addressed)
- `SOUL.md` (identity, tone, boundaries — crafted from the soul.md prompt)
- `~/.clawdbot/clawdbot.json` (structured identity defaults)
- After the ritual, the agent **deletes `BOOTSTRAP.md`** so it only runs once.
Identity data still feeds the same defaults as before:
- outbound prefix emoji (`messages.responsePrefix`)
- group mention patterns / wake words
- default session intro (“You are Samantha…”)
- macOS UI labels
## 6) Workspace notes (no explicit onboarding step)
The workspace is created automatically as part of agent bootstrap (no dedicated onboarding screen).
Recommendation: treat the workspace as the agents “memory” and make it a git repo (ideally private) so identity + memories are backed up:
```bash
cd ~/clawd
git init
git add AGENTS.md
git commit -m "Add agent workspace"
```
Daily memory lives under `memory/` in the workspace:
- one file per day: `memory/YYYY-MM-DD.md`
- read today + yesterday on session start
- keep it short (durable facts, preferences, decisions; avoid secrets)
## Remote mode note (why OAuth is hidden)
If the Gateway runs on another machine, OAuth credentials must be created/stored on that host (where the agent runtime runs).
For now, remote onboarding should:
- explain why OAuth isn't shown
- point the user at the credential location (`~/.clawdbot/credentials/oauth.json`) and the auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on the gateway host
- mention that the **bootstrap ritual happens on the gateway host** (same BOOTSTRAP/IDENTITY/USER files)
### Manual credential setup
On the gateway host, create `~/.clawdbot/credentials/oauth.json` with this exact format:
```json
{
"anthropic": { "type": "oauth", "access": "sk-ant-oat01-...", "refresh": "sk-ant-ort01-...", "expires": 1767304352803 },
"openai-codex": { "type": "oauth", "access": "eyJhbGciOi...", "refresh": "oai-refresh-...", "expires": 1767304352803, "accountId": "acct_..." }
}
```
Set permissions: `chmod 600 ~/.clawdbot/credentials/oauth.json`
**Note:** Clawdbot can import from legacy pi-coding-agent paths (`~/.pi/agent/oauth.json`, etc.), but Claude Code/Codex CLI credentials live in different files.
### Using Claude Code + Codex CLI credentials (direct)
If these CLIs are installed on the **gateway host** and youve already signed in, Clawdbot auto-syncs their OAuth tokens into the per-agent auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on load:
- **Claude Code**: reads `~/.claude/.credentials.json` → profile `anthropic:claude-cli`
- **Codex CLI**: reads `~/.codex/auth.json` → profile `openai-codex:codex-cli`
Verification:
```bash
clawdbot providers list
```
### Fallback: convert Claude Code credentials into `oauth.json`
If you dont want to install Claude Code on the gateway host, you can still seed the legacy import file:
```bash
cat ~/.claude/.credentials.json | jq '{
anthropic: {
type: "oauth",
access: .claudeAiOauth.accessToken,
refresh: .claudeAiOauth.refreshToken,
expires: .claudeAiOauth.expiresAt
}
}' > ~/.clawdbot/credentials/oauth.json
chmod 600 ~/.clawdbot/credentials/oauth.json
```
## Workspace backup (recommended)
We suggest creating a **private GitHub repository** to back up the agent
workspace. The agent is really good at keeping a git repo in shape, and GitHub
is the perfect place for it. Keep it **private**.
Setup steps: https://docs.clawd.bot/concepts/agent-workspace
on the gateway host.

View File

@@ -43,10 +43,6 @@ Stored under `~/.clawdbot/credentials/`:
Treat these as sensitive (they gate access to your assistant).
### Source of truth (code)
- DM pairing storage + code generation: [`src/pairing/pairing-store.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/pairing/pairing-store.ts)
- CLI commands: [`src/cli/pairing-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/pairing-cli.ts)
## 2) Node pairing (iOS/Android nodes joining the gateway)
@@ -70,11 +66,6 @@ Stored under `~/.clawdbot/nodes/`:
Full protocol + design notes: [Gateway pairing](/gateway/pairing)
### Source of truth (code)
- Node pairing store (pending/paired + token issuance): [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts)
- Gateway methods/events (`node.pair.*`): [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts)
- CLI: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
## Related docs

View File

@@ -55,7 +55,7 @@ clawdbot providers login
clawdbot health
```
If onboarding is still WIP/broken on your build:
If onboarding is not available in your build:
- Run `clawdbot setup`, then `clawdbot providers login`, then start the Gateway manually (`clawdbot gateway`).
## Bleeding edge workflow (Gateway in a terminal)
@@ -77,7 +77,7 @@ pnpm install
pnpm gateway:watch
```
`gateway:watch` runs `src/entry.ts gateway --force` and reloads on [`src/**/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/**/*.ts) changes.
`gateway:watch` runs the gateway in watch mode and reloads on TypeScript changes.
### 2) Point the macOS app at your running Gateway

View File

@@ -1,21 +1,44 @@
---
summary: "Design notes for a direct `clawdbot agent` CLI subcommand without WhatsApp delivery"
summary: "Direct `clawdbot agent` CLI runs (with optional delivery)"
read_when:
- Adding or modifying the agent CLI entrypoint
---
# `clawdbot agent` (direct-to-agent invocation)
# `clawdbot agent` (direct agent runs)
`clawdbot agent` lets you talk to the **embedded** agent runtime directly (no chat send unless you opt in), while reusing the same session store and thinking/verbose persistence as inbound auto-replies.
`clawdbot agent` runs a single agent turn without needing an inbound chat message.
By default it goes **through the Gateway**; add `--local` to force the embedded
runtime on the current machine.
## Behavior
- Required: `--message <text>`
- Session selection:
- If `--session-id` is given, reuse it.
- Else if `--to <e164>` is given, derive the session key from `session.scope` (direct chats collapse to `main`, or `global` when scope is global).
- Runs the embedded Pi agent (configured via `agent`).
- Thinking/verbose:
- Flags `--thinking <off|minimal|low|medium|high>` and `--verbose <on|off>` persist into the session store.
- `--to <E.164>` derives the session key (normal direct-chat routing), **or**
- `--session-id <id>` reuses an existing session by id
- Runs the same embedded agent runtime as normal inbound replies.
- Thinking/verbose flags persist into the session store.
- Output:
- Default: prints text (and `MEDIA:<url>` lines) to stdout.
- `--json`: prints structured payloads + meta.
- Optional: `--deliver` sends the reply back to the selected provider (`whatsapp`, `telegram`, `discord`, `signal`, `imessage`).
- default: prints reply text (plus `MEDIA:<url>` lines)
- `--json`: prints structured payload + metadata
- Optional delivery back to a provider with `--deliver` + `--provider`.
If the Gateway is unreachable, the CLI **falls back** to the embedded local run.
## Examples
```bash
clawdbot agent --to +15555550123 --message "status update"
clawdbot agent --session-id 1234 --message "Summarize inbox" --thinking medium
clawdbot agent --to +15555550123 --message "Trace logs" --verbose on --json
clawdbot agent --to +15555550123 --message "Summon reply" --deliver
```
## Flags
- `--local`: run locally (requires provider keys in your shell)
- `--deliver`: send the reply to the chosen provider (requires `--to`)
- `--provider`: `whatsapp|telegram|discord|slack|signal|imessage` (default: `whatsapp`)
- `--thinking <off|minimal|low|medium|high>`: persist thinking level
- `--verbose <on|off>`: persist verbose level
- `--timeout <seconds>`: override agent timeout
- `--json`: output structured JSON

View File

@@ -1,254 +1,142 @@
---
summary: "Spec: integrated browser control server + action commands"
summary: "Integrated browser control server + action commands"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
- Implementing browser settings + lifecycle in the macOS app
---
# Browser (integrated) — clawd-managed Chrome
# Browser (clawd-managed)
Status: draft spec · Date: 2025-12-20
Clawdbot can run a **dedicated Chrome/Chromium profile** that the agent controls.
It is isolated from your personal browser and is managed through a small local
control server.
Goal: give the **clawd** persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled "clawd").
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user's own browser (separate profile + dedicated ports).
## What you get
This doc covers the macOS app/gateway side. It intentionally does not mandate
Playwright vs Puppeteer; the key is the **contract** and the **separation guarantees**.
- A separate browser profile named **clawd** (orange accent by default).
- Deterministic tab control (list/open/focus/close).
- Agent actions (click/type/drag/select), snapshots, screenshots, PDFs.
- Optional multi-profile support (`clawd`, `work`, `remote`, ...).
## User-facing settings
This browser is **not** your daily driver. It is a safe, isolated surface for
agent automation and verification.
Add a dedicated settings section (preferably under **Skills** or its own "Browser" tab):
## Quick start
- **Enable clawd browser** (`default: on`)
- When off: no browser is launched, and browser tools return "disabled".
- **Browser control URL** (`default: http://127.0.0.1:18791`)
- Interpreted as the base URL of the local/remote browser-control server.
- If the URL host is not loopback, Clawdbot must **not** attempt to launch a local
browser; it only connects.
- **CDP URL** (`default: controlUrl + 1`)
- Base URL for Chrome DevTools Protocol (e.g. `http://127.0.0.1:18792`).
- Set this to a non-loopback host to attach the local control server to a remote
Chrome/Chromium CDP endpoint (SSH/Tailscale tunnel recommended).
- If the CDP URL host is non-loopback, clawd does **not** auto-launch a local browser.
- If you tunnel a remote CDP to `localhost`, set **Attach to existing only** to
avoid accidentally launching a local browser.
- **Accent color** (`default: #FF4500`, "lobster-orange")
- Used to theme the clawd browser profile (best-effort) and to tint UI indicators
in Clawdbot.
```bash
clawdbot browser status
clawdbot browser start
clawdbot browser open https://example.com
clawdbot browser snapshot
```
Optional (advanced, can be hidden behind Debug initially):
- **Use headless browser** (`default: off`)
- **Attach to existing only** (`default: off`) — if on, never launch; only connect if
already running.
- **Browser executable path** (override, optional)
- **No sandbox** (`default: off`) — adds `--no-sandbox` + `--disable-setuid-sandbox`
If you get “Browser disabled, enable it in config (see below) and restart the
Gateway.
### Port convention
## Configuration
Clawdbot already uses:
- Gateway WebSocket: `18789`
- Bridge (voice/node): `18790`
Browser settings live in `~/.clawdbot/clawdbot.json`.
For the clawd browser-control server, use "family" ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)
- Canvas host HTTP: `18793` by default, mounted at `/__clawdbot__/canvas/`
The user usually only configures the **control URL** (port `18791`). CDP is an
internal detail.
## Browser isolation guarantees (non-negotiable)
1) **Dedicated user data dir**
- Never attach to or reuse the user's default Chrome profile.
- Store clawd browser state under an app-owned directory, e.g.:
- `~/Library/Application Support/Clawdbot/browser/clawd/` (mac app)
- or `~/.clawdbot/browser/clawd/` (gateway/CLI)
2) **Dedicated ports**
- Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with
`agent-tools/browser-tools`).
- Default ports are `18791/18792` unless overridden.
3) **Named tab/page management**
- The agent must be able to enumerate and target tabs deterministically (by
stable `targetId` or equivalent), not "last tab".
## Browser selection (macOS + Linux)
On startup (when enabled + local URL), Clawdbot chooses the browser executable
in this order:
1) **Google Chrome Canary** (if installed)
2) **Chromium** (if installed)
3) **Google Chrome** (fallback)
Linux:
- Looks for `google-chrome` / `chromium` in common system paths.
- Use **Browser executable path** to force a specific binary.
Implementation detail:
- macOS: detection is by existence of the `.app` bundle under `/Applications`
(and optionally `~/Applications`), then using the resolved executable path.
- Linux: common `/usr/bin`/`/snap/bin` paths.
Rationale:
- Canary/Chromium are easy to visually distinguish from the user's daily driver.
- Chrome fallback ensures the feature works on a stock machine.
## Visual differentiation ("lobster-orange")
The clawd browser should be obviously different at a glance:
- Profile name: **clawd**
- Profile color: **#FF4500**
Preferred behavior:
- Seed/patch the profile's preferences on first launch so the color + name persist.
Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let
the user set the profile color/name once via Chrome UI; it must persist because
the `userDataDir` is persistent.
## Control server contract (vNext)
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
state without touching the user's Chrome.
Basics:
- `GET /` status payload (enabled/running/pid/cdpPort/etc)
- `POST /start` start browser
- `POST /stop` stop browser
- `GET /tabs` list tabs
- `POST /tabs/open` open a new tab
- `POST /tabs/focus` focus a tab by id/prefix
- `DELETE /tabs/:targetId` close a tab by id/prefix
Inspection:
- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
- `GET /snapshot` `?format=aria|ai&targetId?&limit?`
- `GET /console` `?level?&targetId?`
- `POST /pdf` `{ targetId? }`
Actions:
- `POST /navigate`
- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
- `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
Hooks (arming):
- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
### "Is it open or closed?"
"Open" means:
- the control server is reachable at the configured URL **and**
- it reports a live browser connection.
"Closed" means:
- control server not reachable, or server reports no browser.
Clawdbot should treat "open/closed" as a health check (fast path), not by scanning
global Chrome processes (avoid false positives).
## Multi-profile support
Clawdbot supports multiple named browser profiles, each with:
- Dedicated CDP port (auto-allocated from 18800-18899) **or** a per-profile CDP URL
- Persistent user data directory (`~/.clawdbot/browser/<name>/user-data/`)
- Unique color for visual distinction
### Configuration
```json
```json5
{
"browser": {
"enabled": true,
"defaultProfile": "clawd",
"profiles": {
"clawd": { "cdpPort": 18800, "color": "#FF4500" },
"work": { "cdpPort": 18801, "color": "#0066CC" },
"remote": { "cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00" }
browser: {
enabled: true, // default: true
controlUrl: "http://127.0.0.1:18791",
cdpUrl: "http://127.0.0.1:18792", // defaults to controlUrl + 1
defaultProfile: "clawd",
color: "#FF4500",
headless: false,
noSandbox: false,
attachOnly: false,
executablePath: "/Applications/Chromium.app/Contents/MacOS/Chromium",
profiles: {
clawd: { cdpPort: 18800, color: "#FF4500" },
work: { cdpPort: 18801, color: "#0066CC" },
remote: { cdpUrl: "http://10.0.0.42:9222", color: "#00AA00" }
}
}
}
```
### Profile actions
Notes:
- `controlUrl` defaults to `http://127.0.0.1:18791`.
- If you override the Gateway port (`gateway.port` or `CLAWDBOT_GATEWAY_PORT`),
the default browser ports shift to stay in the same “family” (control = gateway + 2).
- `cdpUrl` defaults to `controlUrl + 1` when unset.
- `attachOnly: true` means “never launch Chrome; only attach if it is already running.”
- `GET /profiles` — list all profiles with status
- `POST /profiles/create` `{ name, color?, cdpUrl? }` — create new profile (auto-allocates port if no `cdpUrl`)
- `DELETE /profiles/:name` — delete profile (stops browser + removes user data for local profiles)
- `POST /reset-profile?profile=<name>` — kill orphan process on profile's port (local profiles only)
## Local vs remote control
### Profile parameter
- **Local control (default):** `controlUrl` is loopback (`127.0.0.1`/`localhost`).
The Gateway starts the control server and can launch Chrome.
- **Remote control:** `controlUrl` is non-loopback. The Gateway **does not** start
a local server; it assumes you are pointing at an existing server elsewhere.
- **Remote CDP:** set `browser.profiles.<name>.cdpUrl` (or `browser.cdpUrl`) to
attach to a remote Chrome. In this case, Clawdbot will not launch a local browser.
All existing endpoints accept optional `?profile=<name>` query parameter:
- `GET /?profile=work` — status for work profile
- `POST /start?profile=work` — start work profile browser
- `GET /tabs?profile=work` — list tabs for work profile
- `POST /tabs/open?profile=work` — open tab in work profile
- etc.
## Profiles (multi-browser)
When `profile` is omitted, uses `browser.defaultProfile` (defaults to "clawd").
Clawdbot supports multiple named profiles. Each profile has its own:
- user data directory
- CDP port (local) or CDP URL (remote)
- accent color
### Agent browser tool
Defaults:
- The `clawd` profile is auto-created if missing.
- Local CDP ports allocate from **1880018899** by default.
- Deleting a profile moves its local data directory to Trash.
The `browser` tool accepts an optional `profile` parameter for all actions:
All control endpoints accept `?profile=<name>`; the CLI uses `--browser-profile`.
```json
{
"action": "open",
"targetUrl": "https://example.com",
"profile": "work"
}
```
## Isolation guarantees
This routes the operation to the specified profile's browser instance. Omitting
`profile` uses the default profile.
- **Dedicated user data dir**: never touches your personal Chrome profile.
- **Dedicated ports**: avoids `9222` to prevent collisions with dev workflows.
- **Deterministic tab control**: target tabs by `targetId`, not “last tab”.
### Profile naming rules
## Browser selection
- Lowercase alphanumeric characters and hyphens only
- Must start with a letter or number (not a hyphen)
- Maximum 64 characters
- Examples: `clawd`, `work`, `my-project-1`
When launching locally, Clawdbot picks the first available:
1. Chrome Canary
2. Chromium
3. Chrome
### Port allocation
You can override with `browser.executablePath`.
Ports are allocated from range 18800-18899 (~100 profiles max). This is far more
than practical use — memory and CPU exhaustion occur well before port exhaustion.
Ports are allocated once at profile creation and persisted permanently.
Remote profiles are attach-only and do **not** use the local port range.
## Interaction with the agent (clawd)
Platforms:
- macOS: checks `/Applications` and `~/Applications`.
- Linux: looks for `google-chrome`, `chromium`, etc.
- Windows: checks common install locations.
The agent should use browser tools only when:
- enabled in settings
- control URL is configured
## Control API (optional)
If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").
If you want to integrate directly, the browser control server exposes a small
HTTP API:
The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
- avoid opening duplicate tabs unless asked
- Status/start/stop: `GET /`, `POST /start`, `POST /stop`
- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
- Snapshot/screenshot: `GET /snapshot`, `POST /screenshot`
- Actions: `POST /navigate`, `POST /act`
- Hooks: `POST /hooks/file-chooser`, `POST /hooks/dialog`
- Debugging: `GET /console`, `POST /pdf`
## CLI quick reference (one example each)
All endpoints accept `?profile=<name>`.
All commands accept `--browser-profile <name>` to target a specific profile (default: `clawd`).
### Playwright requirement
Some features (navigate/act/ai snapshot, element screenshots, PDF) require
Playwright. In embedded gateway builds, Playwright may be unavailable; those
endpoints return a clear 501 error. ARIA snapshots and basic screenshots still work.
## CLI quick reference
All commands accept `--browser-profile <name>` to target a specific profile.
Profile management:
- `clawdbot browser profiles`
- `clawdbot browser create-profile --name work`
- `clawdbot browser create-profile --name remote --cdp-url http://10.0.0.42:9222`
- `clawdbot browser delete-profile --name work`
Basics:
- `clawdbot browser status`
- `clawdbot browser start`
- `clawdbot browser stop`
- `clawdbot browser reset-profile`
- `clawdbot browser tabs`
- `clawdbot browser open https://example.com`
- `clawdbot browser focus abcd1234`
@@ -260,6 +148,8 @@ Inspection:
- `clawdbot browser screenshot --ref 12`
- `clawdbot browser snapshot`
- `clawdbot browser snapshot --format aria --limit 200`
- `clawdbot browser console --level error`
- `clawdbot browser pdf`
Actions:
- `clawdbot browser navigate https://example.com`
@@ -271,39 +161,27 @@ Actions:
- `clawdbot browser drag 10 11`
- `clawdbot browser select 9 OptionA OptionB`
- `clawdbot browser upload /tmp/file.pdf`
- `clawdbot browser fill --fields '[{\"ref\":\"1\",\"value\":\"Ada\"}]'`
- `clawdbot browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'`
- `clawdbot browser dialog --accept`
- `clawdbot browser wait --text "Done"`
- `clawdbot browser evaluate --fn '(el) => el.textContent' --ref 7`
- `clawdbot browser evaluate --fn "document.querySelector('.my-class').click()"`
- `clawdbot browser console --level error`
- `clawdbot browser pdf`
Notes:
- `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
- `upload` can take a `ref` to auto-click after arming (useful for single-step file uploads).
- `upload` can also take `inputRef` (aria ref) or `element` (CSS selector) to set `<input type="file">` directly without waiting for a file chooser.
- The arm default timeout is **2 minutes** (clamped to max 2 minutes); pass `timeoutMs` if you need shorter.
- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging.
- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs.
- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on.
- `upload` and `dialog` are **arming** calls; run them before the click/press
that triggers the chooser/dialog.
- `upload` can also set file inputs directly via `--input-ref` or `--element`.
- `snapshot` defaults to `ai` when available; use `--format aria` for the
accessibility tree.
- `click`/`type` require a `ref` from `snapshot` (CSS selectors are intentionally
not supported for actions).
## Security & privacy notes
## Security & privacy
- The clawd browser profile is app-owned; it may contain logged-in sessions.
Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the
user explicitly configures a non-loopback URL.
- Never reuse or copy the user's default Chrome profile.
- Remote CDP endpoints should be tunneled or protected; CDP is highly privileged.
## Non-goals (for the first cut)
- Cross-device "sync" of tabs between Mac and Pi.
- Sharing the user's logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for "close-the-loop" verification
and interaction.
- The clawd browser profile may contain logged-in sessions; treat it as sensitive.
- Keep control URLs loopback-only unless you intentionally expose the server.
- Remote CDP endpoints are powerful; tunnel and protect them.
## Troubleshooting
For Linux-specific issues (especially Ubuntu with snap Chromium), see [browser-linux-troubleshooting](/tools/browser-linux-troubleshooting).
For Linux-specific issues (especially snap Chromium), see
[Browser troubleshooting](/tools/browser-linux-troubleshooting).

View File

@@ -294,25 +294,12 @@ Node targeting:
- Respect user consent for camera/screen capture.
- Use `status/describe` to ensure permissions before invoking media commands.
## How the model sees tools (pi-mono internals)
## How tools are presented to the agent
Tools are exposed to the model in **two parallel channels**:
Tools are exposed in two parallel channels:
1) **System prompt text**: a human-readable list + guidelines.
2) **Provider tool schema**: the actual function/tool declarations sent to the model API.
1) **System prompt text**: a human-readable list + guidance.
2) **Tool schema**: the structured function definitions sent to the model API.
In pi-mono:
- System prompt builder: [`packages/coding-agent/src/core/system-prompt.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/src/core/system-prompt.ts)
- Builds the `Available tools:` list from `toolDescriptions`.
- Appends skills and project context.
- Tool schemas passed to providers:
- OpenAI: [`packages/ai/src/providers/openai-responses.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/openai-responses.ts) (`convertTools`)
- Anthropic: [`packages/ai/src/providers/anthropic.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/anthropic.ts) (`convertTools`)
- Gemini: [`packages/ai/src/providers/google-shared.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/google-shared.ts) (`convertTools`)
- Tool execution loop:
- Agent loop: [`packages/ai/src/agent/agent-loop.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/agent/agent-loop.ts)
- Validates tool arguments and executes tools, then appends `toolResult` messages.
In Clawdbot:
- System prompt append: [`src/agents/system-prompt.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/system-prompt.ts)
- Tool list injected via `createClawdbotCodingTools()` in [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts)
That means the agent sees both “what tools exist” and “how to call them.” If a tool
doesnt appear in the system prompt or the schema, the model cannot call it.

View File

@@ -65,8 +65,3 @@ Use SSH tunneling or Tailscale to reach the Gateway WS.
## Notes
- The TUI shows Gateway chat deltas (`event: chat`) and agent tool events.
- It registers as a Gateway client with `mode: "tui"` for presence and debugging.
## Files
- CLI: [`src/cli/tui-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/tui-cli.ts)
- Runner: [`src/tui/tui.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/tui.ts)
- Gateway client: [`src/tui/gateway-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/gateway-chat.ts)