diff --git a/docs/automation/gmail-pubsub.md b/docs/automation/gmail-pubsub.md index d09e3c0ee..2c94c2c2c 100644 --- a/docs/automation/gmail-pubsub.md +++ b/docs/automation/gmail-pubsub.md @@ -171,9 +171,9 @@ Notes: Recommended: `clawdbot hooks gmail run` wraps the same flow and auto-renews the watch. -## Expose the handler (dev, unsupported hack) +## Expose the handler (advanced, unsupported) -If you insist on a non-Tailscale tunnel, wire it manually and use the public URL in the push +If you need a non-Tailscale tunnel, wire it manually and use the public URL in the push subscription (unsupported, no guardrails): ```bash diff --git a/docs/cli/index.md b/docs/cli/index.md index 0c9c52108..d64ec0faa 100644 --- a/docs/cli/index.md +++ b/docs/cli/index.md @@ -7,8 +7,7 @@ read_when: # CLI reference -This page mirrors `src/cli/*` and is the source of truth for CLI behavior. -If you change the CLI code, update this doc. +This page describes the current CLI behavior. If commands change, update this doc. ## Global flags @@ -25,7 +24,7 @@ If you change the CLI code, update this doc. ## Color palette -Clawdbot uses a lobster palette for CLI output. Source of truth: `src/terminal/theme.ts`. +Clawdbot uses a lobster palette for CLI output. - `accent` (#FF5A2D): headings, provider labels, primary highlights. - `accentBright` (#FF7A3D): command names, emphasis. diff --git a/docs/concepts/agent-loop.md b/docs/concepts/agent-loop.md index a60d02138..61cb748c0 100644 --- a/docs/concepts/agent-loop.md +++ b/docs/concepts/agent-loop.md @@ -5,11 +5,11 @@ read_when: --- # Agent Loop (Clawdis) -Short, exact flow of one agent run. Source of truth: current code in `src/`. +Short, exact flow of one agent run. ## Entry points -- Gateway RPC: `agent` and `agent.wait` in [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts). 
-- CLI: `agentCommand` in [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts). +- Gateway RPC: `agent` and `agent.wait`. +- CLI: `agent` command. ## High-level flow 1) `agent` RPC validates params, resolves session (sessionKey/sessionId), persists session metadata, returns `{ runId, acceptedAt }` immediately. @@ -37,10 +37,8 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`. - `tool`: streamed tool events from pi-agent-core ## Chat provider handling -- `createAgentEventHandler` in [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts): - - buffers assistant deltas - - emits chat `delta` messages - - emits chat `final` when **lifecycle end/error** arrives +- Assistant deltas are buffered into chat `delta` messages. +- A chat `final` is emitted on **lifecycle end/error**. ## Timeouts - `agent.wait` default: 30s (just the wait). `timeoutMs` param overrides. @@ -51,11 +49,3 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`. 
- AbortSignal (cancel) - Gateway disconnect or RPC timeout - `agent.wait` timeout (wait-only, does not stop agent) - -## Files -- [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts) -- [`src/gateway/server-methods/agent-job.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent-job.ts) -- [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts) -- [`src/agents/pi-embedded-runner.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-runner.ts) -- [`src/agents/pi-embedded-subscribe.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-subscribe.ts) -- [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts) diff --git a/docs/concepts/agent.md b/docs/concepts/agent.md index a4d4bb780..f970da7f2 100644 --- a/docs/concepts/agent.md +++ b/docs/concepts/agent.md @@ -5,7 +5,7 @@ read_when: --- # Agent Runtime 🤖 -CLAWDBOT runs a single embedded agent runtime derived from **p-mono** (internal name: **p**). +CLAWDBOT runs a single embedded agent runtime derived from **p-mono**. ## Workspace (required) @@ -43,9 +43,9 @@ To disable bootstrap file creation entirely (for pre-seeded workspaces), set: { agent: { skipBootstrap: true } } ``` -## Built-in tools (internal) +## Built-in tools -p’s embedded core tools (read/bash/edit/write and related internals) are defined in code and always available. `TOOLS.md` does **not** control which tools exist; it’s guidance for how *you* want them used. +Core tools (read/bash/edit/write and related system tools) are always available. `TOOLS.md` does **not** control which tools exist; it’s guidance for how *you* want them used. ## Skills @@ -63,18 +63,6 @@ Clawdbot reuses pieces of the p-mono codebase (models/tools), but **session mana - No p-coding agent runtime. - No `~/.pi/agent` or `/.pi` settings are consulted. 
-## Peter @ steipete (only) - -Apply these notes **only** when the user is Peter Steinberger at steipete. - -- Gateway runs on the **Mac Studio in London**. -- Primary work computer: **MacBook Pro**. -- Peter travels between **Vienna** and **London**; there are two networks bridged via **Tailscale**. -- For debugging, connect to the Mac Studio (London) or MacBook Pro (primary). -- There is also an **M1 MacBook Pro** on the Vienna tailnet you can use to access the Vienna network. -- Nodes can be accessed via the `clawdbot` binary (`pnpm clawdbot` in `~/Projects/clawdbot`). -- See also `skills/clawdbot*` for node/browser/canvas/cron usage. - ## Sessions Session transcripts are stored as JSONL at: diff --git a/docs/concepts/architecture.md b/docs/concepts/architecture.md index 07e863575..f473b4c6a 100644 --- a/docs/concepts/architecture.md +++ b/docs/concepts/architecture.md @@ -3,67 +3,55 @@ summary: "WebSocket gateway architecture, components, and client flows" read_when: - Working on gateway protocol, clients, or transports --- -# Gateway Architecture +# Gateway architecture Last updated: 2026-01-05 ## Overview -- A single long-lived **Gateway** process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Slack via Bolt, Discord via discord.js, Signal via signal-cli, iMessage via imsg, WebChat) and the control/event plane. -- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: **WebSocket on the configured bind host** (default `127.0.0.1:18789`; tunnel or VPN for remote). -- One Gateway per host; it is the only place that is allowed to open a WhatsApp session. All sends/agent runs go through it. -- By default: the Gateway exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`. 
-## Implementation snapshot (current code) - -### TypeScript Gateway ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)) -- Single HTTP + WebSocket server (default `18789`); bind policy `loopback|lan|tailnet|auto`. Refuses non-loopback binds without auth; Tailscale serve/funnel requires loopback. -- Handshake: first frame must be a `connect` request; AJV validates request + params against TypeBox schemas; protocol negotiated via `minProtocol`/`maxProtocol`. -- `hello-ok` includes snapshot (presence/health/stateVersion/uptime/configPath/stateDir), features (methods/events), policy (max payload/buffer/tick), and `canvasHostUrl` when available. -- Events emitted: `agent`, `chat`, `presence`, `tick`, `health`, `heartbeat`, `cron`, `talk.mode`, `node.pair.requested`, `node.pair.resolved`, `voicewake.changed`, `shutdown`. -- Idempotency keys are required for `send`, `agent`, `chat.send`, and node invokes; the dedupe cache avoids double-sends on reconnects. Payload sizes are capped per connection. -- Optional node bridge ([`src/infra/bridge/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bridge/server.ts)): TCP JSONL frames (`hello`, `pair-request`, `req/res`, `event`, `invoke`, `ping`). Node connect/disconnect updates presence and flows into the session bus. -- Control UI + Canvas host: HTTP serves Control UI (base path configurable) and can host the A2UI canvas via [`src/canvas-host/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/canvas-host/server.ts) (live reload). Canvas host URL is advertised to nodes + clients. - -### iOS node (`apps/ios`) -- Discovery + pairing: `BridgeDiscoveryModel` uses `NWBrowser` Bonjour discovery and reads TXT fields for LAN/tailnet host hints plus gateway/bridge/canvas ports. -- Auto-connect: `BridgeConnectionController` uses stored `node.instanceId` + Keychain token; supports manual host/port; performs `pair-and-hello`. 
-- Bridge runtime: `BridgeSession` actor owns an `NWConnection`, JSONL frames, `hello`/`hello-ok`, ping/pong, `req/res`, server `event`s, and `invoke` callbacks; stores `canvasHostUrl`. -- Commands: `NodeAppModel` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, `screen.record`, `location.get`. Canvas/camera/screen are blocked when backgrounded. -- Canvas + actions: `WKWebView` with A2UI action bridge; accepts actions from local-network or trusted file URLs; intercepts `clawdbot://` deep links and forwards `agent.request` to the bridge. -- Voice/talk: voice wake sends `voice.transcript` events and syncs triggers via `voicewake.get` + `voicewake.changed`; Talk Mode attaches to the bridge. - -### Android node (`apps/android`) -- Discovery + pairing: `BridgeDiscovery` uses mDNS/NSD to find `_clawdbot-bridge._tcp`, with manual host/port fallback. -- Auto-connect: `NodeRuntime` restores a stored token, performs `pair-and-hello`, and reconnects to the last discovered or manual bridge. -- Bridge runtime: `BridgeSession` owns the TCP JSONL session (`hello`/`hello-ok`, ping/pong, `req/res`, `event`, `invoke`); stores `canvasHostUrl`. -- Commands: `NodeRuntime` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, and chat/session events; foreground-only for canvas/camera. +- A single long‑lived **Gateway** owns all messaging surfaces (WhatsApp via + Baileys, Telegram via grammY, Slack, Discord, Signal, iMessage, WebChat). +- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over + **one transport: WebSocket** on the configured bind host (default + `127.0.0.1:18789`). +- One Gateway per host; it is the only place that opens a WhatsApp session. +- A **bridge** (default `18790`) is used for nodes (macOS/iOS/Android). +- A **canvas host** (default `18793`) serves agent‑editable HTML and A2UI. 
## Components and flows -- **Gateway (daemon)** - - Maintains WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Signal (signal-cli), and iMessage (imsg) connections. - - Exposes a typed WS API (req/resp + server push events). - - Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`. -- **Clients (mac app / CLI / web admin)** - - One WS connection per client. - - Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`). - - On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see [`docs/macos.md`](/platforms/macos)). -- **Agent process (Pi)** - - Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection. -- **WebChat** - - Serves static assets locally. - - Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS. - - Remote use goes through the same SSH/Tailscale tunnel as other clients. + +### Gateway (daemon) +- Maintains provider connections. +- Exposes a typed WS API (requests, responses, server‑push events). +- Validates inbound frames against JSON Schema. +- Emits events like `agent`, `chat`, `presence`, `health`, `heartbeat`, `cron`. + +### Clients (mac app / CLI / web admin) +- One WS connection per client. +- Send requests (`health`, `status`, `send`, `agent`, `system-presence`). +- Subscribe to events (`tick`, `agent`, `presence`, `shutdown`). + +### Nodes (macOS / iOS / Android) +- Connect to the **bridge** (TCP JSONL) rather than the WS server. +- Pair with the Gateway to receive a token. +- Expose commands like `canvas.*`, `camera.*`, `screen.record`, `location.get`. + +### WebChat +- Static UI that uses the Gateway WS API for chat history and sends. 
+- In remote setups, connects through the same SSH/Tailscale tunnel as other + clients. ## Connection lifecycle (single client) + ``` Client Gateway | | |---- req:connect -------->| |<------ res (ok) ---------| (or res error + close) - | (payload=hello-ok carries snapshot: presence + health) + | (payload=hello-ok carries snapshot: presence + health) | | - |<------ event:presence ---| (deltas) - |<------ event:tick -------| (keepalive/no-op) + |<------ event:presence ---| + |<------ event:tick -------| | | |------- req:agent ------->| |<------ res:agent --------| (ack: {runId,status:"accepted"}) @@ -71,44 +59,42 @@ Client Gateway |<------ res:agent --------| (final: {runId,status,summary}) | | ``` + ## Wire protocol (summary) + - Transport: WebSocket, text frames with JSON payloads. -- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`. -- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes. -- After handshake: - - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}` - - Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}` -- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation. -- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`. -- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`. -- Protocol versions are bumped on breaking changes; clients must match `minClient`; Gateway chooses within client’s min/max. -- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache. 
-- Policy in `hello-ok` communicates payload/queue limits and tick interval. +- First frame **must** be `connect`. +- After handshake: + - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}` + - Events: `{type:"event", event, payload, seq?, stateVersion?}` +- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` + must match or the socket closes. +- Idempotency keys are required for side‑effecting methods (`send`, `agent`) to + safely retry; the server keeps a short‑lived dedupe cache. -## Type system and codegen -- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server. -- Build step emits JSON Schema. -- Clients: - - TypeScript: uses the same TypeBox types directly. - - Swift: generated `Codable` models via quicktype from the JSON Schema. -- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming. +## Protocol typing and codegen -## Invariants -- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends. -- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close. -- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`. -- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries. +- TypeBox schemas define the protocol. +- JSON Schema is generated from those schemas. +- Swift models are generated from the JSON Schema. ## Remote access -- Preferred: Tailscale or VPN; alternate: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`. -- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel. -- Same protocol over the tunnel; same handshake. 
If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel. + +- Preferred: Tailscale or VPN. +- Alternative: SSH tunnel + ```bash + ssh -N -L 18789:127.0.0.1:18789 user@host + ``` +- The same handshake + auth token apply over the tunnel. ## Operations snapshot -- Start: `clawdbot gateway` (foreground, logs to stdout). - Supervise with launchd/systemd for restarts. -- Health: request `health` over WS; also surfaced in `hello-ok.health`. -- Metrics/logging: keep outside this spec; gateway should expose Prometheus text or structured logs separately. -## Migration notes -- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped. +- Start: `clawdbot gateway` (foreground, logs to stdout). +- Health: `health` over WS (also included in `hello-ok`). +- Supervision: launchd/systemd for auto‑restart. + +## Invariants + +- Exactly one Gateway controls a single Baileys session per host. +- Handshake is mandatory; any non‑JSON or non‑connect first frame is a hard close. +- Events are not replayed; clients must refresh on gaps. diff --git a/docs/concepts/group-messages.md b/docs/concepts/group-messages.md index d452208f6..7d9092e53 100644 --- a/docs/concepts/group-messages.md +++ b/docs/concepts/group-messages.md @@ -61,7 +61,6 @@ Only the owner number (from `whatsapp.allowFrom`, or the bot’s own E.164 when 4) Session-level directives (`/verbose on`, `/think high`, `/new` or `/reset`, `/compact`) apply only to that group’s session; send them as standalone messages so they register. Your personal DM session remains independent. ## Testing / verification -- Automated: `pnpm test -- src/web/auto-reply.test.ts --runInBand` (covers mention gating, history injection, sender suffix). - Manual smoke: - Send an `@clawd` ping in the group and confirm a reply that references the sender name. 
- Send a second ping and verify the history block is included then cleared on the next turn. diff --git a/docs/concepts/models.md b/docs/concepts/models.md index 0d1367be3..ed8cf9609 100644 --- a/docs/concepts/models.md +++ b/docs/concepts/models.md @@ -1,157 +1,114 @@ --- -summary: "Plan for models CLI: scan, list, aliases, fallbacks, status" +summary: "Models CLI: list, set, aliases, fallbacks, scan, status" read_when: - Adding or modifying models CLI (models list/set/scan/aliases/fallbacks) - Changing model fallback behavior or selection UX - Updating model scan probes (tools/images) --- -# Models CLI plan +# Models CLI -See [`docs/model-failover.md`](/concepts/model-failover) for how auth profiles rotate (OAuth vs API keys), cooldowns, and how that interacts with model fallbacks. +See [/concepts/model-failover](/concepts/model-failover) for auth profile +rotation, cooldowns, and how that interacts with fallbacks. -Goal: give clear model visibility + control (configured vs available), plus scan tooling -that prefers tool-call + image-capable models and maintains ordered fallbacks. - -## How Clawdbot models work (quick explainer) +## How model selection works Clawdbot selects models in this order: -1) The configured **primary** model (`agent.model.primary`). -2) If it fails, fallbacks in `agent.model.fallbacks` (in order). -3) Auth failover happens **inside** the provider first (see [/concepts/model-failover](/concepts/model-failover)). -Key pieces: -- `provider/model` is the canonical model id (e.g. `anthropic/claude-opus-4-5`). -- `agent.models` is the **allowlist/catalog** of models Clawdbot can use, with optional aliases and provider params. -- `agent.imageModel` is only used when the primary model **can’t** accept images. -- `models.providers` lets you add custom providers + models (written to `models.json`). -- `/model ` switches the active model for the current session; `/model list` shows what’s allowed. 
+1) **Primary** model (`agent.model.primary` or `agent.model`). +2) **Fallbacks** in `agent.model.fallbacks` (in order). +3) **Provider auth failover** happens inside a provider before moving to the + next model. Related: -- Context limits are model-specific; long sessions may trigger compaction. See [/concepts/compaction](/concepts/compaction). +- `agent.models` is the allowlist/catalog of models Clawdbot can use (plus aliases). +- `agent.imageModel` is used **only when** the primary model can’t accept images. -## Model recommendations +## Config keys (overview) -- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): default primary for assistant + general work. It’s pricey and cap-prone, so consider the [Claude Max $200 subscription](https://www.anthropic.com/pricing/) if you live here. -- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): default fallback when Opus caps out. Similar behavior with fewer limit headaches. -- [GPT-5.2-Codex](https://developers.openai.com/codex/models): recommended for coding and sub-agents. Prefer the [Codex CLI](https://developers.openai.com/codex/cli) if you want the strongest feel. +- `agent.model.primary` and `agent.model.fallbacks` +- `agent.imageModel.primary` and `agent.imageModel.fallbacks` +- `agent.models` (allowlist + aliases + provider params) +- `models.providers` (custom providers written into `models.json`) -Suggested stacks: -- Assistant-first: Opus 4.5 primary → Sonnet 4.5 fallback. -- Agentic coding: Opus 4.5 primary → GPT-5.2-Codex for sub-agents → Sonnet 4.5 fallback. +Provider ids in model refs are normalized to lowercase. Provider aliases like +`z.ai/*` normalize to `zai/*`. -## Model discussions (community notes) -Anecdotal notes from the Discord thread on January 4–5, 2026. Treat as “reported by users,” not a benchmark.
+```bash +clawdbot models list +clawdbot models status +clawdbot models set +clawdbot models set-image -**Reported working well** -- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): best overall quality in Clawdbot, especially for “assistant” work. Tradeoff is cost and hitting usage limits quickly. -- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): common fallback when Opus caps out. Similar behavior with fewer limit headaches. -- [Gemini 3 Pro](https://deepmind.google/en/models/gemini/pro/): some users felt it maps well to Clawdbot’s structure. Vibe was “fits the framework” more than “best at everything.” -- [GLM](https://www.zhipuai.cn/en/): used successfully as a worker model under orchestration. Seen as strong for delegated/secondary tasks, not the primary brain. -- [MiniMax M2.1](https://platform.minimax.io/docs/guides/models-intro): “good enough” for grunt work or a cheap fallback. Community nickname was “Temu-Sonnet,” i.e. usable but not Sonnet-level polish. +clawdbot models aliases list +clawdbot models aliases add +clawdbot models aliases remove -**Mixed / unclear** -- [Antigravity](https://blog.google/technology/ai/google-ai-updates-november-2025/) (Claude Opus access): some reported extra Opus quota. Pricing/limits were unclear, so the value is hard to predict. +clawdbot models fallbacks list +clawdbot models fallbacks add +clawdbot models fallbacks remove +clawdbot models fallbacks clear -**Reported weak in Clawdbot** -- [GPT-5.2-Codex](https://developers.openai.com/codex/models) inside Clawdbot: reported as rough for conversation/assistant tasks when embedded. Same notes said Codex felt stronger via the [Codex CLI](https://developers.openai.com/codex/cli) than embedded use. -- [Grok](https://docs.x.ai/docs/models/grok-4): people tried it and then abandoned it. No strong upside showed up in the notes. 
+clawdbot models image-fallbacks list +clawdbot models image-fallbacks add +clawdbot models image-fallbacks remove +clawdbot models image-fallbacks clear +``` -**Theme** -- Token burn feels higher than expected in long sessions; people suspect context buildup + tool outputs. Pruning/compaction helps. Check session logs before blaming providers. See [/concepts/session](/concepts/session) and [/concepts/model-failover](/concepts/model-failover). +`clawdbot models` (no subcommand) is a shortcut for `models status`. -Want a tailored stack? Share whether you’re using Clawdbot or Clawdis and your main workload (agentic coding vs “assistant” work), and we can suggest a primary + fallback set based on these reports. +### `models list` -## Models CLI +Shows configured models by default. Useful flags: -See [/cli](/cli) for the full command tree and CLI flags. +- `--all`: full catalog +- `--local`: local providers only +- `--provider `: filter by provider +- `--plain`: one model per line +- `--json`: machine‑readable output -### CLI output (list + status) +### `models status` -`clawdbot models list` (default) prints a table with these columns: -- `Model`: `provider/model` key (truncated in TTY). -- `Input`: `text` or `text+image`. -- `Ctx`: context window in K tokens (from the model registry). -- `Local`: `yes/no` when the provider base URL is local. -- `Auth`: `yes/no` when the provider has usable auth. -- `Tags`: origin + role hints. +Shows the resolved primary model, fallbacks, image model, and an auth overview +of configured providers. `--plain` prints only the resolved primary model. -Common tags: -- `default` — resolved default model. -- `fallback#N` — `agent.model.fallbacks` order. -- `image` — `agent.imageModel.primary`. -- `img-fallback#N` — `agent.imageModel.fallbacks` order. -- `configured` — present in `agent.models`. -- `alias:` — alias from `agent.models.*.alias`. -- `missing` — referenced in config but not found in the registry. 
+## Scanning (OpenRouter free models) -Output formats: -- `--plain`: prints only `provider/model` keys (one per line). -- `--json`: `{ count, models: [{ key, name, input, contextWindow, local, available, tags, missing }] }`. +`clawdbot models scan` inspects OpenRouter’s **free model catalog** and can +optionally probe models for tool and image support. -`clawdbot models status` prints the resolved defaults, fallbacks, image model, aliases, -and an **Auth overview** section showing which providers have profiles/env/models.json keys. -`--plain` prints the resolved default model only; `--json` returns a structured object for tooling. +Key flags: -## Config changes +- `--no-probe`: skip live probes (metadata only) +- `--min-params `: minimum parameter size (billions) +- `--max-age-days `: skip older models +- `--provider `: provider prefix filter +- `--max-candidates `: fallback list size +- `--set-default`: set `agent.model.primary` to the first selection +- `--set-image`: set `agent.imageModel.primary` to the first image selection -- `agent.models` (configured model catalog + aliases). -- `agent.models.*.params` (provider-specific API params passed through to requests). -- `agent.model.primary` + `agent.model.fallbacks`. -- `agent.imageModel.primary` + `agent.imageModel.fallbacks` (optional). -- `auth.profiles` + `auth.order` for per-provider auth failover. +Probing requires an OpenRouter API key (from auth profiles or +`OPENROUTER_API_KEY`). Without a key, use `--no-probe` to list candidates only. 
-## Scan behavior (models scan) +Scan results are ranked by: +1) Image support +2) Tool latency +3) Context size +4) Parameter count + +Probe runs can be tuned with `--timeout` and `--concurrency`. -Input -- OpenRouter `/models` list (filter `:free`) -- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY` (see [/environment](/environment)) -- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates` -- Probe controls: `--timeout`, `--concurrency` -Probes (direct pi-ai complete) -- Tool-call probe (required): - - Provide a dummy tool, verify tool call emitted. -- Image probe (preferred): - - Prompt includes 1x1 PNG; success if no "unsupported image" error. +When run in a TTY, you can select fallbacks interactively. In non-interactive +mode, pass `--yes` or `--no-input` to apply the auto-selected defaults. -Scoring/selection -- Prefer models passing tool + image for text/tool fallbacks. -- Prefer image-only models for image tool fallback (even if tool probe fails). -- Rank by: image ok, then lower tool latency, then larger context, then params. +## Models registry (`models.json`) -Interactive selection (TTY) -- Multiselect list with per-model stats: - - model id, tool ok, image ok, median latency, context, inferred params. -- Pre-select top N (default 6). -- Non-TTY: auto-select; require `--yes`/`--no-input` to apply. - -Output -- Writes `agent.model.fallbacks` ordered. -- Writes `agent.imageModel.fallbacks` ordered (image-capable models). -- Ensures `agent.models` entries exist for selected models. -- Optional `--set-default` to set `agent.model.primary`. -- Optional `--set-image` to set `agent.imageModel.primary`. - -## Runtime fallback - -- On model failure: try `agent.model.fallbacks` in order. -- Per-provider auth failover uses `auth.order` (or stored profile order) **before** - moving to the next model. -- Image routing uses `agent.imageModel` **only when configured** and the primary - model lacks image input. -- Persist last successful provider/model to session entry; auth profile success is global.
-- See [`docs/model-failover.md`](/concepts/model-failover) for auth profile rotation, cooldowns, and timeout handling. - -## Tests - -- Unit: scan selection ordering + probe classification. -- CLI: list/aliases/fallbacks add/remove + scan writes config. -- Status: shows last used model + fallbacks. - -## Docs - -- Update [`docs/configuration.md`](/gateway/configuration) with `agent.models` + `agent.model` + `agent.imageModel`. -- Keep this doc current when CLI surface or scan logic changes. -- Note provider aliases like `z.ai/*` -> `zai/*` when relevant. -- Provider ids in model refs are normalized to lowercase. +Custom providers in `models.providers` are written into `models.json` under the +agent directory (default `~/.clawdbot/agents//models.json`). This file +is merged by default unless `models.mode` is set to `replace`. diff --git a/docs/concepts/oauth.md b/docs/concepts/oauth.md index 633e0d61d..e89ce1822 100644 --- a/docs/concepts/oauth.md +++ b/docs/concepts/oauth.md @@ -97,7 +97,7 @@ At runtime: - if `expires` is in the future → use the stored access token - if expired → refresh (under a file lock) and overwrite the stored credentials -See implementation: `src/agents/auth-profiles.ts`. +The refresh flow is automatic; you generally don’t need to manage tokens manually. ## Multiple accounts (profiles) + routing diff --git a/docs/concepts/presence.md b/docs/concepts/presence.md index 86153aa7b..5e1e776e5 100644 --- a/docs/concepts/presence.md +++ b/docs/concepts/presence.md @@ -7,127 +7,93 @@ read_when: --- # Presence -Clawdbot “presence” is a lightweight, best-effort view of: -- The **Gateway** itself (one per host), and -- The **clients connected to the Gateway** (mac app, WebChat, CLI, etc.). +Clawdbot “presence” is a lightweight, best‑effort view of: +- the **Gateway** itself, and +- **clients connected to the Gateway** (mac app, WebChat, CLI, etc.) 
-Presence is used primarily to render the mac app’s **Instances** tab and to provide quick operator visibility. +Presence is used primarily to render the macOS app’s **Instances** tab and to +provide quick operator visibility. -## The data model +## Presence fields (what shows up) -Presence entries are structured objects with (some) fields: -- `instanceId` (optional but strongly recommended): stable client identity used for dedupe -- `host`: a human-readable name (often the machine name) -- `ip`: best-effort IP address (may be missing or stale) +Presence entries are structured objects with fields like: + +- `instanceId` (optional but strongly recommended): stable client identity +- `host`: human‑friendly host name +- `ip`: best‑effort IP address - `version`: client version string -- `deviceFamily` (optional): hardware family like `iPad`, `iPhone`, `Mac` -- `modelIdentifier` (optional): hardware model identifier like `iPad16,6` or `Mac16,6` -- `mode`: e.g. `gateway`, `app`, `webchat`, `cli` -- `lastInputSeconds` (optional): “seconds since last user input” for that client machine -- `reason`: a short marker like `self`, `connect`, `node-connected`, `node-disconnected`, `periodic`, `instances-refresh` -- `text`: legacy/debug summary string (kept for backwards compatibility and UI display) +- `deviceFamily` / `modelIdentifier`: hardware hints +- `mode`: `gateway`, `app`, `webchat`, `cli`, `node`, ... +- `lastInputSeconds`: “seconds since last user input” (if known) +- `reason`: `self`, `connect`, `node-connected`, `periodic`, ... - `ts`: last update timestamp (ms since epoch) ## Producers (where presence comes from) -Presence entries are produced by multiple sources and then **merged**. +Presence entries are produced by multiple sources and **merged**. ### 1) Gateway self entry -The Gateway seeds a “self” entry at startup so UIs always show at least the current gateway host. 
+The Gateway always seeds a “self” entry at startup so UIs show the gateway host +even before any clients connect. -Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`initSelfPresence()`). +### 2) WebSocket connect -### 2) WebSocket connect (connection-derived presence) +Every WS client begins with a `connect` request. On successful handshake the +Gateway upserts a presence entry for that connection. -Every WS client must begin with a `connect` request. On successful handshake, the Gateway upserts a presence entry for that connection. +#### Why one‑off CLI commands don’t show up -This is meant to answer: “Which clients are currently connected?” +The CLI often connects for short, one‑off commands. To avoid spamming the +Instances list, `client.mode === "cli"` is **not** turned into a presence entry. -Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (connect handling uses `connect.params.client.instanceId` when provided; otherwise falls back to `connId`). +### 3) `system-event` beacons -#### Why one-off CLI commands do not show up +Clients can send richer periodic beacons via the `system-event` method. The mac +app uses this to report host name, IP, and `lastInputSeconds`. -The CLI connects to the Gateway to execute one-off commands (health/status/send/agent/etc.). These are not “nodes” and would spam the Instances list, so the Gateway does not create presence entries for clients with `client.mode === "cli"`. - -### 3) `system-event` beacons (client-reported presence) - -Clients can publish richer periodic beacons via the `system-event` method. 
The mac app uses this to report: -- a human-friendly host name -- its best-known IP address -- `lastInputSeconds` - -Implementation: -- Gateway: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) handles method `system-event` by calling `updateSystemPresence(...)`. -- mac app beaconing: [`apps/macos/Sources/Clawdbot/PresenceReporter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/PresenceReporter.swift). - -### 4) Node bridge beacons (gateway-owned presence) +### 4) Node bridge beacons When a node bridge connection authenticates, the Gateway emits a presence entry -for that node and starts periodic refresh beacons so it does not expire. - -- Connect/disconnect markers: `node-connected`, `node-disconnected` -- Periodic heartbeat: every 3 minutes (`reason: periodic`) - -Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (node bridge handlers + timer beacons). +for that node and refreshes it periodically so it doesn’t expire. ## Merge + dedupe rules (why `instanceId` matters) -All producers write into a single in-memory presence map. +Presence entries are stored in a single in‑memory map: -Key points: -- Entries are **keyed** by a “presence key”. If two producers use the same key, they update the same entry. -- The best key is a stable, opaque `instanceId` that does not change across restarts. -- Keys are treated case-insensitively. +- Entries are keyed by a **presence key**. +- The best key is a stable `instanceId` that survives restarts. +- Keys are case‑insensitive. -Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`normalizePresenceKey()`). +If a client reconnects without a stable `instanceId`, it may show up as a +**duplicate** row. 
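The merge and dedupe rules above can be pictured as a single map keyed by a lowercased presence key. This is a minimal TypeScript sketch under stated assumptions, not the Gateway's implementation. `PresenceEntry`, `presenceKey`, and `upsertPresence` are hypothetical names, and only a subset of the documented fields is modeled. The `connId` fallback follows the removed note that connect handling falls back to the connection id when no `instanceId` is sent.

```typescript
// Hypothetical subset of the documented presence fields.
interface PresenceEntry {
  instanceId?: string;
  host?: string;
  ip?: string;
  mode?: string;
  reason?: string;
  ts: number; // last update, ms since epoch
}

type PresenceUpdate = Omit<PresenceEntry, "ts">;

// One in-memory map shared by every producer (connect, beacons, node bridge).
const presence = new Map<string, PresenceEntry>();

// Prefer the stable instanceId; otherwise fall back to the connection id.
// Lowercasing makes the key case-insensitive.
function presenceKey(update: PresenceUpdate, connId: string): string {
  return (update.instanceId ?? connId).toLowerCase();
}

// Producers that resolve to the same key update the same row:
// newer fields override older ones and `ts` is refreshed.
function upsertPresence(
  update: PresenceUpdate,
  connId: string,
  now: number = Date.now(),
): PresenceEntry {
  const key = presenceKey(update, connId);
  const merged: PresenceEntry = { ...presence.get(key), ...update, ts: now };
  presence.set(key, merged);
  return merged;
}

// A connect and a later beacon from the same client collapse into one row.
upsertPresence({ instanceId: "Mac-UUID-1", mode: "app", reason: "connect" }, "conn-1", 1000);
upsertPresence({ instanceId: "mac-uuid-1", ip: "192.168.1.20", reason: "periodic" }, "conn-2", 2000);
console.log(presence.size); // 1
```

Because the fallback id changes on every reconnect, a client without a stable `instanceId` produces a new key each time, which is why duplicate rows appear.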
-### mac app identity (stable UUID) +## TTL and bounded size -The mac app uses a persisted UUID as `instanceId` so: -- restarts/reconnects do not create duplicates -- renaming the Mac does not create a new “instance” -- debug/release builds can share the same identity +Presence is intentionally ephemeral: -Implementation: [`apps/macos/Sources/Clawdbot/InstanceIdentity.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstanceIdentity.swift). +- **TTL:** entries older than 5 minutes are pruned +- **Max entries:** 200 (oldest dropped first) -`displayName` (machine name) is used for UI, while `instanceId` is used for dedupe. - -## TTL and bounded size (why stale rows disappear) - -Presence entries are not permanent: -- TTL: entries older than 5 minutes are pruned -- Max: map is capped at 200 entries (LRU by `ts`) - -Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`TTL_MS`, `MAX_ENTRIES`, pruning in `listSystemPresence()`). +This keeps the list fresh and avoids unbounded memory growth. ## Remote/tunnel caveat (loopback IPs) -When a client connects over an SSH tunnel / local port forward, the Gateway may see the remote address as loopback (`127.0.0.1`). +When a client connects over an SSH tunnel / local port forward, the Gateway may +see the remote address as `127.0.0.1`. To avoid overwriting a good client‑reported +IP, loopback remote addresses are ignored. -To avoid degrading an otherwise-correct client beacon IP, the Gateway avoids writing loopback remote addresses into presence entries. - -Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (`isLoopbackAddress()`). - -## Consumers (who reads presence) +## Consumers ### macOS Instances tab -The mac app’s Instances tab renders the result of `system-presence`. 
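The TTL and size cap can be sketched as a pruning pass over the same kind of map. This assumes pruning runs when the list is read (the removed text mentions pruning in `listSystemPresence()`); the constants mirror the documented 5 minute TTL and 200 entry cap, and `prunePresence` is a hypothetical helper.

```typescript
// Limits mirroring the documented behavior.
const TTL_MS = 5 * 60 * 1000; // entries older than 5 minutes are pruned
const MAX_ENTRIES = 200;      // beyond this, the oldest entries are dropped

interface PresenceEntry {
  host?: string;
  ts: number; // last update, ms since epoch
}

// Hypothetical helper: prunes the map in place and returns the
// surviving entries, newest first.
function prunePresence(
  entries: Map<string, PresenceEntry>,
  now: number = Date.now(),
): PresenceEntry[] {
  // 1) TTL: drop rows whose last update is too old.
  //    Deleting from a Map while iterating it is safe in JavaScript.
  for (const [key, entry] of entries) {
    if (now - entry.ts > TTL_MS) entries.delete(key);
  }
  // 2) Size cap: sort newest first and drop everything past MAX_ENTRIES.
  const newestFirst = [...entries].sort((a, b) => b[1].ts - a[1].ts);
  for (const [key] of newestFirst.slice(MAX_ENTRIES)) entries.delete(key);
  return newestFirst.slice(0, MAX_ENTRIES).map(([, entry]) => entry);
}
```

A client that keeps beaconing refreshes `ts` and never ages out, while a disconnected client drops off the list once its last update is older than the TTL.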
- -Implementation: -- View: [`apps/macos/Sources/Clawdbot/InstancesSettings.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesSettings.swift) -- Store: [`apps/macos/Sources/Clawdbot/InstancesStore.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesStore.swift) - -The Instances rows show a small presence indicator (Active/Idle/Stale) based on -the last beacon age. The label is derived from the entry timestamp (`ts`). - -The store refreshes periodically and also applies `presence` WS events. +The macOS app renders the output of `system-presence` and applies a small status +indicator (Active/Idle/Stale) based on the age of the last update. ## Debugging tips -- To see the raw list, call `system-presence` against the gateway. +- To see the raw list, call `system-presence` against the Gateway. - If you see duplicates: - - confirm clients send a stable `instanceId` in the handshake (`connect.params.client.instanceId`) - - confirm beaconing uses the same `instanceId` - - check whether the connection-derived entry is missing `instanceId` (then it will be keyed by `connId` and duplicates are expected on reconnect) + - confirm clients send a stable `instanceId` in the handshake + - confirm periodic beacons use the same `instanceId` + - check whether the connection‑derived entry is missing `instanceId` (duplicates are expected) diff --git a/docs/concepts/provider-routing.md b/docs/concepts/provider-routing.md index 44667456b..c5f474c4b 100644 --- a/docs/concepts/provider-routing.md +++ b/docs/concepts/provider-routing.md @@ -3,24 +3,97 @@ summary: "Routing rules per provider (WhatsApp, Telegram, Discord, web) and shar read_when: - Changing provider routing or inbox behavior --- -# Providers & Routing +# Providers & routing Updated: 2026-01-06 -Goal: deterministic replies per provider, while supporting multi-agent + multi-account routing. 
+Clawdbot routes replies **back to the provider where a message came from**. The +model does not choose a provider; routing is deterministic and controlled by the +host configuration. -- **Provider**: provider label (`whatsapp`, `webchat`, `telegram`, `discord`, `signal`, `imessage`, …). Routing is fixed: replies go back to the origin provider; the model doesn’t choose. -- **AccountId**: provider account instance (e.g. WhatsApp account `"default"` vs `"work"`). Not every provider supports multi-account yet. -- **AgentId**: one isolated “brain” (workspace + per-agent agentDir + per-agent session store). -- **Reply context:** inbound replies include `ReplyToId`, `ReplyToBody`, and `ReplyToSender`, and the quoted context is appended to `Body` as a `[Replying to ...]` block. -- **Canonical direct session (per agent):** direct chats collapse to `agent::` (default `main`). Groups/channels stay isolated per agent: - - group: `agent:::group:` - - channel/room: `agent:::channel:` - - Telegram forum topics: `agent::telegram:group::topic:` -- **Session store:** per-agent store lives under `~/.clawdbot/agents//sessions/sessions.json` (override via `session.store` with `{agentId}` templating). JSONL transcripts live next to it. -- **WebChat:** attaches to the selected agent’s main session (so desktop reflects cross-provider history for that agent). -- **Implementation hints:** - - Set `Provider` + `AccountId` in each ingress. - - Route inbound to an agent via `routing.bindings` (match on `provider`, `accountId`, plus optional peer/guild/team). - - Keep routing deterministic: originate → same provider. Use the gateway WebSocket for sends; avoid side channels. - - Do not let the agent emit “send to X” decisions; keep that policy in the host code. +## Key terms + +- **Provider**: `whatsapp`, `telegram`, `discord`, `slack`, `signal`, `imessage`, `webchat`. +- **AccountId**: per‑provider account instance (when supported). 
+- **AgentId**: an isolated workspace + session store (“brain”). +- **SessionKey**: the internal bucket key used to store context and control concurrency. + +## Session key shapes (examples) + +Direct messages collapse to the agent’s **main** session: + +- `agent::` (default: `agent:main:main`) + +Groups and channels remain isolated per provider: + +- Groups: `agent:::group:` +- Channels/rooms: `agent:::channel:` + +Threads: + +- Slack/Discord threads append `:thread:` to the base key. +- Telegram forum topics embed `:topic:` in the group key. + +Examples: + +- `agent:main:telegram:group:-1001234567890:topic:42` +- `agent:main:discord:channel:123456:thread:987654` + +## Routing rules (how an agent is chosen) + +Routing picks **one agent** for each inbound message: + +1. **Exact peer match** (`routing.bindings` with `peer.kind` + `peer.id`). +2. **Guild match** (Discord) via `guildId`. +3. **Team match** (Slack) via `teamId`. +4. **Account match** (`accountId` on the provider). +5. **Provider match** (any account on that provider). +6. **Default agent** (`routing.defaultAgentId`, fallback to `main`). + +The matched agent determines which workspace and session store are used. + +## Config overview + +- `routing.defaultAgentId`: default agent when no binding matches. +- `routing.agents`: named agent definitions (workspace, model, etc.). +- `routing.bindings`: map inbound providers/accounts/peers to agents. 
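The routing precedence list can be read as a tiered lookup. This is a minimal TypeScript sketch, assuming each binding belongs to the tier of its most specific `match` field and that every field a binding specifies must equal the inbound value. `resolveAgentId` and the `Inbound` shape are hypothetical, and only the `match` keys used in the `routing.bindings` config example are modeled.

```typescript
// Hypothetical types modeled on the `routing.bindings` config example.
interface BindingMatch {
  provider: string;
  accountId?: string;
  guildId?: string; // Discord
  teamId?: string;  // Slack
  peer?: { kind: string; id: string };
}
interface Binding { match: BindingMatch; agentId: string }
interface RoutingConfig { defaultAgentId?: string; bindings?: Binding[] }

// Hypothetical description of an inbound message.
interface Inbound {
  provider: string;
  accountId?: string;
  guildId?: string;
  teamId?: string;
  peer?: { kind: string; id: string };
}

type Tier = "peer" | "guild" | "team" | "account" | "provider";
const TIER_ORDER: Tier[] = ["peer", "guild", "team", "account", "provider"];

// A binding's tier is its most specific field.
function tierOf(m: BindingMatch): Tier {
  if (m.peer) return "peer";
  if (m.guildId !== undefined) return "guild";
  if (m.teamId !== undefined) return "team";
  if (m.accountId !== undefined) return "account";
  return "provider";
}

// Every field the binding specifies must equal the message's value.
function matches(m: BindingMatch, msg: Inbound): boolean {
  if (m.provider !== msg.provider) return false;
  if (m.accountId !== undefined && m.accountId !== msg.accountId) return false;
  if (m.guildId !== undefined && m.guildId !== msg.guildId) return false;
  if (m.teamId !== undefined && m.teamId !== msg.teamId) return false;
  if (m.peer && (m.peer.kind !== msg.peer?.kind || m.peer.id !== msg.peer?.id)) return false;
  return true;
}

// Walk the tiers from most to least specific; fall back to the default agent.
function resolveAgentId(cfg: RoutingConfig, msg: Inbound): string {
  const bindings = cfg.bindings ?? [];
  for (const tier of TIER_ORDER) {
    const hit = bindings.find((b) => tierOf(b.match) === tier && matches(b.match, msg));
    if (hit) return hit.agentId;
  }
  return cfg.defaultAgentId ?? "main";
}

const cfg: RoutingConfig = {
  defaultAgentId: "main",
  bindings: [
    { match: { provider: "slack", teamId: "T123" }, agentId: "support" },
    { match: { provider: "telegram", peer: { kind: "group", id: "-100123" } }, agentId: "support" },
  ],
};
console.log(resolveAgentId(cfg, { provider: "slack", teamId: "T123" })); // support
console.log(resolveAgentId(cfg, { provider: "telegram", peer: { kind: "group", id: "-100999" } })); // main
```

Within a tier, the first matching binding wins in this sketch; whether Clawdbot breaks ties the same way is an assumption.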
+ +Example: + +```json5 +{ + routing: { + defaultAgentId: "main", + agents: { + support: { name: "Support", workspace: "~/clawd-support" } + }, + bindings: [ + { match: { provider: "slack", teamId: "T123" }, agentId: "support" }, + { match: { provider: "telegram", peer: { kind: "group", id: "-100123" } }, agentId: "support" } + ] + } +} +``` + +## Session storage + +Session stores live under the state directory (default `~/.clawdbot`): + +- `~/.clawdbot/agents//sessions/sessions.json` +- JSONL transcripts live alongside the store + +You can override the store path via `session.store` and `{agentId}` templating. + +## WebChat behavior + +WebChat attaches to the **selected agent** and defaults to the agent’s main +session. Because of this, WebChat lets you see cross‑provider context for that +agent in one place. + +## Reply context + +Inbound replies include: +- `ReplyToId`, `ReplyToBody`, and `ReplyToSender` when available. +- Quoted context is appended to `Body` as a `[Replying to ...]` block. + +This is consistent across providers. diff --git a/docs/concepts/queue.md b/docs/concepts/queue.md index 063585102..b175134e0 100644 --- a/docs/concepts/queue.md +++ b/docs/concepts/queue.md @@ -12,7 +12,7 @@ We now serialize command-based auto-replies (WhatsApp Web listener) through a ti - Serializing avoids competing for terminal/stdin, keeps logs readable, and reduces the chance of rate limits from upstream tools. ## How it works -- [`src/process/command-queue.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/process/command-queue.ts) holds a lane-aware FIFO queue and drains each lane synchronously. +- A lane-aware FIFO queue drains each lane synchronously. - `runEmbeddedPiAgent` enqueues by **session key** (lane `session:`) to guarantee only one active run per session. - Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agent.maxConcurrent`. 
- When verbose logging is enabled, queued commands emit a short notice if they waited more than ~2s before starting. @@ -74,4 +74,4 @@ Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`. ## Troubleshooting - If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining. -- `enqueueCommand` exposes a lightweight `getQueueSize()` helper if you need to surface queue depth in future diagnostics. +- If you need queue depth, enable verbose logs and watch for queue timing lines. diff --git a/docs/concepts/session-tool.md b/docs/concepts/session-tool.md index a92cda84e..55427fef3 100644 --- a/docs/concepts/session-tool.md +++ b/docs/concepts/session-tool.md @@ -21,7 +21,7 @@ Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history, - Hooks use `hook:` unless explicitly set. - Node bridge uses `node-` unless explicitly set. -`global` and `unknown` are internal-only and never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`. +`global` and `unknown` are reserved values and are never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`. ## sessions_list List sessions as an array of rows. diff --git a/docs/concepts/system-prompt.md b/docs/concepts/system-prompt.md index 5e4baa9b2..2d1a236cc 100644 --- a/docs/concepts/system-prompt.md +++ b/docs/concepts/system-prompt.md @@ -8,7 +8,7 @@ read_when: ClaudeBot builds a custom system prompt for every agent run. The prompt is **Clawdbot-owned** and does not use the p-coding-agent default prompt. -The prompt is assembled in `src/agents/system-prompt.ts` and injected by `src/agents/pi-embedded-runner.ts`. +The prompt is assembled by Clawdbot and injected into each agent run. ## Structure @@ -56,9 +56,3 @@ Skills are **not** auto-injected. 
Instead, the prompt instructs the model to use ``` This keeps the base prompt small while still enabling targeted skill usage. - -## Code references - -- Prompt text: `src/agents/system-prompt.ts` -- Prompt assembly + injection: `src/agents/pi-embedded-runner.ts` -- Bootstrap trimming: `src/agents/pi-embedded-helpers.ts` diff --git a/docs/concepts/typebox.md b/docs/concepts/typebox.md index cc192d271..be3d47206 100644 --- a/docs/concepts/typebox.md +++ b/docs/concepts/typebox.md @@ -3,40 +3,34 @@ summary: "TypeBox schemas as the single source of truth for the gateway protocol read_when: - Updating protocol schemas or codegen --- -# TypeBox as Protocol Source of Truth +# TypeBox as protocol source of truth -Last updated: 2025-12-09 +Last updated: 2026-01-08 -We use TypeBox schemas in [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) as the single source of truth for the Gateway control plane (connect/req/res/event frames and payloads). All derived artifacts should be generated from these schemas, not edited by hand. +TypeBox schemas define the Gateway control plane (connect/req/res/event frames and +payloads). All generated artifacts must come from these schemas. ## Current pipeline -- **TypeBox → JSON Schema**: `pnpm protocol:gen` writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json) (draft-07) and runs AJV in the server tests. -- **TypeBox → Swift**: `pnpm protocol:gen:swift` generates [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift). 
+- `pnpm protocol:gen` + - writes the JSON Schema output (draft‑07) +- `pnpm protocol:gen:swift` + - generates Swift gateway models +- `pnpm protocol:check` + - runs both generators and verifies the output is committed -## Problem +## Swift codegen behavior -- We want strong typing in Swift, including a sealed `GatewayFrame` enum with a discriminator and a forward-compatible `unknown` case. +The Swift generator emits: -## Preferred plan (next step) +- `GatewayFrame` enum with `req`, `res`, `event`, and `unknown` cases +- Strongly typed payload structs/enums +- `ErrorCode` values and `GATEWAY_PROTOCOL_VERSION` -- Add a small, custom Swift generator driven directly by the TypeBox schemas: - - Emit a sealed `enum GatewayFrame: Codable { case req(RequestFrame), res(ResponseFrame), event(EventFrame) }`. - - Emit strongly typed payload structs/enums (`ConnectParams`, `HelloOk`, `RequestFrame`, `ResponseFrame`, `EventFrame`, `PresenceEntry`, `Snapshot`, `StateVersion`, `ErrorShape`, `AgentEvent`, `TickEvent`, `ShutdownEvent`, `SendParams`, `AgentParams`, `ErrorCode`, `PROTOCOL_VERSION`). - - Custom `init(from:)` / `encode(to:)` enforces the `type` discriminator and can include an `unknown` case for forward compatibility. - - Wire a new script (e.g., `pnpm protocol:gen:swift`) into `protocol:check` so CI fails if the generated Swift is stale. +Unknown frame types are preserved as raw payloads for forward compatibility. -Why this path: -- Single source of truth stays TypeBox; no new IDL to maintain. -- Predictable, strongly typed Swift (no optional soup). -- Small deterministic codegen (~150–200 LOC script) we control. +## When you change schemas -## Alternative (if we want off-the-shelf codegen) - -- Wrap the existing JSON Schema into an OpenAPI 3.1 doc (auto-generated) and use **swift-openapi-generator** or **openapi-generator swift5**. More moving parts, but also yields enums with discriminator support. Keep this as a fallback if we don’t want a custom emitter. 
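Forward-compatible frame decoding can also be illustrated outside Swift. This is a TypeScript analog of the generated `GatewayFrame` enum, not generated code: it assumes only the `type` discriminator mentioned in the removed plan, keeps known frames as raw objects, and preserves any other type as an `unknown` case instead of failing. `decodeFrame` and `isKnownType` are hypothetical names.

```typescript
type RawFrame = Record<string, unknown>;

// Frame types this decoder knows about; anything else becomes `unknown`.
const KNOWN_TYPES = ["req", "res", "event"] as const;
type KnownType = (typeof KNOWN_TYPES)[number];

type GatewayFrame =
  | { kind: KnownType; frame: RawFrame }
  | { kind: "unknown"; type: string; raw: RawFrame };

function isKnownType(t: string): t is KnownType {
  return KNOWN_TYPES.some((k) => k === t);
}

// Decode one JSON frame, dispatching on its `type` discriminator.
function decodeFrame(json: string): GatewayFrame {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("frame must be a JSON object");
  }
  const obj = parsed as RawFrame;
  const t = obj.type;
  const type = typeof t === "string" ? t : "";
  // Unknown types are preserved rather than rejected, so an older client
  // keeps working when a newer gateway adds frame types.
  return isKnownType(type) ? { kind: type, frame: obj } : { kind: "unknown", type, raw: obj };
}

console.log(decodeFrame('{"type":"tick-v2"}').kind); // unknown
```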
- -## Action items - -- Implement `protocol:gen:swift` that reads the TypeBox schemas and emits the sealed Swift enum + payload structs. -- Update `protocol:check` to include the Swift generator output in the diff check. -- Remove quicktype output once the custom generator is in place (or keep it for docs only). +1) Update the TypeBox schemas. +2) Run `pnpm protocol:check`. +3) Commit the regenerated schema + Swift models. diff --git a/docs/experiments/onboarding-config-protocol.md b/docs/experiments/onboarding-config-protocol.md index 9b593ba01..494b60563 100644 --- a/docs/experiments/onboarding-config-protocol.md +++ b/docs/experiments/onboarding-config-protocol.md @@ -8,11 +8,11 @@ read_when: "Changing onboarding wizard steps or config schema endpoints" Purpose: shared onboarding + config surfaces across CLI, macOS app, and Web UI. ## Components -- Wizard engine: `src/wizard` (session + prompts + onboarding state). -- CLI: [`src/commands/onboard-*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/onboard-*.ts) uses the wizard with the CLI prompter. -- Gateway RPC: wizard + config schema endpoints serve UI clients. -- macOS: SwiftUI onboarding uses the wizard step model. -- Web UI: config form renders from JSON Schema + hints. +- Wizard engine (shared session + prompts + onboarding state). +- CLI onboarding uses the same wizard flow as the UI clients. +- Gateway RPC exposes wizard + config schema endpoints. +- macOS onboarding uses the wizard step model. +- Web UI renders config forms from JSON Schema + UI hints. 
## Gateway RPC - `wizard.start` params: `{ mode?: "local"|"remote", workspace?: string }` diff --git a/docs/experiments/plans/cron-add-hardening.md b/docs/experiments/plans/cron-add-hardening.md index 7ea78455c..5919d103a 100644 --- a/docs/experiments/plans/cron-add-hardening.md +++ b/docs/experiments/plans/cron-add-hardening.md @@ -28,44 +28,29 @@ Recent gateway logs show repeated `cron.add` failures with invalid parameters (m - Agent cron tool schema allows arbitrary `job` objects, enabling malformed inputs. - Gateway strictly validates `cron.add` with no normalization, so wrapped payloads fail. -## Proposed Approach -1. **Normalize** incoming `cron.add` payloads (unwrap `data`/`job`, infer `schedule.kind` and `payload.kind`, default `wakeMode` + `sessionTarget` when safe). -2. **Harden** the agent cron tool schema using the canonical gateway `CronAddParamsSchema` and normalize before sending to the gateway. -3. **Align** provider enums and cron status fields across gateway schema, TS types, CLI descriptions, and UI form controls. -4. **Test** normalization in gateway tests and tool behavior in agent tests. +## What changed -## Multi-phase Execution Plan +- `cron.add` and `cron.update` now normalize common wrapper shapes and infer missing `kind` fields. +- Agent cron tool schema matches the gateway schema, which reduces invalid payloads. +- Provider enums are aligned across gateway, CLI, UI, and macOS picker. +- Control UI uses the gateway’s `jobs` count field for status. -### Phase 1 — Schema + type alignment -- [x] Expand gateway `CronPayloadSchema` provider enum to include `signal` and `imessage`. -- [x] Update CLI `--provider` descriptions to include `slack` (already supported by gateway). -- [x] Update UI Cron payload/provider union types to include all supported providers. -- [x] Fix UI CronStatus type to match gateway (`jobs` instead of `jobCount`). -- [x] Update cron UI provider select to include Discord/Slack/Signal/iMessage. 
-- [x] Update macOS CronJobEditor provider picker + enum to include Slack/Signal/iMessage. -- [x] Document cron compatibility normalization policy in [`docs/cron-jobs.md`](/automation/cron-jobs). +## Current behavior -### Phase 2 — Input normalization + tooling hardening -- [x] Add shared cron input normalization helpers (`normalizeCronJobCreate`/`normalizeCronJobPatch`). -- [x] Apply normalization in gateway `cron.add` (and patch normalization in `cron.update`). -- [x] Tighten agent cron tool schema to `CronAddParamsSchema` and normalize job/patch before sending. +- **Normalization:** wrapped `data`/`job` payloads are unwrapped; `schedule.kind` and `payload.kind` are inferred when safe. +- **Defaults:** safe defaults are applied for `wakeMode` and `sessionTarget` when missing. +- **Providers:** Discord/Slack/Signal/iMessage are now consistently surfaced across CLI/UI. -### Phase 3 — Tests -- [x] Add gateway test covering wrapped `cron.add` payload normalization. -- [x] Add cron tool test to assert normalization and defaulting for `cron.add`. -- [x] Add gateway test covering `cron.update` normalization. -- [x] Add UI + Swift conformance test for cron channels + status fields. +See [`docs/cron-jobs.md`](/automation/cron-jobs) for the normalized shape and examples. -### Phase 4 — Verification -- [x] Run tests (full suite executed via `pnpm test -- cron-tool`). +## Verification -## Rollout/Monitoring - Watch gateway logs for reduced `cron.add` INVALID_REQUEST errors. - Confirm Control UI cron status shows job count after refresh. -- If errors persist, extend normalization for additional common shapes (e.g., `schedule.at`, `payload.message` without `kind`). ## Optional Follow-ups -- Manual Control UI smoke: add cron job per provider + verify status job count. + +- Manual Control UI smoke: add a cron job per provider + verify status job count. ## Open Questions - Should `cron.add` accept explicit `state` from clients (currently disallowed by schema)? 
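The unwrap step under "Current behavior" can be sketched as peeling `data`/`job` wrappers off the incoming params. This TypeScript sketch is hypothetical (`unwrapCronAddParams` is not a Clawdbot function) and assumes a real job never has an object-valued top-level `data` or `job` field. It deliberately leaves out `kind` inference and the `wakeMode`/`sessionTarget` defaults, since their exact rules are not documented here.

```typescript
type JsonObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical: peel `data` / `job` wrappers until the job object is reached.
// Assumes a real job never has an object-valued top-level `data` or `job` field.
function unwrapCronAddParams(params: unknown): JsonObject {
  let current: unknown = params;
  while (isPlainObject(current)) {
    const { data, job } = current;
    const inner = isPlainObject(data) ? data : isPlainObject(job) ? job : undefined;
    if (inner === undefined) return current; // no wrapper left
    current = inner;
  }
  throw new Error("cron.add params must be an object");
}

// Nested wrappers unwrap to the same job object as an unwrapped payload.
const unwrapped = unwrapCronAddParams({ data: { job: { name: "nightly" } } });
console.log(unwrapped.name); // nightly
```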
diff --git a/docs/experiments/plans/group-policy-hardening.md b/docs/experiments/plans/group-policy-hardening.md index 113f55ea8..39a633902 100644 --- a/docs/experiments/plans/group-policy-hardening.md +++ b/docs/experiments/plans/group-policy-hardening.md @@ -1,126 +1,38 @@ --- -summary: "Spec: groupPolicy hardening for Telegram allowlist parity" +summary: "Telegram allowlist hardening: prefix + whitespace normalization" read_when: - - Reviewing historical Telegram allowlist normalization changes + - Reviewing historical Telegram allowlist changes --- -# Engineering Execution Spec: groupPolicy Hardening (Telegram Allowlist Parity) +# Telegram Allowlist Hardening **Date**: 2026-01-05 **Status**: Complete -**PR**: #216 (feat/whatsapp-group-policy) +**PR**: #216 ---- +## Summary -## Executive Summary +Telegram allowlists now accept `telegram:` and `tg:` prefixes case-insensitively, and tolerate +accidental whitespace. This aligns inbound allowlist checks with outbound send normalization. -Follow-up hardening work ensures Telegram allowlists behave consistently across inbound group/DM filtering and outbound send normalization. The focus is on prefix parity (`telegram:` / `tg:`), case-insensitive matching for prefixes, and resilience to accidental whitespace in config entries. Documentation and tests were updated to reflect and lock in this behavior. +## What changed ---- +- Prefixes `telegram:` and `tg:` are treated the same (case-insensitive). +- Allowlist entries are trimmed; empty entries are ignored. -## Findings Analysis +## Examples -### [MED] F1: Telegram Allowlist Prefix Handling Is Case-Sensitive and Excludes `tg:` +All of these are accepted for the same ID: -**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) +- `telegram:123456` +- `TG:123456` +- ` tg:123456 ` -**Problem**: Inbound allowlist normalization only stripped a lowercase `telegram:` prefix. 
This rejected `TG:123` / `Telegram:123` and did not accept the `tg:` shorthand even though outbound send normalization already accepts `tg:` and case-insensitive prefixes. +## Why it matters -**Impact**: -- DMs and group allowlists fail when users copy/paste prefixed IDs from logs or existing send format. -- Behavior is inconsistent between inbound filtering and outbound send normalization. +Copy/paste from logs or chat IDs often includes prefixes and whitespace. Normalizing avoids +false negatives when deciding whether to respond in DMs or groups. -**Fix**: Normalize allowlist entries by trimming whitespace and stripping `telegram:` / `tg:` prefixes case-insensitively at pre-compute time. +## Related docs ---- - -### [LOW] F2: Allowlist Entries Are Not Trimmed - -**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) - -**Problem**: Allowlist entries are not trimmed; accidental whitespace causes mismatches. - -**Fix**: Trim and drop empty entries while normalizing allowlist inputs. - ---- - -## Implementation Phases - -### Phase 1: Normalize Telegram Allowlist Inputs - -**File**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) - -**Changes**: -1. Trim allowlist entries and drop empty values. -2. Strip `telegram:` / `tg:` prefixes case-insensitively. -3. Simplify DM allowlist check to rely on normalized values. - ---- - -### Phase 2: Add Coverage for Prefix + Whitespace - -**File**: [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts) - -**Add Tests**: -- DM allowlist accepts `TG:` prefix with surrounding whitespace. -- Group allowlist accepts `TG:` prefix case-insensitively. - ---- - -### Phase 3: Documentation Updates - -**Files**: -- [`docs/groups.md`](/concepts/groups) -- [`docs/telegram.md`](/providers/telegram) - -**Changes**: -- Document `tg:` alias and case-insensitive prefixes for Telegram allowlists. 
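The Phase 1 normalization steps (trim, drop empty entries, strip `telegram:`/`tg:` case-insensitively) fit in a few lines. This is a minimal TypeScript sketch; `normalizeAllowEntry`, `normalizeAllowlist`, and `isAllowed` are hypothetical names, and comparing inbound ids as strings is an assumption.

```typescript
// Hypothetical helper: normalize one allowlist entry.
// 1) trim whitespace, 2) strip a leading `telegram:` or `tg:` prefix
// case-insensitively, 3) report empty results as undefined so they are dropped.
function normalizeAllowEntry(raw: string): string | undefined {
  const id = raw.trim().replace(/^(telegram|tg):/i, "");
  return id.length > 0 ? id : undefined;
}

// Normalize once at pre-compute time, not on every message.
function normalizeAllowlist(entries: string[]): Set<string> {
  const allowed = new Set<string>();
  for (const entry of entries) {
    const id = normalizeAllowEntry(entry);
    if (id !== undefined) allowed.add(id);
  }
  return allowed;
}

// Inbound chat ids may arrive as numbers, so compare them as strings.
function isAllowed(allowed: Set<string>, chatId: string | number): boolean {
  return allowed.has(String(chatId));
}

const allow = normalizeAllowlist(["telegram:123456", "TG:123456", " tg:123456 ", "  "]);
console.log(allow.size, isAllowed(allow, 123456)); // 1 true
```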
- ---- - -### Phase 4: Verification - -1. Run targeted Telegram tests (`pnpm test -- src/telegram/bot.test.ts`). -2. If time allows, run full suite (`pnpm test`). - ---- - -## Files Modified - -| File | Change Type | Description | -|------|-------------|-------------| -| [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) | Fix | Trim allowlist values; strip `telegram:` / `tg:` prefixes case-insensitively | -| [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts) | Test | Add DM + group allowlist coverage for `TG:` prefix + whitespace | -| [`docs/groups.md`](/concepts/groups) | Docs | Mention `tg:` alias + case-insensitive prefixes | -| [`docs/telegram.md`](/providers/telegram) | Docs | Mention `tg:` alias + case-insensitive prefixes | - ---- - -## Success Criteria - -- [x] Telegram allowlist accepts `telegram:` / `tg:` prefixes case-insensitively. -- [x] Telegram allowlist tolerates whitespace in config entries. -- [x] DM and group allowlist tests cover prefixed cases. -- [x] Docs updated to reflect allowlist formats. -- [x] Targeted tests pass. -- [x] Full test suite passes. 
- ---- - -## Risk Assessment - -| Risk | Severity | Mitigation | -|------|----------|------------| -| Behavior change for malformed entries | Low | Normalization is additive and trims only whitespace | -| Test fragility | Low | Isolated unit tests; no external dependencies | -| Doc drift | Low | Updated docs alongside code | - ---- - -## Estimated Complexity - -- **Phase 1**: Low (normalization helpers) -- **Phase 2**: Low (2 new tests) -- **Phase 3**: Low (doc edits) -- **Phase 4**: Low (verification) - -**Total**: ~20 minutes +- [Group Chats](/concepts/groups) +- [Telegram Provider](/providers/telegram) diff --git a/docs/experiments/proposals/model-config.md b/docs/experiments/proposals/model-config.md index b7488378d..3ea0bb17c 100644 --- a/docs/experiments/proposals/model-config.md +++ b/docs/experiments/proposals/model-config.md @@ -1,147 +1,32 @@ --- -summary: "Proposal: model config, auth profiles, and fallback behavior" +summary: "Exploration: model config, auth profiles, and fallback behavior" read_when: - - Designing model selection, auth profiles, or fallback behavior - - Migrating model config schema + - Exploring future model selection + auth profile ideas --- +# Model Config (Exploration) -# Model config proposal +This document captures **ideas** for future model configuration. It is not a +shipping spec. 
For current behavior, see: +- [Models](/concepts/models) +- [Model failover](/concepts/model-failover) +- [OAuth + profiles](/concepts/oauth) -Goals -- Multi OAuth + multi API key per provider -- Model selection via `/model` with sensible fallback -- Global (not per-session) fallback logic -- Keep last-known-good auth profile when switching models -- Profile override only when explicitly requested -- Image routing override only when explicitly configured +## Motivation -Non-goals (v1) -- Auto-discovery of provider capabilities beyond catalog input tags -- Per-model auth profile order (see open questions) +Operators want: +- Multiple auth profiles per provider (personal vs work). +- Simple `/model` selection with predictable fallbacks. +- Clear separation between text models and image-capable models. -## Proposed config shape +## Possible direction (high level) -```json -{ - "auth": { - "profiles": { - "anthropic:default": { - "provider": "anthropic", - "mode": "oauth" - }, - "anthropic:work": { - "provider": "anthropic", - "mode": "api_key" - }, - "openai:default": { - "provider": "openai", - "mode": "oauth" - } - }, - "order": { - "anthropic": ["anthropic:default", "anthropic:work"], - "openai": ["openai:default"] - } - }, - "agent": { - "models": { - "anthropic/claude-opus-4-5": { - "alias": "Opus" - }, - "openai/gpt-5.2": { - "alias": "gpt52" - } - }, - "model": { - "primary": "anthropic/claude-opus-4-5", - "fallbacks": ["openai/gpt-5.2"] - }, - "imageModel": { - "primary": "openai/gpt-5.2", - "fallbacks": ["anthropic/claude-opus-4-5"] - } - } -} -``` +- Keep model selection simple: `provider/model` with optional aliases. +- Let providers have multiple auth profiles, with an explicit order. +- Use a global fallback list so all sessions fail over consistently. +- Only override image routing when explicitly configured. -Notes -- Canonical model keys are full `provider/model`. -- `alias` optional; used by `/model` resolution. -- `auth.profiles` is keyed. 
Default CLI login creates `provider:default`. -- `auth.order[provider]` controls rotation order for that provider. +## Open questions -## CLI / UX - -Login -- `clawdbot login anthropic` → create/replace `anthropic:default`. -- `clawdbot login anthropic --profile work` → create/replace `anthropic:work`. - -Model selection -- `/model Opus` → resolve alias to full id. -- `/model anthropic/claude-opus-4-5` → explicit. -- Optional: `/model Opus@anthropic:work` (explicit profile override for session only). - -Model listing -- `/model` list shows: - - model id - - alias - - provider - - auth order (from `auth.order`) - - auth source for the current provider (auth-profiles.json/env/shell env/models.json) - -## Fallback behavior (global) - -Fallback list -- Use `agent.model.fallbacks` globally. -- No per-session fallback list; last-known-good is per-session but uses global ordering. - -Auth profile rotation -- If provider auth error (401/403/invalid refresh): - - advance to next profile in `auth.order[provider]`. - - if all fail, fall back to next model. - -Rate limiting -- If rate limit / quota error: - - rotate auth profile first (same provider) - - if still failing, fall back to next model. - -Model not found / capability mismatch -- immediate model fallback. - -## Image routing - -Rule -- Only use `agent.imageModel` when explicitly configured. -- If `agent.imageModel` is configured and the current text model lacks image input, use it. - -Support detection -- From model catalog `input` tags when available (e.g. `image` in models.json). -- If unknown: treat as text-only and use `agent.imageModel`. 
- -## Migration (doctor + gateway auto-run) - -Inputs -- Legacy keys (pre-migration): - - `agent.model` (string) - - `agent.modelFallbacks` (string[]) - - `agent.imageModel` (string) - - `agent.imageModelFallbacks` (string[]) - - `agent.allowedModels` (string[]) - - `agent.modelAliases` (record) - -Outputs -- `agent.models` map with keys for all referenced models -- `agent.model.primary/fallbacks` -- `agent.imageModel.primary/fallbacks` -- Auth profile store seeded from current auth-profiles.json/auth.json + oauth.json + env (as `provider:default`) -- `auth.order` seeded with `["provider:default"]` when config is updated - -Auto-run -- Gateway start detects legacy keys and runs doctor migration. - -## Decisions - -- Auth order is per-provider (`auth.order`). -- Doctor migration is required; gateway will auto-run on startup when legacy keys detected. -- `/model Opus@profile` is explicit session override only. -- Image routing override only when `agent.imageModel` is explicitly configured. +- Should profile rotation be per-provider or per-model? +- How should the UI surface profile selection for a session? +- What is the safest migration path from legacy config keys? 
diff --git a/docs/experiments/research/memory.md b/docs/experiments/research/memory.md index 7df735e53..56523f186 100644 --- a/docs/experiments/research/memory.md +++ b/docs/experiments/research/memory.md @@ -1,12 +1,12 @@ --- -summary: "Proposal + research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)" +summary: "Research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)" read_when: - Designing workspace memory (~/clawd) beyond daily Markdown logs - Deciding: standalone CLI vs deep Clawdbot integration - Adding offline recall + reflection (retain/recall/reflect) --- -# Workspace Memory v2 (offline): proposal + research +# Workspace Memory v2 (offline): research notes Target: Clawd-style workspace (`agent.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`). @@ -171,8 +171,7 @@ Recommendation: **deep integration in Clawdbot**, but keep a separable core libr - reuse from other contexts (local scripts, future desktop app, etc.) Shape: -- `src/memory/*` (library-ish core; pure functions + sqlite adapter) -- [`src/commands/memory/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/memory/*.ts) (CLI glue) +The memory tooling is intended to be a small CLI + library layer, but this is exploratory only. ## “S-Collide” / SuCo: when to use it (research) @@ -196,29 +195,13 @@ Open question: - what’s the **best** offline embedding model for “personal assistant memory” on your machines (MacBook + Castle)? - if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain. 
-## Implementation plan (phased, shippable) +## Smallest useful pilot -### Phase 0: workspace conventions (no code) -- add `bank/` files + entity pages -- add `## Retain` convention to daily logs +If you want a minimal, still-useful version: -### Phase 1: `clawdbot memory index|recall` (FTS-only) -- parse Markdown (`memory/*.md`, `bank/*.md`) into chunks -- write to SQLite: `facts`, `entities`, `fact_entities`, `opinions` -- FTS5 table over `facts.content` -- `recall` returns citations (path + line) + trimmed content budget - -### Phase 2: entity summaries + opinion tracking -- `reflect` updates `bank/entities/*.md` -- opinion confidence updates with evidence pointers (no embeddings required yet) - -### Phase 3: semantic recall (offline embeddings) -- compute embeddings during indexing (incremental) -- retrieval = `hybrid(FTS, vector)` with simple fusion - -### Phase 4: “graph-ish” traversal (still simple) -- entity links enable multi-hop: “related to Peter via warelay” -- optional: “topic” nodes, lightweight edges (not a full KG) +- Add `bank/` entity pages and a `## Retain` section in daily logs. +- Use SQLite FTS for recall with citations (path + line numbers). +- Add embeddings only if recall quality or scale demands it. ## References diff --git a/docs/gateway/bonjour.md b/docs/gateway/bonjour.md index bdafba807..a0b7a5ce5 100644 --- a/docs/gateway/bonjour.md +++ b/docs/gateway/bonjour.md @@ -6,24 +6,29 @@ read_when: --- # Bonjour / mDNS discovery -Clawdbot uses Bonjour (mDNS / DNS-SD) as a **LAN-only convenience** to discover a running Gateway bridge transport. It is best-effort and does **not** replace SSH or Tailnet-based connectivity. +Clawdbot uses Bonjour (mDNS / DNS‑SD) as a **LAN‑only convenience** to discover +an active Gateway bridge. It is best‑effort and does **not** replace SSH or +Tailnet-based connectivity. 
-## Wide-Area Bonjour (Unicast DNS-SD) over Tailscale +## Wide‑area Bonjour (Unicast DNS‑SD) over Tailscale -If you want iOS node auto-discovery while the Gateway is on another network (e.g. Vienna ⇄ London), you can keep the `NWBrowser` UX but switch discovery from multicast mDNS (`local.`) to **unicast DNS-SD** (“Wide-Area Bonjour”) over Tailscale. +If the node and gateway are on different networks, multicast mDNS won’t cross the +boundary. You can keep the same discovery UX by switching to **unicast DNS‑SD** +("Wide‑Area Bonjour") over Tailscale. -High level: +High‑level steps: -1) Run a DNS server on the gateway host (reachable via tailnet IP). -2) Publish DNS-SD records for `_clawdbot-bridge._tcp` in a dedicated zone (example: `clawdbot.internal.`). -3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that DNS server for clients (including iOS). +1) Run a DNS server on the gateway host (reachable over Tailnet). +2) Publish DNS‑SD records for `_clawdbot-bridge._tcp` under a dedicated zone + (example: `clawdbot.internal.`). +3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that + DNS server for clients (including iOS). -Clawdbot standardizes on the discovery domain `clawdbot.internal.` for this mode. iOS/Android nodes browse both `local.` and `clawdbot.internal.` automatically (no per-device knob). +Clawdbot standardizes on `clawdbot.internal.` for this mode. iOS/Android nodes +browse both `local.` and `clawdbot.internal.` automatically. 
### Gateway config (recommended) -On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbot/clawdbot.json` (JSON5): - ```json5 { bridge: { bind: "tailnet" }, // tailnet-only (recommended) @@ -31,21 +36,17 @@ On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbo } ``` -### One-time DNS server setup (gateway host) - -On the gateway host (macOS), run: +### One‑time DNS server setup (gateway host) ```bash clawdbot dns setup --apply ``` This installs CoreDNS and configures it to: -- listen on port 53 **only** on the gateway’s Tailscale interface IPs -- serve the zone `clawdbot.internal.` from the gateway-owned zone file `~/.clawdbot/dns/clawdbot.internal.db` +- listen on port 53 only on the gateway’s Tailscale interfaces +- serve `clawdbot.internal.` from `~/.clawdbot/dns/clawdbot.internal.db` -The Gateway writes/updates that zone file when `discovery.wideArea.enabled` is true. - -Validate from any tailnet-connected machine: +Validate from a tailnet‑connected machine: ```bash dns-sd -B _clawdbot-bridge._tcp clawdbot.internal. @@ -59,99 +60,102 @@ In the Tailscale admin console: - Add a nameserver pointing at the gateway’s tailnet IP (UDP/TCP 53). - Add split DNS so the domain `clawdbot.internal` uses that nameserver. -Once clients accept tailnet DNS, iOS nodes can browse `_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast. -Wide-area beacons also include `tailnetDns` (when available) so the macOS app can auto-fill SSH targets off-LAN. +Once clients accept tailnet DNS, iOS nodes can browse +`_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast. ### Bridge listener security (recommended) -The bridge port (default `18790`) is a plain TCP service. By default it binds to `0.0.0.0`, which makes it reachable from *any* interface on the gateway machine (LAN/Wi‑Fi/Tailscale). - -For a tailnet-only setup, bind it to the Tailscale IP instead: +The bridge port (default `18790`) is a plain TCP service. 
By default it binds to +`0.0.0.0`, which makes it reachable from any interface on the gateway host. +For tailnet‑only setups: - Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json`. -- Restart the Gateway (or restart the macOS menubar app via [`./scripts/restart-mac.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/restart-mac.sh) on that machine). - -This keeps the bridge reachable only from devices on your tailnet (while still listening on loopback for local/SSH port-forwards). +- Restart the Gateway (or restart the macOS menubar app). ## What advertises -Only the **Node Gateway** (`clawd` / `clawdbot gateway`) advertises Bonjour beacons. - -- Implementation: [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts) -- Gateway wiring: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) +Only the Gateway (when the **bridge is enabled**) advertises `_clawdbot-bridge._tcp`. ## Service types - `_clawdbot-bridge._tcp` — bridge transport beacon (used by macOS/iOS/Android nodes). 
-## TXT keys (non-secret hints) +## TXT keys (non‑secret hints) -The Gateway advertises small non-secret hints to make UI flows convenient: +The Gateway advertises small non‑secret hints to make UI flows convenient: - `role=gateway` +- `displayName=` - `lanHost=.local` -- `sshPort=` (defaults to 22 when not overridden) -- `gatewayPort=` (informational; the Gateway WS is typically loopback-only) +- `gatewayPort=` (informational; Gateway WS is usually loopback‑only) - `bridgePort=` (only when bridge is enabled) -- `canvasPort=` (only when the canvas host is enabled + reachable; default `18793`; serves `/__clawdbot__/canvas/`) -- `cliPath=` (optional; absolute path to a runnable `clawdbot` entrypoint or binary) -- `tailnetDns=` (optional hint; auto-detected from Tailscale when available; may be absent) +- `canvasPort=` (only when the canvas host is enabled; default `18793`) +- `sshPort=` (defaults to 22 when not overridden) +- `transport=bridge` +- `cliPath=` (optional; absolute path to a runnable `clawdbot` entrypoint) +- `tailnetDns=` (optional hint when Tailnet is available) ## Debugging on macOS -Useful built-in tools: +Useful built‑in tools: - Browse instances: - - `dns-sd -B _clawdbot-bridge._tcp local.` + ```bash + dns-sd -B _clawdbot-bridge._tcp local. + ``` - Resolve one instance (replace ``): - - `dns-sd -L "" _clawdbot-bridge._tcp local.` + ```bash + dns-sd -L "" _clawdbot-bridge._tcp local. + ``` -If browsing shows instances but resolving fails, you’re usually hitting a LAN policy / multicast issue. +If browsing works but resolving fails, you’re usually hitting a LAN policy or +mDNS resolver issue. ## Debugging in Gateway logs -The Gateway writes a rolling log file (printed on startup as `gateway log file: ...`). +The Gateway writes a rolling log file (printed on startup as +`gateway log file: ...`). 
Look for `bonjour:` lines, especially: -Look for `bonjour:` lines, especially: - -- `bonjour: advertise failed ...` (probing/announce failure) +- `bonjour: advertise failed ...` - `bonjour: ... name conflict resolved` / `hostname conflict resolved` -- `bonjour: watchdog detected non-announced service; attempting re-advertise ...` (self-heal attempt after sleep/interface churn) +- `bonjour: watchdog detected non-announced service ...` ## Debugging on iOS node -The iOS node app discovers bridges via `NWBrowser` browsing `_clawdbot-bridge._tcp`. +The iOS node uses `NWBrowser` to discover `_clawdbot-bridge._tcp`. -To capture what the browser is doing: +To capture logs: +- Settings → Bridge → Advanced → **Discovery Debug Logs** +- Settings → Bridge → Advanced → **Discovery Logs** → reproduce → **Copy** -- Settings → Bridge → Advanced → enable **Discovery Debug Logs** -- Settings → Bridge → Advanced → open **Discovery Logs** → reproduce the “Searching…” / “No bridges found” case → **Copy** - -The log includes browser state transitions (`ready`, `waiting`, `failed`, `cancelled`) and result-set changes (added/removed counts). +The log includes browser state transitions and result‑set changes. ## Common failure modes -- **Bonjour doesn’t cross networks**: London/Vienna style setups require Tailnet (MagicDNS/IP) or SSH. -- **Multicast blocked**: some Wi‑Fi networks (enterprise/hotels) disable mDNS; expect “no results”. -- **Sleep / interface churn**: macOS may temporarily drop mDNS results when switching networks; retry. -- **Browse works but resolve fails (iOS “NoSuchRecord”)**: make sure the advertiser publishes a valid SRV target hostname. - - Implementation detail: `@homebridge/ciao` defaults `hostname` to the *service instance name* when `hostname` is omitted. If your instance name contains spaces/parentheses, some resolvers can fail to resolve the implied A/AAAA record. 
- - Fix: set an explicit DNS-safe `hostname` (single label; no `.local`) in [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts). +- **Bonjour doesn’t cross networks**: use Tailnet or SSH. +- **Multicast blocked**: some Wi‑Fi networks disable mDNS. +- **Sleep / interface churn**: macOS may temporarily drop mDNS results; retry. +- **Browse works but resolve fails**: keep machine names simple (avoid emojis or + punctuation), then restart the Gateway. The bridge instance name derives from + the host name, so overly complex names can confuse some resolvers. -## Escaped instance names (`\\032`) -Bonjour/DNS-SD often escapes bytes in service instance names as decimal `\\DDD` sequences (e.g. spaces become `\\032`). +## Escaped instance names (`\032`) + +Bonjour/DNS‑SD often escapes bytes in service instance names as decimal `\DDD` +sequences (e.g. spaces become `\032`). - This is normal at the protocol level. -- UIs should decode for display (iOS uses `BonjourEscapes.decode` in `apps/shared/ClawdbotKit`). +- UIs should decode for display (iOS uses `BonjourEscapes.decode`). ## Disabling / configuration - `CLAWDBOT_DISABLE_BONJOUR=1` disables advertising. -- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and therefore the bridge beacon). -- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port (preferred). -- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as a back-compat override when `bridge.bind` / `bridge.port` are not set. -- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in `_clawdbot-bridge._tcp`. -- `CLAWDBOT_TAILNET_DNS` publishes a `tailnetDns` hint (MagicDNS) in `_clawdbot-bridge._tcp`. If unset, the gateway auto-detects Tailscale and publishes the MagicDNS name when possible. +- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and the bridge beacon). +- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port. 
+- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as back‑compat overrides. +- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in TXT. +- `CLAWDBOT_TAILNET_DNS` publishes a MagicDNS hint in TXT. +- `CLAWDBOT_CLI_PATH` overrides the advertised CLI path. ## Related docs diff --git a/docs/gateway/configuration.md b/docs/gateway/configuration.md index 8c488e006..3f37921a1 100644 --- a/docs/gateway/configuration.md +++ b/docs/gateway/configuration.md @@ -1383,8 +1383,8 @@ Notes: - `z.ai/*` and `z-ai/*` are accepted aliases and normalize to `zai/*`. - If `ZAI_API_KEY` is missing, requests to `zai/*` will fail with an auth error at runtime. - Example error: `No API key found for provider "zai".` -- Z.AI’s general API endpoint is `https://api.z.ai/api/paas/v4`. The GLM Coding - Plan uses the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`. +- Z.AI’s general API endpoint is `https://api.z.ai/api/paas/v4`. GLM coding + requests use the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`. The built-in `zai` provider uses the Coding endpoint. If you need the general endpoint, define a custom provider in `models.providers` with the base URL override (see the custom providers section above). diff --git a/docs/gateway/discovery.md b/docs/gateway/discovery.md index dafc84dad..80b771d41 100644 --- a/docs/gateway/discovery.md +++ b/docs/gateway/discovery.md @@ -44,7 +44,7 @@ Target direction: Troubleshooting and beacon details: [`docs/bonjour.md`](/gateway/bonjour). -#### Current implementation +#### Service beacon details - Service types: - `_clawdbot-bridge._tcp` (bridge transport beacon) @@ -98,15 +98,8 @@ The gateway is the source of truth for node/client admission. 
- scopes/ACLs (bridge is not a raw proxy to every gateway method) - rate limits -## Where the code lives (target architecture) +## Responsibilities by component -- Node gateway: - - advertises discovery beacons (Bonjour) - - owns pairing storage + decisions - - runs the bridge listener (direct transport) -- macOS app: - - UI for picking a gateway, showing pairing prompts, and troubleshooting - - SSH tunneling only for the fallback path -- iOS node: - - browses Bonjour (LAN) as a convenience only - - uses direct transport + pairing to connect to the gateway +- **Gateway**: advertises discovery beacons, owns pairing decisions, runs the bridge listener. +- **macOS app**: helps you pick a gateway, shows pairing prompts, and uses SSH only as a fallback. +- **iOS/Android nodes**: browse Bonjour as a convenience and connect via the paired bridge. diff --git a/docs/gateway/heartbeat.md b/docs/gateway/heartbeat.md index 830d7d48f..431c4848b 100644 --- a/docs/gateway/heartbeat.md +++ b/docs/gateway/heartbeat.md @@ -1,47 +1,33 @@ --- -summary: "Plan for heartbeat polling messages and notification rules" +summary: "Heartbeat polling messages and notification rules" read_when: - Adjusting heartbeat cadence or messaging --- # Heartbeat (Gateway) -Heartbeat runs periodic agent turns in the **main session** so the model can -surface anything that needs attention without spamming the user. +Heartbeat runs **periodic agent turns** in the main session so the model can +surface anything that needs attention without spamming you. ## Defaults -- Interval: `30m` (set `agent.heartbeat.every` to change, `0m` disables). + +- Interval: `30m` (set `agent.heartbeat.every`; use `0m` to disable). - Prompt body (configurable via `agent.heartbeat.prompt`): `Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.` -- Heartbeat prompt text is sent **verbatim** as the user message. Clawdbot does - not append extra body text. 
The system prompt includes a Heartbeats section - and the run is flagged as a heartbeat internally. +- The heartbeat prompt is sent **verbatim** as the user message. The system + prompt includes a “Heartbeat” section and the run is flagged internally. -## Prompt contract -- If nothing needs attention, the model should reply `HEARTBEAT_OK`. -- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears at - the **start or end** of the reply. Clawdbot strips the token and discards the - reply if the remaining content is **≤ `ackMaxChars`** (default: 30). -- If `HEARTBEAT_OK` is in the **middle** of a reply, it is not treated specially. -- For alerts, do **not** include `HEARTBEAT_OK`; return only the alert text. +## Response contract -## Prompt overrides -- Overriding `agent.heartbeat.prompt` **replaces** the default body. Nothing is - merged for you. -- If you still want `HEARTBEAT.md` instructions, keep a line like - `Read HEARTBEAT.md if exists` in your custom prompt. -- `HEARTBEAT_OK` handling stays the same; changing the prompt won’t break acks. +- If nothing needs attention, reply with **`HEARTBEAT_OK`**. +- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears + at the **start or end** of the reply. The token is stripped and the reply is + dropped if the remaining content is **≤ `ackMaxChars`** (default: 30). +- If `HEARTBEAT_OK` appears in the **middle** of a reply, it is not treated + specially. +- For alerts, **do not** include `HEARTBEAT_OK`; return only the alert text. -### Stray `HEARTBEAT_OK` outside heartbeats -If the model accidentally includes `HEARTBEAT_OK` at the start or end of a -normal (non-heartbeat) reply, Clawdbot strips the token and logs a verbose -message. If the reply is only `HEARTBEAT_OK`, it is dropped. 
- -### Outbound normalization (all providers) -For **all providers** (WhatsApp/Web, Telegram, Slack, Discord, Signal, iMessage), -Clawdbot applies the same filtering to tool summaries, streaming block replies, -and final replies: -- drop payloads that are only `HEARTBEAT_OK` with no media -- strip `HEARTBEAT_OK` at the edges when mixed with other text +Outside heartbeats, stray `HEARTBEAT_OK` at the start/end of a message is stripped +and logged; a message that is only `HEARTBEAT_OK` is dropped. ## Config @@ -51,8 +37,8 @@ and final replies: heartbeat: { every: "30m", // default: 30m (0m disables) model: "anthropic/claude-opus-4-5", - target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none - to: "+15551234567", // optional provider-specific override (e.g. E.164 or chat id) + target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none + to: "+15551234567", // optional provider-specific override prompt: "Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.", ackMaxChars: 30 // max chars allowed after HEARTBEAT_OK } @@ -60,47 +46,45 @@ and final replies: } ``` -### Fields -- `every`: heartbeat interval (duration string; default unit minutes). Default: - `30m`. Set to `0m` to disable. -- `model`: optional model override for heartbeat runs (`provider/model`). -- `target`: where heartbeat output is delivered. - - `last` (default): send to the last used external provider. - - `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`: force the provider (optionally set `to`). - - `none`: do not deliver externally; output stays in the session (WebChat-visible). -- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram). -- `prompt`: optional override for the heartbeat body (default shown above). Safe to - change; heartbeat acks are still keyed off `HEARTBEAT_OK`. 
-- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery (default: 30). +### Field notes -## Cost awareness -Heartbeats run full agent turns. Shorter intervals burn more tokens. Be -intentional about `every`, keep `HEARTBEAT.md` tiny, and consider a cheaper -`model` or `target: "none"` if you only want internal state updates. +- `every`: heartbeat interval (duration string; default unit = minutes). +- `model`: optional model override for heartbeat runs (`provider/model`). +- `target`: + - `last` (default): deliver to the last used external provider. + - explicit provider: `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`. + - `none`: run the heartbeat but **do not deliver** externally. +- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram, etc.). +- `prompt`: overrides the default prompt body (not merged). +- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery. + +## Delivery behavior + +- Heartbeats run in the **main session** (`main`, or `global` when scope is global). +- If the main queue is busy, the heartbeat is skipped and retried later. +- If `target` resolves to no external destination, the run still happens but no + outbound message is sent. +- Heartbeat-only replies do **not** keep the session alive; the last `updatedAt` + is restored so idle expiry behaves normally. ## HEARTBEAT.md (optional) + If a `HEARTBEAT.md` file exists in the workspace, the default prompt tells the agent to read it. Keep it tiny (short checklist or reminders) to avoid prompt bloat. -## Behavior -- Runs in the main session (`main`, or `global` when scope is global). -- Uses the main lane queue; if requests are in flight, the wake is retried. -- Empty output or `HEARTBEAT_OK` is treated as “ok” and does **not** keep the - session alive (`updatedAt` is restored). -- If `target` resolves to no external destination (no last route or `none`), the - heartbeat still runs but no outbound message is sent. 
+## Manual wake (on-demand) -## Ideas for use -- Check up on the user (light, respectful pings during daytime). -- Handle mundane tasks (triage inboxes, summarize queues, refresh notes). -- Nudge on open loops or reminders. -- Background monitoring (health checks, status polling, low-priority alerts). -- Scheduled routines (use [Cron jobs](/automation/cron-jobs) when you - need exact schedules or isolated runs). +You can enqueue a system event and trigger an immediate heartbeat with: -## Wake hook -- The gateway exposes a heartbeat wake hook so cron/jobs/webhooks can request an - immediate run (`requestHeartbeatNow`). -- `wake` endpoints should enqueue system events and optionally trigger a wake; the - heartbeat runner picks those up on the next tick or immediately. +```bash +clawdbot wake --text "Check for urgent follow-ups" --mode now +``` + +Use `--mode next-heartbeat` to wait for the next scheduled tick. + +## Cost awareness + +Heartbeats run full agent turns. Shorter intervals burn more tokens. Keep +`HEARTBEAT.md` small and consider a cheaper `model` or `target: "none"` if you +only want internal state updates. diff --git a/docs/gateway/index.md b/docs/gateway/index.md index 2b232303a..058e7631f 100644 --- a/docs/gateway/index.md +++ b/docs/gateway/index.md @@ -127,7 +127,9 @@ See also: [`docs/presence.md`](/concepts/presence) for how presence is produced/ ## Typing and validation - Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions. - Clients (TS/Swift) consume generated types (TS directly; Swift via the repo’s generator). 
-- Types live in [`src/gateway/protocol/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/*.ts); regenerate schemas/models with `pnpm protocol:gen` (writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json)) and `pnpm protocol:gen:swift` (writes [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift)). +- Protocol definitions are the source of truth; regenerate schema/models with: + - `pnpm protocol:gen` + - `pnpm protocol:gen:swift` ## Connection snapshot - `hello-ok` includes a `snapshot` with `presence`, `health`, `stateVersion`, and `uptimeMs` plus `policy {maxPayload,maxBufferedBytes,tickIntervalMs}` so clients can render immediately without extra requests. diff --git a/docs/gateway/logging.md b/docs/gateway/logging.md index f8b7555a3..2a8ed3478 100644 --- a/docs/gateway/logging.md +++ b/docs/gateway/logging.md @@ -10,12 +10,10 @@ read_when: Clawdbot has two log “surfaces”: - **Console output** (what you see in the terminal / Debug UI). -- **File logs** (JSON lines) written by the internal logger. +- **File logs** (JSON lines) written by the gateway logger. ## File-based logger -Clawdbot uses a file logger backed by `tslog` ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts)). - - Default rolling log file is under `/tmp/clawdbot/` (one file per day): `clawdbot-YYYY-MM-DD.log` - The log file path and level can be configured via `~/.clawdbot/clawdbot.json`: - `logging.file` @@ -40,9 +38,8 @@ clawdbot logs --follow ## Console capture -The CLI entrypoint enables console capture ([`src/index.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/index.ts) calls `enableConsoleCapture()`). -That means every `console.log/info/warn/error/debug/trace` is also written into the file logs, -while still behaving normally on stdout/stderr. 
+The CLI captures `console.log/info/warn/error/debug/trace` and writes them to file logs, +while still printing to stdout/stderr. You can tune console verbosity independently via: @@ -94,13 +91,8 @@ clawdbot gateway --verbose --ws-log full ## Console formatting (subsystem logging) -Clawdbot formats console logs via a small wrapper on top of the existing stack: - -- **tslog** for structured file logs ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts)) -- **chalk** for colors ([`src/globals.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/globals.ts)) - The console formatter is **TTY-aware** and prints consistent, prefixed lines. -Subsystem loggers are created via `createSubsystemLogger("gateway")`. +Subsystem loggers keep output grouped and scannable. Behavior: diff --git a/docs/gateway/pairing.md b/docs/gateway/pairing.md index 9afa96397..8acfc4846 100644 --- a/docs/gateway/pairing.md +++ b/docs/gateway/pairing.md @@ -7,103 +7,83 @@ read_when: --- # Gateway-owned pairing (Option B) -Goal: The Gateway (`clawd`) is the **source of truth** for which nodes are allowed to join the network. - -This enables: -- Headless approval via terminal/CLI (no Swift UI required). -- Optional macOS UI approval (Swift app is just a frontend). -- One consistent membership store for iOS, mac nodes, future hardware nodes. +In Gateway-owned pairing, the **Gateway** is the source of truth for which nodes +are allowed to join. UIs (macOS app, future clients) are just frontends that +approve or reject pending requests. ## Concepts -- **Pending request**: a node asked to join; requires explicit approve/reject. -- **Paired node**: node is allowed; gateway returns an auth token for subsequent connects. -- **Bridge**: direct transport endpoint owned by the gateway. The bridge does not decide membership. + +- **Pending request**: a node asked to join; requires approval. +- **Paired node**: approved node with an issued auth token. 
+- **Bridge**: transport endpoint only; it forwards requests but does not decide + membership. + +## How pairing works + +1. A node connects to the bridge and requests pairing. +2. The Gateway stores a **pending request** and emits `node.pair.requested`. +3. You approve or reject the request (CLI or UI). +4. On approval, the Gateway issues a **new token** (tokens are rotated on re‑pair). +5. The node reconnects using the token and is now “paired”. + +Pending requests expire automatically after **5 minutes**. + +## CLI workflow (headless friendly) + +```bash +clawdbot nodes pending +clawdbot nodes approve +clawdbot nodes reject +clawdbot nodes status +clawdbot nodes rename --node --name "Living Room iPad" +``` + +`nodes status` shows paired/connected nodes and their capabilities. ## API surface (gateway protocol) -These are conceptual method names; wire them into [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) and regenerate Swift types. -### Events -- `node.pair.requested` - - Emitted whenever a new pending pairing request is created. - - Payload: - - `requestId` (string) - - `nodeId` (string) - - `displayName?` (string) - - `platform?` (string) - - `version?` (string) - - `remoteIp?` (string) - - `silent?` (boolean) — hint that the UI may attempt auto-approval - - `ts` (ms since epoch) -- `node.pair.resolved` - - Emitted when a pending request is approved/rejected. - - Payload: - - `requestId` (string) - - `nodeId` (string) - - `decision` ("approved" | "rejected" | "expired") - - `ts` (ms since epoch) +Events: +- `node.pair.requested` — emitted when a new pending request is created. +- `node.pair.resolved` — emitted when a request is approved/rejected/expired. -### Methods -- `node.pair.request` - - Creates (or returns) a pending request. - - Params: node metadata (same shape as `node.pair.requested` payload, minus `requestId`/`ts`). 
-  - Optional `silent` flag hints that the UI can attempt an SSH auto-approve before showing an alert.
-  - Result:
-    - `status` ("pending")
-    - `created` (boolean) — whether this call created the pending request
-    - `request` (pending request object), including `isRepair` when the node was already paired
-  - Security: **never returns an existing token**. If a paired node “lost” its token, it must be approved again (token rotation).
-- `node.pair.list`
-  - Returns:
-    - `pending[]` (pending requests)
-    - `paired[]` (paired node records)
-- `node.pair.approve`
-  - Params: `{ requestId }`
-  - Result: `{ requestId, node: { nodeId, token, ... } }`
-  - Must be idempotent (first decision wins).
-- `node.pair.reject`
-  - Params: `{ requestId }`
-  - Result: `{ requestId, nodeId }`
-- `node.pair.verify`
-  - Params: `{ nodeId, token }`
-  - Result: `{ ok: boolean, node?: { nodeId, ... } }`
-
-## CLI flows
-CLI must be able to fully operate without any GUI:
-- `clawdbot nodes pending`
-- `clawdbot nodes approve `
-- `clawdbot nodes reject `
-- `clawdbot nodes status` (paired nodes + connection status/capabilities)
-
-Optional interactive helper:
-- `clawdbot nodes watch` (subscribe to `node.pair.requested` and prompt in-place)
-
-Implementation pointers:
-- CLI commands: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
-- Gateway handlers + events: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) + [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts)
-- Pairing store: [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts) (under `~/.clawdbot/nodes/`)
-- Optional macOS UI prompt (frontend only): [`apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift)
-  - Push-first: listens to `node.pair.requested`/`node.pair.resolved`, does a `node.pair.list` on startup/reconnect,
-    and only runs a slow safety poll while a request is pending/visible.
-
-## Storage (private, local)
-Gateway stores the authoritative state under `~/.clawdbot/`:
-- `~/.clawdbot/nodes/paired.json`
-- `~/.clawdbot/nodes/pending.json` (or `~/.clawdbot/nodes/pending/*.json`)
+Methods:
+- `node.pair.request` — create or reuse a pending request.
+- `node.pair.list` — list pending + paired nodes.
+- `node.pair.approve` — approve a pending request (issues token).
+- `node.pair.reject` — reject a pending request.
+- `node.pair.verify` — verify `{ nodeId, token }`.

 Notes:
-- Tokens are secrets. Treat `paired.json` as sensitive.
-- Pending entries should have a TTL (e.g. 5 minutes) and expire automatically.
+- `node.pair.request` is idempotent per node: repeated calls return the same
+  pending request.
+- Approval **always** generates a fresh token; no token is ever returned from
+  `node.pair.request`.
+- Requests may include `silent: true` as a hint for auto-approval flows.

-## Bridge integration
-Target direction:
-- The gateway runs the bridge listener (LAN/tailnet-facing) and advertises discovery beacons (Bonjour).
-- The bridge is transport only; it forwards/scopes requests and enforces ACLs, but pairing decisions are made by the gateway.
+## Auto-approval (macOS app)

-The macOS UI (Swift) can:
-- Subscribe to `node.pair.requested`, show an alert (including `remoteIp`), and call `node.pair.approve` or `node.pair.reject`.
-- Or ignore/dismiss (“Later”) and let CLI handle it.
-- When `silent` is set, it can try a short SSH probe (same user) and auto-approve if reachable; otherwise fall back to the normal alert.
+The macOS app can optionally attempt a **silent approval** when:
+- the request is marked `silent`, and
+- the app can verify an SSH connection to the gateway host using the same user.

-## Implementation note
-If the bridge is only provided by the macOS app, then “no Swift app running” cannot work end-to-end.
-The long-term goal is to move bridge hosting + Bonjour advertising into the Node gateway so headless pairing works by default.
+If silent approval fails, it falls back to the normal “Approve/Reject” prompt.
+
+## Storage (local, private)
+
+Pairing state is stored under the Gateway state directory (default `~/.clawdbot`):
+
+- `~/.clawdbot/nodes/paired.json`
+- `~/.clawdbot/nodes/pending.json`
+
+If you override `CLAWDBOT_STATE_DIR`, the `nodes/` folder moves with it.
+
+Security notes:
+- Tokens are secrets; treat `paired.json` as sensitive.
+- Rotating a token requires re-approval (or deleting the node entry).
+
+## Bridge behavior
+
+- The bridge is **transport only**; it does not store membership.
+- If the Gateway is offline or pairing is disabled, nodes cannot pair.
+- If the bridge is running but the Gateway is in remote mode, pairing still
+  happens against the remote Gateway’s store.
diff --git a/docs/nodes/index.md b/docs/nodes/index.md
index f27483d1d..6cff0dfb7 100644
--- a/docs/nodes/index.md
+++ b/docs/nodes/index.md
@@ -140,11 +140,3 @@ Nodes may include a `permissions` map in `node.list` / `node.describe`, keyed by

 - The macOS menubar app connects to the Gateway bridge as a node (so `clawdbot nodes …` works against this Mac).
 - In remote mode, the app opens an SSH tunnel for the bridge port and connects to `localhost`.
-
-## Where to look in code
-
-- CLI wiring: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
-- Canvas snapshot decoding/temp paths: [`src/cli/nodes-canvas.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-canvas.ts)
-- Duration parsing for CLI: [`src/cli/parse-duration.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/parse-duration.ts)
-- iOS node commands: [`apps/ios/Sources/Model/NodeAppModel.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/ios/Sources/Model/NodeAppModel.swift)
-- Android node commands: `apps/android/app/src/main/java/com/clawdbot/android/node/*`
diff --git a/docs/nodes/location-command.md b/docs/nodes/location-command.md
index af9d893c5..79ba38841 100644
--- a/docs/nodes/location-command.md
+++ b/docs/nodes/location-command.md
@@ -76,7 +76,7 @@ Goal: model can request location even when node is backgrounded, but only when:

 Push-triggered flow (future):
 1) Gateway sends a push to the node (silent push or FCM data).
-2) Node wakes briefly and calls `location.get` internally.
+2) Node wakes briefly and requests location from the device.
 3) Node forwards payload to Gateway.

 Notes:
diff --git a/docs/platforms/ios.md b/docs/platforms/ios.md
index 939d5c044..3bb9b5a2d 100644
--- a/docs/platforms/ios.md
+++ b/docs/platforms/ios.md
@@ -1,381 +1,105 @@
 ---
-summary: "iOS app (node): architecture + connection runbook"
+summary: "iOS node app: connect to the Gateway, pairing, canvas, and troubleshooting"
 read_when:
   - Pairing or reconnecting the iOS node
-  - Debugging iOS bridge discovery or auth
-  - Sending screen/canvas commands to iOS
-  - Designing iOS node + gateway integration
-  - Extending the Gateway protocol for node/canvas commands
-  - Implementing Bonjour pairing or transport security
+  - Running the iOS app from source
+  - Debugging bridge discovery or canvas commands
 ---
 # iOS App (Node)

-Status: prototype implemented (internal) · Date: 2025-12-13
+Availability: internal preview. The iOS app is not publicly distributed yet.

-## Support snapshot
-- Role: companion node app (iOS does not host the Gateway).
-- Gateway required: yes (run it on macOS, Linux, or Windows via WSL2).
-- Install: [Getting Started](/start/getting-started) + [Pairing](/gateway/pairing).
-- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
+## What it does

-## System control
-System control (launchd/systemd) lives on the Gateway host. See [Gateway](/gateway).
+- Connects to a Gateway over the bridge (LAN or tailnet).
+- Exposes node capabilities: Canvas, Screen snapshot, Camera capture, Location, Talk mode, Voice wake.
+- Receives `node.invoke` commands and reports node status events.

-## Connection Runbook
+## Requirements

-This is the practical “how do I connect the iOS node” guide:
+- Gateway running on another device (macOS, Linux, or Windows via WSL2).
+- Bridge enabled (default).
+- Network path:
+  - Same LAN via Bonjour, **or**
+  - Tailnet via unicast DNS-SD (`clawdbot.internal.`), **or**
+  - Manual host/port (fallback).

-**iOS app** ⇄ (Bonjour + TCP bridge) ⇄ **Gateway bridge** ⇄ (loopback WS) ⇄ **Gateway**
+## Quick start (pair + connect)

-The Gateway WebSocket stays loopback-only (`ws://127.0.0.1:18789`). The iOS node talks to the LAN-facing **bridge** (default `tcp://0.0.0.0:18790`) and uses Gateway-owned pairing.
-
-### Prerequisites
-
-- You can run the Gateway on the “master” machine.
-- iOS node app can reach the gateway bridge:
-  - Same LAN with Bonjour/mDNS, **or**
-  - Same Tailscale tailnet using Wide-Area Bonjour / unicast DNS-SD (see below), **or**
-  - Manual bridge host/port (fallback)
-- You can run the CLI (`clawdbot`) on the gateway machine (or via SSH).
-
-### 1) Start the Gateway (with bridge enabled)
-
-Bridge is enabled by default (disable via `CLAWDBOT_BRIDGE_ENABLED=0`).
+1) Start the Gateway (bridge enabled by default):

 ```bash
-clawdbot gateway --port 18789 --verbose
+clawdbot gateway --port 18789
 ```

-Confirm in logs you see something like:
-- `bridge listening on tcp://0.0.0.0:18790 (node)`
+2) In the iOS app, open Settings and pick a discovered gateway (or enable Manual Bridge and enter host/port).

-For tailnet-only setups (recommended for Vienna ⇄ London), bind the bridge to the gateway machine’s Tailscale IP instead:
-
-- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json` on the gateway host.
-- Restart the Gateway / macOS menubar app.
-
-### 2) Verify Bonjour discovery (optional but recommended)
-
-From the gateway machine:
-
-```bash
-dns-sd -B _clawdbot-bridge._tcp local.
-```
-
-You should see your gateway advertising `_clawdbot-bridge._tcp`.
-
-If browse works, but the iOS node can’t connect, try resolving one instance:
-
-```bash
-dns-sd -L "" _clawdbot-bridge._tcp local.
-```
-
-More debugging notes: [`docs/bonjour.md`](/gateway/bonjour).
-
-#### Tailnet (Vienna ⇄ London) discovery via unicast DNS-SD
-
-If the iOS node and the gateway are on different networks but connected via Tailscale, multicast mDNS won’t cross the boundary. Use Wide-Area Bonjour / unicast DNS-SD instead:
-
-1) Set up a DNS-SD zone (example `clawdbot.internal.`) on the gateway host and publish `_clawdbot-bridge._tcp` records.
-2) Configure Tailscale split DNS for `clawdbot.internal` pointing at that DNS server.
-
-Details and example CoreDNS config: [`docs/bonjour.md`](/gateway/bonjour).
-
-### 3) Connect from the iOS node app
-
-In the iOS node app:
-- Pick the discovered bridge (or hit refresh).
-- If not paired yet, it will initiate pairing automatically.
-- After the first successful pairing, it will auto-reconnect **strictly to the last discovered gateway** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
-
-#### Connection indicator (always visible)
-
-The Settings tab icon shows a small status dot:
-- **Green**: connected to the bridge
-- **Yellow**: connecting (subtle pulse)
-- **Red**: not connected / error
-
-### 4) Approve pairing (CLI)
-
-On the gateway machine:
+3) Approve the pairing request on the gateway host:

 ```bash
 clawdbot nodes pending
-```
-
-Approve the request:
-
-```bash
 clawdbot nodes approve
 ```

-After approval, the iOS node receives/stores the token and reconnects authenticated.
-
-Pairing details: [`docs/gateway/pairing.md`](/gateway/pairing).
-
-### 5) Verify the node is connected
-
-- In the macOS app: **Instances** tab should show something like `iOS Node (...)` with a green “Active” presence dot shortly after connect.
-- Via nodes status (paired + connected):
-  ```bash
-  clawdbot nodes status
-  ```
-- Via Gateway (paired + connected):
-  ```bash
-  clawdbot gateway call node.list --params "{}"
-  ```
-- Via Gateway presence (legacy-ish, still useful):
-  ```bash
-  clawdbot gateway call system-presence --params "{}"
-  ```
-  Look for the node `instanceId` (often a UUID).
-
-### 6) Drive the iOS Canvas (draw / snapshot)
-
-The iOS node runs a WKWebView “Canvas” scaffold which exposes:
-- `window.__clawdbot.canvas`
-- `window.__clawdbot.ctx` (2D context)
-- `window.__clawdbot.setStatus(title, subtitle)`
-
-#### Gateway Canvas Host (recommended for web content)
-
-If you want the node to show real HTML/CSS/JS that the agent can edit on disk, point it at the Gateway canvas host.
-
-Note: nodes always use the standalone canvas host on `canvasHost.port` (default `18793`), bound to the bridge interface.
-
-1) Create `~/clawd/canvas/index.html` on the gateway host.
-
-2) Navigate the node to it (LAN):
+4) Verify connection:

 ```bash
-clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://.local:18793/__clawdbot__/canvas/"}'
+clawdbot nodes status
+clawdbot gateway call node.list --params "{}"
+```
+
+## Discovery paths
+
+### Bonjour (LAN)
+
+The Gateway advertises `_clawdbot-bridge._tcp` on `local.`. The iOS app lists these automatically.
+
+### Tailnet (cross-network)
+
+If mDNS is blocked, use a unicast DNS-SD zone (recommended domain: `clawdbot.internal.`) and Tailscale split DNS.
+See [`docs/bonjour.md`](/gateway/bonjour) for the CoreDNS example.
+
+### Manual host/port
+
+In Settings, enable **Manual Bridge** and enter the gateway host + port (default `18790`).
+
+## Canvas + A2UI
+
+The iOS node renders a WKWebView canvas. Use `node.invoke` to drive it:
+
+```bash
+clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://:18793/__clawdbot__/canvas/"}'
 ```

 Notes:
-- The server injects a live-reload client into HTML and reloads on file changes.
-- A2UI is hosted on the same canvas host at `http://:18793/__clawdbot__/a2ui/`.
-- Tailnet (optional): if both devices are on Tailscale, use a MagicDNS name or tailnet IP instead of `.local`, e.g. `http://:18793/__clawdbot__/canvas/`.
-- iOS may require App Transport Security allowances to load plain `http://` URLs; if it fails to load, prefer HTTPS or adjust the iOS app’s ATS config.
+- The Gateway canvas host serves `/__clawdbot__/canvas/` and `/__clawdbot__/a2ui/`.
+- The iOS node auto-navigates to A2UI on connect when a canvas host URL is advertised.
+- Return to the built-in scaffold with `canvas.navigate` and `{"url":""}`.

-#### Draw with `canvas.eval`
+### Canvas eval / snapshot

 ```bash
-clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params "$(cat <<'JSON'
-{"javaScript":"(() => { const {ctx,setStatus} = window.__clawdbot; setStatus('Drawing','…'); ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle='#ff2d55'; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); setStatus(null,null); return 'ok'; })()"}
-JSON
-)"
+clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
 ```

-#### Snapshot with `canvas.snapshot`
-
 ```bash
-clawdbot nodes invoke --node 192.168.0.88 --command canvas.snapshot --params '{"maxWidth":900}'
+clawdbot nodes invoke --node "iOS Node" --command canvas.snapshot --params '{"maxWidth":900,"format":"jpeg"}'
 ```

-The response includes `{ format, base64 }` image data (default `format="jpeg"`; pass `{"format":"png"}` when you specifically need lossless PNG).
+## Voice wake + talk mode

-### Common gotchas
+- Voice wake and talk mode are available in Settings.
+- iOS may suspend background audio; treat voice features as best-effort when the app is not active.

-- **iOS in background:** all `canvas.*` commands fail fast with `NODE_BACKGROUND_UNAVAILABLE` (bring the iOS node app to foreground).
-- **Return to default scaffold:** `canvas.navigate` with `{"url":""}` or `{"url":"/"}` returns to the built-in scaffold page.
-- **mDNS blocked:** some networks block multicast; use a different LAN or plan a tailnet-capable bridge (see [`docs/discovery.md`](/gateway/discovery)).
-- **Wrong node selector:** `--node` can be the node id (UUID), display name (e.g. `iOS Node`), IP, or an unambiguous prefix. If it’s ambiguous, the CLI will tell you.
-- **Stale pairing / Keychain cleared:** if the pairing token is missing (or iOS Keychain was wiped), the node must pair again; approve a new pending request.
-- **App reinstall but no reconnect:** the node restores `instanceId` + last bridge preference from Keychain; if it still comes up “unpaired”, verify Keychain persistence on your device/simulator and re-pair once.
+## Common errors

-## Design + Architecture
-
-### Goals
-- Build an **iOS app** that acts as a **remote node** for Clawdbot:
-  - **Voice trigger** (wake-word / always-listening intent) that forwards transcripts to the Gateway `agent` method.
-  - **Canvas** surface that the agent can control: navigate, draw/render, evaluate JS, snapshot.
-- **Dead-simple setup**:
-  - Auto-discover the host on the local network via **Bonjour**.
-  - One-tap pairing with an approval prompt on the Mac.
-  - iOS is **never** a local gateway; it is always a remote node.
-- Operational clarity:
-  - When iOS is backgrounded, voice may still run; **canvas commands must fail fast** with a structured error.
-  - Provide **settings**: node display name, enable/disable voice wake, pairing status.
-
-Non-goals (v1):
-- Exposing the Node Gateway directly on the LAN.
-- Supporting arbitrary third-party “plugins” on iOS.
-- Perfect App Store compliance; this is **internal-only** initially.
-
-### Current repo reality (constraints we respect)
-- The Gateway WebSocket server binds to `127.0.0.1:18789` ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)) with an optional `CLAWDBOT_GATEWAY_TOKEN`.
-- The Gateway exposes a Canvas file server (`canvasHost`) on `canvasHost.port` (default `18793`), so nodes can `canvas.navigate` to `http://:18793/__clawdbot__/canvas/` and auto-reload on file changes ([`docs/configuration.md`](/gateway/configuration)).
-- macOS “Canvas” is controlled via the Gateway node protocol (`canvas.*`), matching iOS/Android ([`docs/mac/canvas.md`](/platforms/mac/canvas)).
-- Voice wake forwards via `GatewayChannel` to Gateway `agent` (mac app: `VoiceWakeForwarder` → `GatewayConnection.sendAgent`).
-
-### Recommended topology (B): Gateway-owned Bridge + loopback Gateway
-Keep the Node gateway loopback-only; expose a dedicated **gateway-owned bridge** to the LAN/tailnet.
-
-**iOS App** ⇄ (TLS + pairing) ⇄ **Bridge (in gateway)** ⇄ (loopback) ⇄ **Gateway WS** (`ws://127.0.0.1:18789`)
-
-Why:
-- Preserves current threat model: Gateway remains local-only.
-- Centralizes auth, rate limiting, and allowlisting in the bridge.
-- Lets us unify “canvas node” semantics across mac + iOS without exposing raw gateway methods.
-
-### Security plan (internal, but still robust)
-#### Transport
-- **Current (v0):** bridge is a LAN-facing **TCP** listener with token-based auth after pairing.
-- **Next:** wrap the bridge in **TLS** and prefer key-pinned or mTLS-like auth after pairing.
-
-#### Pairing
-- Bonjour discovery shows a candidate “Clawdbot Bridge” on the LAN.
-- First connection:
-  1) iOS generates a keypair (Secure Enclave if available).
-  2) iOS connects to the bridge and requests pairing.
-  3) The bridge forwards the pairing request to the **Gateway** as a *pending request*.
-  4) Approval can happen via:
-     - **macOS UI** (Clawdbot shows an alert with Approve/Reject/Later, including the node IP), or
-     - **Terminal/CLI** (headless flows).
-  5) Once approved, the bridge returns a token to iOS; iOS stores it in Keychain.
-- Subsequent connections:
-  - The bridge requires the paired identity. Unpaired clients get a structured “not paired” error and no access.
-
-##### Gateway-owned pairing (Option B details)
-Pairing decisions must be owned by the Gateway (`clawd` / Node) so nodes can be approved without the macOS app running.
-
-Key idea:
-- The Swift app may still show an alert, but it is only a **frontend** for pending requests stored in the Gateway.
-
-Desired behavior:
-- If the Swift UI is present: show alert with Approve/Reject/Later.
-- If the Swift UI is not present: `clawdbot` CLI can list pending requests and approve/reject.
-
-See [`docs/gateway/pairing.md`](/gateway/pairing) for the API/events and storage.
-
-CLI (headless approvals):
-- `clawdbot nodes pending`
-- `clawdbot nodes approve `
-- `clawdbot nodes reject `
-
-#### Authorization / scope control (bridge-side ACL)
-The bridge must not be a raw proxy to every gateway method.
-
-- Allow by default:
-  - `agent` (with guardrails; idempotency required)
-  - minimal `system-event` beacons (presence updates for the node)
-  - node/canvas methods defined below (new protocol surface)
-- Deny by default:
-  - anything that widens control without explicit intent (future “shell”, “files”, etc.)
-- Rate limit:
-  - handshake attempts
-  - voice forwards per minute
-  - snapshot frequency / payload size
-
-### Protocol unification: add “node/canvas” to Gateway protocol
-#### Principle
-Unify mac Canvas + iOS Canvas under a single conceptual surface:
-- The agent talks to the Gateway using a stable method set (typed protocol).
-- The Gateway routes node-targeted requests to:
-  - local mac Canvas implementation, or
-  - remote iOS node via the bridge
-
-#### Minimal protocol additions (v1)
-Add to [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) (and regenerate Swift models):
-
-**Identity**
-- Node identity comes from `connect.params.client.instanceId` (stable), and `connect.params.client.mode = "node"` (or `"ios-node"`).
-
-**Methods**
-- `node.list` → list paired/connected nodes + capabilities
-- `node.describe` → describe a node (capabilities + supported `node.invoke` commands)
-- `node.invoke` → send a command to a specific node
-  - Params: `{ nodeId, command, params?, timeoutMs? }`
-
-**Events**
-- `node.event` → async node status/errors
-  - e.g. background/foreground transitions, voice availability, canvas availability
-
-#### Node command set (canvas)
-These are values for `node.invoke.command`:
-- `canvas.present` / `canvas.hide`
-- `canvas.navigate` with `{ url }` (loads a URL; use `""` or `"/"` to return to the default scaffold)
-- `canvas.eval` with `{ javaScript }`
-- `canvas.snapshot` with `{ maxWidth?, quality?, format? }`
-- A2UI (mobile + macOS canvas):
-  - `canvas.a2ui.push` with `{ messages: [...] }` (A2UI v0.8 server→client messages)
-  - `canvas.a2ui.pushJSONL` with `{ jsonl: "..." }` (legacy alias)
-  - `canvas.a2ui.reset`
-  - A2UI is hosted by the Gateway canvas host (`/__clawdbot__/a2ui/`) on `canvasHost.port`. Commands fail if the host is unreachable.
-
-Result pattern:
-- Request is a standard `req/res` with `ok` / `error`.
-- Long operations (loads, streaming drawing, etc.) may also emit `node.event` progress.
-
-##### Current (implemented)
-As of 2025-12-13, the Gateway supports `node.invoke` for bridge-connected nodes.
-
-Example: draw a diagonal line on the iOS Canvas:
-```bash
-clawdbot nodes invoke --node ios-node --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
-```
-
-### Background behavior requirement
-When iOS is backgrounded:
-- Voice may still be active (subject to iOS suspension).
-- **All `canvas.*` commands must fail** with a stable error code, e.g.:
-  - `NODE_BACKGROUND_UNAVAILABLE`
-  - Include `retryable: true` and `retryAfterMs` if we want the agent to wait.
-
-## iOS app architecture (SwiftUI)
-### App structure
-- Single fullscreen Canvas surface (WKWebView).
-- One settings entry point: a **gear button** that opens a settings sheet.
-- All navigation is **agent-driven** (no local URL bar).
-
-### Components
-- `BridgeDiscovery`: Bonjour browse + resolve (Network.framework `NWBrowser`)
-- `BridgeConnection`: TCP session + pairing handshake + reconnect (TLS planned)
-- `NodeRuntime`:
-  - Voice pipeline (wake-word + capture + forward)
-  - Canvas pipeline (WKWebView controller + snapshot + eval)
-  - Background state tracking; enforces “canvas unavailable in background”
-
-### Voice in background (internal)
-- Enable background audio mode (and required session configuration) so the mic pipeline can keep running when the user switches apps.
-- If iOS suspends the app anyway, surface a clear node status (`node.event`) so operators can see voice is unavailable.
-
-## Code sharing (macOS + iOS)
-Create/expand SwiftPM targets so both apps share:
-- `ClawdbotProtocol` (generated models; platform-neutral)
-- `ClawdbotGatewayClient` (shared WS framing + connect/req/res + seq-gap handling)
-- `ClawdbotKit` (node/canvas command types + deep links + shared utilities)
-
-macOS continues to own:
-- local Canvas implementation details (custom scheme handler serving on-disk HTML, window/panel presentation)
-
-iOS owns:
-- iOS-specific audio/speech + WKWebView presentation and lifecycle
-
-## Repo layout
-- iOS app: `apps/ios/` (XcodeGen `project.yml`)
-- Shared Swift packages: `apps/shared/`
-- Lint/format: iOS target runs `swiftformat --lint` + `swiftlint lint` using repo configs (`.swiftformat`, `.swiftlint.yml`).
-
-Generate the Xcode project:
-```bash
-cd apps/ios
-xcodegen generate
-open Clawdbot.xcodeproj
-```
-
-## Storage plan (private by default)
-### iOS
-- Canvas/workspace files (persistent, private):
-  - `Application Support/Clawdbot/canvas//...`
-- Snapshots / temp exports (evictable):
-  - `Library/Caches/Clawdbot/canvas-snapshots//...`
-- Credentials:
-  - Keychain (paired identity + bridge trust anchor)
+- `NODE_BACKGROUND_UNAVAILABLE`: bring the iOS app to the foreground (canvas/camera/screen commands require it).
+- `A2UI_HOST_NOT_CONFIGURED`: the Gateway did not advertise a canvas host URL; check `canvasHost` in [`docs/configuration.md`](/gateway/configuration).
+- Pairing prompt never appears: run `clawdbot nodes pending` and approve manually.
+- Reconnect fails after reinstall: the Keychain pairing token was cleared; re-pair the node.

 ## Related docs
-- [`docs/gateway.md`](/gateway) (gateway runbook)
-- [`docs/gateway/pairing.md`](/gateway/pairing) (approval + storage)
-- [`docs/bonjour.md`](/gateway/bonjour) (discovery debugging)
-- [`docs/discovery.md`](/gateway/discovery) (LAN vs tailnet vs SSH)
+- [Pairing](/gateway/pairing)
+- [Discovery](/gateway/discovery)
+- [Bonjour](/gateway/bonjour)
diff --git a/docs/platforms/mac/bun.md b/docs/platforms/mac/bun.md
index 3b08a4ad7..9613fc711 100644
--- a/docs/platforms/mac/bun.md
+++ b/docs/platforms/mac/bun.md
@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both

 App bundle layout:
 - `Clawdbot.app/Contents/Resources/Relay/clawdbot`
-  - bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
+  - bun `--compile` relay executable built from `dist/macos/relay.js`
   - Supports:
     - `clawdbot …` (CLI)
     - `clawdbot gateway …` (LaunchAgent daemon)
@@ -47,7 +47,7 @@ Important bundler flags:

 Version injection:
 - `--define "__CLAWDBOT_VERSION__=\"\""`
-- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesn’t depend on reading `package.json` at runtime.
+- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesn’t depend on reading `package.json` at runtime.

 ## Launchd (Gateway as LaunchAgent)

@@ -58,7 +58,7 @@ Plist location (per-user):
 - `~/Library/LaunchAgents/com.clawdbot.gateway.plist`

 Manager:
-- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
+- The macOS app owns LaunchAgent install/update for the bundled gateway.

 Behavior:
 - “Clawdbot Active” enables/disables the LaunchAgent.
@@ -79,7 +79,7 @@ Symptom (when mis-signed):

 Fix:
 - The bun executable needs JIT-ish permissions under hardened runtime.
-- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
+- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
   - `com.apple.security.cs.allow-jit`
   - `com.apple.security.cs.allow-unsigned-executable-memory`

@@ -89,18 +89,14 @@ Problem:
 - bun can’t load some native Node addons like `sharp` (and we don’t want to ship native addon trees for the gateway).

 Solution:
-- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
-  - Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
-  - Falls back to `sharp` when available (Node/dev)
-- Used by:
-  - [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
-  - [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
-  - [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
+- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
+- When running in Node/dev, `sharp` is used when available.
+- This affects inbound/outbound media, screenshots, and tool image sanitization.

 ## Browser control server

-The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
-It’s started from the relay daemon process, so the relay binary includes Playwright deps.
+The Gateway starts the browser control server (loopback only) from the relay daemon process,
+so the relay binary includes Playwright deps.

 ## Tests / smoke checks

@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.

 ## DMG styling (human installer)

-[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
+`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.

 Rules of thumb:
 - Use a **72dpi** background image that matches the Finder window size in points.
diff --git a/docs/platforms/mac/canvas.md b/docs/platforms/mac/canvas.md
index 5e9eff473..f706292b8 100644
--- a/docs/platforms/mac/canvas.md
+++ b/docs/platforms/mac/canvas.md
@@ -5,157 +5,117 @@ read_when:
   - Adding agent controls for visual workspace
   - Debugging WKWebView canvas loads
 ---
-
 # Canvas (macOS app)

-Status: draft spec · Date: 2025-12-12
+The macOS app embeds an agent‑controlled **Canvas panel** using `WKWebView`. It
+is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
+UI surfaces.

-Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
+## Where Canvas lives

-Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
+Canvas state is stored under Application Support:

-This is designed for:
-- Agent-written HTML/CSS/JS on disk (per-session directory).
-- A real browser engine for layout, rendering, and basic interactivity.
-- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
-- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
+- `~/Library/Application Support/Clawdbot/canvas//...`

-## Why a custom scheme (vs. loopback HTTP)
+The Canvas panel serves those files via a **custom URL scheme**:

-Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
-- No port conflicts and no extra local server lifecycle.
-- Easier to sandbox: only serve files we explicitly map.
-- Works offline and can use an ephemeral data store (no persistent cookies/cache).
-
-If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
-
-## URL ↔ directory mapping
-
-The Canvas scheme is:
 - `clawdbot-canvas:///`

-Routing model:
-- `clawdbot-canvas://main/` → `/main/index.html` (or `index.htm`)
-- `clawdbot-canvas://main/yolo` → `/main/yolo/index.html` (or `index.htm`)
+Examples:
+- `clawdbot-canvas://main/` → `/main/index.html`
 - `clawdbot-canvas://main/assets/app.css` → `/main/assets/app.css`
+- `clawdbot-canvas://main/widgets/todo/` → `/main/widgets/todo/index.html`

-Directory listings are not served.
+If no `index.html` exists at the root, the app shows a **built‑in scaffold page**.

-When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
-This is a visual placeholder only (no A2UI renderer).
+## Panel behavior

-### Suggested on-disk location
+- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
+- Remembers size/position per session.
+- Auto‑reloads when local canvas files change.
+- Only one Canvas panel is visible at a time (session is switched as needed).

-Store Canvas state under the app support directory:
-- `~/Library/Application Support/Clawdbot/canvas//…`
+Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
+node commands return `CANVAS_DISABLED`.

-This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
+## Agent API surface
 
-## Panel behavior (agent-controlled)
+Canvas is exposed via the **node bridge**, so the agent can:
 
-Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
-- Can be shown/hidden at any time by the agent.
-- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
-- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
-- Default position is the **top-right corner** of the current screen’s visible frame (unless the user moved/resized it previously).
-- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
+- show/hide the panel
+- navigate to a path or URL
+- evaluate JavaScript
+- capture a snapshot image
 
-### Hover-only chrome
+CLI examples:
 
-Implementation notes:
-- Keep the window borderless at all times (don’t toggle `styleMask`).
-- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
-- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
-- Optionally show close/drag affordances only while hovered.
+```bash
+clawdbot nodes canvas present --node
+clawdbot nodes canvas navigate --node --url "/"
+clawdbot nodes canvas eval --node --js "document.title"
+clawdbot nodes canvas snapshot --node
+```
 
-## Agent API surface (current)
+Notes:
+- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
+- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
 
-Canvas is exposed via the Gateway **node bridge**, so the agent can:
-- Show/hide the panel.
-- Navigate to a path (relative to the session root).
-- Evaluate JavaScript and optionally return results.
-- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
-- Capture a snapshot image of the current canvas view.
-- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
+## A2UI in Canvas
 
-This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
+A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
+When the Gateway advertises a Canvas host, the macOS app auto‑navigates to the
+A2UI host page on first open.
 
-Related:
-- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
-
-## Agent commands (current)
-
-Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
-
-- `clawdbot nodes canvas present --node [--target <...>] [--x/--y/--width/--height]`
-  - Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
-  - If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
-- `clawdbot nodes canvas hide --node `
-- `clawdbot nodes canvas eval --js --node `
-- `clawdbot nodes canvas snapshot --node `
-
-### Canvas A2UI
-
-Canvas A2UI is hosted by the **Gateway canvas host** at:
+Default A2UI host URL:
 
 ```
 http://:18793/__clawdbot__/a2ui/
 ```
 
-The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
+### A2UI commands (v0.8)
 
-- `clawdbot nodes canvas a2ui push --jsonl --node `
-- `clawdbot nodes canvas a2ui reset --node `
+Canvas currently accepts **A2UI v0.8** server→client messages:
 
-`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
+- `beginRendering`
+- `surfaceUpdate`
+- `dataModelUpdate`
+- `deleteSurface`
 
-Minimal example (v0.8):
+`createSurface` (v0.9) is not supported.
+
+CLI example:
 
 ```bash
-cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
-{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
+cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
+{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
 {"beginRendering":{"surfaceId":"main","root":"root"}}
-EOF
+EOFA2
 clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node
 ```
 
-Notes:
-- This does **not** support the A2UI v0.9 examples using `createSurface`.
-- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
-- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
-- Quick smoke: `clawdbot nodes canvas a2ui push --node --text "Hello from A2UI"` renders a minimal v0.8 view.
+Quick smoke:
 
-## Triggering agent runs from Canvas (deep links)
+```bash
+clawdbot nodes canvas a2ui push --node --text "Hello from A2UI"
+```
+
+## Triggering agent runs from Canvas
+
+Canvas can trigger new agent runs via deep links:
 
-Canvas can trigger new agent runs via the macOS app deep-link scheme:
 - `clawdbot://agent?...`
 
-This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
+Example (in JS):
 
-Suggested patterns:
-- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
-- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
+```js
+window.location.href = "clawdbot://agent?message=Review%20this%20design";
+```
 
-Implementation note (important):
-- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
+The app prompts for confirmation unless a valid key is provided.
 
-Safety:
-- Deep links (`clawdbot://agent?...`) are always enabled.
-- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
-- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
+## Security notes
 
-## Security / guardrails
-
-Recommended defaults:
-- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
-- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
-- Scheme handler must prevent directory traversal: resolved file paths must stay under `//`.
-- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
-
-## Debugging
-
-Suggested debugging hooks:
-- Enable Web Inspector for Canvas builds (same approach as WebChat).
-- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
-- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
+- Canvas scheme blocks directory traversal; files must live under the session root.
+- Local Canvas content uses a custom scheme (no loopback server required).
+- External `http(s)` URLs are allowed only when explicitly navigated.
diff --git a/docs/platforms/mac/child-process.md b/docs/platforms/mac/child-process.md
index 37f0f7363..12836f8c9 100644
--- a/docs/platforms/mac/child-process.md
+++ b/docs/platforms/mac/child-process.md
@@ -1,72 +1,56 @@
 ---
-summary: "Running the gateway as a child process of the macOS app and why"
+summary: "Gateway lifecycle on macOS (launchd + attach-only)"
 read_when:
   - Integrating the mac app with the gateway lifecycle
 ---
-# Clawdbot gateway as a child process of the macOS app
+# Gateway lifecycle on macOS
 
-Date: 2025-12-06 · Status: draft · Owner: steipete
+The macOS app **manages the Gateway via launchd** by default. This gives you
+reliable auto‑start at login and restart on crashes.
 
-Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
+Child‑process mode (Gateway spawned directly by the app) is **not in use** today.
+If you need tighter coupling to the UI, use **Attach‑only** and run the Gateway
+manually in a terminal.
 
-## Goal
-Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
+## Default behavior (launchd)
 
-## When to prefer the child-process mode
-- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
-- You’re okay giving up login persistence/auto-restart that launchd provides, or you’ll add your own backoff loop.
-- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
+- The app installs a per‑user LaunchAgent labeled `com.clawdbot.gateway`.
+- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
+  starts the Gateway if needed.
+- Logs are written to the launchd gateway log path (visible in Debug Settings).
 
-## Tradeoffs vs. launchd
-- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
-- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
-- **TCC:** behaviorally, child processes often inherit the parent app’s “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
+Common commands:
 
-## TCC guardrails (must keep)
-- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the app’s node commands (via Gateway `node.invoke`) for:
-  - `system.notify`
-  - `system.run` (including `needsScreenRecording`)
-  - `screen.record` / `camera.*`
-  - PeekabooBridge UI automation (`peekaboo …`)
-- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app target’s Info.plist; a bare Node binary has none and would fail.
-- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
+```bash
+launchctl kickstart -k gui/$UID/com.clawdbot.gateway
+launchctl bootout gui/$UID/com.clawdbot.gateway
+```
 
-## Process manager design (Swift Subprocess)
-- Add a small `GatewayProcessManager` (Swift) that owns:
-  - `execution: Execution?` from `Swift Subprocess` to track the child.
-  - `start(config)` called when “Clawdbot Active” flips ON:
-    - binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
-    - args: current clawdbot entrypoint and flags
-    - cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
-    - output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
-    - restart: optional linear/backoff restart if exit was non-zero and Active is still true
-  - `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
-- Wire SwiftUI toggle:
-- ON: `GatewayProcessManager.start(...)`
-- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
-- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
+## Attach‑only (developer mode)
 
-## Packaging and signing
-- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
-- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
-- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
+Attach‑only tells the app to **connect to an existing Gateway** without spawning
+one. This is ideal for local dev (hot‑reload, custom flags).
 
-## Logging and observability
-- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
-- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
-- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
+Steps:
 
-## Failure/edge cases
-- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
-- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
-- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
+1) Start the Gateway yourself:
+   ```bash
+   pnpm gateway:watch
+   ```
+2) In the macOS app: Debug Settings → Gateway → **Attach only**.
 
-## Open questions / follow-ups
-- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
-- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
-- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
+The UI should show “Using existing gateway …” once connected.
 
-## Decision snapshot (current recommendation)
-- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
-- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
-- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
+## Remote mode
+
+Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
+remote host and connects over that tunnel.
+
+## Why we prefer launchd
+
+- Auto‑start at login.
+- Built‑in restart/KeepAlive semantics.
+- Predictable logs and supervision.
+
+If a true child‑process mode is ever needed again, it should be documented as a
+separate, explicit dev‑only mode.
diff --git a/docs/platforms/mac/peekaboo.md b/docs/platforms/mac/peekaboo.md
index 50511bea6..bfd9017f5 100644
--- a/docs/platforms/mac/peekaboo.md
+++ b/docs/platforms/mac/peekaboo.md
@@ -1,170 +1,62 @@
 ---
-summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
+summary: "PeekabooBridge integration for macOS UI automation"
 read_when:
   - Hosting PeekabooBridge in Clawdbot.app
   - Integrating Peekaboo as a submodule
   - Changing PeekabooBridge protocol/paths
 ---
-# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
+# Peekaboo Bridge (macOS UI automation)
 
-## TL;DR
-- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
-- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
-- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
+Clawdbot can host **PeekabooBridge** as a local, permission‑aware UI automation
+broker. This lets the `peekaboo` CLI drive UI automation while reusing the
+macOS app’s TCC permissions.
 
-Non-goals:
-- No auto-launching Peekaboo.app.
-- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
-- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
+## What this is (and isn’t)
 
-## Big refactor (Dec 2025): XPC → Bridge
-Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
-- It matches the existing “local socket + codesign checks” approach.
-- It lets us piggyback on **either** Peekaboo.app’s permissions **or** Clawdbot.app’s permissions (whichever is running).
-- It avoids “two apps with two TCC bubbles” unless needed.
+- **Host**: Clawdbot.app can act as a PeekabooBridge host.
+- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
+- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
 
-Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
+## Enable the bridge
 
-## Architecture
-### Processes
-- **Bridge hosts** (provide TCC-backed automation):
-  - **Peekaboo.app** (preferred; also provides visualizations + controls)
-  - **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktop’s granted permissions)
-  - **Clawdbot.app** (secondary; “thin host” only)
-- **Bridge clients** (trigger single actions):
-  - `peekaboo …` (preferred; humans + agents)
-  - Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
+In the macOS app:
+- Settings → **Enable Peekaboo Bridge**
 
-### Host discovery (client-side)
-Order is deliberate:
-1. Peekaboo.app host (full UX)
-2. Claude.app host (piggyback on Claude Desktop permissions)
-3. Clawdbot.app host (piggyback on Clawdbot permissions)
+When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
+is stopped and `peekaboo` will fall back to other available hosts.
 
-Socket paths (convention; exact paths must match Peekaboo):
-- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
-- Claude: `~/Library/Application Support/Claude/bridge.sock`
-- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
+## Client discovery order
 
-No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
+Peekaboo clients typically try hosts in this order:
 
-Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
+1. Peekaboo.app (full UX)
+2. Claude.app (if installed)
+3. Clawdbot.app (thin broker)
 
-### Protocol shape
-- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
-- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
-- **Errors**: human-readable string by default; structured envelope in `--json`.
+Use `peekaboo bridge status --verbose` to see which host is active and which
+socket path is in use. You can override with:
 
-## Dependency strategy (submodule)
-Integrate Peekaboo via git submodule (nested submodules are OK).
+```bash
+export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
+```
 
-Path in Clawdbot repo:
-- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don’t churn).
+## Security & permissions
 
-What Clawdbot should use:
-- **Client side**: `PeekabooBridge` (socket client + protocol models).
-- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
+- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
+  allowed by default (Peekaboo’s signing team), plus the Clawdbot app’s TeamID.
+- Requests time out after ~10 seconds.
+- If required permissions are missing, the bridge returns a clear error message
+  rather than launching System Settings.
 
-What Clawdbot should *not* embed:
-- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
-- **XPC**: don’t reintroduce helper targets; use the bridge.
+## Snapshot behavior (automation)
 
-## IPC / CLI surface
-### No `clawdbot ui …`
-We avoid a parallel “Clawdbot UI automation CLI”. Instead:
-- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
-- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isn’t running.
+Snapshots are stored in memory and expire automatically after a short window.
+If you need longer retention, re‑capture from the client.
 
-### Diagnostics
-Use Peekaboo’s built-in diagnostics to see which host would be used:
-- `peekaboo bridge status`
-- `peekaboo bridge status --verbose`
-- `peekaboo bridge status --json`
+## Troubleshooting
 
-### Output format
-Peekaboo commands default to human text output. Add `--json` for a structured envelope.
-
-### Timeouts
-Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
-
-## Coordinate model (multi-display)
-Requirement: coordinates are **per screen**, not global.
-
-Standardize for the CLI (agent-friendly): **top-left origin per screen**.
-
-Proposed request shape:
-- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
-- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
-- Responses should echo both:
-  - The resolved `screenIndex`
-  - The local `{x, y}` and bounds
-  - Optionally the global `{x, y}` for debugging
-
-Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
-
-## Targeting (per app/window)
-Expose window/app targeting in the UI surface (align with Peekaboo targeting):
-- frontmost
-- by app name / bundle id
-- by window title substring
-- by (app, index)
-
-Peekaboo CLI targeting (agent-friendly):
-- `--bundle-id ` for app targeting
-- `--window-index ` (0-based) for disambiguating within an app when capturing
-
-All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
-
-## “See” + click packs (Playwright-style)
-Behavior stays aligned with Peekaboo:
-- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
-- Follow-up actions reference those IDs without re-scanning.
-
-`peekaboo see` should:
-- capture (optionally targeted) window/screen
-- return a screenshot **file path** (default: temp directory)
-- return a list of elements (text or JSON)
-
-Snapshot lifecycle requirement:
-- Host apps are long-lived, so snapshot state should be **in-memory by default**.
-- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
-
-Practical flow (agent-friendly):
-- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
-- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
-- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
-
-## Visualizer integration
-Keep visualizations in **Peekaboo.app** for now.
-- Clawdbot hosts the bridge, but does not render overlays.
-- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
-
-## Screenshots (legacy → Peekaboo takeover)
-Clawdbot should not grow a separate screenshot CLI surface.
-
-Migration plan:
-- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
-- Once Clawdbot’ legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
-
-## Permissions behavior
-If required permissions are missing:
-- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
-- do not try to open System Settings from the automation endpoint
-
-## Security (socket auth)
-Both hosts must enforce:
-- filesystem perms on the socket path (owner read/write only)
-- server-side caller validation:
-  - require the caller’s code signature TeamID to be `Y5PE65HELJ`
-  - optional bundle-id allowlist for tighter scoping
-
-Debug-only escape hatch (development convenience):
-- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
-- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
-
-## Next integration steps (after this doc)
-1. Add Peekaboo as a git submodule (nested submodules OK).
-2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
-3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
-4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
-5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
+- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
+  properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
+  in **debug** mode only.
+- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
+  and confirm permissions are granted.
diff --git a/docs/platforms/mac/webchat.md b/docs/platforms/mac/webchat.md
index a9e0abe95..d3df84c27 100644
--- a/docs/platforms/mac/webchat.md
+++ b/docs/platforms/mac/webchat.md
@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
 read_when:
   - Debugging mac WebChat view or loopback port
 ---
-# Web Chat (macOS app)
+# WebChat (macOS app)
 
-The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
+The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
+connects to the Gateway and defaults to the **main session** for the selected
+agent (with a session switcher for other sessions).
 
 - **Local mode**: connects directly to the local Gateway WebSocket.
-- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
+- **Remote mode**: forwards the Gateway control port over SSH and uses that
+  tunnel as the data plane.
 
 ## Launch & debugging
+
 - Manual: Lobster menu → “Open Chat”.
-- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
-- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
+- Auto‑open for testing:
+  ```bash
+  dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
+  ```
+- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
 
 ## How it’s wired
-- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
-- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
-- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
 
-## Security / surface area
+- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
+  events `chat`, `agent`, `presence`, `tick`, `health`.
+- Session: defaults to the primary session (`main`, or `global` when scope is
+  global). The UI can switch between sessions.
+- Onboarding uses a dedicated session to keep first‑run setup separate.
+
+## Security surface
+
 - Remote mode forwards only the Gateway WebSocket control port over SSH.
 
 ## Known limitations
-- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
+
+- The UI is optimized for chat sessions (not a full browser sandbox).
diff --git a/docs/platforms/mac/xpc.md b/docs/platforms/mac/xpc.md
index afe2e3977..2bf266fe3 100644
--- a/docs/platforms/mac/xpc.md
+++ b/docs/platforms/mac/xpc.md
@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
 read_when:
   - Editing IPC contracts or menu bar app IPC
 ---
-# Clawdbot macOS IPC architecture (Dec 2025)
+# Clawdbot macOS IPC architecture
 
 **Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
 
@@ -21,10 +21,10 @@ read_when:
 - UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
 - Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
 - Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
-- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
+- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
 
-### Mach/XPC (future direction)
-- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
+### Mach/XPC
+- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
 
 ## Operational flows
 - Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
@@ -37,4 +37,4 @@ read_when:
 - Prefer requiring a TeamID match for all privileged surfaces.
 - PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
 - All communication remains local-only; no network sockets are exposed.
-- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable. +- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds. diff --git a/docs/platforms/macos.md b/docs/platforms/macos.md index 33b775b9c..121a8dd93 100644 --- a/docs/platforms/macos.md +++ b/docs/platforms/macos.md @@ -1,123 +1,97 @@ --- -summary: "Spec for the Clawdbot macOS companion menu bar app (gateway + node broker)" +summary: "Clawdbot macOS companion app (menu bar + gateway broker)" read_when: - Implementing macOS app features - Changing gateway lifecycle or node bridging on macOS --- # Clawdbot macOS Companion (menu bar + gateway broker) -Author: steipete · Status: draft spec · Date: 2025-12-20 +The macOS app is the **menu‑bar companion** for Clawdbot. It owns permissions, +manages the Gateway locally, and exposes macOS capabilities to the agent as a +node. -## Support snapshot -- Core Gateway: supported (TypeScript on Node/Bun). -- Companion app: macOS menu bar app with permissions + node bridge. -- Install: [Getting Started](/start/getting-started) or [Install & updates](/install/updating). -- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration). +## What it does -## System control (launchd) -If you run the bundled macOS app, it installs a per-user LaunchAgent labeled `com.clawdbot.gateway`. -CLI-only installs can use `clawdbot onboard --install-daemon`, `clawdbot daemon install`, or `clawdbot configure` → **Gateway daemon**. +- Shows native notifications and status in the menu bar. +- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Microphone, + Speech Recognition, Automation/AppleScript). +- Runs or connects to the Gateway (local or remote). +- Exposes macOS‑only tools (Canvas, Camera, Screen Recording, `system.run`). 
+- Optionally hosts **PeekabooBridge** for UI automation. +- Installs a helper CLI (`clawdbot`) into `/usr/local/bin` and + `/opt/homebrew/bin` on request. + +## Local vs remote mode + +- **Local** (default): the app ensures a local Gateway is running via launchd. +- **Remote**: the app connects to a Gateway over SSH/Tailscale and never starts + a local process. +- **Attach‑only** (debug): the app connects to an already‑running local Gateway + and never spawns its own. + +## Launchd control + +The app manages a per‑user LaunchAgent labeled `com.clawdbot.gateway`. ```bash launchctl kickstart -k gui/$UID/com.clawdbot.gateway launchctl bootout gui/$UID/com.clawdbot.gateway ``` -`launchctl` only works if the LaunchAgent is installed; otherwise run `clawdbot daemon install` first. +If the LaunchAgent isn’t installed, enable it from the app or run +`clawdbot daemon install`. -Details: [Gateway runbook](/gateway) and [Bundled bun Gateway](/platforms/mac/bun). +## Node capabilities (mac) -## Purpose -- Single macOS menu-bar app named **Clawdbot** that: - - Shows native notifications for Clawdbot/clawdbot events. - - Owns TCC prompts (Notifications, Accessibility, Screen Recording, Automation/AppleScript, Microphone, Speech Recognition). - - Runs (or connects to) the **Gateway** and exposes itself as a **node** so agents can reach macOS‑only features. - - Hosts **PeekabooBridge** for UI automation (consumed by `peekaboo`; see [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo)). - - Installs a single CLI (`clawdbot`) by symlinking the bundled binary. +The macOS app presents itself as a node. Common commands: -## High-level design -- SwiftPM package in `apps/macos/` (macOS 15+, Swift 6). -- Targets: - - `ClawdbotIPC` (shared Codable types + helpers for app‑internal actions). - - `Clawdbot` (LSUIElement MenuBarExtra app; hosts Gateway + node bridge + PeekabooBridgeHost). -- Bundle ID: `com.clawdbot.mac`. 
-- Bundled runtime binaries live under `Contents/Resources/Relay/`: - - `clawdbot` (bun‑compiled relay: CLI + gateway) -- The app symlinks `clawdbot` into `/usr/local/bin` and `/opt/homebrew/bin`. - -## Gateway + node bridge -- The mac app runs the Gateway in **local** mode (unless configured remote). -- The gateway port is configurable via `gateway.port` or `CLAWDBOT_GATEWAY_PORT` (default 18789). The mac app reads that value for launchd, probes, and remote SSH tunnels. -- The mac app connects to the bridge as a **node** and advertises capabilities/commands. -- Agent‑facing actions are exposed via `node.invoke` (no local control socket). -- The mac app watches `~/.clawdbot/clawdbot.json` and switches modes live when `gateway.mode` or `gateway.remote.url` changes. -- If `gateway.mode` is unset but `gateway.remote.url` is set, the mac app treats it as remote mode. -- Changing connection mode in the mac app writes `gateway.mode` (and `gateway.remote.url` in remote mode) back to the config file. - -### Node commands (mac) -- Canvas: `canvas.present|navigate|eval|snapshot|a2ui.*` -- Camera: `camera.snap|camera.clip` +- Canvas: `canvas.present`, `canvas.navigate`, `canvas.eval`, `canvas.snapshot`, `canvas.a2ui.*` +- Camera: `camera.snap`, `camera.clip` - Screen: `screen.record` -- System: `system.run` (shell) and `system.notify` +- System: `system.run`, `system.notify` -### Permission advertising -- Nodes include a `permissions` map in hello/pairing. -- The Gateway surfaces it via `node.list` / `node.describe` so agents can decide what to run. +The node reports a `permissions` map so agents can decide what’s allowed. -## CLI (`clawdbot`) -- The **only** CLI is `clawdbot` (TS/bun). There is no `clawdbot-mac` helper. 
-- For mac‑specific actions, the CLI uses `node.invoke`: - - `clawdbot nodes canvas present|navigate|eval|snapshot|a2ui push|a2ui reset` - - `clawdbot nodes run --node -- ` - - `clawdbot nodes notify --node --title ...` +## Deep links -## Onboarding -- Install CLI (symlink) → Permissions checklist → Test notification → Done. -- Remote mode skips local gateway/CLI steps. -- Selecting Local auto-enables the bundled Gateway via launchd (unless “Attach only” debug mode is enabled). - -## Deep links (URL scheme) - -Clawdbot (the macOS app) registers a URL scheme for triggering local actions from anywhere (browser, Shortcuts, CLI, etc.). - -Scheme: -- `clawdbot://…` +The app registers the `clawdbot://` URL scheme for local actions. ### `clawdbot://agent` -Triggers a Gateway `agent` request (same machinery as WebChat/agent runs). - -Example: +Triggers a Gateway `agent` request. ```bash open 'clawdbot://agent?message=Hello%20from%20deep%20link' ``` Query parameters: -- `message` (required): the agent prompt (URL-encoded). -- `sessionKey` (optional): explicit session key to use. -- `thinking` (optional): thinking hint (e.g. `low`; omit for default). -- `deliver` (optional): `true|false` (default: false). -- `to` / `provider` (optional): forwarded to the Gateway `agent` method (only meaningful with `deliver=true`). -- `timeoutSeconds` (optional): timeout hint forwarded to the Gateway. -- `key` (optional): unattended mode key (see below). +- `message` (required) +- `sessionKey` (optional) +- `thinking` (optional) +- `deliver` / `to` / `provider` (optional) +- `timeoutSeconds` (optional) +- `key` (optional unattended mode key) -Safety/guardrails: -- Always enabled. -- Without a `key` query param, the app will prompt for confirmation before invoking the agent. -- With `key=`, Clawdbot runs without prompting (intended for personal automations). - - The current key is shown in Debug Settings and stored locally in UserDefaults. 
+Safety: +- Without `key`, the app prompts for confirmation. +- With a valid `key`, the run is unattended (intended for personal automations). -Notes: -- In local mode, Clawdbot will start the local Gateway if needed before issuing the request. -- In remote mode, Clawdbot will use the configured remote tunnel/endpoint. +## Onboarding flow (typical) + +1) Install and launch **Clawdbot.app**. +2) Complete the permissions checklist (TCC prompts). +3) Ensure **Local** mode is active and the Gateway is running. +4) Install the CLI helper if you want terminal access. ## Build & dev workflow (native) -- `cd apps/macos && swift build` (debug) / `swift build -c release`. -- Run app for dev: `swift run Clawdbot` (or Xcode scheme). -- Package app + CLI: [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) (builds bun CLI + gateway). -- Tests: add Swift Testing suites under `apps/macos/Tests`. -## Open questions / decisions -- Should `system.run` support streaming stdout/stderr or keep buffered responses only? -- Should we allow node‑side permission prompts, or always require explicit app UI action? +- `cd apps/macos && swift build` +- `swift run Clawdbot` (or Xcode) +- Package app + CLI: `scripts/package-mac-app.sh` + +## Related docs + +- [Gateway runbook](/gateway) +- [Bundled bun Gateway](/platforms/mac/bun) +- [macOS permissions](/platforms/mac/permissions) +- [Canvas](/platforms/mac/canvas) diff --git a/docs/providers/slack.md b/docs/providers/slack.md index 9b21a6883..594d377b4 100644 --- a/docs/providers/slack.md +++ b/docs/providers/slack.md @@ -107,18 +107,16 @@ Slack's Conversations API is type-scoped: you only need the scopes for the conversation types you actually touch (channels, groups, im, mpim). See https://api.slack.com/docs/conversations-api for the overview. 
-### Required by current code +### Required scopes - `chat:write` (send/update/delete messages via `chat.postMessage`) https://api.slack.com/methods/chat.postMessage - `im:write` (open DMs via `conversations.open` for user DMs) https://api.slack.com/methods/conversations.open - `channels:history`, `groups:history`, `im:history`, `mpim:history` - (`conversations.history` in [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts)) https://api.slack.com/methods/conversations.history - `channels:read`, `groups:read`, `im:read`, `mpim:read` - (`conversations.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts)) https://api.slack.com/methods/conversations.info -- `users:read` (`users.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts) + [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts)) +- `users:read` (user lookup) https://api.slack.com/methods/users.info - `reactions:read`, `reactions:write` (`reactions.get` / `reactions.add`) https://api.slack.com/methods/reactions.get diff --git a/docs/providers/whatsapp.md b/docs/providers/whatsapp.md index 1530e1b13..5419719ff 100644 --- a/docs/providers/whatsapp.md +++ b/docs/providers/whatsapp.md @@ -188,8 +188,3 @@ Recommended for personal numbers: - Subsystems: `whatsapp/inbound`, `whatsapp/outbound`, `web-heartbeat`, `web-reconnect`. - Log file: `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log` (configurable). - Troubleshooting guide: [`docs/troubleshooting.md`](/gateway/troubleshooting). 
- -## Tests -- [`src/web/auto-reply.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/auto-reply.test.ts) (mention gating, history injection, reply flow) -- [`src/web/monitor-inbox.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/monitor-inbox.test.ts) (inbound parsing + reply context) -- [`src/web/outbound.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/outbound.test.ts) (send mapping + media) diff --git a/docs/start/hubs.md b/docs/start/hubs.md index 18ddda1b4..1317ed565 100644 --- a/docs/start/hubs.md +++ b/docs/start/hubs.md @@ -125,7 +125,7 @@ Use these hubs to discover every page, including deep dives and reference docs t - [Linux](https://docs.clawd.bot/platforms/linux) - [Web surfaces](https://docs.clawd.bot/web) -## macOS companion app (internals) +## macOS companion app (advanced) - [macOS dev setup](https://docs.clawd.bot/platforms/mac/dev-setup) - [macOS menu bar](https://docs.clawd.bot/platforms/mac/menu-bar) @@ -144,7 +144,7 @@ Use these hubs to discover every page, including deep dives and reference docs t - [macOS bun gateway](https://docs.clawd.bot/platforms/mac/bun) - [macOS XPC](https://docs.clawd.bot/platforms/mac/xpc) - [macOS skills](https://docs.clawd.bot/platforms/mac/skills) -- [macOS Peekaboo plan](https://docs.clawd.bot/platforms/mac/peekaboo) +- [macOS Peekaboo](https://docs.clawd.bot/platforms/mac/peekaboo) ## Workspace + templates @@ -160,13 +160,13 @@ Use these hubs to discover every page, including deep dives and reference docs t - [Templates: TOOLS](https://docs.clawd.bot/reference/templates/TOOLS) - [Templates: USER](https://docs.clawd.bot/reference/templates/USER) -## Experiments + proposals +## Experiments (exploratory) - [Onboarding config protocol](https://docs.clawd.bot/experiments/onboarding-config-protocol) -- [Plan: cron hardening](https://docs.clawd.bot/experiments/plans/cron-add-hardening) -- [Plan: group policy 
hardening](https://docs.clawd.bot/experiments/plans/group-policy-hardening) +- [Cron hardening notes](https://docs.clawd.bot/experiments/plans/cron-add-hardening) +- [Group policy hardening notes](https://docs.clawd.bot/experiments/plans/group-policy-hardening) - [Research: memory](https://docs.clawd.bot/experiments/research/memory) -- [Proposal: model config](https://docs.clawd.bot/experiments/proposals/model-config) +- [Model config exploration](https://docs.clawd.bot/experiments/proposals/model-config) ## Testing + release diff --git a/docs/start/onboarding.md b/docs/start/onboarding.md index 4ed664933..b996b4036 100644 --- a/docs/start/onboarding.md +++ b/docs/start/onboarding.md @@ -1,207 +1,102 @@ --- -summary: "Planned first-run onboarding flow for Clawdbot (local vs remote, OAuth auth, workspace bootstrap ritual)" +summary: "First-run onboarding flow for Clawdbot (macOS app)" read_when: - Designing the macOS onboarding assistant - - Implementing Anthropic/OpenAI auth or identity setup + - Implementing auth or identity setup --- # Onboarding (macOS app) -This doc describes the intended **first-run onboarding** for Clawdbot. The goal is a good “day 0” experience: pick where the Gateway runs, bind subscription auth (Anthropic or OpenAI) for the embedded agent runtime, and then let the **agent bootstrap itself** via a first-run ritual in the workspace. +This doc describes the **current** first‑run onboarding flow. The goal is a +smooth “day 0” experience: pick where the Gateway runs, connect auth, run the +wizard, and let the agent bootstrap itself. 
-## Page order (high level) +## Page order (current) -1) **Local vs Remote** -2) **(Local only)** Connect subscription auth (Anthropic / OpenAI OAuth) — optional, but recommended -3) **Connect Gmail (optional)** — run `clawdbot hooks gmail setup` to configure Pub/Sub hooks -4) **Onboarding chat** — dedicated session where the agent introduces itself and guides setup +1) Welcome + security notice +2) **Gateway selection** (Local / Remote / Configure later) +3) **Auth (Anthropic OAuth)** — local only +4) **Setup Wizard** (Gateway‑driven) +5) **Permissions** (TCC prompts) +6) **CLI helper** (optional) +7) **Onboarding chat** (dedicated session) +8) Ready ## 1) Local vs Remote -First question: where does the **Gateway** run? +Where does the **Gateway** run? -- **Local (this Mac):** onboarding can run OAuth flows and write OAuth credentials locally. -- **Remote (over SSH/tailnet):** onboarding must not run OAuth locally, because credentials must exist on the **gateway host**. +- **Local (this Mac):** onboarding can run OAuth flows and write credentials + locally. +- **Remote (over SSH/Tailnet):** onboarding does **not** run OAuth locally; + credentials must exist on the gateway host. +- **Configure later:** skip setup and leave the app unconfigured. Gateway auth tip: -- If you only use Clawdbot on this Mac (loopback gateway), keep auth **Off**. -- Use **Token** for multi-machine access or non-loopback binds. +- If you only use Clawdbot locally (loopback), auth can be **Off**. +- Use a **token** for multi‑machine access or non‑loopback binds. -Implementation note (2025-12-19): in local mode, the macOS app bundles the Gateway and enables it via a per-user launchd LaunchAgent (no global npm install/Node requirement for the user). +## 2) Local-only auth (Anthropic OAuth) -## 2) Local-only: Connect subscription auth (Anthropic / OpenAI OAuth) +The macOS app supports Anthropic OAuth (Claude Pro/Max). The flow: -This is the “bind Clawdbot to subscription auth” step. 
It is explicitly the **Anthropic (Claude Pro/Max)** or **OpenAI (ChatGPT/Codex)** OAuth flow, not a generic “login”. +- Opens the browser for OAuth (PKCE) +- Asks the user to paste the `code#state` value +- Writes credentials to `~/.clawdbot/credentials/oauth.json` -More detail: [/concepts/oauth](/concepts/oauth) +Other providers (OpenAI, custom APIs) are configured via environment variables +or config files for now. -### Recommended: OAuth (Anthropic) +## 3) Setup Wizard (Gateway‑driven) -The macOS app should: -- Start the Anthropic OAuth (PKCE) flow in the user’s browser. -- Ask the user to paste the `code#state` value. -- Exchange it for tokens and write credentials to: - - `~/.clawdbot/credentials/oauth.json` (file mode `0600`, directory mode `0700`) +The app can run the same setup wizard as the CLI. This keeps onboarding in sync +with Gateway‑side behavior and avoids duplicating logic in SwiftUI. -Why this location matters: it’s the Clawdbot-owned OAuth store. -Clawdbot also imports `oauth.json` into the agent auth profile store (`~/.clawdbot/agents//agent/auth-profiles.json`) on first use. +## 4) Permissions -### Recommended: OAuth (OpenAI Codex) +Onboarding requests TCC permissions needed for: -The macOS app should: -- Start the OpenAI Codex OAuth (PKCE) flow in the user’s browser. -- Auto-capture the callback on `http://127.0.0.1:1455/auth/callback` when possible. -- If the callback fails, prompt the user to paste the redirect URL or code. -- Store credentials in `~/.clawdbot/credentials/oauth.json` (same OAuth store as Anthropic). -- Set `agent.model` to `openai-codex/gpt-5.2` when the model is unset or `openai/*`. +- Notifications +- Accessibility +- Screen Recording +- Microphone / Speech Recognition +- Automation (AppleScript) -### Alternative: API key (instructions only) +## 5) CLI helper (optional) -Offer an “API key” option, but for now it is **instructions only**: -- Get an Anthropic API key. 
-- Provide it to Clawdbot via your preferred mechanism (env/config). +The app can symlink the bundled `clawdbot` CLI into `/usr/local/bin` and +`/opt/homebrew/bin` so terminal workflows work out of the box. -Note: environment variables are often confusing when the Gateway is launched by a GUI app (launchd environment != your shell). +## 6) Onboarding chat (dedicated session) -### Model safety rule +After setup, the app opens a dedicated onboarding chat session so the agent can +introduce itself and guide next steps. This keeps first‑run guidance separate +from your normal conversation. -Clawdbot should **always pass** `--model` when invoking the embedded agent (don’t rely on defaults). +## Agent bootstrap ritual -Example (CLI): +On the first agent run, Clawdbot bootstraps a workspace (default `~/clawd`): -```bash -clawdbot agent --mode rpc --model anthropic/claude-opus-4-5 "" -``` +- Seeds `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md` +- Runs a short Q&A ritual (one question at a time) +- Writes identity + preferences to `IDENTITY.md`, `USER.md`, `SOUL.md` +- Removes `BOOTSTRAP.md` when finished so it only runs once -If the user skips auth, onboarding should be clear: the agent likely won’t respond until auth is configured. +## Optional: Gmail hooks (manual) -## 4) Onboarding chat (dedicated session) - -The onboarding flow now embeds the SwiftUI chat view directly. It uses a **special session key** -(`onboarding`) so the “newborn agent” ritual stays separate from the main chat. - -This onboarding chat is where the agent: -- does the BOOTSTRAP.md identity ritual (one question at a time) -- visits **soul.md** with the user and writes `SOUL.md` (values, tone, boundaries) -- asks how the user wants to talk (web-only / Telegram / WhatsApp) -- guides linking steps (including showing a QR inline for WhatsApp via the `whatsapp_login` tool) - -If the workspace bootstrap is already complete (BOOTSTRAP.md removed), the onboarding chat step is skipped. 
- -## 2.5) Optional: Connect Gmail - -The macOS onboarding includes an optional Gmail step. It runs: +Gmail Pub/Sub setup is currently a manual step. Use: ```bash clawdbot hooks gmail setup --account you@gmail.com ``` -This writes the full `hooks.gmail` config, installs `gcloud` / `gog` / `tailscale` -via Homebrew if needed, and configures the Pub/Sub push endpoint. After setup, -restart the gateway so the internal Gmail watcher starts. +See [/automation/gmail-pubsub](/automation/gmail-pubsub) for details. -Once setup is complete, the user can switch to the normal chat (`main`) via the menu bar panel. +## Remote mode notes -## 5) Agent bootstrap ritual (outside onboarding) +When the Gateway runs on another machine, credentials and workspace files live +**on that host**. If you need OAuth in remote mode, create: -We no longer collect identity in the onboarding wizard. Instead, the **first agent run** performs a playful bootstrap ritual using files in the workspace: +- `~/.clawdbot/credentials/oauth.json` +- `~/.clawdbot/agents//agent/auth-profiles.json` -- Workspace is created implicitly (default `~/clawd`, configurable via `agent.workspace`) when local is selected, - but only if the folder is empty or already contains `AGENTS.md`. -- Files are seeded: `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md`. -- `BOOTSTRAP.md` tells the agent to keep it conversational: - - open with a cute hello - - ask **one question at a time** (no multi-question bombardment) - - offer a small set of suggestions where helpful (name, creature, emoji) - - wait for the user’s reply before asking the next question -- The agent writes results to: - - `IDENTITY.md` (agent name, vibe/creature, emoji) - - `USER.md` (who the user is + how they want to be addressed) - - `SOUL.md` (identity, tone, boundaries — crafted from the soul.md prompt) - - `~/.clawdbot/clawdbot.json` (structured identity defaults) -- After the ritual, the agent **deletes `BOOTSTRAP.md`** so it only runs once. 
- -Identity data still feeds the same defaults as before: - -- outbound prefix emoji (`messages.responsePrefix`) -- group mention patterns / wake words -- default session intro (“You are Samantha…”) -- macOS UI labels - -## 6) Workspace notes (no explicit onboarding step) - -The workspace is created automatically as part of agent bootstrap (no dedicated onboarding screen). - -Recommendation: treat the workspace as the agent’s “memory” and make it a git repo (ideally private) so identity + memories are backed up: - -```bash -cd ~/clawd -git init -git add AGENTS.md -git commit -m "Add agent workspace" -``` - -Daily memory lives under `memory/` in the workspace: -- one file per day: `memory/YYYY-MM-DD.md` -- read today + yesterday on session start -- keep it short (durable facts, preferences, decisions; avoid secrets) - -## Remote mode note (why OAuth is hidden) - -If the Gateway runs on another machine, OAuth credentials must be created/stored on that host (where the agent runtime runs). - -For now, remote onboarding should: -- explain why OAuth isn't shown -- point the user at the credential location (`~/.clawdbot/credentials/oauth.json`) and the auth profile store (`~/.clawdbot/agents//agent/auth-profiles.json`) on the gateway host -- mention that the **bootstrap ritual happens on the gateway host** (same BOOTSTRAP/IDENTITY/USER files) - -### Manual credential setup - -On the gateway host, create `~/.clawdbot/credentials/oauth.json` with this exact format: - -```json -{ - "anthropic": { "type": "oauth", "access": "sk-ant-oat01-...", "refresh": "sk-ant-ort01-...", "expires": 1767304352803 }, - "openai-codex": { "type": "oauth", "access": "eyJhbGciOi...", "refresh": "oai-refresh-...", "expires": 1767304352803, "accountId": "acct_..." 
} -} -``` - -Set permissions: `chmod 600 ~/.clawdbot/credentials/oauth.json` - -**Note:** Clawdbot can import from legacy pi-coding-agent paths (`~/.pi/agent/oauth.json`, etc.), but Claude Code/Codex CLI credentials live in different files. - -### Using Claude Code + Codex CLI credentials (direct) - -If these CLIs are installed on the **gateway host** and you’ve already signed in, Clawdbot auto-syncs their OAuth tokens into the per-agent auth profile store (`~/.clawdbot/agents//agent/auth-profiles.json`) on load: - -- **Claude Code**: reads `~/.claude/.credentials.json` → profile `anthropic:claude-cli` -- **Codex CLI**: reads `~/.codex/auth.json` → profile `openai-codex:codex-cli` - -Verification: - -```bash -clawdbot providers list -``` - -### Fallback: convert Claude Code credentials into `oauth.json` - -If you don’t want to install Claude Code on the gateway host, you can still seed the legacy import file: - -```bash -cat ~/.claude/.credentials.json | jq '{ - anthropic: { - type: "oauth", - access: .claudeAiOauth.accessToken, - refresh: .claudeAiOauth.refreshToken, - expires: .claudeAiOauth.expiresAt - } -}' > ~/.clawdbot/credentials/oauth.json -chmod 600 ~/.clawdbot/credentials/oauth.json -``` - -## Workspace backup (recommended) - -We suggest creating a **private GitHub repository** to back up the agent -workspace. The agent is really good at keeping a git repo in shape, and GitHub -is the perfect place for it. Keep it **private**. - -Setup steps: https://docs.clawd.bot/concepts/agent-workspace +on the gateway host. diff --git a/docs/start/pairing.md b/docs/start/pairing.md index 949e5ed80..35cc9088d 100644 --- a/docs/start/pairing.md +++ b/docs/start/pairing.md @@ -43,10 +43,6 @@ Stored under `~/.clawdbot/credentials/`: Treat these as sensitive (they gate access to your assistant). 
-### Source of truth (code) - -- DM pairing storage + code generation: [`src/pairing/pairing-store.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/pairing/pairing-store.ts) -- CLI commands: [`src/cli/pairing-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/pairing-cli.ts) ## 2) Node pairing (iOS/Android nodes joining the gateway) @@ -70,11 +66,6 @@ Stored under `~/.clawdbot/nodes/`: Full protocol + design notes: [Gateway pairing](/gateway/pairing) -### Source of truth (code) - -- Node pairing store (pending/paired + token issuance): [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts) -- Gateway methods/events (`node.pair.*`): [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts) -- CLI: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts) ## Related docs diff --git a/docs/start/setup.md b/docs/start/setup.md index e92b1a12a..931aefdaf 100644 --- a/docs/start/setup.md +++ b/docs/start/setup.md @@ -55,7 +55,7 @@ clawdbot providers login clawdbot health ``` -If onboarding is still WIP/broken on your build: +If onboarding is not available in your build: - Run `clawdbot setup`, then `clawdbot providers login`, then start the Gateway manually (`clawdbot gateway`). ## Bleeding edge workflow (Gateway in a terminal) @@ -77,7 +77,7 @@ pnpm install pnpm gateway:watch ``` -`gateway:watch` runs `src/entry.ts gateway --force` and reloads on [`src/**/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/**/*.ts) changes. +`gateway:watch` runs the gateway in watch mode and reloads on TypeScript changes. 
### 2) Point the macOS app at your running Gateway diff --git a/docs/tools/agent-send.md b/docs/tools/agent-send.md index 2443716f8..b01848ecb 100644 --- a/docs/tools/agent-send.md +++ b/docs/tools/agent-send.md @@ -1,21 +1,44 @@ --- -summary: "Design notes for a direct `clawdbot agent` CLI subcommand without WhatsApp delivery" +summary: "Direct `clawdbot agent` CLI runs (with optional delivery)" read_when: - Adding or modifying the agent CLI entrypoint --- -# `clawdbot agent` (direct-to-agent invocation) +# `clawdbot agent` (direct agent runs) -`clawdbot agent` lets you talk to the **embedded** agent runtime directly (no chat send unless you opt in), while reusing the same session store and thinking/verbose persistence as inbound auto-replies. +`clawdbot agent` runs a single agent turn without needing an inbound chat message. +By default it goes **through the Gateway**; add `--local` to force the embedded +runtime on the current machine. ## Behavior + - Required: `--message ` - Session selection: - - If `--session-id` is given, reuse it. - - Else if `--to ` is given, derive the session key from `session.scope` (direct chats collapse to `main`, or `global` when scope is global). -- Runs the embedded Pi agent (configured via `agent`). -- Thinking/verbose: - - Flags `--thinking ` and `--verbose ` persist into the session store. + - `--to ` derives the session key (normal direct-chat routing), **or** + - `--session-id ` reuses an existing session by id +- Runs the same embedded agent runtime as normal inbound replies. +- Thinking/verbose flags persist into the session store. - Output: - - Default: prints text (and `MEDIA:` lines) to stdout. - - `--json`: prints structured payloads + meta. -- Optional: `--deliver` sends the reply back to the selected provider (`whatsapp`, `telegram`, `discord`, `signal`, `imessage`). 
+ - default: prints reply text (plus `MEDIA:` lines) + - `--json`: prints structured payload + metadata +- Optional delivery back to a provider with `--deliver` + `--provider`. + +If the Gateway is unreachable, the CLI **falls back** to the embedded local run. + +## Examples + +```bash +clawdbot agent --to +15555550123 --message "status update" +clawdbot agent --session-id 1234 --message "Summarize inbox" --thinking medium +clawdbot agent --to +15555550123 --message "Trace logs" --verbose on --json +clawdbot agent --to +15555550123 --message "Summon reply" --deliver +``` + +## Flags + +- `--local`: run locally (requires provider keys in your shell) +- `--deliver`: send the reply to the chosen provider (requires `--to`) +- `--provider`: `whatsapp|telegram|discord|slack|signal|imessage` (default: `whatsapp`) +- `--thinking `: persist thinking level +- `--verbose `: persist verbose level +- `--timeout `: override agent timeout +- `--json`: output structured JSON diff --git a/docs/tools/browser.md b/docs/tools/browser.md index aaf698498..5c31fc3bb 100644 --- a/docs/tools/browser.md +++ b/docs/tools/browser.md @@ -1,254 +1,142 @@ --- -summary: "Spec: integrated browser control server + action commands" +summary: "Integrated browser control server + action commands" read_when: - Adding agent-controlled browser automation - Debugging why clawd is interfering with your own Chrome - Implementing browser settings + lifecycle in the macOS app --- -# Browser (integrated) — clawd-managed Chrome +# Browser (clawd-managed) -Status: draft spec · Date: 2025-12-20 +Clawdbot can run a **dedicated Chrome/Chromium profile** that the agent controls. +It is isolated from your personal browser and is managed through a small local +control server. -Goal: give the **clawd** persona its own browser that is: -- Visually distinct (lobster-orange, profile labeled "clawd"). -- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot). 
-- Non-interfering with the user's own browser (separate profile + dedicated ports). +## What you get -This doc covers the macOS app/gateway side. It intentionally does not mandate -Playwright vs Puppeteer; the key is the **contract** and the **separation guarantees**. +- A separate browser profile named **clawd** (orange accent by default). +- Deterministic tab control (list/open/focus/close). +- Agent actions (click/type/drag/select), snapshots, screenshots, PDFs. +- Optional multi-profile support (`clawd`, `work`, `remote`, ...). -## User-facing settings +This browser is **not** your daily driver. It is a safe, isolated surface for +agent automation and verification. -Add a dedicated settings section (preferably under **Skills** or its own "Browser" tab): +## Quick start -- **Enable clawd browser** (`default: on`) - - When off: no browser is launched, and browser tools return "disabled". -- **Browser control URL** (`default: http://127.0.0.1:18791`) - - Interpreted as the base URL of the local/remote browser-control server. - - If the URL host is not loopback, Clawdbot must **not** attempt to launch a local - browser; it only connects. -- **CDP URL** (`default: controlUrl + 1`) - - Base URL for Chrome DevTools Protocol (e.g. `http://127.0.0.1:18792`). - - Set this to a non-loopback host to attach the local control server to a remote - Chrome/Chromium CDP endpoint (SSH/Tailscale tunnel recommended). - - If the CDP URL host is non-loopback, clawd does **not** auto-launch a local browser. - - If you tunnel a remote CDP to `localhost`, set **Attach to existing only** to - avoid accidentally launching a local browser. -- **Accent color** (`default: #FF4500`, "lobster-orange") - - Used to theme the clawd browser profile (best-effort) and to tint UI indicators - in Clawdbot. 
+```bash +clawdbot browser status +clawdbot browser start +clawdbot browser open https://example.com +clawdbot browser snapshot +``` -Optional (advanced, can be hidden behind Debug initially): -- **Use headless browser** (`default: off`) -- **Attach to existing only** (`default: off`) — if on, never launch; only connect if - already running. -- **Browser executable path** (override, optional) -- **No sandbox** (`default: off`) — adds `--no-sandbox` + `--disable-setuid-sandbox` +If you get “Browser disabled”, enable it in config (see below) and restart the +Gateway. -### Port convention +## Configuration -Clawdbot already uses: -- Gateway WebSocket: `18789` -- Bridge (voice/node): `18790` +Browser settings live in `~/.clawdbot/clawdbot.json`. -For the clawd browser-control server, use "family" ports: -- Browser control HTTP API: `18791` (bridge + 1) -- Browser CDP/debugging port: `18792` (control + 1) -- Canvas host HTTP: `18793` by default, mounted at `/__clawdbot__/canvas/` - -The user usually only configures the **control URL** (port `18791`). CDP is an -internal detail. - -## Browser isolation guarantees (non-negotiable) - -1) **Dedicated user data dir** - - Never attach to or reuse the user's default Chrome profile. - - Store clawd browser state under an app-owned directory, e.g.: - - `~/Library/Application Support/Clawdbot/browser/clawd/` (mac app) - - or `~/.clawdbot/browser/clawd/` (gateway/CLI) - -2) **Dedicated ports** - - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with - `agent-tools/browser-tools`). - - Default ports are `18791/18792` unless overridden. - -3) **Named tab/page management** - - The agent must be able to enumerate and target tabs deterministically (by - stable `targetId` or equivalent), not "last tab". 
- -## Browser selection (macOS + Linux) - -On startup (when enabled + local URL), Clawdbot chooses the browser executable -in this order: -1) **Google Chrome Canary** (if installed) -2) **Chromium** (if installed) -3) **Google Chrome** (fallback) - -Linux: -- Looks for `google-chrome` / `chromium` in common system paths. -- Use **Browser executable path** to force a specific binary. - -Implementation detail: -- macOS: detection is by existence of the `.app` bundle under `/Applications` - (and optionally `~/Applications`), then using the resolved executable path. -- Linux: common `/usr/bin`/`/snap/bin` paths. - -Rationale: -- Canary/Chromium are easy to visually distinguish from the user's daily driver. -- Chrome fallback ensures the feature works on a stock machine. - -## Visual differentiation ("lobster-orange") - -The clawd browser should be obviously different at a glance: -- Profile name: **clawd** -- Profile color: **#FF4500** - -Preferred behavior: -- Seed/patch the profile's preferences on first launch so the color + name persist. - -Fallback behavior: -- If preferences patching is not reliable, open with the dedicated profile and let - the user set the profile color/name once via Chrome UI; it must persist because - the `userDataDir` is persistent. - -## Control server contract (vNext) - -Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage -state without touching the user's Chrome. - -Basics: -- `GET /` status payload (enabled/running/pid/cdpPort/etc) -- `POST /start` start browser -- `POST /stop` stop browser -- `GET /tabs` list tabs -- `POST /tabs/open` open a new tab -- `POST /tabs/focus` focus a tab by id/prefix -- `DELETE /tabs/:targetId` close a tab by id/prefix - -Inspection: -- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }` -- `GET /snapshot` `?format=aria|ai&targetId?&limit?` -- `GET /console` `?level?&targetId?` -- `POST /pdf` `{ targetId? 
}` - -Actions: -- `POST /navigate` -- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of: - - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate` - -Hooks (arming): -- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }` -- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }` - -### "Is it open or closed?" - -"Open" means: -- the control server is reachable at the configured URL **and** -- it reports a live browser connection. - -"Closed" means: -- control server not reachable, or server reports no browser. - -Clawdbot should treat "open/closed" as a health check (fast path), not by scanning -global Chrome processes (avoid false positives). - -## Multi-profile support - -Clawdbot supports multiple named browser profiles, each with: -- Dedicated CDP port (auto-allocated from 18800-18899) **or** a per-profile CDP URL -- Persistent user data directory (`~/.clawdbot/browser//user-data/`) -- Unique color for visual distinction - -### Configuration - -```json +```json5 { - "browser": { - "enabled": true, - "defaultProfile": "clawd", - "profiles": { - "clawd": { "cdpPort": 18800, "color": "#FF4500" }, - "work": { "cdpPort": 18801, "color": "#0066CC" }, - "remote": { "cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00" } + browser: { + enabled: true, // default: true + controlUrl: "http://127.0.0.1:18791", + cdpUrl: "http://127.0.0.1:18792", // defaults to controlUrl + 1 + defaultProfile: "clawd", + color: "#FF4500", + headless: false, + noSandbox: false, + attachOnly: false, + executablePath: "/Applications/Chromium.app/Contents/MacOS/Chromium", + profiles: { + clawd: { cdpPort: 18800, color: "#FF4500" }, + work: { cdpPort: 18801, color: "#0066CC" }, + remote: { cdpUrl: "http://10.0.0.42:9222", color: "#00AA00" } } } } ``` -### Profile actions +Notes: +- `controlUrl` defaults to `http://127.0.0.1:18791`. 
+- If you override the Gateway port (`gateway.port` or `CLAWDBOT_GATEWAY_PORT`), + the default browser ports shift to stay in the same “family” (control = gateway + 2). +- `cdpUrl` defaults to `controlUrl + 1` when unset. +- `attachOnly: true` means “never launch Chrome; only attach if it is already running.” -- `GET /profiles` — list all profiles with status -- `POST /profiles/create` `{ name, color?, cdpUrl? }` — create new profile (auto-allocates port if no `cdpUrl`) -- `DELETE /profiles/:name` — delete profile (stops browser + removes user data for local profiles) -- `POST /reset-profile?profile=` — kill orphan process on profile's port (local profiles only) +## Local vs remote control -### Profile parameter +- **Local control (default):** `controlUrl` is loopback (`127.0.0.1`/`localhost`). + The Gateway starts the control server and can launch Chrome. +- **Remote control:** `controlUrl` is non-loopback. The Gateway **does not** start + a local server; it assumes you are pointing at an existing server elsewhere. +- **Remote CDP:** set `browser.profiles..cdpUrl` (or `browser.cdpUrl`) to + attach to a remote Chrome. In this case, Clawdbot will not launch a local browser. -All existing endpoints accept optional `?profile=` query parameter: -- `GET /?profile=work` — status for work profile -- `POST /start?profile=work` — start work profile browser -- `GET /tabs?profile=work` — list tabs for work profile -- `POST /tabs/open?profile=work` — open tab in work profile -- etc. +## Profiles (multi-browser) -When `profile` is omitted, uses `browser.defaultProfile` (defaults to "clawd"). +Clawdbot supports multiple named profiles. Each profile has its own: +- user data directory +- CDP port (local) or CDP URL (remote) +- accent color -### Agent browser tool +Defaults: +- The `clawd` profile is auto-created if missing. +- Local CDP ports allocate from **18800–18899** by default. +- Deleting a profile moves its local data directory to Trash. 
-The `browser` tool accepts an optional `profile` parameter for all actions: +All control endpoints accept `?profile=`; the CLI uses `--browser-profile`. -```json -{ - "action": "open", - "targetUrl": "https://example.com", - "profile": "work" -} -``` +## Isolation guarantees -This routes the operation to the specified profile's browser instance. Omitting -`profile` uses the default profile. +- **Dedicated user data dir**: never touches your personal Chrome profile. +- **Dedicated ports**: avoids `9222` to prevent collisions with dev workflows. +- **Deterministic tab control**: target tabs by `targetId`, not “last tab”. -### Profile naming rules +## Browser selection -- Lowercase alphanumeric characters and hyphens only -- Must start with a letter or number (not a hyphen) -- Maximum 64 characters -- Examples: `clawd`, `work`, `my-project-1` +When launching locally, Clawdbot picks the first available: +1. Chrome Canary +2. Chromium +3. Chrome -### Port allocation +You can override with `browser.executablePath`. -Ports are allocated from range 18800-18899 (~100 profiles max). This is far more -than practical use — memory and CPU exhaustion occur well before port exhaustion. -Ports are allocated once at profile creation and persisted permanently. -Remote profiles are attach-only and do **not** use the local port range. -## Interaction with the agent (clawd) +Platforms: +- macOS: checks `/Applications` and `~/Applications`. +- Linux: looks for `google-chrome`, `chromium`, etc. +- Windows: checks common install locations. -The agent should use browser tools only when: -- enabled in settings -- control URL is configured +## Control API (optional) -If disabled, tools must fail fast with a friendly error ("Browser disabled in settings"). +If you want to integrate directly, the browser control server exposes a small +HTTP API: -The agent should not assume tabs are ephemeral. 
It should: -- call `browser.tabs.list` to discover existing tabs first -- reuse an existing tab when appropriate (e.g. a persistent "main" tab) -- avoid opening duplicate tabs unless asked +- Status/start/stop: `GET /`, `POST /start`, `POST /stop` +- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId` +- Snapshot/screenshot: `GET /snapshot`, `POST /screenshot` +- Actions: `POST /navigate`, `POST /act` +- Hooks: `POST /hooks/file-chooser`, `POST /hooks/dialog` +- Debugging: `GET /console`, `POST /pdf` -## CLI quick reference (one example each) +All endpoints accept `?profile=`. -All commands accept `--browser-profile ` to target a specific profile (default: `clawd`). +### Playwright requirement + +Some features (navigate/act/ai snapshot, element screenshots, PDF) require +Playwright. In embedded gateway builds, Playwright may be unavailable; those +endpoints return a clear 501 error. ARIA snapshots and basic screenshots still work. + +## CLI quick reference + +All commands accept `--browser-profile ` to target a specific profile. 
-Profile management: -- `clawdbot browser profiles` -- `clawdbot browser create-profile --name work` -- `clawdbot browser create-profile --name remote --cdp-url http://10.0.0.42:9222` -- `clawdbot browser delete-profile --name work` Basics: - `clawdbot browser status` - `clawdbot browser start` - `clawdbot browser stop` -- `clawdbot browser reset-profile` - `clawdbot browser tabs` - `clawdbot browser open https://example.com` - `clawdbot browser focus abcd1234` @@ -260,6 +148,8 @@ Inspection: - `clawdbot browser screenshot --ref 12` - `clawdbot browser snapshot` - `clawdbot browser snapshot --format aria --limit 200` +- `clawdbot browser console --level error` +- `clawdbot browser pdf` Actions: - `clawdbot browser navigate https://example.com` @@ -271,39 +161,27 @@ Actions: - `clawdbot browser drag 10 11` - `clawdbot browser select 9 OptionA OptionB` - `clawdbot browser upload /tmp/file.pdf` -- `clawdbot browser fill --fields '[{\"ref\":\"1\",\"value\":\"Ada\"}]'` +- `clawdbot browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'` - `clawdbot browser dialog --accept` - `clawdbot browser wait --text "Done"` - `clawdbot browser evaluate --fn '(el) => el.textContent' --ref 7` -- `clawdbot browser evaluate --fn "document.querySelector('.my-class').click()"` -- `clawdbot browser console --level error` -- `clawdbot browser pdf` Notes: -- `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog. -- `upload` can take a `ref` to auto-click after arming (useful for single-step file uploads). -- `upload` can also take `inputRef` (aria ref) or `element` (CSS selector) to set `` directly without waiting for a file chooser. -- The arm default timeout is **2 minutes** (clamped to max 2 minutes); pass `timeoutMs` if you need shorter. -- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging. 
-- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs. -- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on. +- `upload` and `dialog` are **arming** calls; run them before the click/press + that triggers the chooser/dialog. +- `upload` can also set file inputs directly via `--input-ref` or `--element`. +- `snapshot` defaults to `ai` when available; use `--format aria` for the + accessibility tree. +- `click`/`type` require a `ref` from `snapshot` (CSS selectors are intentionally + not supported for actions). -## Security & privacy notes +## Security & privacy -- The clawd browser profile is app-owned; it may contain logged-in sessions. - Treat it as sensitive data. -- The control server must bind to loopback only by default (`127.0.0.1`) unless the - user explicitly configures a non-loopback URL. -- Never reuse or copy the user's default Chrome profile. -- Remote CDP endpoints should be tunneled or protected; CDP is highly privileged. - -## Non-goals (for the first cut) - -- Cross-device "sync" of tabs between Mac and Pi. -- Sharing the user's logged-in Chrome sessions automatically. -- General-purpose web scraping; this is primarily for "close-the-loop" verification - and interaction. +- The clawd browser profile may contain logged-in sessions; treat it as sensitive. +- Keep control URLs loopback-only unless you intentionally expose the server. +- Remote CDP endpoints are powerful; tunnel and protect them. ## Troubleshooting -For Linux-specific issues (especially Ubuntu with snap Chromium), see [browser-linux-troubleshooting](/tools/browser-linux-troubleshooting). +For Linux-specific issues (especially snap Chromium), see +[Browser troubleshooting](/tools/browser-linux-troubleshooting). 
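The port arithmetic and `?profile=` routing in the new `docs/tools/browser.md` can be sketched as a small shell helper for calling the control API directly. This is a minimal, non-authoritative sketch under these assumptions:

- `browser.controlUrl` is left at its default.
- The Gateway port is `18789` unless `CLAWDBOT_GATEWAY_PORT` is set. This sketch ignores `gateway.port` in config.
- A control server is already running locally.

The helper names `browser_url` and `browser_api` are hypothetical. Request bodies, such as the one for `POST /tabs/open`, are not shown because the doc does not specify them.

```bash
# Sketch only: assumes the default local setup (no custom browser.controlUrl).
# browser_url and browser_api are hypothetical helper names, not part of clawdbot.

# Gateway port: 18789 unless overridden via CLAWDBOT_GATEWAY_PORT
# (a gateway.port value in config is NOT read by this sketch).
gateway_port="${CLAWDBOT_GATEWAY_PORT:-18789}"

# Default browser ports stay in the Gateway's "family":
# control = gateway + 2, CDP = control + 1.
control_port=$((gateway_port + 2))
cdp_port=$((control_port + 1))
control_url="http://127.0.0.1:${control_port}"

# Build a control-server URL, optionally scoped to a named profile.
# Appends with "&" when the path already has a query (e.g. /snapshot?format=aria).
browser_url() {
  local url
  url="${control_url}$1"
  if [ -n "${2:-}" ]; then
    case "$1" in
      *\?*) url="${url}&profile=$2" ;;
      *)    url="${url}?profile=$2" ;;
    esac
  fi
  printf '%s\n' "$url"
}

# Call the control API: browser_api METHOD PATH [PROFILE]
# -f fails on HTTP errors (e.g. the 501 returned when Playwright is unavailable).
browser_api() {
  curl -fsS -X "$1" "$(browser_url "$2" "${3:-}")"
}
```

For example, `browser_api POST /start work` followed by `browser_api GET /tabs work` starts and then lists tabs for the `work` profile. If you omit the profile argument, the server falls back to `browser.defaultProfile`.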
diff --git a/docs/tools/index.md b/docs/tools/index.md index a27731fc1..7f9b2402c 100644 --- a/docs/tools/index.md +++ b/docs/tools/index.md @@ -294,25 +294,12 @@ Node targeting: - Respect user consent for camera/screen capture. - Use `status/describe` to ensure permissions before invoking media commands. -## How the model sees tools (pi-mono internals) +## How tools are presented to the agent -Tools are exposed to the model in **two parallel channels**: +Tools are exposed in two parallel channels: -1) **System prompt text**: a human-readable list + guidelines. -2) **Provider tool schema**: the actual function/tool declarations sent to the model API. +1) **System prompt text**: a human-readable list + guidance. +2) **Tool schema**: the structured function definitions sent to the model API. -In pi-mono: -- System prompt builder: [`packages/coding-agent/src/core/system-prompt.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/src/core/system-prompt.ts) - - Builds the `Available tools:` list from `toolDescriptions`. - - Appends skills and project context. -- Tool schemas passed to providers: - - OpenAI: [`packages/ai/src/providers/openai-responses.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/openai-responses.ts) (`convertTools`) - - Anthropic: [`packages/ai/src/providers/anthropic.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/anthropic.ts) (`convertTools`) - - Gemini: [`packages/ai/src/providers/google-shared.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/google-shared.ts) (`convertTools`) -- Tool execution loop: - - Agent loop: [`packages/ai/src/agent/agent-loop.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/agent/agent-loop.ts) - - Validates tool arguments and executes tools, then appends `toolResult` messages. 
- -In Clawdbot: -- System prompt append: [`src/agents/system-prompt.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/system-prompt.ts) -- Tool list injected via `createClawdbotCodingTools()` in [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) +That means the agent sees both “what tools exist” and “how to call them.” If a tool +doesn’t appear in the system prompt or the schema, the model cannot call it. diff --git a/docs/web/tui.md b/docs/web/tui.md index df87d081d..fc5cdf5b9 100644 --- a/docs/web/tui.md +++ b/docs/web/tui.md @@ -65,8 +65,3 @@ Use SSH tunneling or Tailscale to reach the Gateway WS. ## Notes - The TUI shows Gateway chat deltas (`event: chat`) and agent tool events. - It registers as a Gateway client with `mode: "tui"` for presence and debugging. - -## Files -- CLI: [`src/cli/tui-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/tui-cli.ts) -- Runner: [`src/tui/tui.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/tui.ts) -- Gateway client: [`src/tui/gateway-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/gateway-chat.ts)