docs: refresh and simplify docs
This commit is contained in:
@@ -171,9 +171,9 @@ Notes:
|
||||
|
||||
Recommended: `clawdbot hooks gmail run` wraps the same flow and auto-renews the watch.
|
||||
|
||||
## Expose the handler (dev, unsupported hack)
|
||||
## Expose the handler (advanced, unsupported)
|
||||
|
||||
If you insist on a non-Tailscale tunnel, wire it manually and use the public URL in the push
|
||||
If you need a non-Tailscale tunnel, wire it manually and use the public URL in the push
|
||||
subscription (unsupported, no guardrails):
|
||||
|
||||
```bash
|
||||
|
||||
@@ -7,8 +7,7 @@ read_when:
|
||||
|
||||
# CLI reference
|
||||
|
||||
This page mirrors `src/cli/*` and is the source of truth for CLI behavior.
|
||||
If you change the CLI code, update this doc.
|
||||
This page describes the current CLI behavior. If commands change, update this doc.
|
||||
|
||||
## Global flags
|
||||
|
||||
@@ -25,7 +24,7 @@ If you change the CLI code, update this doc.
|
||||
|
||||
## Color palette
|
||||
|
||||
Clawdbot uses a lobster palette for CLI output. Source of truth: `src/terminal/theme.ts`.
|
||||
Clawdbot uses a lobster palette for CLI output.
|
||||
|
||||
- `accent` (#FF5A2D): headings, provider labels, primary highlights.
|
||||
- `accentBright` (#FF7A3D): command names, emphasis.
|
||||
|
||||
@@ -5,11 +5,11 @@ read_when:
|
||||
---
|
||||
# Agent Loop (Clawdis)
|
||||
|
||||
Short, exact flow of one agent run. Source of truth: current code in `src/`.
|
||||
Short, exact flow of one agent run.
|
||||
|
||||
## Entry points
|
||||
- Gateway RPC: `agent` and `agent.wait` in [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts).
|
||||
- CLI: `agentCommand` in [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts).
|
||||
- Gateway RPC: `agent` and `agent.wait`.
|
||||
- CLI: `agent` command.
|
||||
|
||||
## High-level flow
|
||||
1) `agent` RPC validates params, resolves session (sessionKey/sessionId), persists session metadata, returns `{ runId, acceptedAt }` immediately.
|
||||
@@ -37,10 +37,8 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`.
|
||||
- `tool`: streamed tool events from pi-agent-core
|
||||
|
||||
## Chat provider handling
|
||||
- `createAgentEventHandler` in [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts):
|
||||
- buffers assistant deltas
|
||||
- emits chat `delta` messages
|
||||
- emits chat `final` when **lifecycle end/error** arrives
|
||||
- Assistant deltas are buffered into chat `delta` messages.
|
||||
- A chat `final` is emitted on **lifecycle end/error**.
|
||||
|
||||
## Timeouts
|
||||
- `agent.wait` default: 30s (just the wait). `timeoutMs` param overrides.
|
||||
@@ -51,11 +49,3 @@ Short, exact flow of one agent run. Source of truth: current code in `src/`.
|
||||
- AbortSignal (cancel)
|
||||
- Gateway disconnect or RPC timeout
|
||||
- `agent.wait` timeout (wait-only, does not stop agent)
|
||||
|
||||
## Files
|
||||
- [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts)
|
||||
- [`src/gateway/server-methods/agent-job.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent-job.ts)
|
||||
- [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts)
|
||||
- [`src/agents/pi-embedded-runner.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-runner.ts)
|
||||
- [`src/agents/pi-embedded-subscribe.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-subscribe.ts)
|
||||
- [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts)
|
||||
|
||||
@@ -5,7 +5,7 @@ read_when:
|
||||
---
|
||||
# Agent Runtime 🤖
|
||||
|
||||
CLAWDBOT runs a single embedded agent runtime derived from **p-mono** (internal name: **p**).
|
||||
CLAWDBOT runs a single embedded agent runtime derived from **p-mono**.
|
||||
|
||||
## Workspace (required)
|
||||
|
||||
@@ -43,9 +43,9 @@ To disable bootstrap file creation entirely (for pre-seeded workspaces), set:
|
||||
{ agent: { skipBootstrap: true } }
|
||||
```
|
||||
|
||||
## Built-in tools (internal)
|
||||
## Built-in tools
|
||||
|
||||
p’s embedded core tools (read/bash/edit/write and related internals) are defined in code and always available. `TOOLS.md` does **not** control which tools exist; it’s guidance for how *you* want them used.
|
||||
Core tools (read/bash/edit/write and related system tools) are always available. `TOOLS.md` does **not** control which tools exist; it’s guidance for how *you* want them used.
|
||||
|
||||
## Skills
|
||||
|
||||
@@ -63,18 +63,6 @@ Clawdbot reuses pieces of the p-mono codebase (models/tools), but **session mana
|
||||
- No p-coding agent runtime.
|
||||
- No `~/.pi/agent` or `<workspace>/.pi` settings are consulted.
|
||||
|
||||
## Peter @ steipete (only)
|
||||
|
||||
Apply these notes **only** when the user is Peter Steinberger at steipete.
|
||||
|
||||
- Gateway runs on the **Mac Studio in London**.
|
||||
- Primary work computer: **MacBook Pro**.
|
||||
- Peter travels between **Vienna** and **London**; there are two networks bridged via **Tailscale**.
|
||||
- For debugging, connect to the Mac Studio (London) or MacBook Pro (primary).
|
||||
- There is also an **M1 MacBook Pro** on the Vienna tailnet you can use to access the Vienna network.
|
||||
- Nodes can be accessed via the `clawdbot` binary (`pnpm clawdbot` in `~/Projects/clawdbot`).
|
||||
- See also `skills/clawdbot*` for node/browser/canvas/cron usage.
|
||||
|
||||
## Sessions
|
||||
|
||||
Session transcripts are stored as JSONL at:
|
||||
|
||||
@@ -3,67 +3,55 @@ summary: "WebSocket gateway architecture, components, and client flows"
|
||||
read_when:
|
||||
- Working on gateway protocol, clients, or transports
|
||||
---
|
||||
# Gateway Architecture
|
||||
# Gateway architecture
|
||||
|
||||
Last updated: 2026-01-05
|
||||
|
||||
## Overview
|
||||
- A single long-lived **Gateway** process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Slack via Bolt, Discord via discord.js, Signal via signal-cli, iMessage via imsg, WebChat) and the control/event plane.
|
||||
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: **WebSocket on the configured bind host** (default `127.0.0.1:18789`; tunnel or VPN for remote).
|
||||
- One Gateway per host; it is the only place that is allowed to open a WhatsApp session. All sends/agent runs go through it.
|
||||
- By default: the Gateway exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
|
||||
|
||||
## Implementation snapshot (current code)
|
||||
|
||||
### TypeScript Gateway ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts))
|
||||
- Single HTTP + WebSocket server (default `18789`); bind policy `loopback|lan|tailnet|auto`. Refuses non-loopback binds without auth; Tailscale serve/funnel requires loopback.
|
||||
- Handshake: first frame must be a `connect` request; AJV validates request + params against TypeBox schemas; protocol negotiated via `minProtocol`/`maxProtocol`.
|
||||
- `hello-ok` includes snapshot (presence/health/stateVersion/uptime/configPath/stateDir), features (methods/events), policy (max payload/buffer/tick), and `canvasHostUrl` when available.
|
||||
- Events emitted: `agent`, `chat`, `presence`, `tick`, `health`, `heartbeat`, `cron`, `talk.mode`, `node.pair.requested`, `node.pair.resolved`, `voicewake.changed`, `shutdown`.
|
||||
- Idempotency keys are required for `send`, `agent`, `chat.send`, and node invokes; the dedupe cache avoids double-sends on reconnects. Payload sizes are capped per connection.
|
||||
- Optional node bridge ([`src/infra/bridge/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bridge/server.ts)): TCP JSONL frames (`hello`, `pair-request`, `req/res`, `event`, `invoke`, `ping`). Node connect/disconnect updates presence and flows into the session bus.
|
||||
- Control UI + Canvas host: HTTP serves Control UI (base path configurable) and can host the A2UI canvas via [`src/canvas-host/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/canvas-host/server.ts) (live reload). Canvas host URL is advertised to nodes + clients.
|
||||
|
||||
### iOS node (`apps/ios`)
|
||||
- Discovery + pairing: `BridgeDiscoveryModel` uses `NWBrowser` Bonjour discovery and reads TXT fields for LAN/tailnet host hints plus gateway/bridge/canvas ports.
|
||||
- Auto-connect: `BridgeConnectionController` uses stored `node.instanceId` + Keychain token; supports manual host/port; performs `pair-and-hello`.
|
||||
- Bridge runtime: `BridgeSession` actor owns an `NWConnection`, JSONL frames, `hello`/`hello-ok`, ping/pong, `req/res`, server `event`s, and `invoke` callbacks; stores `canvasHostUrl`.
|
||||
- Commands: `NodeAppModel` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, `screen.record`, `location.get`. Canvas/camera/screen are blocked when backgrounded.
|
||||
- Canvas + actions: `WKWebView` with A2UI action bridge; accepts actions from local-network or trusted file URLs; intercepts `clawdbot://` deep links and forwards `agent.request` to the bridge.
|
||||
- Voice/talk: voice wake sends `voice.transcript` events and syncs triggers via `voicewake.get` + `voicewake.changed`; Talk Mode attaches to the bridge.
|
||||
|
||||
### Android node (`apps/android`)
|
||||
- Discovery + pairing: `BridgeDiscovery` uses mDNS/NSD to find `_clawdbot-bridge._tcp`, with manual host/port fallback.
|
||||
- Auto-connect: `NodeRuntime` restores a stored token, performs `pair-and-hello`, and reconnects to the last discovered or manual bridge.
|
||||
- Bridge runtime: `BridgeSession` owns the TCP JSONL session (`hello`/`hello-ok`, ping/pong, `req/res`, `event`, `invoke`); stores `canvasHostUrl`.
|
||||
- Commands: `NodeRuntime` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, and chat/session events; foreground-only for canvas/camera.
|
||||
- A single long‑lived **Gateway** owns all messaging surfaces (WhatsApp via
|
||||
Baileys, Telegram via grammY, Slack, Discord, Signal, iMessage, WebChat).
|
||||
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over
|
||||
**one transport: WebSocket** on the configured bind host (default
|
||||
`127.0.0.1:18789`).
|
||||
- One Gateway per host; it is the only place that opens a WhatsApp session.
|
||||
- A **bridge** (default `18790`) is used for nodes (macOS/iOS/Android).
|
||||
- A **canvas host** (default `18793`) serves agent‑editable HTML and A2UI.
|
||||
|
||||
## Components and flows
|
||||
- **Gateway (daemon)**
|
||||
- Maintains WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Signal (signal-cli), and iMessage (imsg) connections.
|
||||
- Exposes a typed WS API (req/resp + server push events).
|
||||
- Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`.
|
||||
- **Clients (mac app / CLI / web admin)**
|
||||
- One WS connection per client.
|
||||
- Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
|
||||
- On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see [`docs/macos.md`](/platforms/macos)).
|
||||
- **Agent process (Pi)**
|
||||
- Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection.
|
||||
- **WebChat**
|
||||
- Serves static assets locally.
|
||||
- Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS.
|
||||
- Remote use goes through the same SSH/Tailscale tunnel as other clients.
|
||||
|
||||
### Gateway (daemon)
|
||||
- Maintains provider connections.
|
||||
- Exposes a typed WS API (requests, responses, server‑push events).
|
||||
- Validates inbound frames against JSON Schema.
|
||||
- Emits events like `agent`, `chat`, `presence`, `health`, `heartbeat`, `cron`.
|
||||
|
||||
### Clients (mac app / CLI / web admin)
|
||||
- One WS connection per client.
|
||||
- Send requests (`health`, `status`, `send`, `agent`, `system-presence`).
|
||||
- Subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
|
||||
|
||||
### Nodes (macOS / iOS / Android)
|
||||
- Connect to the **bridge** (TCP JSONL) rather than the WS server.
|
||||
- Pair with the Gateway to receive a token.
|
||||
- Expose commands like `canvas.*`, `camera.*`, `screen.record`, `location.get`.
|
||||
|
||||
### WebChat
|
||||
- Static UI that uses the Gateway WS API for chat history and sends.
|
||||
- In remote setups, connects through the same SSH/Tailscale tunnel as other
|
||||
clients.
|
||||
|
||||
## Connection lifecycle (single client)
|
||||
|
||||
```
|
||||
Client Gateway
|
||||
| |
|
||||
|---- req:connect -------->|
|
||||
|<------ res (ok) ---------| (or res error + close)
|
||||
| (payload=hello-ok carries snapshot: presence + health)
|
||||
| (payload=hello-ok carries snapshot: presence + health)
|
||||
| |
|
||||
|<------ event:presence ---| (deltas)
|
||||
|<------ event:tick -------| (keepalive/no-op)
|
||||
|<------ event:presence ---|
|
||||
|<------ event:tick -------|
|
||||
| |
|
||||
|------- req:agent ------->|
|
||||
|<------ res:agent --------| (ack: {runId,status:"accepted"})
|
||||
@@ -71,44 +59,42 @@ Client Gateway
|
||||
|<------ res:agent --------| (final: {runId,status,summary})
|
||||
| |
|
||||
```
|
||||
|
||||
## Wire protocol (summary)
|
||||
|
||||
- Transport: WebSocket, text frames with JSON payloads.
|
||||
- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
|
||||
- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes.
|
||||
- After handshake:
|
||||
- Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
|
||||
- Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}`
|
||||
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation.
|
||||
- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
|
||||
- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`.
|
||||
- Protocol versions are bumped on breaking changes; clients must match `minClient`; Gateway chooses within client’s min/max.
|
||||
- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache.
|
||||
- Policy in `hello-ok` communicates payload/queue limits and tick interval.
|
||||
- First frame **must** be `connect`.
|
||||
- After handshake:
|
||||
- Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
|
||||
- Events: `{type:"event", event, payload, seq?, stateVersion?}`
|
||||
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token`
|
||||
must match or the socket closes.
|
||||
- Idempotency keys are required for side‑effecting methods (`send`, `agent`) to
|
||||
safely retry; the server keeps a short‑lived dedupe cache.
|
||||
|
||||
## Type system and codegen
|
||||
- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server.
|
||||
- Build step emits JSON Schema.
|
||||
- Clients:
|
||||
- TypeScript: uses the same TypeBox types directly.
|
||||
- Swift: generated `Codable` models via quicktype from the JSON Schema.
|
||||
- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming.
|
||||
## Protocol typing and codegen
|
||||
|
||||
## Invariants
|
||||
- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends.
|
||||
- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close.
|
||||
- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`.
|
||||
- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries.
|
||||
- TypeBox schemas define the protocol.
|
||||
- JSON Schema is generated from those schemas.
|
||||
- Swift models are generated from the JSON Schema.
|
||||
|
||||
## Remote access
|
||||
- Preferred: Tailscale or VPN; alternate: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`.
|
||||
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
|
||||
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
|
||||
|
||||
- Preferred: Tailscale or VPN.
|
||||
- Alternative: SSH tunnel
|
||||
```bash
|
||||
ssh -N -L 18789:127.0.0.1:18789 user@host
|
||||
```
|
||||
- The same handshake + auth token apply over the tunnel.
|
||||
|
||||
## Operations snapshot
|
||||
- Start: `clawdbot gateway` (foreground, logs to stdout).
|
||||
Supervise with launchd/systemd for restarts.
|
||||
- Health: request `health` over WS; also surfaced in `hello-ok.health`.
|
||||
- Metrics/logging: keep outside this spec; gateway should expose Prometheus text or structured logs separately.
|
||||
|
||||
## Migration notes
|
||||
- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped.
|
||||
- Start: `clawdbot gateway` (foreground, logs to stdout).
|
||||
- Health: `health` over WS (also included in `hello-ok`).
|
||||
- Supervision: launchd/systemd for auto‑restart.
|
||||
|
||||
## Invariants
|
||||
|
||||
- Exactly one Gateway controls a single Baileys session per host.
|
||||
- Handshake is mandatory; any non‑JSON or non‑connect first frame is a hard close.
|
||||
- Events are not replayed; clients must refresh on gaps.
|
||||
|
||||
@@ -61,7 +61,6 @@ Only the owner number (from `whatsapp.allowFrom`, or the bot’s own E.164 when
|
||||
4) Session-level directives (`/verbose on`, `/think high`, `/new` or `/reset`, `/compact`) apply only to that group’s session; send them as standalone messages so they register. Your personal DM session remains independent.
|
||||
|
||||
## Testing / verification
|
||||
- Automated: `pnpm test -- src/web/auto-reply.test.ts --runInBand` (covers mention gating, history injection, sender suffix).
|
||||
- Manual smoke:
|
||||
- Send an `@clawd` ping in the group and confirm a reply that references the sender name.
|
||||
- Send a second ping and verify the history block is included then cleared on the next turn.
|
||||
|
||||
@@ -1,157 +1,114 @@
|
||||
---
|
||||
summary: "Plan for models CLI: scan, list, aliases, fallbacks, status"
|
||||
summary: "Models CLI: list, set, aliases, fallbacks, scan, status"
|
||||
read_when:
|
||||
- Adding or modifying models CLI (models list/set/scan/aliases/fallbacks)
|
||||
- Changing model fallback behavior or selection UX
|
||||
- Updating model scan probes (tools/images)
|
||||
---
|
||||
# Models CLI plan
|
||||
# Models CLI
|
||||
|
||||
See [`docs/model-failover.md`](/concepts/model-failover) for how auth profiles rotate (OAuth vs API keys), cooldowns, and how that interacts with model fallbacks.
|
||||
See [/concepts/model-failover](/concepts/model-failover) for auth profile
|
||||
rotation, cooldowns, and how that interacts with fallbacks.
|
||||
|
||||
Goal: give clear model visibility + control (configured vs available), plus scan tooling
|
||||
that prefers tool-call + image-capable models and maintains ordered fallbacks.
|
||||
|
||||
## How Clawdbot models work (quick explainer)
|
||||
## How model selection works
|
||||
|
||||
Clawdbot selects models in this order:
|
||||
1) The configured **primary** model (`agent.model.primary`).
|
||||
2) If it fails, fallbacks in `agent.model.fallbacks` (in order).
|
||||
3) Auth failover happens **inside** the provider first (see [/concepts/model-failover](/concepts/model-failover)).
|
||||
|
||||
Key pieces:
|
||||
- `provider/model` is the canonical model id (e.g. `anthropic/claude-opus-4-5`).
|
||||
- `agent.models` is the **allowlist/catalog** of models Clawdbot can use, with optional aliases and provider params.
|
||||
- `agent.imageModel` is only used when the primary model **can’t** accept images.
|
||||
- `models.providers` lets you add custom providers + models (written to `models.json`).
|
||||
- `/model <id>` switches the active model for the current session; `/model list` shows what’s allowed.
|
||||
1) **Primary** model (`agent.model.primary` or `agent.model`).
|
||||
2) **Fallbacks** in `agent.model.fallbacks` (in order).
|
||||
3) **Provider auth failover** happens inside a provider before moving to the
|
||||
next model.
|
||||
|
||||
Related:
|
||||
- Context limits are model-specific; long sessions may trigger compaction. See [/concepts/compaction](/concepts/compaction).
|
||||
- `agent.models` is the allowlist/catalog of models Clawdbot can use (plus aliases).
|
||||
- `agent.imageModel` is used **only when** the primary model can’t accept images.
|
||||
|
||||
## Model recommendations
|
||||
## Config keys (overview)
|
||||
|
||||
- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): default primary for assistant + general work. It’s pricey and cap-prone, so consider the [Claude Max $200 subscription](https://www.anthropic.com/pricing/) if you live here.
|
||||
- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): default fallback when Opus caps out. Similar behavior with fewer limit headaches.
|
||||
- [GPT-5.2-Codex](https://developers.openai.com/codex/models): recommended for coding and sub-agents. Prefer the [Codex CLI](https://developers.openai.com/codex/cli) if you want the strongest feel.
|
||||
- `agent.model.primary` and `agent.model.fallbacks`
|
||||
- `agent.imageModel.primary` and `agent.imageModel.fallbacks`
|
||||
- `agent.models` (allowlist + aliases + provider params)
|
||||
- `models.providers` (custom providers written into `models.json`)
|
||||
|
||||
Suggested stacks:
|
||||
- Assistant-first: Opus 4.5 primary → Sonnet 4.5 fallback.
|
||||
- Agentic coding: Opus 4.5 primary → GPT-5.2-Codex for sub-agents → Sonnet 4.5 fallback.
|
||||
Model refs are normalized to lowercase. Provider aliases like `z.ai/*` normalize
|
||||
to `zai/*`.
|
||||
|
||||
## Model discussions (community notes)
|
||||
## CLI commands
|
||||
|
||||
Anecdotal notes from the Discord thread on January 4–5, 2026. Treat as “reported by users,” not a benchmark.
|
||||
```bash
|
||||
clawdbot models list
|
||||
clawdbot models status
|
||||
clawdbot models set <provider/model>
|
||||
clawdbot models set-image <provider/model>
|
||||
|
||||
**Reported working well**
|
||||
- [Claude Opus 4.5](https://www.anthropic.com/claude/opus): best overall quality in Clawdbot, especially for “assistant” work. Tradeoff is cost and hitting usage limits quickly.
|
||||
- [Claude Sonnet 4.5](https://www.anthropic.com/claude/sonnet): common fallback when Opus caps out. Similar behavior with fewer limit headaches.
|
||||
- [Gemini 3 Pro](https://deepmind.google/en/models/gemini/pro/): some users felt it maps well to Clawdbot’s structure. Vibe was “fits the framework” more than “best at everything.”
|
||||
- [GLM](https://www.zhipuai.cn/en/): used successfully as a worker model under orchestration. Seen as strong for delegated/secondary tasks, not the primary brain.
|
||||
- [MiniMax M2.1](https://platform.minimax.io/docs/guides/models-intro): “good enough” for grunt work or a cheap fallback. Community nickname was “Temu-Sonnet,” i.e. usable but not Sonnet-level polish.
|
||||
clawdbot models aliases list
|
||||
clawdbot models aliases add <alias> <provider/model>
|
||||
clawdbot models aliases remove <alias>
|
||||
|
||||
**Mixed / unclear**
|
||||
- [Antigravity](https://blog.google/technology/ai/google-ai-updates-november-2025/) (Claude Opus access): some reported extra Opus quota. Pricing/limits were unclear, so the value is hard to predict.
|
||||
clawdbot models fallbacks list
|
||||
clawdbot models fallbacks add <provider/model>
|
||||
clawdbot models fallbacks remove <provider/model>
|
||||
clawdbot models fallbacks clear
|
||||
|
||||
**Reported weak in Clawdbot**
|
||||
- [GPT-5.2-Codex](https://developers.openai.com/codex/models) inside Clawdbot: reported as rough for conversation/assistant tasks when embedded. Same notes said Codex felt stronger via the [Codex CLI](https://developers.openai.com/codex/cli) than embedded use.
|
||||
- [Grok](https://docs.x.ai/docs/models/grok-4): people tried it and then abandoned it. No strong upside showed up in the notes.
|
||||
clawdbot models image-fallbacks list
|
||||
clawdbot models image-fallbacks add <provider/model>
|
||||
clawdbot models image-fallbacks remove <provider/model>
|
||||
clawdbot models image-fallbacks clear
|
||||
```
|
||||
|
||||
**Theme**
|
||||
- Token burn feels higher than expected in long sessions; people suspect context buildup + tool outputs. Pruning/compaction helps. Check session logs before blaming providers. See [/concepts/session](/concepts/session) and [/concepts/model-failover](/concepts/model-failover).
|
||||
`clawdbot models` (no subcommand) is a shortcut for `models status`.
|
||||
|
||||
Want a tailored stack? Share whether you’re using Clawdbot or Clawdis and your main workload (agentic coding vs “assistant” work), and we can suggest a primary + fallback set based on these reports.
|
||||
### `models list`
|
||||
|
||||
## Models CLI
|
||||
Shows configured models by default. Useful flags:
|
||||
|
||||
See [/cli](/cli) for the full command tree and CLI flags.
|
||||
- `--all`: full catalog
|
||||
- `--local`: local providers only
|
||||
- `--provider <name>`: filter by provider
|
||||
- `--plain`: one model per line
|
||||
- `--json`: machine‑readable output
|
||||
|
||||
### CLI output (list + status)
|
||||
### `models status`
|
||||
|
||||
`clawdbot models list` (default) prints a table with these columns:
|
||||
- `Model`: `provider/model` key (truncated in TTY).
|
||||
- `Input`: `text` or `text+image`.
|
||||
- `Ctx`: context window in K tokens (from the model registry).
|
||||
- `Local`: `yes/no` when the provider base URL is local.
|
||||
- `Auth`: `yes/no` when the provider has usable auth.
|
||||
- `Tags`: origin + role hints.
|
||||
Shows the resolved primary model, fallbacks, image model, and an auth overview
|
||||
of configured providers. `--plain` prints only the resolved primary model.
|
||||
|
||||
Common tags:
|
||||
- `default` — resolved default model.
|
||||
- `fallback#N` — `agent.model.fallbacks` order.
|
||||
- `image` — `agent.imageModel.primary`.
|
||||
- `img-fallback#N` — `agent.imageModel.fallbacks` order.
|
||||
- `configured` — present in `agent.models`.
|
||||
- `alias:<name>` — alias from `agent.models.*.alias`.
|
||||
- `missing` — referenced in config but not found in the registry.
|
||||
## Scanning (OpenRouter free models)
|
||||
|
||||
Output formats:
|
||||
- `--plain`: prints only `provider/model` keys (one per line).
|
||||
- `--json`: `{ count, models: [{ key, name, input, contextWindow, local, available, tags, missing }] }`.
|
||||
`clawdbot models scan` inspects OpenRouter’s **free model catalog** and can
|
||||
optionally probe models for tool and image support.
|
||||
|
||||
`clawdbot models status` prints the resolved defaults, fallbacks, image model, aliases,
|
||||
and an **Auth overview** section showing which providers have profiles/env/models.json keys.
|
||||
`--plain` prints the resolved default model only; `--json` returns a structured object for tooling.
|
||||
Key flags:
|
||||
|
||||
## Config changes
|
||||
- `--no-probe`: skip live probes (metadata only)
|
||||
- `--min-params <b>`: minimum parameter size (billions)
|
||||
- `--max-age-days <days>`: skip older models
|
||||
- `--provider <name>`: provider prefix filter
|
||||
- `--max-candidates <n>`: fallback list size
|
||||
- `--set-default`: set `agent.model.primary` to the first selection
|
||||
- `--set-image`: set `agent.imageModel.primary` to the first image selection
|
||||
|
||||
- `agent.models` (configured model catalog + aliases).
|
||||
- `agent.models.*.params` (provider-specific API params passed through to requests).
|
||||
- `agent.model.primary` + `agent.model.fallbacks`.
|
||||
- `agent.imageModel.primary` + `agent.imageModel.fallbacks` (optional).
|
||||
- `auth.profiles` + `auth.order` for per-provider auth failover.
|
||||
Probing requires an OpenRouter API key (from auth profiles or
|
||||
`OPENROUTER_API_KEY`). Without a key, use `--no-probe` to list candidates only.
|
||||
|
||||
## Scan behavior (models scan)
|
||||
Scan results are ranked by:
|
||||
1) Image support
|
||||
2) Tool latency
|
||||
3) Context size
|
||||
4) Parameter count
|
||||
|
||||
<<<<<<< HEAD
|
||||
Input
|
||||
- OpenRouter `/models` list (filter `:free`)
|
||||
- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY` (see [/environment](/environment))
|
||||
- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates`
|
||||
- Probe controls: `--timeout`, `--concurrency`
|
||||
|
||||
Probes (direct pi-ai complete)
|
||||
- Tool-call probe (required):
|
||||
- Provide a dummy tool, verify tool call emitted.
|
||||
- Image probe (preferred):
|
||||
- Prompt includes 1x1 PNG; success if no "unsupported image" error.
|
||||
When run in a TTY, you can select fallbacks interactively. In non‑interactive
|
||||
mode, pass `--yes` to accept defaults.
|
||||
|
||||
Scoring/selection
|
||||
- Prefer models passing tool + image for text/tool fallbacks.
|
||||
- Prefer image-only models for image tool fallback (even if tool probe fails).
|
||||
- Rank by: image ok, then lower tool latency, then larger context, then params.
|
||||
## Models registry (`models.json`)
|
||||
|
||||
Interactive selection (TTY)
|
||||
- Multiselect list with per-model stats:
|
||||
- model id, tool ok, image ok, median latency, context, inferred params.
|
||||
- Pre-select top N (default 6).
|
||||
- Non-TTY: auto-select; require `--yes`/`--no-input` to apply.
|
||||
|
||||
Output
|
||||
- Writes `agent.model.fallbacks` ordered.
|
||||
- Writes `agent.imageModel.fallbacks` ordered (image-capable models).
|
||||
- Ensures `agent.models` entries exist for selected models.
|
||||
- Optional `--set-default` to set `agent.model.primary`.
|
||||
- Optional `--set-image` to set `agent.imageModel.primary`.
|
||||
|
||||
## Runtime fallback
|
||||
|
||||
- On model failure: try `agent.model.fallbacks` in order.
|
||||
- Per-provider auth failover uses `auth.order` (or stored profile order) **before**
|
||||
moving to the next model.
|
||||
- Image routing uses `agent.imageModel` **only when configured** and the primary
|
||||
model lacks image input.
|
||||
- Persist last successful provider/model to session entry; auth profile success is global.
|
||||
- See [`docs/model-failover.md`](/concepts/model-failover) for auth profile rotation, cooldowns, and timeout handling.
|
||||
|
||||
## Tests
|
||||
|
||||
- Unit: scan selection ordering + probe classification.
|
||||
- CLI: list/aliases/fallbacks add/remove + scan writes config.
|
||||
- Status: shows last used model + fallbacks.
|
||||
|
||||
## Docs
|
||||
|
||||
- Update [`docs/configuration.md`](/gateway/configuration) with `agent.models` + `agent.model` + `agent.imageModel`.
|
||||
- Keep this doc current when CLI surface or scan logic changes.
|
||||
- Note provider aliases like `z.ai/*` -> `zai/*` when relevant.
|
||||
- Provider ids in model refs are normalized to lowercase.
|
||||
Custom providers in `models.providers` are written into `models.json` under the
|
||||
agent directory (default `~/.clawdbot/agents/<agentId>/models.json`). This file
|
||||
is merged by default unless `models.mode` is set to `replace`.
|
||||
|
||||
@@ -97,7 +97,7 @@ At runtime:
|
||||
- if `expires` is in the future → use the stored access token
|
||||
- if expired → refresh (under a file lock) and overwrite the stored credentials
|
||||
|
||||
See implementation: `src/agents/auth-profiles.ts`.
|
||||
The refresh flow is automatic; you generally don’t need to manage tokens manually.
|
||||
|
||||
## Multiple accounts (profiles) + routing
|
||||
|
||||
|
||||
@@ -7,127 +7,93 @@ read_when:
|
||||
---
|
||||
# Presence
|
||||
|
||||
Clawdbot “presence” is a lightweight, best-effort view of:
|
||||
- The **Gateway** itself (one per host), and
|
||||
- The **clients connected to the Gateway** (mac app, WebChat, CLI, etc.).
|
||||
Clawdbot “presence” is a lightweight, best‑effort view of:
|
||||
- the **Gateway** itself, and
|
||||
- **clients connected to the Gateway** (mac app, WebChat, CLI, etc.)
|
||||
|
||||
Presence is used primarily to render the mac app’s **Instances** tab and to provide quick operator visibility.
|
||||
Presence is used primarily to render the macOS app’s **Instances** tab and to
|
||||
provide quick operator visibility.
|
||||
|
||||
## The data model
|
||||
## Presence fields (what shows up)
|
||||
|
||||
Presence entries are structured objects with (some) fields:
|
||||
- `instanceId` (optional but strongly recommended): stable client identity used for dedupe
|
||||
- `host`: a human-readable name (often the machine name)
|
||||
- `ip`: best-effort IP address (may be missing or stale)
|
||||
Presence entries are structured objects with fields like:
|
||||
|
||||
- `instanceId` (optional but strongly recommended): stable client identity
|
||||
- `host`: human‑friendly host name
|
||||
- `ip`: best‑effort IP address
|
||||
- `version`: client version string
|
||||
- `deviceFamily` (optional): hardware family like `iPad`, `iPhone`, `Mac`
|
||||
- `modelIdentifier` (optional): hardware model identifier like `iPad16,6` or `Mac16,6`
|
||||
- `mode`: e.g. `gateway`, `app`, `webchat`, `cli`
|
||||
- `lastInputSeconds` (optional): “seconds since last user input” for that client machine
|
||||
- `reason`: a short marker like `self`, `connect`, `node-connected`, `node-disconnected`, `periodic`, `instances-refresh`
|
||||
- `text`: legacy/debug summary string (kept for backwards compatibility and UI display)
|
||||
- `deviceFamily` / `modelIdentifier`: hardware hints
|
||||
- `mode`: `gateway`, `app`, `webchat`, `cli`, `node`, ...
|
||||
- `lastInputSeconds`: “seconds since last user input” (if known)
|
||||
- `reason`: `self`, `connect`, `node-connected`, `periodic`, ...
|
||||
- `ts`: last update timestamp (ms since epoch)
|
||||
|
||||
## Producers (where presence comes from)
|
||||
|
||||
Presence entries are produced by multiple sources and then **merged**.
|
||||
Presence entries are produced by multiple sources and **merged**.
|
||||
|
||||
### 1) Gateway self entry
|
||||
|
||||
The Gateway seeds a “self” entry at startup so UIs always show at least the current gateway host.
|
||||
The Gateway always seeds a “self” entry at startup so UIs show the gateway host
|
||||
even before any clients connect.
|
||||
|
||||
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`initSelfPresence()`).
|
||||
### 2) WebSocket connect
|
||||
|
||||
### 2) WebSocket connect (connection-derived presence)
|
||||
Every WS client begins with a `connect` request. On successful handshake the
|
||||
Gateway upserts a presence entry for that connection.
|
||||
|
||||
Every WS client must begin with a `connect` request. On successful handshake, the Gateway upserts a presence entry for that connection.
|
||||
#### Why one‑off CLI commands don’t show up
|
||||
|
||||
This is meant to answer: “Which clients are currently connected?”
|
||||
The CLI often connects for short, one‑off commands. To avoid spamming the
|
||||
Instances list, `client.mode === "cli"` is **not** turned into a presence entry.
|
||||
|
||||
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (connect handling uses `connect.params.client.instanceId` when provided; otherwise falls back to `connId`).
|
||||
### 3) `system-event` beacons
|
||||
|
||||
#### Why one-off CLI commands do not show up
|
||||
Clients can send richer periodic beacons via the `system-event` method. The mac
|
||||
app uses this to report host name, IP, and `lastInputSeconds`.
|
||||
|
||||
The CLI connects to the Gateway to execute one-off commands (health/status/send/agent/etc.). These are not “nodes” and would spam the Instances list, so the Gateway does not create presence entries for clients with `client.mode === "cli"`.
|
||||
|
||||
### 3) `system-event` beacons (client-reported presence)
|
||||
|
||||
Clients can publish richer periodic beacons via the `system-event` method. The mac app uses this to report:
|
||||
- a human-friendly host name
|
||||
- its best-known IP address
|
||||
- `lastInputSeconds`
|
||||
|
||||
Implementation:
|
||||
- Gateway: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) handles method `system-event` by calling `updateSystemPresence(...)`.
|
||||
- mac app beaconing: [`apps/macos/Sources/Clawdbot/PresenceReporter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/PresenceReporter.swift).
|
||||
|
||||
### 4) Node bridge beacons (gateway-owned presence)
|
||||
### 4) Node bridge beacons
|
||||
|
||||
When a node bridge connection authenticates, the Gateway emits a presence entry
|
||||
for that node and starts periodic refresh beacons so it does not expire.
|
||||
|
||||
- Connect/disconnect markers: `node-connected`, `node-disconnected`
|
||||
- Periodic heartbeat: every 3 minutes (`reason: periodic`)
|
||||
|
||||
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (node bridge handlers + timer beacons).
|
||||
for that node and refreshes it periodically so it doesn’t expire.
|
||||
|
||||
## Merge + dedupe rules (why `instanceId` matters)
|
||||
|
||||
All producers write into a single in-memory presence map.
|
||||
Presence entries are stored in a single in‑memory map:
|
||||
|
||||
Key points:
|
||||
- Entries are **keyed** by a “presence key”. If two producers use the same key, they update the same entry.
|
||||
- The best key is a stable, opaque `instanceId` that does not change across restarts.
|
||||
- Keys are treated case-insensitively.
|
||||
- Entries are keyed by a **presence key**.
|
||||
- The best key is a stable `instanceId` that survives restarts.
|
||||
- Keys are case‑insensitive.
|
||||
|
||||
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`normalizePresenceKey()`).
|
||||
If a client reconnects without a stable `instanceId`, it may show up as a
|
||||
**duplicate** row.
|
||||
|
||||
### mac app identity (stable UUID)
|
||||
## TTL and bounded size
|
||||
|
||||
The mac app uses a persisted UUID as `instanceId` so:
|
||||
- restarts/reconnects do not create duplicates
|
||||
- renaming the Mac does not create a new “instance”
|
||||
- debug/release builds can share the same identity
|
||||
Presence is intentionally ephemeral:
|
||||
|
||||
Implementation: [`apps/macos/Sources/Clawdbot/InstanceIdentity.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstanceIdentity.swift).
|
||||
- **TTL:** entries older than 5 minutes are pruned
|
||||
- **Max entries:** 200 (oldest dropped first)
|
||||
|
||||
`displayName` (machine name) is used for UI, while `instanceId` is used for dedupe.
|
||||
|
||||
## TTL and bounded size (why stale rows disappear)
|
||||
|
||||
Presence entries are not permanent:
|
||||
- TTL: entries older than 5 minutes are pruned
|
||||
- Max: map is capped at 200 entries (LRU by `ts`)
|
||||
|
||||
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`TTL_MS`, `MAX_ENTRIES`, pruning in `listSystemPresence()`).
|
||||
This keeps the list fresh and avoids unbounded memory growth.
|
||||
|
||||
## Remote/tunnel caveat (loopback IPs)
|
||||
|
||||
When a client connects over an SSH tunnel / local port forward, the Gateway may see the remote address as loopback (`127.0.0.1`).
|
||||
When a client connects over an SSH tunnel / local port forward, the Gateway may
|
||||
see the remote address as `127.0.0.1`. To avoid overwriting a good client‑reported
|
||||
IP, loopback remote addresses are ignored.
|
||||
|
||||
To avoid degrading an otherwise-correct client beacon IP, the Gateway avoids writing loopback remote addresses into presence entries.
|
||||
|
||||
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (`isLoopbackAddress()`).
|
||||
|
||||
## Consumers (who reads presence)
|
||||
## Consumers
|
||||
|
||||
### macOS Instances tab
|
||||
|
||||
The mac app’s Instances tab renders the result of `system-presence`.
|
||||
|
||||
Implementation:
|
||||
- View: [`apps/macos/Sources/Clawdbot/InstancesSettings.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesSettings.swift)
|
||||
- Store: [`apps/macos/Sources/Clawdbot/InstancesStore.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesStore.swift)
|
||||
|
||||
The Instances rows show a small presence indicator (Active/Idle/Stale) based on
|
||||
the last beacon age. The label is derived from the entry timestamp (`ts`).
|
||||
|
||||
The store refreshes periodically and also applies `presence` WS events.
|
||||
The macOS app renders the output of `system-presence` and applies a small status
|
||||
indicator (Active/Idle/Stale) based on the age of the last update.
|
||||
|
||||
## Debugging tips
|
||||
|
||||
- To see the raw list, call `system-presence` against the gateway.
|
||||
- To see the raw list, call `system-presence` against the Gateway.
|
||||
- If you see duplicates:
|
||||
- confirm clients send a stable `instanceId` in the handshake (`connect.params.client.instanceId`)
|
||||
- confirm beaconing uses the same `instanceId`
|
||||
- check whether the connection-derived entry is missing `instanceId` (then it will be keyed by `connId` and duplicates are expected on reconnect)
|
||||
- confirm clients send a stable `instanceId` in the handshake
|
||||
- confirm periodic beacons use the same `instanceId`
|
||||
- check whether the connection‑derived entry is missing `instanceId` (duplicates are expected)
|
||||
|
||||
@@ -3,24 +3,97 @@ summary: "Routing rules per provider (WhatsApp, Telegram, Discord, web) and shar
|
||||
read_when:
|
||||
- Changing provider routing or inbox behavior
|
||||
---
|
||||
# Providers & Routing
|
||||
# Providers & routing
|
||||
|
||||
Updated: 2026-01-06
|
||||
|
||||
Goal: deterministic replies per provider, while supporting multi-agent + multi-account routing.
|
||||
Clawdbot routes replies **back to the provider where a message came from**. The
|
||||
model does not choose a provider; routing is deterministic and controlled by the
|
||||
host configuration.
|
||||
|
||||
- **Provider**: provider label (`whatsapp`, `webchat`, `telegram`, `discord`, `signal`, `imessage`, …). Routing is fixed: replies go back to the origin provider; the model doesn’t choose.
|
||||
- **AccountId**: provider account instance (e.g. WhatsApp account `"default"` vs `"work"`). Not every provider supports multi-account yet.
|
||||
- **AgentId**: one isolated “brain” (workspace + per-agent agentDir + per-agent session store).
|
||||
- **Reply context:** inbound replies include `ReplyToId`, `ReplyToBody`, and `ReplyToSender`, and the quoted context is appended to `Body` as a `[Replying to ...]` block.
|
||||
- **Canonical direct session (per agent):** direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`). Groups/channels stay isolated per agent:
|
||||
- group: `agent:<agentId>:<provider>:group:<id>`
|
||||
- channel/room: `agent:<agentId>:<provider>:channel:<id>`
|
||||
- Telegram forum topics: `agent:<agentId>:telegram:group:<chatId>:topic:<threadId>`
|
||||
- **Session store:** per-agent store lives under `~/.clawdbot/agents/<agentId>/sessions/sessions.json` (override via `session.store` with `{agentId}` templating). JSONL transcripts live next to it.
|
||||
- **WebChat:** attaches to the selected agent’s main session (so desktop reflects cross-provider history for that agent).
|
||||
- **Implementation hints:**
|
||||
- Set `Provider` + `AccountId` in each ingress.
|
||||
- Route inbound to an agent via `routing.bindings` (match on `provider`, `accountId`, plus optional peer/guild/team).
|
||||
- Keep routing deterministic: originate → same provider. Use the gateway WebSocket for sends; avoid side channels.
|
||||
- Do not let the agent emit “send to X” decisions; keep that policy in the host code.
|
||||
## Key terms
|
||||
|
||||
- **Provider**: `whatsapp`, `telegram`, `discord`, `slack`, `signal`, `imessage`, `webchat`.
|
||||
- **AccountId**: per‑provider account instance (when supported).
|
||||
- **AgentId**: an isolated workspace + session store (“brain”).
|
||||
- **SessionKey**: the internal bucket key used to store context and control concurrency.
|
||||
|
||||
## Session key shapes (examples)
|
||||
|
||||
Direct messages collapse to the agent’s **main** session:
|
||||
|
||||
- `agent:<agentId>:<mainKey>` (default: `agent:main:main`)
|
||||
|
||||
Groups and channels remain isolated per provider:
|
||||
|
||||
- Groups: `agent:<agentId>:<provider>:group:<id>`
|
||||
- Channels/rooms: `agent:<agentId>:<provider>:channel:<id>`
|
||||
|
||||
Threads:
|
||||
|
||||
- Slack/Discord threads append `:thread:<threadId>` to the base key.
|
||||
- Telegram forum topics embed `:topic:<topicId>` in the group key.
|
||||
|
||||
Examples:
|
||||
|
||||
- `agent:main:telegram:group:-1001234567890:topic:42`
|
||||
- `agent:main:discord:channel:123456:thread:987654`
|
||||
|
||||
## Routing rules (how an agent is chosen)
|
||||
|
||||
Routing picks **one agent** for each inbound message:
|
||||
|
||||
1. **Exact peer match** (`routing.bindings` with `peer.kind` + `peer.id`).
|
||||
2. **Guild match** (Discord) via `guildId`.
|
||||
3. **Team match** (Slack) via `teamId`.
|
||||
4. **Account match** (`accountId` on the provider).
|
||||
5. **Provider match** (any account on that provider).
|
||||
6. **Default agent** (`routing.defaultAgentId`, fallback to `main`).
|
||||
|
||||
The matched agent determines which workspace and session store are used.
|
||||
|
||||
## Config overview
|
||||
|
||||
- `routing.defaultAgentId`: default agent when no binding matches.
|
||||
- `routing.agents`: named agent definitions (workspace, model, etc.).
|
||||
- `routing.bindings`: map inbound providers/accounts/peers to agents.
|
||||
|
||||
Example:
|
||||
|
||||
```json5
|
||||
{
|
||||
routing: {
|
||||
defaultAgentId: "main",
|
||||
agents: {
|
||||
support: { name: "Support", workspace: "~/clawd-support" }
|
||||
},
|
||||
bindings: [
|
||||
{ match: { provider: "slack", teamId: "T123" }, agentId: "support" },
|
||||
{ match: { provider: "telegram", peer: { kind: "group", id: "-100123" } }, agentId: "support" }
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Session storage
|
||||
|
||||
Session stores live under the state directory (default `~/.clawdbot`):
|
||||
|
||||
- `~/.clawdbot/agents/<agentId>/sessions/sessions.json`
|
||||
- JSONL transcripts live alongside the store
|
||||
|
||||
You can override the store path via `session.store` and `{agentId}` templating.
|
||||
|
||||
## WebChat behavior
|
||||
|
||||
WebChat attaches to the **selected agent** and defaults to the agent’s main
|
||||
session. Because of this, WebChat lets you see cross‑provider context for that
|
||||
agent in one place.
|
||||
|
||||
## Reply context
|
||||
|
||||
Inbound replies include:
|
||||
- `ReplyToId`, `ReplyToBody`, and `ReplyToSender` when available.
|
||||
- Quoted context is appended to `Body` as a `[Replying to ...]` block.
|
||||
|
||||
This is consistent across providers.
|
||||
|
||||
@@ -12,7 +12,7 @@ We now serialize command-based auto-replies (WhatsApp Web listener) through a ti
|
||||
- Serializing avoids competing for terminal/stdin, keeps logs readable, and reduces the chance of rate limits from upstream tools.
|
||||
|
||||
## How it works
|
||||
- [`src/process/command-queue.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/process/command-queue.ts) holds a lane-aware FIFO queue and drains each lane synchronously.
|
||||
- A lane-aware FIFO queue drains each lane synchronously.
|
||||
- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
|
||||
- Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agent.maxConcurrent`.
|
||||
- When verbose logging is enabled, queued commands emit a short notice if they waited more than ~2s before starting.
|
||||
@@ -74,4 +74,4 @@ Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.
|
||||
|
||||
## Troubleshooting
|
||||
- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
|
||||
- `enqueueCommand` exposes a lightweight `getQueueSize()` helper if you need to surface queue depth in future diagnostics.
|
||||
- If you need queue depth, enable verbose logs and watch for queue timing lines.
|
||||
|
||||
@@ -21,7 +21,7 @@ Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history,
|
||||
- Hooks use `hook:<uuid>` unless explicitly set.
|
||||
- Node bridge uses `node-<nodeId>` unless explicitly set.
|
||||
|
||||
`global` and `unknown` are internal-only and never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
|
||||
`global` and `unknown` are reserved values and are never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
|
||||
|
||||
## sessions_list
|
||||
List sessions as an array of rows.
|
||||
|
||||
@@ -8,7 +8,7 @@ read_when:
|
||||
|
||||
ClaudeBot builds a custom system prompt for every agent run. The prompt is **Clawdbot-owned** and does not use the p-coding-agent default prompt.
|
||||
|
||||
The prompt is assembled in `src/agents/system-prompt.ts` and injected by `src/agents/pi-embedded-runner.ts`.
|
||||
The prompt is assembled by Clawdbot and injected into each agent run.
|
||||
|
||||
## Structure
|
||||
|
||||
@@ -56,9 +56,3 @@ Skills are **not** auto-injected. Instead, the prompt instructs the model to use
|
||||
```
|
||||
|
||||
This keeps the base prompt small while still enabling targeted skill usage.
|
||||
|
||||
## Code references
|
||||
|
||||
- Prompt text: `src/agents/system-prompt.ts`
|
||||
- Prompt assembly + injection: `src/agents/pi-embedded-runner.ts`
|
||||
- Bootstrap trimming: `src/agents/pi-embedded-helpers.ts`
|
||||
|
||||
@@ -3,40 +3,34 @@ summary: "TypeBox schemas as the single source of truth for the gateway protocol
|
||||
read_when:
|
||||
- Updating protocol schemas or codegen
|
||||
---
|
||||
# TypeBox as Protocol Source of Truth
|
||||
# TypeBox as protocol source of truth
|
||||
|
||||
Last updated: 2025-12-09
|
||||
Last updated: 2026-01-08
|
||||
|
||||
We use TypeBox schemas in [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) as the single source of truth for the Gateway control plane (connect/req/res/event frames and payloads). All derived artifacts should be generated from these schemas, not edited by hand.
|
||||
TypeBox schemas define the Gateway control plane (connect/req/res/event frames and
|
||||
payloads). All generated artifacts must come from these schemas.
|
||||
|
||||
## Current pipeline
|
||||
|
||||
- **TypeBox → JSON Schema**: `pnpm protocol:gen` writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json) (draft-07) and runs AJV in the server tests.
|
||||
- **TypeBox → Swift**: `pnpm protocol:gen:swift` generates [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift).
|
||||
- `pnpm protocol:gen`
|
||||
- writes the JSON Schema output (draft‑07)
|
||||
- `pnpm protocol:gen:swift`
|
||||
- generates Swift gateway models
|
||||
- `pnpm protocol:check`
|
||||
- runs both generators and verifies the output is committed
|
||||
|
||||
## Problem
|
||||
## Swift codegen behavior
|
||||
|
||||
- We want strong typing in Swift, including a sealed `GatewayFrame` enum with a discriminator and a forward-compatible `unknown` case.
|
||||
The Swift generator emits:
|
||||
|
||||
## Preferred plan (next step)
|
||||
- `GatewayFrame` enum with `req`, `res`, `event`, and `unknown` cases
|
||||
- Strongly typed payload structs/enums
|
||||
- `ErrorCode` values and `GATEWAY_PROTOCOL_VERSION`
|
||||
|
||||
- Add a small, custom Swift generator driven directly by the TypeBox schemas:
|
||||
- Emit a sealed `enum GatewayFrame: Codable { case req(RequestFrame), res(ResponseFrame), event(EventFrame) }`.
|
||||
- Emit strongly typed payload structs/enums (`ConnectParams`, `HelloOk`, `RequestFrame`, `ResponseFrame`, `EventFrame`, `PresenceEntry`, `Snapshot`, `StateVersion`, `ErrorShape`, `AgentEvent`, `TickEvent`, `ShutdownEvent`, `SendParams`, `AgentParams`, `ErrorCode`, `PROTOCOL_VERSION`).
|
||||
- Custom `init(from:)` / `encode(to:)` enforces the `type` discriminator and can include an `unknown` case for forward compatibility.
|
||||
- Wire a new script (e.g., `pnpm protocol:gen:swift`) into `protocol:check` so CI fails if the generated Swift is stale.
|
||||
Unknown frame types are preserved as raw payloads for forward compatibility.
|
||||
|
||||
Why this path:
|
||||
- Single source of truth stays TypeBox; no new IDL to maintain.
|
||||
- Predictable, strongly typed Swift (no optional soup).
|
||||
- Small deterministic codegen (~150–200 LOC script) we control.
|
||||
## When you change schemas
|
||||
|
||||
## Alternative (if we want off-the-shelf codegen)
|
||||
|
||||
- Wrap the existing JSON Schema into an OpenAPI 3.1 doc (auto-generated) and use **swift-openapi-generator** or **openapi-generator swift5**. More moving parts, but also yields enums with discriminator support. Keep this as a fallback if we don’t want a custom emitter.
|
||||
|
||||
## Action items
|
||||
|
||||
- Implement `protocol:gen:swift` that reads the TypeBox schemas and emits the sealed Swift enum + payload structs.
|
||||
- Update `protocol:check` to include the Swift generator output in the diff check.
|
||||
- Remove quicktype output once the custom generator is in place (or keep it for docs only).
|
||||
1) Update the TypeBox schemas.
|
||||
2) Run `pnpm protocol:check`.
|
||||
3) Commit the regenerated schema + Swift models.
|
||||
|
||||
@@ -8,11 +8,11 @@ read_when: "Changing onboarding wizard steps or config schema endpoints"
|
||||
Purpose: shared onboarding + config surfaces across CLI, macOS app, and Web UI.
|
||||
|
||||
## Components
|
||||
- Wizard engine: `src/wizard` (session + prompts + onboarding state).
|
||||
- CLI: [`src/commands/onboard-*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/onboard-*.ts) uses the wizard with the CLI prompter.
|
||||
- Gateway RPC: wizard + config schema endpoints serve UI clients.
|
||||
- macOS: SwiftUI onboarding uses the wizard step model.
|
||||
- Web UI: config form renders from JSON Schema + hints.
|
||||
- Wizard engine (shared session + prompts + onboarding state).
|
||||
- CLI onboarding uses the same wizard flow as the UI clients.
|
||||
- Gateway RPC exposes wizard + config schema endpoints.
|
||||
- macOS onboarding uses the wizard step model.
|
||||
- Web UI renders config forms from JSON Schema + UI hints.
|
||||
|
||||
## Gateway RPC
|
||||
- `wizard.start` params: `{ mode?: "local"|"remote", workspace?: string }`
|
||||
|
||||
@@ -28,44 +28,29 @@ Recent gateway logs show repeated `cron.add` failures with invalid parameters (m
|
||||
- Agent cron tool schema allows arbitrary `job` objects, enabling malformed inputs.
|
||||
- Gateway strictly validates `cron.add` with no normalization, so wrapped payloads fail.
|
||||
|
||||
## Proposed Approach
|
||||
1. **Normalize** incoming `cron.add` payloads (unwrap `data`/`job`, infer `schedule.kind` and `payload.kind`, default `wakeMode` + `sessionTarget` when safe).
|
||||
2. **Harden** the agent cron tool schema using the canonical gateway `CronAddParamsSchema` and normalize before sending to the gateway.
|
||||
3. **Align** provider enums and cron status fields across gateway schema, TS types, CLI descriptions, and UI form controls.
|
||||
4. **Test** normalization in gateway tests and tool behavior in agent tests.
|
||||
## What changed
|
||||
|
||||
## Multi-phase Execution Plan
|
||||
- `cron.add` and `cron.update` now normalize common wrapper shapes and infer missing `kind` fields.
|
||||
- Agent cron tool schema matches the gateway schema, which reduces invalid payloads.
|
||||
- Provider enums are aligned across gateway, CLI, UI, and macOS picker.
|
||||
- Control UI uses the gateway’s `jobs` count field for status.
|
||||
|
||||
### Phase 1 — Schema + type alignment
|
||||
- [x] Expand gateway `CronPayloadSchema` provider enum to include `signal` and `imessage`.
|
||||
- [x] Update CLI `--provider` descriptions to include `slack` (already supported by gateway).
|
||||
- [x] Update UI Cron payload/provider union types to include all supported providers.
|
||||
- [x] Fix UI CronStatus type to match gateway (`jobs` instead of `jobCount`).
|
||||
- [x] Update cron UI provider select to include Discord/Slack/Signal/iMessage.
|
||||
- [x] Update macOS CronJobEditor provider picker + enum to include Slack/Signal/iMessage.
|
||||
- [x] Document cron compatibility normalization policy in [`docs/cron-jobs.md`](/automation/cron-jobs).
|
||||
## Current behavior
|
||||
|
||||
### Phase 2 — Input normalization + tooling hardening
|
||||
- [x] Add shared cron input normalization helpers (`normalizeCronJobCreate`/`normalizeCronJobPatch`).
|
||||
- [x] Apply normalization in gateway `cron.add` (and patch normalization in `cron.update`).
|
||||
- [x] Tighten agent cron tool schema to `CronAddParamsSchema` and normalize job/patch before sending.
|
||||
- **Normalization:** wrapped `data`/`job` payloads are unwrapped; `schedule.kind` and `payload.kind` are inferred when safe.
|
||||
- **Defaults:** safe defaults are applied for `wakeMode` and `sessionTarget` when missing.
|
||||
- **Providers:** Discord/Slack/Signal/iMessage are now consistently surfaced across CLI/UI.
|
||||
|
||||
### Phase 3 — Tests
|
||||
- [x] Add gateway test covering wrapped `cron.add` payload normalization.
|
||||
- [x] Add cron tool test to assert normalization and defaulting for `cron.add`.
|
||||
- [x] Add gateway test covering `cron.update` normalization.
|
||||
- [x] Add UI + Swift conformance test for cron channels + status fields.
|
||||
See [`docs/cron-jobs.md`](/automation/cron-jobs) for the normalized shape and examples.
|
||||
|
||||
### Phase 4 — Verification
|
||||
- [x] Run tests (full suite executed via `pnpm test -- cron-tool`).
|
||||
## Verification
|
||||
|
||||
## Rollout/Monitoring
|
||||
- Watch gateway logs for reduced `cron.add` INVALID_REQUEST errors.
|
||||
- Confirm Control UI cron status shows job count after refresh.
|
||||
- If errors persist, extend normalization for additional common shapes (e.g., `schedule.at`, `payload.message` without `kind`).
|
||||
|
||||
## Optional Follow-ups
|
||||
- Manual Control UI smoke: add cron job per provider + verify status job count.
|
||||
|
||||
- Manual Control UI smoke: add a cron job per provider + verify status job count.
|
||||
|
||||
## Open Questions
|
||||
- Should `cron.add` accept explicit `state` from clients (currently disallowed by schema)?
|
||||
|
||||
@@ -1,126 +1,38 @@
|
||||
---
|
||||
summary: "Spec: groupPolicy hardening for Telegram allowlist parity"
|
||||
summary: "Telegram allowlist hardening: prefix + whitespace normalization"
|
||||
read_when:
|
||||
- Reviewing historical Telegram allowlist normalization changes
|
||||
- Reviewing historical Telegram allowlist changes
|
||||
---
|
||||
# Engineering Execution Spec: groupPolicy Hardening (Telegram Allowlist Parity)
|
||||
# Telegram Allowlist Hardening
|
||||
|
||||
**Date**: 2026-01-05
|
||||
**Status**: Complete
|
||||
**PR**: #216 (feat/whatsapp-group-policy)
|
||||
**PR**: #216
|
||||
|
||||
---
|
||||
## Summary
|
||||
|
||||
## Executive Summary
|
||||
Telegram allowlists now accept `telegram:` and `tg:` prefixes case-insensitively, and tolerate
|
||||
accidental whitespace. This aligns inbound allowlist checks with outbound send normalization.
|
||||
|
||||
Follow-up hardening work ensures Telegram allowlists behave consistently across inbound group/DM filtering and outbound send normalization. The focus is on prefix parity (`telegram:` / `tg:`), case-insensitive matching for prefixes, and resilience to accidental whitespace in config entries. Documentation and tests were updated to reflect and lock in this behavior.
|
||||
## What changed
|
||||
|
||||
---
|
||||
- Prefixes `telegram:` and `tg:` are treated the same (case-insensitive).
|
||||
- Allowlist entries are trimmed; empty entries are ignored.
|
||||
|
||||
## Findings Analysis
|
||||
## Examples
|
||||
|
||||
### [MED] F1: Telegram Allowlist Prefix Handling Is Case-Sensitive and Excludes `tg:`
|
||||
All of these are accepted for the same ID:
|
||||
|
||||
**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
|
||||
- `telegram:123456`
|
||||
- `TG:123456`
|
||||
- ` tg:123456 `
|
||||
|
||||
**Problem**: Inbound allowlist normalization only stripped a lowercase `telegram:` prefix. This rejected `TG:123` / `Telegram:123` and did not accept the `tg:` shorthand even though outbound send normalization already accepts `tg:` and case-insensitive prefixes.
|
||||
## Why it matters
|
||||
|
||||
**Impact**:
|
||||
- DMs and group allowlists fail when users copy/paste prefixed IDs from logs or existing send format.
|
||||
- Behavior is inconsistent between inbound filtering and outbound send normalization.
|
||||
Copy/paste from logs or chat IDs often includes prefixes and whitespace. Normalizing avoids
|
||||
false negatives when deciding whether to respond in DMs or groups.
|
||||
|
||||
**Fix**: Normalize allowlist entries by trimming whitespace and stripping `telegram:` / `tg:` prefixes case-insensitively at pre-compute time.
|
||||
## Related docs
|
||||
|
||||
---
|
||||
|
||||
### [LOW] F2: Allowlist Entries Are Not Trimmed
|
||||
|
||||
**Location**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
|
||||
|
||||
**Problem**: Allowlist entries are not trimmed; accidental whitespace causes mismatches.
|
||||
|
||||
**Fix**: Trim and drop empty entries while normalizing allowlist inputs.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Normalize Telegram Allowlist Inputs
|
||||
|
||||
**File**: [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts)
|
||||
|
||||
**Changes**:
|
||||
1. Trim allowlist entries and drop empty values.
|
||||
2. Strip `telegram:` / `tg:` prefixes case-insensitively.
|
||||
3. Simplify DM allowlist check to rely on normalized values.
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: Add Coverage for Prefix + Whitespace
|
||||
|
||||
**File**: [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts)
|
||||
|
||||
**Add Tests**:
|
||||
- DM allowlist accepts `TG:` prefix with surrounding whitespace.
|
||||
- Group allowlist accepts `TG:` prefix case-insensitively.
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: Documentation Updates
|
||||
|
||||
**Files**:
|
||||
- [`docs/groups.md`](/concepts/groups)
|
||||
- [`docs/telegram.md`](/providers/telegram)
|
||||
|
||||
**Changes**:
|
||||
- Document `tg:` alias and case-insensitive prefixes for Telegram allowlists.
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Verification
|
||||
|
||||
1. Run targeted Telegram tests (`pnpm test -- src/telegram/bot.test.ts`).
|
||||
2. If time allows, run full suite (`pnpm test`).
|
||||
|
||||
---
|
||||
|
||||
## Files Modified
|
||||
|
||||
| File | Change Type | Description |
|
||||
|------|-------------|-------------|
|
||||
| [`src/telegram/bot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.ts) | Fix | Trim allowlist values; strip `telegram:` / `tg:` prefixes case-insensitively |
|
||||
| [`src/telegram/bot.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/telegram/bot.test.ts) | Test | Add DM + group allowlist coverage for `TG:` prefix + whitespace |
|
||||
| [`docs/groups.md`](/concepts/groups) | Docs | Mention `tg:` alias + case-insensitive prefixes |
|
||||
| [`docs/telegram.md`](/providers/telegram) | Docs | Mention `tg:` alias + case-insensitive prefixes |
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [x] Telegram allowlist accepts `telegram:` / `tg:` prefixes case-insensitively.
|
||||
- [x] Telegram allowlist tolerates whitespace in config entries.
|
||||
- [x] DM and group allowlist tests cover prefixed cases.
|
||||
- [x] Docs updated to reflect allowlist formats.
|
||||
- [x] Targeted tests pass.
|
||||
- [x] Full test suite passes.
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
| Risk | Severity | Mitigation |
|
||||
|------|----------|------------|
|
||||
| Behavior change for malformed entries | Low | Normalization is additive and trims only whitespace |
|
||||
| Test fragility | Low | Isolated unit tests; no external dependencies |
|
||||
| Doc drift | Low | Updated docs alongside code |
|
||||
|
||||
---
|
||||
|
||||
## Estimated Complexity
|
||||
|
||||
- **Phase 1**: Low (normalization helpers)
|
||||
- **Phase 2**: Low (2 new tests)
|
||||
- **Phase 3**: Low (doc edits)
|
||||
- **Phase 4**: Low (verification)
|
||||
|
||||
**Total**: ~20 minutes
|
||||
- [Group Chats](/concepts/groups)
|
||||
- [Telegram Provider](/providers/telegram)
|
||||
|
||||
@@ -1,147 +1,32 @@
|
||||
---
|
||||
summary: "Proposal: model config, auth profiles, and fallback behavior"
|
||||
summary: "Exploration: model config, auth profiles, and fallback behavior"
|
||||
read_when:
|
||||
- Designing model selection, auth profiles, or fallback behavior
|
||||
- Migrating model config schema
|
||||
- Exploring future model selection + auth profile ideas
|
||||
---
|
||||
# Model Config (Exploration)
|
||||
|
||||
# Model config proposal
|
||||
This document captures **ideas** for future model configuration. It is not a
|
||||
shipping spec. For current behavior, see:
|
||||
- [Models](/concepts/models)
|
||||
- [Model failover](/concepts/model-failover)
|
||||
- [OAuth + profiles](/concepts/oauth)
|
||||
|
||||
Goals
|
||||
- Multi OAuth + multi API key per provider
|
||||
- Model selection via `/model` with sensible fallback
|
||||
- Global (not per-session) fallback logic
|
||||
- Keep last-known-good auth profile when switching models
|
||||
- Profile override only when explicitly requested
|
||||
- Image routing override only when explicitly configured
|
||||
## Motivation
|
||||
|
||||
Non-goals (v1)
|
||||
- Auto-discovery of provider capabilities beyond catalog input tags
|
||||
- Per-model auth profile order (see open questions)
|
||||
Operators want:
|
||||
- Multiple auth profiles per provider (personal vs work).
|
||||
- Simple `/model` selection with predictable fallbacks.
|
||||
- Clear separation between text models and image-capable models.
|
||||
|
||||
## Proposed config shape
|
||||
## Possible direction (high level)
|
||||
|
||||
```json
|
||||
{
|
||||
"auth": {
|
||||
"profiles": {
|
||||
"anthropic:default": {
|
||||
"provider": "anthropic",
|
||||
"mode": "oauth"
|
||||
},
|
||||
"anthropic:work": {
|
||||
"provider": "anthropic",
|
||||
"mode": "api_key"
|
||||
},
|
||||
"openai:default": {
|
||||
"provider": "openai",
|
||||
"mode": "oauth"
|
||||
}
|
||||
},
|
||||
"order": {
|
||||
"anthropic": ["anthropic:default", "anthropic:work"],
|
||||
"openai": ["openai:default"]
|
||||
}
|
||||
},
|
||||
"agent": {
|
||||
"models": {
|
||||
"anthropic/claude-opus-4-5": {
|
||||
"alias": "Opus"
|
||||
},
|
||||
"openai/gpt-5.2": {
|
||||
"alias": "gpt52"
|
||||
}
|
||||
},
|
||||
"model": {
|
||||
"primary": "anthropic/claude-opus-4-5",
|
||||
"fallbacks": ["openai/gpt-5.2"]
|
||||
},
|
||||
"imageModel": {
|
||||
"primary": "openai/gpt-5.2",
|
||||
"fallbacks": ["anthropic/claude-opus-4-5"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
- Keep model selection simple: `provider/model` with optional aliases.
|
||||
- Let providers have multiple auth profiles, with an explicit order.
|
||||
- Use a global fallback list so all sessions fail over consistently.
|
||||
- Only override image routing when explicitly configured.
|
||||
|
||||
Notes
|
||||
- Canonical model keys are full `provider/model`.
|
||||
- `alias` optional; used by `/model` resolution.
|
||||
- `auth.profiles` is keyed. Default CLI login creates `provider:default`.
|
||||
- `auth.order[provider]` controls rotation order for that provider.
|
||||
## Open questions
|
||||
|
||||
## CLI / UX
|
||||
|
||||
Login
|
||||
- `clawdbot login anthropic` → create/replace `anthropic:default`.
|
||||
- `clawdbot login anthropic --profile work` → create/replace `anthropic:work`.
|
||||
|
||||
Model selection
|
||||
- `/model Opus` → resolve alias to full id.
|
||||
- `/model anthropic/claude-opus-4-5` → explicit.
|
||||
- Optional: `/model Opus@anthropic:work` (explicit profile override for session only).
|
||||
|
||||
Model listing
|
||||
- `/model` list shows:
|
||||
- model id
|
||||
- alias
|
||||
- provider
|
||||
- auth order (from `auth.order`)
|
||||
- auth source for the current provider (auth-profiles.json/env/shell env/models.json)
|
||||
|
||||
## Fallback behavior (global)
|
||||
|
||||
Fallback list
|
||||
- Use `agent.model.fallbacks` globally.
|
||||
- No per-session fallback list; last-known-good is per-session but uses global ordering.
|
||||
|
||||
Auth profile rotation
|
||||
- If provider auth error (401/403/invalid refresh):
|
||||
- advance to next profile in `auth.order[provider]`.
|
||||
- if all fail, fall back to next model.
|
||||
|
||||
Rate limiting
|
||||
- If rate limit / quota error:
|
||||
- rotate auth profile first (same provider)
|
||||
- if still failing, fall back to next model.
|
||||
|
||||
Model not found / capability mismatch
|
||||
- immediate model fallback.
|
||||
|
||||
## Image routing
|
||||
|
||||
Rule
|
||||
- Only use `agent.imageModel` when explicitly configured.
|
||||
- If `agent.imageModel` is configured and the current text model lacks image input, use it.
|
||||
|
||||
Support detection
|
||||
- From model catalog `input` tags when available (e.g. `image` in models.json).
|
||||
- If unknown: treat as text-only and use `agent.imageModel`.
|
||||
|
||||
## Migration (doctor + gateway auto-run)
|
||||
|
||||
Inputs
|
||||
- Legacy keys (pre-migration):
|
||||
- `agent.model` (string)
|
||||
- `agent.modelFallbacks` (string[])
|
||||
- `agent.imageModel` (string)
|
||||
- `agent.imageModelFallbacks` (string[])
|
||||
- `agent.allowedModels` (string[])
|
||||
- `agent.modelAliases` (record)
|
||||
|
||||
Outputs
|
||||
- `agent.models` map with keys for all referenced models
|
||||
- `agent.model.primary/fallbacks`
|
||||
- `agent.imageModel.primary/fallbacks`
|
||||
- Auth profile store seeded from current auth-profiles.json/auth.json + oauth.json + env (as `provider:default`)
|
||||
- `auth.order` seeded with `["provider:default"]` when config is updated
|
||||
|
||||
Auto-run
|
||||
- Gateway start detects legacy keys and runs doctor migration.
|
||||
|
||||
## Decisions
|
||||
|
||||
- Auth order is per-provider (`auth.order`).
|
||||
- Doctor migration is required; gateway will auto-run on startup when legacy keys detected.
|
||||
- `/model Opus@profile` is explicit session override only.
|
||||
- Image routing override only when `agent.imageModel` is explicitly configured.
|
||||
- Should profile rotation be per-provider or per-model?
|
||||
- How should the UI surface profile selection for a session?
|
||||
- What is the safest migration path from legacy config keys?
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
---
|
||||
summary: "Proposal + research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
|
||||
summary: "Research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
|
||||
read_when:
|
||||
- Designing workspace memory (~/clawd) beyond daily Markdown logs
|
||||
- Deciding: standalone CLI vs deep Clawdbot integration
|
||||
- Adding offline recall + reflection (retain/recall/reflect)
|
||||
---
|
||||
|
||||
# Workspace Memory v2 (offline): proposal + research
|
||||
# Workspace Memory v2 (offline): research notes
|
||||
|
||||
Target: Clawd-style workspace (`agent.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`).
|
||||
|
||||
@@ -171,8 +171,7 @@ Recommendation: **deep integration in Clawdbot**, but keep a separable core libr
|
||||
- reuse from other contexts (local scripts, future desktop app, etc.)
|
||||
|
||||
Shape:
|
||||
- `src/memory/*` (library-ish core; pure functions + sqlite adapter)
|
||||
- [`src/commands/memory/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/memory/*.ts) (CLI glue)
|
||||
The memory tooling is intended to be a small CLI + library layer, but this is exploratory only.
|
||||
|
||||
## “S-Collide” / SuCo: when to use it (research)
|
||||
|
||||
@@ -196,29 +195,13 @@ Open question:
|
||||
- what’s the **best** offline embedding model for “personal assistant memory” on your machines (MacBook + Castle)?
|
||||
- if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain.
|
||||
|
||||
## Implementation plan (phased, shippable)
|
||||
## Smallest useful pilot
|
||||
|
||||
### Phase 0: workspace conventions (no code)
|
||||
- add `bank/` files + entity pages
|
||||
- add `## Retain` convention to daily logs
|
||||
If you want a minimal, still-useful version:
|
||||
|
||||
### Phase 1: `clawdbot memory index|recall` (FTS-only)
|
||||
- parse Markdown (`memory/*.md`, `bank/*.md`) into chunks
|
||||
- write to SQLite: `facts`, `entities`, `fact_entities`, `opinions`
|
||||
- FTS5 table over `facts.content`
|
||||
- `recall` returns citations (path + line) + trimmed content budget
|
||||
|
||||
### Phase 2: entity summaries + opinion tracking
|
||||
- `reflect` updates `bank/entities/*.md`
|
||||
- opinion confidence updates with evidence pointers (no embeddings required yet)
|
||||
|
||||
### Phase 3: semantic recall (offline embeddings)
|
||||
- compute embeddings during indexing (incremental)
|
||||
- retrieval = `hybrid(FTS, vector)` with simple fusion
|
||||
|
||||
### Phase 4: “graph-ish” traversal (still simple)
|
||||
- entity links enable multi-hop: “related to Peter via warelay”
|
||||
- optional: “topic” nodes, lightweight edges (not a full KG)
|
||||
- Add `bank/` entity pages and a `## Retain` section in daily logs.
|
||||
- Use SQLite FTS for recall with citations (path + line numbers).
|
||||
- Add embeddings only if recall quality or scale demands it.
|
||||
|
||||
## References
|
||||
|
||||
|
||||
@@ -6,24 +6,29 @@ read_when:
|
||||
---
|
||||
# Bonjour / mDNS discovery
|
||||
|
||||
Clawdbot uses Bonjour (mDNS / DNS-SD) as a **LAN-only convenience** to discover a running Gateway bridge transport. It is best-effort and does **not** replace SSH or Tailnet-based connectivity.
|
||||
Clawdbot uses Bonjour (mDNS / DNS‑SD) as a **LAN‑only convenience** to discover
|
||||
an active Gateway bridge. It is best‑effort and does **not** replace SSH or
|
||||
Tailnet-based connectivity.
|
||||
|
||||
## Wide-Area Bonjour (Unicast DNS-SD) over Tailscale
|
||||
## Wide‑area Bonjour (Unicast DNS‑SD) over Tailscale
|
||||
|
||||
If you want iOS node auto-discovery while the Gateway is on another network (e.g. Vienna ⇄ London), you can keep the `NWBrowser` UX but switch discovery from multicast mDNS (`local.`) to **unicast DNS-SD** (“Wide-Area Bonjour”) over Tailscale.
|
||||
If the node and gateway are on different networks, multicast mDNS won’t cross the
|
||||
boundary. You can keep the same discovery UX by switching to **unicast DNS‑SD**
|
||||
("Wide‑Area Bonjour") over Tailscale.
|
||||
|
||||
High level:
|
||||
High‑level steps:
|
||||
|
||||
1) Run a DNS server on the gateway host (reachable via tailnet IP).
|
||||
2) Publish DNS-SD records for `_clawdbot-bridge._tcp` in a dedicated zone (example: `clawdbot.internal.`).
|
||||
3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that DNS server for clients (including iOS).
|
||||
1) Run a DNS server on the gateway host (reachable over Tailnet).
|
||||
2) Publish DNS‑SD records for `_clawdbot-bridge._tcp` under a dedicated zone
|
||||
(example: `clawdbot.internal.`).
|
||||
3) Configure Tailscale **split DNS** so `clawdbot.internal` resolves via that
|
||||
DNS server for clients (including iOS).
|
||||
|
||||
Clawdbot standardizes on the discovery domain `clawdbot.internal.` for this mode. iOS/Android nodes browse both `local.` and `clawdbot.internal.` automatically (no per-device knob).
|
||||
Clawdbot standardizes on `clawdbot.internal.` for this mode. iOS/Android nodes
|
||||
browse both `local.` and `clawdbot.internal.` automatically.
|
||||
|
||||
### Gateway config (recommended)
|
||||
|
||||
On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbot/clawdbot.json` (JSON5):
|
||||
|
||||
```json5
|
||||
{
|
||||
bridge: { bind: "tailnet" }, // tailnet-only (recommended)
|
||||
@@ -31,21 +36,17 @@ On the gateway host (the machine running the Gateway bridge), add to `~/.clawdbo
|
||||
}
|
||||
```
|
||||
|
||||
### One-time DNS server setup (gateway host)
|
||||
|
||||
On the gateway host (macOS), run:
|
||||
### One‑time DNS server setup (gateway host)
|
||||
|
||||
```bash
|
||||
clawdbot dns setup --apply
|
||||
```
|
||||
|
||||
This installs CoreDNS and configures it to:
|
||||
- listen on port 53 **only** on the gateway’s Tailscale interface IPs
|
||||
- serve the zone `clawdbot.internal.` from the gateway-owned zone file `~/.clawdbot/dns/clawdbot.internal.db`
|
||||
- listen on port 53 only on the gateway’s Tailscale interfaces
|
||||
- serve `clawdbot.internal.` from `~/.clawdbot/dns/clawdbot.internal.db`
|
||||
|
||||
The Gateway writes/updates that zone file when `discovery.wideArea.enabled` is true.
|
||||
|
||||
Validate from any tailnet-connected machine:
|
||||
Validate from a tailnet‑connected machine:
|
||||
|
||||
```bash
|
||||
dns-sd -B _clawdbot-bridge._tcp clawdbot.internal.
|
||||
@@ -59,99 +60,102 @@ In the Tailscale admin console:
|
||||
- Add a nameserver pointing at the gateway’s tailnet IP (UDP/TCP 53).
|
||||
- Add split DNS so the domain `clawdbot.internal` uses that nameserver.
|
||||
|
||||
Once clients accept tailnet DNS, iOS nodes can browse `_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast.
|
||||
Wide-area beacons also include `tailnetDns` (when available) so the macOS app can auto-fill SSH targets off-LAN.
|
||||
Once clients accept tailnet DNS, iOS nodes can browse
|
||||
`_clawdbot-bridge._tcp` in `clawdbot.internal.` without multicast.
|
||||
|
||||
### Bridge listener security (recommended)
|
||||
|
||||
The bridge port (default `18790`) is a plain TCP service. By default it binds to `0.0.0.0`, which makes it reachable from *any* interface on the gateway machine (LAN/Wi‑Fi/Tailscale).
|
||||
|
||||
For a tailnet-only setup, bind it to the Tailscale IP instead:
|
||||
The bridge port (default `18790`) is a plain TCP service. By default it binds to
|
||||
`0.0.0.0`, which makes it reachable from any interface on the gateway host.
|
||||
|
||||
For tailnet‑only setups:
|
||||
- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json`.
|
||||
- Restart the Gateway (or restart the macOS menubar app via [`./scripts/restart-mac.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/restart-mac.sh) on that machine).
|
||||
|
||||
This keeps the bridge reachable only from devices on your tailnet (while still listening on loopback for local/SSH port-forwards).
|
||||
- Restart the Gateway (or restart the macOS menubar app).
|
||||
|
||||
## What advertises
|
||||
|
||||
Only the **Node Gateway** (`clawd` / `clawdbot gateway`) advertises Bonjour beacons.
|
||||
|
||||
- Implementation: [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts)
|
||||
- Gateway wiring: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)
|
||||
Only the Gateway (when the **bridge is enabled**) advertises `_clawdbot-bridge._tcp`.
|
||||
|
||||
## Service types
|
||||
|
||||
- `_clawdbot-bridge._tcp` — bridge transport beacon (used by macOS/iOS/Android nodes).
|
||||
|
||||
## TXT keys (non-secret hints)
|
||||
## TXT keys (non‑secret hints)
|
||||
|
||||
The Gateway advertises small non-secret hints to make UI flows convenient:
|
||||
The Gateway advertises small non‑secret hints to make UI flows convenient:
|
||||
|
||||
- `role=gateway`
|
||||
- `displayName=<friendly name>`
|
||||
- `lanHost=<hostname>.local`
|
||||
- `sshPort=<port>` (defaults to 22 when not overridden)
|
||||
- `gatewayPort=<port>` (informational; the Gateway WS is typically loopback-only)
|
||||
- `gatewayPort=<port>` (informational; Gateway WS is usually loopback‑only)
|
||||
- `bridgePort=<port>` (only when bridge is enabled)
|
||||
- `canvasPort=<port>` (only when the canvas host is enabled + reachable; default `18793`; serves `/__clawdbot__/canvas/`)
|
||||
- `cliPath=<path>` (optional; absolute path to a runnable `clawdbot` entrypoint or binary)
|
||||
- `tailnetDns=<magicdns>` (optional hint; auto-detected from Tailscale when available; may be absent)
|
||||
- `canvasPort=<port>` (only when the canvas host is enabled; default `18793`)
|
||||
- `sshPort=<port>` (defaults to 22 when not overridden)
|
||||
- `transport=bridge`
|
||||
- `cliPath=<path>` (optional; absolute path to a runnable `clawdbot` entrypoint)
|
||||
- `tailnetDns=<magicdns>` (optional hint when Tailnet is available)
|
||||
|
||||
## Debugging on macOS
|
||||
|
||||
Useful built-in tools:
|
||||
Useful built‑in tools:
|
||||
|
||||
- Browse instances:
|
||||
- `dns-sd -B _clawdbot-bridge._tcp local.`
|
||||
```bash
|
||||
dns-sd -B _clawdbot-bridge._tcp local.
|
||||
```
|
||||
- Resolve one instance (replace `<instance>`):
|
||||
- `dns-sd -L "<instance>" _clawdbot-bridge._tcp local.`
|
||||
```bash
|
||||
dns-sd -L "<instance>" _clawdbot-bridge._tcp local.
|
||||
```
|
||||
|
||||
If browsing shows instances but resolving fails, you’re usually hitting a LAN policy / multicast issue.
|
||||
If browsing works but resolving fails, you’re usually hitting a LAN policy or
|
||||
mDNS resolver issue.
|
||||
|
||||
## Debugging in Gateway logs
|
||||
|
||||
The Gateway writes a rolling log file (printed on startup as `gateway log file: ...`).
|
||||
The Gateway writes a rolling log file (printed on startup as
|
||||
`gateway log file: ...`). Look for `bonjour:` lines, especially:
|
||||
|
||||
Look for `bonjour:` lines, especially:
|
||||
|
||||
- `bonjour: advertise failed ...` (probing/announce failure)
|
||||
- `bonjour: advertise failed ...`
|
||||
- `bonjour: ... name conflict resolved` / `hostname conflict resolved`
|
||||
- `bonjour: watchdog detected non-announced service; attempting re-advertise ...` (self-heal attempt after sleep/interface churn)
|
||||
- `bonjour: watchdog detected non-announced service ...`
|
||||
|
||||
## Debugging on iOS node
|
||||
|
||||
The iOS node app discovers bridges via `NWBrowser` browsing `_clawdbot-bridge._tcp`.
|
||||
The iOS node uses `NWBrowser` to discover `_clawdbot-bridge._tcp`.
|
||||
|
||||
To capture what the browser is doing:
|
||||
To capture logs:
|
||||
- Settings → Bridge → Advanced → **Discovery Debug Logs**
|
||||
- Settings → Bridge → Advanced → **Discovery Logs** → reproduce → **Copy**
|
||||
|
||||
- Settings → Bridge → Advanced → enable **Discovery Debug Logs**
|
||||
- Settings → Bridge → Advanced → open **Discovery Logs** → reproduce the “Searching…” / “No bridges found” case → **Copy**
|
||||
|
||||
The log includes browser state transitions (`ready`, `waiting`, `failed`, `cancelled`) and result-set changes (added/removed counts).
|
||||
The log includes browser state transitions and result‑set changes.
|
||||
|
||||
## Common failure modes
|
||||
|
||||
- **Bonjour doesn’t cross networks**: London/Vienna style setups require Tailnet (MagicDNS/IP) or SSH.
|
||||
- **Multicast blocked**: some Wi‑Fi networks (enterprise/hotels) disable mDNS; expect “no results”.
|
||||
- **Sleep / interface churn**: macOS may temporarily drop mDNS results when switching networks; retry.
|
||||
- **Browse works but resolve fails (iOS “NoSuchRecord”)**: make sure the advertiser publishes a valid SRV target hostname.
|
||||
- Implementation detail: `@homebridge/ciao` defaults `hostname` to the *service instance name* when `hostname` is omitted. If your instance name contains spaces/parentheses, some resolvers can fail to resolve the implied A/AAAA record.
|
||||
- Fix: set an explicit DNS-safe `hostname` (single label; no `.local`) in [`src/infra/bonjour.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bonjour.ts).
|
||||
- **Bonjour doesn’t cross networks**: use Tailnet or SSH.
|
||||
- **Multicast blocked**: some Wi‑Fi networks disable mDNS.
|
||||
- **Sleep / interface churn**: macOS may temporarily drop mDNS results; retry.
|
||||
- **Browse works but resolve fails**: keep machine names simple (avoid emojis or
|
||||
punctuation), then restart the Gateway. The bridge instance name derives from
|
||||
the host name, so overly complex names can confuse some resolvers.
|
||||
|
||||
## Escaped instance names (`\\032`)
|
||||
Bonjour/DNS-SD often escapes bytes in service instance names as decimal `\\DDD` sequences (e.g. spaces become `\\032`).
|
||||
## Escaped instance names (`\032`)
|
||||
|
||||
Bonjour/DNS‑SD often escapes bytes in service instance names as decimal `\DDD`
|
||||
sequences (e.g. spaces become `\032`).
|
||||
|
||||
- This is normal at the protocol level.
|
||||
- UIs should decode for display (iOS uses `BonjourEscapes.decode` in `apps/shared/ClawdbotKit`).
|
||||
- UIs should decode for display (iOS uses `BonjourEscapes.decode`).
|
||||
|
||||
## Disabling / configuration
|
||||
|
||||
- `CLAWDBOT_DISABLE_BONJOUR=1` disables advertising.
|
||||
- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and therefore the bridge beacon).
|
||||
- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port (preferred).
|
||||
- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as a back-compat override when `bridge.bind` / `bridge.port` are not set.
|
||||
- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in `_clawdbot-bridge._tcp`.
|
||||
- `CLAWDBOT_TAILNET_DNS` publishes a `tailnetDns` hint (MagicDNS) in `_clawdbot-bridge._tcp`. If unset, the gateway auto-detects Tailscale and publishes the MagicDNS name when possible.
|
||||
- `CLAWDBOT_BRIDGE_ENABLED=0` disables the bridge listener (and the bridge beacon).
|
||||
- `bridge.bind` / `bridge.port` in `~/.clawdbot/clawdbot.json` control bridge bind/port.
|
||||
- `CLAWDBOT_BRIDGE_HOST` / `CLAWDBOT_BRIDGE_PORT` still work as back‑compat overrides.
|
||||
- `CLAWDBOT_SSH_PORT` overrides the SSH port advertised in TXT.
|
||||
- `CLAWDBOT_TAILNET_DNS` publishes a MagicDNS hint in TXT.
|
||||
- `CLAWDBOT_CLI_PATH` overrides the advertised CLI path.
|
||||
|
||||
## Related docs
|
||||
|
||||
|
||||
@@ -1383,8 +1383,8 @@ Notes:
|
||||
- `z.ai/*` and `z-ai/*` are accepted aliases and normalize to `zai/*`.
|
||||
- If `ZAI_API_KEY` is missing, requests to `zai/*` will fail with an auth error at runtime.
|
||||
- Example error: `No API key found for provider "zai".`
|
||||
- Z.AI’s general API endpoint is `https://api.z.ai/api/paas/v4`. The GLM Coding
|
||||
Plan uses the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`.
|
||||
- Z.AI’s general API endpoint is `https://api.z.ai/api/paas/v4`. GLM coding
|
||||
requests use the dedicated Coding endpoint `https://api.z.ai/api/coding/paas/v4`.
|
||||
The built-in `zai` provider uses the Coding endpoint. If you need the general
|
||||
endpoint, define a custom provider in `models.providers` with the base URL
|
||||
override (see the custom providers section above).
|
||||
|
||||
@@ -44,7 +44,7 @@ Target direction:
|
||||
|
||||
Troubleshooting and beacon details: [`docs/bonjour.md`](/gateway/bonjour).
|
||||
|
||||
#### Current implementation
|
||||
#### Service beacon details
|
||||
|
||||
- Service types:
|
||||
- `_clawdbot-bridge._tcp` (bridge transport beacon)
|
||||
@@ -98,15 +98,8 @@ The gateway is the source of truth for node/client admission.
|
||||
- scopes/ACLs (bridge is not a raw proxy to every gateway method)
|
||||
- rate limits
|
||||
|
||||
## Where the code lives (target architecture)
|
||||
## Responsibilities by component
|
||||
|
||||
- Node gateway:
|
||||
- advertises discovery beacons (Bonjour)
|
||||
- owns pairing storage + decisions
|
||||
- runs the bridge listener (direct transport)
|
||||
- macOS app:
|
||||
- UI for picking a gateway, showing pairing prompts, and troubleshooting
|
||||
- SSH tunneling only for the fallback path
|
||||
- iOS node:
|
||||
- browses Bonjour (LAN) as a convenience only
|
||||
- uses direct transport + pairing to connect to the gateway
|
||||
- **Gateway**: advertises discovery beacons, owns pairing decisions, runs the bridge listener.
|
||||
- **macOS app**: helps you pick a gateway, shows pairing prompts, and uses SSH only as a fallback.
|
||||
- **iOS/Android nodes**: browse Bonjour as a convenience and connect via the paired bridge.
|
||||
|
||||
@@ -1,47 +1,33 @@
|
||||
---
|
||||
summary: "Plan for heartbeat polling messages and notification rules"
|
||||
summary: "Heartbeat polling messages and notification rules"
|
||||
read_when:
|
||||
- Adjusting heartbeat cadence or messaging
|
||||
---
|
||||
# Heartbeat (Gateway)
|
||||
|
||||
Heartbeat runs periodic agent turns in the **main session** so the model can
|
||||
surface anything that needs attention without spamming the user.
|
||||
Heartbeat runs **periodic agent turns** in the main session so the model can
|
||||
surface anything that needs attention without spamming you.
|
||||
|
||||
## Defaults
|
||||
- Interval: `30m` (set `agent.heartbeat.every` to change, `0m` disables).
|
||||
|
||||
- Interval: `30m` (set `agent.heartbeat.every`; use `0m` to disable).
|
||||
- Prompt body (configurable via `agent.heartbeat.prompt`):
|
||||
`Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.`
|
||||
- Heartbeat prompt text is sent **verbatim** as the user message. Clawdbot does
|
||||
not append extra body text. The system prompt includes a Heartbeats section
|
||||
and the run is flagged as a heartbeat internally.
|
||||
- The heartbeat prompt is sent **verbatim** as the user message. The system
|
||||
prompt includes a “Heartbeat” section and the run is flagged internally.
|
||||
|
||||
## Prompt contract
|
||||
- If nothing needs attention, the model should reply `HEARTBEAT_OK`.
|
||||
- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears at
|
||||
the **start or end** of the reply. Clawdbot strips the token and discards the
|
||||
reply if the remaining content is **≤ `ackMaxChars`** (default: 30).
|
||||
- If `HEARTBEAT_OK` is in the **middle** of a reply, it is not treated specially.
|
||||
- For alerts, do **not** include `HEARTBEAT_OK`; return only the alert text.
|
||||
## Response contract
|
||||
|
||||
## Prompt overrides
|
||||
- Overriding `agent.heartbeat.prompt` **replaces** the default body. Nothing is
|
||||
merged for you.
|
||||
- If you still want `HEARTBEAT.md` instructions, keep a line like
|
||||
`Read HEARTBEAT.md if exists` in your custom prompt.
|
||||
- `HEARTBEAT_OK` handling stays the same; changing the prompt won’t break acks.
|
||||
- If nothing needs attention, reply with **`HEARTBEAT_OK`**.
|
||||
- During heartbeat runs, Clawdbot treats `HEARTBEAT_OK` as an ack when it appears
|
||||
at the **start or end** of the reply. The token is stripped and the reply is
|
||||
dropped if the remaining content is **≤ `ackMaxChars`** (default: 30).
|
||||
- If `HEARTBEAT_OK` appears in the **middle** of a reply, it is not treated
|
||||
specially.
|
||||
- For alerts, **do not** include `HEARTBEAT_OK`; return only the alert text.
|
||||
|
||||
### Stray `HEARTBEAT_OK` outside heartbeats
|
||||
If the model accidentally includes `HEARTBEAT_OK` at the start or end of a
|
||||
normal (non-heartbeat) reply, Clawdbot strips the token and logs a verbose
|
||||
message. If the reply is only `HEARTBEAT_OK`, it is dropped.
|
||||
|
||||
### Outbound normalization (all providers)
|
||||
For **all providers** (WhatsApp/Web, Telegram, Slack, Discord, Signal, iMessage),
|
||||
Clawdbot applies the same filtering to tool summaries, streaming block replies,
|
||||
and final replies:
|
||||
- drop payloads that are only `HEARTBEAT_OK` with no media
|
||||
- strip `HEARTBEAT_OK` at the edges when mixed with other text
|
||||
Outside heartbeats, stray `HEARTBEAT_OK` at the start/end of a message is stripped
|
||||
and logged; a message that is only `HEARTBEAT_OK` is dropped.
|
||||
|
||||
## Config
|
||||
|
||||
@@ -51,8 +37,8 @@ and final replies:
|
||||
heartbeat: {
|
||||
every: "30m", // default: 30m (0m disables)
|
||||
model: "anthropic/claude-opus-4-5",
|
||||
target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none
|
||||
to: "+15551234567", // optional provider-specific override (e.g. E.164 or chat id)
|
||||
target: "last", // last | whatsapp | telegram | discord | slack | signal | imessage | none
|
||||
to: "+15551234567", // optional provider-specific override
|
||||
prompt: "Read HEARTBEAT.md if exists. Consider outstanding tasks. Checkup sometimes on your human during (user local) day time.",
|
||||
ackMaxChars: 30 // max chars allowed after HEARTBEAT_OK
|
||||
}
|
||||
@@ -60,47 +46,45 @@ and final replies:
|
||||
}
|
||||
```
|
||||
|
||||
### Fields
|
||||
- `every`: heartbeat interval (duration string; default unit minutes). Default:
|
||||
`30m`. Set to `0m` to disable.
|
||||
- `model`: optional model override for heartbeat runs (`provider/model`).
|
||||
- `target`: where heartbeat output is delivered.
|
||||
- `last` (default): send to the last used external provider.
|
||||
- `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`: force the provider (optionally set `to`).
|
||||
- `none`: do not deliver externally; output stays in the session (WebChat-visible).
|
||||
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram).
|
||||
- `prompt`: optional override for the heartbeat body (default shown above). Safe to
|
||||
change; heartbeat acks are still keyed off `HEARTBEAT_OK`.
|
||||
- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery (default: 30).
|
||||
### Field notes
|
||||
|
||||
## Cost awareness
|
||||
Heartbeats run full agent turns. Shorter intervals burn more tokens. Be
|
||||
intentional about `every`, keep `HEARTBEAT.md` tiny, and consider a cheaper
|
||||
`model` or `target: "none"` if you only want internal state updates.
|
||||
- `every`: heartbeat interval (duration string; default unit = minutes).
|
||||
- `model`: optional model override for heartbeat runs (`provider/model`).
|
||||
- `target`:
|
||||
- `last` (default): deliver to the last used external provider.
|
||||
- explicit provider: `whatsapp` / `telegram` / `discord` / `slack` / `signal` / `imessage`.
|
||||
- `none`: run the heartbeat but **do not deliver** externally.
|
||||
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram, etc.).
|
||||
- `prompt`: overrides the default prompt body (not merged).
|
||||
- `ackMaxChars`: max chars allowed after `HEARTBEAT_OK` before delivery.
|
||||
|
||||
## Delivery behavior
|
||||
|
||||
- Heartbeats run in the **main session** (`main`, or `global` when scope is global).
|
||||
- If the main queue is busy, the heartbeat is skipped and retried later.
|
||||
- If `target` resolves to no external destination, the run still happens but no
|
||||
outbound message is sent.
|
||||
- Heartbeat-only replies do **not** keep the session alive; the last `updatedAt`
|
||||
is restored so idle expiry behaves normally.
|
||||
|
||||
## HEARTBEAT.md (optional)
|
||||
|
||||
If a `HEARTBEAT.md` file exists in the workspace, the default prompt tells the
|
||||
agent to read it. Keep it tiny (short checklist or reminders) to avoid prompt
|
||||
bloat.
|
||||
|
||||
## Behavior
|
||||
- Runs in the main session (`main`, or `global` when scope is global).
|
||||
- Uses the main lane queue; if requests are in flight, the wake is retried.
|
||||
- Empty output or `HEARTBEAT_OK` is treated as “ok” and does **not** keep the
|
||||
session alive (`updatedAt` is restored).
|
||||
- If `target` resolves to no external destination (no last route or `none`), the
|
||||
heartbeat still runs but no outbound message is sent.
|
||||
## Manual wake (on-demand)
|
||||
|
||||
## Ideas for use
|
||||
- Check up on the user (light, respectful pings during daytime).
|
||||
- Handle mundane tasks (triage inboxes, summarize queues, refresh notes).
|
||||
- Nudge on open loops or reminders.
|
||||
- Background monitoring (health checks, status polling, low-priority alerts).
|
||||
- Scheduled routines (use [Cron jobs](/automation/cron-jobs) when you
|
||||
need exact schedules or isolated runs).
|
||||
You can enqueue a system event and trigger an immediate heartbeat with:
|
||||
|
||||
## Wake hook
|
||||
- The gateway exposes a heartbeat wake hook so cron/jobs/webhooks can request an
|
||||
immediate run (`requestHeartbeatNow`).
|
||||
- `wake` endpoints should enqueue system events and optionally trigger a wake; the
|
||||
heartbeat runner picks those up on the next tick or immediately.
|
||||
```bash
|
||||
clawdbot wake --text "Check for urgent follow-ups" --mode now
|
||||
```
|
||||
|
||||
Use `--mode next-heartbeat` to wait for the next scheduled tick.
|
||||
|
||||
## Cost awareness
|
||||
|
||||
Heartbeats run full agent turns. Shorter intervals burn more tokens. Keep
|
||||
`HEARTBEAT.md` small and consider a cheaper `model` or `target: "none"` if you
|
||||
only want internal state updates.
|
||||
|
||||
@@ -127,7 +127,9 @@ See also: [`docs/presence.md`](/concepts/presence) for how presence is produced/
|
||||
## Typing and validation
|
||||
- Server validates every inbound frame with AJV against JSON Schema emitted from the protocol definitions.
|
||||
- Clients (TS/Swift) consume generated types (TS directly; Swift via the repo’s generator).
|
||||
- Types live in [`src/gateway/protocol/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/*.ts); regenerate schemas/models with `pnpm protocol:gen` (writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json)) and `pnpm protocol:gen:swift` (writes [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift)).
|
||||
- Protocol definitions are the source of truth; regenerate schema/models with:
|
||||
- `pnpm protocol:gen`
|
||||
- `pnpm protocol:gen:swift`
|
||||
|
||||
## Connection snapshot
|
||||
- `hello-ok` includes a `snapshot` with `presence`, `health`, `stateVersion`, and `uptimeMs` plus `policy {maxPayload,maxBufferedBytes,tickIntervalMs}` so clients can render immediately without extra requests.
|
||||
|
||||
@@ -10,12 +10,10 @@ read_when:
|
||||
Clawdbot has two log “surfaces”:
|
||||
|
||||
- **Console output** (what you see in the terminal / Debug UI).
|
||||
- **File logs** (JSON lines) written by the internal logger.
|
||||
- **File logs** (JSON lines) written by the gateway logger.
|
||||
|
||||
## File-based logger
|
||||
|
||||
Clawdbot uses a file logger backed by `tslog` ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts)).
|
||||
|
||||
- Default rolling log file is under `/tmp/clawdbot/` (one file per day): `clawdbot-YYYY-MM-DD.log`
|
||||
- The log file path and level can be configured via `~/.clawdbot/clawdbot.json`:
|
||||
- `logging.file`
|
||||
@@ -40,9 +38,8 @@ clawdbot logs --follow
|
||||
|
||||
## Console capture
|
||||
|
||||
The CLI entrypoint enables console capture ([`src/index.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/index.ts) calls `enableConsoleCapture()`).
|
||||
That means every `console.log/info/warn/error/debug/trace` is also written into the file logs,
|
||||
while still behaving normally on stdout/stderr.
|
||||
The CLI captures `console.log/info/warn/error/debug/trace` and writes them to file logs,
|
||||
while still printing to stdout/stderr.
|
||||
|
||||
You can tune console verbosity independently via:
|
||||
|
||||
@@ -94,13 +91,8 @@ clawdbot gateway --verbose --ws-log full
|
||||
|
||||
## Console formatting (subsystem logging)
|
||||
|
||||
Clawdbot formats console logs via a small wrapper on top of the existing stack:
|
||||
|
||||
- **tslog** for structured file logs ([`src/logging.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/logging.ts))
|
||||
- **chalk** for colors ([`src/globals.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/globals.ts))
|
||||
|
||||
The console formatter is **TTY-aware** and prints consistent, prefixed lines.
|
||||
Subsystem loggers are created via `createSubsystemLogger("gateway")`.
|
||||
Subsystem loggers keep output grouped and scannable.
|
||||
|
||||
Behavior:
|
||||
|
||||
|
||||
@@ -7,103 +7,83 @@ read_when:
|
||||
---
|
||||
# Gateway-owned pairing (Option B)
|
||||
|
||||
Goal: The Gateway (`clawd`) is the **source of truth** for which nodes are allowed to join the network.
|
||||
|
||||
This enables:
|
||||
- Headless approval via terminal/CLI (no Swift UI required).
|
||||
- Optional macOS UI approval (Swift app is just a frontend).
|
||||
- One consistent membership store for iOS, mac nodes, future hardware nodes.
|
||||
In Gateway-owned pairing, the **Gateway** is the source of truth for which nodes
|
||||
are allowed to join. UIs (macOS app, future clients) are just frontends that
|
||||
approve or reject pending requests.
|
||||
|
||||
## Concepts
|
||||
- **Pending request**: a node asked to join; requires explicit approve/reject.
|
||||
- **Paired node**: node is allowed; gateway returns an auth token for subsequent connects.
|
||||
- **Bridge**: direct transport endpoint owned by the gateway. The bridge does not decide membership.
|
||||
|
||||
- **Pending request**: a node asked to join; requires approval.
|
||||
- **Paired node**: approved node with an issued auth token.
|
||||
- **Bridge**: transport endpoint only; it forwards requests but does not decide
|
||||
membership.
|
||||
|
||||
## How pairing works
|
||||
|
||||
1. A node connects to the bridge and requests pairing.
|
||||
2. The Gateway stores a **pending request** and emits `node.pair.requested`.
|
||||
3. You approve or reject the request (CLI or UI).
|
||||
4. On approval, the Gateway issues a **new token** (tokens are rotated on re‑pair).
|
||||
5. The node reconnects using the token and is now “paired”.
|
||||
|
||||
Pending requests expire automatically after **5 minutes**.
|
||||
|
||||
## CLI workflow (headless friendly)
|
||||
|
||||
```bash
|
||||
clawdbot nodes pending
|
||||
clawdbot nodes approve <requestId>
|
||||
clawdbot nodes reject <requestId>
|
||||
clawdbot nodes status
|
||||
clawdbot nodes rename --node <id|name|ip> --name "Living Room iPad"
|
||||
```
|
||||
|
||||
`nodes status` shows paired/connected nodes and their capabilities.
|
||||
|
||||
## API surface (gateway protocol)
|
||||
These are conceptual method names; wire them into [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) and regenerate Swift types.
|
||||
|
||||
### Events
|
||||
- `node.pair.requested`
|
||||
- Emitted whenever a new pending pairing request is created.
|
||||
- Payload:
|
||||
- `requestId` (string)
|
||||
- `nodeId` (string)
|
||||
- `displayName?` (string)
|
||||
- `platform?` (string)
|
||||
- `version?` (string)
|
||||
- `remoteIp?` (string)
|
||||
- `silent?` (boolean) — hint that the UI may attempt auto-approval
|
||||
- `ts` (ms since epoch)
|
||||
- `node.pair.resolved`
|
||||
- Emitted when a pending request is approved/rejected.
|
||||
- Payload:
|
||||
- `requestId` (string)
|
||||
- `nodeId` (string)
|
||||
- `decision` ("approved" | "rejected" | "expired")
|
||||
- `ts` (ms since epoch)
|
||||
Events:
|
||||
- `node.pair.requested` — emitted when a new pending request is created.
|
||||
- `node.pair.resolved` — emitted when a request is approved/rejected/expired.
|
||||
|
||||
### Methods
|
||||
- `node.pair.request`
|
||||
- Creates (or returns) a pending request.
|
||||
- Params: node metadata (same shape as `node.pair.requested` payload, minus `requestId`/`ts`).
|
||||
- Optional `silent` flag hints that the UI can attempt an SSH auto-approve before showing an alert.
|
||||
- Result:
|
||||
- `status` ("pending")
|
||||
- `created` (boolean) — whether this call created the pending request
|
||||
- `request` (pending request object), including `isRepair` when the node was already paired
|
||||
- Security: **never returns an existing token**. If a paired node “lost” its token, it must be approved again (token rotation).
|
||||
- `node.pair.list`
|
||||
- Returns:
|
||||
- `pending[]` (pending requests)
|
||||
- `paired[]` (paired node records)
|
||||
- `node.pair.approve`
|
||||
- Params: `{ requestId }`
|
||||
- Result: `{ requestId, node: { nodeId, token, ... } }`
|
||||
- Must be idempotent (first decision wins).
|
||||
- `node.pair.reject`
|
||||
- Params: `{ requestId }`
|
||||
- Result: `{ requestId, nodeId }`
|
||||
- `node.pair.verify`
|
||||
- Params: `{ nodeId, token }`
|
||||
- Result: `{ ok: boolean, node?: { nodeId, ... } }`
|
||||
|
||||
## CLI flows
|
||||
CLI must be able to fully operate without any GUI:
|
||||
- `clawdbot nodes pending`
|
||||
- `clawdbot nodes approve <requestId>`
|
||||
- `clawdbot nodes reject <requestId>`
|
||||
- `clawdbot nodes status` (paired nodes + connection status/capabilities)
|
||||
|
||||
Optional interactive helper:
|
||||
- `clawdbot nodes watch` (subscribe to `node.pair.requested` and prompt in-place)
|
||||
|
||||
Implementation pointers:
|
||||
- CLI commands: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
|
||||
- Gateway handlers + events: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) + [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts)
|
||||
- Pairing store: [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts) (under `~/.clawdbot/nodes/`)
|
||||
- Optional macOS UI prompt (frontend only): [`apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/NodePairingApprovalPrompter.swift)
|
||||
- Push-first: listens to `node.pair.requested`/`node.pair.resolved`, does a `node.pair.list` on startup/reconnect,
|
||||
and only runs a slow safety poll while a request is pending/visible.
|
||||
|
||||
## Storage (private, local)
|
||||
Gateway stores the authoritative state under `~/.clawdbot/`:
|
||||
- `~/.clawdbot/nodes/paired.json`
|
||||
- `~/.clawdbot/nodes/pending.json` (or `~/.clawdbot/nodes/pending/*.json`)
|
||||
Methods:
|
||||
- `node.pair.request` — create or reuse a pending request.
|
||||
- `node.pair.list` — list pending + paired nodes.
|
||||
- `node.pair.approve` — approve a pending request (issues token).
|
||||
- `node.pair.reject` — reject a pending request.
|
||||
- `node.pair.verify` — verify `{ nodeId, token }`.
|
||||
|
||||
Notes:
|
||||
- Tokens are secrets. Treat `paired.json` as sensitive.
|
||||
- Pending entries should have a TTL (e.g. 5 minutes) and expire automatically.
|
||||
- `node.pair.request` is idempotent per node: repeated calls return the same
|
||||
pending request.
|
||||
- Approval **always** generates a fresh token; no token is ever returned from
|
||||
`node.pair.request`.
|
||||
- Requests may include `silent: true` as a hint for auto-approval flows.
|
||||
|
||||
## Bridge integration
|
||||
Target direction:
|
||||
- The gateway runs the bridge listener (LAN/tailnet-facing) and advertises discovery beacons (Bonjour).
|
||||
- The bridge is transport only; it forwards/scopes requests and enforces ACLs, but pairing decisions are made by the gateway.
|
||||
## Auto-approval (macOS app)
|
||||
|
||||
The macOS UI (Swift) can:
|
||||
- Subscribe to `node.pair.requested`, show an alert (including `remoteIp`), and call `node.pair.approve` or `node.pair.reject`.
|
||||
- Or ignore/dismiss (“Later”) and let CLI handle it.
|
||||
- When `silent` is set, it can try a short SSH probe (same user) and auto-approve if reachable; otherwise fall back to the normal alert.
|
||||
The macOS app can optionally attempt a **silent approval** when:
|
||||
- the request is marked `silent`, and
|
||||
- the app can verify an SSH connection to the gateway host using the same user.
|
||||
|
||||
## Implementation note
|
||||
If the bridge is only provided by the macOS app, then “no Swift app running” cannot work end-to-end.
|
||||
The long-term goal is to move bridge hosting + Bonjour advertising into the Node gateway so headless pairing works by default.
|
||||
If silent approval fails, it falls back to the normal “Approve/Reject” prompt.
|
||||
|
||||
## Storage (local, private)
|
||||
|
||||
Pairing state is stored under the Gateway state directory (default `~/.clawdbot`):
|
||||
|
||||
- `~/.clawdbot/nodes/paired.json`
|
||||
- `~/.clawdbot/nodes/pending.json`
|
||||
|
||||
If you override `CLAWDBOT_STATE_DIR`, the `nodes/` folder moves with it.
|
||||
|
||||
Security notes:
|
||||
- Tokens are secrets; treat `paired.json` as sensitive.
|
||||
- Rotating a token requires re-approval (or deleting the node entry).
|
||||
|
||||
## Bridge behavior
|
||||
|
||||
- The bridge is **transport only**; it does not store membership.
|
||||
- If the Gateway is offline or pairing is disabled, nodes cannot pair.
|
||||
- If the bridge is running but the Gateway is in remote mode, pairing still
|
||||
happens against the remote Gateway’s store.
|
||||
|
||||
@@ -140,11 +140,3 @@ Nodes may include a `permissions` map in `node.list` / `node.describe`, keyed by
|
||||
|
||||
- The macOS menubar app connects to the Gateway bridge as a node (so `clawdbot nodes …` works against this Mac).
|
||||
- In remote mode, the app opens an SSH tunnel for the bridge port and connects to `localhost`.
|
||||
|
||||
## Where to look in code
|
||||
|
||||
- CLI wiring: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
|
||||
- Canvas snapshot decoding/temp paths: [`src/cli/nodes-canvas.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-canvas.ts)
|
||||
- Duration parsing for CLI: [`src/cli/parse-duration.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/parse-duration.ts)
|
||||
- iOS node commands: [`apps/ios/Sources/Model/NodeAppModel.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/ios/Sources/Model/NodeAppModel.swift)
|
||||
- Android node commands: `apps/android/app/src/main/java/com/clawdbot/android/node/*`
|
||||
|
||||
@@ -76,7 +76,7 @@ Goal: model can request location even when node is backgrounded, but only when:
|
||||
|
||||
Push-triggered flow (future):
|
||||
1) Gateway sends a push to the node (silent push or FCM data).
|
||||
2) Node wakes briefly and calls `location.get` internally.
|
||||
2) Node wakes briefly and requests location from the device.
|
||||
3) Node forwards payload to Gateway.
|
||||
|
||||
Notes:
|
||||
|
||||
@@ -1,381 +1,105 @@
|
||||
---
|
||||
summary: "iOS app (node): architecture + connection runbook"
|
||||
summary: "iOS node app: connect to the Gateway, pairing, canvas, and troubleshooting"
|
||||
read_when:
|
||||
- Pairing or reconnecting the iOS node
|
||||
- Debugging iOS bridge discovery or auth
|
||||
- Sending screen/canvas commands to iOS
|
||||
- Designing iOS node + gateway integration
|
||||
- Extending the Gateway protocol for node/canvas commands
|
||||
- Implementing Bonjour pairing or transport security
|
||||
- Running the iOS app from source
|
||||
- Debugging bridge discovery or canvas commands
|
||||
---
|
||||
# iOS App (Node)
|
||||
|
||||
Status: prototype implemented (internal) · Date: 2025-12-13
|
||||
Availability: internal preview. The iOS app is not publicly distributed yet.
|
||||
|
||||
## Support snapshot
|
||||
- Role: companion node app (iOS does not host the Gateway).
|
||||
- Gateway required: yes (run it on macOS, Linux, or Windows via WSL2).
|
||||
- Install: [Getting Started](/start/getting-started) + [Pairing](/gateway/pairing).
|
||||
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
|
||||
## What it does
|
||||
|
||||
## System control
|
||||
System control (launchd/systemd) lives on the Gateway host. See [Gateway](/gateway).
|
||||
- Connects to a Gateway over the bridge (LAN or tailnet).
|
||||
- Exposes node capabilities: Canvas, Screen snapshot, Camera capture, Location, Talk mode, Voice wake.
|
||||
- Receives `node.invoke` commands and reports node status events.
|
||||
|
||||
## Connection Runbook
|
||||
## Requirements
|
||||
|
||||
This is the practical “how do I connect the iOS node” guide:
|
||||
- Gateway running on another device (macOS, Linux, or Windows via WSL2).
|
||||
- Bridge enabled (default).
|
||||
- Network path:
|
||||
- Same LAN via Bonjour, **or**
|
||||
- Tailnet via unicast DNS-SD (`clawdbot.internal.`), **or**
|
||||
- Manual host/port (fallback).
|
||||
|
||||
**iOS app** ⇄ (Bonjour + TCP bridge) ⇄ **Gateway bridge** ⇄ (loopback WS) ⇄ **Gateway**
|
||||
## Quick start (pair + connect)
|
||||
|
||||
The Gateway WebSocket stays loopback-only (`ws://127.0.0.1:18789`). The iOS node talks to the LAN-facing **bridge** (default `tcp://0.0.0.0:18790`) and uses Gateway-owned pairing.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- You can run the Gateway on the “master” machine.
|
||||
- iOS node app can reach the gateway bridge:
|
||||
- Same LAN with Bonjour/mDNS, **or**
|
||||
- Same Tailscale tailnet using Wide-Area Bonjour / unicast DNS-SD (see below), **or**
|
||||
- Manual bridge host/port (fallback)
|
||||
- You can run the CLI (`clawdbot`) on the gateway machine (or via SSH).
|
||||
|
||||
### 1) Start the Gateway (with bridge enabled)
|
||||
|
||||
Bridge is enabled by default (disable via `CLAWDBOT_BRIDGE_ENABLED=0`).
|
||||
1) Start the Gateway (bridge enabled by default):
|
||||
|
||||
```bash
|
||||
clawdbot gateway --port 18789 --verbose
|
||||
clawdbot gateway --port 18789
|
||||
```
|
||||
|
||||
Confirm in logs you see something like:
|
||||
- `bridge listening on tcp://0.0.0.0:18790 (node)`
|
||||
2) In the iOS app, open Settings and pick a discovered gateway (or enable Manual Bridge and enter host/port).
|
||||
|
||||
For tailnet-only setups (recommended for Vienna ⇄ London), bind the bridge to the gateway machine’s Tailscale IP instead:
|
||||
|
||||
- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json` on the gateway host.
|
||||
- Restart the Gateway / macOS menubar app.
|
||||
|
||||
### 2) Verify Bonjour discovery (optional but recommended)
|
||||
|
||||
From the gateway machine:
|
||||
|
||||
```bash
|
||||
dns-sd -B _clawdbot-bridge._tcp local.
|
||||
```
|
||||
|
||||
You should see your gateway advertising `_clawdbot-bridge._tcp`.
|
||||
|
||||
If browse works, but the iOS node can’t connect, try resolving one instance:
|
||||
|
||||
```bash
|
||||
dns-sd -L "<instance name>" _clawdbot-bridge._tcp local.
|
||||
```
|
||||
|
||||
More debugging notes: [`docs/bonjour.md`](/gateway/bonjour).
|
||||
|
||||
#### Tailnet (Vienna ⇄ London) discovery via unicast DNS-SD
|
||||
|
||||
If the iOS node and the gateway are on different networks but connected via Tailscale, multicast mDNS won’t cross the boundary. Use Wide-Area Bonjour / unicast DNS-SD instead:
|
||||
|
||||
1) Set up a DNS-SD zone (example `clawdbot.internal.`) on the gateway host and publish `_clawdbot-bridge._tcp` records.
|
||||
2) Configure Tailscale split DNS for `clawdbot.internal` pointing at that DNS server.
|
||||
|
||||
Details and example CoreDNS config: [`docs/bonjour.md`](/gateway/bonjour).
|
||||
|
||||
### 3) Connect from the iOS node app
|
||||
|
||||
In the iOS node app:
|
||||
- Pick the discovered bridge (or hit refresh).
|
||||
- If not paired yet, it will initiate pairing automatically.
|
||||
- After the first successful pairing, it will auto-reconnect **strictly to the last discovered gateway** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
|
||||
|
||||
#### Connection indicator (always visible)
|
||||
|
||||
The Settings tab icon shows a small status dot:
|
||||
- **Green**: connected to the bridge
|
||||
- **Yellow**: connecting (subtle pulse)
|
||||
- **Red**: not connected / error
|
||||
|
||||
### 4) Approve pairing (CLI)
|
||||
|
||||
On the gateway machine:
|
||||
3) Approve the pairing request on the gateway host:
|
||||
|
||||
```bash
|
||||
clawdbot nodes pending
|
||||
```
|
||||
|
||||
Approve the request:
|
||||
|
||||
```bash
|
||||
clawdbot nodes approve <requestId>
|
||||
```
|
||||
|
||||
After approval, the iOS node receives/stores the token and reconnects authenticated.
|
||||
|
||||
Pairing details: [`docs/gateway/pairing.md`](/gateway/pairing).
|
||||
|
||||
### 5) Verify the node is connected
|
||||
|
||||
- In the macOS app: **Instances** tab should show something like `iOS Node (...)` with a green “Active” presence dot shortly after connect.
|
||||
- Via nodes status (paired + connected):
|
||||
```bash
|
||||
clawdbot nodes status
|
||||
```
|
||||
- Via Gateway (paired + connected):
|
||||
```bash
|
||||
clawdbot gateway call node.list --params "{}"
|
||||
```
|
||||
- Via Gateway presence (legacy-ish, still useful):
|
||||
```bash
|
||||
clawdbot gateway call system-presence --params "{}"
|
||||
```
|
||||
Look for the node `instanceId` (often a UUID).
|
||||
|
||||
### 6) Drive the iOS Canvas (draw / snapshot)
|
||||
|
||||
The iOS node runs a WKWebView “Canvas” scaffold which exposes:
|
||||
- `window.__clawdbot.canvas`
|
||||
- `window.__clawdbot.ctx` (2D context)
|
||||
- `window.__clawdbot.setStatus(title, subtitle)`
|
||||
|
||||
#### Gateway Canvas Host (recommended for web content)
|
||||
|
||||
If you want the node to show real HTML/CSS/JS that the agent can edit on disk, point it at the Gateway canvas host.
|
||||
|
||||
Note: nodes always use the standalone canvas host on `canvasHost.port` (default `18793`), bound to the bridge interface.
|
||||
|
||||
1) Create `~/clawd/canvas/index.html` on the gateway host.
|
||||
|
||||
2) Navigate the node to it (LAN):
|
||||
4) Verify connection:
|
||||
|
||||
```bash
|
||||
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-hostname>.local:18793/__clawdbot__/canvas/"}'
|
||||
clawdbot nodes status
|
||||
clawdbot gateway call node.list --params "{}"
|
||||
```
|
||||
|
||||
## Discovery paths
|
||||
|
||||
### Bonjour (LAN)
|
||||
|
||||
The Gateway advertises `_clawdbot-bridge._tcp` on `local.`. The iOS app lists these automatically.
|
||||
|
||||
### Tailnet (cross-network)
|
||||
|
||||
If mDNS is blocked, use a unicast DNS-SD zone (recommended domain: `clawdbot.internal.`) and Tailscale split DNS.
|
||||
See [`docs/bonjour.md`](/gateway/bonjour) for the CoreDNS example.
|
||||
|
||||
### Manual host/port
|
||||
|
||||
In Settings, enable **Manual Bridge** and enter the gateway host + port (default `18790`).
|
||||
|
||||
## Canvas + A2UI
|
||||
|
||||
The iOS node renders a WKWebView canvas. Use `node.invoke` to drive it:
|
||||
|
||||
```bash
|
||||
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-host>:18793/__clawdbot__/canvas/"}'
|
||||
```
|
||||
|
||||
Notes:
|
||||
- The server injects a live-reload client into HTML and reloads on file changes.
|
||||
- A2UI is hosted on the same canvas host at `http://<gateway-host>:18793/__clawdbot__/a2ui/`.
|
||||
- Tailnet (optional): if both devices are on Tailscale, use a MagicDNS name or tailnet IP instead of `.local`, e.g. `http://<gateway-magicdns>:18793/__clawdbot__/canvas/`.
|
||||
- iOS may require App Transport Security allowances to load plain `http://` URLs; if it fails to load, prefer HTTPS or adjust the iOS app’s ATS config.
|
||||
- The Gateway canvas host serves `/__clawdbot__/canvas/` and `/__clawdbot__/a2ui/`.
|
||||
- The iOS node auto-navigates to A2UI on connect when a canvas host URL is advertised.
|
||||
- Return to the built-in scaffold with `canvas.navigate` and `{"url":""}`.
|
||||
|
||||
#### Draw with `canvas.eval`
|
||||
### Canvas eval / snapshot
|
||||
|
||||
```bash
|
||||
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params "$(cat <<'JSON'
|
||||
{"javaScript":"(() => { const {ctx,setStatus} = window.__clawdbot; setStatus('Drawing','…'); ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle='#ff2d55'; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); setStatus(null,null); return 'ok'; })()"}
|
||||
JSON
|
||||
)"
|
||||
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
|
||||
```
|
||||
|
||||
#### Snapshot with `canvas.snapshot`
|
||||
|
||||
```bash
|
||||
clawdbot nodes invoke --node 192.168.0.88 --command canvas.snapshot --params '{"maxWidth":900}'
|
||||
clawdbot nodes invoke --node "iOS Node" --command canvas.snapshot --params '{"maxWidth":900,"format":"jpeg"}'
|
||||
```
|
||||
|
||||
The response includes `{ format, base64 }` image data (default `format="jpeg"`; pass `{"format":"png"}` when you specifically need lossless PNG).
|
||||
## Voice wake + talk mode
|
||||
|
||||
### Common gotchas
|
||||
- Voice wake and talk mode are available in Settings.
|
||||
- iOS may suspend background audio; treat voice features as best-effort when the app is not active.
|
||||
|
||||
- **iOS in background:** all `canvas.*` commands fail fast with `NODE_BACKGROUND_UNAVAILABLE` (bring the iOS node app to foreground).
|
||||
- **Return to default scaffold:** `canvas.navigate` with `{"url":""}` or `{"url":"/"}` returns to the built-in scaffold page.
|
||||
- **mDNS blocked:** some networks block multicast; use a different LAN or plan a tailnet-capable bridge (see [`docs/discovery.md`](/gateway/discovery)).
|
||||
- **Wrong node selector:** `--node` can be the node id (UUID), display name (e.g. `iOS Node`), IP, or an unambiguous prefix. If it’s ambiguous, the CLI will tell you.
|
||||
- **Stale pairing / Keychain cleared:** if the pairing token is missing (or iOS Keychain was wiped), the node must pair again; approve a new pending request.
|
||||
- **App reinstall but no reconnect:** the node restores `instanceId` + last bridge preference from Keychain; if it still comes up “unpaired”, verify Keychain persistence on your device/simulator and re-pair once.
|
||||
## Common errors
|
||||
|
||||
## Design + Architecture
|
||||
|
||||
### Goals
|
||||
- Build an **iOS app** that acts as a **remote node** for Clawdbot:
|
||||
- **Voice trigger** (wake-word / always-listening intent) that forwards transcripts to the Gateway `agent` method.
|
||||
- **Canvas** surface that the agent can control: navigate, draw/render, evaluate JS, snapshot.
|
||||
- **Dead-simple setup**:
|
||||
- Auto-discover the host on the local network via **Bonjour**.
|
||||
- One-tap pairing with an approval prompt on the Mac.
|
||||
- iOS is **never** a local gateway; it is always a remote node.
|
||||
- Operational clarity:
|
||||
- When iOS is backgrounded, voice may still run; **canvas commands must fail fast** with a structured error.
|
||||
- Provide **settings**: node display name, enable/disable voice wake, pairing status.
|
||||
|
||||
Non-goals (v1):
|
||||
- Exposing the Node Gateway directly on the LAN.
|
||||
- Supporting arbitrary third-party “plugins” on iOS.
|
||||
- Perfect App Store compliance; this is **internal-only** initially.
|
||||
|
||||
### Current repo reality (constraints we respect)
|
||||
- The Gateway WebSocket server binds to `127.0.0.1:18789` ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)) with an optional `CLAWDBOT_GATEWAY_TOKEN`.
|
||||
- The Gateway exposes a Canvas file server (`canvasHost`) on `canvasHost.port` (default `18793`), so nodes can `canvas.navigate` to `http://<lanHost>:18793/__clawdbot__/canvas/` and auto-reload on file changes ([`docs/configuration.md`](/gateway/configuration)).
|
||||
- macOS “Canvas” is controlled via the Gateway node protocol (`canvas.*`), matching iOS/Android ([`docs/mac/canvas.md`](/platforms/mac/canvas)).
|
||||
- Voice wake forwards via `GatewayChannel` to Gateway `agent` (mac app: `VoiceWakeForwarder` → `GatewayConnection.sendAgent`).
|
||||
|
||||
### Recommended topology (B): Gateway-owned Bridge + loopback Gateway
|
||||
Keep the Node gateway loopback-only; expose a dedicated **gateway-owned bridge** to the LAN/tailnet.
|
||||
|
||||
**iOS App** ⇄ (TLS + pairing) ⇄ **Bridge (in gateway)** ⇄ (loopback) ⇄ **Gateway WS** (`ws://127.0.0.1:18789`)
|
||||
|
||||
Why:
|
||||
- Preserves current threat model: Gateway remains local-only.
|
||||
- Centralizes auth, rate limiting, and allowlisting in the bridge.
|
||||
- Lets us unify “canvas node” semantics across mac + iOS without exposing raw gateway methods.
|
||||
|
||||
### Security plan (internal, but still robust)
|
||||
#### Transport
|
||||
- **Current (v0):** bridge is a LAN-facing **TCP** listener with token-based auth after pairing.
|
||||
- **Next:** wrap the bridge in **TLS** and prefer key-pinned or mTLS-like auth after pairing.
|
||||
|
||||
#### Pairing
|
||||
- Bonjour discovery shows a candidate “Clawdbot Bridge” on the LAN.
|
||||
- First connection:
|
||||
1) iOS generates a keypair (Secure Enclave if available).
|
||||
2) iOS connects to the bridge and requests pairing.
|
||||
3) The bridge forwards the pairing request to the **Gateway** as a *pending request*.
|
||||
4) Approval can happen via:
|
||||
- **macOS UI** (Clawdbot shows an alert with Approve/Reject/Later, including the node IP), or
|
||||
- **Terminal/CLI** (headless flows).
|
||||
5) Once approved, the bridge returns a token to iOS; iOS stores it in Keychain.
|
||||
- Subsequent connections:
|
||||
- The bridge requires the paired identity. Unpaired clients get a structured “not paired” error and no access.
|
||||
|
||||
##### Gateway-owned pairing (Option B details)
|
||||
Pairing decisions must be owned by the Gateway (`clawd` / Node) so nodes can be approved without the macOS app running.
|
||||
|
||||
Key idea:
|
||||
- The Swift app may still show an alert, but it is only a **frontend** for pending requests stored in the Gateway.
|
||||
|
||||
Desired behavior:
|
||||
- If the Swift UI is present: show alert with Approve/Reject/Later.
|
||||
- If the Swift UI is not present: `clawdbot` CLI can list pending requests and approve/reject.
|
||||
|
||||
See [`docs/gateway/pairing.md`](/gateway/pairing) for the API/events and storage.
|
||||
|
||||
CLI (headless approvals):
|
||||
- `clawdbot nodes pending`
|
||||
- `clawdbot nodes approve <requestId>`
|
||||
- `clawdbot nodes reject <requestId>`
|
||||
|
||||
#### Authorization / scope control (bridge-side ACL)
|
||||
The bridge must not be a raw proxy to every gateway method.
|
||||
|
||||
- Allow by default:
|
||||
- `agent` (with guardrails; idempotency required)
|
||||
- minimal `system-event` beacons (presence updates for the node)
|
||||
- node/canvas methods defined below (new protocol surface)
|
||||
- Deny by default:
|
||||
- anything that widens control without explicit intent (future “shell”, “files”, etc.)
|
||||
- Rate limit:
|
||||
- handshake attempts
|
||||
- voice forwards per minute
|
||||
- snapshot frequency / payload size
|
||||
|
||||
### Protocol unification: add “node/canvas” to Gateway protocol
|
||||
#### Principle
|
||||
Unify mac Canvas + iOS Canvas under a single conceptual surface:
|
||||
- The agent talks to the Gateway using a stable method set (typed protocol).
|
||||
- The Gateway routes node-targeted requests to:
|
||||
- local mac Canvas implementation, or
|
||||
- remote iOS node via the bridge
|
||||
|
||||
#### Minimal protocol additions (v1)
|
||||
Add to [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) (and regenerate Swift models):
|
||||
|
||||
**Identity**
|
||||
- Node identity comes from `connect.params.client.instanceId` (stable), and `connect.params.client.mode = "node"` (or `"ios-node"`).
|
||||
|
||||
**Methods**
|
||||
- `node.list` → list paired/connected nodes + capabilities
|
||||
- `node.describe` → describe a node (capabilities + supported `node.invoke` commands)
|
||||
- `node.invoke` → send a command to a specific node
|
||||
- Params: `{ nodeId, command, params?, timeoutMs? }`
|
||||
|
||||
**Events**
|
||||
- `node.event` → async node status/errors
|
||||
- e.g. background/foreground transitions, voice availability, canvas availability
|
||||
|
||||
#### Node command set (canvas)
|
||||
These are values for `node.invoke.command`:
|
||||
- `canvas.present` / `canvas.hide`
|
||||
- `canvas.navigate` with `{ url }` (loads a URL; use `""` or `"/"` to return to the default scaffold)
|
||||
- `canvas.eval` with `{ javaScript }`
|
||||
- `canvas.snapshot` with `{ maxWidth?, quality?, format? }`
|
||||
- A2UI (mobile + macOS canvas):
|
||||
- `canvas.a2ui.push` with `{ messages: [...] }` (A2UI v0.8 server→client messages)
|
||||
- `canvas.a2ui.pushJSONL` with `{ jsonl: "..." }` (legacy alias)
|
||||
- `canvas.a2ui.reset`
|
||||
- A2UI is hosted by the Gateway canvas host (`/__clawdbot__/a2ui/`) on `canvasHost.port`. Commands fail if the host is unreachable.
|
||||
|
||||
Result pattern:
|
||||
- Request is a standard `req/res` with `ok` / `error`.
|
||||
- Long operations (loads, streaming drawing, etc.) may also emit `node.event` progress.
|
||||
|
||||
##### Current (implemented)
|
||||
As of 2025-12-13, the Gateway supports `node.invoke` for bridge-connected nodes.
|
||||
|
||||
Example: draw a diagonal line on the iOS Canvas:
|
||||
```bash
|
||||
clawdbot nodes invoke --node ios-node --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
|
||||
```
|
||||
|
||||
### Background behavior requirement
|
||||
When iOS is backgrounded:
|
||||
- Voice may still be active (subject to iOS suspension).
|
||||
- **All `canvas.*` commands must fail** with a stable error code, e.g.:
|
||||
- `NODE_BACKGROUND_UNAVAILABLE`
|
||||
- Include `retryable: true` and `retryAfterMs` if we want the agent to wait.
|
||||
|
||||
## iOS app architecture (SwiftUI)
|
||||
### App structure
|
||||
- Single fullscreen Canvas surface (WKWebView).
|
||||
- One settings entry point: a **gear button** that opens a settings sheet.
|
||||
- All navigation is **agent-driven** (no local URL bar).
|
||||
|
||||
### Components
|
||||
- `BridgeDiscovery`: Bonjour browse + resolve (Network.framework `NWBrowser`)
|
||||
- `BridgeConnection`: TCP session + pairing handshake + reconnect (TLS planned)
|
||||
- `NodeRuntime`:
|
||||
- Voice pipeline (wake-word + capture + forward)
|
||||
- Canvas pipeline (WKWebView controller + snapshot + eval)
|
||||
- Background state tracking; enforces “canvas unavailable in background”
|
||||
|
||||
### Voice in background (internal)
|
||||
- Enable background audio mode (and required session configuration) so the mic pipeline can keep running when the user switches apps.
|
||||
- If iOS suspends the app anyway, surface a clear node status (`node.event`) so operators can see voice is unavailable.
|
||||
|
||||
## Code sharing (macOS + iOS)
|
||||
Create/expand SwiftPM targets so both apps share:
|
||||
- `ClawdbotProtocol` (generated models; platform-neutral)
|
||||
- `ClawdbotGatewayClient` (shared WS framing + connect/req/res + seq-gap handling)
|
||||
- `ClawdbotKit` (node/canvas command types + deep links + shared utilities)
|
||||
|
||||
macOS continues to own:
|
||||
- local Canvas implementation details (custom scheme handler serving on-disk HTML, window/panel presentation)
|
||||
|
||||
iOS owns:
|
||||
- iOS-specific audio/speech + WKWebView presentation and lifecycle
|
||||
|
||||
## Repo layout
|
||||
- iOS app: `apps/ios/` (XcodeGen `project.yml`)
|
||||
- Shared Swift packages: `apps/shared/`
|
||||
- Lint/format: iOS target runs `swiftformat --lint` + `swiftlint lint` using repo configs (`.swiftformat`, `.swiftlint.yml`).
|
||||
|
||||
Generate the Xcode project:
|
||||
```bash
|
||||
cd apps/ios
|
||||
xcodegen generate
|
||||
open Clawdbot.xcodeproj
|
||||
```
|
||||
|
||||
## Storage plan (private by default)
|
||||
### iOS
|
||||
- Canvas/workspace files (persistent, private):
|
||||
- `Application Support/Clawdbot/canvas/<sessionKey>/...`
|
||||
- Snapshots / temp exports (evictable):
|
||||
- `Library/Caches/Clawdbot/canvas-snapshots/<sessionKey>/...`
|
||||
- Credentials:
|
||||
- Keychain (paired identity + bridge trust anchor)
|
||||
- `NODE_BACKGROUND_UNAVAILABLE`: bring the iOS app to the foreground (canvas/camera/screen commands require it).
|
||||
- `A2UI_HOST_NOT_CONFIGURED`: the Gateway did not advertise a canvas host URL; check `canvasHost` in [`docs/configuration.md`](/gateway/configuration).
|
||||
- Pairing prompt never appears: run `clawdbot nodes pending` and approve manually.
|
||||
- Reconnect fails after reinstall: the Keychain pairing token was cleared; re-pair the node.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/gateway.md`](/gateway) (gateway runbook)
|
||||
- [`docs/gateway/pairing.md`](/gateway/pairing) (approval + storage)
|
||||
- [`docs/bonjour.md`](/gateway/bonjour) (discovery debugging)
|
||||
- [`docs/discovery.md`](/gateway/discovery) (LAN vs tailnet vs SSH)
|
||||
- [Pairing](/gateway/pairing)
|
||||
- [Discovery](/gateway/discovery)
|
||||
- [Bonjour](/gateway/bonjour)
|
||||
|
||||
@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both
|
||||
App bundle layout:
|
||||
|
||||
- `Clawdbot.app/Contents/Resources/Relay/clawdbot`
|
||||
- bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
|
||||
- bun `--compile` relay executable built from `dist/macos/relay.js`
|
||||
- Supports:
|
||||
- `clawdbot …` (CLI)
|
||||
- `clawdbot gateway …` (LaunchAgent daemon)
|
||||
@@ -47,7 +47,7 @@ Important bundler flags:
|
||||
|
||||
Version injection:
|
||||
- `--define "__CLAWDBOT_VERSION__=\"<pkg version>\""`
|
||||
- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesn’t depend on reading `package.json` at runtime.
|
||||
- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesn’t depend on reading `package.json` at runtime.
|
||||
|
||||
## Launchd (Gateway as LaunchAgent)
|
||||
|
||||
@@ -58,7 +58,7 @@ Plist location (per-user):
|
||||
- `~/Library/LaunchAgents/com.clawdbot.gateway.plist`
|
||||
|
||||
Manager:
|
||||
- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
|
||||
- The macOS app owns LaunchAgent install/update for the bundled gateway.
|
||||
|
||||
Behavior:
|
||||
- “Clawdbot Active” enables/disables the LaunchAgent.
|
||||
@@ -79,7 +79,7 @@ Symptom (when mis-signed):
|
||||
|
||||
Fix:
|
||||
- The bun executable needs JIT-ish permissions under hardened runtime.
|
||||
- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
|
||||
- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
|
||||
- `com.apple.security.cs.allow-jit`
|
||||
- `com.apple.security.cs.allow-unsigned-executable-memory`
|
||||
|
||||
@@ -89,18 +89,14 @@ Problem:
|
||||
- bun can’t load some native Node addons like `sharp` (and we don’t want to ship native addon trees for the gateway).
|
||||
|
||||
Solution:
|
||||
- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
|
||||
- Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
|
||||
- Falls back to `sharp` when available (Node/dev)
|
||||
- Used by:
|
||||
- [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
|
||||
- [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
|
||||
- [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
|
||||
- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
|
||||
- When running in Node/dev, `sharp` is used when available.
|
||||
- This affects inbound/outbound media, screenshots, and tool image sanitization.
|
||||
|
||||
## Browser control server
|
||||
|
||||
The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
|
||||
It’s started from the relay daemon process, so the relay binary includes Playwright deps.
|
||||
The Gateway starts the browser control server (loopback only) from the relay daemon process,
|
||||
so the relay binary includes Playwright deps.
|
||||
|
||||
## Tests / smoke checks
|
||||
|
||||
@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.
|
||||
|
||||
## DMG styling (human installer)
|
||||
|
||||
[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
|
||||
`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.
|
||||
|
||||
Rules of thumb:
|
||||
- Use a **72dpi** background image that matches the Finder window size in points.
|
||||
|
||||
@@ -5,157 +5,117 @@ read_when:
|
||||
- Adding agent controls for visual workspace
|
||||
- Debugging WKWebView canvas loads
|
||||
---
|
||||
|
||||
# Canvas (macOS app)
|
||||
|
||||
Status: draft spec · Date: 2025-12-12
|
||||
The macOS app embeds an agent‑controlled **Canvas panel** using `WKWebView`. It
|
||||
is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
|
||||
UI surfaces.
|
||||
|
||||
Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
|
||||
## Where Canvas lives
|
||||
|
||||
Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
|
||||
Canvas state is stored under Application Support:
|
||||
|
||||
This is designed for:
|
||||
- Agent-written HTML/CSS/JS on disk (per-session directory).
|
||||
- A real browser engine for layout, rendering, and basic interactivity.
|
||||
- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
|
||||
- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
|
||||
- `~/Library/Application Support/Clawdbot/canvas/<session>/...`
|
||||
|
||||
## Why a custom scheme (vs. loopback HTTP)
|
||||
The Canvas panel serves those files via a **custom URL scheme**:
|
||||
|
||||
Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
|
||||
- No port conflicts and no extra local server lifecycle.
|
||||
- Easier to sandbox: only serve files we explicitly map.
|
||||
- Works offline and can use an ephemeral data store (no persistent cookies/cache).
|
||||
|
||||
If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
|
||||
|
||||
## URL ↔ directory mapping
|
||||
|
||||
The Canvas scheme is:
|
||||
- `clawdbot-canvas://<session>/<path>`
|
||||
|
||||
Routing model:
|
||||
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html` (or `index.htm`)
|
||||
- `clawdbot-canvas://main/yolo` → `<canvasRoot>/main/yolo/index.html` (or `index.htm`)
|
||||
Examples:
|
||||
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html`
|
||||
- `clawdbot-canvas://main/assets/app.css` → `<canvasRoot>/main/assets/app.css`
|
||||
- `clawdbot-canvas://main/widgets/todo/` → `<canvasRoot>/main/widgets/todo/index.html`
|
||||
|
||||
Directory listings are not served.
|
||||
If no `index.html` exists at the root, the app shows a **built‑in scaffold page**.
|
||||
|
||||
When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
|
||||
This is a visual placeholder only (no A2UI renderer).
|
||||
## Panel behavior
|
||||
|
||||
### Suggested on-disk location
|
||||
- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
|
||||
- Remembers size/position per session.
|
||||
- Auto‑reloads when local canvas files change.
|
||||
- Only one Canvas panel is visible at a time (session is switched as needed).
|
||||
|
||||
Store Canvas state under the app support directory:
|
||||
- `~/Library/Application Support/Clawdbot/canvas/<session>/…`
|
||||
Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
|
||||
node commands return `CANVAS_DISABLED`.
|
||||
|
||||
This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
|
||||
## Agent API surface
|
||||
|
||||
## Panel behavior (agent-controlled)
|
||||
Canvas is exposed via the **node bridge**, so the agent can:
|
||||
|
||||
Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
|
||||
- Can be shown/hidden at any time by the agent.
|
||||
- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
|
||||
- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
|
||||
- Default position is the **top-right corner** of the current screen’s visible frame (unless the user moved/resized it previously).
|
||||
- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
|
||||
- show/hide the panel
|
||||
- navigate to a path or URL
|
||||
- evaluate JavaScript
|
||||
- capture a snapshot image
|
||||
|
||||
### Hover-only chrome
|
||||
CLI examples:
|
||||
|
||||
Implementation notes:
|
||||
- Keep the window borderless at all times (don’t toggle `styleMask`).
|
||||
- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
|
||||
- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
|
||||
- Optionally show close/drag affordances only while hovered.
|
||||
```bash
|
||||
clawdbot nodes canvas present --node <id>
|
||||
clawdbot nodes canvas navigate --node <id> --url "/"
|
||||
clawdbot nodes canvas eval --node <id> --js "document.title"
|
||||
clawdbot nodes canvas snapshot --node <id>
|
||||
```
|
||||
|
||||
## Agent API surface (current)
|
||||
Notes:
|
||||
- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
|
||||
- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
|
||||
|
||||
Canvas is exposed via the Gateway **node bridge**, so the agent can:
|
||||
- Show/hide the panel.
|
||||
- Navigate to a path (relative to the session root).
|
||||
- Evaluate JavaScript and optionally return results.
|
||||
- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
|
||||
- Capture a snapshot image of the current canvas view.
|
||||
- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
|
||||
## A2UI in Canvas
|
||||
|
||||
This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
|
||||
A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
|
||||
When the Gateway advertises a Canvas host, the macOS app auto‑navigates to the
|
||||
A2UI host page on first open.
|
||||
|
||||
Related:
|
||||
- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
|
||||
|
||||
## Agent commands (current)
|
||||
|
||||
Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
|
||||
|
||||
- `clawdbot nodes canvas present --node <id> [--target <...>] [--x/--y/--width/--height]`
|
||||
- Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
|
||||
- If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
|
||||
- `clawdbot nodes canvas hide --node <id>`
|
||||
- `clawdbot nodes canvas eval --js <code> --node <id>`
|
||||
- `clawdbot nodes canvas snapshot --node <id>`
|
||||
|
||||
### Canvas A2UI
|
||||
|
||||
Canvas A2UI is hosted by the **Gateway canvas host** at:
|
||||
Default A2UI host URL:
|
||||
|
||||
```
|
||||
http://<gateway-host>:18793/__clawdbot__/a2ui/
|
||||
```
|
||||
|
||||
The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
|
||||
### A2UI commands (v0.8)
|
||||
|
||||
- `clawdbot nodes canvas a2ui push --jsonl <path> --node <id>`
|
||||
- `clawdbot nodes canvas a2ui reset --node <id>`
|
||||
Canvas currently accepts **A2UI v0.8** server→client messages:
|
||||
|
||||
`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
|
||||
- `beginRendering`
|
||||
- `surfaceUpdate`
|
||||
- `dataModelUpdate`
|
||||
- `deleteSurface`
|
||||
|
||||
Minimal example (v0.8):
|
||||
`createSurface` (v0.9) is not supported.
|
||||
|
||||
CLI example:
|
||||
|
||||
```bash
|
||||
cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
|
||||
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
|
||||
cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
|
||||
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
|
||||
{"beginRendering":{"surfaceId":"main","root":"root"}}
|
||||
EOF
|
||||
EOFA2
|
||||
|
||||
clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node <id>
|
||||
```
|
||||
|
||||
Notes:
|
||||
- This does **not** support the A2UI v0.9 examples using `createSurface`.
|
||||
- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
|
||||
- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
|
||||
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"` renders a minimal v0.8 view.
|
||||
Quick smoke:
|
||||
|
||||
## Triggering agent runs from Canvas (deep links)
|
||||
```bash
|
||||
clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"
|
||||
```
|
||||
|
||||
## Triggering agent runs from Canvas
|
||||
|
||||
Canvas can trigger new agent runs via deep links:
|
||||
|
||||
Canvas can trigger new agent runs via the macOS app deep-link scheme:
|
||||
- `clawdbot://agent?...`
|
||||
|
||||
This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
|
||||
Example (in JS):
|
||||
|
||||
Suggested patterns:
|
||||
- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
|
||||
- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
|
||||
```js
|
||||
window.location.href = "clawdbot://agent?message=Review%20this%20design";
|
||||
```
|
||||
|
||||
Implementation note (important):
|
||||
- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
|
||||
The app prompts for confirmation unless a valid key is provided.
|
||||
|
||||
Safety:
|
||||
- Deep links (`clawdbot://agent?...`) are always enabled.
|
||||
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
|
||||
- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
|
||||
## Security notes
|
||||
|
||||
## Security / guardrails
|
||||
|
||||
Recommended defaults:
|
||||
- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
|
||||
- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
|
||||
- Scheme handler must prevent directory traversal: resolved file paths must stay under `<canvasRoot>/<session>/`.
|
||||
- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
|
||||
|
||||
## Debugging
|
||||
|
||||
Suggested debugging hooks:
|
||||
- Enable Web Inspector for Canvas builds (same approach as WebChat).
|
||||
- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
|
||||
- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
|
||||
- Canvas scheme blocks directory traversal; files must live under the session root.
|
||||
- Local Canvas content uses a custom scheme (no loopback server required).
|
||||
- External `http(s)` URLs are allowed only when explicitly navigated.
|
||||
|
||||
@@ -1,72 +1,56 @@
|
||||
---
|
||||
summary: "Running the gateway as a child process of the macOS app and why"
|
||||
summary: "Gateway lifecycle on macOS (launchd + attach-only)"
|
||||
read_when:
|
||||
- Integrating the mac app with the gateway lifecycle
|
||||
---
|
||||
# Clawdbot gateway as a child process of the macOS app
|
||||
# Gateway lifecycle on macOS
|
||||
|
||||
Date: 2025-12-06 · Status: draft · Owner: steipete
|
||||
The macOS app **manages the Gateway via launchd** by default. This gives you
|
||||
reliable auto‑start at login and restart on crashes.
|
||||
|
||||
Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
|
||||
Child‑process mode (Gateway spawned directly by the app) is **not in use** today.
|
||||
If you need tighter coupling to the UI, use **Attach‑only** and run the Gateway
|
||||
manually in a terminal.
|
||||
|
||||
## Goal
|
||||
Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
|
||||
## Default behavior (launchd)
|
||||
|
||||
## When to prefer the child-process mode
|
||||
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
|
||||
- You’re okay giving up login persistence/auto-restart that launchd provides, or you’ll add your own backoff loop.
|
||||
- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
|
||||
- The app installs a per‑user LaunchAgent labeled `com.clawdbot.gateway`.
|
||||
- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
|
||||
starts the Gateway if needed.
|
||||
- Logs are written to the launchd gateway log path (visible in Debug Settings).
|
||||
|
||||
## Tradeoffs vs. launchd
|
||||
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
|
||||
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
|
||||
- **TCC:** behaviorally, child processes often inherit the parent app’s “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
|
||||
Common commands:
|
||||
|
||||
## TCC guardrails (must keep)
|
||||
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the app’s node commands (via Gateway `node.invoke`) for:
|
||||
- `system.notify`
|
||||
- `system.run` (including `needsScreenRecording`)
|
||||
- `screen.record` / `camera.*`
|
||||
- PeekabooBridge UI automation (`peekaboo …`)
|
||||
- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app target’s Info.plist; a bare Node binary has none and would fail.
|
||||
- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
|
||||
```bash
|
||||
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
|
||||
launchctl bootout gui/$UID/com.clawdbot.gateway
|
||||
```
|
||||
|
||||
## Process manager design (Swift Subprocess)
|
||||
- Add a small `GatewayProcessManager` (Swift) that owns:
|
||||
- `execution: Execution?` from `Swift Subprocess` to track the child.
|
||||
- `start(config)` called when “Clawdbot Active” flips ON:
|
||||
- binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
|
||||
- args: current clawdbot entrypoint and flags
|
||||
- cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
|
||||
- output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
|
||||
- restart: optional linear/backoff restart if exit was non-zero and Active is still true
|
||||
- `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
|
||||
- Wire SwiftUI toggle:
|
||||
- ON: `GatewayProcessManager.start(...)`
|
||||
- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
|
||||
- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
|
||||
## Attach‑only (developer mode)
|
||||
|
||||
## Packaging and signing
|
||||
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
|
||||
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
|
||||
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
|
||||
Attach‑only tells the app to **connect to an existing Gateway** without spawning
|
||||
one. This is ideal for local dev (hot‑reload, custom flags).
|
||||
|
||||
## Logging and observability
|
||||
- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
|
||||
- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
|
||||
- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
|
||||
Steps:
|
||||
|
||||
## Failure/edge cases
|
||||
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
|
||||
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
|
||||
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
|
||||
1) Start the Gateway yourself:
|
||||
```bash
|
||||
pnpm gateway:watch
|
||||
```
|
||||
2) In the macOS app: Debug Settings → Gateway → **Attach only**.
|
||||
|
||||
## Open questions / follow-ups
|
||||
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
|
||||
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
|
||||
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
|
||||
The UI should show “Using existing gateway …” once connected.
|
||||
|
||||
## Decision snapshot (current recommendation)
|
||||
- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
|
||||
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
|
||||
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
|
||||
## Remote mode
|
||||
|
||||
Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
|
||||
remote host and connects over that tunnel.
|
||||
|
||||
## Why we prefer launchd
|
||||
|
||||
- Auto‑start at login.
|
||||
- Built‑in restart/KeepAlive semantics.
|
||||
- Predictable logs and supervision.
|
||||
|
||||
If a true child‑process mode is ever needed again, it should be documented as a
|
||||
separate, explicit dev‑only mode.
|
||||
|
||||
@@ -1,170 +1,62 @@
|
||||
---
|
||||
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
|
||||
summary: "PeekabooBridge integration for macOS UI automation"
|
||||
read_when:
|
||||
- Hosting PeekabooBridge in Clawdbot.app
|
||||
- Integrating Peekaboo as a submodule
|
||||
- Changing PeekabooBridge protocol/paths
|
||||
---
|
||||
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
|
||||
# Peekaboo Bridge (macOS UI automation)
|
||||
|
||||
## TL;DR
|
||||
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
|
||||
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
|
||||
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
|
||||
Clawdbot can host **PeekabooBridge** as a local, permission‑aware UI automation
|
||||
broker. This lets the `peekaboo` CLI drive UI automation while reusing the
|
||||
macOS app’s TCC permissions.
|
||||
|
||||
Non-goals:
|
||||
- No auto-launching Peekaboo.app.
|
||||
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
|
||||
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
|
||||
## What this is (and isn’t)
|
||||
|
||||
## Big refactor (Dec 2025): XPC → Bridge
|
||||
Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
|
||||
- It matches the existing “local socket + codesign checks” approach.
|
||||
- It lets us piggyback on **either** Peekaboo.app’s permissions **or** Clawdbot.app’s permissions (whichever is running).
|
||||
- It avoids “two apps with two TCC bubbles” unless needed.
|
||||
- **Host**: Clawdbot.app can act as a PeekabooBridge host.
|
||||
- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
|
||||
- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
|
||||
|
||||
Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
|
||||
## Enable the bridge
|
||||
|
||||
## Architecture
|
||||
### Processes
|
||||
- **Bridge hosts** (provide TCC-backed automation):
|
||||
- **Peekaboo.app** (preferred; also provides visualizations + controls)
|
||||
- **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktop’s granted permissions)
|
||||
- **Clawdbot.app** (secondary; “thin host” only)
|
||||
- **Bridge clients** (trigger single actions):
|
||||
- `peekaboo …` (preferred; humans + agents)
|
||||
- Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
|
||||
In the macOS app:
|
||||
- Settings → **Enable Peekaboo Bridge**
|
||||
|
||||
### Host discovery (client-side)
|
||||
Order is deliberate:
|
||||
1. Peekaboo.app host (full UX)
|
||||
2. Claude.app host (piggyback on Claude Desktop permissions)
|
||||
3. Clawdbot.app host (piggyback on Clawdbot permissions)
|
||||
When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
|
||||
is stopped and `peekaboo` will fall back to other available hosts.
|
||||
|
||||
Socket paths (convention; exact paths must match Peekaboo):
|
||||
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
|
||||
- Claude: `~/Library/Application Support/Claude/bridge.sock`
|
||||
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
|
||||
## Client discovery order
|
||||
|
||||
No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
|
||||
Peekaboo clients typically try hosts in this order:
|
||||
|
||||
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
|
||||
1. Peekaboo.app (full UX)
|
||||
2. Claude.app (if installed)
|
||||
3. Clawdbot.app (thin broker)
|
||||
|
||||
### Protocol shape
|
||||
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
|
||||
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
|
||||
- **Errors**: human-readable string by default; structured envelope in `--json`.
|
||||
Use `peekaboo bridge status --verbose` to see which host is active and which
|
||||
socket path is in use. You can override with:
|
||||
|
||||
## Dependency strategy (submodule)
|
||||
Integrate Peekaboo via git submodule (nested submodules are OK).
|
||||
```bash
|
||||
export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
|
||||
```
|
||||
|
||||
Path in Clawdbot repo:
|
||||
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don’t churn).
|
||||
## Security & permissions
|
||||
|
||||
What Clawdbot should use:
|
||||
- **Client side**: `PeekabooBridge` (socket client + protocol models).
|
||||
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
|
||||
- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
|
||||
allowed by default (Peekaboo’s signing team), plus the Clawdbot app’s TeamID.
|
||||
- Requests time out after ~10 seconds.
|
||||
- If required permissions are missing, the bridge returns a clear error message
|
||||
rather than launching System Settings.
|
||||
|
||||
What Clawdbot should *not* embed:
|
||||
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
|
||||
- **XPC**: don’t reintroduce helper targets; use the bridge.
|
||||
## Snapshot behavior (automation)
|
||||
|
||||
## IPC / CLI surface
|
||||
### No `clawdbot ui …`
|
||||
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
|
||||
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
|
||||
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isn’t running.
|
||||
Snapshots are stored in memory and expire automatically after a short window.
|
||||
If you need longer retention, re‑capture from the client.
|
||||
|
||||
### Diagnostics
|
||||
Use Peekaboo’s built-in diagnostics to see which host would be used:
|
||||
- `peekaboo bridge status`
|
||||
- `peekaboo bridge status --verbose`
|
||||
- `peekaboo bridge status --json`
|
||||
## Troubleshooting
|
||||
|
||||
### Output format
|
||||
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
|
||||
|
||||
### Timeouts
|
||||
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
|
||||
|
||||
## Coordinate model (multi-display)
|
||||
Requirement: coordinates are **per screen**, not global.
|
||||
|
||||
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
|
||||
|
||||
Proposed request shape:
|
||||
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
|
||||
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
|
||||
- Responses should echo both:
|
||||
- The resolved `screenIndex`
|
||||
- The local `{x, y}` and bounds
|
||||
- Optionally the global `{x, y}` for debugging
|
||||
|
||||
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
|
||||
|
||||
## Targeting (per app/window)
|
||||
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
|
||||
- frontmost
|
||||
- by app name / bundle id
|
||||
- by window title substring
|
||||
- by (app, index)
|
||||
|
||||
Peekaboo CLI targeting (agent-friendly):
|
||||
- `--bundle-id <id>` for app targeting
|
||||
- `--window-index <n>` (0-based) for disambiguating within an app when capturing
|
||||
|
||||
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
|
||||
|
||||
## “See” + click packs (Playwright-style)
|
||||
Behavior stays aligned with Peekaboo:
|
||||
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
|
||||
- Follow-up actions reference those IDs without re-scanning.
|
||||
|
||||
`peekaboo see` should:
|
||||
- capture (optionally targeted) window/screen
|
||||
- return a screenshot **file path** (default: temp directory)
|
||||
- return a list of elements (text or JSON)
|
||||
|
||||
Snapshot lifecycle requirement:
|
||||
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
|
||||
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
|
||||
|
||||
Practical flow (agent-friendly):
|
||||
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
|
||||
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
|
||||
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
|
||||
|
||||
## Visualizer integration
|
||||
Keep visualizations in **Peekaboo.app** for now.
|
||||
- Clawdbot hosts the bridge, but does not render overlays.
|
||||
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
|
||||
|
||||
## Screenshots (legacy → Peekaboo takeover)
|
||||
Clawdbot should not grow a separate screenshot CLI surface.
|
||||
|
||||
Migration plan:
|
||||
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
|
||||
- Once Clawdbot’ legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
|
||||
|
||||
## Permissions behavior
|
||||
If required permissions are missing:
|
||||
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
|
||||
- do not try to open System Settings from the automation endpoint
|
||||
|
||||
## Security (socket auth)
|
||||
Both hosts must enforce:
|
||||
- filesystem perms on the socket path (owner read/write only)
|
||||
- server-side caller validation:
|
||||
- require the caller’s code signature TeamID to be `Y5PE65HELJ`
|
||||
- optional bundle-id allowlist for tighter scoping
|
||||
|
||||
Debug-only escape hatch (development convenience):
|
||||
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
|
||||
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
|
||||
|
||||
## Next integration steps (after this doc)
|
||||
1. Add Peekaboo as a git submodule (nested submodules OK).
|
||||
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
|
||||
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
|
||||
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
|
||||
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
|
||||
- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
|
||||
properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
|
||||
in **debug** mode only.
|
||||
- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
|
||||
and confirm permissions are granted.
|
||||
|
||||
@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
|
||||
read_when:
|
||||
- Debugging mac WebChat view or loopback port
|
||||
---
|
||||
# Web Chat (macOS app)
|
||||
# WebChat (macOS app)
|
||||
|
||||
The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
|
||||
The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
|
||||
connects to the Gateway and defaults to the **main session** for the selected
|
||||
agent (with a session switcher for other sessions).
|
||||
|
||||
- **Local mode**: connects directly to the local Gateway WebSocket.
|
||||
- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
|
||||
- **Remote mode**: forwards the Gateway control port over SSH and uses that
|
||||
tunnel as the data plane.
|
||||
|
||||
## Launch & debugging
|
||||
|
||||
- Manual: Lobster menu → “Open Chat”.
|
||||
- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
|
||||
- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
|
||||
- Auto‑open for testing:
|
||||
```bash
|
||||
dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
|
||||
```
|
||||
- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
|
||||
|
||||
## How it’s wired
|
||||
- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
|
||||
- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
|
||||
- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
|
||||
|
||||
## Security / surface area
|
||||
- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
|
||||
events `chat`, `agent`, `presence`, `tick`, `health`.
|
||||
- Session: defaults to the primary session (`main`, or `global` when scope is
|
||||
global). The UI can switch between sessions.
|
||||
- Onboarding uses a dedicated session to keep first‑run setup separate.
|
||||
|
||||
## Security surface
|
||||
|
||||
- Remote mode forwards only the Gateway WebSocket control port over SSH.
|
||||
|
||||
## Known limitations
|
||||
- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
|
||||
|
||||
- The UI is optimized for chat sessions (not a full browser sandbox).
|
||||
|
||||
@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
|
||||
read_when:
|
||||
- Editing IPC contracts or menu bar app IPC
|
||||
---
|
||||
# Clawdbot macOS IPC architecture (Dec 2025)
|
||||
# Clawdbot macOS IPC architecture
|
||||
|
||||
**Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
|
||||
|
||||
@@ -21,10 +21,10 @@ read_when:
|
||||
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
|
||||
- Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
|
||||
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
|
||||
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
|
||||
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
|
||||
|
||||
### Mach/XPC (future direction)
|
||||
- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
|
||||
### Mach/XPC
|
||||
- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
|
||||
|
||||
## Operational flows
|
||||
- Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
|
||||
@@ -37,4 +37,4 @@ read_when:
|
||||
- Prefer requiring a TeamID match for all privileged surfaces.
|
||||
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
|
||||
- All communication remains local-only; no network sockets are exposed.
|
||||
- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable.
|
||||
- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds.
|
||||
|
||||
@@ -1,123 +1,97 @@
|
||||
---
|
||||
summary: "Spec for the Clawdbot macOS companion menu bar app (gateway + node broker)"
|
||||
summary: "Clawdbot macOS companion app (menu bar + gateway broker)"
|
||||
read_when:
|
||||
- Implementing macOS app features
|
||||
- Changing gateway lifecycle or node bridging on macOS
|
||||
---
|
||||
# Clawdbot macOS Companion (menu bar + gateway broker)
|
||||
|
||||
Author: steipete · Status: draft spec · Date: 2025-12-20
|
||||
The macOS app is the **menu‑bar companion** for Clawdbot. It owns permissions,
|
||||
manages the Gateway locally, and exposes macOS capabilities to the agent as a
|
||||
node.
|
||||
|
||||
## Support snapshot
|
||||
- Core Gateway: supported (TypeScript on Node/Bun).
|
||||
- Companion app: macOS menu bar app with permissions + node bridge.
|
||||
- Install: [Getting Started](/start/getting-started) or [Install & updates](/install/updating).
|
||||
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
|
||||
## What it does
|
||||
|
||||
## System control (launchd)
|
||||
If you run the bundled macOS app, it installs a per-user LaunchAgent labeled `com.clawdbot.gateway`.
|
||||
CLI-only installs can use `clawdbot onboard --install-daemon`, `clawdbot daemon install`, or `clawdbot configure` → **Gateway daemon**.
|
||||
- Shows native notifications and status in the menu bar.
|
||||
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Microphone,
|
||||
Speech Recognition, Automation/AppleScript).
|
||||
- Runs or connects to the Gateway (local or remote).
|
||||
- Exposes macOS‑only tools (Canvas, Camera, Screen Recording, `system.run`).
|
||||
- Optionally hosts **PeekabooBridge** for UI automation.
|
||||
- Installs a helper CLI (`clawdbot`) into `/usr/local/bin` and
|
||||
`/opt/homebrew/bin` on request.
|
||||
|
||||
## Local vs remote mode
|
||||
|
||||
- **Local** (default): the app ensures a local Gateway is running via launchd.
|
||||
- **Remote**: the app connects to a Gateway over SSH/Tailscale and never starts
|
||||
a local process.
|
||||
- **Attach‑only** (debug): the app connects to an already‑running local Gateway
|
||||
and never spawns its own.
|
||||
|
||||
## Launchd control
|
||||
|
||||
The app manages a per‑user LaunchAgent labeled `com.clawdbot.gateway`.
|
||||
|
||||
```bash
|
||||
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
|
||||
launchctl bootout gui/$UID/com.clawdbot.gateway
|
||||
```
|
||||
|
||||
`launchctl` only works if the LaunchAgent is installed; otherwise run `clawdbot daemon install` first.
|
||||
If the LaunchAgent isn’t installed, enable it from the app or run
|
||||
`clawdbot daemon install`.
|
||||
|
||||
Details: [Gateway runbook](/gateway) and [Bundled bun Gateway](/platforms/mac/bun).
|
||||
## Node capabilities (mac)
|
||||
|
||||
## Purpose
|
||||
- Single macOS menu-bar app named **Clawdbot** that:
|
||||
- Shows native notifications for Clawdbot/clawdbot events.
|
||||
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Automation/AppleScript, Microphone, Speech Recognition).
|
||||
- Runs (or connects to) the **Gateway** and exposes itself as a **node** so agents can reach macOS‑only features.
|
||||
- Hosts **PeekabooBridge** for UI automation (consumed by `peekaboo`; see [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo)).
|
||||
- Installs a single CLI (`clawdbot`) by symlinking the bundled binary.
|
||||
The macOS app presents itself as a node. Common commands:
|
||||
|
||||
## High-level design
|
||||
- SwiftPM package in `apps/macos/` (macOS 15+, Swift 6).
|
||||
- Targets:
|
||||
- `ClawdbotIPC` (shared Codable types + helpers for app‑internal actions).
|
||||
- `Clawdbot` (LSUIElement MenuBarExtra app; hosts Gateway + node bridge + PeekabooBridgeHost).
|
||||
- Bundle ID: `com.clawdbot.mac`.
|
||||
- Bundled runtime binaries live under `Contents/Resources/Relay/`:
|
||||
- `clawdbot` (bun‑compiled relay: CLI + gateway)
|
||||
- The app symlinks `clawdbot` into `/usr/local/bin` and `/opt/homebrew/bin`.
|
||||
|
||||
## Gateway + node bridge
|
||||
- The mac app runs the Gateway in **local** mode (unless configured remote).
|
||||
- The gateway port is configurable via `gateway.port` or `CLAWDBOT_GATEWAY_PORT` (default 18789). The mac app reads that value for launchd, probes, and remote SSH tunnels.
|
||||
- The mac app connects to the bridge as a **node** and advertises capabilities/commands.
|
||||
- Agent‑facing actions are exposed via `node.invoke` (no local control socket).
|
||||
- The mac app watches `~/.clawdbot/clawdbot.json` and switches modes live when `gateway.mode` or `gateway.remote.url` changes.
|
||||
- If `gateway.mode` is unset but `gateway.remote.url` is set, the mac app treats it as remote mode.
|
||||
- Changing connection mode in the mac app writes `gateway.mode` (and `gateway.remote.url` in remote mode) back to the config file.
|
||||
|
||||
### Node commands (mac)
|
||||
- Canvas: `canvas.present|navigate|eval|snapshot|a2ui.*`
|
||||
- Camera: `camera.snap|camera.clip`
|
||||
- Canvas: `canvas.present`, `canvas.navigate`, `canvas.eval`, `canvas.snapshot`, `canvas.a2ui.*`
|
||||
- Camera: `camera.snap`, `camera.clip`
|
||||
- Screen: `screen.record`
|
||||
- System: `system.run` (shell) and `system.notify`
|
||||
- System: `system.run`, `system.notify`
|
||||
|
||||
### Permission advertising
|
||||
- Nodes include a `permissions` map in hello/pairing.
|
||||
- The Gateway surfaces it via `node.list` / `node.describe` so agents can decide what to run.
|
||||
The node reports a `permissions` map so agents can decide what’s allowed.
|
||||
|
||||
## CLI (`clawdbot`)
|
||||
- The **only** CLI is `clawdbot` (TS/bun). There is no `clawdbot-mac` helper.
|
||||
- For mac‑specific actions, the CLI uses `node.invoke`:
|
||||
- `clawdbot nodes canvas present|navigate|eval|snapshot|a2ui push|a2ui reset`
|
||||
- `clawdbot nodes run --node <id> -- <command...>`
|
||||
- `clawdbot nodes notify --node <id> --title ...`
|
||||
## Deep links
|
||||
|
||||
## Onboarding
|
||||
- Install CLI (symlink) → Permissions checklist → Test notification → Done.
|
||||
- Remote mode skips local gateway/CLI steps.
|
||||
- Selecting Local auto-enables the bundled Gateway via launchd (unless “Attach only” debug mode is enabled).
|
||||
|
||||
## Deep links (URL scheme)
|
||||
|
||||
Clawdbot (the macOS app) registers a URL scheme for triggering local actions from anywhere (browser, Shortcuts, CLI, etc.).
|
||||
|
||||
Scheme:
|
||||
- `clawdbot://…`
|
||||
The app registers the `clawdbot://` URL scheme for local actions.
|
||||
|
||||
### `clawdbot://agent`
|
||||
|
||||
Triggers a Gateway `agent` request (same machinery as WebChat/agent runs).
|
||||
|
||||
Example:
|
||||
Triggers a Gateway `agent` request.
|
||||
|
||||
```bash
|
||||
open 'clawdbot://agent?message=Hello%20from%20deep%20link'
|
||||
```
|
||||
|
||||
Query parameters:
|
||||
- `message` (required): the agent prompt (URL-encoded).
|
||||
- `sessionKey` (optional): explicit session key to use.
|
||||
- `thinking` (optional): thinking hint (e.g. `low`; omit for default).
|
||||
- `deliver` (optional): `true|false` (default: false).
|
||||
- `to` / `provider` (optional): forwarded to the Gateway `agent` method (only meaningful with `deliver=true`).
|
||||
- `timeoutSeconds` (optional): timeout hint forwarded to the Gateway.
|
||||
- `key` (optional): unattended mode key (see below).
|
||||
- `message` (required)
|
||||
- `sessionKey` (optional)
|
||||
- `thinking` (optional)
|
||||
- `deliver` / `to` / `provider` (optional)
|
||||
- `timeoutSeconds` (optional)
|
||||
- `key` (optional unattended mode key)
|
||||
|
||||
Safety/guardrails:
|
||||
- Always enabled.
|
||||
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
|
||||
- With `key=<value>`, Clawdbot runs without prompting (intended for personal automations).
|
||||
- The current key is shown in Debug Settings and stored locally in UserDefaults.
|
||||
Safety:
|
||||
- Without `key`, the app prompts for confirmation.
|
||||
- With a valid `key`, the run is unattended (intended for personal automations).
|
||||
|
||||
Notes:
|
||||
- In local mode, Clawdbot will start the local Gateway if needed before issuing the request.
|
||||
- In remote mode, Clawdbot will use the configured remote tunnel/endpoint.
|
||||
## Onboarding flow (typical)
|
||||
|
||||
1) Install and launch **Clawdbot.app**.
|
||||
2) Complete the permissions checklist (TCC prompts).
|
||||
3) Ensure **Local** mode is active and the Gateway is running.
|
||||
4) Install the CLI helper if you want terminal access.
|
||||
|
||||
## Build & dev workflow (native)
|
||||
- `cd apps/macos && swift build` (debug) / `swift build -c release`.
|
||||
- Run app for dev: `swift run Clawdbot` (or Xcode scheme).
|
||||
- Package app + CLI: [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) (builds bun CLI + gateway).
|
||||
- Tests: add Swift Testing suites under `apps/macos/Tests`.
|
||||
|
||||
## Open questions / decisions
|
||||
- Should `system.run` support streaming stdout/stderr or keep buffered responses only?
|
||||
- Should we allow node‑side permission prompts, or always require explicit app UI action?
|
||||
- `cd apps/macos && swift build`
|
||||
- `swift run Clawdbot` (or Xcode)
|
||||
- Package app + CLI: `scripts/package-mac-app.sh`
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Gateway runbook](/gateway)
|
||||
- [Bundled bun Gateway](/platforms/mac/bun)
|
||||
- [macOS permissions](/platforms/mac/permissions)
|
||||
- [Canvas](/platforms/mac/canvas)
|
||||
|
||||
@@ -107,18 +107,16 @@ Slack's Conversations API is type-scoped: you only need the scopes for the
|
||||
conversation types you actually touch (channels, groups, im, mpim). See
|
||||
https://api.slack.com/docs/conversations-api for the overview.
|
||||
|
||||
### Required by current code
|
||||
### Required scopes
|
||||
- `chat:write` (send/update/delete messages via `chat.postMessage`)
|
||||
https://api.slack.com/methods/chat.postMessage
|
||||
- `im:write` (open DMs via `conversations.open` for user DMs)
|
||||
https://api.slack.com/methods/conversations.open
|
||||
- `channels:history`, `groups:history`, `im:history`, `mpim:history`
|
||||
(`conversations.history` in [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts))
|
||||
https://api.slack.com/methods/conversations.history
|
||||
- `channels:read`, `groups:read`, `im:read`, `mpim:read`
|
||||
(`conversations.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts))
|
||||
https://api.slack.com/methods/conversations.info
|
||||
- `users:read` (`users.info` in [`src/slack/monitor.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/monitor.ts) + [`src/slack/actions.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/slack/actions.ts))
|
||||
- `users:read` (user lookup)
|
||||
https://api.slack.com/methods/users.info
|
||||
- `reactions:read`, `reactions:write` (`reactions.get` / `reactions.add`)
|
||||
https://api.slack.com/methods/reactions.get
|
||||
|
||||
@@ -188,8 +188,3 @@ Recommended for personal numbers:
|
||||
- Subsystems: `whatsapp/inbound`, `whatsapp/outbound`, `web-heartbeat`, `web-reconnect`.
|
||||
- Log file: `/tmp/clawdbot/clawdbot-YYYY-MM-DD.log` (configurable).
|
||||
- Troubleshooting guide: [`docs/troubleshooting.md`](/gateway/troubleshooting).
|
||||
|
||||
## Tests
|
||||
- [`src/web/auto-reply.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/auto-reply.test.ts) (mention gating, history injection, reply flow)
|
||||
- [`src/web/monitor-inbox.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/monitor-inbox.test.ts) (inbound parsing + reply context)
|
||||
- [`src/web/outbound.test.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/outbound.test.ts) (send mapping + media)
|
||||
|
||||
@@ -125,7 +125,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
|
||||
- [Linux](https://docs.clawd.bot/platforms/linux)
|
||||
- [Web surfaces](https://docs.clawd.bot/web)
|
||||
|
||||
## macOS companion app (internals)
|
||||
## macOS companion app (advanced)
|
||||
|
||||
- [macOS dev setup](https://docs.clawd.bot/platforms/mac/dev-setup)
|
||||
- [macOS menu bar](https://docs.clawd.bot/platforms/mac/menu-bar)
|
||||
@@ -144,7 +144,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
|
||||
- [macOS bun gateway](https://docs.clawd.bot/platforms/mac/bun)
|
||||
- [macOS XPC](https://docs.clawd.bot/platforms/mac/xpc)
|
||||
- [macOS skills](https://docs.clawd.bot/platforms/mac/skills)
|
||||
- [macOS Peekaboo plan](https://docs.clawd.bot/platforms/mac/peekaboo)
|
||||
- [macOS Peekaboo](https://docs.clawd.bot/platforms/mac/peekaboo)
|
||||
|
||||
## Workspace + templates
|
||||
|
||||
@@ -160,13 +160,13 @@ Use these hubs to discover every page, including deep dives and reference docs t
|
||||
- [Templates: TOOLS](https://docs.clawd.bot/reference/templates/TOOLS)
|
||||
- [Templates: USER](https://docs.clawd.bot/reference/templates/USER)
|
||||
|
||||
## Experiments + proposals
|
||||
## Experiments (exploratory)
|
||||
|
||||
- [Onboarding config protocol](https://docs.clawd.bot/experiments/onboarding-config-protocol)
|
||||
- [Plan: cron hardening](https://docs.clawd.bot/experiments/plans/cron-add-hardening)
|
||||
- [Plan: group policy hardening](https://docs.clawd.bot/experiments/plans/group-policy-hardening)
|
||||
- [Cron hardening notes](https://docs.clawd.bot/experiments/plans/cron-add-hardening)
|
||||
- [Group policy hardening notes](https://docs.clawd.bot/experiments/plans/group-policy-hardening)
|
||||
- [Research: memory](https://docs.clawd.bot/experiments/research/memory)
|
||||
- [Proposal: model config](https://docs.clawd.bot/experiments/proposals/model-config)
|
||||
- [Model config exploration](https://docs.clawd.bot/experiments/proposals/model-config)
|
||||
|
||||
## Testing + release
|
||||
|
||||
|
||||
@@ -1,207 +1,102 @@
|
||||
---
|
||||
summary: "Planned first-run onboarding flow for Clawdbot (local vs remote, OAuth auth, workspace bootstrap ritual)"
|
||||
summary: "First-run onboarding flow for Clawdbot (macOS app)"
|
||||
read_when:
|
||||
- Designing the macOS onboarding assistant
|
||||
- Implementing Anthropic/OpenAI auth or identity setup
|
||||
- Implementing auth or identity setup
|
||||
---
|
||||
# Onboarding (macOS app)
|
||||
|
||||
This doc describes the intended **first-run onboarding** for Clawdbot. The goal is a good “day 0” experience: pick where the Gateway runs, bind subscription auth (Anthropic or OpenAI) for the embedded agent runtime, and then let the **agent bootstrap itself** via a first-run ritual in the workspace.
|
||||
This doc describes the **current** first‑run onboarding flow. The goal is a
|
||||
smooth “day 0” experience: pick where the Gateway runs, connect auth, run the
|
||||
wizard, and let the agent bootstrap itself.
|
||||
|
||||
## Page order (high level)
|
||||
## Page order (current)
|
||||
|
||||
1) **Local vs Remote**
|
||||
2) **(Local only)** Connect subscription auth (Anthropic / OpenAI OAuth) — optional, but recommended
|
||||
3) **Connect Gmail (optional)** — run `clawdbot hooks gmail setup` to configure Pub/Sub hooks
|
||||
4) **Onboarding chat** — dedicated session where the agent introduces itself and guides setup
|
||||
1) Welcome + security notice
|
||||
2) **Gateway selection** (Local / Remote / Configure later)
|
||||
3) **Auth (Anthropic OAuth)** — local only
|
||||
4) **Setup Wizard** (Gateway‑driven)
|
||||
5) **Permissions** (TCC prompts)
|
||||
6) **CLI helper** (optional)
|
||||
7) **Onboarding chat** (dedicated session)
|
||||
8) Ready
|
||||
|
||||
## 1) Local vs Remote
|
||||
|
||||
First question: where does the **Gateway** run?
|
||||
Where does the **Gateway** run?
|
||||
|
||||
- **Local (this Mac):** onboarding can run OAuth flows and write OAuth credentials locally.
|
||||
- **Remote (over SSH/tailnet):** onboarding must not run OAuth locally, because credentials must exist on the **gateway host**.
|
||||
- **Local (this Mac):** onboarding can run OAuth flows and write credentials
|
||||
locally.
|
||||
- **Remote (over SSH/Tailnet):** onboarding does **not** run OAuth locally;
|
||||
credentials must exist on the gateway host.
|
||||
- **Configure later:** skip setup and leave the app unconfigured.
|
||||
|
||||
Gateway auth tip:
|
||||
- If you only use Clawdbot on this Mac (loopback gateway), keep auth **Off**.
|
||||
- Use **Token** for multi-machine access or non-loopback binds.
|
||||
- If you only use Clawdbot locally (loopback), auth can be **Off**.
|
||||
- Use a **token** for multi‑machine access or non‑loopback binds.
|
||||
|
||||
Implementation note (2025-12-19): in local mode, the macOS app bundles the Gateway and enables it via a per-user launchd LaunchAgent (no global npm install/Node requirement for the user).
|
||||
## 2) Local-only auth (Anthropic OAuth)
|
||||
|
||||
## 2) Local-only: Connect subscription auth (Anthropic / OpenAI OAuth)
|
||||
The macOS app supports Anthropic OAuth (Claude Pro/Max). The flow:
|
||||
|
||||
This is the “bind Clawdbot to subscription auth” step. It is explicitly the **Anthropic (Claude Pro/Max)** or **OpenAI (ChatGPT/Codex)** OAuth flow, not a generic “login”.
|
||||
- Opens the browser for OAuth (PKCE)
|
||||
- Asks the user to paste the `code#state` value
|
||||
- Writes credentials to `~/.clawdbot/credentials/oauth.json`
|
||||
|
||||
More detail: [/concepts/oauth](/concepts/oauth)
|
||||
Other providers (OpenAI, custom APIs) are configured via environment variables
|
||||
or config files for now.
|
||||
|
||||
### Recommended: OAuth (Anthropic)
|
||||
## 3) Setup Wizard (Gateway‑driven)
|
||||
|
||||
The macOS app should:
|
||||
- Start the Anthropic OAuth (PKCE) flow in the user’s browser.
|
||||
- Ask the user to paste the `code#state` value.
|
||||
- Exchange it for tokens and write credentials to:
|
||||
- `~/.clawdbot/credentials/oauth.json` (file mode `0600`, directory mode `0700`)
|
||||
The app can run the same setup wizard as the CLI. This keeps onboarding in sync
|
||||
with Gateway‑side behavior and avoids duplicating logic in SwiftUI.
|
||||
|
||||
Why this location matters: it’s the Clawdbot-owned OAuth store.
|
||||
Clawdbot also imports `oauth.json` into the agent auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on first use.
|
||||
## 4) Permissions
|
||||
|
||||
### Recommended: OAuth (OpenAI Codex)
|
||||
Onboarding requests TCC permissions needed for:
|
||||
|
||||
The macOS app should:
|
||||
- Start the OpenAI Codex OAuth (PKCE) flow in the user’s browser.
|
||||
- Auto-capture the callback on `http://127.0.0.1:1455/auth/callback` when possible.
|
||||
- If the callback fails, prompt the user to paste the redirect URL or code.
|
||||
- Store credentials in `~/.clawdbot/credentials/oauth.json` (same OAuth store as Anthropic).
|
||||
- Set `agent.model` to `openai-codex/gpt-5.2` when the model is unset or `openai/*`.
|
||||
- Notifications
|
||||
- Accessibility
|
||||
- Screen Recording
|
||||
- Microphone / Speech Recognition
|
||||
- Automation (AppleScript)
|
||||
|
||||
### Alternative: API key (instructions only)
|
||||
## 5) CLI helper (optional)
|
||||
|
||||
Offer an “API key” option, but for now it is **instructions only**:
|
||||
- Get an Anthropic API key.
|
||||
- Provide it to Clawdbot via your preferred mechanism (env/config).
|
||||
The app can symlink the bundled `clawdbot` CLI into `/usr/local/bin` and
|
||||
`/opt/homebrew/bin` so terminal workflows work out of the box.
|
||||
|
||||
Note: environment variables are often confusing when the Gateway is launched by a GUI app (launchd environment != your shell).
|
||||
## 6) Onboarding chat (dedicated session)
|
||||
|
||||
### Model safety rule
|
||||
After setup, the app opens a dedicated onboarding chat session so the agent can
|
||||
introduce itself and guide next steps. This keeps first‑run guidance separate
|
||||
from your normal conversation.
|
||||
|
||||
Clawdbot should **always pass** `--model` when invoking the embedded agent (don’t rely on defaults).
|
||||
## Agent bootstrap ritual
|
||||
|
||||
Example (CLI):
|
||||
On the first agent run, Clawdbot bootstraps a workspace (default `~/clawd`):
|
||||
|
||||
```bash
|
||||
clawdbot agent --mode rpc --model anthropic/claude-opus-4-5 "<message>"
|
||||
```
|
||||
- Seeds `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md`
|
||||
- Runs a short Q&A ritual (one question at a time)
|
||||
- Writes identity + preferences to `IDENTITY.md`, `USER.md`, `SOUL.md`
|
||||
- Removes `BOOTSTRAP.md` when finished so it only runs once
|
||||
|
||||
If the user skips auth, onboarding should be clear: the agent likely won’t respond until auth is configured.
|
||||
## Optional: Gmail hooks (manual)
|
||||
|
||||
## 4) Onboarding chat (dedicated session)
|
||||
|
||||
The onboarding flow now embeds the SwiftUI chat view directly. It uses a **special session key**
|
||||
(`onboarding`) so the “newborn agent” ritual stays separate from the main chat.
|
||||
|
||||
This onboarding chat is where the agent:
|
||||
- does the BOOTSTRAP.md identity ritual (one question at a time)
|
||||
- visits **soul.md** with the user and writes `SOUL.md` (values, tone, boundaries)
|
||||
- asks how the user wants to talk (web-only / Telegram / WhatsApp)
|
||||
- guides linking steps (including showing a QR inline for WhatsApp via the `whatsapp_login` tool)
|
||||
|
||||
If the workspace bootstrap is already complete (BOOTSTRAP.md removed), the onboarding chat step is skipped.
|
||||
|
||||
## 2.5) Optional: Connect Gmail
|
||||
|
||||
The macOS onboarding includes an optional Gmail step. It runs:
|
||||
Gmail Pub/Sub setup is currently a manual step. Use:
|
||||
|
||||
```bash
|
||||
clawdbot hooks gmail setup --account you@gmail.com
|
||||
```
|
||||
|
||||
This writes the full `hooks.gmail` config, installs `gcloud` / `gog` / `tailscale`
|
||||
via Homebrew if needed, and configures the Pub/Sub push endpoint. After setup,
|
||||
restart the gateway so the internal Gmail watcher starts.
|
||||
See [/automation/gmail-pubsub](/automation/gmail-pubsub) for details.
|
||||
|
||||
Once setup is complete, the user can switch to the normal chat (`main`) via the menu bar panel.
|
||||
## Remote mode notes
|
||||
|
||||
## 5) Agent bootstrap ritual (outside onboarding)
|
||||
When the Gateway runs on another machine, credentials and workspace files live
|
||||
**on that host**. If you need OAuth in remote mode, create:
|
||||
|
||||
We no longer collect identity in the onboarding wizard. Instead, the **first agent run** performs a playful bootstrap ritual using files in the workspace:
|
||||
- `~/.clawdbot/credentials/oauth.json`
|
||||
- `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`
|
||||
|
||||
- Workspace is created implicitly (default `~/clawd`, configurable via `agent.workspace`) when local is selected,
|
||||
but only if the folder is empty or already contains `AGENTS.md`.
|
||||
- Files are seeded: `AGENTS.md`, `BOOTSTRAP.md`, `IDENTITY.md`, `USER.md`.
|
||||
- `BOOTSTRAP.md` tells the agent to keep it conversational:
|
||||
- open with a cute hello
|
||||
- ask **one question at a time** (no multi-question bombardment)
|
||||
- offer a small set of suggestions where helpful (name, creature, emoji)
|
||||
- wait for the user’s reply before asking the next question
|
||||
- The agent writes results to:
|
||||
- `IDENTITY.md` (agent name, vibe/creature, emoji)
|
||||
- `USER.md` (who the user is + how they want to be addressed)
|
||||
- `SOUL.md` (identity, tone, boundaries — crafted from the soul.md prompt)
|
||||
- `~/.clawdbot/clawdbot.json` (structured identity defaults)
|
||||
- After the ritual, the agent **deletes `BOOTSTRAP.md`** so it only runs once.
|
||||
|
||||
Identity data still feeds the same defaults as before:
|
||||
|
||||
- outbound prefix emoji (`messages.responsePrefix`)
|
||||
- group mention patterns / wake words
|
||||
- default session intro (“You are Samantha…”)
|
||||
- macOS UI labels
|
||||
|
||||
## 6) Workspace notes (no explicit onboarding step)
|
||||
|
||||
The workspace is created automatically as part of agent bootstrap (no dedicated onboarding screen).
|
||||
|
||||
Recommendation: treat the workspace as the agent’s “memory” and make it a git repo (ideally private) so identity + memories are backed up:
|
||||
|
||||
```bash
|
||||
cd ~/clawd
|
||||
git init
|
||||
git add AGENTS.md
|
||||
git commit -m "Add agent workspace"
|
||||
```
|
||||
|
||||
Daily memory lives under `memory/` in the workspace:
|
||||
- one file per day: `memory/YYYY-MM-DD.md`
|
||||
- read today + yesterday on session start
|
||||
- keep it short (durable facts, preferences, decisions; avoid secrets)
|
||||
|
||||
## Remote mode note (why OAuth is hidden)
|
||||
|
||||
If the Gateway runs on another machine, OAuth credentials must be created/stored on that host (where the agent runtime runs).
|
||||
|
||||
For now, remote onboarding should:
|
||||
- explain why OAuth isn't shown
|
||||
- point the user at the credential location (`~/.clawdbot/credentials/oauth.json`) and the auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on the gateway host
|
||||
- mention that the **bootstrap ritual happens on the gateway host** (same BOOTSTRAP/IDENTITY/USER files)
|
||||
|
||||
### Manual credential setup
|
||||
|
||||
On the gateway host, create `~/.clawdbot/credentials/oauth.json` with this exact format:
|
||||
|
||||
```json
|
||||
{
|
||||
"anthropic": { "type": "oauth", "access": "sk-ant-oat01-...", "refresh": "sk-ant-ort01-...", "expires": 1767304352803 },
|
||||
"openai-codex": { "type": "oauth", "access": "eyJhbGciOi...", "refresh": "oai-refresh-...", "expires": 1767304352803, "accountId": "acct_..." }
|
||||
}
|
||||
```
|
||||
|
||||
Set permissions: `chmod 600 ~/.clawdbot/credentials/oauth.json`
|
||||
|
||||
**Note:** Clawdbot can import from legacy pi-coding-agent paths (`~/.pi/agent/oauth.json`, etc.), but Claude Code/Codex CLI credentials live in different files.
|
||||
|
||||
### Using Claude Code + Codex CLI credentials (direct)
|
||||
|
||||
If these CLIs are installed on the **gateway host** and you’ve already signed in, Clawdbot auto-syncs their OAuth tokens into the per-agent auth profile store (`~/.clawdbot/agents/<agentId>/agent/auth-profiles.json`) on load:
|
||||
|
||||
- **Claude Code**: reads `~/.claude/.credentials.json` → profile `anthropic:claude-cli`
|
||||
- **Codex CLI**: reads `~/.codex/auth.json` → profile `openai-codex:codex-cli`
|
||||
|
||||
Verification:
|
||||
|
||||
```bash
|
||||
clawdbot providers list
|
||||
```
|
||||
|
||||
### Fallback: convert Claude Code credentials into `oauth.json`
|
||||
|
||||
If you don’t want to install Claude Code on the gateway host, you can still seed the legacy import file:
|
||||
|
||||
```bash
|
||||
cat ~/.claude/.credentials.json | jq '{
|
||||
anthropic: {
|
||||
type: "oauth",
|
||||
access: .claudeAiOauth.accessToken,
|
||||
refresh: .claudeAiOauth.refreshToken,
|
||||
expires: .claudeAiOauth.expiresAt
|
||||
}
|
||||
}' > ~/.clawdbot/credentials/oauth.json
|
||||
chmod 600 ~/.clawdbot/credentials/oauth.json
|
||||
```
|
||||
|
||||
## Workspace backup (recommended)
|
||||
|
||||
We suggest creating a **private GitHub repository** to back up the agent
|
||||
workspace. The agent is really good at keeping a git repo in shape, and GitHub
|
||||
is the perfect place for it. Keep it **private**.
|
||||
|
||||
Setup steps: https://docs.clawd.bot/concepts/agent-workspace
|
||||
on the gateway host.
|
||||
|
||||
@@ -43,10 +43,6 @@ Stored under `~/.clawdbot/credentials/`:
|
||||
|
||||
Treat these as sensitive (they gate access to your assistant).
|
||||
|
||||
### Source of truth (code)
|
||||
|
||||
- DM pairing storage + code generation: [`src/pairing/pairing-store.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/pairing/pairing-store.ts)
|
||||
- CLI commands: [`src/cli/pairing-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/pairing-cli.ts)
|
||||
|
||||
## 2) Node pairing (iOS/Android nodes joining the gateway)
|
||||
|
||||
@@ -70,11 +66,6 @@ Stored under `~/.clawdbot/nodes/`:
|
||||
|
||||
Full protocol + design notes: [Gateway pairing](/gateway/pairing)
|
||||
|
||||
### Source of truth (code)
|
||||
|
||||
- Node pairing store (pending/paired + token issuance): [`src/infra/node-pairing.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/node-pairing.ts)
|
||||
- Gateway methods/events (`node.pair.*`): [`src/gateway/server-methods/nodes.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/nodes.ts)
|
||||
- CLI: [`src/cli/nodes-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/nodes-cli.ts)
|
||||
|
||||
## Related docs
|
||||
|
||||
|
||||
@@ -55,7 +55,7 @@ clawdbot providers login
|
||||
clawdbot health
|
||||
```
|
||||
|
||||
If onboarding is still WIP/broken on your build:
|
||||
If onboarding is not available in your build:
|
||||
- Run `clawdbot setup`, then `clawdbot providers login`, then start the Gateway manually (`clawdbot gateway`).
|
||||
|
||||
## Bleeding edge workflow (Gateway in a terminal)
|
||||
@@ -77,7 +77,7 @@ pnpm install
|
||||
pnpm gateway:watch
|
||||
```
|
||||
|
||||
`gateway:watch` runs `src/entry.ts gateway --force` and reloads on [`src/**/*.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/**/*.ts) changes.
|
||||
`gateway:watch` runs the gateway in watch mode and reloads on TypeScript changes.
|
||||
|
||||
### 2) Point the macOS app at your running Gateway
|
||||
|
||||
|
||||
@@ -1,21 +1,44 @@
|
||||
---
|
||||
summary: "Design notes for a direct `clawdbot agent` CLI subcommand without WhatsApp delivery"
|
||||
summary: "Direct `clawdbot agent` CLI runs (with optional delivery)"
|
||||
read_when:
|
||||
- Adding or modifying the agent CLI entrypoint
|
||||
---
|
||||
# `clawdbot agent` (direct-to-agent invocation)
|
||||
# `clawdbot agent` (direct agent runs)
|
||||
|
||||
`clawdbot agent` lets you talk to the **embedded** agent runtime directly (no chat send unless you opt in), while reusing the same session store and thinking/verbose persistence as inbound auto-replies.
|
||||
`clawdbot agent` runs a single agent turn without needing an inbound chat message.
|
||||
By default it goes **through the Gateway**; add `--local` to force the embedded
|
||||
runtime on the current machine.
|
||||
|
||||
## Behavior
|
||||
|
||||
- Required: `--message <text>`
|
||||
- Session selection:
|
||||
- If `--session-id` is given, reuse it.
|
||||
- Else if `--to <e164>` is given, derive the session key from `session.scope` (direct chats collapse to `main`, or `global` when scope is global).
|
||||
- Runs the embedded Pi agent (configured via `agent`).
|
||||
- Thinking/verbose:
|
||||
- Flags `--thinking <off|minimal|low|medium|high>` and `--verbose <on|off>` persist into the session store.
|
||||
- `--to <E.164>` derives the session key (normal direct-chat routing), **or**
|
||||
- `--session-id <id>` reuses an existing session by id
|
||||
- Runs the same embedded agent runtime as normal inbound replies.
|
||||
- Thinking/verbose flags persist into the session store.
|
||||
- Output:
|
||||
- Default: prints text (and `MEDIA:<url>` lines) to stdout.
|
||||
- `--json`: prints structured payloads + meta.
|
||||
- Optional: `--deliver` sends the reply back to the selected provider (`whatsapp`, `telegram`, `discord`, `signal`, `imessage`).
|
||||
- default: prints reply text (plus `MEDIA:<url>` lines)
|
||||
- `--json`: prints structured payload + metadata
|
||||
- Optional delivery back to a provider with `--deliver` + `--provider`.
|
||||
|
||||
If the Gateway is unreachable, the CLI **falls back** to the embedded local run.
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
|
||||
clawdbot agent --to +15555550123 --message "status update"
|
||||
clawdbot agent --session-id 1234 --message "Summarize inbox" --thinking medium
|
||||
clawdbot agent --to +15555550123 --message "Trace logs" --verbose on --json
|
||||
clawdbot agent --to +15555550123 --message "Summon reply" --deliver
|
||||
```
|
||||
|
||||
## Flags
|
||||
|
||||
- `--local`: run locally (requires provider keys in your shell)
|
||||
- `--deliver`: send the reply to the chosen provider (requires `--to`)
|
||||
- `--provider`: `whatsapp|telegram|discord|slack|signal|imessage` (default: `whatsapp`)
|
||||
- `--thinking <off|minimal|low|medium|high>`: persist thinking level
|
||||
- `--verbose <on|off>`: persist verbose level
|
||||
- `--timeout <seconds>`: override agent timeout
|
||||
- `--json`: output structured JSON
|
||||
|
||||
@@ -1,254 +1,142 @@
|
||||
---
|
||||
summary: "Spec: integrated browser control server + action commands"
|
||||
summary: "Integrated browser control server + action commands"
|
||||
read_when:
|
||||
- Adding agent-controlled browser automation
|
||||
- Debugging why clawd is interfering with your own Chrome
|
||||
- Implementing browser settings + lifecycle in the macOS app
|
||||
---
|
||||
|
||||
# Browser (integrated) — clawd-managed Chrome
|
||||
# Browser (clawd-managed)
|
||||
|
||||
Status: draft spec · Date: 2025-12-20
|
||||
Clawdbot can run a **dedicated Chrome/Chromium profile** that the agent controls.
|
||||
It is isolated from your personal browser and is managed through a small local
|
||||
control server.
|
||||
|
||||
Goal: give the **clawd** persona its own browser that is:
|
||||
- Visually distinct (lobster-orange, profile labeled "clawd").
|
||||
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
|
||||
- Non-interfering with the user's own browser (separate profile + dedicated ports).
|
||||
## What you get
|
||||
|
||||
This doc covers the macOS app/gateway side. It intentionally does not mandate
|
||||
Playwright vs Puppeteer; the key is the **contract** and the **separation guarantees**.
|
||||
- A separate browser profile named **clawd** (orange accent by default).
|
||||
- Deterministic tab control (list/open/focus/close).
|
||||
- Agent actions (click/type/drag/select), snapshots, screenshots, PDFs.
|
||||
- Optional multi-profile support (`clawd`, `work`, `remote`, ...).
|
||||
|
||||
## User-facing settings
|
||||
This browser is **not** your daily driver. It is a safe, isolated surface for
|
||||
agent automation and verification.
|
||||
|
||||
Add a dedicated settings section (preferably under **Skills** or its own "Browser" tab):
|
||||
## Quick start
|
||||
|
||||
- **Enable clawd browser** (`default: on`)
|
||||
- When off: no browser is launched, and browser tools return "disabled".
|
||||
- **Browser control URL** (`default: http://127.0.0.1:18791`)
|
||||
- Interpreted as the base URL of the local/remote browser-control server.
|
||||
- If the URL host is not loopback, Clawdbot must **not** attempt to launch a local
|
||||
browser; it only connects.
|
||||
- **CDP URL** (`default: controlUrl + 1`)
|
||||
- Base URL for Chrome DevTools Protocol (e.g. `http://127.0.0.1:18792`).
|
||||
- Set this to a non-loopback host to attach the local control server to a remote
|
||||
Chrome/Chromium CDP endpoint (SSH/Tailscale tunnel recommended).
|
||||
- If the CDP URL host is non-loopback, clawd does **not** auto-launch a local browser.
|
||||
- If you tunnel a remote CDP to `localhost`, set **Attach to existing only** to
|
||||
avoid accidentally launching a local browser.
|
||||
- **Accent color** (`default: #FF4500`, "lobster-orange")
|
||||
- Used to theme the clawd browser profile (best-effort) and to tint UI indicators
|
||||
in Clawdbot.
|
||||
```bash
|
||||
clawdbot browser status
|
||||
clawdbot browser start
|
||||
clawdbot browser open https://example.com
|
||||
clawdbot browser snapshot
|
||||
```
|
||||
|
||||
Optional (advanced, can be hidden behind Debug initially):
|
||||
- **Use headless browser** (`default: off`)
|
||||
- **Attach to existing only** (`default: off`) — if on, never launch; only connect if
|
||||
already running.
|
||||
- **Browser executable path** (override, optional)
|
||||
- **No sandbox** (`default: off`) — adds `--no-sandbox` + `--disable-setuid-sandbox`
|
||||
If you get “Browser disabled”, enable it in config (see below) and restart the
|
||||
Gateway.
|
||||
|
||||
### Port convention
|
||||
## Configuration
|
||||
|
||||
Clawdbot already uses:
|
||||
- Gateway WebSocket: `18789`
|
||||
- Bridge (voice/node): `18790`
|
||||
Browser settings live in `~/.clawdbot/clawdbot.json`.
|
||||
|
||||
For the clawd browser-control server, use "family" ports:
|
||||
- Browser control HTTP API: `18791` (bridge + 1)
|
||||
- Browser CDP/debugging port: `18792` (control + 1)
|
||||
- Canvas host HTTP: `18793` by default, mounted at `/__clawdbot__/canvas/`
|
||||
|
||||
The user usually only configures the **control URL** (port `18791`). CDP is an
|
||||
internal detail.
|
||||
|
||||
## Browser isolation guarantees (non-negotiable)
|
||||
|
||||
1) **Dedicated user data dir**
|
||||
- Never attach to or reuse the user's default Chrome profile.
|
||||
- Store clawd browser state under an app-owned directory, e.g.:
|
||||
- `~/Library/Application Support/Clawdbot/browser/clawd/` (mac app)
|
||||
- or `~/.clawdbot/browser/clawd/` (gateway/CLI)
|
||||
|
||||
2) **Dedicated ports**
|
||||
- Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with
|
||||
`agent-tools/browser-tools`).
|
||||
- Default ports are `18791/18792` unless overridden.
|
||||
|
||||
3) **Named tab/page management**
|
||||
- The agent must be able to enumerate and target tabs deterministically (by
|
||||
stable `targetId` or equivalent), not "last tab".
|
||||
|
||||
## Browser selection (macOS + Linux)
|
||||
|
||||
On startup (when enabled + local URL), Clawdbot chooses the browser executable
|
||||
in this order:
|
||||
1) **Google Chrome Canary** (if installed)
|
||||
2) **Chromium** (if installed)
|
||||
3) **Google Chrome** (fallback)
|
||||
|
||||
Linux:
|
||||
- Looks for `google-chrome` / `chromium` in common system paths.
|
||||
- Use **Browser executable path** to force a specific binary.
|
||||
|
||||
Implementation detail:
|
||||
- macOS: detection is by existence of the `.app` bundle under `/Applications`
|
||||
(and optionally `~/Applications`), then using the resolved executable path.
|
||||
- Linux: common `/usr/bin`/`/snap/bin` paths.
|
||||
|
||||
Rationale:
|
||||
- Canary/Chromium are easy to visually distinguish from the user's daily driver.
|
||||
- Chrome fallback ensures the feature works on a stock machine.
|
||||
|
||||
## Visual differentiation ("lobster-orange")
|
||||
|
||||
The clawd browser should be obviously different at a glance:
|
||||
- Profile name: **clawd**
|
||||
- Profile color: **#FF4500**
|
||||
|
||||
Preferred behavior:
|
||||
- Seed/patch the profile's preferences on first launch so the color + name persist.
|
||||
|
||||
Fallback behavior:
|
||||
- If preferences patching is not reliable, open with the dedicated profile and let
|
||||
the user set the profile color/name once via Chrome UI; it must persist because
|
||||
the `userDataDir` is persistent.
|
||||
|
||||
## Control server contract (vNext)
|
||||
|
||||
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
|
||||
state without touching the user's Chrome.
|
||||
|
||||
Basics:
|
||||
- `GET /` status payload (enabled/running/pid/cdpPort/etc)
|
||||
- `POST /start` start browser
|
||||
- `POST /stop` stop browser
|
||||
- `GET /tabs` list tabs
|
||||
- `POST /tabs/open` open a new tab
|
||||
- `POST /tabs/focus` focus a tab by id/prefix
|
||||
- `DELETE /tabs/:targetId` close a tab by id/prefix
|
||||
|
||||
Inspection:
|
||||
- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
|
||||
- `GET /snapshot` `?format=aria|ai&targetId?&limit?`
|
||||
- `GET /console` `?level?&targetId?`
|
||||
- `POST /pdf` `{ targetId? }`
|
||||
|
||||
Actions:
|
||||
- `POST /navigate`
|
||||
- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
|
||||
- `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
|
||||
|
||||
Hooks (arming):
|
||||
- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
|
||||
- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
|
||||
|
||||
### "Is it open or closed?"
|
||||
|
||||
"Open" means:
|
||||
- the control server is reachable at the configured URL **and**
|
||||
- it reports a live browser connection.
|
||||
|
||||
"Closed" means:
|
||||
- control server not reachable, or server reports no browser.
|
||||
|
||||
Clawdbot should treat "open/closed" as a health check (fast path), not by scanning
|
||||
global Chrome processes (avoid false positives).
|
||||
|
||||
## Multi-profile support
|
||||
|
||||
Clawdbot supports multiple named browser profiles, each with:
|
||||
- Dedicated CDP port (auto-allocated from 18800-18899) **or** a per-profile CDP URL
|
||||
- Persistent user data directory (`~/.clawdbot/browser/<name>/user-data/`)
|
||||
- Unique color for visual distinction
|
||||
|
||||
### Configuration
|
||||
|
||||
```json
|
||||
```json5
|
||||
{
|
||||
"browser": {
|
||||
"enabled": true,
|
||||
"defaultProfile": "clawd",
|
||||
"profiles": {
|
||||
"clawd": { "cdpPort": 18800, "color": "#FF4500" },
|
||||
"work": { "cdpPort": 18801, "color": "#0066CC" },
|
||||
"remote": { "cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00" }
|
||||
browser: {
|
||||
enabled: true, // default: true
|
||||
controlUrl: "http://127.0.0.1:18791",
|
||||
cdpUrl: "http://127.0.0.1:18792", // defaults to controlUrl + 1
|
||||
defaultProfile: "clawd",
|
||||
color: "#FF4500",
|
||||
headless: false,
|
||||
noSandbox: false,
|
||||
attachOnly: false,
|
||||
executablePath: "/Applications/Chromium.app/Contents/MacOS/Chromium",
|
||||
profiles: {
|
||||
clawd: { cdpPort: 18800, color: "#FF4500" },
|
||||
work: { cdpPort: 18801, color: "#0066CC" },
|
||||
remote: { cdpUrl: "http://10.0.0.42:9222", color: "#00AA00" }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Profile actions
|
||||
Notes:
|
||||
- `controlUrl` defaults to `http://127.0.0.1:18791`.
|
||||
- If you override the Gateway port (`gateway.port` or `CLAWDBOT_GATEWAY_PORT`),
|
||||
the default browser ports shift to stay in the same “family” (control = gateway + 2).
|
||||
- `cdpUrl` defaults to `controlUrl + 1` when unset.
|
||||
- `attachOnly: true` means “never launch Chrome; only attach if it is already running.”
|
||||
|
||||
- `GET /profiles` — list all profiles with status
|
||||
- `POST /profiles/create` `{ name, color?, cdpUrl? }` — create new profile (auto-allocates port if no `cdpUrl`)
|
||||
- `DELETE /profiles/:name` — delete profile (stops browser + removes user data for local profiles)
|
||||
- `POST /reset-profile?profile=<name>` — kill orphan process on profile's port (local profiles only)
|
||||
## Local vs remote control
|
||||
|
||||
### Profile parameter
|
||||
- **Local control (default):** `controlUrl` is loopback (`127.0.0.1`/`localhost`).
|
||||
The Gateway starts the control server and can launch Chrome.
|
||||
- **Remote control:** `controlUrl` is non-loopback. The Gateway **does not** start
|
||||
a local server; it assumes you are pointing at an existing server elsewhere.
|
||||
- **Remote CDP:** set `browser.profiles.<name>.cdpUrl` (or `browser.cdpUrl`) to
|
||||
attach to a remote Chrome. In this case, Clawdbot will not launch a local browser.
|
||||
|
||||
All existing endpoints accept optional `?profile=<name>` query parameter:
|
||||
- `GET /?profile=work` — status for work profile
|
||||
- `POST /start?profile=work` — start work profile browser
|
||||
- `GET /tabs?profile=work` — list tabs for work profile
|
||||
- `POST /tabs/open?profile=work` — open tab in work profile
|
||||
- etc.
|
||||
## Profiles (multi-browser)
|
||||
|
||||
When `profile` is omitted, uses `browser.defaultProfile` (defaults to "clawd").
|
||||
Clawdbot supports multiple named profiles. Each profile has its own:
|
||||
- user data directory
|
||||
- CDP port (local) or CDP URL (remote)
|
||||
- accent color
|
||||
|
||||
### Agent browser tool
|
||||
Defaults:
|
||||
- The `clawd` profile is auto-created if missing.
|
||||
- Local CDP ports allocate from **18800–18899** by default.
|
||||
- Deleting a profile moves its local data directory to Trash.
|
||||
|
||||
The `browser` tool accepts an optional `profile` parameter for all actions:
|
||||
All control endpoints accept `?profile=<name>`; the CLI uses `--browser-profile`.
|
||||
|
||||
```json
|
||||
{
|
||||
"action": "open",
|
||||
"targetUrl": "https://example.com",
|
||||
"profile": "work"
|
||||
}
|
||||
```
|
||||
## Isolation guarantees
|
||||
|
||||
This routes the operation to the specified profile's browser instance. Omitting
|
||||
`profile` uses the default profile.
|
||||
- **Dedicated user data dir**: never touches your personal Chrome profile.
|
||||
- **Dedicated ports**: avoids `9222` to prevent collisions with dev workflows.
|
||||
- **Deterministic tab control**: target tabs by `targetId`, not “last tab”.
|
||||
|
||||
### Profile naming rules
|
||||
## Browser selection
|
||||
|
||||
- Lowercase alphanumeric characters and hyphens only
|
||||
- Must start with a letter or number (not a hyphen)
|
||||
- Maximum 64 characters
|
||||
- Examples: `clawd`, `work`, `my-project-1`
|
||||
When launching locally, Clawdbot picks the first available:
|
||||
1. Chrome Canary
|
||||
2. Chromium
|
||||
3. Chrome
|
||||
|
||||
### Port allocation
|
||||
You can override with `browser.executablePath`.
|
||||
|
||||
Ports are allocated from range 18800-18899 (~100 profiles max). This is far more
|
||||
than practical use — memory and CPU exhaustion occur well before port exhaustion.
|
||||
Ports are allocated once at profile creation and persisted permanently.
|
||||
Remote profiles are attach-only and do **not** use the local port range.
|
||||
## Interaction with the agent (clawd)
|
||||
Platforms:
|
||||
- macOS: checks `/Applications` and `~/Applications`.
|
||||
- Linux: looks for `google-chrome`, `chromium`, etc.
|
||||
- Windows: checks common install locations.
|
||||
|
||||
The agent should use browser tools only when:
|
||||
- enabled in settings
|
||||
- control URL is configured
|
||||
## Control API (optional)
|
||||
|
||||
If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").
|
||||
If you want to integrate directly, the browser control server exposes a small
|
||||
HTTP API:
|
||||
|
||||
The agent should not assume tabs are ephemeral. It should:
|
||||
- call `browser.tabs.list` to discover existing tabs first
|
||||
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
|
||||
- avoid opening duplicate tabs unless asked
|
||||
- Status/start/stop: `GET /`, `POST /start`, `POST /stop`
|
||||
- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
|
||||
- Snapshot/screenshot: `GET /snapshot`, `POST /screenshot`
|
||||
- Actions: `POST /navigate`, `POST /act`
|
||||
- Hooks: `POST /hooks/file-chooser`, `POST /hooks/dialog`
|
||||
- Debugging: `GET /console`, `POST /pdf`
|
||||
|
||||
## CLI quick reference (one example each)
|
||||
All endpoints accept `?profile=<name>`.
|
||||
|
||||
All commands accept `--browser-profile <name>` to target a specific profile (default: `clawd`).
|
||||
### Playwright requirement
|
||||
|
||||
Some features (navigate/act/ai snapshot, element screenshots, PDF) require
|
||||
Playwright. In embedded gateway builds, Playwright may be unavailable; those
|
||||
endpoints return a clear 501 error. ARIA snapshots and basic screenshots still work.
|
||||
|
||||
## CLI quick reference
|
||||
|
||||
All commands accept `--browser-profile <name>` to target a specific profile.
|
||||
|
||||
Profile management:
|
||||
- `clawdbot browser profiles`
|
||||
- `clawdbot browser create-profile --name work`
|
||||
- `clawdbot browser create-profile --name remote --cdp-url http://10.0.0.42:9222`
|
||||
- `clawdbot browser delete-profile --name work`
|
||||
Basics:
|
||||
- `clawdbot browser status`
|
||||
- `clawdbot browser start`
|
||||
- `clawdbot browser stop`
|
||||
- `clawdbot browser reset-profile`
|
||||
- `clawdbot browser tabs`
|
||||
- `clawdbot browser open https://example.com`
|
||||
- `clawdbot browser focus abcd1234`
|
||||
@@ -260,6 +148,8 @@ Inspection:
|
||||
- `clawdbot browser screenshot --ref 12`
|
||||
- `clawdbot browser snapshot`
|
||||
- `clawdbot browser snapshot --format aria --limit 200`
|
||||
- `clawdbot browser console --level error`
|
||||
- `clawdbot browser pdf`
|
||||
|
||||
Actions:
|
||||
- `clawdbot browser navigate https://example.com`
|
||||
@@ -271,39 +161,27 @@ Actions:
|
||||
- `clawdbot browser drag 10 11`
|
||||
- `clawdbot browser select 9 OptionA OptionB`
|
||||
- `clawdbot browser upload /tmp/file.pdf`
|
||||
- `clawdbot browser fill --fields '[{\"ref\":\"1\",\"value\":\"Ada\"}]'`
|
||||
- `clawdbot browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'`
|
||||
- `clawdbot browser dialog --accept`
|
||||
- `clawdbot browser wait --text "Done"`
|
||||
- `clawdbot browser evaluate --fn '(el) => el.textContent' --ref 7`
|
||||
- `clawdbot browser evaluate --fn "document.querySelector('.my-class').click()"`
|
||||
- `clawdbot browser console --level error`
|
||||
- `clawdbot browser pdf`
|
||||
|
||||
Notes:
|
||||
- `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
|
||||
- `upload` can take a `ref` to auto-click after arming (useful for single-step file uploads).
|
||||
- `upload` can also take `inputRef` (aria ref) or `element` (CSS selector) to set `<input type="file">` directly without waiting for a file chooser.
|
||||
- The arm default timeout is **2 minutes** (clamped to max 2 minutes); pass `timeoutMs` if you need shorter.
|
||||
- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging.
|
||||
- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs.
|
||||
- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on.
|
||||
- `upload` and `dialog` are **arming** calls; run them before the click/press
|
||||
that triggers the chooser/dialog.
|
||||
- `upload` can also set file inputs directly via `--input-ref` or `--element`.
|
||||
- `snapshot` defaults to `ai` when available; use `--format aria` for the
|
||||
accessibility tree.
|
||||
- `click`/`type` require a `ref` from `snapshot` (CSS selectors are intentionally
|
||||
not supported for actions).
|
||||
|
||||
## Security & privacy notes
|
||||
## Security & privacy
|
||||
|
||||
- The clawd browser profile is app-owned; it may contain logged-in sessions.
|
||||
Treat it as sensitive data.
|
||||
- The control server must bind to loopback only by default (`127.0.0.1`) unless the
|
||||
user explicitly configures a non-loopback URL.
|
||||
- Never reuse or copy the user's default Chrome profile.
|
||||
- Remote CDP endpoints should be tunneled or protected; CDP is highly privileged.
|
||||
|
||||
## Non-goals (for the first cut)
|
||||
|
||||
- Cross-device "sync" of tabs between Mac and Pi.
|
||||
- Sharing the user's logged-in Chrome sessions automatically.
|
||||
- General-purpose web scraping; this is primarily for "close-the-loop" verification
|
||||
and interaction.
|
||||
- The clawd browser profile may contain logged-in sessions; treat it as sensitive.
|
||||
- Keep control URLs loopback-only unless you intentionally expose the server.
|
||||
- Remote CDP endpoints are powerful; tunnel and protect them.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
For Linux-specific issues (especially Ubuntu with snap Chromium), see [browser-linux-troubleshooting](/tools/browser-linux-troubleshooting).
|
||||
For Linux-specific issues (especially snap Chromium), see
|
||||
[Browser troubleshooting](/tools/browser-linux-troubleshooting).
|
||||
|
||||
@@ -294,25 +294,12 @@ Node targeting:
|
||||
- Respect user consent for camera/screen capture.
|
||||
- Use `status/describe` to ensure permissions before invoking media commands.
|
||||
|
||||
## How the model sees tools (pi-mono internals)
|
||||
## How tools are presented to the agent
|
||||
|
||||
Tools are exposed to the model in **two parallel channels**:
|
||||
Tools are exposed in two parallel channels:
|
||||
|
||||
1) **System prompt text**: a human-readable list + guidelines.
|
||||
2) **Provider tool schema**: the actual function/tool declarations sent to the model API.
|
||||
1) **System prompt text**: a human-readable list + guidance.
|
||||
2) **Tool schema**: the structured function definitions sent to the model API.
|
||||
|
||||
In pi-mono:
|
||||
- System prompt builder: [`packages/coding-agent/src/core/system-prompt.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/src/core/system-prompt.ts)
|
||||
- Builds the `Available tools:` list from `toolDescriptions`.
|
||||
- Appends skills and project context.
|
||||
- Tool schemas passed to providers:
|
||||
- OpenAI: [`packages/ai/src/providers/openai-responses.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/openai-responses.ts) (`convertTools`)
|
||||
- Anthropic: [`packages/ai/src/providers/anthropic.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/anthropic.ts) (`convertTools`)
|
||||
- Gemini: [`packages/ai/src/providers/google-shared.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/providers/google-shared.ts) (`convertTools`)
|
||||
- Tool execution loop:
|
||||
- Agent loop: [`packages/ai/src/agent/agent-loop.ts`](https://github.com/badlogic/pi-mono/blob/main/packages/ai/src/agent/agent-loop.ts)
|
||||
- Validates tool arguments and executes tools, then appends `toolResult` messages.
|
||||
|
||||
In Clawdbot:
|
||||
- System prompt append: [`src/agents/system-prompt.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/system-prompt.ts)
|
||||
- Tool list injected via `createClawdbotCodingTools()` in [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts)
|
||||
That means the agent sees both “what tools exist” and “how to call them.” If a tool
|
||||
doesn’t appear in the system prompt or the schema, the model cannot call it.
|
||||
|
||||
@@ -65,8 +65,3 @@ Use SSH tunneling or Tailscale to reach the Gateway WS.
|
||||
## Notes
|
||||
- The TUI shows Gateway chat deltas (`event: chat`) and agent tool events.
|
||||
- It registers as a Gateway client with `mode: "tui"` for presence and debugging.
|
||||
|
||||
## Files
|
||||
- CLI: [`src/cli/tui-cli.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/cli/tui-cli.ts)
|
||||
- Runner: [`src/tui/tui.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/tui.ts)
|
||||
- Gateway client: [`src/tui/gateway-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/tui/gateway-chat.ts)
|
||||
|
||||
Reference in New Issue
Block a user