docs: refresh and simplify docs

This commit is contained in:
Peter Steinberger
2026-01-08 23:06:56 +01:00
parent 88dca1afdf
commit a6c309824e
46 changed files with 1117 additions and 2155 deletions

@@ -1,381 +1,105 @@
---
summary: "iOS app (node): architecture + connection runbook"
summary: "iOS node app: connect to the Gateway, pairing, canvas, and troubleshooting"
read_when:
- Pairing or reconnecting the iOS node
- Debugging iOS bridge discovery or auth
- Sending screen/canvas commands to iOS
- Designing iOS node + gateway integration
- Extending the Gateway protocol for node/canvas commands
- Implementing Bonjour pairing or transport security
- Running the iOS app from source
- Debugging bridge discovery or canvas commands
---
# iOS App (Node)
Status: prototype implemented (internal) · Date: 2025-12-13
Availability: internal preview. The iOS app is not publicly distributed yet.
## Support snapshot
- Role: companion node app (iOS does not host the Gateway).
- Gateway required: yes (run it on macOS, Linux, or Windows via WSL2).
- Install: [Getting Started](/start/getting-started) + [Pairing](/gateway/pairing).
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
## What it does
## System control
System control (launchd/systemd) lives on the Gateway host. See [Gateway](/gateway).
- Connects to a Gateway over the bridge (LAN or tailnet).
- Exposes node capabilities: Canvas, Screen snapshot, Camera capture, Location, Talk mode, Voice wake.
- Receives `node.invoke` commands and reports node status events.
## Connection Runbook
## Requirements
This is the practical “how do I connect the iOS node” guide:
- Gateway running on another device (macOS, Linux, or Windows via WSL2).
- Bridge enabled (default).
- Network path:
- Same LAN via Bonjour, **or**
- Tailnet via unicast DNS-SD (`clawdbot.internal.`), **or**
- Manual host/port (fallback).
**iOS app** ⇄ (Bonjour + TCP bridge) ⇄ **Gateway bridge** ⇄ (loopback WS) ⇄ **Gateway**
## Quick start (pair + connect)
The Gateway WebSocket stays loopback-only (`ws://127.0.0.1:18789`). The iOS node talks to the LAN-facing **bridge** (default `tcp://0.0.0.0:18790`) and uses Gateway-owned pairing.
### Prerequisites
- You can run the Gateway on the “master” machine.
- iOS node app can reach the gateway bridge:
- Same LAN with Bonjour/mDNS, **or**
- Same Tailscale tailnet using Wide-Area Bonjour / unicast DNS-SD (see below), **or**
- Manual bridge host/port (fallback)
- You can run the CLI (`clawdbot`) on the gateway machine (or via SSH).
### 1) Start the Gateway (with bridge enabled)
Bridge is enabled by default (disable via `CLAWDBOT_BRIDGE_ENABLED=0`).
1) Start the Gateway (bridge enabled by default):
```bash
clawdbot gateway --port 18789 --verbose
clawdbot gateway --port 18789
```
Confirm in logs you see something like:
- `bridge listening on tcp://0.0.0.0:18790 (node)`
2) In the iOS app, open Settings and pick a discovered gateway (or enable Manual Bridge and enter host/port).
For tailnet-only setups (recommended for Vienna ⇄ London), bind the bridge to the gateway machine's Tailscale IP instead:
- Set `bridge.bind: "tailnet"` in `~/.clawdbot/clawdbot.json` on the gateway host.
- Restart the Gateway / macOS menubar app.
### 2) Verify Bonjour discovery (optional but recommended)
From the gateway machine:
```bash
dns-sd -B _clawdbot-bridge._tcp local.
```
You should see your gateway advertising `_clawdbot-bridge._tcp`.
If browse works, but the iOS node can't connect, try resolving one instance:
```bash
dns-sd -L "<instance name>" _clawdbot-bridge._tcp local.
```
More debugging notes: [`docs/bonjour.md`](/gateway/bonjour).
#### Tailnet (Vienna ⇄ London) discovery via unicast DNS-SD
If the iOS node and the gateway are on different networks but connected via Tailscale, multicast mDNS won't cross the boundary. Use Wide-Area Bonjour / unicast DNS-SD instead:
1) Set up a DNS-SD zone (example `clawdbot.internal.`) on the gateway host and publish `_clawdbot-bridge._tcp` records.
2) Configure Tailscale split DNS for `clawdbot.internal` pointing at that DNS server.
Details and example CoreDNS config: [`docs/bonjour.md`](/gateway/bonjour).
### 3) Connect from the iOS node app
In the iOS node app:
- Pick the discovered bridge (or hit refresh).
- If not paired yet, it will initiate pairing automatically.
- After the first successful pairing, it will auto-reconnect **strictly to the last discovered gateway** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
#### Connection indicator (always visible)
The Settings tab icon shows a small status dot:
- **Green**: connected to the bridge
- **Yellow**: connecting (subtle pulse)
- **Red**: not connected / error
### 4) Approve pairing (CLI)
On the gateway machine:
3) Approve the pairing request on the gateway host:
```bash
clawdbot nodes pending
```
Approve the request:
```bash
clawdbot nodes approve <requestId>
```
After approval, the iOS node receives/stores the token and reconnects authenticated.
Pairing details: [`docs/gateway/pairing.md`](/gateway/pairing).
### 5) Verify the node is connected
- In the macOS app: **Instances** tab should show something like `iOS Node (...)` with a green “Active” presence dot shortly after connect.
- Via nodes status (paired + connected):
```bash
clawdbot nodes status
```
- Via Gateway (paired + connected):
```bash
clawdbot gateway call node.list --params "{}"
```
- Via Gateway presence (legacy-ish, still useful):
```bash
clawdbot gateway call system-presence --params "{}"
```
Look for the node `instanceId` (often a UUID).
### 6) Drive the iOS Canvas (draw / snapshot)
The iOS node runs a WKWebView “Canvas” scaffold which exposes:
- `window.__clawdbot.canvas`
- `window.__clawdbot.ctx` (2D context)
- `window.__clawdbot.setStatus(title, subtitle)`
#### Gateway Canvas Host (recommended for web content)
If you want the node to show real HTML/CSS/JS that the agent can edit on disk, point it at the Gateway canvas host.
Note: nodes always use the standalone canvas host on `canvasHost.port` (default `18793`), bound to the bridge interface.
1) Create `~/clawd/canvas/index.html` on the gateway host.
2) Navigate the node to it (LAN):
4) Verify connection:
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-hostname>.local:18793/__clawdbot__/canvas/"}'
clawdbot nodes status
clawdbot gateway call node.list --params "{}"
```
## Discovery paths
### Bonjour (LAN)
The Gateway advertises `_clawdbot-bridge._tcp` on `local.`. The iOS app lists these automatically.
### Tailnet (cross-network)
If mDNS is blocked, use a unicast DNS-SD zone (recommended domain: `clawdbot.internal.`) and Tailscale split DNS.
See [`docs/bonjour.md`](/gateway/bonjour) for the CoreDNS example.
### Manual host/port
In Settings, enable **Manual Bridge** and enter the gateway host + port (default `18790`).
## Canvas + A2UI
The iOS node renders a WKWebView canvas. Use `node.invoke` to drive it:
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.navigate --params '{"url":"http://<gateway-host>:18793/__clawdbot__/canvas/"}'
```
Notes:
- The server injects a live-reload client into HTML and reloads on file changes.
- A2UI is hosted on the same canvas host at `http://<gateway-host>:18793/__clawdbot__/a2ui/`.
- Tailnet (optional): if both devices are on Tailscale, use a MagicDNS name or tailnet IP instead of `.local`, e.g. `http://<gateway-magicdns>:18793/__clawdbot__/canvas/`.
- iOS may require App Transport Security allowances to load plain `http://` URLs; if it fails to load, prefer HTTPS or adjust the iOS app's ATS config.
- The Gateway canvas host serves `/__clawdbot__/canvas/` and `/__clawdbot__/a2ui/`.
- The iOS node auto-navigates to A2UI on connect when a canvas host URL is advertised.
- Return to the built-in scaffold with `canvas.navigate` and `{"url":""}`.
#### Draw with `canvas.eval`
### Canvas eval / snapshot
```bash
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params "$(cat <<'JSON'
{"javaScript":"(() => { const {ctx,setStatus} = window.__clawdbot; setStatus('Drawing','…'); ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle='#ff2d55'; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); setStatus(null,null); return 'ok'; })()"}
JSON
)"
clawdbot nodes invoke --node "iOS Node" --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
```
#### Snapshot with `canvas.snapshot`
```bash
clawdbot nodes invoke --node 192.168.0.88 --command canvas.snapshot --params '{"maxWidth":900}'
clawdbot nodes invoke --node "iOS Node" --command canvas.snapshot --params '{"maxWidth":900,"format":"jpeg"}'
```
The response includes `{ format, base64 }` image data (default `format="jpeg"`; pass `{"format":"png"}` when you specifically need lossless PNG).
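As a minimal sketch of consuming that `{ format, base64 }` payload: the helper below reads a JSON response on stdin, extracts the `base64` field, and writes the decoded image to a file. `save_snapshot` is a hypothetical name, and the sketch assumes the response arrives as JSON with the `base64` value on a single line; neither is confirmed by this doc.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the clawdbot CLI): decode the "base64"
# field of a snapshot response read from stdin into the given output file.
save_snapshot() {
  local out_file="$1" payload
  # Assumption: the "base64" value sits on one line and contains no quotes.
  payload="$(sed -n 's/.*"base64"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')"
  [ -n "$payload" ] || { echo "no base64 field found" >&2; return 1; }
  printf '%s' "$payload" | base64 --decode > "$out_file"
}
```

If the CLI does print the raw response on stdout, piping the `canvas.snapshot` invocation above into `save_snapshot /tmp/canvas.jpg` would be one way to use it.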
## Voice wake + talk mode
### Common gotchas
- Voice wake and talk mode are available in Settings.
- iOS may suspend background audio; treat voice features as best-effort when the app is not active.
- **iOS in background:** all `canvas.*` commands fail fast with `NODE_BACKGROUND_UNAVAILABLE` (bring the iOS node app to foreground).
- **Return to default scaffold:** `canvas.navigate` with `{"url":""}` or `{"url":"/"}` returns to the built-in scaffold page.
- **mDNS blocked:** some networks block multicast; use a different LAN or plan a tailnet-capable bridge (see [`docs/discovery.md`](/gateway/discovery)).
- **Wrong node selector:** `--node` can be the node id (UUID), display name (e.g. `iOS Node`), IP, or an unambiguous prefix. If it's ambiguous, the CLI will tell you.
- **Stale pairing / Keychain cleared:** if the pairing token is missing (or iOS Keychain was wiped), the node must pair again; approve a new pending request.
- **App reinstall but no reconnect:** the node restores `instanceId` + last bridge preference from Keychain; if it still comes up “unpaired”, verify Keychain persistence on your device/simulator and re-pair once.
## Common errors
## Design + Architecture
### Goals
- Build an **iOS app** that acts as a **remote node** for Clawdbot:
- **Voice trigger** (wake-word / always-listening intent) that forwards transcripts to the Gateway `agent` method.
- **Canvas** surface that the agent can control: navigate, draw/render, evaluate JS, snapshot.
- **Dead-simple setup**:
- Auto-discover the host on the local network via **Bonjour**.
- One-tap pairing with an approval prompt on the Mac.
- iOS is **never** a local gateway; it is always a remote node.
- Operational clarity:
- When iOS is backgrounded, voice may still run; **canvas commands must fail fast** with a structured error.
- Provide **settings**: node display name, enable/disable voice wake, pairing status.
Non-goals (v1):
- Exposing the Node Gateway directly on the LAN.
- Supporting arbitrary third-party “plugins” on iOS.
- Perfect App Store compliance; this is **internal-only** initially.
### Current repo reality (constraints we respect)
- The Gateway WebSocket server binds to `127.0.0.1:18789` ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts)) with an optional `CLAWDBOT_GATEWAY_TOKEN`.
- The Gateway exposes a Canvas file server (`canvasHost`) on `canvasHost.port` (default `18793`), so nodes can `canvas.navigate` to `http://<lanHost>:18793/__clawdbot__/canvas/` and auto-reload on file changes ([`docs/configuration.md`](/gateway/configuration)).
- macOS “Canvas” is controlled via the Gateway node protocol (`canvas.*`), matching iOS/Android ([`docs/mac/canvas.md`](/platforms/mac/canvas)).
- Voice wake forwards via `GatewayChannel` to Gateway `agent` (mac app: `VoiceWakeForwarder` → `GatewayConnection.sendAgent`).
### Recommended topology (B): Gateway-owned Bridge + loopback Gateway
Keep the Node gateway loopback-only; expose a dedicated **gateway-owned bridge** to the LAN/tailnet.
**iOS App** ⇄ (TLS + pairing) ⇄ **Bridge (in gateway)** ⇄ (loopback) ⇄ **Gateway WS** (`ws://127.0.0.1:18789`)
Why:
- Preserves current threat model: Gateway remains local-only.
- Centralizes auth, rate limiting, and allowlisting in the bridge.
- Lets us unify “canvas node” semantics across mac + iOS without exposing raw gateway methods.
### Security plan (internal, but still robust)
#### Transport
- **Current (v0):** bridge is a LAN-facing **TCP** listener with token-based auth after pairing.
- **Next:** wrap the bridge in **TLS** and prefer key-pinned or mTLS-like auth after pairing.
#### Pairing
- Bonjour discovery shows a candidate “Clawdbot Bridge” on the LAN.
- First connection:
1) iOS generates a keypair (Secure Enclave if available).
2) iOS connects to the bridge and requests pairing.
3) The bridge forwards the pairing request to the **Gateway** as a *pending request*.
4) Approval can happen via:
- **macOS UI** (Clawdbot shows an alert with Approve/Reject/Later, including the node IP), or
- **Terminal/CLI** (headless flows).
5) Once approved, the bridge returns a token to iOS; iOS stores it in Keychain.
- Subsequent connections:
- The bridge requires the paired identity. Unpaired clients get a structured “not paired” error and no access.
##### Gateway-owned pairing (Option B details)
Pairing decisions must be owned by the Gateway (`clawd` / Node) so nodes can be approved without the macOS app running.
Key idea:
- The Swift app may still show an alert, but it is only a **frontend** for pending requests stored in the Gateway.
Desired behavior:
- If the Swift UI is present: show alert with Approve/Reject/Later.
- If the Swift UI is not present: `clawdbot` CLI can list pending requests and approve/reject.
See [`docs/gateway/pairing.md`](/gateway/pairing) for the API/events and storage.
CLI (headless approvals):
- `clawdbot nodes pending`
- `clawdbot nodes approve <requestId>`
- `clawdbot nodes reject <requestId>`
#### Authorization / scope control (bridge-side ACL)
The bridge must not be a raw proxy to every gateway method.
- Allow by default:
- `agent` (with guardrails; idempotency required)
- minimal `system-event` beacons (presence updates for the node)
- node/canvas methods defined below (new protocol surface)
- Deny by default:
- anything that widens control without explicit intent (future “shell”, “files”, etc.)
- Rate limit:
- handshake attempts
- voice forwards per minute
- snapshot frequency / payload size
### Protocol unification: add “node/canvas” to Gateway protocol
#### Principle
Unify mac Canvas + iOS Canvas under a single conceptual surface:
- The agent talks to the Gateway using a stable method set (typed protocol).
- The Gateway routes node-targeted requests to:
- local mac Canvas implementation, or
- remote iOS node via the bridge
#### Minimal protocol additions (v1)
Add to [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) (and regenerate Swift models):
**Identity**
- Node identity comes from `connect.params.client.instanceId` (stable), and `connect.params.client.mode = "node"` (or `"ios-node"`).
**Methods**
- `node.list` → list paired/connected nodes + capabilities
- `node.describe` → describe a node (capabilities + supported `node.invoke` commands)
- `node.invoke` → send a command to a specific node
- Params: `{ nodeId, command, params?, timeoutMs? }`
**Events**
- `node.event` → async node status/errors
- e.g. background/foreground transitions, voice availability, canvas availability
#### Node command set (canvas)
These are values for `node.invoke.command`:
- `canvas.present` / `canvas.hide`
- `canvas.navigate` with `{ url }` (loads a URL; use `""` or `"/"` to return to the default scaffold)
- `canvas.eval` with `{ javaScript }`
- `canvas.snapshot` with `{ maxWidth?, quality?, format? }`
- A2UI (mobile + macOS canvas):
- `canvas.a2ui.push` with `{ messages: [...] }` (A2UI v0.8 server→client messages)
- `canvas.a2ui.pushJSONL` with `{ jsonl: "..." }` (legacy alias)
- `canvas.a2ui.reset`
- A2UI is hosted by the Gateway canvas host (`/__clawdbot__/a2ui/`) on `canvasHost.port`. Commands fail if the host is unreachable.
Result pattern:
- Request is a standard `req/res` with `ok` / `error`.
- Long operations (loads, streaming drawing, etc.) may also emit `node.event` progress.
##### Current (implemented)
As of 2025-12-13, the Gateway supports `node.invoke` for bridge-connected nodes.
Example: draw a diagonal line on the iOS Canvas:
```bash
clawdbot nodes invoke --node ios-node --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdbot; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
```
### Background behavior requirement
When iOS is backgrounded:
- Voice may still be active (subject to iOS suspension).
- **All `canvas.*` commands must fail** with a stable error code, e.g.:
- `NODE_BACKGROUND_UNAVAILABLE`
- Include `retryable: true` and `retryAfterMs` if we want the agent to wait.
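A caller-side view of this requirement, as a hedged sketch: the wrapper below re-runs a command while its output mentions `NODE_BACKGROUND_UNAVAILABLE`, up to a fixed number of attempts. `retry_when_backgrounded` is a hypothetical name; the sketch assumes the command exits non-zero and includes the error code in its output, and it uses a fixed delay rather than reading `retryAfterMs`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: retry_when_backgrounded <max_attempts> <delay_seconds> <command...>
# Assumption: the wrapped command exits non-zero and prints the error code
# NODE_BACKGROUND_UNAVAILABLE when the node app is backgrounded.
retry_when_backgrounded() {
  local max_attempts="$1" delay_s="$2"
  shift 2
  local attempt=1 output status
  while :; do
    if output="$("$@" 2>&1)"; then
      printf '%s\n' "$output"
      return 0
    else
      status=$?
    fi
    case "$output" in
      *NODE_BACKGROUND_UNAVAILABLE*) ;;                 # retryable: fall through
      *) printf '%s\n' "$output" >&2; return "$status" ;; # any other error: give up
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$output" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    sleep "$delay_s"
  done
}
```

Failing fast on any other error keeps the wrapper from masking pairing or network problems, which need operator action rather than a retry.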
## iOS app architecture (SwiftUI)
### App structure
- Single fullscreen Canvas surface (WKWebView).
- One settings entry point: a **gear button** that opens a settings sheet.
- All navigation is **agent-driven** (no local URL bar).
### Components
- `BridgeDiscovery`: Bonjour browse + resolve (Network.framework `NWBrowser`)
- `BridgeConnection`: TCP session + pairing handshake + reconnect (TLS planned)
- `NodeRuntime`:
- Voice pipeline (wake-word + capture + forward)
- Canvas pipeline (WKWebView controller + snapshot + eval)
- Background state tracking; enforces “canvas unavailable in background”
### Voice in background (internal)
- Enable background audio mode (and required session configuration) so the mic pipeline can keep running when the user switches apps.
- If iOS suspends the app anyway, surface a clear node status (`node.event`) so operators can see voice is unavailable.
## Code sharing (macOS + iOS)
Create/expand SwiftPM targets so both apps share:
- `ClawdbotProtocol` (generated models; platform-neutral)
- `ClawdbotGatewayClient` (shared WS framing + connect/req/res + seq-gap handling)
- `ClawdbotKit` (node/canvas command types + deep links + shared utilities)
macOS continues to own:
- local Canvas implementation details (custom scheme handler serving on-disk HTML, window/panel presentation)
iOS owns:
- iOS-specific audio/speech + WKWebView presentation and lifecycle
## Repo layout
- iOS app: `apps/ios/` (XcodeGen `project.yml`)
- Shared Swift packages: `apps/shared/`
- Lint/format: iOS target runs `swiftformat --lint` + `swiftlint lint` using repo configs (`.swiftformat`, `.swiftlint.yml`).
Generate the Xcode project:
```bash
cd apps/ios
xcodegen generate
open Clawdbot.xcodeproj
```
## Storage plan (private by default)
### iOS
- Canvas/workspace files (persistent, private):
- `Application Support/Clawdbot/canvas/<sessionKey>/...`
- Snapshots / temp exports (evictable):
- `Library/Caches/Clawdbot/canvas-snapshots/<sessionKey>/...`
- Credentials:
- Keychain (paired identity + bridge trust anchor)
- `NODE_BACKGROUND_UNAVAILABLE`: bring the iOS app to the foreground (canvas/camera/screen commands require it).
- `A2UI_HOST_NOT_CONFIGURED`: the Gateway did not advertise a canvas host URL; check `canvasHost` in [`docs/configuration.md`](/gateway/configuration).
- Pairing prompt never appears: run `clawdbot nodes pending` and approve manually.
- Reconnect fails after reinstall: the Keychain pairing token was cleared; re-pair the node.
## Related docs
- [`docs/gateway.md`](/gateway) (gateway runbook)
- [`docs/gateway/pairing.md`](/gateway/pairing) (approval + storage)
- [`docs/bonjour.md`](/gateway/bonjour) (discovery debugging)
- [`docs/discovery.md`](/gateway/discovery) (LAN vs tailnet vs SSH)
- [Pairing](/gateway/pairing)
- [Discovery](/gateway/discovery)
- [Bonjour](/gateway/bonjour)

@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both
App bundle layout:
- `Clawdbot.app/Contents/Resources/Relay/clawdbot`
- bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
- bun `--compile` relay executable built from `dist/macos/relay.js`
- Supports:
- `clawdbot …` (CLI)
- `clawdbot gateway …` (LaunchAgent daemon)
@@ -47,7 +47,7 @@ Important bundler flags:
Version injection:
- `--define "__CLAWDBOT_VERSION__=\"<pkg version>\""`
- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesn't depend on reading `package.json` at runtime.
- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesn't depend on reading `package.json` at runtime.
## Launchd (Gateway as LaunchAgent)
@@ -58,7 +58,7 @@ Plist location (per-user):
- `~/Library/LaunchAgents/com.clawdbot.gateway.plist`
Manager:
- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
- The macOS app owns LaunchAgent install/update for the bundled gateway.
Behavior:
- “Clawdbot Active” enables/disables the LaunchAgent.
@@ -79,7 +79,7 @@ Symptom (when mis-signed):
Fix:
- The bun executable needs JIT-ish permissions under hardened runtime.
- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
- `com.apple.security.cs.allow-jit`
- `com.apple.security.cs.allow-unsigned-executable-memory`
@@ -89,18 +89,14 @@ Problem:
- bun can't load some native Node addons like `sharp` (and we don't want to ship native addon trees for the gateway).
Solution:
- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
- Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
- Falls back to `sharp` when available (Node/dev)
- Used by:
- [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
- [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
- [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
- When running in Node/dev, `sharp` is used when available.
- This affects inbound/outbound media, screenshots, and tool image sanitization.
## Browser control server
The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
It's started from the relay daemon process, so the relay binary includes Playwright deps.
The Gateway starts the browser control server (loopback only) from the relay daemon process,
so the relay binary includes Playwright deps.
## Tests / smoke checks
@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.
## DMG styling (human installer)
[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.
Rules of thumb:
- Use a **72dpi** background image that matches the Finder window size in points.

@@ -5,157 +5,117 @@ read_when:
- Adding agent controls for visual workspace
- Debugging WKWebView canvas loads
---
# Canvas (macOS app)
Status: draft spec · Date: 2025-12-12
The macOS app embeds an agent-controlled **Canvas panel** using `WKWebView`. It
is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
UI surfaces.
Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
## Where Canvas lives
Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
Canvas state is stored under Application Support:
This is designed for:
- Agent-written HTML/CSS/JS on disk (per-session directory).
- A real browser engine for layout, rendering, and basic interactivity.
- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
- `~/Library/Application Support/Clawdbot/canvas/<session>/...`
## Why a custom scheme (vs. loopback HTTP)
The Canvas panel serves those files via a **custom URL scheme**:
Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
- No port conflicts and no extra local server lifecycle.
- Easier to sandbox: only serve files we explicitly map.
- Works offline and can use an ephemeral data store (no persistent cookies/cache).
If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
## URL ↔ directory mapping
The Canvas scheme is:
- `clawdbot-canvas://<session>/<path>`
Routing model:
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html` (or `index.htm`)
- `clawdbot-canvas://main/yolo` → `<canvasRoot>/main/yolo/index.html` (or `index.htm`)
Examples:
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html`
- `clawdbot-canvas://main/assets/app.css` → `<canvasRoot>/main/assets/app.css`
- `clawdbot-canvas://main/widgets/todo/` → `<canvasRoot>/main/widgets/todo/index.html`
Directory listings are not served.
If no `index.html` exists at the root, the app shows a **built-in scaffold page**.
When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
This is a visual placeholder only (no A2UI renderer).
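The URL-to-directory mapping can be illustrated with a small shell sketch. `canvas_resolve` is a hypothetical helper, not the app's scheme handler: it assumes a canvas root directory is passed in, resolves directory targets to `index.html` or `index.htm`, and rejects `..` segments so results stay under the session directory. It reports an error where the real handler would show the scaffold page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: canvas_resolve <canvasRoot> <clawdbot-canvas URL>
# Prints the on-disk path the URL would map to under the rules described above.
canvas_resolve() {
  local root="$1" url="$2"
  local rest session path candidate
  case "$url" in
    clawdbot-canvas://*) rest="${url#clawdbot-canvas://}" ;;
    *) echo "unsupported scheme: $url" >&2; return 1 ;;
  esac
  session="${rest%%/*}"
  if [ "$rest" = "$session" ]; then path=""; else path="${rest#*/}"; fi
  [ -n "$session" ] || { echo "missing session" >&2; return 1; }
  # Reject any ".." segment so the result cannot escape <canvasRoot>/<session>/.
  case "/$session/$path/" in
    */../*) echo "path traversal rejected" >&2; return 1 ;;
  esac
  path="${path%/}"
  candidate="$root/$session${path:+/$path}"
  if [ -d "$candidate" ]; then
    # Directory targets resolve to an index file; listings are never served.
    if [ -f "$candidate/index.html" ]; then
      candidate="$candidate/index.html"
    elif [ -f "$candidate/index.htm" ]; then
      candidate="$candidate/index.htm"
    else
      echo "no index file in $candidate" >&2
      return 1
    fi
  fi
  printf '%s\n' "$candidate"
}
```

For example, `canvas_resolve "$HOME/Library/Application Support/Clawdbot/canvas" "clawdbot-canvas://main/"` prints the path of `main/index.html` when that file exists.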
## Panel behavior
### Suggested on-disk location
- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
- Remembers size/position per session.
- Auto-reloads when local canvas files change.
- Only one Canvas panel is visible at a time (session is switched as needed).
Store Canvas state under the app support directory:
- `~/Library/Application Support/Clawdbot/canvas/<session>/…`
Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
node commands return `CANVAS_DISABLED`.
This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
## Agent API surface
## Panel behavior (agent-controlled)
Canvas is exposed via the **node bridge**, so the agent can:
Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
- Can be shown/hidden at any time by the agent.
- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
- Default position is the **top-right corner** of the current screen's visible frame (unless the user moved/resized it previously).
- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
- show/hide the panel
- navigate to a path or URL
- evaluate JavaScript
- capture a snapshot image
### Hover-only chrome
CLI examples:
Implementation notes:
- Keep the window borderless at all times (don't toggle `styleMask`).
- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
- Optionally show close/drag affordances only while hovered.
```bash
clawdbot nodes canvas present --node <id>
clawdbot nodes canvas navigate --node <id> --url "/"
clawdbot nodes canvas eval --node <id> --js "document.title"
clawdbot nodes canvas snapshot --node <id>
```
## Agent API surface (current)
Notes:
- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
Canvas is exposed via the Gateway **node bridge**, so the agent can:
- Show/hide the panel.
- Navigate to a path (relative to the session root).
- Evaluate JavaScript and optionally return results.
- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
- Capture a snapshot image of the current canvas view.
- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
## A2UI in Canvas
This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
When the Gateway advertises a Canvas host, the macOS app auto-navigates to the
A2UI host page on first open.
Related:
- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
## Agent commands (current)
Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
- `clawdbot nodes canvas present --node <id> [--target <...>] [--x/--y/--width/--height]`
- Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
- If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
- `clawdbot nodes canvas hide --node <id>`
- `clawdbot nodes canvas eval --js <code> --node <id>`
- `clawdbot nodes canvas snapshot --node <id>`
### Canvas A2UI
Canvas A2UI is hosted by the **Gateway canvas host** at:
Default A2UI host URL:
```
http://<gateway-host>:18793/__clawdbot__/a2ui/
```
The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
### A2UI commands (v0.8)
- `clawdbot nodes canvas a2ui push --jsonl <path> --node <id>`
- `clawdbot nodes canvas a2ui reset --node <id>`
Canvas currently accepts **A2UI v0.8** server→client messages:
`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
- `beginRendering`
- `surfaceUpdate`
- `dataModelUpdate`
- `deleteSurface`
Minimal example (v0.8):
`createSurface` (v0.9) is not supported.
CLI example:
```bash
cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
{"beginRendering":{"surfaceId":"main","root":"root"}}
EOF
EOFA2
clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node <id>
```
Notes:
- This does **not** support the A2UI v0.9 examples using `createSurface`.
- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"` renders a minimal v0.8 view.
Quick smoke:
## Triggering agent runs from Canvas (deep links)
```bash
clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"
```
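The notes above say that `push` expects one JSON object per line and rejects v0.9 payloads. A minimal pre-flight check along those lines might look like the sketch below. `validateA2uiJsonl` is a hypothetical helper, not part of the `clawdbot` CLI; the accepted message names are taken from the v0.8 list above, and skipping blank lines is an assumption.

```js
// Hypothetical pre-flight check for an A2UI v0.8 JSONL payload.
// This is NOT the clawdbot CLI implementation; it only mirrors the rules
// described in this doc (one JSON object per line, v0.8 message names).
const V08_MESSAGES = [
  "beginRendering",
  "surfaceUpdate",
  "dataModelUpdate",
  "deleteSurface",
];

function validateA2uiJsonl(text) {
  const errors = [];
  text.split("\n").forEach((raw, index) => {
    const lineNo = index + 1; // report 1-based line numbers
    const line = raw.trim();
    if (line === "") return; // assumption: blank lines are ignored
    let msg;
    try {
      msg = JSON.parse(line);
    } catch (err) {
      errors.push(`line ${lineNo}: invalid JSON`);
      return;
    }
    if (msg === null || typeof msg !== "object" || Array.isArray(msg)) {
      errors.push(`line ${lineNo}: expected a single JSON object`);
      return;
    }
    if ("createSurface" in msg) {
      errors.push(`line ${lineNo}: createSurface is A2UI v0.9 (not supported)`);
      return;
    }
    if (!V08_MESSAGES.some((name) => name in msg)) {
      errors.push(`line ${lineNo}: no known v0.8 message key`);
    }
  });
  return errors; // an empty array means the payload looks acceptable
}
```

Running a check like this before `nodes canvas a2ui push` is optional, since the CLI validates the file itself; it can be handy when JSONL is generated by a script.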
## Triggering agent runs from Canvas
Canvas can trigger new agent runs via deep links:
Canvas can trigger new agent runs via the macOS app deep-link scheme:
- `clawdbot://agent?...`
This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
Example (in JS):
Suggested patterns:
- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
```js
window.location.href = "clawdbot://agent?message=Review%20this%20design";
```
Implementation note (important):
- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
The app prompts for confirmation unless a valid key is provided.
Safety:
- Deep links (`clawdbot://agent?...`) are always enabled.
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
## Security notes
## Security / guardrails
Recommended defaults:
- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
- Scheme handler must prevent directory traversal: resolved file paths must stay under `<canvasRoot>/<session>/`.
- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
## Debugging
Suggested debugging hooks:
- Enable Web Inspector for Canvas builds (same approach as WebChat).
- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
- Canvas scheme blocks directory traversal; files must live under the session root.
- Local Canvas content uses a custom scheme (no loopback server required).
- External `http(s)` URLs are allowed only when explicitly navigated.
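The traversal rule (files must live under the session root) can be illustrated with a small path check. This is a sketch only: the real scheme handler lives in the macOS app, and `resolveUnderSessionRoot` is a hypothetical name. It assumes the request path has already been percent-decoded and does not account for symlinks.

```js
// Hypothetical sketch of a "stay under the session root" check.
// Assumptions: requestPath is already percent-decoded, uses "/" separators,
// and symlinks are handled elsewhere.
function resolveUnderSessionRoot(sessionRoot, requestPath) {
  const resolved = [];
  for (const segment of requestPath.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and no-op segments
    if (segment === "..") {
      if (resolved.length === 0) return null; // would escape the session root
      resolved.pop();
      continue;
    }
    resolved.push(segment);
  }
  const root = sessionRoot.replace(/\/+$/, ""); // drop trailing slashes
  return [root, ...resolved].join("/");
}
```

A `null` result means the request should be refused rather than served.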


@@ -1,72 +1,56 @@
---
summary: "Running the gateway as a child process of the macOS app and why"
summary: "Gateway lifecycle on macOS (launchd + attach-only)"
read_when:
- Integrating the mac app with the gateway lifecycle
---
# Clawdbot gateway as a child process of the macOS app
# Gateway lifecycle on macOS
Date: 2025-12-06 · Status: draft · Owner: steipete
The macOS app **manages the Gateway via launchd** by default. This gives you
reliable auto-start at login and restart on crashes.
Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
Child-process mode (Gateway spawned directly by the app) is **not in use** today.
If you need tighter coupling to the UI, use **Attach-only** and run the Gateway
manually in a terminal.
## Goal
Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
## Default behavior (launchd)
## When to prefer the child-process mode
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
- You're okay giving up login persistence/auto-restart that launchd provides, or you'll add your own backoff loop.
- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
- The app installs a per-user LaunchAgent labeled `com.clawdbot.gateway`.
- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
starts the Gateway if needed.
- Logs are written to the launchd gateway log path (visible in Debug Settings).
## Tradeoffs vs. launchd
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
- **TCC:** behaviorally, child processes often inherit the parent app's “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
Common commands:
## TCC guardrails (must keep)
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the app's node commands (via Gateway `node.invoke`) for:
- `system.notify`
- `system.run` (including `needsScreenRecording`)
- `screen.record` / `camera.*`
- PeekabooBridge UI automation (`peekaboo …`)
- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app target's Info.plist; a bare Node binary has none and would fail.
- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
## Process manager design (Swift Subprocess)
- Add a small `GatewayProcessManager` (Swift) that owns:
- `execution: Execution?` from `Swift Subprocess` to track the child.
- `start(config)` called when “Clawdbot Active” flips ON:
- binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
- args: current clawdbot entrypoint and flags
- cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
- output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
- restart: optional linear/backoff restart if exit was non-zero and Active is still true
- `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
- Wire SwiftUI toggle:
- ON: `GatewayProcessManager.start(...)`
- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
## Attach-only (developer mode)
## Packaging and signing
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
Attach-only tells the app to **connect to an existing Gateway** without spawning
one. This is ideal for local dev (hot-reload, custom flags).
## Logging and observability
- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
Steps:
## Failure/edge cases
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
1) Start the Gateway yourself:
```bash
pnpm gateway:watch
```
2) In the macOS app: Debug Settings → Gateway → **Attach only**.
## Open questions / follow-ups
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
The UI should show “Using existing gateway …” once connected.
## Decision snapshot (current recommendation)
- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
## Remote mode
Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
remote host and connects over that tunnel.
## Why we prefer launchd
- Auto-start at login.
- Built-in restart/KeepAlive semantics.
- Predictable logs and supervision.
If a true child-process mode is ever needed again, it should be documented as a
separate, explicit dev-only mode.


@@ -1,170 +1,62 @@
---
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
summary: "PeekabooBridge integration for macOS UI automation"
read_when:
- Hosting PeekabooBridge in Clawdbot.app
- Integrating Peekaboo as a submodule
- Changing PeekabooBridge protocol/paths
---
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
# Peekaboo Bridge (macOS UI automation)
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
Clawdbot can host **PeekabooBridge** as a local, permission-aware UI automation
broker. This lets the `peekaboo` CLI drive UI automation while reusing the
macOS app's TCC permissions.
Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
## What this is (and isn't)
## Big refactor (Dec 2025): XPC → Bridge
Peekaboo's privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on **either** Peekaboo.app's permissions **or** Clawdbot.app's permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
- **Host**: Clawdbot.app can act as a PeekabooBridge host.
- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
## Enable the bridge
## Architecture
### Processes
- **Bridge hosts** (provide TCC-backed automation):
- **Peekaboo.app** (preferred; also provides visualizations + controls)
- **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktop's granted permissions)
- **Clawdbot.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
- `peekaboo …` (preferred; humans + agents)
- Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
In the macOS app:
- Settings → **Enable Peekaboo Bridge**
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Claude.app host (piggyback on Claude Desktop permissions)
3. Clawdbot.app host (piggyback on Clawdbot permissions)
When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
is stopped and `peekaboo` will fall back to other available hosts.
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Claude: `~/Library/Application Support/Claude/bridge.sock`
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
## Client discovery order
No auto-launch: if a host isn't reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
Peekaboo clients typically try hosts in this order:
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
1. Peekaboo.app (full UX)
2. Claude.app (if installed)
3. Clawdbot.app (thin broker)
### Protocol shape
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- **Errors**: human-readable string by default; structured envelope in `--json`.
Use `peekaboo bridge status --verbose` to see which host is active and which
socket path is in use. You can override with:
## Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
```bash
export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
```
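To make the discovery order concrete, here is a hedged sketch of how a client could pick a socket. It is not Peekaboo's implementation: `candidateBridgeSockets` and `pickBridgeSocket` are hypothetical names, the paths follow the socket path convention listed earlier in this doc, and treating `PEEKABOO_BRIDGE_SOCKET` as a full replacement for the default list is an assumption.

```js
// Hypothetical sketch of client-side host discovery (not Peekaboo's code).
// Paths follow the convention listed in this doc; exact paths must match Peekaboo.
function candidateBridgeSockets(env, home) {
  // Assumption: an explicit override replaces the default search list.
  if (env.PEEKABOO_BRIDGE_SOCKET) return [env.PEEKABOO_BRIDGE_SOCKET];
  const base = `${home}/Library/Application Support`;
  return [
    `${base}/Peekaboo/bridge.sock`, // 1. Peekaboo.app (full UX)
    `${base}/Claude/bridge.sock`,   // 2. Claude.app (if installed)
    `${base}/clawdbot/bridge.sock`, // 3. Clawdbot.app (thin broker)
  ];
}

// isReachable is supplied by the caller (e.g. a connect attempt on the socket).
function pickBridgeSocket(env, home, isReachable) {
  const found = candidateBridgeSockets(env, home).find(isReachable);
  return found === undefined ? null : found; // null: no host available
}
```

In practice, `peekaboo bridge status --verbose` remains the authoritative way to see which host and socket path were selected.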
Path in Clawdbot repo:
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don't churn).
## Security & permissions
What Clawdbot should use:
- **Client side**: `PeekabooBridge` (socket client + protocol models).
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
allowed by default (Peekaboo's signing team), plus the Clawdbot app's TeamID.
- Requests time out after ~10 seconds.
- If required permissions are missing, the bridge returns a clear error message
rather than launching System Settings.
What Clawdbot should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: don't reintroduce helper targets; use the bridge.
## Snapshot behavior (automation)
## IPC / CLI surface
### No `clawdbot ui …`
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isn't running.
Snapshots are stored in memory and expire automatically after a short window.
If you need longer retention, recapture from the client.
### Diagnostics
Use Peekaboo's built-in diagnostics to see which host would be used:
- `peekaboo bridge status`
- `peekaboo bridge status --verbose`
- `peekaboo bridge status --json`
## Troubleshooting
### Output format
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screen's local coordinate space.
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
- The resolved `screenIndex`
- The local `{x, y}` and bounds
- Optionally the global `{x, y}` for debugging
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
## Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
Peekaboo CLI targeting (agent-friendly):
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
## “See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
`peekaboo see` should:
- capture (optionally targeted) window/screen
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
## Visualizer integration
Keep visualizations in **Peekaboo.app** for now.
- Clawdbot hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdbot should not grow a separate screenshot CLI surface.
Migration plan:
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
- Once Clawdbot legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
## Permissions behavior
If required permissions are missing:
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
## Security (socket auth)
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
- require the caller's code signature TeamID to be `Y5PE65HELJ`
- optional bundle-id allowlist for tighter scoping
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
## Next integration steps (after this doc)
1. Add Peekaboo as a git submodule (nested submodules OK).
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
in **debug** mode only.
- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
and confirm permissions are granted.


@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
read_when:
- Debugging mac WebChat view or loopback port
---
# Web Chat (macOS app)
# WebChat (macOS app)
The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
connects to the Gateway and defaults to the **main session** for the selected
agent (with a session switcher for other sessions).
- **Local mode**: connects directly to the local Gateway WebSocket.
- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
- **Remote mode**: forwards the Gateway control port over SSH and uses that
tunnel as the data plane.
## Launch & debugging
- Manual: Lobster menu → “Open Chat”.
- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
- Auto-open for testing:
```bash
dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
```
- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
## How it's wired
- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
## Security / surface area
- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: defaults to the primary session (`main`, or `global` when scope is
global). The UI can switch between sessions.
- Onboarding uses a dedicated session to keep first-run setup separate.
## Security surface
- Remote mode forwards only the Gateway WebSocket control port over SSH.
## Known limitations
- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
- The UI is optimized for chat sessions (not a full browser sandbox).


@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
read_when:
- Editing IPC contracts or menu bar app IPC
---
# Clawdbot macOS IPC architecture (Dec 2025)
# Clawdbot macOS IPC architecture
**Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
@@ -21,10 +21,10 @@ read_when:
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
- Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
### Mach/XPC (future direction)
- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
### Mach/XPC
- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
## Operational flows
- Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
@@ -37,4 +37,4 @@ read_when:
- Prefer requiring a TeamID match for all privileged surfaces.
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
- All communication remains local-only; no network sockets are exposed.
- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable.
- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds.


@@ -1,123 +1,97 @@
---
summary: "Spec for the Clawdbot macOS companion menu bar app (gateway + node broker)"
summary: "Clawdbot macOS companion app (menu bar + gateway broker)"
read_when:
- Implementing macOS app features
- Changing gateway lifecycle or node bridging on macOS
---
# Clawdbot macOS Companion (menu bar + gateway broker)
Author: steipete · Status: draft spec · Date: 2025-12-20
The macOS app is the **menu-bar companion** for Clawdbot. It owns permissions,
manages the Gateway locally, and exposes macOS capabilities to the agent as a
node.
## Support snapshot
- Core Gateway: supported (TypeScript on Node/Bun).
- Companion app: macOS menu bar app with permissions + node bridge.
- Install: [Getting Started](/start/getting-started) or [Install & updates](/install/updating).
- Gateway: [Runbook](/gateway) + [Configuration](/gateway/configuration).
## What it does
## System control (launchd)
If you run the bundled macOS app, it installs a per-user LaunchAgent labeled `com.clawdbot.gateway`.
CLI-only installs can use `clawdbot onboard --install-daemon`, `clawdbot daemon install`, or `clawdbot configure` → **Gateway daemon**.
- Shows native notifications and status in the menu bar.
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Microphone,
Speech Recognition, Automation/AppleScript).
- Runs or connects to the Gateway (local or remote).
- Exposes macOS-only tools (Canvas, Camera, Screen Recording, `system.run`).
- Optionally hosts **PeekabooBridge** for UI automation.
- Installs a helper CLI (`clawdbot`) into `/usr/local/bin` and
`/opt/homebrew/bin` on request.
## Local vs remote mode
- **Local** (default): the app ensures a local Gateway is running via launchd.
- **Remote**: the app connects to a Gateway over SSH/Tailscale and never starts
a local process.
- **Attach-only** (debug): the app connects to an already-running local Gateway
and never spawns its own.
## Launchd control
The app manages a per-user LaunchAgent labeled `com.clawdbot.gateway`.
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
`launchctl` only works if the LaunchAgent is installed; otherwise run `clawdbot daemon install` first.
If the LaunchAgent isn't installed, enable it from the app or run
`clawdbot daemon install`.
Details: [Gateway runbook](/gateway) and [Bundled bun Gateway](/platforms/mac/bun).
## Node capabilities (mac)
## Purpose
- Single macOS menu-bar app named **Clawdbot** that:
- Shows native notifications for Clawdbot/clawdbot events.
- Owns TCC prompts (Notifications, Accessibility, Screen Recording, Automation/AppleScript, Microphone, Speech Recognition).
- Runs (or connects to) the **Gateway** and exposes itself as a **node** so agents can reach macOS-only features.
- Hosts **PeekabooBridge** for UI automation (consumed by `peekaboo`; see [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo)).
- Installs a single CLI (`clawdbot`) by symlinking the bundled binary.
The macOS app presents itself as a node. Common commands:
## High-level design
- SwiftPM package in `apps/macos/` (macOS 15+, Swift 6).
- Targets:
- `ClawdbotIPC` (shared Codable types + helpers for app-internal actions).
- `Clawdbot` (LSUIElement MenuBarExtra app; hosts Gateway + node bridge + PeekabooBridgeHost).
- Bundle ID: `com.clawdbot.mac`.
- Bundled runtime binaries live under `Contents/Resources/Relay/`:
- `clawdbot` (bun-compiled relay: CLI + gateway)
- The app symlinks `clawdbot` into `/usr/local/bin` and `/opt/homebrew/bin`.
## Gateway + node bridge
- The mac app runs the Gateway in **local** mode (unless configured remote).
- The gateway port is configurable via `gateway.port` or `CLAWDBOT_GATEWAY_PORT` (default 18789). The mac app reads that value for launchd, probes, and remote SSH tunnels.
- The mac app connects to the bridge as a **node** and advertises capabilities/commands.
- Agent-facing actions are exposed via `node.invoke` (no local control socket).
- The mac app watches `~/.clawdbot/clawdbot.json` and switches modes live when `gateway.mode` or `gateway.remote.url` changes.
- If `gateway.mode` is unset but `gateway.remote.url` is set, the mac app treats it as remote mode.
- Changing connection mode in the mac app writes `gateway.mode` (and `gateway.remote.url` in remote mode) back to the config file.
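The mode fallback described above (an unset `gateway.mode` with `gateway.remote.url` set means remote) can be sketched as a small function. This is an illustration, not the app's code: `effectiveGatewayMode` is a hypothetical name, and the literal values `"local"` and `"remote"` are assumed spellings of the two modes.

```js
// Hypothetical sketch of the mode fallback rule; not the macOS app's code.
// Assumption: gateway.mode is the string "local" or "remote" when set.
function effectiveGatewayMode(config) {
  const gateway = (config && config.gateway) || {};
  if (gateway.mode === "local" || gateway.mode === "remote") {
    return gateway.mode; // an explicit mode always wins
  }
  const remoteUrl = gateway.remote && gateway.remote.url;
  // Mode unset: a configured remote URL implies remote; otherwise local.
  return remoteUrl ? "remote" : "local";
}
```

Keeping the rule in one function makes it easier to apply the same answer to launchd handling, probes, and tunnels.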
### Node commands (mac)
- Canvas: `canvas.present|navigate|eval|snapshot|a2ui.*`
- Camera: `camera.snap|camera.clip`
- Canvas: `canvas.present`, `canvas.navigate`, `canvas.eval`, `canvas.snapshot`, `canvas.a2ui.*`
- Camera: `camera.snap`, `camera.clip`
- Screen: `screen.record`
- System: `system.run` (shell) and `system.notify`
- System: `system.run`, `system.notify`
### Permission advertising
- Nodes include a `permissions` map in hello/pairing.
- The Gateway surfaces it via `node.list` / `node.describe` so agents can decide what to run.
The node reports a `permissions` map so agents can decide what's allowed.
## CLI (`clawdbot`)
- The **only** CLI is `clawdbot` (TS/bun). There is no `clawdbot-mac` helper.
- For mac-specific actions, the CLI uses `node.invoke`:
- `clawdbot nodes canvas present|navigate|eval|snapshot|a2ui push|a2ui reset`
- `clawdbot nodes run --node <id> -- <command...>`
- `clawdbot nodes notify --node <id> --title ...`
## Deep links
## Onboarding
- Install CLI (symlink) → Permissions checklist → Test notification → Done.
- Remote mode skips local gateway/CLI steps.
- Selecting Local auto-enables the bundled Gateway via launchd (unless “Attach only” debug mode is enabled).
## Deep links (URL scheme)
Clawdbot (the macOS app) registers a URL scheme for triggering local actions from anywhere (browser, Shortcuts, CLI, etc.).
Scheme:
- `clawdbot://…`
The app registers the `clawdbot://` URL scheme for local actions.
### `clawdbot://agent`
Triggers a Gateway `agent` request (same machinery as WebChat/agent runs).
Example:
Triggers a Gateway `agent` request.
```bash
open 'clawdbot://agent?message=Hello%20from%20deep%20link'
```
Query parameters:
- `message` (required): the agent prompt (URL-encoded).
- `sessionKey` (optional): explicit session key to use.
- `thinking` (optional): thinking hint (e.g. `low`; omit for default).
- `deliver` (optional): `true|false` (default: false).
- `to` / `provider` (optional): forwarded to the Gateway `agent` method (only meaningful with `deliver=true`).
- `timeoutSeconds` (optional): timeout hint forwarded to the Gateway.
- `key` (optional): unattended mode key (see below).
- `message` (required)
- `sessionKey` (optional)
- `thinking` (optional)
- `deliver` / `to` / `provider` (optional)
- `timeoutSeconds` (optional)
- `key` (optional unattended mode key)
Safety/guardrails:
- Always enabled.
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
- With `key=<value>`, Clawdbot runs without prompting (intended for personal automations).
- The current key is shown in Debug Settings and stored locally in UserDefaults.
Safety:
- Without `key`, the app prompts for confirmation.
- With a valid `key`, the run is unattended (intended for personal automations).
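Because `message` and the other values must be URL-encoded, building the link programmatically avoids quoting mistakes. The sketch below is illustrative: `buildAgentDeepLink` is a hypothetical helper (not shipped with Clawdbot), and the parameter names are the query parameters listed above.

```js
// Hypothetical helper for composing a clawdbot://agent deep link.
// Parameter names come from the query parameter list in this doc.
const AGENT_PARAMS = [
  "message", "sessionKey", "thinking", "deliver",
  "to", "provider", "timeoutSeconds", "key",
];

function buildAgentDeepLink(params) {
  if (!params || !params.message) {
    throw new Error("message is required");
  }
  const query = AGENT_PARAMS
    .filter((name) => params[name] !== undefined && params[name] !== null)
    .map((name) => `${name}=${encodeURIComponent(String(params[name]))}`)
    .join("&");
  return `clawdbot://agent?${query}`;
}
```

For example, `buildAgentDeepLink({ message: "Hello from deep link" })` yields the same URL as the `open` example above. Omitting `key` keeps the confirmation prompt in place.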
Notes:
- In local mode, Clawdbot will start the local Gateway if needed before issuing the request.
- In remote mode, Clawdbot will use the configured remote tunnel/endpoint.
## Onboarding flow (typical)
1) Install and launch **Clawdbot.app**.
2) Complete the permissions checklist (TCC prompts).
3) Ensure **Local** mode is active and the Gateway is running.
4) Install the CLI helper if you want terminal access.
## Build & dev workflow (native)
- `cd apps/macos && swift build` (debug) / `swift build -c release`.
- Run app for dev: `swift run Clawdbot` (or Xcode scheme).
- Package app + CLI: [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) (builds bun CLI + gateway).
- Tests: add Swift Testing suites under `apps/macos/Tests`.
## Open questions / decisions
- Should `system.run` support streaming stdout/stderr or keep buffered responses only?
- Should we allow node-side permission prompts, or always require explicit app UI action?
- `cd apps/macos && swift build`
- `swift run Clawdbot` (or Xcode)
- Package app + CLI: `scripts/package-mac-app.sh`
## Related docs
- [Gateway runbook](/gateway)
- [Bundled bun Gateway](/platforms/mac/bun)
- [macOS permissions](/platforms/mac/permissions)
- [Canvas](/platforms/mac/canvas)