docs: refresh and simplify docs

This commit is contained in:
Peter Steinberger
2026-01-08 23:06:56 +01:00
parent 88dca1afdf
commit a6c309824e
46 changed files with 1117 additions and 2155 deletions

View File

@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both
App bundle layout:
- `Clawdbot.app/Contents/Resources/Relay/clawdbot`
- bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
- bun `--compile` relay executable built from `dist/macos/relay.js`
- Supports:
- `clawdbot …` (CLI)
- `clawdbot gateway …` (LaunchAgent daemon)
@@ -47,7 +47,7 @@ Important bundler flags:
Version injection:
- `--define "__CLAWDBOT_VERSION__=\"<pkg version>\""`
- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesnt depend on reading `package.json` at runtime.
- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesnt depend on reading `package.json` at runtime.
## Launchd (Gateway as LaunchAgent)
@@ -58,7 +58,7 @@ Plist location (per-user):
- `~/Library/LaunchAgents/com.clawdbot.gateway.plist`
Manager:
- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
- The macOS app owns LaunchAgent install/update for the bundled gateway.
Behavior:
- “Clawdbot Active” enables/disables the LaunchAgent.
@@ -79,7 +79,7 @@ Symptom (when mis-signed):
Fix:
- The bun executable needs JIT-ish permissions under hardened runtime.
- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
- `com.apple.security.cs.allow-jit`
- `com.apple.security.cs.allow-unsigned-executable-memory`
@@ -89,18 +89,14 @@ Problem:
- bun cant load some native Node addons like `sharp` (and we dont want to ship native addon trees for the gateway).
Solution:
- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
- Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
- Falls back to `sharp` when available (Node/dev)
- Used by:
- [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
- [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
- [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
- When running in Node/dev, `sharp` is used when available.
- This affects inbound/outbound media, screenshots, and tool image sanitization.
## Browser control server
The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
Its started from the relay daemon process, so the relay binary includes Playwright deps.
The Gateway starts the browser control server (loopback only) from the relay daemon process,
so the relay binary includes Playwright deps.
## Tests / smoke checks
@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.
## DMG styling (human installer)
[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.
Rules of thumb:
- Use a **72dpi** background image that matches the Finder window size in points.

View File

@@ -5,157 +5,117 @@ read_when:
- Adding agent controls for visual workspace
- Debugging WKWebView canvas loads
---
# Canvas (macOS app)
Status: draft spec · Date: 2025-12-12
The macOS app embeds an agentcontrolled **Canvas panel** using `WKWebView`. It
is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
UI surfaces.
Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
## Where Canvas lives
Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
Canvas state is stored under Application Support:
This is designed for:
- Agent-written HTML/CSS/JS on disk (per-session directory).
- A real browser engine for layout, rendering, and basic interactivity.
- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
- `~/Library/Application Support/Clawdbot/canvas/<session>/...`
## Why a custom scheme (vs. loopback HTTP)
The Canvas panel serves those files via a **custom URL scheme**:
Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
- No port conflicts and no extra local server lifecycle.
- Easier to sandbox: only serve files we explicitly map.
- Works offline and can use an ephemeral data store (no persistent cookies/cache).
If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
## URL ↔ directory mapping
The Canvas scheme is:
- `clawdbot-canvas://<session>/<path>`
Routing model:
- `clawdbot-canvas://main/``<canvasRoot>/main/index.html` (or `index.htm`)
- `clawdbot-canvas://main/yolo``<canvasRoot>/main/yolo/index.html` (or `index.htm`)
Examples:
- `clawdbot-canvas://main/``<canvasRoot>/main/index.html`
- `clawdbot-canvas://main/assets/app.css``<canvasRoot>/main/assets/app.css`
- `clawdbot-canvas://main/widgets/todo/``<canvasRoot>/main/widgets/todo/index.html`
Directory listings are not served.
If no `index.html` exists at the root, the app shows a **builtin scaffold page**.
When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
This is a visual placeholder only (no A2UI renderer).
## Panel behavior
### Suggested on-disk location
- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
- Remembers size/position per session.
- Autoreloads when local canvas files change.
- Only one Canvas panel is visible at a time (session is switched as needed).
Store Canvas state under the app support directory:
- `~/Library/Application Support/Clawdbot/canvas/<session>/…`
Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
node commands return `CANVAS_DISABLED`.
This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
## Agent API surface
## Panel behavior (agent-controlled)
Canvas is exposed via the **node bridge**, so the agent can:
Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
- Can be shown/hidden at any time by the agent.
- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
- Default position is the **top-right corner** of the current screens visible frame (unless the user moved/resized it previously).
- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
- show/hide the panel
- navigate to a path or URL
- evaluate JavaScript
- capture a snapshot image
### Hover-only chrome
CLI examples:
Implementation notes:
- Keep the window borderless at all times (dont toggle `styleMask`).
- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
- Optionally show close/drag affordances only while hovered.
```bash
clawdbot nodes canvas present --node <id>
clawdbot nodes canvas navigate --node <id> --url "/"
clawdbot nodes canvas eval --node <id> --js "document.title"
clawdbot nodes canvas snapshot --node <id>
```
## Agent API surface (current)
Notes:
- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
Canvas is exposed via the Gateway **node bridge**, so the agent can:
- Show/hide the panel.
- Navigate to a path (relative to the session root).
- Evaluate JavaScript and optionally return results.
- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
- Capture a snapshot image of the current canvas view.
- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
## A2UI in Canvas
This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
When the Gateway advertises a Canvas host, the macOS app autonavigates to the
A2UI host page on first open.
Related:
- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
## Agent commands (current)
Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
- `clawdbot nodes canvas present --node <id> [--target <...>] [--x/--y/--width/--height]`
- Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
- If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
- `clawdbot nodes canvas hide --node <id>`
- `clawdbot nodes canvas eval --js <code> --node <id>`
- `clawdbot nodes canvas snapshot --node <id>`
### Canvas A2UI
Canvas A2UI is hosted by the **Gateway canvas host** at:
Default A2UI host URL:
```
http://<gateway-host>:18793/__clawdbot__/a2ui/
```
The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
### A2UI commands (v0.8)
- `clawdbot nodes canvas a2ui push --jsonl <path> --node <id>`
- `clawdbot nodes canvas a2ui reset --node <id>`
Canvas currently accepts **A2UI v0.8** server→client messages:
`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
- `beginRendering`
- `surfaceUpdate`
- `dataModelUpdate`
- `deleteSurface`
Minimal example (v0.8):
`createSurface` (v0.9) is not supported.
CLI example:
```bash
cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
{"beginRendering":{"surfaceId":"main","root":"root"}}
EOF
EOFA2
clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node <id>
```
Notes:
- This does **not** support the A2UI v0.9 examples using `createSurface`.
- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"` renders a minimal v0.8 view.
Quick smoke:
## Triggering agent runs from Canvas (deep links)
```bash
clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"
```
## Triggering agent runs from Canvas
Canvas can trigger new agent runs via deep links:
Canvas can trigger new agent runs via the macOS app deep-link scheme:
- `clawdbot://agent?...`
This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
Example (in JS):
Suggested patterns:
- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
```js
window.location.href = "clawdbot://agent?message=Review%20this%20design";
```
Implementation note (important):
- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
The app prompts for confirmation unless a valid key is provided.
Safety:
- Deep links (`clawdbot://agent?...`) are always enabled.
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
## Security notes
## Security / guardrails
Recommended defaults:
- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
- Scheme handler must prevent directory traversal: resolved file paths must stay under `<canvasRoot>/<session>/`.
- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
## Debugging
Suggested debugging hooks:
- Enable Web Inspector for Canvas builds (same approach as WebChat).
- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
- Canvas scheme blocks directory traversal; files must live under the session root.
- Local Canvas content uses a custom scheme (no loopback server required).
- External `http(s)` URLs are allowed only when explicitly navigated.

View File

@@ -1,72 +1,56 @@
---
summary: "Running the gateway as a child process of the macOS app and why"
summary: "Gateway lifecycle on macOS (launchd + attach-only)"
read_when:
- Integrating the mac app with the gateway lifecycle
---
# Clawdbot gateway as a child process of the macOS app
# Gateway lifecycle on macOS
Date: 2025-12-06 · Status: draft · Owner: steipete
The macOS app **manages the Gateway via launchd** by default. This gives you
reliable autostart at login and restart on crashes.
Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
Childprocess mode (Gateway spawned directly by the app) is **not in use** today.
If you need tighter coupling to the UI, use **Attachonly** and run the Gateway
manually in a terminal.
## Goal
Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
## Default behavior (launchd)
## When to prefer the child-process mode
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
- Youre okay giving up login persistence/auto-restart that launchd provides, or youll add your own backoff loop.
- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
- The app installs a peruser LaunchAgent labeled `com.clawdbot.gateway`.
- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
starts the Gateway if needed.
- Logs are written to the launchd gateway log path (visible in Debug Settings).
## Tradeoffs vs. launchd
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
- **TCC:** behaviorally, child processes often inherit the parent apps “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
Common commands:
## TCC guardrails (must keep)
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the apps node commands (via Gateway `node.invoke`) for:
- `system.notify`
- `system.run` (including `needsScreenRecording`)
- `screen.record` / `camera.*`
- PeekabooBridge UI automation (`peekaboo …`)
- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app targets Info.plist; a bare Node binary has none and would fail.
- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
## Process manager design (Swift Subprocess)
- Add a small `GatewayProcessManager` (Swift) that owns:
- `execution: Execution?` from `Swift Subprocess` to track the child.
- `start(config)` called when “Clawdbot Active” flips ON:
- binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
- args: current clawdbot entrypoint and flags
- cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
- output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
- restart: optional linear/backoff restart if exit was non-zero and Active is still true
- `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
- Wire SwiftUI toggle:
- ON: `GatewayProcessManager.start(...)`
- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
## Attachonly (developer mode)
## Packaging and signing
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
Attachonly tells the app to **connect to an existing Gateway** without spawning
one. This is ideal for local dev (hotreload, custom flags).
## Logging and observability
- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
Steps:
## Failure/edge cases
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
1) Start the Gateway yourself:
```bash
pnpm gateway:watch
```
2) In the macOS app: Debug Settings → Gateway → **Attach only**.
## Open questions / follow-ups
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
The UI should show “Using existing gateway …” once connected.
## Decision snapshot (current recommendation)
- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
## Remote mode
Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
remote host and connects over that tunnel.
## Why we prefer launchd
- Autostart at login.
- Builtin restart/KeepAlive semantics.
- Predictable logs and supervision.
If a true childprocess mode is ever needed again, it should be documented as a
separate, explicit devonly mode.

View File

@@ -1,170 +1,62 @@
---
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
summary: "PeekabooBridge integration for macOS UI automation"
read_when:
- Hosting PeekabooBridge in Clawdbot.app
- Integrating Peekaboo as a submodule
- Changing PeekabooBridge protocol/paths
---
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
# Peekaboo Bridge (macOS UI automation)
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
Clawdbot can host **PeekabooBridge** as a local, permissionaware UI automation
broker. This lets the `peekaboo` CLI drive UI automation while reusing the
macOS apps TCC permissions.
Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
## What this is (and isnt)
## Big refactor (Dec 2025): XPC → Bridge
Peekaboos privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on **either** Peekaboo.apps permissions **or** Clawdbot.apps permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
- **Host**: Clawdbot.app can act as a PeekabooBridge host.
- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
## Enable the bridge
## Architecture
### Processes
- **Bridge hosts** (provide TCC-backed automation):
- **Peekaboo.app** (preferred; also provides visualizations + controls)
- **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktops granted permissions)
- **Clawdbot.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
- `peekaboo …` (preferred; humans + agents)
- Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
In the macOS app:
- Settings → **Enable Peekaboo Bridge**
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Claude.app host (piggyback on Claude Desktop permissions)
3. Clawdbot.app host (piggyback on Clawdbot permissions)
When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
is stopped and `peekaboo` will fall back to other available hosts.
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Claude: `~/Library/Application Support/Claude/bridge.sock`
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
## Client discovery order
No auto-launch: if a host isnt reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
Peekaboo clients typically try hosts in this order:
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
1. Peekaboo.app (full UX)
2. Claude.app (if installed)
3. Clawdbot.app (thin broker)
### Protocol shape
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- **Errors**: human-readable string by default; structured envelope in `--json`.
Use `peekaboo bridge status --verbose` to see which host is active and which
socket path is in use. You can override with:
## Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
```bash
export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
```
Path in Clawdbot repo:
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps dont churn).
## Security & permissions
What Clawdbot should use:
- **Client side**: `PeekabooBridge` (socket client + protocol models).
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
allowed by default (Peekaboos signing team), plus the Clawdbot apps TeamID.
- Requests time out after ~10 seconds.
- If required permissions are missing, the bridge returns a clear error message
rather than launching System Settings.
What Clawdbot should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: dont reintroduce helper targets; use the bridge.
## Snapshot behavior (automation)
## IPC / CLI surface
### No `clawdbot ui …`
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isnt running.
Snapshots are stored in memory and expire automatically after a short window.
If you need longer retention, recapture from the client.
### Diagnostics
Use Peekaboos built-in diagnostics to see which host would be used:
- `peekaboo bridge status`
- `peekaboo bridge status --verbose`
- `peekaboo bridge status --json`
## Troubleshooting
### Output format
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screens local coordinate space.
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
- The resolved `screenIndex`
- The local `{x, y}` and bounds
- Optionally the global `{x, y}` for debugging
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
## Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
Peekaboo CLI targeting (agent-friendly):
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
## “See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
`peekaboo see` should:
- capture (optionally targeted) window/screen
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
## Visualizer integration
Keep visualizations in **Peekaboo.app** for now.
- Clawdbot hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdbot should not grow a separate screenshot CLI surface.
Migration plan:
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
- Once Clawdbot legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
## Permissions behavior
If required permissions are missing:
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
## Security (socket auth)
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
- require the callers code signature TeamID to be `Y5PE65HELJ`
- optional bundle-id allowlist for tighter scoping
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
## Next integration steps (after this doc)
1. Add Peekaboo as a git submodule (nested submodules OK).
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
in **debug** mode only.
- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
and confirm permissions are granted.

View File

@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
read_when:
- Debugging mac WebChat view or loopback port
---
# Web Chat (macOS app)
# WebChat (macOS app)
The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
connects to the Gateway and defaults to the **main session** for the selected
agent (with a session switcher for other sessions).
- **Local mode**: connects directly to the local Gateway WebSocket.
- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
- **Remote mode**: forwards the Gateway control port over SSH and uses that
tunnel as the data plane.
## Launch & debugging
- Manual: Lobster menu → “Open Chat”.
- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
- Autoopen for testing:
```bash
dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
```
- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
## How its wired
- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
## Security / surface area
- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
events `chat`, `agent`, `presence`, `tick`, `health`.
- Session: defaults to the primary session (`main`, or `global` when scope is
global). The UI can switch between sessions.
- Onboarding uses a dedicated session to keep firstrun setup separate.
## Security surface
- Remote mode forwards only the Gateway WebSocket control port over SSH.
## Known limitations
- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
- The UI is optimized for chat sessions (not a full browser sandbox).

View File

@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
read_when:
- Editing IPC contracts or menu bar app IPC
---
# Clawdbot macOS IPC architecture (Dec 2025)
# Clawdbot macOS IPC architecture
**Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
@@ -21,10 +21,10 @@ read_when:
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
- Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
### Mach/XPC (future direction)
- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
### Mach/XPC
- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
## Operational flows
- Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
@@ -37,4 +37,4 @@ read_when:
- Prefer requiring a TeamID match for all privileged surfaces.
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
- All communication remains local-only; no network sockets are exposed.
- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable.
- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds.