docs: refresh and simplify docs
@@ -15,7 +15,7 @@ Goal: ship **Clawdbot.app** with a self-contained relay binary that can run both
|
||||
App bundle layout:
|
||||
|
||||
- `Clawdbot.app/Contents/Resources/Relay/clawdbot`
|
||||
- bun `--compile` relay executable built from [`dist/macos/relay.js`](https://github.com/clawdbot/clawdbot/blob/main/dist/macos/relay.js)
|
||||
- bun `--compile` relay executable built from `dist/macos/relay.js`
|
||||
- Supports:
|
||||
- `clawdbot …` (CLI)
|
||||
- `clawdbot gateway …` (LaunchAgent daemon)
|
||||
@@ -47,7 +47,7 @@ Important bundler flags:
|
||||
|
||||
Version injection:
|
||||
- `--define "__CLAWDBOT_VERSION__=\"<pkg version>\""`
|
||||
- [`src/version.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/version.ts) also supports `__CLAWDBOT_VERSION__` (and `CLAWDBOT_BUNDLED_VERSION`) so `--version` doesn’t depend on reading `package.json` at runtime.
|
||||
- The relay honors `__CLAWDBOT_VERSION__` / `CLAWDBOT_BUNDLED_VERSION` so `--version` doesn’t depend on reading `package.json` at runtime.
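A minimal sketch of how a version lookup of this kind could work, assuming `CLAWDBOT_BUNDLED_VERSION` is read from the environment and that the build-time define takes precedence over it. Neither the precedence, the fallback string, nor the helper names `injectedVersion` and `pickVersion` come from `src/version.ts`; they are hypothetical and only illustrate why `--version` need not read `package.json` at runtime.

```typescript
// Replaced at build time by: --define "__CLAWDBOT_VERSION__=\"<pkg version>\""
declare const __CLAWDBOT_VERSION__: string | undefined;

// Hypothetical helper: returns the build-time value, or undefined when the
// code runs without the define (e.g. plain Node in development).
export function injectedVersion(): string | undefined {
  return typeof __CLAWDBOT_VERSION__ === "string" ? __CLAWDBOT_VERSION__ : undefined;
}

// Hypothetical helper: assumed precedence is define, then environment, then fallback.
export function pickVersion(
  defined: string | undefined,
  env: Record<string, string | undefined>,
  fallback = "0.0.0-dev",
): string {
  return defined || env["CLAWDBOT_BUNDLED_VERSION"] || fallback;
}
```

Passing the environment in as a parameter keeps the lookup easy to test without touching the real process environment.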
|
||||
|
||||
## Launchd (Gateway as LaunchAgent)
|
||||
|
||||
@@ -58,7 +58,7 @@ Plist location (per-user):
|
||||
- `~/Library/LaunchAgents/com.clawdbot.gateway.plist`
|
||||
|
||||
Manager:
|
||||
- [`apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/GatewayLaunchAgentManager.swift)
|
||||
- The macOS app owns LaunchAgent install/update for the bundled gateway.
|
||||
|
||||
Behavior:
|
||||
- “Clawdbot Active” enables/disables the LaunchAgent.
|
||||
@@ -79,7 +79,7 @@ Symptom (when mis-signed):
|
||||
|
||||
Fix:
|
||||
- The bun executable needs JIT-ish permissions under hardened runtime.
|
||||
- [`scripts/codesign-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/codesign-mac-app.sh) signs `Relay/clawdbot` with:
|
||||
- `scripts/codesign-mac-app.sh` signs `Relay/clawdbot` with:
|
||||
- `com.apple.security.cs.allow-jit`
|
||||
- `com.apple.security.cs.allow-unsigned-executable-memory`
|
||||
|
||||
@@ -89,18 +89,14 @@ Problem:
|
||||
- bun can’t load some native Node addons like `sharp` (and we don’t want to ship native addon trees for the gateway).
|
||||
|
||||
Solution:
|
||||
- Central helper [`src/media/image-ops.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/media/image-ops.ts)
|
||||
- Prefers `/usr/bin/sips` on macOS (esp. when running under bun)
|
||||
- Falls back to `sharp` when available (Node/dev)
|
||||
- Used by:
|
||||
- [`src/web/media.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/web/media.ts) (optimize inbound/outbound images)
|
||||
- [`src/browser/screenshot.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/browser/screenshot.ts)
|
||||
- [`src/agents/pi-tools.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-tools.ts) (image sanitization)
|
||||
- Image operations prefer `/usr/bin/sips` on macOS (especially under bun).
|
||||
- When running in Node/dev, `sharp` is used when available.
|
||||
- This affects inbound/outbound media, screenshots, and tool image sanitization.
|
||||
|
||||
## Browser control server
|
||||
|
||||
The Gateway starts the browser control server (loopback only) from [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts).
|
||||
It’s started from the relay daemon process, so the relay binary includes Playwright deps.
|
||||
The Gateway starts the browser control server (loopback only) from the relay daemon process,
|
||||
so the relay binary includes Playwright deps.
|
||||
|
||||
## Tests / smoke checks
|
||||
|
||||
@@ -127,7 +123,7 @@ Bun may leave dotfiles like `*.bun-build` in the repo root or subfolders.
|
||||
|
||||
## DMG styling (human installer)
|
||||
|
||||
[`scripts/create-dmg.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/create-dmg.sh) styles the DMG via Finder AppleScript.
|
||||
`scripts/create-dmg.sh` styles the DMG via Finder AppleScript.
|
||||
|
||||
Rules of thumb:
|
||||
- Use a **72dpi** background image that matches the Finder window size in points.
|
||||
|
||||
@@ -5,157 +5,117 @@ read_when:
|
||||
- Adding agent controls for visual workspace
|
||||
- Debugging WKWebView canvas loads
|
||||
---
|
||||
|
||||
# Canvas (macOS app)
|
||||
|
||||
Status: draft spec · Date: 2025-12-12
|
||||
The macOS app embeds an agent‑controlled **Canvas panel** using `WKWebView`. It
|
||||
is a lightweight visual workspace for HTML/CSS/JS, A2UI, and small interactive
|
||||
UI surfaces.
|
||||
|
||||
Note: for iOS/Android nodes that should render agent-edited HTML/CSS/JS over the network, prefer the Gateway `canvasHost` (serves `~/clawd/canvas` over LAN/tailnet with live reload). A2UI is also **hosted by the Gateway** over HTTP. This doc focuses on the macOS in-app canvas panel. See [`docs/configuration.md`](/gateway/configuration).
|
||||
## Where Canvas lives
|
||||
|
||||
Clawdbot can embed an agent-controlled “visual workspace” panel (“Canvas”) inside the macOS app using `WKWebView`, served via a **custom URL scheme** (no loopback HTTP port required).
|
||||
Canvas state is stored under Application Support:
|
||||
|
||||
This is designed for:
|
||||
- Agent-written HTML/CSS/JS on disk (per-session directory).
|
||||
- A real browser engine for layout, rendering, and basic interactivity.
|
||||
- Agent-driven visibility (show/hide), navigation, DOM/JS queries, and snapshots.
|
||||
- Minimal chrome: borderless panel; bezel/chrome appears only on hover.
|
||||
- `~/Library/Application Support/Clawdbot/canvas/<session>/...`
|
||||
|
||||
## Why a custom scheme (vs. loopback HTTP)
|
||||
The Canvas panel serves those files via a **custom URL scheme**:
|
||||
|
||||
Using `WKURLSchemeHandler` keeps Canvas entirely in-process:
|
||||
- No port conflicts and no extra local server lifecycle.
|
||||
- Easier to sandbox: only serve files we explicitly map.
|
||||
- Works offline and can use an ephemeral data store (no persistent cookies/cache).
|
||||
|
||||
If a Canvas page truly needs “real web” semantics (CORS, fetch to loopback endpoints, service workers), consider the loopback-server variant instead (out of scope for this doc).
|
||||
|
||||
## URL ↔ directory mapping
|
||||
|
||||
The Canvas scheme is:
|
||||
- `clawdbot-canvas://<session>/<path>`
|
||||
|
||||
Routing model:
|
||||
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html` (or `index.htm`)
|
||||
- `clawdbot-canvas://main/yolo` → `<canvasRoot>/main/yolo/index.html` (or `index.htm`)
|
||||
Examples:
|
||||
- `clawdbot-canvas://main/` → `<canvasRoot>/main/index.html`
|
||||
- `clawdbot-canvas://main/assets/app.css` → `<canvasRoot>/main/assets/app.css`
|
||||
- `clawdbot-canvas://main/widgets/todo/` → `<canvasRoot>/main/widgets/todo/index.html`
|
||||
|
||||
Directory listings are not served.
|
||||
If no `index.html` exists at the root, the app shows a **built‑in scaffold page**.
|
||||
|
||||
When `/` has no `index.html` yet, the handler serves a **built-in scaffold page** (bundled with the macOS app).
|
||||
This is a visual placeholder only (no A2UI renderer).
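The URL-to-directory mapping above can be sketched as a pure function. This is an illustration only: the real handler is part of the macOS app and is not shown in this diff. The name `resolveCanvasFile`, the allowed session characters, and the choice to refuse `..` segments outright (rather than normalise them) are assumptions. The sketch treats any path ending in `/` as a directory request and does not consult the filesystem, so it does not cover the `index.htm` fallback or extension-less directory paths.

```typescript
// Hypothetical helper: maps clawdbot-canvas://<session>/<path> to a file
// under <canvasRoot>/<session>/, or returns null when the request is refused.
export function resolveCanvasFile(
  canvasRoot: string,
  session: string,
  requestPath: string,
): string | null {
  // Assumed session naming rule; rejects separators and dot-only names.
  if (!/^[A-Za-z0-9._-]+$/.test(session) || session === "." || session === "..") {
    return null;
  }

  let decoded: string;
  try {
    decoded = decodeURIComponent(requestPath);
  } catch {
    return null; // malformed percent-encoding
  }

  const wantsIndex = decoded === "" || decoded.endsWith("/");
  const segments: string[] = [];
  for (const part of decoded.split("/")) {
    if (part === "" || part === ".") continue;
    // Refuse traversal attempts instead of trying to normalise them.
    if (part === ".." || part.includes("\\")) return null;
    segments.push(part);
  }
  if (wantsIndex) segments.push("index.html");
  if (segments.length === 0) return null; // directory listings are not served

  return [canvasRoot.replace(/\/+$/, ""), session, ...segments].join("/");
}
```

Decoding before the segment check matters: an encoded `%2e%2e` would otherwise slip past a check that only looks for a literal `..`.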
|
||||
## Panel behavior
|
||||
|
||||
### Suggested on-disk location
|
||||
- Borderless, resizable panel anchored near the menu bar (or mouse cursor).
|
||||
- Remembers size/position per session.
|
||||
- Auto‑reloads when local canvas files change.
|
||||
- Only one Canvas panel is visible at a time (session is switched as needed).
|
||||
|
||||
Store Canvas state under the app support directory:
|
||||
- `~/Library/Application Support/Clawdbot/canvas/<session>/…`
|
||||
Canvas can be disabled from Settings → **Allow Canvas**. When disabled, canvas
|
||||
node commands return `CANVAS_DISABLED`.
|
||||
|
||||
This keeps it alongside other app-owned state and avoids mixing with `~/.clawdbot/` gateway config.
|
||||
## Agent API surface
|
||||
|
||||
## Panel behavior (agent-controlled)
|
||||
Canvas is exposed via the **node bridge**, so the agent can:
|
||||
|
||||
Canvas is presented as a borderless `NSPanel` (similar to the existing WebChat panel):
|
||||
- Can be shown/hidden at any time by the agent.
|
||||
- Supports an “anchored” presentation (near the menu bar icon or another anchor rect).
|
||||
- Uses a rounded container; shadow stays on, but **chrome/bezel only appears on hover**.
|
||||
- Default position is the **top-right corner** of the current screen’s visible frame (unless the user moved/resized it previously).
|
||||
- The panel is **user-resizable** (edge resize + hover resize handle) and the last frame is persisted per session.
|
||||
- show/hide the panel
|
||||
- navigate to a path or URL
|
||||
- evaluate JavaScript
|
||||
- capture a snapshot image
|
||||
|
||||
### Hover-only chrome
|
||||
CLI examples:
|
||||
|
||||
Implementation notes:
|
||||
- Keep the window borderless at all times (don’t toggle `styleMask`).
|
||||
- Add an overlay view inside the content container for chrome (stroke + subtle gradient/material).
|
||||
- Use an `NSTrackingArea` to fade the chrome in/out on `mouseEntered/mouseExited`.
|
||||
- Optionally show close/drag affordances only while hovered.
|
||||
```bash
clawdbot nodes canvas present --node <id>
clawdbot nodes canvas navigate --node <id> --url "/"
clawdbot nodes canvas eval --node <id> --js "document.title"
clawdbot nodes canvas snapshot --node <id>
```
|
||||
|
||||
## Agent API surface (current)
|
||||
Notes:
|
||||
- `canvas.navigate` accepts **local canvas paths**, `http(s)` URLs, and `file://` URLs.
|
||||
- If you pass `"/"`, the Canvas shows the local scaffold or `index.html`.
|
||||
|
||||
Canvas is exposed via the Gateway **node bridge**, so the agent can:
|
||||
- Show/hide the panel.
|
||||
- Navigate to a path (relative to the session root).
|
||||
- Evaluate JavaScript and optionally return results.
|
||||
- Query/modify DOM (helpers mirroring “dom query/all/attr/click/type/wait” patterns).
|
||||
- Capture a snapshot image of the current canvas view.
|
||||
- Optionally set panel placement (screen `x/y` + `width/height`) when showing/navigating.
|
||||
## A2UI in Canvas
|
||||
|
||||
This should be modeled after `WebChatManager`/`WebChatSwiftUIWindowController` but targeting `clawdbot-canvas://…` URLs.
|
||||
A2UI is hosted by the Gateway canvas host and rendered inside the Canvas panel.
|
||||
When the Gateway advertises a Canvas host, the macOS app auto‑navigates to the
|
||||
A2UI host page on first open.
|
||||
|
||||
Related:
|
||||
- For “invoke the agent again from UI” flows, prefer the macOS deep link scheme (`clawdbot://agent?...`) so *any* UI surface (Canvas, WebChat, native views) can trigger a new agent run. See [`docs/macos.md`](/platforms/macos).
|
||||
|
||||
## Agent commands (current)
|
||||
|
||||
Use the main `clawdbot` CLI; it invokes canvas commands via `node.invoke`.
|
||||
|
||||
- `clawdbot nodes canvas present --node <id> [--target <...>] [--x/--y/--width/--height]`
|
||||
- Local targets map into the session directory via the custom scheme (directory targets resolve `index.html|index.htm`).
|
||||
- If `/` has no index file, Canvas shows the built-in scaffold page and returns `status: "welcome"`.
|
||||
- `clawdbot nodes canvas hide --node <id>`
|
||||
- `clawdbot nodes canvas eval --js <code> --node <id>`
|
||||
- `clawdbot nodes canvas snapshot --node <id>`
|
||||
|
||||
### Canvas A2UI
|
||||
|
||||
Canvas A2UI is hosted by the **Gateway canvas host** at:
|
||||
Default A2UI host URL:
|
||||
|
||||
```
http://<gateway-host>:18793/__clawdbot__/a2ui/
```
|
||||
|
||||
The macOS app simply renders that page in the Canvas panel. The agent can drive it with JSONL **server→client protocol messages** (one JSON object per line):
|
||||
### A2UI commands (v0.8)
|
||||
|
||||
- `clawdbot nodes canvas a2ui push --jsonl <path> --node <id>`
|
||||
- `clawdbot nodes canvas a2ui reset --node <id>`
|
||||
Canvas currently accepts **A2UI v0.8** server→client messages:
|
||||
|
||||
`push` expects a JSONL file where **each line is a single JSON object** (parsed and forwarded to the in-page A2UI renderer).
|
||||
- `beginRendering`
|
||||
- `surfaceUpdate`
|
||||
- `dataModelUpdate`
|
||||
- `deleteSurface`
|
||||
|
||||
Minimal example (v0.8):
|
||||
`createSurface` (v0.9) is not supported.
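A rough sketch of the kind of pre-flight check this implies for a JSONL payload: one JSON object per line, at least one v0.8 message key, and no `createSurface`. The function name `validateA2uiJsonl` and the error wording are hypothetical; the CLI's actual validator is not shown in this diff and may behave differently.

```typescript
const V08_MESSAGES: readonly string[] = [
  "beginRendering",
  "surfaceUpdate",
  "dataModelUpdate",
  "deleteSurface",
];

// Hypothetical helper: returns one error string per offending line
// (1-based line numbers); an empty array means the payload looks like v0.8.
export function validateA2uiJsonl(text: string): string[] {
  const errors: string[] = [];
  text.split("\n").forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // blank lines are skipped in this sketch
    const lineNo = index + 1;

    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      errors.push(`line ${lineNo}: invalid JSON`);
      return;
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      errors.push(`line ${lineNo}: expected a JSON object`);
      return;
    }

    const keys = Object.keys(value);
    if (keys.includes("createSurface")) {
      errors.push(`line ${lineNo}: createSurface (v0.9) is not supported`);
      return;
    }
    if (!keys.some((key) => V08_MESSAGES.includes(key))) {
      errors.push(`line ${lineNo}: no v0.8 message key found`);
    }
  });
  return errors;
}
```

Collecting every error instead of stopping at the first one makes it easier to fix a generated file in a single pass.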
|
||||
|
||||
CLI example:
|
||||
|
||||
```bash
|
||||
cat > /tmp/a2ui-v0.8.jsonl <<'EOF'
|
||||
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, `nodes canvas a2ui push` works."},"usageHint":"body"}}}]}}
|
||||
cat > /tmp/a2ui-v0.8.jsonl <<'EOFA2'
|
||||
{"surfaceUpdate":{"surfaceId":"main","components":[{"id":"root","component":{"Column":{"children":{"explicitList":["title","content"]}}}},{"id":"title","component":{"Text":{"text":{"literalString":"Canvas (A2UI v0.8)"},"usageHint":"h1"}}},{"id":"content","component":{"Text":{"text":{"literalString":"If you can read this, A2UI push works."},"usageHint":"body"}}}]}}
|
||||
{"beginRendering":{"surfaceId":"main","root":"root"}}
|
||||
EOF
|
||||
EOFA2
|
||||
|
||||
clawdbot nodes canvas a2ui push --jsonl /tmp/a2ui-v0.8.jsonl --node <id>
|
||||
```
|
||||
|
||||
Notes:
|
||||
- This does **not** support the A2UI v0.9 examples using `createSurface`.
|
||||
- A2UI **fails** if the Gateway canvas host is unreachable (no local fallback).
|
||||
- `nodes canvas a2ui push` validates JSONL (line numbers on errors) and rejects v0.9 payloads.
|
||||
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"` renders a minimal v0.8 view.
|
||||
Quick smoke:
|
||||
|
||||
## Triggering agent runs from Canvas (deep links)
|
||||
```bash
clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"
```
|
||||
|
||||
## Triggering agent runs from Canvas
|
||||
|
||||
Canvas can trigger new agent runs via deep links:
|
||||
|
||||
Canvas can trigger new agent runs via the macOS app deep-link scheme:
|
||||
- `clawdbot://agent?...`
|
||||
|
||||
This is intentionally separate from `clawdbot-canvas://…` (which is only for serving local Canvas files into the `WKWebView`).
|
||||
Example (in JS):
|
||||
|
||||
Suggested patterns:
|
||||
- HTML: render links/buttons that navigate to `clawdbot://agent?message=...`.
|
||||
- JS: set `window.location.href = 'clawdbot://agent?...'` for “run this now” actions.
|
||||
```js
window.location.href = "clawdbot://agent?message=Review%20this%20design";
```
|
||||
|
||||
Implementation note (important):
|
||||
- In `WKWebView`, intercept `clawdbot://…` navigations in `WKNavigationDelegate` and forward them to the app, e.g. by calling `DeepLinkHandler.shared.handle(url:)` and returning `.cancel` for the navigation.
|
||||
The app prompts for confirmation unless a valid key is provided.
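When the message text comes from user input, it needs percent-encoding before it goes into the deep link. A small sketch, assuming `message` and `key` are the only query parameters involved (both appear in this doc; any others are out of scope here). The helper name `buildAgentDeepLink` is hypothetical.

```typescript
// Hypothetical helper: builds a clawdbot://agent deep link with a
// percent-encoded message and an optional key.
export function buildAgentDeepLink(message: string, key?: string): string {
  const params = [`message=${encodeURIComponent(message)}`];
  if (key !== undefined) {
    params.push(`key=${encodeURIComponent(key)}`);
  }
  return `clawdbot://agent?${params.join("&")}`;
}

// Usage inside a Canvas page (browser context):
//   window.location.href = buildAgentDeepLink("Review this design");
```

`encodeURIComponent` is used rather than `encodeURI` so that `&` and `=` inside the message cannot be mistaken for parameter separators.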
|
||||
|
||||
Safety:
|
||||
- Deep links (`clawdbot://agent?...`) are always enabled.
|
||||
- Without a `key` query param, the app will prompt for confirmation before invoking the agent.
|
||||
- With a valid `key`, the run is unattended (no prompt). For Canvas-originated actions, the app injects an internal key automatically.
|
||||
## Security notes
|
||||
|
||||
## Security / guardrails
|
||||
|
||||
Recommended defaults:
|
||||
- `WKWebsiteDataStore.nonPersistent()` for Canvas (ephemeral).
|
||||
- Navigation policy: allow only `clawdbot-canvas://…` (and optionally `about:blank`); open `http/https` externally.
|
||||
- Scheme handler must prevent directory traversal: resolved file paths must stay under `<canvasRoot>/<session>/`.
|
||||
- Disable or tightly scope any JS bridge; prefer query-string/bootstrap config over `window.webkit.messageHandlers` for sensitive data.
|
||||
|
||||
## Debugging
|
||||
|
||||
Suggested debugging hooks:
|
||||
- Enable Web Inspector for Canvas builds (same approach as WebChat).
|
||||
- Log scheme requests + resolution decisions to OSLog (subsystem `com.clawdbot`, category `Canvas`).
|
||||
- Provide a “copy canvas dir” action in debug settings to quickly reveal the session directory in Finder.
|
||||
- Canvas scheme blocks directory traversal; files must live under the session root.
|
||||
- Local Canvas content uses a custom scheme (no loopback server required).
|
||||
- External `http(s)` URLs are allowed only when explicitly navigated.
|
||||
|
||||
@@ -1,72 +1,56 @@
|
||||
---
|
||||
summary: "Running the gateway as a child process of the macOS app and why"
|
||||
summary: "Gateway lifecycle on macOS (launchd + attach-only)"
|
||||
read_when:
|
||||
- Integrating the mac app with the gateway lifecycle
|
||||
---
|
||||
# Clawdbot gateway as a child process of the macOS app
|
||||
# Gateway lifecycle on macOS
|
||||
|
||||
Date: 2025-12-06 · Status: draft · Owner: steipete
|
||||
The macOS app **manages the Gateway via launchd** by default. This gives you
|
||||
reliable auto‑start at login and restart on crashes.
|
||||
|
||||
Note (2025-12-19): the current implementation prefers a **launchd LaunchAgent** that runs the **bundled bun-compiled gateway**. This doc remains as an alternative mode for tighter coupling to the UI.
|
||||
Child‑process mode (Gateway spawned directly by the app) is **not in use** today.
|
||||
If you need tighter coupling to the UI, use **Attach‑only** and run the Gateway
|
||||
manually in a terminal.
|
||||
|
||||
## Goal
|
||||
Run the Node-based Clawdbot/clawdbot gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdbot Active” toggle to start/stop the child.
|
||||
## Default behavior (launchd)
|
||||
|
||||
## When to prefer the child-process mode
|
||||
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdbot Active” toggle without touching launchd.
|
||||
- You’re okay giving up login persistence/auto-restart that launchd provides, or you’ll add your own backoff loop.
|
||||
- You want simpler log capture and supervision inside the app (no external plist or user-visible LaunchAgent).
|
||||
- The app installs a per‑user LaunchAgent labeled `com.clawdbot.gateway`.
|
||||
- When Local mode is enabled, the app ensures the LaunchAgent is loaded and
|
||||
starts the Gateway if needed.
|
||||
- Logs are written to the launchd gateway log path (visible in Debug Settings).
|
||||
|
||||
## Tradeoffs vs. launchd
|
||||
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
|
||||
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
|
||||
- **TCC:** behaviorally, child processes often inherit the parent app’s “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
|
||||
Common commands:
|
||||
|
||||
## TCC guardrails (must keep)
|
||||
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; route through the app’s node commands (via Gateway `node.invoke`) for:
|
||||
- `system.notify`
|
||||
- `system.run` (including `needsScreenRecording`)
|
||||
- `screen.record` / `camera.*`
|
||||
- PeekabooBridge UI automation (`peekaboo …`)
|
||||
- Usage strings (`NSMicrophoneUsageDescription`, `NSSpeechRecognitionUsageDescription`, etc.) stay in the app target’s Info.plist; a bare Node binary has none and would fail.
|
||||
- If you ever embed Node that *must* touch TCC, wrap that call in a tiny signed helper target inside the app bundle and have Node exec that helper instead of calling the API directly.
|
||||
```bash
launchctl kickstart -k gui/$UID/com.clawdbot.gateway
launchctl bootout gui/$UID/com.clawdbot.gateway
```
|
||||
|
||||
## Process manager design (Swift Subprocess)
|
||||
- Add a small `GatewayProcessManager` (Swift) that owns:
|
||||
- `execution: Execution?` from `Swift Subprocess` to track the child.
|
||||
- `start(config)` called when “Clawdbot Active” flips ON:
|
||||
- binary: host Node running the bundled gateway under `Clawdbot.app/Contents/Resources/Gateway/`
|
||||
- args: current clawdbot entrypoint and flags
|
||||
- cwd/env: point to `~/.clawdbot` as today; inject the expanded PATH so Homebrew Node resolves under launchd
|
||||
- output: stream stdout/stderr to `/tmp/clawdbot-gateway.log` (cap buffer via Subprocess OutputLimits)
|
||||
- restart: optional linear/backoff restart if exit was non-zero and Active is still true
|
||||
- `stop()` called when Active flips OFF or app terminates: cancel the execution and `waitUntilExit`.
|
||||
- Wire SwiftUI toggle:
|
||||
- ON: `GatewayProcessManager.start(...)`
|
||||
- OFF: `GatewayProcessManager.stop()` (no launchctl calls in this mode)
|
||||
- Keep the existing `LaunchdManager` around so we can switch back if needed; the toggle can choose between launchd or child mode with a flag if we want both.
|
||||
## Attach‑only (developer mode)
|
||||
|
||||
## Packaging and signing
|
||||
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
|
||||
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
|
||||
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
|
||||
Attach‑only tells the app to **connect to an existing Gateway** without spawning
|
||||
one. This is ideal for local dev (hot‑reload, custom flags).
|
||||
|
||||
## Logging and observability
|
||||
- Stream child stdout/stderr to `/tmp/clawdbot-gateway.log`; surface the last N lines in the Debug tab.
|
||||
- Emit a user notification (via existing NotificationManager) on crash/exit while Active is true.
|
||||
- Add a lightweight heartbeat from Node → app (e.g., ping over stdout) so the app can show status in the menu.
|
||||
Steps:
|
||||
|
||||
## Failure/edge cases
|
||||
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
|
||||
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
|
||||
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdbot paused"`; the gateway should avoid calling privileged routes while paused.
|
||||
1) Start the Gateway yourself:
|
||||
```bash
pnpm gateway:watch
```
|
||||
2) In the macOS app: Debug Settings → Gateway → **Attach only**.
|
||||
|
||||
## Open questions / follow-ups
|
||||
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
|
||||
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
|
||||
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
|
||||
The UI should show “Using existing gateway …” once connected.
|
||||
|
||||
## Decision snapshot (current recommendation)
|
||||
- Keep all TCC surfaces in the Swift app/broker (node commands + PeekabooBridgeHost).
|
||||
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdbot Active” toggle.
|
||||
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
|
||||
## Remote mode
|
||||
|
||||
Remote mode never starts a local Gateway. The app uses an SSH tunnel to the
|
||||
remote host and connects over that tunnel.
|
||||
|
||||
## Why we prefer launchd
|
||||
|
||||
- Auto‑start at login.
|
||||
- Built‑in restart/KeepAlive semantics.
|
||||
- Predictable logs and supervision.
|
||||
|
||||
If a true child‑process mode is ever needed again, it should be documented as a
|
||||
separate, explicit dev‑only mode.
|
||||
|
||||
@@ -1,170 +1,62 @@
|
||||
---
|
||||
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
|
||||
summary: "PeekabooBridge integration for macOS UI automation"
|
||||
read_when:
|
||||
- Hosting PeekabooBridge in Clawdbot.app
|
||||
- Integrating Peekaboo as a submodule
|
||||
- Changing PeekabooBridge protocol/paths
|
||||
---
|
||||
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
|
||||
# Peekaboo Bridge (macOS UI automation)
|
||||
|
||||
## TL;DR
|
||||
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
|
||||
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
|
||||
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.
|
||||
Clawdbot can host **PeekabooBridge** as a local, permission‑aware UI automation
|
||||
broker. This lets the `peekaboo` CLI drive UI automation while reusing the
|
||||
macOS app’s TCC permissions.
|
||||
|
||||
Non-goals:
|
||||
- No auto-launching Peekaboo.app.
|
||||
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
|
||||
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
|
||||
## What this is (and isn’t)
|
||||
|
||||
## Big refactor (Dec 2025): XPC → Bridge
|
||||
Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
|
||||
- It matches the existing “local socket + codesign checks” approach.
|
||||
- It lets us piggyback on **either** Peekaboo.app’s permissions **or** Clawdbot.app’s permissions (whichever is running).
|
||||
- It avoids “two apps with two TCC bubbles” unless needed.
|
||||
- **Host**: Clawdbot.app can act as a PeekabooBridge host.
|
||||
- **Client**: use the `peekaboo` CLI (no separate `clawdbot ui ...` surface).
|
||||
- **UI**: visual overlays stay in Peekaboo.app; Clawdbot is a thin broker host.
|
||||
|
||||
Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
|
||||
## Enable the bridge
|
||||
|
||||
## Architecture
|
||||
### Processes
|
||||
- **Bridge hosts** (provide TCC-backed automation):
|
||||
- **Peekaboo.app** (preferred; also provides visualizations + controls)
|
||||
- **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktop’s granted permissions)
|
||||
- **Clawdbot.app** (secondary; “thin host” only)
|
||||
- **Bridge clients** (trigger single actions):
|
||||
- `peekaboo …` (preferred; humans + agents)
|
||||
- Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
|
||||
In the macOS app:
|
||||
- Settings → **Enable Peekaboo Bridge**
|
||||
|
||||
### Host discovery (client-side)
|
||||
Order is deliberate:
|
||||
1. Peekaboo.app host (full UX)
|
||||
2. Claude.app host (piggyback on Claude Desktop permissions)
|
||||
3. Clawdbot.app host (piggyback on Clawdbot permissions)
|
||||
When enabled, Clawdbot starts a local UNIX socket server. If disabled, the host
|
||||
is stopped and `peekaboo` will fall back to other available hosts.
|
||||
|
||||
Socket paths (convention; exact paths must match Peekaboo):
|
||||
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
|
||||
- Claude: `~/Library/Application Support/Claude/bridge.sock`
|
||||
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`
|
||||
## Client discovery order
|
||||
|
||||
No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).
|
||||
Peekaboo clients typically try hosts in this order:
|
||||
|
||||
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
|
||||
1. Peekaboo.app (full UX)
|
||||
2. Claude.app (if installed)
|
||||
3. Clawdbot.app (thin broker)
|
||||
|
||||
### Protocol shape
|
||||
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
|
||||
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
|
||||
- **Errors**: human-readable string by default; structured envelope in `--json`.
|
||||
Use `peekaboo bridge status --verbose` to see which host is active and which
|
||||
socket path is in use. You can override with:
|
||||
|
||||
## Dependency strategy (submodule)
|
||||
Integrate Peekaboo via git submodule (nested submodules are OK).
|
||||
```bash
export PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock
```
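The discovery order and the override can be pictured as a candidate list. This is a sketch under assumptions, not Peekaboo's client code: the socket locations follow the `~/Library/Application Support/<App>/bridge.sock` convention from the earlier version of this doc (exact paths must match Peekaboo), and the assumption that an override replaces the whole list is mine. The function name `candidateBridgeSockets` is hypothetical.

```typescript
// Hypothetical helper: lists the socket paths a client would try, in order.
// `home` is the user's home directory; `override` mirrors PEEKABOO_BRIDGE_SOCKET.
export function candidateBridgeSockets(home: string, override?: string): string[] {
  // Assumption: an explicit override short-circuits normal discovery.
  if (override !== undefined && override !== "") {
    return [override];
  }
  const base = `${home}/Library/Application Support`;
  return [
    `${base}/Peekaboo/bridge.sock`, // Peekaboo.app (full UX)
    `${base}/Claude/bridge.sock`,   // Claude.app (if installed)
    `${base}/clawdbot/bridge.sock`, // Clawdbot.app (thin broker)
  ];
}
```

In practice `peekaboo bridge status --verbose` remains the authoritative way to see which host and socket path are in use.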
|
||||
|
||||
Path in Clawdbot repo:
|
||||
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don’t churn).
|
||||
## Security & permissions
|
||||
|
||||
What Clawdbot should use:
|
||||
- **Client side**: `PeekabooBridge` (socket client + protocol models).
|
||||
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
|
||||
- The bridge validates **caller code signatures**; TeamID `Y5PE65HELJ` is
|
||||
allowed by default (Peekaboo’s signing team), plus the Clawdbot app’s TeamID.
|
||||
- Requests time out after ~10 seconds.
|
||||
- If required permissions are missing, the bridge returns a clear error message
|
||||
rather than launching System Settings.
|
||||
|
||||
What Clawdbot should *not* embed:
|
||||
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
|
||||
- **XPC**: don’t reintroduce helper targets; use the bridge.
|
||||
## Snapshot behavior (automation)
|
||||
|
||||
## IPC / CLI surface
|
||||
### No `clawdbot ui …`
|
||||
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
|
||||
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
|
||||
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isn’t running.
|
||||
Snapshots are stored in memory and expire automatically after a short window.
|
||||
If you need longer retention, re‑capture from the client.
|
||||
|
||||
### Diagnostics
|
||||
Use Peekaboo’s built-in diagnostics to see which host would be used:
|
||||
- `peekaboo bridge status`
|
||||
- `peekaboo bridge status --verbose`
|
||||
- `peekaboo bridge status --json`
|
||||
## Troubleshooting
|
||||
|
||||
### Output format
|
||||
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
|
||||
|
||||
### Timeouts
|
||||
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
|
||||
|
||||
## Coordinate model (multi-display)
|
||||
Requirement: coordinates are **per screen**, not global.
|
||||
|
||||
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
|
||||
|
||||
Proposed request shape:
|
||||
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
|
||||
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
|
||||
- Responses should echo both:
|
||||
- The resolved `screenIndex`
|
||||
- The local `{x, y}` and bounds
|
||||
- Optionally the global `{x, y}` for debugging
|
||||
|
||||
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
|
||||
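
The per-screen conversion can be sketched as follows. This is an illustration under stated assumptions, not the app's implementation: the `ScreenFrame` type and `toGlobal` function are hypothetical, and the frames are assumed to be already expressed in a top-left-origin global space (the real `NSScreen` frames use a bottom-left origin, so the host would need to flip them first).

```typescript
// Hypothetical shapes; only `screenIndex` and `{x, y}` come from the text above.
interface Point { x: number; y: number }
interface ScreenFrame { originX: number; originY: number; width: number; height: number }

// Convert a screen-local point (top-left origin) to a global point.
// Assumes `screens` is ordered like NSScreen.screens and already flipped
// into a top-left-origin global space.
function toGlobal(screens: ScreenFrame[], screenIndex: number, local: Point): Point {
  const screen = screens[screenIndex];
  if (!screen) {
    throw new RangeError(`unknown screenIndex ${screenIndex}`);
  }
  const inside =
    local.x >= 0 && local.y >= 0 && local.x < screen.width && local.y < screen.height;
  if (!inside) {
    throw new RangeError(`point (${local.x}, ${local.y}) is outside screen ${screenIndex}`);
  }
  return { x: screen.originX + local.x, y: screen.originY + local.y };
}
```

Rejecting out-of-bounds points is a design choice in this sketch; it keeps a click meant for one display from silently landing on a neighbour.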
|
||||
## Targeting (per app/window)
|
||||
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
|
||||
- frontmost
|
||||
- by app name / bundle id
|
||||
- by window title substring
|
||||
- by (app, index)
|
||||
|
||||
Peekaboo CLI targeting (agent-friendly):
|
||||
- `--bundle-id <id>` for app targeting
|
||||
- `--window-index <n>` (0-based) for disambiguating within an app when capturing
|
||||
|
||||
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
|
||||
|
||||
## “See” + click packs (Playwright-style)
|
||||
Behavior stays aligned with Peekaboo:
|
||||
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
|
||||
- Follow-up actions reference those IDs without re-scanning.
|
||||
|
||||
`peekaboo see` should:
|
||||
- capture (optionally targeted) window/screen
|
||||
- return a screenshot **file path** (default: temp directory)
|
||||
- return a list of elements (text or JSON)
|
||||
|
||||
Snapshot lifecycle requirement:
|
||||
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
|
||||
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
|
||||
|
||||
Practical flow (agent-friendly):
|
||||
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
|
||||
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
|
||||
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
|
||||
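
The implicit-snapshot rule above can be sketched as an in-memory store keyed by bundle id. Everything here is illustrative: `SnapshotStore`, its fields, and the expiry parameter are hypothetical names, and the actual retention window is not specified in this document.

```typescript
// Hypothetical snapshot record; element IDs such as "B1" come from `peekaboo see`.
interface Snapshot { id: string; bundleId: string; createdAt: number; elementIds: string[] }

class SnapshotStore {
  private readonly ttlMs: number; // assumed expiry window, chosen by the host
  private readonly byId = new Map<string, Snapshot>();
  private readonly latestByBundle = new Map<string, string>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called after a `see`: the new snapshot becomes the implicit one for its app.
  put(snapshot: Snapshot): void {
    this.byId.set(snapshot.id, snapshot);
    this.latestByBundle.set(snapshot.bundleId, snapshot.id);
  }

  // Called by follow-up actions: use the explicit id if given, otherwise
  // fall back to the most recent snapshot for that bundle id.
  resolve(bundleId: string, snapshotId: string | undefined, now: number): Snapshot | undefined {
    const id = snapshotId ?? this.latestByBundle.get(bundleId);
    if (id === undefined) return undefined;
    const snapshot = this.byId.get(id);
    if (!snapshot) return undefined;
    if (now - snapshot.createdAt > this.ttlMs) {
      // Expired: drop it so the client has to re-capture.
      this.byId.delete(id);
      if (this.latestByBundle.get(snapshot.bundleId) === id) {
        this.latestByBundle.delete(snapshot.bundleId);
      }
      return undefined;
    }
    return snapshot;
  }
}
```

When `resolve` returns `undefined`, the client re-runs `peekaboo see` for that app and then retries the action.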
|
||||
## Visualizer integration
|
||||
Keep visualizations in **Peekaboo.app** for now.
|
||||
- Clawdbot hosts the bridge, but does not render overlays.
|
||||
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
|
||||
|
||||
## Screenshots (legacy → Peekaboo takeover)
|
||||
Clawdbot should not grow a separate screenshot CLI surface.
|
||||
|
||||
Migration plan:
|
||||
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
|
||||
- Once Clawdbot’s legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
|
||||
|
||||
## Permissions behavior
|
||||
If required permissions are missing:
|
||||
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
|
||||
- do not try to open System Settings from the automation endpoint
|
||||
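
A permission check that fails with an error envelope instead of opening System Settings might look like the sketch below. Only `ok=false` and the example message come from the text above; the `result` and `error` field names, the `Permission` type, and `requirePermissions` are assumptions made for illustration.

```typescript
// Hypothetical envelope: only the `ok` flag is taken from the document.
type BridgeResponse<T> =
  | { ok: true; result: T }
  | { ok: false; error: string };

type Permission = "accessibility" | "screenRecording";

const MISSING_MESSAGE: Record<Permission, string> = {
  accessibility: "Accessibility permission missing",
  screenRecording: "Screen Recording permission missing",
};

// Return a short human-readable error for the first missing permission.
// Deliberately has no side effects: it never prompts or opens System Settings.
function requirePermissions(
  required: Permission[],
  granted: ReadonlySet<Permission>,
): BridgeResponse<null> {
  for (const permission of required) {
    if (!granted.has(permission)) {
      return { ok: false, error: MISSING_MESSAGE[permission] };
    }
  }
  return { ok: true, result: null };
}
```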
|
||||
## Security (socket auth)
|
||||
Both hosts must enforce:
|
||||
- filesystem perms on the socket path (owner read/write only)
|
||||
- server-side caller validation:
|
||||
- require the caller’s code signature TeamID to be `Y5PE65HELJ`
|
||||
- optional bundle-id allowlist for tighter scoping
|
||||
|
||||
Debug-only escape hatch (development convenience):
|
||||
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
|
||||
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
|
||||
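
The caller-validation rules above can be expressed as a single decision function. This is a sketch of the policy only: the `Caller` and `Policy` shapes are hypothetical, and extracting the TeamID and UID from a real socket peer is outside its scope.

```typescript
// Hypothetical inputs; a real host would derive these from the connecting peer.
interface Caller { teamId: string | null; bundleId: string | null; uid: number }
interface Policy {
  allowedTeamIds: string[];
  bundleIdAllowlist?: string[];   // optional tighter scoping
  debugBuild: boolean;            // true only in DEBUG builds
  hostUid: number;                // Unix user the host runs as
  env: Record<string, string | undefined>;
}

function isAuthorized(caller: Caller, policy: Policy): boolean {
  // Debug-only escape hatch: opt-in via env var, same Unix user only.
  const escapeHatch =
    policy.debugBuild &&
    policy.env["PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS"] === "1" &&
    caller.uid === policy.hostUid;
  if (escapeHatch) return true;

  // Normal path: the code signature TeamID must match.
  if (caller.teamId === null || !policy.allowedTeamIds.includes(caller.teamId)) {
    return false;
  }
  // Optional bundle-id allowlist.
  if (policy.bundleIdAllowlist !== undefined) {
    return caller.bundleId !== null && policy.bundleIdAllowlist.includes(caller.bundleId);
  }
  return true;
}
```

In this sketch, a release build ignores the env var entirely because `debugBuild` is false, which matches the "DEBUG-only" requirement.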
|
||||
## Next integration steps (after this doc)
|
||||
1. Add Peekaboo as a git submodule (nested submodules OK).
|
||||
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
|
||||
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
|
||||
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
|
||||
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).
|
||||
- If `peekaboo` reports “bridge client is not authorized”, ensure the client is
|
||||
properly signed or run the host with `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`
|
||||
in **debug** mode only.
|
||||
- If no hosts are found, open one of the host apps (Peekaboo.app or Clawdbot.app)
|
||||
and confirm permissions are granted.
|
||||
|
||||
@@ -3,25 +3,37 @@ summary: "How the mac app embeds the gateway WebChat and how to debug it"
|
||||
read_when:
|
||||
- Debugging mac WebChat view or loopback port
|
||||
---
|
||||
# Web Chat (macOS app)
|
||||
# WebChat (macOS app)
|
||||
|
||||
The macOS menu bar app shows the WebChat UI as a native SwiftUI view and reuses the **primary Clawd session** (`main`, or `global` when scope is global).
|
||||
The macOS menu bar app embeds the WebChat UI as a native SwiftUI view. It
|
||||
connects to the Gateway and defaults to the **main session** for the selected
|
||||
agent (with a session switcher for other sessions).
|
||||
|
||||
- **Local mode**: connects directly to the local Gateway WebSocket.
|
||||
- **Remote mode**: forwards the Gateway WebSocket control port over SSH and uses that as the data plane.
|
||||
- **Remote mode**: forwards the Gateway control port over SSH and uses that
|
||||
tunnel as the data plane.
|
||||
|
||||
## Launch & debugging
|
||||
|
||||
- Manual: Lobster menu → “Open Chat”.
|
||||
- Auto-open for testing: run `dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat` (or pass `--webchat` to the binary launched by launchd). The window opens on startup.
|
||||
- Logs: see [`./scripts/clawlog.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/clawlog.sh) (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
|
||||
- Auto‑open for testing:
|
||||
```bash
|
||||
dist/Clawdbot.app/Contents/MacOS/Clawdbot --webchat
|
||||
```
|
||||
- Logs: `./scripts/clawlog.sh` (subsystem `com.clawdbot`, category `WebChatSwiftUI`).
|
||||
|
||||
## How it’s wired
|
||||
- Implementation: [`apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/WebChatSwiftUI.swift) hosts `ClawdbotChatUI` and speaks to the Gateway over `GatewayConnection`.
|
||||
- Data plane: Gateway WebSocket methods `chat.history`, `chat.send`, `chat.abort`; events `chat`, `agent`, `presence`, `tick`, `health`.
|
||||
- Session: usually primary (`main`); multiple transports (WhatsApp/Telegram/Discord/Desktop) share the same key. The onboarding flow uses a dedicated `onboarding` session to keep first-run setup separate.
|
||||
|
||||
## Security / surface area
|
||||
- Data plane: Gateway WS methods `chat.history`, `chat.send`, `chat.abort` and
|
||||
events `chat`, `agent`, `presence`, `tick`, `health`.
|
||||
- Session: defaults to the primary session (`main`, or `global` when scope is
|
||||
global). The UI can switch between sessions.
|
||||
- Onboarding uses a dedicated session to keep first‑run setup separate.
|
||||
|
||||
## Security surface
|
||||
|
||||
- Remote mode forwards only the Gateway WebSocket control port over SSH.
|
||||
|
||||
## Known limitations
|
||||
- The UI is optimized for the primary session and typical “chat” usage (not a full browser-based sandbox surface).
|
||||
|
||||
- The UI is optimized for chat sessions (not a full browser sandbox).
|
||||
|
||||
@@ -3,7 +3,7 @@ summary: "macOS IPC architecture for Clawdbot app, gateway node bridge, and Peek
|
||||
read_when:
|
||||
- Editing IPC contracts or menu bar app IPC
|
||||
---
|
||||
# Clawdbot macOS IPC architecture (Dec 2025)
|
||||
# Clawdbot macOS IPC architecture
|
||||
|
||||
**Current model:** there is **no local control socket** and no `clawdbot-mac` CLI. All agent actions go through the Gateway WebSocket and `node.invoke`. UI automation still uses PeekabooBridge.
|
||||
|
||||
@@ -21,10 +21,10 @@ read_when:
|
||||
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
|
||||
- Host preference order (client-side): Peekaboo.app → Claude.app → Clawdbot.app → local execution.
|
||||
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
|
||||
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for the Clawdbot plan and naming.
|
||||
- See: [`docs/mac/peekaboo.md`](/platforms/mac/peekaboo) for PeekabooBridge usage.
|
||||
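
The client-side host preference order listed above amounts to picking the first reachable host and falling back to local execution. A minimal sketch, assuming a hypothetical `selectHost` function and a set of host names the client has already probed:

```typescript
// Order taken from the list above; the function and its input are illustrative.
const HOST_ORDER = ["Peekaboo.app", "Claude.app", "Clawdbot.app"] as const;

// `reachable` holds the hosts whose bridge socket answered a probe.
function selectHost(reachable: ReadonlySet<string>): string {
  for (const host of HOST_ORDER) {
    if (reachable.has(host)) return host;
  }
  return "local execution";
}
```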
|
||||
### Mach/XPC (future direction)
|
||||
- Still optional for internal app services, but **not required** for automation now that node.invoke is the surface.
|
||||
### Mach/XPC
|
||||
- Not required for automation; `node.invoke` + PeekabooBridge cover current needs.
|
||||
|
||||
## Operational flows
|
||||
- Restart/rebuild: `SIGN_IDENTITY="Apple Development: Peter Steinberger (2ZAC4GM7GD)" scripts/restart-mac.sh`
|
||||
@@ -37,4 +37,4 @@ read_when:
|
||||
- Prefer requiring a TeamID match for all privileged surfaces.
|
||||
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
|
||||
- All communication remains local-only; no network sockets are exposed.
|
||||
- TCC prompts originate only from the GUI app bundle; run [`scripts/package-mac-app.sh`](https://github.com/clawdbot/clawdbot/blob/main/scripts/package-mac-app.sh) so the signed bundle ID stays stable.
|
||||
- TCC prompts originate only from the GUI app bundle; keep the signed bundle ID stable across rebuilds.
|
||||
|
||||