clawdbot/docs/mac/peekaboo.md
2026-01-05 21:30:19 +01:00

---
summary: "Plan for integrating Peekaboo automation into Clawdbot via PeekabooBridge (socket-based TCC broker)"
read_when:
- Hosting PeekabooBridge in Clawdbot.app
- Integrating Peekaboo as a submodule
- Changing PeekabooBridge protocol/paths
---
# Peekaboo Bridge in Clawdbot (macOS UI automation broker)
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdbot integrates by **optionally hosting the same bridge** inside **Clawdbot.app** (user-toggleable). The primary client is the **`peekaboo` CLI** (installed via npm); Clawdbot does not need its own `ui …` CLI surface.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdbot stays a thin broker host. No visualizer toggle in Clawdbot.

Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdbot onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdbot (avoid pulling Tachikoma/MCP into the Clawdbot app/CLI).
## Big refactor (Dec 2025): XPC → Bridge
Peekaboo's privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdbot this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on **either** Peekaboo.app's permissions **or** Clawdbot.app's permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.

Reference (Peekaboo submodule): `Peekaboo/docs/bridge-host.md`.
## Architecture
### Processes
- **Bridge hosts** (provide TCC-backed automation):
  - **Peekaboo.app** (preferred; also provides visualizations + controls)
  - **Claude.app** (secondary; lets `peekaboo` reuse Claude Desktop's granted permissions)
  - **Clawdbot.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
  - `peekaboo …` (preferred; humans + agents)
  - Optional: Clawdbot/Node shells out to `peekaboo` when it needs UI automation/capture
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Claude.app host (piggyback on Claude Desktop permissions)
3. Clawdbot.app host (piggyback on Clawdbot permissions)

Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Claude: `~/Library/Application Support/Claude/bridge.sock`
- Clawdbot: `~/Library/Application Support/clawdbot/bridge.sock`

No auto-launch: if no host is reachable, the command fails with a clear error (start Peekaboo.app, Claude.app, or Clawdbot.app).

Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
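
The discovery order above can be sketched roughly as follows. This is an illustration, not Peekaboo's actual client code: the function names and the `is_reachable` callback are hypothetical, and it assumes the debugging override replaces discovery entirely rather than being tried first.

```python
import os
from pathlib import Path

# (host name, directory under ~/Library/Application Support), in discovery order.
HOST_DIRS = [("Peekaboo", "Peekaboo"), ("Claude", "Claude"), ("Clawdbot", "clawdbot")]


def candidate_sockets(env=None, home=None):
    """Return (host name, socket path) pairs in the order a client would try them."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("PEEKABOO_BRIDGE_SOCKET")
    if override:
        # Assumption: the override short-circuits normal discovery.
        return [("override", Path(override))]
    base = home / "Library" / "Application Support"
    return [(name, base / directory / "bridge.sock") for name, directory in HOST_DIRS]


def pick_host(candidates, is_reachable):
    """Return the first reachable candidate or fail clearly (no auto-launch)."""
    for name, path in candidates:
        if is_reachable(path):
            return name, path
    raise RuntimeError(
        "No bridge host reachable: start Peekaboo.app, Claude.app, or Clawdbot.app"
    )
```

Passing `is_reachable` in as a callback keeps the ordering logic separate from however reachability is actually probed (for example, a connect attempt on the socket).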
### Protocol shape
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- **Errors**: human-readable string by default; structured envelope in `--json`.
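
A minimal client sketch of the request cycle described above. It assumes the request and response are plain UTF-8 JSON documents delimited only by the half-close and the final close; the real PeekabooBridge framing and request fields may differ, and `send_request` is a hypothetical name.

```python
import json
import socket
import time

TIMEOUT_SECONDS = 10.0  # end-to-end budget per action, as stated above


def send_request(socket_path, request, timeout=TIMEOUT_SECONDS):
    """Send one JSON request over a UNIX domain socket and return the decoded reply."""
    deadline = time.monotonic() + timeout

    def remaining():
        # Time left in the end-to-end budget; fail once it is exhausted.
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError("bridge request exceeded the end-to-end timeout")
        return left

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(remaining())
        sock.connect(str(socket_path))
        sock.settimeout(remaining())
        sock.sendall(json.dumps(request).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # half-close: tells the host the request is complete
        chunks = []
        while True:
            sock.settimeout(remaining())
            chunk = sock.recv(65536)
            if not chunk:  # host closed its side: the response is complete
                break
            chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

Recomputing the socket timeout before each operation is what makes the limit end-to-end rather than per call.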
## Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).

Path in Clawdbot repo:
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don't churn).

What Clawdbot should use:
- **Client side**: `PeekabooBridge` (socket client + protocol models).
- **Host side (Clawdbot.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.

What Clawdbot should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: don't reintroduce helper targets; use the bridge.
## IPC / CLI surface
### No `clawdbot ui …`
We avoid a parallel “Clawdbot UI automation CLI”. Instead:
- `peekaboo` is the user/agent-facing CLI surface for automation and capture.
- Clawdbot.app can host PeekabooBridge as a **thin TCC broker** so Peekaboo can piggyback on Clawdbot permissions when Peekaboo.app isn't running.
### Diagnostics
Use Peekaboo's built-in diagnostics to see which host would be used:
- `peekaboo bridge status`
- `peekaboo bridge status --verbose`
- `peekaboo bridge status --json`
### Output format
Peekaboo commands default to human text output. Add `--json` for a structured envelope.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (client enforced; host should also enforce per-operation).
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.

Standardize for the CLI (agent-friendly): **top-left origin per screen**.

Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screen's local coordinate space.
- Clawdbot.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
  - The resolved `screenIndex`
  - The local `{x, y}` and bounds
  - Optionally the global `{x, y}` for debugging

Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
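
The local-to-global conversion can be illustrated with a small sketch. It assumes each screen frame has already been expressed in a global top-left-origin space; AppKit reports `NSScreen` frames with a bottom-left origin, so a real host would need to flip the y axis first. `ScreenFrame`, `to_global`, and the response keys other than `screenIndex` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenFrame:
    """One screen's frame in global top-left-origin coordinates (an assumption)."""
    x: float
    y: float
    width: float
    height: float


def to_global(screens, screen_index, x, y):
    """Convert a per-screen local point to a global point, validating bounds."""
    if not 0 <= screen_index < len(screens):
        raise ValueError(f"screenIndex {screen_index} out of range")
    frame = screens[screen_index]
    if not (0 <= x < frame.width and 0 <= y < frame.height):
        raise ValueError(f"({x}, {y}) is outside screen {screen_index}")
    # Echo the resolved screen, the local point and bounds, and the global point.
    return {
        "screenIndex": screen_index,
        "local": {"x": x, "y": y},
        "bounds": {"width": frame.width, "height": frame.height},
        "global": {"x": frame.x + x, "y": frame.y + y},
    }
```

For example, with a second display placed to the right of a 1440-point-wide primary display, local `(100, 50)` on screen 1 resolves to global `(1540, 50)`.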
## Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)

Peekaboo CLI targeting (agent-friendly):
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing

All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
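
When Clawdbot shells out to `peekaboo`, the targeting flags above could be assembled with a helper like this. The helper name and the flag ordering are assumptions; only the flags themselves come from this document.

```python
def peekaboo_argv(command, bundle_id=None, window_index=None, json_output=False):
    """Build an argv list for a targeted `peekaboo` invocation."""
    argv = ["peekaboo", command]
    if bundle_id is not None:
        argv += ["--bundle-id", bundle_id]
    if window_index is not None:
        if window_index < 0:
            raise ValueError("window index is 0-based and must not be negative")
        argv += ["--window-index", str(window_index)]
    if json_output:
        argv.append("--json")
    # With no target flags, the command falls back to the frontmost target.
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, timeout=10)`, which avoids shell quoting issues with bundle ids or window titles.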
## “See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- `peekaboo see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.

`peekaboo see` should:
- capture (optionally targeted) window/screen
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)

Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).

Practical flow (agent-friendly):
- `peekaboo list apps` / `peekaboo list windows` provide bundle-id context for targeting.
- `peekaboo see --bundle-id X` updates the implicit snapshot for `X`.
- `peekaboo click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
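
The implicit-snapshot rule can be sketched as an in-memory store keyed by bundle id. This is an assumed design for illustration: `SnapshotStore`, its method names, and the `snap-N` id format are hypothetical, not Peekaboo's implementation.

```python
import itertools


class SnapshotStore:
    """In-memory snapshots with an implicit 'latest' snapshot per bundle id."""

    def __init__(self):
        self._by_id = {}
        self._latest_by_bundle = {}
        self._counter = itertools.count(1)

    def record(self, bundle_id, elements):
        """Store the elements from a `see` call and make them the implicit snapshot."""
        snapshot_id = f"snap-{next(self._counter)}"
        self._by_id[snapshot_id] = {"bundle_id": bundle_id, "elements": elements}
        self._latest_by_bundle[bundle_id] = snapshot_id
        return snapshot_id

    def resolve(self, bundle_id, element_id, snapshot_id=None):
        """Look up an element, falling back to the latest snapshot for the app."""
        if snapshot_id is None:
            snapshot_id = self._latest_by_bundle.get(bundle_id)
        snapshot = self._by_id.get(snapshot_id)
        if snapshot is None:
            raise KeyError(f"no snapshot for {bundle_id}; run `peekaboo see` first")
        return snapshot["elements"][element_id]
```

Scoping the implicit snapshot per bundle id means a `see` on one app does not invalidate element IDs an agent is still using in another.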
## Visualizer integration
Keep visualizations in **Peekaboo.app** for now.
- Clawdbot hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdbot should not grow a separate screenshot CLI surface.

Migration plan:
- Use `peekaboo capture …` / `peekaboo see …` (returns a file path, default temp directory).
- Once Clawdbot legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
## Permissions behavior
If required permissions are missing:
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
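
A sketch of what such a failure response might look like. Only `ok` is named in this document; the `error` key and both function names are assumptions, and the real envelope should follow Peekaboo's schema.

```python
import json


def permission_error(missing):
    """Build a failure response; it reports the problem and opens nothing."""
    return {"ok": False, "error": f"{missing} permission missing"}


def render(response, as_json=False):
    """Format a response as human text by default, or as JSON when requested."""
    if as_json:
        return json.dumps(response)
    return "ok" if response["ok"] else response["error"]
```

Keeping the endpoint limited to reporting leaves permission prompts to Clawdbot's existing onboarding flow.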
## Security (socket auth)
All bridge hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
  - require the caller's code signature TeamID to be `Y5PE65HELJ`
  - optional bundle-id allowlist for tighter scoping

Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
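
The decision logic above can be summarized in a short sketch. It models only the policy: extracting the TeamID and bundle id from a connected peer requires the platform code-signing APIs and is out of scope here. The function names and parameters are hypothetical.

```python
import os
import stat

REQUIRED_TEAM_ID = "Y5PE65HELJ"
ALLOW_UNSIGNED_ENV = "PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS"


def is_caller_allowed(team_id, bundle_id, caller_uid, host_uid, *,
                      debug_build, env, bundle_allowlist=None):
    """Decide whether a connecting client may use the bridge."""
    # Escape hatch: DEBUG builds only, explicit opt-in, same Unix user.
    if debug_build and env.get(ALLOW_UNSIGNED_ENV) == "1" and caller_uid == host_uid:
        return True
    if team_id != REQUIRED_TEAM_ID:
        return False
    if bundle_allowlist is not None and bundle_id not in bundle_allowlist:
        return False
    return True


def socket_mode_is_owner_only(path):
    """True if group and other have no permission bits on the socket path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

Requiring all three conditions for the escape hatch means a release build ignores the env var even if it is set.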
## Next integration steps (after this doc)
1. Add Peekaboo as a git submodule (nested submodules OK).
2. Host `PeekabooBridgeHost` inside Clawdbot.app behind a single setting (“Enable Peekaboo Bridge”, default on).
3. Ensure Clawdbot hosts the bridge at `~/Library/Application Support/clawdbot/bridge.sock` and speaks the PeekabooBridge JSON protocol.
4. Validate with `peekaboo bridge status --verbose` that Peekaboo can select Clawdbot as the fallback host (no auto-launch).
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).