| summary | read_when |
|---|---|
| Plan for integrating Peekaboo automation into Clawdis via PeekabooBridge (socket-based TCC broker) | |
Peekaboo Bridge in Clawdis (macOS UI automation broker)
TL;DR
- Peekaboo removed its XPC helper and now exposes privileged automation via a UNIX domain socket bridge (PeekabooBridge/PeekabooBridgeHost, socket name bridge.sock).
- Clawdis integrates by optionally hosting the same bridge inside Clawdis.app (user-toggleable). The primary client is the peekaboo CLI (installed via npm); Clawdis does not need its own ui … CLI surface.
- For visualizations, we keep them in Peekaboo.app (best UX); Clawdis stays a thin broker host. No visualizer toggle in Clawdis.
Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdis onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdis (avoid pulling Tachikoma/MCP into the Clawdis app/CLI).
Big refactor (Dec 2025): XPC → Bridge
Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdis this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on either Peekaboo.app’s permissions or Clawdis.app’s permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
Reference (Peekaboo submodule): docs/bridge-host.md.
Architecture
Processes
- Bridge hosts (provide TCC-backed automation):
  - Peekaboo.app (preferred; also provides visualizations + controls)
  - Clawdis.app (secondary; “thin host” only)
- Bridge clients (trigger single actions):
  - peekaboo … (preferred; humans + agents)
  - Optional: Clawdis/Node shells out to peekaboo when it needs UI automation/capture
Host discovery (client-side)
Order is deliberate:
- Peekaboo.app host (full UX)
- Clawdis.app host (piggyback on Clawdis permissions)
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: ~/Library/Application Support/Peekaboo/bridge.sock
- Clawdis: ~/Library/Application Support/clawdis/bridge.sock
No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app or Clawdis.app).
Override (debugging): set PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock.
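The discovery order above can be sketched as follows. This is a minimal illustration, not Peekaboo's actual client code: the names `candidateHosts`, `selectHost`, and `isReachable` are hypothetical, and treating the override as replacing discovery entirely (rather than being tried first) is an assumption.

```typescript
// Hypothetical helper names; only the paths, the order, and the env var
// name come from the text above.
type Env = Record<string, string | undefined>;

interface HostCandidate {
  name: string;
  socketPath: string;
}

function candidateHosts(env: Env, homeDir: string): HostCandidate[] {
  const override = env["PEEKABOO_BRIDGE_SOCKET"];
  if (override) {
    // Assumption: the debug override replaces discovery entirely.
    return [{ name: "override", socketPath: override }];
  }
  const appSupport = `${homeDir}/Library/Application Support`;
  return [
    { name: "Peekaboo.app", socketPath: `${appSupport}/Peekaboo/bridge.sock` },
    { name: "Clawdis.app", socketPath: `${appSupport}/clawdis/bridge.sock` },
  ];
}

function selectHost(
  candidates: HostCandidate[],
  isReachable: (socketPath: string) => boolean,
): HostCandidate {
  for (const candidate of candidates) {
    if (isReachable(candidate.socketPath)) return candidate;
  }
  // No auto-launch: fail with a clear error instead of starting an app.
  throw new Error("No bridge host reachable: start Peekaboo.app or Clawdis.app");
}
```

Injecting `isReachable` keeps the ordering logic testable without touching real sockets.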
Protocol shape
- Single request per connection: connect → write one JSON request → half-close → read one JSON response → close.
- Timeout: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- Errors: human-readable string by default; structured envelope in --json.
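The request lifecycle above could look roughly like this. The `BridgeConnection` interface is hypothetical and synchronous purely for brevity (real socket APIs are asynchronous), and checking the deadline between steps is a simplification: a real client would also set a timeout on the socket itself so a blocked read cannot hang.

```typescript
// Hypothetical connection interface used only for this sketch.
interface BridgeConnection {
  write(data: string): void;
  halfClose(): void; // shut down the write side to mark end of request
  readAll(): string; // read until the host closes its side
  close(): void;
}

function sendOnce(
  connect: () => BridgeConnection,
  request: unknown,
  timeoutMs: number = 10_000,
  now: () => number = Date.now,
): unknown {
  // One deadline covers the whole action (end-to-end, client enforced).
  const deadline = now() + timeoutMs;
  const checkDeadline = (step: string): void => {
    if (now() > deadline) {
      throw new Error(`Bridge request timed out during ${step}`);
    }
  };

  const conn = connect();
  try {
    checkDeadline("connect");
    conn.write(JSON.stringify(request)); // exactly one JSON request
    conn.halfClose();
    checkDeadline("write");
    const raw = conn.readAll(); // exactly one JSON response
    checkDeadline("read");
    return JSON.parse(raw);
  } finally {
    conn.close(); // the connection is never reused
  }
}
```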
Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
Path in Clawdis repo: ./Peekaboo (Swabble-style; keep stable so SwiftPM path deps don’t churn).
What Clawdis should use:
- Client side: PeekabooBridge (socket client + protocol models).
- Host side (Clawdis.app): PeekabooBridgeHost + the minimal Peekaboo services needed to implement operations.
What Clawdis should not embed:
- Visualizer UI: keep it in Peekaboo.app for now (toggle + controls live there).
- XPC: don’t reintroduce helper targets; use the bridge.
IPC / CLI surface
No clawdis-mac ui …
We avoid a parallel “Clawdis UI automation CLI”. Instead:
- peekaboo is the user/agent-facing CLI surface for automation and capture.
- Clawdis.app can host PeekabooBridge as a thin TCC broker so Peekaboo can piggyback on Clawdis permissions when Peekaboo.app isn’t running.
Diagnostics
Use Peekaboo’s built-in diagnostics to see which host would be used:
- peekaboo bridge status
- peekaboo bridge status --verbose
- peekaboo bridge status --json
Output format
Peekaboo commands default to human text output. Add --json for a structured envelope.
Timeouts
Default timeout for UI actions: 10 seconds end-to-end (client enforced; host should also enforce per-operation).
Coordinate model (multi-display)
Requirement: coordinates are per screen, not global.
Standardize for the CLI (agent-friendly): top-left origin per screen.
Proposed request shape:
- Requests accept screenIndex + {x, y} in that screen’s local coordinate space.
- Clawdis.app converts to global CG coordinates using NSScreen.screens[screenIndex].frame.origin.
- Responses should echo both:
  - The resolved screenIndex
  - The local {x, y} and bounds
  - Optionally the global {x, y} for debugging
Ordering: use NSScreen.screens ordering consistently (documented in the CLI help + JSON schema).
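To make the per-screen model concrete, here is a small sketch of the local-to-global conversion. It assumes each screen frame is already expressed in a top-left-origin global space; AppKit reports NSScreen frames with a bottom-left origin, so a real host would need to flip the y-axis before this step. The type names, the `toGlobal` function, and the choice to echo the screen frame as `bounds` are illustrative, not the actual Peekaboo schema.

```typescript
interface Point { x: number; y: number }
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical response shape echoing the fields listed above.
interface ResolvedPoint {
  screenIndex: number;
  local: Point;
  bounds: Rect;  // assumption: the frame of the resolved screen
  global: Point; // optional in the proposal; always filled in here
}

// `screens` is assumed to follow the same ordering as NSScreen.screens,
// with frames in a top-left-origin global coordinate space.
function toGlobal(screens: Rect[], screenIndex: number, local: Point): ResolvedPoint {
  const frame = screens[screenIndex];
  if (frame === undefined) {
    throw new Error(`No screen at index ${screenIndex}`);
  }
  if (local.x < 0 || local.y < 0 || local.x >= frame.width || local.y >= frame.height) {
    throw new Error(`Point (${local.x}, ${local.y}) is outside screen ${screenIndex}`);
  }
  return {
    screenIndex,
    local,
    bounds: frame,
    // Global position = screen origin + local offset.
    global: { x: frame.x + local.x, y: frame.y + local.y },
  };
}
```

For example, with a 1440-wide primary screen at the origin and a second screen placed to its right, local (10, 20) on screen 1 resolves to global (1450, 20).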
Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
Peekaboo CLI targeting (agent-friendly):
- --bundle-id <id> for app targeting
- --window-index <n> (0-based) for disambiguating within an app when capturing
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
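One way to model the four targeting modes is a discriminated union with frontmost as the default. The `Target` and `WindowInfo` shapes and the `resolveWindow` function below are hypothetical and only illustrate the selection rules; they are not the PeekabooBridge protocol models.

```typescript
// Hypothetical target model covering the four modes listed above.
type Target =
  | { kind: "frontmost" }
  | { kind: "app"; nameOrBundleId: string }
  | { kind: "windowTitle"; titleSubstring: string }
  | { kind: "appWindow"; nameOrBundleId: string; windowIndex: number };

interface WindowInfo {
  app: string;
  bundleId: string;
  title: string;
  frontmost: boolean;
}

function resolveWindow(
  windows: WindowInfo[],
  target: Target = { kind: "frontmost" }, // default when no target is given
): WindowInfo | undefined {
  const ofApp = (id: string): WindowInfo[] =>
    windows.filter((w) => w.app === id || w.bundleId === id);

  switch (target.kind) {
    case "frontmost":
      return windows.find((w) => w.frontmost);
    case "app":
      return ofApp(target.nameOrBundleId)[0];
    case "windowTitle":
      return windows.find((w) => w.title.includes(target.titleSubstring));
    case "appWindow":
      // 0-based index among that app's windows, mirroring --window-index.
      return ofApp(target.nameOrBundleId)[target.windowIndex];
  }
}
```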
“See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- peekaboo see returns element IDs (e.g. B1, T3) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
peekaboo see should:
- capture (optionally targeted) window/screen
- return a screenshot file path (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be in-memory by default.
- Snapshot scoping: “implicit snapshot” is per target bundle id (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- peekaboo list apps / peekaboo list windows provide bundle-id context for targeting.
- peekaboo see --bundle-id X updates the implicit snapshot for X.
- peekaboo click --bundle-id X --on B1 reuses the most recent snapshot for X when --snapshot-id is omitted.
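The snapshot scoping rule can be illustrated with a small in-memory store. The `SnapshotStore` class, the `snap-N` id format, and the `Snapshot` fields are assumptions made for this sketch; only the "implicit snapshot per bundle id" behavior comes from the requirement above.

```typescript
// Hypothetical snapshot record; real snapshots also carry bounds/labels.
interface Snapshot {
  id: string;
  bundleId: string;
  elementIds: string[];
}

class SnapshotStore {
  private byId = new Map<string, Snapshot>();
  private latestByBundle = new Map<string, string>();
  private counter = 0;

  // Called after a "see": stores the snapshot and makes it the implicit
  // snapshot for its bundle id.
  record(bundleId: string, elementIds: string[]): Snapshot {
    this.counter += 1;
    const snapshot: Snapshot = { id: `snap-${this.counter}`, bundleId, elementIds };
    this.byId.set(snapshot.id, snapshot);
    this.latestByBundle.set(bundleId, snapshot.id);
    return snapshot;
  }

  // Called by follow-up actions: an explicit snapshot id wins, otherwise
  // the most recent snapshot for that bundle id is reused.
  resolve(bundleId: string, snapshotId?: string): Snapshot {
    const id = snapshotId ?? this.latestByBundle.get(bundleId);
    const snapshot = id === undefined ? undefined : this.byId.get(id);
    if (snapshot === undefined) {
      throw new Error(`No snapshot for ${bundleId}; run "peekaboo see" first`);
    }
    return snapshot;
  }
}
```

Because the host app is long-lived, plain maps are enough here; nothing is written to disk.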
Visualizer integration
Keep visualizations in Peekaboo.app for now.
- Clawdis hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
Screenshots (legacy → Peekaboo takeover)
Clawdis should not grow a separate screenshot CLI surface.
Migration plan:
- Use peekaboo capture … / peekaboo see … (returns a file path, default temp directory).
- Once Clawdis’ legacy screenshot plumbing is replaced, remove it cleanly (no aliases).
Permissions behavior
If required permissions are missing:
- return ok=false with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
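A host-side permission gate along these lines might look like the sketch below. The `ok` field comes from the text above; the `error` field name, the `Permission` values, and the `checkPermissions` function are assumptions for illustration only.

```typescript
// Hypothetical permission names and result envelope.
type Permission = "accessibility" | "screenRecording";
type BridgeResult = { ok: true } | { ok: false; error: string };

const PERMISSION_LABELS: Record<Permission, string> = {
  accessibility: "Accessibility",
  screenRecording: "Screen Recording",
};

function checkPermissions(
  required: Permission[],
  granted: ReadonlySet<Permission>,
): BridgeResult {
  for (const permission of required) {
    if (!granted.has(permission)) {
      // Report only; never open System Settings from the automation endpoint.
      return { ok: false, error: `${PERMISSION_LABELS[permission]} permission missing` };
    }
  }
  return { ok: true };
}
```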
Security (socket auth)
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
  - require the caller’s code signature TeamID to be Y5PE65HELJ
  - optional bundle-id allowlist for tighter scoping
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: skip codesign checks for clients running under the same Unix user.
- This must be opt-in, DEBUG-only, and guarded by an env var (Peekaboo uses PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1).
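The caller-validation rules, including the debug-only escape hatch, can be expressed as a single decision function. This sketch covers only the policy logic: the `Caller` and `HostPolicy` shapes are hypothetical, and it assumes the host has already extracted the caller's TeamID, bundle id, and UID from the socket peer by other means.

```typescript
// Hypothetical inputs; extracting these from the peer is out of scope here.
interface Caller {
  teamId: string | null;   // null when the client is unsigned
  bundleId: string | null;
  uid: number;
}

interface HostPolicy {
  requiredTeamId: string;
  bundleIdAllowlist?: string[]; // optional tighter scoping
  hostUid: number;
  isDebugBuild: boolean;
  env: Record<string, string | undefined>;
}

function isCallerAllowed(caller: Caller, policy: HostPolicy): boolean {
  // Escape hatch: DEBUG build + explicit opt-in + same Unix user.
  const escapeHatch =
    policy.isDebugBuild &&
    policy.env["PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS"] === "1" &&
    caller.uid === policy.hostUid;
  if (escapeHatch) return true;

  if (caller.teamId !== policy.requiredTeamId) return false;
  if (policy.bundleIdAllowlist !== undefined) {
    return caller.bundleId !== null && policy.bundleIdAllowlist.includes(caller.bundleId);
  }
  return true;
}
```

Requiring all three escape-hatch conditions together means a release build ignores the env var even if it is set.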
Next integration steps (after this doc)
- Add Peekaboo as a git submodule (nested submodules OK).
- Host PeekabooBridgeHost inside Clawdis.app behind a single setting (“Enable Peekaboo Bridge”, default on).
- Ensure Clawdis hosts the bridge at ~/Library/Application Support/clawdis/bridge.sock and speaks the PeekabooBridge JSON protocol.
- Validate with peekaboo bridge status --verbose that Peekaboo can select Clawdis as the fallback host (no auto-launch).
- Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).