| summary | read_when |
|---|---|
| Plan for integrating Peekaboo automation into Clawdis via PeekabooBridge (socket-based TCC broker) | |
Peekaboo Bridge in Clawdis (macOS UI automation broker)
TL;DR
- Peekaboo removed its XPC helper and now exposes privileged automation via a UNIX domain socket bridge (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdis integrates by hosting the same bridge inside Clawdis.app (optional, user-toggleable), and by making `clawdis-mac ui …` act as a bridge client.
- For visualizations, we keep them in Peekaboo.app (best UX); Clawdis stays a thin broker host. No visualizer toggle in Clawdis.
Non-goals:
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdis onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdis (avoid pulling Tachikoma/MCP into the Clawdis app/CLI).
Big refactor (Dec 2025): XPC → Bridge
Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdis this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on either Peekaboo.app’s permissions or Clawdis.app’s permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
Reference (Peekaboo submodule): docs/bridge-host.md.
Architecture
Processes
- Bridge hosts (provide TCC-backed automation):
  - Peekaboo.app (preferred; also provides visualizations + controls)
  - Clawdis.app (secondary; “thin host” only)
- Bridge clients (trigger single actions):
  - `clawdis-mac ui …`
  - `clawdis ui …` (Node/TS convenience wrapper; shells out to `clawdis-mac ui …`)
  - Node/Gateway shells out to `clawdis-mac`
Host discovery (client-side)
Order is deliberate:
- Peekaboo.app host (full UX)
- Clawdis.app host (piggyback on Clawdis permissions)
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Clawdis: `~/Library/Application Support/clawdis/bridge.sock`
No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app or Clawdis.app).
Override (debugging): set PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock.
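
The discovery order above can be sketched as a pair of small functions. This is a minimal illustration, not the actual client: `candidateSockets` and `pickHost` are hypothetical names, the reachability probe is injected rather than implemented, and treating the override as replacing discovery entirely (rather than being tried first) is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Returns the socket paths a client would try, in order.
// Hypothetical helper; only the paths and the env var name come from this doc.
function candidateSockets(env: Env, home: string): string[] {
  const override = env["PEEKABOO_BRIDGE_SOCKET"];
  if (override !== undefined && override !== "") {
    // Assumption: the debug override replaces discovery entirely.
    return [override];
  }
  const support = `${home}/Library/Application Support`;
  return [
    `${support}/Peekaboo/bridge.sock`, // 1. Peekaboo.app host (full UX)
    `${support}/clawdis/bridge.sock`, // 2. Clawdis.app host
  ];
}

// Picks the first reachable host. There is no auto-launch: if nothing
// answers, fail with a clear error instead of starting an app.
function pickHost(
  candidates: string[],
  isReachable: (socketPath: string) => boolean,
): string {
  for (const socketPath of candidates) {
    if (isReachable(socketPath)) return socketPath;
  }
  throw new Error(
    "No bridge host reachable: start Peekaboo.app or Clawdis.app",
  );
}
```

Injecting `isReachable` keeps the ordering logic testable without touching the filesystem; a real client would probe by attempting a connection.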
Protocol shape
- Single request per connection: connect → write one JSON request → half-close → read one JSON response → close.
- Timeout: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- Errors: human-readable string by default; structured envelope in `--json`.
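
A client for this request cycle might look roughly like the following sketch, written against Node's `net` module. The function name `bridgeRequest` is hypothetical, and the request and response bodies are typed as `unknown` because their schema is defined by Peekaboo, not here.

```typescript
import * as net from "node:net";

// One request per connection: connect, write one JSON request, half-close,
// read one JSON response, close. The default timeout mirrors the 10 s
// end-to-end budget described above.
function bridgeRequest(
  socketPath: string,
  request: unknown,
  timeoutMs = 10_000,
): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const socket = net.createConnection(socketPath);
    const chunks: Buffer[] = [];
    let settled = false;

    const finish = (err: Error | null, value?: unknown) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      if (err) reject(err);
      else resolve(value);
    };

    // Client-enforced deadline covering connect, write, and read.
    const timer = setTimeout(
      () => finish(new Error(`bridge request timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );

    socket.on("connect", () => {
      // end(data) writes the payload and then half-closes the write side.
      socket.end(JSON.stringify(request));
    });
    socket.on("data", (chunk: Buffer) => chunks.push(chunk));
    socket.on("end", () => {
      // The host closed its side: the response is complete.
      try {
        finish(null, JSON.parse(Buffer.concat(chunks).toString("utf8")));
      } catch (e) {
        finish(e instanceof Error ? e : new Error(String(e)));
      }
    });
    socket.on("error", (err: Error) => finish(err));
  });
}
```

Because the half-close marks the end of the request, no length prefix or delimiter is needed in this sketch; the host reads until EOF and the client does the same for the response.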
Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
Path in Clawdis repo: `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don’t churn).
What Clawdis should use:
- Client side: `PeekabooBridge` (socket client + protocol models).
- Host side (Clawdis.app): `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
What Clawdis should not embed:
- Visualizer UI: keep it in Peekaboo.app for now (toggle + controls live there).
- XPC: don’t reintroduce helper targets; use the bridge.
IPC / CLI surface
Namespacing
Add new automation commands behind a ui prefix:
- `clawdis-mac ui …` for UI automation + visualization-related actions.
- Keep existing top-level commands (`notify`, `run`, `canvas …`, etc.) for compatibility.
Screenshot cutover:
- Remove legacy screenshot endpoints/commands.
- Ship only `clawdis-mac ui screenshot` (no aliases).
Output format
Change clawdis-mac to default to human text output:
- Default: plain text; errors are string messages to stderr; exit codes indicate success/failure.
- `--json`: structured output (for agents/scripts) with stable schemas.
This applies globally, not only ui commands.
Note (current state as of 2025-12-13): clawdis-mac prints text by default; use --json for structured output.
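
To make the two output modes concrete, here is a small sketch of how a command result could be rendered. The envelope shape (`ok`, `message`, `error`, `data`) and the `render` function are assumptions for illustration; only the `ok` flag and the text/`--json` split come from this document, and the real schema should follow Peekaboo's.

```typescript
// Hypothetical result envelope; not the actual clawdis-mac schema.
type CommandResult =
  | { ok: true; message: string; data?: unknown }
  | { ok: false; error: string };

interface Rendered {
  stdout: string;
  stderr: string;
  exitCode: number;
}

function render(result: CommandResult, json: boolean): Rendered {
  // The exit code signals success or failure in both modes.
  const exitCode = result.ok ? 0 : 1;
  if (json) {
    // Assumption: the structured envelope goes to stdout even on failure,
    // so agents can parse one stream.
    return { stdout: JSON.stringify(result), stderr: "", exitCode };
  }
  return result.ok
    ? { stdout: result.message, stderr: "", exitCode }
    : { stdout: "", stderr: result.error, exitCode };
}
```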
Timeouts
Default timeout for UI actions: 10 seconds end-to-end.
Coordinate model (multi-display)
Requirement: coordinates are per screen, not global.
Standardize for the CLI (agent-friendly): top-left origin per screen.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
- Clawdis.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
  - The resolved `screenIndex`
  - The local `{x, y}` and bounds
  - Optionally the global `{x, y}` for debugging
Ordering: use NSScreen.screens ordering consistently (documented in the CLI help + JSON schema).
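
The conversion is worth spelling out, because AppKit screen frames use a bottom-left origin with y pointing up, while global CG display coordinates use the top-left of the primary display with y pointing down. The sketch below shows the arithmetic on plain data; `localToGlobal`, `Frame`, and `Point` are hypothetical names, and the frames are assumed to be copied from `NSScreen.screens` in order, with index 0 being the primary screen.

```typescript
// Screen frame as AppKit reports it: origin at the bottom-left of the
// primary screen, y increasing upward.
interface Frame { x: number; y: number; width: number; height: number }
interface Point { x: number; y: number }

// Converts a per-screen point (top-left origin, y down) into global CG
// coordinates (top-left of the primary display, y down).
function localToGlobal(screens: Frame[], screenIndex: number, local: Point): Point {
  const primary = screens[0];
  const screen = screens[screenIndex];
  if (primary === undefined || screen === undefined) {
    throw new RangeError(`no screen at index ${screenIndex}`);
  }
  if (local.x < 0 || local.y < 0 || local.x > screen.width || local.y > screen.height) {
    throw new RangeError("point lies outside the target screen");
  }
  // Top edge of this screen in CG space: flip the AppKit top edge
  // (screen.y + screen.height) against the primary screen's height.
  const screenTop = primary.height - (screen.y + screen.height);
  return { x: screen.x + local.x, y: screenTop + local.y };
}
```

For a taller secondary display whose bottom edge lines up with the primary, the top edge sits above the primary's top, so global y values there are negative. That is expected in CG space.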
Targeting (per app/window)
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
Current clawdis-mac ui … support:
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing (see/screenshot)
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
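
As an illustration of the "default: frontmost" rule, the currently supported flags could be mapped onto a target value as follows. The `Target` type and `parseTarget` function are hypothetical, and rejecting `--window-index` without `--bundle-id` is an assumption based on the flag being described as disambiguating within an app.

```typescript
// Hypothetical target model covering only the flags supported today.
type Target =
  | { kind: "frontmost" }
  | { kind: "app"; bundleId: string; windowIndex: number | null };

function parseTarget(argv: string[]): Target {
  let bundleId: string | null = null;
  let windowIndex: number | null = null;

  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    if (flag !== "--bundle-id" && flag !== "--window-index") continue;
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`${flag} requires a value`);
    if (flag === "--bundle-id") {
      bundleId = value;
    } else {
      // 0-based index: digits only, so "-1", "1.5", and "" are rejected.
      if (!/^\d+$/.test(value)) {
        throw new Error("--window-index must be a 0-based integer");
      }
      windowIndex = Number(value);
    }
    i++; // skip the consumed value
  }

  if (bundleId === null) {
    // Assumption: a window index is only meaningful within a named app.
    if (windowIndex !== null) throw new Error("--window-index requires --bundle-id");
    return { kind: "frontmost" };
  }
  return { kind: "app", bundleId, windowIndex };
}
```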
“See” + click packs (Playwright-style)
Behavior stays aligned with Peekaboo:
- `ui see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
clawdis-mac ui see should:
- capture (optionally targeted) window/screen
- return a screenshot file path (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Host apps are long-lived, so snapshot state should be in-memory by default.
- Snapshot scoping: “implicit snapshot” is per target bundle id (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- `clawdis-mac ui frontmost` returns the focused app (bundle id) + focused window (title/id) so follow-up calls can pass `--bundle-id …`.
- `clawdis-mac ui see --bundle-id X` updates the implicit snapshot for `X`.
- `clawdis-mac ui click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
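
The implicit-snapshot rule can be sketched as a small in-memory store on the host side. Everything here is illustrative: `SnapshotStore`, the `snap-N` id format, and the element shape are assumptions, and eviction of old snapshots is left out for brevity.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

interface Snapshot {
  id: string;
  bundleId: string;
  elements: Map<string, Bounds>; // keyed by element ID, e.g. "B1"
}

class SnapshotStore {
  private readonly byId = new Map<string, Snapshot>();
  private readonly latestByBundle = new Map<string, string>();
  private counter = 0;

  // What `ui see` would call: store the scan and make it the implicit
  // snapshot for that bundle id.
  record(bundleId: string, elements: Record<string, Bounds>): Snapshot {
    this.counter += 1;
    const snapshot: Snapshot = {
      id: `snap-${this.counter}`, // hypothetical id format
      bundleId,
      elements: new Map(Object.entries(elements)),
    };
    this.byId.set(snapshot.id, snapshot);
    this.latestByBundle.set(bundleId, snapshot.id);
    return snapshot;
  }

  // What click/type/wait would call: an explicit snapshot id wins,
  // otherwise reuse the most recent snapshot for the target app.
  resolve(bundleId: string, snapshotId?: string): Snapshot {
    const id = snapshotId ?? this.latestByBundle.get(bundleId);
    const snapshot = id === undefined ? undefined : this.byId.get(id);
    if (snapshot === undefined) {
      throw new Error(`no snapshot for ${bundleId}; run "ui see" first`);
    }
    return snapshot;
  }
}
```

Keying the implicit snapshot by bundle id means a `ui see` on one app does not invalidate element IDs an agent is still using in another.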
Visualizer integration
Keep visualizations in Peekaboo.app for now.
- Clawdis hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
Screenshots (legacy → Peekaboo takeover)
Clawdis uses clawdis-mac ui screenshot and returns a file path (default location: temp directory) instead of raw image bytes.
Migration plan:
- Bridge host performs capture and returns a temp file path.
- No legacy aliases; make the old screenshot surface disappear cleanly.
Permissions behavior
If required permissions are missing:
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
- do not try to open System Settings from the automation endpoint
Security (socket auth)
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller validation:
  - require the caller’s code signature TeamID to be `Y5PE65HELJ`
  - optional bundle-id allowlist for tighter scoping
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: skip codesign checks for clients running under the same Unix user.
- This must be opt-in, DEBUG-only, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
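
The decision logic behind these rules is small enough to sketch as a pure function. This is not the host implementation: `Caller`, `Policy`, and `isCallerAllowed` are hypothetical, and the caller's UID, TeamID, and bundle id are assumed to have been extracted already by the platform's peer-credential and code-signing checks, which are out of scope here.

```typescript
interface Caller {
  uid: number;
  teamId: string | null; // null when the caller is unsigned
  bundleId: string | null;
}

interface Policy {
  requiredTeamId: string;
  bundleAllowlist: string[] | null; // null means "any bundle id"
  hostUid: number;
  debugBuild: boolean;
  env: Record<string, string | undefined>;
}

function isCallerAllowed(caller: Caller, policy: Policy): boolean {
  // Escape hatch: only in DEBUG builds, only when explicitly opted in,
  // and only for clients running as the same Unix user.
  const bypassEnabled =
    policy.debugBuild &&
    policy.env["PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS"] === "1";
  if (bypassEnabled && caller.uid === policy.hostUid) return true;

  // Normal path: the signature's TeamID must match.
  if (caller.teamId !== policy.requiredTeamId) return false;

  // Optional tighter scoping by bundle id.
  if (policy.bundleAllowlist !== null) {
    return (
      caller.bundleId !== null &&
      policy.bundleAllowlist.includes(caller.bundleId)
    );
  }
  return true;
}
```

Note that the bypass needs both conditions: setting the env var in a release build changes nothing in this sketch.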
Current clawdis-mac ui commands (Dec 2025)
All commands default to text output. Add --json right after clawdis-mac for a structured envelope.
- `clawdis-mac ui permissions status`
- `clawdis-mac ui frontmost`
- `clawdis-mac ui apps`
- `clawdis-mac ui windows [--bundle-id <id>]`
- `clawdis-mac ui screenshot [--screen-index <n>] [--bundle-id <id>] [--window-index <n>] [--watch] [--scale native|1x]`
- `clawdis-mac ui see [--bundle-id <id>] [--window-index <n>] [--snapshot-id <id>]`
- `clawdis-mac ui click --on <elementId> [--bundle-id <id>] [--snapshot-id <id>] [--double|--right]`
- `clawdis-mac ui type --text <value> [--into <elementId>] [--bundle-id <id>] [--snapshot-id <id>] [--clear] [--delay-ms <n>]`
- `clawdis-mac ui wait --on <elementId> [--bundle-id <id>] [--snapshot-id <id>] [--timeout <sec>]`
Next integration steps (after this doc)
- Add Peekaboo as a git submodule (nested submodules OK).
- Add a small `clawdis-mac ui …` surface that speaks PeekabooBridge (text by default, `--json` for structured).
- Host `PeekabooBridgeHost` inside Clawdis.app behind a single setting (“Enable Peekaboo Bridge”, default on).
- Implement the minimum operation set needed for agents (see/click/type/scroll/wait/screenshot, plus list apps/windows/screens).
- Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).