feat(mac): host PeekabooBridge for ui

Peter Steinberger
2025-12-13 16:55:41 +00:00
parent fd566bda14
commit c17440f5b4
21 changed files with 1197 additions and 422 deletions

@@ -1,44 +1,80 @@
---
summary: "Plan for integrating Peekaboo automation + visualizer into Clawdis macOS app (via clawdis-mac)"
summary: "Plan for integrating Peekaboo automation into Clawdis via PeekabooBridge (socket-based TCC broker)"
read_when:
- Adding UI automation commands
- Integrating Peekaboo as a submodule
- Changing clawdis-mac IPC/output formats
---
# Peekaboo in Clawdis (macOS UI automation + visualizer)
# Peekaboo Bridge in Clawdis (macOS UI automation broker)
## Goal
Reuse Peekaboo’s mac automation “core” inside **Clawdis.app** so we piggyback on Clawdis’ existing TCC grants (Screen Recording, Accessibility, etc.). The CLI (`clawdis-mac`) stays a thin synchronous trigger surface for **single actions** (no batches), returning errors cleanly.
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdis integrates by **hosting the same bridge** inside **Clawdis.app** (optional, user-toggleable), and by making `clawdis-mac ui …` act as a **bridge client**.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdis stays a thin broker host. No visualizer toggle in Clawdis.
Non-goals:
- No AI/agent runtime parts from Peekaboo (no Tachikoma/MCP/Commander entrypoints).
- No auto-onboarding or System Settings deep-linking from the automation layer (Clawdis onboarding already handles that).
- No auto-launching Peekaboo.app.
- No onboarding deep links from the automation endpoint (Clawdis onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdis (avoid pulling Tachikoma/MCP into the Clawdis app/CLI).
## Where code lives
- **Clawdis.app (macOS)**: owns all automation + visualization + TCC prompts.
- **`clawdis-mac` CLI**: sends one request, waits, prints result, exits non-zero on failure.
- **Gateway/Node/TS**: shells out to `clawdis-mac` when it needs TCC-backed actions.
## Big refactor (Dec 2025): XPC → Bridge
Peekaboo’s privileged execution moved from “CLI → XPC helper” to “CLI → socket bridge host”. For Clawdis this is a win:
- It matches the existing “local socket + codesign checks” approach.
- It lets us piggyback on **either** Peekaboo.app’s permissions **or** Clawdis.app’s permissions (whichever is running).
- It avoids “two apps with two TCC bubbles” unless needed.
Transport: existing UNIX domain socket (`controlSocketPath`) already used by `clawdis-mac`.
Reference (Peekaboo submodule): `docs/bridge-host.md`.
## Dependencies (submodule strategy)
Integrate Peekaboo via git submodule (nested submodules OK).
## Architecture
### Processes
- **Bridge hosts** (provide TCC-backed automation):
- **Peekaboo.app** (preferred; also provides visualizations + controls)
- **Clawdis.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
- `clawdis-mac ui …`
- Node/Gateway shells out to `clawdis-mac`
Consume only:
- `PeekabooAutomationKit` (AX automation, element detection, capture helpers; no Tachikoma/MCP).
- `AXorcist` (input driving / AX helpers).
- `PeekabooVisualizer` (overlay visualizations).
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Clawdis.app host (piggyback on Clawdis permissions)
Important nuance:
- `PeekabooAutomationKit` is a standalone SwiftPM package and does **not** require Tachikoma/MCP/Commander.
- `PeekabooVisualizer` ships as a product inside `PeekabooCore/Package.swift`. That package declares other dependencies (including a path dependency to Tachikoma). SwiftPM will still need those paths to exist during dependency resolution even if we don’t build those targets.
- If this becomes annoying for Clawdis, the follow-up is to extract `PeekabooVisualizer` into its own standalone Swift package that depends only on `PeekabooFoundation`/`PeekabooProtocols`/`PeekabooExternalDependencies`.
Socket paths (convention; exact paths must match Peekaboo):
- Peekaboo: `~/Library/Application Support/Peekaboo/bridge.sock`
- Clawdis: `~/Library/Application Support/clawdis/bridge.sock`
No auto-launch: if a host isn’t reachable, the command fails with a clear error (start Peekaboo.app or Clawdis.app).
Override (debugging): set `PEEKABOO_BRIDGE_SOCKET=/path/to/bridge.sock`.
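
The discovery order above can be sketched as a small pure function. This is a minimal sketch, not Peekaboo's actual client code: the name `candidate_sockets` is hypothetical, and it assumes that the debugging override replaces the default candidates rather than being tried before them.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def candidate_sockets(env: Optional[Mapping[str, str]] = None,
                      home: Optional[Path] = None) -> list[Path]:
    """Return bridge socket paths in the order a client would try them."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    override = env.get("PEEKABOO_BRIDGE_SOCKET")
    if override:
        # Assumption: the override replaces the defaults entirely.
        return [Path(override)]

    support = home / "Library" / "Application Support"
    return [
        support / "Peekaboo" / "bridge.sock",  # 1. Peekaboo.app host (full UX)
        support / "clawdis" / "bridge.sock",   # 2. Clawdis.app host
    ]
```

A client would try each path in turn and fail with a clear error if none accepts a connection, since there is no auto-launch.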
### Protocol shape
- **Single request per connection**: connect → write one JSON request → half-close → read one JSON response → close.
- **Timeout**: 10 seconds end-to-end per action (client enforced; host should also enforce per-operation).
- **Errors**: human-readable string by default; structured envelope in `--json`.
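
To make the connection lifecycle concrete, here is a minimal client sketch of "connect → write → half-close → read → close" with an end-to-end deadline. The function name `send_bridge_request` and the request payload are hypothetical; the real request and response models live in `PeekabooBridge` and are not reproduced here.

```python
import json
import socket
import time
from typing import Any


def send_bridge_request(socket_path: str, request: dict[str, Any],
                        timeout: float = 10.0) -> dict[str, Any]:
    """Send one JSON request over a UNIX socket and return the one JSON response."""
    deadline = time.monotonic() + timeout

    def remaining() -> float:
        # Shrinks on every call so the timeout is end-to-end, not per syscall.
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError("bridge request exceeded its deadline")
        return left

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(remaining())
        sock.connect(socket_path)
        sock.settimeout(remaining())
        sock.sendall(json.dumps(request).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # half-close: tells the host the request is complete

        chunks: list[bytes] = []
        while True:
            sock.settimeout(remaining())
            chunk = sock.recv(65536)
            if not chunk:  # host closed its side, so the response is complete
                break
            chunks.append(chunk)

    return json.loads(b"".join(chunks))
```

The half-close lets the host read until end-of-stream without needing a length prefix or delimiter, which is what makes one request per connection simple to implement on both sides.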
## Dependency strategy (submodule)
Integrate Peekaboo via git submodule (nested submodules are OK).
Path in Clawdis repo:
- `./Peekaboo` (Swabble-style; keep stable so SwiftPM path deps don’t churn).
What Clawdis should use:
- **Client side**: `PeekabooBridge` (socket client + protocol models).
- **Host side (Clawdis.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
What Clawdis should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: don’t reintroduce helper targets; use the bridge.
## IPC / CLI surface
### Namespacing
Add new automation commands behind a `ui` prefix:
- `clawdis-mac ui …` for UI automation + visualization-related actions.
- Keep existing top-level commands (`notify`, `run`, `canvas …`, etc.) for compatibility, but do a clean cutover for screenshots: remove the legacy top-level `screenshot` command and ship only `clawdis-mac ui screenshot`.
- Keep existing top-level commands (`notify`, `run`, `canvas …`, etc.) for compatibility.
Screenshot cutover:
- Remove legacy screenshot endpoints/commands.
- Ship only `clawdis-mac ui screenshot` (no aliases).
### Output format
Change `clawdis-mac` to default to human text output:
@@ -50,14 +86,14 @@ This applies globally, not only `ui` commands.
Note (current state as of 2025-12-13): `clawdis-mac` prints text by default; use `--json` for structured output.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (CLI already defaults to 10s).
- CLI: keep the fail-fast default at 10s (unless a command explicitly requests longer).
- Server: only has a ~5s read/decode timeout today; UI operations must also enforce their own per-action timeout so “wait for element” can fail deterministically.
Default timeout for UI actions: **10 seconds** end-to-end.
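
A deterministic "wait for element" can be sketched as polling against a monotonic deadline. This is an illustrative sketch only: `wait_until`, `probe`, and the polling interval are hypothetical names and values, not part of Peekaboo or Clawdis.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], Optional[T]], timeout: float = 10.0,
               interval: float = 0.1) -> T:
    """Poll `probe` until it returns a non-None value or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout:g}s")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Using a monotonic clock keeps the deadline stable even if the wall clock is adjusted while the action is running.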
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.
Proposed API shape:
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
- Clawdis.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
@@ -68,53 +104,48 @@ Proposed API shape:
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
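
The per-screen to global conversion can be sketched as below. This is a sketch under stated assumptions: the `Frame` type and `to_global_cg` are hypothetical, frames are modelled as AppKit-style rectangles (bottom-left origin, y up, primary screen at index 0), and the result is in global top-left coordinates (y down). Under those assumptions the conversion needs a vertical flip against the primary screen's height in addition to the frame origin.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Frame:
    """A screen frame in AppKit-style coordinates (bottom-left origin)."""
    x: float
    y: float
    width: float
    height: float


def to_global_cg(frames: Sequence[Frame], screen_index: int,
                 x: float, y: float) -> tuple[float, float]:
    """Map a top-left-origin point on one screen to global top-left coordinates."""
    if not 0 <= screen_index < len(frames):
        raise IndexError(f"no screen at index {screen_index}")
    screen = frames[screen_index]
    if not (0 <= x <= screen.width and 0 <= y <= screen.height):
        raise ValueError("point lies outside the target screen")

    primary = frames[0]  # assumption: index 0 is the primary screen
    # Top-left corner of the target screen in global top-left coordinates.
    origin_x = screen.x
    origin_y = primary.height - (screen.y + screen.height)
    return (origin_x + x, origin_y + y)
```

For example, with a 1440x900 primary screen and a 1920x1080 screen to its right whose bottom edges align, the local point `(10, 20)` on the second screen maps to `(1450, -160)`.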
## Targeting (per app/window)
Expose window/app targeting in the IPC surface (based on Peekaboo’s existing `WindowTarget` model):
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
- by window id
Current `clawdis-mac ui …` support:
- `--bundle-id <id>` for app targeting
- `--window-index <n>` (0-based) for disambiguating within an app when capturing (see/screenshot)
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
## “See” + click packs (Playwright-style)
Peekaboo already has the core ingredients:
- element detection yielding stable IDs (e.g., `B1`, `T3`)
- bounds + labels/values
- snapshot IDs to allow follow-up actions without re-scanning
Behavior stays aligned with Peekaboo:
- `ui see` returns element IDs (e.g. `B1`, `T3`) with bounds/labels.
- Follow-up actions reference those IDs without re-scanning.
Clawdis’s `ui see` should:
`clawdis-mac ui see` should:
- capture (optionally targeted) window/screen
- return a **snapshot id**
- return a list of elements with `{id, type, label/value?, bounds}`
- optionally return screenshot path/bytes (pref: path)
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
- Clawdis runs long-lived in memory, so snapshot state should be **in-memory by default** (no disk-backed JSON concept).
- Peekaboo already supports this via an `InMemorySnapshotManager` (keep disk-backed snapshots as an optional debug mode later).
- Host apps are long-lived, so snapshot state should be **in-memory by default**.
- Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
- `clawdis-mac ui frontmost` returns the focused app (bundle id) + focused window (title/id) so follow-up calls can pass `--bundle-id …`.
- `clawdis-mac ui see --bundle-id X` updates the implicit snapshot for `X`.
- `clawdis-mac ui click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
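
The implicit-snapshot rule can be illustrated with a small in-memory store. This is a minimal sketch, not Peekaboo's `InMemorySnapshotManager`; the class name `SnapshotStore`, its methods, and the element dictionary shape are all hypothetical.

```python
from typing import Optional


class SnapshotStore:
    """In-memory snapshots with an implicit "latest" pointer per bundle id."""

    def __init__(self) -> None:
        self._elements_by_snapshot: dict[str, dict[str, dict]] = {}
        self._latest_by_bundle: dict[str, str] = {}

    def record(self, bundle_id: str, snapshot_id: str,
               elements: dict[str, dict]) -> None:
        """Store a `ui see` result and make it the implicit snapshot for the app."""
        self._elements_by_snapshot[snapshot_id] = elements
        self._latest_by_bundle[bundle_id] = snapshot_id

    def resolve(self, bundle_id: str, element_id: str,
                snapshot_id: Optional[str] = None) -> dict:
        """Look up an element, falling back to the app's most recent snapshot."""
        if snapshot_id is None:
            snapshot_id = self._latest_by_bundle.get(bundle_id)
            if snapshot_id is None:
                raise LookupError(f"no snapshot for {bundle_id}; run `ui see` first")
        elements = self._elements_by_snapshot.get(snapshot_id)
        if elements is None:
            raise LookupError(f"unknown snapshot id {snapshot_id}")
        if element_id not in elements:
            raise LookupError(f"element {element_id} not in snapshot {snapshot_id}")
        return elements[element_id]
```

Scoping the implicit pointer by bundle id means a `ui see` on one app does not invalidate element IDs an agent is still using for another app.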
## Visualizer integration
Visualizer must be user-toggleable via a Clawdis setting.
Implementation sketch:
- Add a Clawdis UserDefaults-backed setting (e.g. `clawdis.ui.visualizerEnabled`).
- Implement Peekaboo’s `VisualizerSettingsProviding` in Clawdis (`visualizerEnabled`, animation speed, and per-effect toggles).
- Create a Clawdis-specific `AutomationFeedbackClient` that forwards PeekabooAutomationKit feedback events into a shared `VisualizerCoordinator`.
Current state:
- `PeekabooVisualizer` already includes the visualization implementation (SwiftUI overlay views + coordinator).
The visualizer is intentionally display-only (no clickable overlays needed).
Keep visualizations in **Peekaboo.app** for now.
- Clawdis hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdis uses `clawdis-mac ui screenshot` and returns a file path (default location: temp directory) instead of raw image bytes.
Migration plan:
- Replace capture implementation with PeekabooAutomationKit’s capture service so we share:
- per-screen mapping
- window/app targeting
- visual feedback (flash / watch HUD) when enabled
- Keep writing images to a file path on the app side and returning the path (text-friendly), with `--json` providing the structured metadata.
- No aliases: remove the old `Request.screenshot` and introduce a new `Request.uiScreenshot` (or similar) so the new behavior is explicit and there’s no “legacy mode” to maintain.
- Bridge host performs capture and returns a temp file path.
- No legacy aliases; make the old screenshot surface disappear cleanly.
## Permissions behavior
If required permissions are missing:
@@ -122,17 +153,32 @@ If required permissions are missing:
- do not try to open System Settings from the automation endpoint
## Security (socket auth)
Clawdis socket is protected by:
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller check:
- requires the caller’s code signature TeamID to be `Y5PE65HELJ`
- in `DEBUG` builds only, an explicit escape hatch allows same-UID clients when `CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` is set (development convenience)
- server-side caller validation:
- require the caller’s code signature TeamID to be `Y5PE65HELJ`
- optional bundle-id allowlist for tighter scoping
This ensures “any local process” can’t drive the privileged surface just because it runs under the same macOS user.
Debug-only escape hatch (development convenience):
- “allow same-UID callers” means: *skip codesign checks for clients running under the same Unix user*.
- This must be **opt-in**, **DEBUG-only**, and guarded by an env var (Peekaboo uses `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1`).
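
The caller-validation rules can be modelled as a pure decision function. This sketch only covers the policy. A real host would obtain the peer's UID and code-signing identity from the socket and the Security framework, which is not shown. The `Caller` type and `is_caller_allowed` are hypothetical, and the sketch assumes the debug escape hatch bypasses the bundle-id allowlist as well as the TeamID check.

```python
from dataclasses import dataclass
from typing import Collection, Mapping, Optional


@dataclass(frozen=True)
class Caller:
    uid: int
    team_id: Optional[str]    # None when the client is unsigned
    bundle_id: Optional[str]


def is_caller_allowed(caller: Caller, *, host_uid: int,
                      required_team_id: str = "Y5PE65HELJ",
                      bundle_allowlist: Optional[Collection[str]] = None,
                      debug_build: bool = False,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a connected client may use the privileged surface."""
    env = {} if env is None else env

    if caller.team_id == required_team_id:
        # Optional tighter scoping on top of the TeamID requirement.
        if bundle_allowlist is not None:
            return caller.bundle_id in bundle_allowlist
        return True

    # Escape hatch: DEBUG builds only, opt-in via env var, same Unix user only.
    # Assumption: this path ignores the bundle-id allowlist.
    return (debug_build
            and env.get("PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS") == "1"
            and caller.uid == host_uid)
```

Keeping the decision separate from credential extraction makes the three conditions of the escape hatch easy to unit test.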
## Current `clawdis-mac ui` commands (Dec 2025)
All commands default to text output. Add `--json` right after `clawdis-mac` for a structured envelope.
- `clawdis-mac ui permissions status`
- `clawdis-mac ui frontmost`
- `clawdis-mac ui apps`
- `clawdis-mac ui windows [--bundle-id <id>]`
- `clawdis-mac ui screenshot [--screen-index <n>] [--bundle-id <id>] [--window-index <n>] [--watch] [--scale native|1x]`
- `clawdis-mac ui see [--bundle-id <id>] [--window-index <n>] [--snapshot-id <id>]`
- `clawdis-mac ui click --on <elementId> [--bundle-id <id>] [--snapshot-id <id>] [--double|--right]`
- `clawdis-mac ui type --text <value> [--into <elementId>] [--bundle-id <id>] [--snapshot-id <id>] [--clear] [--delay-ms <n>]`
- `clawdis-mac ui wait --on <elementId> [--bundle-id <id>] [--snapshot-id <id>] [--timeout <sec>]`
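
As a usage illustration, a caller that shells out to the CLI might build its argument list as follows. This is a hedged sketch: `build_ui_argv` and `run_ui` are hypothetical helpers, and it assumes that failures are reported on stderr or stdout alongside the non-zero exit status and that the `--json` envelope is a single JSON document on stdout.

```python
import json
import subprocess
from typing import Any


def build_ui_argv(action: str, *args: str, json_output: bool = True,
                  binary: str = "clawdis-mac") -> list[str]:
    """Build argv for a `ui` command, placing --json right after the binary."""
    argv = [binary]
    if json_output:
        argv.append("--json")
    argv += ["ui", action, *args]
    return argv


def run_ui(action: str, *args: str, timeout: float = 10.0) -> Any:
    """Run one `ui` action and return the parsed JSON envelope."""
    proc = subprocess.run(build_ui_argv(action, *args),
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Assumption: the error text is on stderr, or stdout as a fallback.
        raise RuntimeError(proc.stderr.strip() or proc.stdout.strip())
    return json.loads(proc.stdout)
```

For example, `build_ui_argv("click", "--on", "B1", "--bundle-id", "com.apple.Safari")` yields `clawdis-mac --json ui click --on B1 --bundle-id com.apple.Safari`.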
## Next integration steps (after this doc)
1. Add Peekaboo as a git submodule (and required nested submodules).
2. Wire SwiftPM deps in `apps/macos/Package.swift` to import `PeekabooAutomationKit` + `PeekabooVisualizer`.
3. Extend `ClawdisIPC.Request` with `ui.*` commands (`see/click/type/scroll/wait/screenshot/windows/screens`).
4. Implement handlers in Clawdis.app and route through PeekabooAutomationKit services.
5. Update `clawdis-mac` output defaults (text + `--json`), and adjust any internal call sites that relied on JSON-by-default.
1. Add Peekaboo as a git submodule (nested submodules OK).
2. Add a small `clawdis-mac ui …` surface that speaks PeekabooBridge (text by default, `--json` for structured).
3. Host `PeekabooBridgeHost` inside Clawdis.app behind a single setting (“Enable Peekaboo Bridge”, default on).
4. Implement the minimum operation set needed for agents (see/click/type/scroll/wait/screenshot, plus list apps/windows/screens).
5. Keep all protocol decisions aligned with Peekaboo (coordinate system, element IDs, snapshot scoping, error envelopes).