139 lines
7.3 KiB
Markdown
139 lines
7.3 KiB
Markdown
---
|
||
summary: "Plan for integrating Peekaboo automation + visualizer into Clawdis macOS app (via clawdis-mac)"
|
||
read_when:
|
||
- Adding UI automation commands
|
||
- Integrating Peekaboo as a submodule
|
||
- Changing clawdis-mac IPC/output formats
|
||
---
|
||
# Peekaboo in Clawdis (macOS UI automation + visualizer)
|
||
|
||
## Goal
|
||
Reuse Peekaboo’s mac automation “core” inside **Clawdis.app** so we piggyback on Clawdis’ existing TCC grants (Screen Recording, Accessibility, etc.). The CLI (`clawdis-mac`) stays a thin synchronous trigger surface for **single actions** (no batches), returning errors cleanly.
|
||
|
||
Non-goals:
|
||
- No AI/agent runtime parts from Peekaboo (no Tachikoma/MCP/Commander entrypoints).
|
||
- No auto-onboarding or System Settings deep-linking from the automation layer (Clawdis onboarding already handles that).
|
||
|
||
## Where code lives
|
||
- **Clawdis.app (macOS)**: owns all automation + visualization + TCC prompts.
|
||
- **`clawdis-mac` CLI**: sends one request, waits, prints result, exits non-zero on failure.
|
||
- **Gateway/Node/TS**: shells out to `clawdis-mac` when it needs TCC-backed actions.
|
||
|
||
Transport: existing UNIX domain socket (`controlSocketPath`) already used by `clawdis-mac`.
|
||
|
||
## Dependencies (submodule strategy)
|
||
Integrate Peekaboo via git submodule (nested submodules OK).
|
||
|
||
Consume only:
|
||
- `PeekabooAutomationKit` (AX automation, element detection, capture helpers; no Tachikoma/MCP).
|
||
- `AXorcist` (input driving / AX helpers).
|
||
- `PeekabooVisualizer` (overlay visualizations).
|
||
|
||
Important nuance:
|
||
- `PeekabooAutomationKit` is a standalone SwiftPM package and does **not** require Tachikoma/MCP/Commander.
|
||
- `PeekabooVisualizer` ships as a product inside `PeekabooCore/Package.swift`. That package declares other dependencies (including a path dependency to Tachikoma). SwiftPM will still need those paths to exist during dependency resolution even if we don’t build those targets.
|
||
- If this becomes annoying for Clawdis, the follow-up is to extract `PeekabooVisualizer` into its own standalone Swift package that depends only on `PeekabooFoundation`/`PeekabooProtocols`/`PeekabooExternalDependencies`.
|
||
|
||
## IPC / CLI surface
|
||
### Namespacing
|
||
Add new automation commands behind a `ui` prefix:
|
||
- `clawdis-mac ui …` for UI automation + visualization-related actions.
|
||
- Keep existing top-level commands (`notify`, `run`, `canvas …`, etc.) for compatibility, but do a clean cutover for screenshots: remove the legacy top-level `screenshot` command and ship only `clawdis-mac ui screenshot`.
|
||
|
||
### Output format
|
||
Change `clawdis-mac` to default to human text output:
|
||
- **Default**: plain text; errors are string messages to stderr; exit codes indicate success/failure.
|
||
- **`--json`**: structured output (for agents/scripts) with stable schemas.
|
||
|
||
This applies globally, not only `ui` commands.
|
||
|
||
Note (current state as of 2025-12-13): `clawdis-mac` prints JSON by default. This is a planned behavior change.
|
||
|
||
### Timeouts
|
||
Default timeout for UI actions: **10 seconds** end-to-end (CLI already defaults to 10s).
|
||
- CLI: keep the fail-fast default at 10s (unless a command explicitly requests longer).
|
||
- Server: only has a ~5s read/decode timeout today; UI operations must also enforce their own per-action timeout so “wait for element” can fail deterministically.
|
||
|
||
## Coordinate model (multi-display)
|
||
Requirement: coordinates are **per screen**, not global.
|
||
|
||
Proposed API shape:
|
||
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
|
||
- Clawdis.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
|
||
- Responses should echo both:
|
||
- The resolved `screenIndex`
|
||
- The local `{x, y}` and bounds
|
||
- Optionally the global `{x, y}` for debugging
|
||
|
||
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
|
||
|
||
## Targeting (per app/window)
|
||
Expose window/app targeting in the IPC surface (based on Peekaboo’s existing `WindowTarget` model):
|
||
- frontmost
|
||
- by app name / bundle id
|
||
- by window title substring
|
||
- by (app, index)
|
||
- by window id
|
||
|
||
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
|
||
|
||
## “See” + click packs (Playwright-style)
|
||
Peekaboo already has the core ingredients:
|
||
- element detection yielding stable IDs (e.g., `B1`, `T3`)
|
||
- bounds + labels/values
|
||
- snapshot IDs to allow follow-up actions without re-scanning
|
||
|
||
Clawdis’s `ui see` should:
|
||
- capture (optionally targeted) window/screen
|
||
- return a **snapshot id**
|
||
- return a list of elements with `{id, type, label/value?, bounds}`
|
||
- optionally return screenshot path/bytes (pref: path)
|
||
|
||
Snapshot lifecycle requirement:
|
||
- Clawdis runs long-lived in memory, so “snapshot state” should be **in-memory by default** (no disk-backed JSON concept).
|
||
- Peekaboo already supports this via an `InMemorySnapshotManager` (keep disk-backed snapshots as an optional debug mode later).
|
||
|
||
## Visualizer integration
|
||
Visualizer must be user-toggleable via a Clawdis setting.
|
||
|
||
Implementation sketch:
|
||
- Add a Clawdis UserDefaults-backed setting (e.g. `clawdis.ui.visualizerEnabled`).
|
||
- Implement Peekaboo’s `VisualizerSettingsProviding` in Clawdis (`visualizerEnabled`, animation speed, and per-effect toggles).
|
||
- Create a Clawdis-specific `AutomationFeedbackClient` that forwards PeekabooAutomationKit feedback events into a shared `VisualizerCoordinator`.
|
||
|
||
Current state:
|
||
- `PeekabooVisualizer` already includes the visualization implementation (SwiftUI overlay views + coordinator).
|
||
The visualizer is intentionally display-only (no clickable overlays needed).
|
||
|
||
## Screenshots (legacy → Peekaboo takeover)
|
||
Clawdis currently has a legacy `screenshot` request returning raw PNG bytes in `Response.payload`.
|
||
|
||
Migration plan:
|
||
- Replace capture implementation with PeekabooAutomationKit’s capture service so we share:
|
||
- per-screen mapping
|
||
- window/app targeting
|
||
- visual feedback (flash / watch HUD) when enabled
|
||
- Prefer writing images to a file path on the app side and returning the path (text-friendly), with `--json` providing the structured metadata.
|
||
- No aliases: remove the old `Request.screenshot` and introduce a new `Request.uiScreenshot` (or similar) so the new behavior is explicit and there’s no “legacy mode” to maintain.
|
||
|
||
## Permissions behavior
|
||
If required permissions are missing:
|
||
- return `ok=false` with a short human error message (e.g., “Accessibility permission missing”)
|
||
- do not try to open System Settings from the automation endpoint
|
||
|
||
## Security (socket auth)
|
||
Clawdis’ socket is protected by:
|
||
- filesystem perms on the socket path (owner read/write only)
|
||
- server-side caller check:
|
||
- requires the caller’s code signature TeamID to be `Y5PE65HELJ`
|
||
- in `DEBUG` builds only, an explicit escape hatch allows same-UID clients when `CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` is set (development convenience)
|
||
|
||
This ensures “any local process” can’t drive the privileged surface just because it runs under the same macOS user.
|
||
|
||
## Next integration steps (after this doc)
|
||
1. Add Peekaboo as a git submodule (and required nested submodules).
|
||
2. Wire SwiftPM deps in `apps/macos/Package.swift` to import `PeekabooAutomationKit` + `PeekabooVisualizer`.
|
||
3. Extend `ClawdisIPC.Request` with `ui.*` commands (`see/click/type/scroll/wait/screenshot/windows/screens`).
|
||
4. Implement handlers in Clawdis.app and route through PeekabooAutomationKit services.
|
||
5. Update `clawdis-mac` output defaults (text + `--json`), and adjust any internal call sites that relied on JSON-by-default.
|