let5see/clawdbot

Fork 0

Files

Peter Steinberger 29907a4c3f docs(mac): drop screenshot alias plan

2025-12-13 02:51:48 +00:00

7.2 KiB

Raw Blame History

summary, read_when

summary

read_when

Plan for integrating Peekaboo automation + visualizer into Clawdis macOS app (via clawdis-mac)

Adding UI automation commands

Integrating Peekaboo as a submodule

Changing clawdis-mac IPC/output formats

Peekaboo in Clawdis (macOS UI automation + visualizer)

Goal

Reuse Peekaboo’s mac automation “core” inside Clawdis.app so we piggyback on Clawdis’ existing TCC grants (Screen Recording, Accessibility, etc.). The CLI (clawdis-mac) stays a thin synchronous trigger surface for single actions (no batches), returning errors cleanly.

Non-goals:

No AI/agent runtime parts from Peekaboo (no Tachikoma/MCP/Commander entrypoints).
No auto-onboarding or System Settings deep-linking from the automation layer (Clawdis onboarding already handles that).

Where code lives

Clawdis.app (macOS): owns all automation + visualization + TCC prompts.
clawdis-mac CLI: sends one request, waits, prints result, exits non-zero on failure.
Gateway/Node/TS: shells out to clawdis-mac when it needs TCC-backed actions.

Transport: existing UNIX domain socket (controlSocketPath) already used by clawdis-mac.

Dependencies (submodule strategy)

Integrate Peekaboo via git submodule (nested submodules OK).

Consume only:

PeekabooAutomationKit (AX automation, element detection, capture helpers; no Tachikoma/MCP).
AXorcist (input driving / AX helpers).
PeekabooVisualizer (overlay visualizations).

Important nuance:

PeekabooVisualizer currently ships as the PeekabooVisualizer product inside PeekabooCore/Package.swift. That package declares other dependencies (including a path dependency to Tachikoma). SwiftPM will still need those paths to exist during dependency resolution even if we don’t build those targets.
- If this is too annoying for Clawdis, the follow-up is to extract PeekabooVisualizer into its own standalone Swift package that depends only on PeekabooFoundation/PeekabooProtocols/PeekabooExternalDependencies.

IPC / CLI surface

Namespacing

Add new automation commands behind a ui prefix:

clawdis-mac ui … for UI automation + visualization-related actions.
Keep existing top-level commands (notify, run, canvas …, etc.) for compatibility, but do a clean cutover for screenshots: remove the legacy top-level screenshot command and ship only clawdis-mac ui screenshot.

Output format

Change clawdis-mac to default to human text output:

Default: plain text; errors are string messages to stderr; exit codes indicate success/failure.
--json: structured output (for agents/scripts) with stable schemas.

This applies globally, not only ui commands.

Timeouts

Default timeout for UI actions: 10 seconds end-to-end (CLI already defaults to 10s).

CLI: keep the fail-fast default at 10s (unless a command explicitly requests longer).
Server: only has a ~5s read/decode timeout today; UI operations must also enforce their own per-action timeout so “wait for element” can fail deterministically.

Coordinate model (multi-display)

Requirement: coordinates are per screen, not global.

Proposed API shape:

Requests accept screenIndex + {x, y} in that screen’s local coordinate space.
Clawdis.app converts to global CG coordinates using NSScreen.screens[screenIndex].frame.origin.
Responses should echo both:
- The resolved screenIndex
- The local {x, y} and bounds
- Optionally the global {x, y} for debugging

Ordering: use NSScreen.screens ordering consistently (documented in the CLI help + JSON schema).

Targeting (per app/window)

Expose window/app targeting in the IPC surface (based on Peekaboo’s existing WindowTarget model):

frontmost
by app name / bundle id
by window title substring
by (app, index)
by window id

All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).

“See” + click packs (Playwright-style)

Peekaboo already has the core ingredients:

element detection yielding stable IDs (e.g., B1, T3)
bounds + labels/values
session IDs to allow follow-up actions without re-scanning

Clawdis’s ui see should:

capture (optionally targeted) window/screen
return a session id
return a list of elements with {id, type, label/value?, bounds}
optionally return screenshot path/bytes (pref: path)

Visualizer integration

Visualizer must be user-toggleable via a Clawdis setting.

Implementation sketch:

Add a Clawdis UserDefaults-backed setting (e.g. clawdis.ui.visualizerEnabled).
Implement Peekaboo’s VisualizerSettingsProviding in Clawdis (visualizerEnabled, animation speed, and per-effect toggles).
Create a Clawdis-specific AutomationFeedbackClient that forwards PeekabooAutomationKit feedback events into a shared VisualizerCoordinator.

Current state:

PeekabooVisualizer already includes the visualization implementation (SwiftUI overlay views + coordinator).

Open requirement:

“Any AX event should be clickable.” Today the visualizer is display-only; the likely follow-up is:
- make the annotated element overlays tappable (debug tool)
- surface tap → element id → send a ui click --element <id> --session <sid> request back through Clawdis’ control channel (or a local callback if the visualizer runs inside the app)

Screenshots (legacy → Peekaboo takeover)

Clawdis currently has a legacy screenshot request returning raw PNG bytes in Response.payload.

Migration plan:

Replace capture implementation with PeekabooAutomationKit’s capture service so we share:
- per-screen mapping
- window/app targeting
- visual feedback (flash / watch HUD) when enabled
Prefer writing images to a file path on the app side and returning the path (text-friendly), with --json providing the structured metadata.
No aliases: remove the old Request.screenshot and introduce a new Request.uiScreenshot (or similar) so the new behavior is explicit and there’s no “legacy mode” to maintain.

Permissions behavior

If required permissions are missing:

return ok=false with a short human error message (e.g., “Accessibility permission missing”)
do not try to open System Settings from the automation endpoint

Security (socket auth)

Clawdis’ socket is protected by:

filesystem perms on the socket path (owner read/write only)
server-side caller check:
- requires the caller’s code signature TeamID to be Y5PE65HELJ
- in DEBUG builds only, an explicit escape hatch allows same-UID clients when CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1 is set (development convenience)

This ensures “any local process” can’t drive the privileged surface just because it runs under the same macOS user.

Next integration steps (after this doc)

Add Peekaboo as a git submodule (and required nested submodules).
Wire SwiftPM deps in apps/macos/Package.swift to import PeekabooAutomationKit + PeekabooVisualizer.
Extend ClawdisIPC.Request with ui.* commands (see/click/type/scroll/wait/screenshot/windows/screens).
Implement handlers in Clawdis.app and route through PeekabooAutomationKit services.
Update clawdis-mac output defaults (text + --json), and adjust any internal call sites that relied on JSON-by-default.

7.2 KiB Raw Blame History Unescape Escape