Run the Node-based Clawdis/clawdis gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/XPC and wiring the existing “Clawdis Active” toggle to start/stop the child.
Run the Node-based Clawdis/clawdis gateway as a direct child of the LSUIElement app (instead of a launchd agent) while keeping all TCC-sensitive work inside the Swift app/broker layer and wiring the existing “Clawdis Active” toggle to start/stop the child.
## When to prefer the child-process mode
- You want gateway lifetime strictly coupled to the menu-bar app (dies when the app quits) and controlled by the “Clawdis Active” toggle without touching launchd.
@@ -18,12 +18,13 @@ Run the Node-based Clawdis/clawdis gateway as a direct child of the LSUIElement
## Tradeoffs vs. launchd
- **Pros:** tighter coupling to UI state; simpler surface (no plist install/bootout); easier to stream stdout/stderr; fewer moving parts for beta users.
- **Cons:** no built-in KeepAlive/login auto-start; app crash kills gateway; you must build your own restart/backoff; Activity Monitor will show both processes under the app; still need correct TCC handling (see below).
- **TCC:** behaviorally, child processes often inherit the parent app’s “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/XPC so prompts stay tied to the signed app bundle.
- **TCC:** behaviorally, child processes often inherit the parent app’s “responsible process” for TCC, but this is *not a contract*. Continue to route all protected actions through the Swift app/broker so prompts stay tied to the signed app bundle.
## TCC guardrails (must keep)
- Screen Recording, Accessibility, mic, and speech prompts must originate from the Swift app/XPC. The Node child should never call these APIs directly; use the existing XPC/CLI broker (`clawdis-mac`) for:
- Screen Recording, Accessibility, mic, and speech prompts must originate from the signed Swift app/broker. The Node child should never call these APIs directly; use the CLI broker (`clawdis-mac`) for:
-`ensure-permissions`
-`ui screenshot`/ ScreenCaptureKit work
-`ui screenshot`(via PeekabooBridge host)
- other `ui …` automation (see/click/type/scroll/wait) when implemented
- mic/speech permission checks
- notifications
- shell runs that need `needs-screen-recording`
@@ -48,7 +49,7 @@ Run the Node-based Clawdis/clawdis gateway as a direct child of the LSUIElement
## Packaging and signing
- Bundle the gateway payload (dist + production node_modules) under `Contents/Resources/Gateway/`; rely on host Node ≥22 instead of embedding a runtime.
- Codesign native addons and dylibs inside the bundle; no nested runtime binary to sign now.
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/XPC.
- Host runtime should not call TCC APIs directly; keep privileged work inside the app/broker.
## Logging and observability
- Stream child stdout/stderr to `/tmp/clawdis-gateway.log`; surface the last N lines in the Debug tab.
@@ -58,14 +59,14 @@ Run the Node-based Clawdis/clawdis gateway as a direct child of the LSUIElement
## Failure/edge cases
- App crash/quit kills the gateway. Decide if that is acceptable for the deployment tier; otherwise, stick with launchd for production and keep child-process for dev/experiments.
- If the gateway exits repeatedly, back off (e.g., 1s/2s/5s/10s) and give up after N attempts with a menu warning.
- Respect the existing pause semantics: when paused, the XPC should return `ok=false, "clawdis paused"`; the gateway should avoid calling privileged routes while paused.
- Respect the existing pause semantics: when paused, the broker should return `ok=false, "clawdis paused"`; the gateway should avoid calling privileged routes while paused.
## Open questions / follow-ups
- Do we need dual-mode (launchd for prod, child for dev)? If yes, gate via a setting or build flag.
- Embedding a runtime is off the table for now; we rely on host Node for size/simplicity. Revisit only if host PATH drift becomes painful.
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via XPC?
- Do we want a tiny signed helper for rare TCC actions that cannot be brokered via the Swift app/broker?
## Decision snapshot (current recommendation)
- Keep all TCC surfaces in the Swift app/XPC.
- Keep all TCC surfaces in the Swift app/broker (control socket + PeekabooBridgeHost).
- Implement `GatewayProcessManager` with Swift Subprocess to start/stop the gateway on the “Clawdis Active” toggle.
- Maintain the launchd path as a fallback for uptime/login persistence until child-mode proves stable.
summary: "Plan for integrating Peekaboo automation + visualizer into Clawdis macOS app (via clawdis-mac)"
summary: "Plan for integrating Peekaboo automation into Clawdis via PeekabooBridge (socket-based TCC broker)"
read_when:
- Adding UI automation commands
- Integrating Peekaboo as a submodule
- Changing clawdis-mac IPC/output formats
---
# Peekaboo in Clawdis (macOS UI automation + visualizer)
# Peekaboo Bridge in Clawdis (macOS UI automation broker)
## Goal
Reuse Peekaboo’s mac automation “core” inside **Clawdis.app** so we piggyback on Clawdis’ existing TCC grants (Screen Recording, Accessibility, etc.). The CLI (`clawdis-mac`) stays a thin synchronous trigger surface for **single actions** (no batches), returning errors cleanly.
## TL;DR
- **Peekaboo removed its XPC helper** and now exposes privileged automation via a **UNIX domain socket bridge** (`PeekabooBridge` / `PeekabooBridgeHost`, socket name `bridge.sock`).
- Clawdis integrates by **hosting the same bridge** inside **Clawdis.app** (optional, user-toggleable), and by making `clawdis-mac ui …` act as a **bridge client**.
- For **visualizations**, we keep them in **Peekaboo.app** (best UX); Clawdis stays a thin broker host. No visualizer toggle in Clawdis.
Non-goals:
- No AI/agent runtime parts from Peekaboo (no Tachikoma/MCP/Commander entrypoints).
- No auto-onboarding or System Settings deep-linking from the automation layer (Clawdis onboarding already handles that).
- No auto-launching Peekaboo.app.
- No onboarding deeplinks from the automation endpoint (Clawdis onboarding already handles permissions).
- No AI provider/agent runtime dependencies in Clawdis (avoid pulling Tachikoma/MCP into the Clawdis app/CLI).
- **Peekaboo.app** (preferred; also provides visualizations + controls)
- **Clawdis.app** (secondary; “thin host” only)
- **Bridge clients** (trigger single actions):
-`clawdis-mac ui …`
- Node/Gateway shells out to `clawdis-mac`
Consume only:
-`PeekabooAutomationKit` (AX automation, element detection, capture helpers; no Tachikoma/MCP).
-`AXorcist` (input driving / AX helpers).
-`PeekabooVisualizer` (overlay visualizations).
### Host discovery (client-side)
Order is deliberate:
1. Peekaboo.app host (full UX)
2. Clawdis.app host (piggyback on Clawdis permissions)
Important nuance:
-`PeekabooAutomationKit` is a standalone SwiftPM package and does **not** require Tachikoma/MCP/Commander.
-`PeekabooVisualizer` ships as a product inside `PeekabooCore/Package.swift`. That package declares other dependencies (including a path dependency to Tachikoma). SwiftPM will still need those paths to exist during dependency resolution even if we don’t build those targets.
- If this becomes annoying for Clawdis, the follow-up is to extract `PeekabooVisualizer` into its own standalone Swift package that depends only on `PeekabooFoundation`/`PeekabooProtocols`/`PeekabooExternalDependencies`.
Socket paths (convention; exact paths must match Peekaboo):
- **Host side (Clawdis.app)**: `PeekabooBridgeHost` + the minimal Peekaboo services needed to implement operations.
What Clawdis should *not* embed:
- **Visualizer UI**: keep it in Peekaboo.app for now (toggle + controls live there).
- **XPC**: don’t reintroduce helper targets; use the bridge.
## IPC / CLI surface
### Namespacing
Add new automation commands behind a `ui` prefix:
-`clawdis-mac ui …` for UI automation + visualization-related actions.
- Keep existing top-level commands (`notify`, `run`, `canvas …`, etc.) for compatibility, but do a clean cutover for screenshots: remove the legacy top-level `screenshot` command and ship only `clawdis-mac ui screenshot`.
- Ship only `clawdis-mac ui screenshot` (no aliases).
### Output format
Change `clawdis-mac` to default to human text output:
@@ -50,14 +86,14 @@ This applies globally, not only `ui` commands.
Note (current state as of 2025-12-13): `clawdis-mac` prints text by default; use `--json` for structured output.
### Timeouts
Default timeout for UI actions: **10 seconds** end-to-end (CLI already defaults to 10s).
- CLI: keep the fail-fast default at 10s (unless a command explicitly requests longer).
- Server: only has a ~5s read/decode timeout today; UI operations must also enforce their own per-action timeout so “wait for element” can fail deterministically.
Default timeout for UI actions: **10 seconds** end-to-end.
## Coordinate model (multi-display)
Requirement: coordinates are **per screen**, not global.
Proposed API shape:
Standardize for the CLI (agent-friendly): **top-left origin per screen**.
Proposed request shape:
- Requests accept `screenIndex` + `{x, y}` in that screen’s local coordinate space.
- Clawdis.app converts to global CG coordinates using `NSScreen.screens[screenIndex].frame.origin`.
- Responses should echo both:
@@ -68,53 +104,48 @@ Proposed API shape:
Ordering: use `NSScreen.screens` ordering consistently (documented in the CLI help + JSON schema).
## Targeting (per app/window)
Expose window/app targeting in the IPC surface (based on Peekaboo’s existing `WindowTarget` model):
Expose window/app targeting in the UI surface (align with Peekaboo targeting):
- frontmost
- by app name / bundle id
- by window title substring
- by (app, index)
- by window id
Current `clawdis-mac ui …` support:
-`--bundle-id <id>` for app targeting
-`--window-index <n>` (0-based) for disambiguating within an app when capturing (see/screenshot)
All “see/click/type/scroll/wait” requests should accept a target (default: frontmost).
- return a screenshot **file path** (default: temp directory)
- return a list of elements (text or JSON)
Snapshot lifecycle requirement:
-Clawdis runs long-lived in memory, so “snapshot state” should be **in-memory by default** (no disk-backed JSON concept).
-Peekaboo already supports this via an `InMemorySnapshotManager` (keep disk-backed snapshots as an optional debug mode later).
-Host apps are long-lived, so snapshot state should be **in-memory by default**.
-Snapshot scoping: “implicit snapshot” is **per target bundle id** (reuse last snapshot for that app when snapshot id is omitted).
Practical flow (agent-friendly):
-`clawdis-mac ui frontmost` returns the focused app (bundle id) + focused window (title/id) so follow-up calls can pass `--bundle-id …`.
-`clawdis-mac ui see --bundle-id X` updates the implicit snapshot for `X`.
-`clawdis-mac ui click --bundle-id X --on B1` reuses the most recent snapshot for `X` when `--snapshot-id` is omitted.
## Visualizer integration
Visualizer must be user-toggleable via a Clawdis setting.
Implementation sketch:
- Add a Clawdis UserDefaults-backed setting (e.g. `clawdis.ui.visualizerEnabled`).
- Implement Peekaboo’s `VisualizerSettingsProviding` in Clawdis (`visualizerEnabled`, animation speed, and per-effect toggles).
- Create a Clawdis-specific `AutomationFeedbackClient` that forwards PeekabooAutomationKit feedback events into a shared `VisualizerCoordinator`.
Current state:
-`PeekabooVisualizer` already includes the visualization implementation (SwiftUI overlay views + coordinator).
The visualizer is intentionally display-only (no clickable overlays needed).
Keep visualizations in **Peekaboo.app** for now.
- Clawdis hosts the bridge, but does not render overlays.
- Any “visualizer enabled/disabled” setting is controlled in Peekaboo.app.
## Screenshots (legacy → Peekaboo takeover)
Clawdis uses `clawdis-mac ui screenshot` and returns a file path (default location: temp directory) instead of raw image bytes.
Migration plan:
-Replace capture implementation with PeekabooAutomationKit’s capture service so we share:
-per-screen mapping
- window/app targeting
- visual feedback (flash / watch HUD) when enabled
- Keep writing images to a file path on the app side and returning the path (text-friendly), with `--json` providing the structured metadata.
- No aliases: remove the old `Request.screenshot` and introduce a new `Request.uiScreenshot` (or similar) so the new behavior is explicit and there’s no “legacy mode” to maintain.
-Bridge host performs capture and returns a temp file path.
-No legacy aliases; make the old screenshot surface disappear cleanly.
## Permissions behavior
If required permissions are missing:
@@ -122,17 +153,32 @@ If required permissions are missing:
- do not try to open System Settings from the automation endpoint
## Security (socket auth)
Clawdis’ socket is protected by:
Both hosts must enforce:
- filesystem perms on the socket path (owner read/write only)
- server-side caller check:
- requires the caller’s code signature TeamID to be `Y5PE65HELJ`
-in `DEBUG` builds only, an explicit escape hatch allows same-UID clients when `CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` is set (development convenience)
- server-side caller validation:
- require the caller’s code signature TeamID to be `Y5PE65HELJ`
-optional bundle-id allowlist for tighter scoping
This ensures “any local process” can’t drive the privileged surface just because it runs under the same macOS user.
summary: "macOS XPC architecture for Clawdis app, CLI helper, and gateway bridge"
summary: "macOS IPC architecture for Clawdis app, CLI helper, and gateway bridge (control socket + XPC + PeekabooBridge)"
read_when:
- Editing XPC contracts or menu bar app IPC
- Editing IPC contracts or menu bar app IPC
---
# Clawdis macOS XPC architecture (Dec 2025)
# Clawdis macOS IPC architecture (Dec 2025)
Note: the current implementation primarily uses a local UNIX-domain control socket (`controlSocketPath`) between `clawdis-mac` and the app. This doc describes the intended long-term XPC/Mach-service architecture and the security constraints; update it as the implementation converges.
Note: the current implementation primarily uses a local UNIX-domain control socket (`controlSocketPath`) between `clawdis-mac` and the app. This doc captures the intended long-term Mach/XPC direction and the security constraints, and also documents the separate PeekabooBridge socket used for UI automation.
## Goals
- Single GUI app instance that owns all TCC-facing work (notifications, screen recording, mic, speech, AppleScript).
- A small surface for automation: the `clawdis-mac` CLI and the Node gateway talk to the app via a local XPC channel.
- A small surface for automation: the `clawdis-mac` CLI and the Node gateway talk to the app via local IPC.
- Predictable permissions: always the same signed bundle ID, launched by launchd, so TCC grants stick.
- Limit who can connect: only signed clients from our team (with an explicit DEBUG-only escape hatch for development).
## How it works
### Control socket (current)
-`clawdis-mac` talks to the app via a local UNIX socket (`controlSocketPath`) for app-specific requests (notify, status, ensure-permissions, run, etc.).
### PeekabooBridge (UI automation)
- UI automation uses a separate UNIX socket named `bridge.sock` and the PeekabooBridge JSON protocol.
- Host preference order (client-side): Peekaboo.app → Clawdis.app → local execution.
- Security: bridge hosts require TeamID `Y5PE65HELJ`; DEBUG-only same-UID escape hatch is guarded by `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (Peekaboo convention).
- See: `docs/mac/peekaboo.md` for the Clawdis plan and naming.
### Mach/XPC (future direction)
- The app registers a Mach service named `com.steipete.clawdis.xpc` via a user LaunchAgent at `~/Library/LaunchAgents/com.steipete.clawdis.plist`.
- The launch agent runs `dist/Clawdis.app/Contents/MacOS/Clawdis` with `RunAtLoad=true`, `KeepAlive=false`, and a `MachServices` entry for the XPC name.
- The app hosts the XPC listener (`NSXPCListener(machServiceName:)`) and exports `ClawdisXPCService`.
@@ -35,6 +45,8 @@ Note: the current implementation primarily uses a local UNIX-domain control sock
- RunAtLoad without KeepAlive means the app starts once; if it crashes it stays down (no unwanted respawn), but CLI calls will re-spawn via launchd.
## Hardening notes
- Prefer requiring a TeamID match for all privileged surfaces. The codebase currently has a `DEBUG`-only same-UID escape hatch gated behind `CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` for local development.
- Prefer requiring a TeamID match for all privileged surfaces.
- Clawdis control socket: `CLAWDIS_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
- PeekabooBridge: `PEEKABOO_ALLOW_UNSIGNED_SOCKET_CLIENTS=1` (DEBUG-only) may allow same-UID callers for local development.
- All communication remains local-only; no network sockets are exposed.
- TCC prompts originate only from the GUI app bundle; run scripts/package-mac-app.sh so the signed bundle ID stays stable.
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.