From 46b9006de2d7ce8fbc2f8ba04fe842ccc8a81fe7 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Fri, 19 Dec 2025 23:57:35 +0000
Subject: [PATCH] docs(browser): add MCP tool spec

---
 docs/AGENTS.default.md |   1 +
 docs/browser.md        | 228 +++++++++++++++++++++++++++++++++++++++++
 docs/mac/browser.md    | 156 +---------------------------
 3 files changed, 232 insertions(+), 153 deletions(-)
 create mode 100644 docs/browser.md

diff --git a/docs/AGENTS.default.md b/docs/AGENTS.default.md
index 45159291d..2c49c6edc 100644
--- a/docs/AGENTS.default.md
+++ b/docs/AGENTS.default.md
@@ -97,3 +97,4 @@ git commit -m "Add Clawd workspace"
 - Keep heartbeats enabled so the assistant can schedule reminders, monitor inboxes, and trigger camera captures.
 - For browser-driven verification, use `clawdis browser` (tabs/status/screenshot) with the clawd-managed Chrome profile.
 - For DOM inspection, use `clawdis browser eval|query|dom|snapshot` (and `--json`/`--out` when you need machine output).
+- For advanced actions, use `clawdis browser tool browser_* --args '{...}'` (Playwright MCP parity).
diff --git a/docs/browser.md b/docs/browser.md
new file mode 100644
index 000000000..959327c1f
--- /dev/null
+++ b/docs/browser.md
@@ -0,0 +1,228 @@
+---
+summary: "Spec: integrated browser control server + MCP tool dispatch"
+read_when:
+  - Adding agent-controlled browser automation
+  - Debugging why clawd is interfering with your own Chrome
+  - Implementing browser settings + lifecycle in the macOS app
+---
+
+# Browser (integrated) — clawd-managed Chrome
+
+Status: draft spec · Date: 2025-12-19
+
+Goal: give the **clawd** persona its own browser that is:
+- Visually distinct (lobster-orange, profile labeled "clawd").
+- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
+- Non-interfering with the user's own browser (separate profile + dedicated ports).
+
+This doc covers the macOS app/gateway side. It intentionally does not mandate
+Playwright vs Puppeteer; the key is the **contract** and the **separation guarantees**.
+
+## User-facing settings
+
+Add a dedicated settings section (preferably under **Tools** or its own "Browser" tab):
+
+- **Enable clawd browser** (`default: on`)
+  - When off: no browser is launched, and browser tools return "disabled".
+- **Browser control URL** (`default: http://127.0.0.1:18791`)
+  - Interpreted as the base URL of the local/remote browser-control server.
+  - If the URL host is not loopback, Clawdis must **not** attempt to launch a local
+    browser; it only connects.
+- **Accent color** (`default: #FF4500`, "lobster-orange")
+  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators
+    in Clawdis.
+
+Optional (advanced, can be hidden behind Debug initially):
+- **Use headless browser** (`default: off`)
+- **Attach to existing only** (`default: off`) — if on, never launch; only connect if
+  already running.
+
+### Port convention
+
+Clawdis already uses:
+- Gateway WebSocket: `18789`
+- Bridge (voice/node): `18790`
+
+For the clawd browser-control server, use "family" ports:
+- Browser control HTTP API: `18791` (bridge + 1)
+- Browser CDP/debugging port: `18792` (control + 1)
+- Canvas host HTTP (optional): `18793` (next free port; see `docs/configuration.md`)
+
+The user usually only configures the **control URL** (port `18791`). CDP is an
+internal detail.
+
+## Browser isolation guarantees (non-negotiable)
+
+1) **Dedicated user data dir**
+   - Never attach to or reuse the user's default Chrome profile.
+   - Store clawd browser state under an app-owned directory, e.g.:
+     - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
+     - or `~/.clawdis/browser/clawd/` (gateway/CLI)
+
+2) **Dedicated ports**
+   - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with
+     `agent-tools/browser-tools`).
+   - Default ports are `18791/18792` unless overridden.
+
+3) **Named tab/page management**
+   - The agent must be able to enumerate and target tabs deterministically (by
+     stable `targetId` or equivalent), not "last tab".
+
+## Browser selection (macOS)
+
+On startup (when enabled + local URL), Clawdis chooses the browser executable
+in this order:
+1) **Google Chrome Canary** (if installed)
+2) **Chromium** (if installed)
+3) **Google Chrome** (fallback)
+
+Implementation detail: detection is by existence of the `.app` bundle under
+`/Applications` (and optionally `~/Applications`), then using the resolved
+executable path.
+
+Rationale:
+- Canary/Chromium are easy to visually distinguish from the user's daily driver.
+- Chrome fallback ensures the feature works on a stock machine.
+
+## Visual differentiation ("lobster-orange")
+
+The clawd browser should be obviously different at a glance:
+- Profile name: **clawd**
+- Profile color: **#FF4500**
+
+Preferred behavior:
+- Seed/patch the profile's preferences on first launch so the color + name persist.
+
+Fallback behavior:
+- If preferences patching is not reliable, open with the dedicated profile and let
+  the user set the profile color/name once via Chrome UI; it must persist because
+  the `userDataDir` is persistent.
+
+## Control server contract (proposed)
+
+Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
+state without touching the user's Chrome.
+
+Minimum endpoints/methods (names illustrative):
+
+- `browser.status`
+  - returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
+- `browser.start`
+  - starts the browser-control server + browser (no-op if already running)
+- `browser.stop`
+  - stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
+- `browser.tabs.list`
+  - returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
+- `browser.tabs.open`
+  - params: `{ url, newTab?: true }` → returns `{ targetId }`
+- `browser.tabs.focus`
+  - params: `{ targetId }`
+- `browser.tabs.close`
+  - params: `{ targetId }`
+- `browser.screenshot`
+  - params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)
+
+DOM + inspection (v1):
+- `browser.eval`
+  - params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
+- `browser.query`
+  - params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
+- `browser.dom`
+  - params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
+- `browser.snapshot`
+  - params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
+  - `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
+  - `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)
+
+Nice-to-have (later):
+- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids
+- `browser.tool` dispatch that mirrors Playwright MCP tool names for quick feature parity
+
+### "Is it open or closed?"
+
+"Open" means:
+- the control server is reachable at the configured URL **and**
+- it reports a live browser connection.
+
+"Closed" means:
+- control server not reachable, or server reports no browser.
+
+Clawdis should treat "open/closed" as a health check (fast path), not by scanning
+global Chrome processes (avoid false positives).
+
+## Interaction with the agent (clawd)
+
+The agent should use browser tools only when:
+- enabled in settings
+- control URL is configured
+
+If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").
+
+The agent should not assume tabs are ephemeral. It should:
+- call `browser.tabs.list` to discover existing tabs first
+- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
+- avoid opening duplicate tabs unless asked
+
+## Tool dispatch (Playwright MCP parity)
+
+Clawdis exposes a generic tool dispatcher for Playwright MCP-style tools:
+
+`POST /tool` with JSON `{ name: "browser_*", args: { ... }, targetId?: "..." }`
+
+CLI helper:
+`clawdis browser tool browser_* --args '{...}'`
+
+Supported tool names:
+- `browser_close`
+- `browser_resize`
+- `browser_console_messages`
+- `browser_network_requests`
+- `browser_handle_dialog`
+- `browser_evaluate`
+- `browser_file_upload`
+- `browser_fill_form`
+- `browser_install` (no-op; uses system Chrome/Chromium)
+- `browser_press_key`
+- `browser_type`
+- `browser_navigate`
+- `browser_navigate_back`
+- `browser_run_code`
+- `browser_take_screenshot`
+- `browser_snapshot`
+- `browser_click`
+- `browser_drag`
+- `browser_hover`
+- `browser_select_option`
+- `browser_tabs`
+- `browser_wait_for`
+- `browser_pdf_save`
+- `browser_start_tracing`
+- `browser_stop_tracing`
+- `browser_verify_element_visible`
+- `browser_verify_text_visible`
+- `browser_verify_list_visible`
+- `browser_verify_value`
+- `browser_mouse_move_xy`
+- `browser_mouse_click_xy`
+- `browser_mouse_drag_xy`
+- `browser_generate_locator`
+
+Notes:
+- `browser_file_upload` and `browser_handle_dialog` are modal-only; they only
+  work when a file chooser/dialog modal state is present.
+- `browser_snapshot` returns a Playwright-for-AI snapshot (use for follow-up actions).
+
+## Security & privacy notes
+
+- The clawd browser profile is app-owned; it may contain logged-in sessions.
+  Treat it as sensitive data.
+- The control server must bind to loopback only by default (`127.0.0.1`) unless the
+  user explicitly configures a non-loopback URL.
+- Never reuse or copy the user's default Chrome profile.
+
+## Non-goals (for the first cut)
+
+- Cross-device "sync" of tabs between Mac and Pi.
+- Sharing the user's logged-in Chrome sessions automatically.
+- General-purpose web scraping; this is primarily for "close-the-loop" verification
+  and interaction.
diff --git a/docs/mac/browser.md b/docs/mac/browser.md
index c7b12b95c..31608bb95 100644
--- a/docs/mac/browser.md
+++ b/docs/mac/browser.md
@@ -1,161 +1,11 @@
 ---
-summary: "Spec: clawd-managed Chrome/Chromium instance (separate profile, lobster-orange, tab management)"
+summary: "Redirect: mac/browser.md → browser.md"
 read_when:
   - Adding agent-controlled browser automation
   - Debugging why clawd is interfering with your own Chrome
   - Implementing browser settings + lifecycle in the macOS app
 ---
 
-# Browser (macOS app) — clawd-managed Chrome
+# Browser (macOS app) — moved
 
-Status: draft spec · Date: 2025-12-13
-
-Goal: give the **clawd** persona its own browser that is:
-- Visually distinct (lobster-orange, profile labeled “clawd”).
-- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
-- Non-interfering with the user’s own browser (separate profile + dedicated ports).
-
-This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer yet; the key is the **contract** and the **separation guarantees**.
-
-## User-facing settings
-
-Add a dedicated settings section (preferably under **Tools** or its own “Browser” tab):
-
-- **Enable clawd browser** (`default: on`)
-  - When off: no browser is launched, and browser tools return “disabled”.
-- **Browser control URL** (`default: http://127.0.0.1:18791`)
-  - Interpreted as the base URL of the local/remote browser-control server.
-  - If the URL host is not loopback, Clawdis must **not** attempt to launch a local browser; it only connects.
-- **Accent color** (`default: #FF4500`, “lobster-orange”)
-  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdis.
-
-Optional (advanced, can be hidden behind Debug initially):
-- **Use headless browser** (`default: off`)
-- **Attach to existing only** (`default: off`) — if on, never launch; only connect if already running.
-
-### Port convention
-
-Clawdis already uses:
-- Gateway WebSocket: `18789`
-- Bridge (voice/node): `18790`
-
-For the clawd browser-control server, use “family” ports:
-- Browser control HTTP API: `18791` (bridge + 1)
-- Browser CDP/debugging port: `18792` (control + 1)
-- Canvas host HTTP (optional): `18793` (next free port; see `docs/configuration.md`)
-
-The user usually only configures the **control URL** (port `18791`). CDP is an internal detail.
-
-## Browser isolation guarantees (non-negotiable)
-
-1) **Dedicated user data dir**
-   - Never attach to or reuse the user’s default Chrome profile.
-   - Store clawd browser state under an app-owned directory, e.g.:
-     - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
-     - or `~/.clawdis/browser/clawd/` (gateway/CLI)
-
-2) **Dedicated ports**
-   - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with `agent-tools/browser-tools`).
-   - Default ports are `18791/18792` unless overridden.
-
-3) **Named tab/page management**
-   - The agent must be able to enumerate and target tabs deterministically (by stable `targetId` or equivalent), not “last tab”.
-
-## Browser selection (macOS)
-
-On startup (when enabled + local URL), Clawdis chooses the browser executable in this order:
-1) **Google Chrome Canary** (if installed)
-2) **Chromium** (if installed)
-3) **Google Chrome** (fallback)
-
-Implementation detail: detection is by existence of the `.app` bundle under `/Applications` (and optionally `~/Applications`), then using the resolved executable path.
-
-Rationale:
-- Canary/Chromium are easy to visually distinguish from the user’s daily driver.
-- Chrome fallback ensures the feature works on a stock machine.
-
-## Visual differentiation (“lobster-orange”)
-
-The clawd browser should be obviously different at a glance:
-- Profile name: **clawd**
-- Profile color: **#FF4500**
-
-Preferred behavior:
-- Seed/patch the profile’s preferences on first launch so the color + name persist.
-
-Fallback behavior:
-- If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; it must persist because the `userDataDir` is persistent.
-
-## Control server contract (proposed)
-
-Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user’s Chrome.
-
-Minimum endpoints/methods (names illustrative):
-
-- `browser.status`
-  - returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
-- `browser.start`
-  - starts the browser-control server + browser (no-op if already running)
-- `browser.stop`
-  - stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
-- `browser.tabs.list`
-  - returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
-- `browser.tabs.open`
-  - params: `{ url, newTab?: true }` → returns `{ targetId }`
-- `browser.tabs.focus`
-  - params: `{ targetId }`
-- `browser.tabs.close`
-  - params: `{ targetId }`
-- `browser.screenshot`
-  - params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)
-
-DOM + inspection (v1):
-- `browser.eval`
-  - params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
-- `browser.query`
-  - params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
-- `browser.dom`
-  - params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
-- `browser.snapshot`
-  - params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
-  - `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
-  - `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)
-
-Nice-to-have (later):
-- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids
-
-### “Is it open or closed?”
-
-“Open” means:
-- the control server is reachable at the configured URL **and**
-- it reports a live browser connection.
-
-“Closed” means:
-- control server not reachable, or server reports no browser.
-
-Clawdis should treat “open/closed” as a health check (fast path), not by scanning global Chrome processes (avoid false positives).
-
-## Interaction with the agent (clawd)
-
-The agent should use browser tools only when:
-- enabled in settings
-- control URL is configured
-
-If disabled, tools must fail fast with a friendly error (“Browser disabled in settings”).
-
-The agent should not assume tabs are ephemeral. It should:
-- call `browser.tabs.list` to discover existing tabs first
-- reuse an existing tab when appropriate (e.g. a persistent “main” tab)
-- avoid opening duplicate tabs unless asked
-
-## Security & privacy notes
-
-- The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
-- The control server must bind to loopback only by default (`127.0.0.1`) unless the user explicitly configures a non-loopback URL.
-- Never reuse or copy the user’s default Chrome profile.
-
-## Non-goals (for the first cut)
-
-- Cross-device “sync” of tabs between Mac and Pi.
-- Sharing the user’s logged-in Chrome sessions automatically.
-- General-purpose web scraping; this is primarily for “close-the-loop” verification and interaction.
+This doc moved to `docs/browser.md`.
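
The `POST /tool` dispatcher and the "open/closed" health-check rule in the new `docs/browser.md` can be exercised from a small client. What follows is a minimal TypeScript sketch, not the Clawdis implementation. It assumes Node 18+ (global `fetch`), the default control URL, and an untyped JSON response from `/tool`, since the spec does not define a response shape. The helper names (`buildToolRequest`, `isLoopbackUrl`, `isOpen`, `callTool`) are hypothetical. `BrowserStatus` only mirrors a subset of the `browser.status` fields listed in the spec.

```typescript
// Hypothetical client sketch for the clawd browser-control server.
// Assumptions (not defined by the spec): Node 18+ global `fetch`, an untyped
// JSON response from `POST /tool`, and every helper name in this file.

const DEFAULT_CONTROL_URL = "http://127.0.0.1:18791";

// Body of `POST /tool`: { name: "browser_*", args: { ... }, targetId?: "..." }.
interface ToolRequest {
  name: string;
  args: Record<string, unknown>;
  targetId?: string;
}

// Subset of the `browser.status` fields listed in the spec.
interface BrowserStatus {
  enabled: boolean;
  url: string;
  running: boolean;
  pid?: number;
  ports: { control: number; cdp: number };
}

function buildToolRequest(
  name: string,
  args: Record<string, unknown> = {},
  targetId?: string,
): ToolRequest {
  // Only Playwright MCP-style `browser_*` names are dispatched.
  if (!name.startsWith("browser_")) {
    throw new Error(`not a browser_* tool: ${name}`);
  }
  const req: ToolRequest = { name, args };
  // Omit targetId entirely when not given, so the server picks its default tab.
  if (targetId !== undefined) req.targetId = targetId;
  return req;
}

// A non-loopback control URL means "connect only": never launch a local browser.
function isLoopbackUrl(controlUrl: string): boolean {
  const host = new URL(controlUrl).hostname; // IPv6 hosts keep their brackets
  return host === "localhost" || host === "[::1]" || /^127(\.\d{1,3}){3}$/.test(host);
}

// "Open" = server reachable (status !== null) and reporting a live browser.
function isOpen(status: BrowserStatus | null): boolean {
  return status !== null && status.running;
}

async function callTool(
  req: ToolRequest,
  controlUrl: string = DEFAULT_CONTROL_URL,
): Promise<unknown> {
  const res = await fetch(new URL("/tool", controlUrl), {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) {
    throw new Error(`${req.name} failed: HTTP ${res.status}`);
  }
  return res.json();
}
```

For example, `callTool(buildToolRequest("browser_navigate", { url: "https://example.com" }))` should correspond to `clawdis browser tool browser_navigate --args '{"url":"https://example.com"}'`, assuming `browser_navigate` takes a `url` argument as it does in Playwright MCP. Keeping request building and status interpretation as pure functions lets the loopback and open/closed rules be checked without a running server.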