docs(browser): add MCP tool spec
@@ -97,3 +97,4 @@ git commit -m "Add Clawd workspace"
- Keep heartbeats enabled so the assistant can schedule reminders, monitor inboxes, and trigger camera captures.
- For browser-driven verification, use `clawdis browser` (tabs/status/screenshot) with the clawd-managed Chrome profile.
- For DOM inspection, use `clawdis browser eval|query|dom|snapshot` (and `--json`/`--out` when you need machine output).
- For advanced actions, use `clawdis browser tool browser_* --args '{...}'` (Playwright MCP parity).
228
docs/browser.md
Normal file
@@ -0,0 +1,228 @@
---
summary: "Spec: integrated browser control server + MCP tool dispatch"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
- Implementing browser settings + lifecycle in the macOS app
---

# Browser (integrated) — clawd-managed Chrome

Status: draft spec · Date: 2025-12-19

Goal: give the **clawd** persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled "clawd").
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user's own browser (separate profile + dedicated ports).

This doc covers the macOS app/gateway side. It intentionally does not mandate
Playwright vs Puppeteer; the key is the **contract** and the **separation guarantees**.

## User-facing settings

Add a dedicated settings section (preferably under **Tools** or its own "Browser" tab):

- **Enable clawd browser** (`default: on`)
  - When off: no browser is launched, and browser tools return "disabled".
- **Browser control URL** (`default: http://127.0.0.1:18791`)
  - Interpreted as the base URL of the local/remote browser-control server.
  - If the URL host is not loopback, Clawdis must **not** attempt to launch a local
    browser; it only connects.
- **Accent color** (`default: #FF4500`, "lobster-orange")
  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators
    in Clawdis.

Optional (advanced, can be hidden behind Debug initially):
- **Use headless browser** (`default: off`)
- **Attach to existing only** (`default: off`) — if on, never launch; only connect if
  already running.
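
The launch rule implied by these settings can be sketched as follows. This is a minimal illustration, not the Clawdis implementation: the function names and the choice to treat the literal hostname `localhost` as loopback are assumptions, and no DNS lookup is performed.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(control_url: str) -> bool:
    """Return True when the URL host is a loopback IP or the name 'localhost'."""
    host = urlsplit(control_url).hostname
    if host is None:
        return False
    if host == "localhost":  # assumption: treat the conventional name as loopback
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote (hypothetical policy).
        return False


def may_launch_local_browser(enabled: bool, control_url: str,
                             attach_only: bool = False) -> bool:
    """Launch locally only if enabled, not attach-only, and the URL is loopback."""
    return enabled and not attach_only and is_loopback_url(control_url)
```

With the default URL `http://127.0.0.1:18791` and the feature enabled, this returns `True`; a LAN address or "Attach to existing only" makes it `False`, so Clawdis would only connect.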

### Port convention

Clawdis already uses:
- Gateway WebSocket: `18789`
- Bridge (voice/node): `18790`

For the clawd browser-control server, use "family" ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)
- Canvas host HTTP (optional): `18793` (next free port; see `docs/configuration.md`)

The user usually only configures the **control URL** (port `18791`). CDP is an
internal detail.
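
As a worked example of the "family" arithmetic, the sketch below derives the two browser ports from the bridge port. The names `BrowserPorts` and `family_ports` are hypothetical; only the port numbers and the `+ 1` relationships come from the list above.

```python
from dataclasses import dataclass

BRIDGE_PORT = 18790  # existing bridge (voice/node) port


@dataclass(frozen=True)
class BrowserPorts:
    control: int  # browser control HTTP API
    cdp: int      # browser CDP/debugging port


def family_ports(bridge_port: int = BRIDGE_PORT) -> BrowserPorts:
    """Derive control = bridge + 1 and cdp = control + 1."""
    control = bridge_port + 1
    return BrowserPorts(control=control, cdp=control + 1)
```

Calling `family_ports()` with the default bridge port yields control `18791` and CDP `18792`.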

## Browser isolation guarantees (non-negotiable)

1) **Dedicated user data dir**
   - Never attach to or reuse the user's default Chrome profile.
   - Store clawd browser state under an app-owned directory, e.g.:
     - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
     - or `~/.clawdis/browser/clawd/` (gateway/CLI)

2) **Dedicated ports**
   - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with
     `agent-tools/browser-tools`).
   - Default ports are `18791/18792` unless overridden.

3) **Named tab/page management**
   - The agent must be able to enumerate and target tabs deterministically (by
     stable `targetId` or equivalent), not "last tab".
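
The first two guarantees can be enforced where the launch arguments are built. The sketch below is illustrative only: `build_launch_args` is a hypothetical helper, and the default Chrome profile location on macOS is stated here as an assumption. `--user-data-dir` and `--remote-debugging-port` are standard Chromium command-line switches.

```python
from pathlib import Path
from typing import List

FORBIDDEN_CDP_PORT = 9222  # reserved for ad-hoc dev workflows
# Assumption: the default Chrome user data dir on macOS.
DEFAULT_CHROME_DIR = Path("~/Library/Application Support/Google/Chrome").expanduser()


def _is_within(child: Path, parent: Path) -> bool:
    """True if child equals parent or sits somewhere below it."""
    try:
        child.relative_to(parent)
        return True
    except ValueError:
        return False


def build_launch_args(user_data_dir: Path, cdp_port: int = 18792) -> List[str]:
    """Build Chromium switches, refusing the shared profile and port 9222."""
    user_data_dir = user_data_dir.expanduser()
    if _is_within(user_data_dir, DEFAULT_CHROME_DIR):
        raise ValueError("refusing to reuse the user's default Chrome profile")
    if cdp_port == FORBIDDEN_CDP_PORT:
        raise ValueError("port 9222 is reserved; use a dedicated CDP port")
    return [
        f"--user-data-dir={user_data_dir}",
        f"--remote-debugging-port={cdp_port}",
    ]
```

Failing loudly at argument-construction time keeps a misconfiguration from silently attaching to the user's own browser.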

## Browser selection (macOS)

On startup (when enabled + local URL), Clawdis chooses the browser executable
in this order:
1) **Google Chrome Canary** (if installed)
2) **Chromium** (if installed)
3) **Google Chrome** (fallback)

Implementation detail: detection is by existence of the `.app` bundle under
`/Applications` (and optionally `~/Applications`), then using the resolved
executable path.
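
One way to express this detection order is sketched below. It checks for the executable inside each `.app` bundle rather than the bundle directory itself, which is a simplification; `choose_browser` and the injectable `exists` predicate are hypothetical names introduced so the logic can be exercised without touching the filesystem.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# (display name, executable path relative to an Applications directory)
CANDIDATES = [
    ("Google Chrome Canary",
     "Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"),
    ("Chromium", "Chromium.app/Contents/MacOS/Chromium"),
    ("Google Chrome", "Google Chrome.app/Contents/MacOS/Google Chrome"),
]


def choose_browser(
    app_dirs: Optional[List[Path]] = None,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Tuple[str, Path]]:
    """Return (name, executable) for the first installed candidate, else None."""
    if app_dirs is None:
        app_dirs = [Path("/Applications"), Path.home() / "Applications"]
    for name, relative_exe in CANDIDATES:  # browser priority comes first
        for app_dir in app_dirs:
            exe = app_dir / relative_exe
            if exists(exe):
                return name, exe
    return None
```

On a machine with only Chromium and Google Chrome installed, this sketch picks Chromium, matching the stated order.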

Rationale:
- Canary/Chromium are easy to visually distinguish from the user's daily driver.
- Chrome fallback ensures the feature works on a stock machine.

## Visual differentiation ("lobster-orange")

The clawd browser should be obviously different at a glance:
- Profile name: **clawd**
- Profile color: **#FF4500**

Preferred behavior:
- Seed/patch the profile's preferences on first launch so the color + name persist.

Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let
  the user set the profile color/name once via Chrome UI; it must persist because
  the `userDataDir` is persistent.

## Control server contract (proposed)

Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
state without touching the user's Chrome.

Minimum endpoints/methods (names illustrative):

- `browser.status`
  - returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
- `browser.start`
  - starts the browser-control server + browser (no-op if already running)
- `browser.stop`
  - stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
- `browser.tabs.list`
  - returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
- `browser.tabs.open`
  - params: `{ url, newTab?: true }` → returns `{ targetId }`
- `browser.tabs.focus`
  - params: `{ targetId }`
- `browser.tabs.close`
  - params: `{ targetId }`
- `browser.screenshot`
  - params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)

DOM + inspection (v1):
- `browser.eval`
  - params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
- `browser.query`
  - params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
- `browser.dom`
  - params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
- `browser.snapshot`
  - params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
  - `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
  - `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)

Nice-to-have (later):
- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids
- `browser.tool` dispatch that mirrors Playwright MCP tool names for quick feature parity

### "Is it open or closed?"

"Open" means:
- the control server is reachable at the configured URL **and**
- it reports a live browser connection.

"Closed" means:
- control server not reachable, or server reports no browser.

Clawdis should determine "open/closed" with a health check against the control
server (fast path), not by scanning global Chrome processes, which risks false positives.
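
The open/closed rule can be sketched as a pure function over the `browser.status` payload. This is an assumption-laden illustration: it treats the `running` field as the "live browser connection" signal, and `check_open` takes a caller-supplied `fetch_status` callable because the spec does not fix the HTTP path for the status call.

```python
from typing import Callable, Mapping, Optional


def browser_state(status: Optional[Mapping[str, object]]) -> str:
    """Map a status payload (or None when unreachable) to 'open' or 'closed'."""
    if status is None:
        return "closed"  # control server not reachable
    # Assumption: `running` reports whether a live browser is connected.
    return "open" if status.get("running") is True else "closed"


def check_open(fetch_status: Callable[[], Mapping[str, object]]) -> str:
    """Run the health check; connection failures count as 'closed'."""
    try:
        status: Optional[Mapping[str, object]] = fetch_status()
    except OSError:  # covers refused connections, timeouts, urllib URLError
        status = None
    return browser_state(status)
```

Because the result depends only on what the control server reports, an unrelated Chrome process owned by the user can never make the clawd browser look "open".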

## Interaction with the agent (clawd)

The agent should use browser tools only when:
- enabled in settings
- control URL is configured

If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").

The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
- avoid opening duplicate tabs unless asked
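
The list-first, reuse-before-open behavior might look like the sketch below. The `Tab` shape follows the `browser.tabs.list` fields from the contract; `find_tab`, `ensure_tab`, and the exact-URL matching rule are hypothetical, and the two callables stand in for `browser.tabs.list` and `browser.tabs.open`.

```python
from typing import Callable, Optional, Sequence, TypedDict


class Tab(TypedDict):
    targetId: str
    title: str
    url: str
    isActive: bool


def find_tab(tabs: Sequence[Tab], url: str) -> Optional[str]:
    """Return the targetId of the first tab already showing `url`, if any."""
    for tab in tabs:
        if tab["url"] == url:  # assumption: exact URL match decides reuse
            return tab["targetId"]
    return None


def ensure_tab(
    list_tabs: Callable[[], Sequence[Tab]],
    open_tab: Callable[[str], str],
    url: str,
) -> str:
    """Reuse an existing tab for `url`; open a new one only when none exists."""
    existing = find_tab(list_tabs(), url)
    if existing is not None:
        return existing
    return open_tab(url)
```

Returning a `targetId` in both branches lets the caller follow up with `browser.tabs.focus` or `browser.screenshot` against a deterministic target.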

## Tool dispatch (Playwright MCP parity)

Clawdis exposes a generic tool dispatcher for Playwright MCP-style tools:

`POST /tool` with JSON `{ name: "browser_*", args: { ... }, targetId?: "..." }`

CLI helper:
`clawdis browser tool browser_* --args '{...}'`
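
For illustration, a client could build the `POST /tool` request as sketched below. `build_tool_request` is a hypothetical helper, the `Content-Type` header is an assumption (the spec only says the body is JSON), and the `url` argument shown for `browser_navigate` is illustrative rather than taken from this spec. The request is constructed but not sent.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def build_tool_request(
    control_url: str,
    name: str,
    args: Dict[str, Any],
    target_id: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST /tool request for a browser_* tool."""
    if not name.startswith("browser_"):
        raise ValueError(f"not a browser_* tool name: {name!r}")
    payload: Dict[str, Any] = {"name": name, "args": args}
    if target_id is not None:
        payload["targetId"] = target_id  # optional, per the request shape above
    return Request(
        control_url.rstrip("/") + "/tool",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tool_request(
    "http://127.0.0.1:18791",
    "browser_navigate",
    {"url": "https://example.com"},  # illustrative args
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response format is not defined in this section, so it is left out here.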

Supported tool names:
- `browser_close`
- `browser_resize`
- `browser_console_messages`
- `browser_network_requests`
- `browser_handle_dialog`
- `browser_evaluate`
- `browser_file_upload`
- `browser_fill_form`
- `browser_install` (no-op; uses system Chrome/Chromium)
- `browser_press_key`
- `browser_type`
- `browser_navigate`
- `browser_navigate_back`
- `browser_run_code`
- `browser_take_screenshot`
- `browser_snapshot`
- `browser_click`
- `browser_drag`
- `browser_hover`
- `browser_select_option`
- `browser_tabs`
- `browser_wait_for`
- `browser_pdf_save`
- `browser_start_tracing`
- `browser_stop_tracing`
- `browser_verify_element_visible`
- `browser_verify_text_visible`
- `browser_verify_list_visible`
- `browser_verify_value`
- `browser_mouse_move_xy`
- `browser_mouse_click_xy`
- `browser_mouse_drag_xy`
- `browser_generate_locator`

Notes:
- `browser_file_upload` and `browser_handle_dialog` are modal-only; they only
  work when a file chooser/dialog modal state is present.
- `browser_snapshot` returns a Playwright-for-AI snapshot (use for follow-up actions).

## Security & privacy notes

- The clawd browser profile is app-owned; it may contain logged-in sessions.
  Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the
  user explicitly configures a non-loopback URL.
- Never reuse or copy the user's default Chrome profile.

## Non-goals (for the first cut)

- Cross-device "sync" of tabs between Mac and Pi.
- Sharing the user's logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for "close-the-loop" verification
  and interaction.
@@ -1,161 +1,11 @@
---
summary: "Spec: clawd-managed Chrome/Chromium instance (separate profile, lobster-orange, tab management)"
summary: "Redirect: mac/browser.md → browser.md"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
- Implementing browser settings + lifecycle in the macOS app
---

# Browser (macOS app) — clawd-managed Chrome
# Browser (macOS app) — moved

Status: draft spec · Date: 2025-12-13

Goal: give the **clawd** persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled “clawd”).
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user’s own browser (separate profile + dedicated ports).

This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer yet; the key is the **contract** and the **separation guarantees**.

## User-facing settings

Add a dedicated settings section (preferably under **Tools** or its own “Browser” tab):

- **Enable clawd browser** (`default: on`)
  - When off: no browser is launched, and browser tools return “disabled”.
- **Browser control URL** (`default: http://127.0.0.1:18791`)
  - Interpreted as the base URL of the local/remote browser-control server.
  - If the URL host is not loopback, Clawdis must **not** attempt to launch a local browser; it only connects.
- **Accent color** (`default: #FF4500`, “lobster-orange”)
  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdis.

Optional (advanced, can be hidden behind Debug initially):
- **Use headless browser** (`default: off`)
- **Attach to existing only** (`default: off`) — if on, never launch; only connect if already running.

### Port convention

Clawdis already uses:
- Gateway WebSocket: `18789`
- Bridge (voice/node): `18790`

For the clawd browser-control server, use “family” ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)
- Canvas host HTTP (optional): `18793` (next free port; see `docs/configuration.md`)

The user usually only configures the **control URL** (port `18791`). CDP is an internal detail.

## Browser isolation guarantees (non-negotiable)

1) **Dedicated user data dir**
   - Never attach to or reuse the user’s default Chrome profile.
   - Store clawd browser state under an app-owned directory, e.g.:
     - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
     - or `~/.clawdis/browser/clawd/` (gateway/CLI)

2) **Dedicated ports**
   - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with `agent-tools/browser-tools`).
   - Default ports are `18791/18792` unless overridden.

3) **Named tab/page management**
   - The agent must be able to enumerate and target tabs deterministically (by stable `targetId` or equivalent), not “last tab”.

## Browser selection (macOS)

On startup (when enabled + local URL), Clawdis chooses the browser executable in this order:
1) **Google Chrome Canary** (if installed)
2) **Chromium** (if installed)
3) **Google Chrome** (fallback)

Implementation detail: detection is by existence of the `.app` bundle under `/Applications` (and optionally `~/Applications`), then using the resolved executable path.

Rationale:
- Canary/Chromium are easy to visually distinguish from the user’s daily driver.
- Chrome fallback ensures the feature works on a stock machine.

## Visual differentiation (“lobster-orange”)

The clawd browser should be obviously different at a glance:
- Profile name: **clawd**
- Profile color: **#FF4500**

Preferred behavior:
- Seed/patch the profile’s preferences on first launch so the color + name persist.

Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; it must persist because the `userDataDir` is persistent.

## Control server contract (proposed)

Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user’s Chrome.

Minimum endpoints/methods (names illustrative):

- `browser.status`
  - returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
- `browser.start`
  - starts the browser-control server + browser (no-op if already running)
- `browser.stop`
  - stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
- `browser.tabs.list`
  - returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
- `browser.tabs.open`
  - params: `{ url, newTab?: true }` → returns `{ targetId }`
- `browser.tabs.focus`
  - params: `{ targetId }`
- `browser.tabs.close`
  - params: `{ targetId }`
- `browser.screenshot`
  - params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)

DOM + inspection (v1):
- `browser.eval`
  - params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
- `browser.query`
  - params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
- `browser.dom`
  - params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
- `browser.snapshot`
  - params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
  - `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
  - `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)

Nice-to-have (later):
- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids

### “Is it open or closed?”

“Open” means:
- the control server is reachable at the configured URL **and**
- it reports a live browser connection.

“Closed” means:
- control server not reachable, or server reports no browser.

Clawdis should treat “open/closed” as a health check (fast path), not by scanning global Chrome processes (avoid false positives).

## Interaction with the agent (clawd)

The agent should use browser tools only when:
- enabled in settings
- control URL is configured

If disabled, tools must fail fast with a friendly error (“Browser disabled in settings”).

The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent “main” tab)
- avoid opening duplicate tabs unless asked

## Security & privacy notes

- The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the user explicitly configures a non-loopback URL.
- Never reuse or copy the user’s default Chrome profile.

## Non-goals (for the first cut)

- Cross-device “sync” of tabs between Mac and Pi.
- Sharing the user’s logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for “close-the-loop” verification and interaction.
This doc moved to `docs/browser.md`.