---
summary: "Spec: clawd-managed Chrome/Chromium instance (separate profile, lobster-orange, tab management)"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
- Implementing browser settings + lifecycle in the macOS app
---
# Browser (macOS app) — clawd-managed Chrome
Status: draft spec · Date: 2025-12-13
Goal: give the **clawd** persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled “clawd”).
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user’s own browser (separate profile + dedicated ports).

This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer yet; the key is the **contract** and the **separation guarantees**.
## User-facing settings
Add a dedicated settings section (preferably under **Tools** or its own “Browser” tab):
- **Enable clawd browser** (`default: on`)
  - When off: no browser is launched, and browser tools return “disabled”.
- **Browser control URL** (`default: http://127.0.0.1:18791`)
  - Interpreted as the base URL of the local/remote browser-control server.
  - If the URL host is not loopback, Clawdis must **not** attempt to launch a local browser; it only connects.
- **Accent color** (`default: #FF4500`, “lobster-orange”)
  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdis.

Optional (advanced, can be hidden behind Debug initially):
- **Use headless browser** (`default: off`)
- **Attach to existing only** (`default: off`): if on, never launch; only connect if already running.
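
The launch rule implied by these settings can be sketched as follows. This is a minimal illustration, not the app's implementation: the `BrowserSettings` field names and both helper functions are hypothetical, and only the default values come from the list above. Treating `localhost` as loopback without resolving it is also an assumption.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from urllib.parse import urlparse


@dataclass(frozen=True)
class BrowserSettings:
    # Hypothetical field names; defaults mirror the settings listed above.
    enabled: bool = True
    control_url: str = "http://127.0.0.1:18791"
    accent_color: str = "#FF4500"
    headless: bool = False
    attach_only: bool = False


def is_loopback_url(url: str) -> bool:
    """Return True if the URL host is `localhost` or a loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote; no DNS lookup is done.
        return False


def may_launch_local_browser(settings: BrowserSettings) -> bool:
    """Launch only when enabled, not attach-only, and the URL is loopback."""
    return (
        settings.enabled
        and not settings.attach_only
        and is_loopback_url(settings.control_url)
    )
```

With the defaults, `may_launch_local_browser(BrowserSettings())` is true; pointing the control URL at a non-loopback host makes it false, so Clawdis would only connect.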
### Port convention
Clawdis already uses:
- Gateway WebSocket: `18789`
- WebChat HTTP: `18788`
- Bridge (voice/iris): `18790`

For the clawd browser-control server, use “family” ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)

The user usually only configures the **control URL** (port `18791`). CDP is an internal detail.
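
As a rough sketch of the convention, the CDP port could be derived from the configured control URL. The function name is hypothetical, and two behaviors are assumptions not stated in this spec: falling back to `18791` when the URL has no explicit port, and keeping the "control + 1" rule when the user overrides the control port.

```python
from urllib.parse import urlparse

DEFAULT_CONTROL_PORT = 18791  # bridge port (18790) + 1


def ports_from_control_url(control_url: str) -> dict[str, int]:
    """Derive the control and CDP ports from the control URL."""
    # Assumption: a URL without an explicit port falls back to the default.
    control = urlparse(control_url).port or DEFAULT_CONTROL_PORT
    # Assumption: CDP stays at control + 1 even for overridden ports.
    return {"control": control, "cdp": control + 1}
```

The returned shape matches the `ports: { control, cdp }` field of the proposed `browser.status` result.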
## Browser isolation guarantees (non-negotiable)
1) **Dedicated user data dir**
   - Never attach to or reuse the user’s default Chrome profile.
   - Store clawd browser state under an app-owned directory, e.g.:
     - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
     - or `~/.clawdis/browser/clawd/` (gateway/CLI)
2) **Dedicated ports**
   - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with `agent-tools/browser-tools`).
   - Default ports are `18791/18792` unless overridden.
3) **Named tab/page management**
   - The agent must be able to enumerate and target tabs deterministically (by stable `targetId` or equivalent), not “last tab”.
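
The first two guarantees can be enforced at the point where the launch arguments are built. The sketch below is one possible guard, assuming a hypothetical `build_launch_args` helper; `--user-data-dir` and `--remote-debugging-port` are standard Chromium switches, and the default-profile path checked here is Chrome's usual macOS location (Canary and Chromium keep theirs elsewhere and are not covered by this check).

```python
from pathlib import Path

RESERVED_CDP_PORT = 9222  # left free for ad-hoc dev workflows
# Chrome's default user data dir on macOS; must never be used by clawd.
DEFAULT_CHROME_DIR = Path.home() / "Library/Application Support/Google/Chrome"


def build_launch_args(user_data_dir: Path, cdp_port: int = 18792) -> list[str]:
    """Return Chromium switches, refusing configs that break isolation."""
    if cdp_port == RESERVED_CDP_PORT:
        raise ValueError("port 9222 is reserved; use a dedicated CDP port")
    resolved = user_data_dir.expanduser().resolve()
    default = DEFAULT_CHROME_DIR.resolve()
    # Reject the default dir itself and anything nested inside it.
    if resolved == default or default in resolved.parents:
        raise ValueError("refusing to use the user's default Chrome profile")
    return [
        f"--user-data-dir={resolved}",
        f"--remote-debugging-port={cdp_port}",
    ]
```

Failing loudly here, rather than silently substituting a different directory or port, keeps a misconfiguration from ever touching the user's own browser.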
## Browser selection (macOS)
On startup (when enabled + local URL), Clawdis chooses the browser executable in this order:
1) **Google Chrome Canary** (if installed)
2) **Chromium** (if installed)
3) **Google Chrome** (fallback)

Implementation detail: detection is by existence of the `.app` bundle under `/Applications` (and optionally `~/Applications`), then using the resolved executable path.

Rationale:
- Canary/Chromium are easy to visually distinguish from the user’s daily driver.
- Chrome fallback ensures the feature works on a stock machine.
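
The selection order can be sketched as a simple ordered probe. This is an illustration under assumptions: `find_browser` and its `exists` parameter are hypothetical, and the executable is assumed to sit at the conventional `Contents/MacOS/<name>` path inside each bundle rather than being read from the bundle's `Info.plist`.

```python
from pathlib import Path
from typing import Callable, Optional, Sequence

# (bundle name, executable name) in the preference order listed above.
CANDIDATES = [
    ("Google Chrome Canary.app", "Google Chrome Canary"),
    ("Chromium.app", "Chromium"),
    ("Google Chrome.app", "Google Chrome"),
]


def find_browser(
    roots: Optional[Sequence[Path]] = None,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the executable of the first installed candidate, or None."""
    if roots is None:
        roots = [Path("/Applications"), Path.home() / "Applications"]
    for bundle, executable in CANDIDATES:
        for root in roots:
            # Detection is by bundle existence, as described above.
            if exists(root / bundle):
                return root / bundle / "Contents" / "MacOS" / executable
    return None
```

Passing `exists` as a parameter lets the ordering be tested without any browsers installed.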
## Visual differentiation (“lobster-orange”)
The clawd browser should be obviously different at a glance:
- Profile name: **clawd**
- Profile color: **#FF4500**

Preferred behavior:
- Seed/patch the profile’s preferences on first launch so the color + name persist.

Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; the choice persists because the `userDataDir` is persistent.
## Control server contract (proposed)
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user’s Chrome.

Minimum endpoints/methods (names illustrative):
- `browser.status`
  - returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
- `browser.start`
  - starts the browser-control server + browser (no-op if already running)
- `browser.stop`
  - stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
- `browser.tabs.list`
  - returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
- `browser.tabs.open`
  - params: `{ url, newTab?: true }` → returns `{ targetId }`
- `browser.tabs.focus`
  - params: `{ targetId }`
- `browser.tabs.close`
  - params: `{ targetId }`
- `browser.screenshot`
  - params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)

DOM + inspection (v1):
- `browser.eval`
  - params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
- `browser.query`
  - params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
- `browser.dom`
  - params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
- `browser.snapshot`
  - params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
  - `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
  - `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)

Nice-to-have (later):
- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids
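
To make the `targetId`-based part of the contract concrete, here is a small in-memory stand-in for the `browser.tabs.*` methods. It is purely illustrative: the `FakeTabs` class, its method names, and the `T1`, `T2` style ids are hypothetical (real ids would come from the browser), and having `open_tab` focus the new tab is an assumption the spec does not make.

```python
from itertools import count
from typing import TypedDict


class TabInfo(TypedDict):
    targetId: str
    title: str
    url: str
    isActive: bool


class FakeTabs:
    """In-memory stand-in for browser.tabs.* (illustration only)."""

    def __init__(self) -> None:
        self._tabs: dict[str, TabInfo] = {}
        self._ids = count(1)

    def open_tab(self, url: str) -> dict[str, str]:
        target_id = f"T{next(self._ids)}"  # fake id; never reused
        self._tabs[target_id] = {
            "targetId": target_id, "title": "", "url": url, "isActive": False,
        }
        self.focus_tab(target_id)  # assumption: a new tab takes focus
        return {"targetId": target_id}

    def focus_tab(self, target_id: str) -> None:
        if target_id not in self._tabs:
            raise KeyError(f"unknown targetId: {target_id}")
        for tab in self._tabs.values():
            tab["isActive"] = tab["targetId"] == target_id

    def close_tab(self, target_id: str) -> None:
        if target_id not in self._tabs:
            raise KeyError(f"unknown targetId: {target_id}")
        del self._tabs[target_id]

    def list_tabs(self) -> list[TabInfo]:
        return list(self._tabs.values())
```

Every operation names its tab explicitly, so closing or focusing never depends on which tab happened to be opened last.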
### “Is it open or closed?”
“Open” means:
- the control server is reachable at the configured URL **and**
- it reports a live browser connection.

“Closed” means:
- control server not reachable, or server reports no browser.

Clawdis should determine “open/closed” with a health check against the control server (fast path) rather than by scanning global Chrome processes, which risks false positives.
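
A health check along these lines might look like the sketch below. The `/status` path is hypothetical (the spec only names a `browser.status` method), as are the function names and the 0.5 s timeout; the `running` field is taken from the proposed status result.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_status(control_url: str, timeout: float = 0.5) -> Optional[dict]:
    """Fetch the status JSON; return None if the server is unreachable."""
    try:
        # Hypothetical path for the `browser.status` method.
        with urlopen(control_url.rstrip("/") + "/status", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError bad JSON.
        return None
    return payload if isinstance(payload, dict) else None


def is_open(
    control_url: str,
    fetch: Callable[[str], Optional[dict]] = fetch_status,
) -> bool:
    """Open = server reachable AND it reports a live browser."""
    status = fetch(control_url)
    return bool(status and status.get("running"))
```

An unreachable server and a server with no browser both map to “closed”, which matches the two-part definition above.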
## Interaction with the agent (clawd)
The agent should use browser tools only when:
- enabled in settings
- control URL is configured

If disabled, tools must fail fast with a friendly error (“Browser disabled in settings”).

The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent “main” tab)
- avoid opening duplicate tabs unless asked
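
The reuse-before-open behavior can be sketched as a lookup over the `browser.tabs.list` result. The helper name is hypothetical, and matching on exact URL equality is an assumption; a real agent might normalize URLs or match by origin instead.

```python
from typing import Optional


def find_reusable_tab(tabs: list[dict], url: str) -> Optional[str]:
    """Return the targetId of an existing tab showing `url`, else None."""
    # Assumption: tabs match only on exact URL equality.
    matches = [tab for tab in tabs if tab["url"] == url]
    # Prefer the active tab so reuse does not steal focus unnecessarily.
    for tab in matches:
        if tab.get("isActive"):
            return tab["targetId"]
    # Otherwise take the first match in list order, which is deterministic.
    return matches[0]["targetId"] if matches else None
```

When this returns a `targetId`, the agent would call `browser.tabs.focus` with it; only on `None` would it fall back to `browser.tabs.open`.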
## Security & privacy notes
- The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the user explicitly configures a non-loopback URL.
- Never reuse or copy the user’s default Chrome profile.
## Non-goals (for the first cut)
- Cross-device “sync” of tabs between Mac and Pi.
- Sharing the user’s logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for “close-the-loop” verification and interaction.