| summary | read_when |
|---|---|
| Spec: integrated browser control server + action commands | |
Browser (integrated) — clawd-managed Chrome
Status: draft spec · Date: 2025-12-19
Goal: give the clawd persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled "clawd").
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user's own browser (separate profile + dedicated ports).
This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer; the key is the contract and the separation guarantees.
User-facing settings
Add a dedicated settings section (preferably under Tools or its own "Browser" tab):
- Enable clawd browser (`default: on`)
  - When off: no browser is launched, and browser tools return "disabled".
- Browser control URL (`default: http://127.0.0.1:18791`)
  - Interpreted as the base URL of the local/remote browser-control server.
  - If the URL host is not loopback, Clawdis must not attempt to launch a local browser; it only connects.
- Accent color (`default: #FF4500`, "lobster-orange")
  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdis.
Optional (advanced, can be hidden behind Debug initially):
- Use headless browser (`default: off`)
- Attach to existing only (`default: off`): if on, never launch; only connect if already running.
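The settings above, and the rule that a non-loopback control URL disables local launching, can be sketched as follows. This is a minimal illustration, not the app's actual settings model: the `BrowserSettings` class, its field names, and the helper functions are hypothetical; only the default values come from this spec.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from urllib.parse import urlparse


@dataclass(frozen=True)
class BrowserSettings:
    """Hypothetical settings container; defaults mirror the spec."""
    enabled: bool = True
    control_url: str = "http://127.0.0.1:18791"
    accent_color: str = "#FF4500"
    headless: bool = False
    attach_only: bool = False


def is_loopback(url: str) -> bool:
    """True when the URL host is `localhost` or a loopback IP literal."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote (an assumption of this sketch).
        return False


def may_launch_local_browser(settings: BrowserSettings) -> bool:
    """Launch only when enabled, not attach-only, and the control URL is local."""
    return (
        settings.enabled
        and not settings.attach_only
        and is_loopback(settings.control_url)
    )
```

With the defaults, `may_launch_local_browser(BrowserSettings())` is true; pointing `control_url` at a LAN address or turning on `attach_only` makes it false, so Clawdis would only connect.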
Port convention
Clawdis already uses:
- Gateway WebSocket: `18789`
- Bridge (voice/node): `18790`
For the clawd browser-control server, use "family" ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)
- Canvas host HTTP (optional): `18793` (next free port; see `docs/configuration.md`)
The user usually only configures the control URL (port 18791). CDP is an
internal detail.
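The "family" arithmetic can be written down directly. The sketch below assumes the function name `family_ports` and the dictionary keys, which are illustrative; it also treats the canvas port as control + 2, which matches the default of 18793 but is only a convention since the spec describes it as the next free port.

```python
def family_ports(bridge_port: int = 18790) -> dict[str, int]:
    """Derive the browser-related ports from the bridge port."""
    control = bridge_port + 1      # browser control HTTP API (bridge + 1)
    cdp = control + 1              # CDP/debugging port (control + 1)
    canvas = control + 2           # assumed default for the optional canvas host
    return {"control": control, "cdp": cdp, "canvas": canvas}


print(family_ports())
```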
Browser isolation guarantees (non-negotiable)
- Dedicated user data dir
  - Never attach to or reuse the user's default Chrome profile.
  - Store clawd browser state under an app-owned directory, e.g.:
    - `~/Library/Application Support/Clawdis/browser/clawd/` (mac app)
    - or `~/.clawdis/browser/clawd/` (gateway/CLI)
- Dedicated ports
  - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with `agent-tools/browser-tools`).
  - Default ports are `18791`/`18792` unless overridden.
- Named tab/page management
  - The agent must be able to enumerate and target tabs deterministically (by stable `targetId` or equivalent), not "last tab".
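One way to enforce the first two guarantees is to validate the profile directory and CDP port before building the launch arguments. The sketch below is an assumption-laden illustration: `chrome_launch_args` is a hypothetical helper, and the default-profile location it rejects is the usual macOS path for Google Chrome rather than something this spec defines. The `--user-data-dir` and `--remote-debugging-port` switches are standard Chromium flags.

```python
from pathlib import Path

# Usual macOS location of the user's own Chrome data (assumption of this sketch).
DEFAULT_CHROME_DIR = Path.home() / "Library" / "Application Support" / "Google" / "Chrome"
FORBIDDEN_CDP_PORTS = {9222}  # reserved for ad-hoc dev workflows


def chrome_launch_args(user_data_dir: Path, cdp_port: int = 18792) -> list[str]:
    """Build launch flags, refusing the user's profile and the reserved port."""
    profile = user_data_dir.expanduser()
    if profile == DEFAULT_CHROME_DIR or DEFAULT_CHROME_DIR in profile.parents:
        raise ValueError("refusing to use the user's default Chrome profile")
    if cdp_port in FORBIDDEN_CDP_PORTS:
        raise ValueError(f"CDP port {cdp_port} is reserved")
    return [
        f"--user-data-dir={profile}",
        f"--remote-debugging-port={cdp_port}",
    ]
```

A caller on the gateway/CLI side might pass `Path("~/.clawdis/browser/clawd")`; the function expands `~` before checking and formatting the path.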
Browser selection (macOS)
On startup (when enabled + local URL), Clawdis chooses the browser executable in this order:
- Google Chrome Canary (if installed)
- Chromium (if installed)
- Google Chrome (fallback)
Implementation detail: detection is by existence of the .app bundle under
/Applications (and optionally ~/Applications), then using the resolved
executable path.
Rationale:
- Canary/Chromium are easy to visually distinguish from the user's daily driver.
- Chrome fallback ensures the feature works on a stock machine.
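The selection order can be sketched as a search over candidate bundles. The bundle names below are the standard install names and are an assumption of this sketch, as is the helper name `find_browser_bundle`; resolving the executable inside the bundle is left out.

```python
from pathlib import Path
from typing import Iterable, Optional

# Preference order from the spec; bundle names are assumed standard install names.
CANDIDATE_BUNDLES = (
    "Google Chrome Canary.app",
    "Chromium.app",
    "Google Chrome.app",
)


def find_browser_bundle(roots: Optional[Iterable[Path]] = None) -> Optional[Path]:
    """Return the first existing bundle, preferring Canary, then Chromium, then Chrome."""
    if roots is None:
        roots = (Path("/Applications"), Path.home() / "Applications")
    search_roots = list(roots)
    for name in CANDIDATE_BUNDLES:      # preference order outranks location
        for root in search_roots:
            bundle = root / name
            if bundle.is_dir():         # .app bundles are directories
                return bundle
    return None
```

Iterating over names in the outer loop means a Canary install under `~/Applications` still wins over stock Chrome under `/Applications`.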
Visual differentiation ("lobster-orange")
The clawd browser should be obviously different at a glance:
- Profile name: clawd
- Profile color: #FF4500
Preferred behavior:
- Seed/patch the profile's preferences on first launch so the color + name persist.
Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; it must persist because the `userDataDir` is persistent.
Control server contract (current)
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user's Chrome.
Basics:
- `GET /status` payload (enabled/running/pid/cdpPort/etc)
- `POST /start` start browser
- `POST /stop` stop browser
- `GET /tabs` list tabs
- `POST /tabs/open` open a new tab
- `POST /tabs/focus` focus a tab by id/prefix
- `DELETE /tabs/:targetId` close a tab by id/prefix
- `POST /close` close the current tab (optional targetId in body)
Inspection:
- `GET /screenshot` (CDP screenshot)
- `POST /screenshot` (Playwright screenshot with ref/element)
- `POST /eval` (CDP evaluate)
- `GET /query`
- `GET /dom`
- `GET /snapshot` (`aria` | `domSnapshot` | `ai`)
Debug-only endpoints (intentionally omitted for now):
- network request log (privacy)
- tracing export (large + sensitive)
- locator generation (dev convenience)
Actions:
- `POST /navigate`, `POST /back`
- `POST /resize`
- `POST /click`, `POST /type`, `POST /press`, `POST /hover`, `POST /drag`, `POST /select`
- `POST /upload` (file chooser modal must be open)
- `POST /fill` (JSON field descriptors)
- `POST /dialog` (alert/confirm/prompt)
- `POST /wait` (time/text/textGone)
- `POST /evaluate` (function + optional ref)
- `POST /run` (function(page) → result)
- `GET /console`
- `POST /pdf`
- `POST /verify/element`, `POST /verify/text`, `POST /verify/list`, `POST /verify/value`
- `POST /mouse/move`, `POST /mouse/click`, `POST /mouse/drag`
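To make the contract concrete, here is a small sketch of how a client might construct requests for a few of these endpoints using only the Python standard library. The helper names are hypothetical, and the `{"url": ...}` body for `/tabs/open` is an assumption: this spec lists the routes but does not define request bodies.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:18791"  # default control URL from the spec


def build_request(method: str, path: str, body: Optional[dict] = None,
                  base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request against the control server."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url.rstrip("/") + path, data=data, headers=headers, method=method)


def open_tab(url: str) -> Request:
    # Body shape is assumed; the spec does not define it.
    return build_request("POST", "/tabs/open", {"url": url})


def close_tab(target_id: str) -> Request:
    # The targetId is percent-encoded so it cannot alter the path.
    return build_request("DELETE", "/tabs/" + quote(target_id, safe=""))
```

Sending is then a matter of passing the result to `urllib.request.urlopen(req, timeout=...)`; keeping construction separate from I/O makes the routes easy to check without a running server.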
"Is it open or closed?"
"Open" means:
- the control server is reachable at the configured URL and
- it reports a live browser connection.
"Closed" means:
- control server not reachable, or server reports no browser.
Clawdis should determine "open/closed" with a health check against the control server (fast path), not by scanning global Chrome processes, which risks false positives.
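A health check along these lines might look like the sketch below. It assumes the `/status` payload is a JSON object whose `running` field indicates a live browser connection (the field name appears in the contract above, but its exact semantics are not specified); the function names are hypothetical.

```python
import json
from http.client import HTTPException
from typing import Optional
from urllib.request import urlopen


def fetch_status(control_url: str, timeout: float = 1.0) -> Optional[dict]:
    """Return the parsed /status payload, or None if the server is unreachable."""
    try:
        with urlopen(control_url.rstrip("/") + "/status", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, HTTPException, ValueError):
        # Connection refused, timeout, HTTP error, or invalid JSON all mean "closed".
        return None
    return payload if isinstance(payload, dict) else None


def is_open_from_status(status: Optional[dict]) -> bool:
    """'Open' requires a reachable server that reports a running browser."""
    return status is not None and bool(status.get("running"))


def is_browser_open(control_url: str = "http://127.0.0.1:18791") -> bool:
    return is_open_from_status(fetch_status(control_url))
```

The short timeout keeps this on the fast path: an unreachable server resolves to "closed" quickly instead of blocking the caller.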
Interaction with the agent (clawd)
The agent should use browser tools only when:
- enabled in settings
- control URL is configured
If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").
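The fail-fast rule can be expressed as a guard that every browser tool calls first. The error class and function name are hypothetical; the "Browser disabled in settings" message comes from this spec, while the wording for a missing URL is an assumption.

```python
from typing import Optional


class BrowserToolError(RuntimeError):
    """Hypothetical error type surfaced to the agent."""


def ensure_browser_usable(enabled: bool, control_url: Optional[str]) -> str:
    """Return the control URL, or raise a friendly error before any I/O happens."""
    if not enabled:
        raise BrowserToolError("Browser disabled in settings")
    if not control_url:
        # Message wording is assumed; the spec only requires a configured URL.
        raise BrowserToolError("Browser control URL is not configured")
    return control_url
```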
The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
- avoid opening duplicate tabs unless asked
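The tab-handling guidance can be sketched as two small helpers operating on the result of a tab listing. The tab shape (`targetId` and `url` keys) is an assumption for illustration, as are the function names; the point is that targeting is by id or unique prefix and never by "last tab".

```python
from typing import Optional


def pick_reusable_tab(tabs: list[dict], url: str) -> Optional[dict]:
    """Return an existing tab already showing `url`, so no duplicate is opened."""
    for tab in tabs:
        if tab.get("url") == url:
            return tab
    return None


def resolve_target(tabs: list[dict], id_or_prefix: str) -> dict:
    """Resolve a full targetId or a unique prefix; ambiguity is an error."""
    for tab in tabs:
        if tab["targetId"] == id_or_prefix:
            return tab
    matches = [t for t in tabs if t["targetId"].startswith(id_or_prefix)]
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} tabs match {id_or_prefix!r}; expected exactly one"
        )
    return matches[0]
```

An agent following this pattern would list tabs, call `pick_reusable_tab` for the URL it needs, and open a new tab only when that returns `None`.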
CLI quick reference (one example each)
Basics:
- `clawdis browser status`
- `clawdis browser start`
- `clawdis browser stop`
- `clawdis browser tabs`
- `clawdis browser open https://example.com`
- `clawdis browser focus abcd1234`
- `clawdis browser close abcd1234`
Inspection:
- `clawdis browser screenshot`
- `clawdis browser screenshot --full-page`
- `clawdis browser screenshot --ref 12`
- `clawdis browser eval "document.title"`
- `clawdis browser query "a" --limit 5`
- `clawdis browser dom --format text --max-chars 5000`
- `clawdis browser snapshot --format aria --limit 200`
- `clawdis browser snapshot --format ai`
Actions:
- `clawdis browser navigate https://example.com`
- `clawdis browser back`
- `clawdis browser resize 1280 720`
- `clawdis browser click 12 --double`
- `clawdis browser type 23 "hello" --submit`
- `clawdis browser press Enter`
- `clawdis browser hover 44`
- `clawdis browser drag 10 11`
- `clawdis browser select 9 OptionA OptionB`
- `clawdis browser upload /tmp/file.pdf`
- `clawdis browser fill --fields '[{"ref":"1","value":"Ada"}]'`
- `clawdis browser dialog --accept`
- `clawdis browser wait --text "Done"`
- `clawdis browser evaluate --fn '(el) => el.textContent' --ref 7`
- `clawdis browser run --code '(page) => page.title()'`
- `clawdis browser console --level error`
- `clawdis browser pdf`
- `clawdis browser verify-element --role button --name "Submit"`
- `clawdis browser verify-text "Welcome"`
- `clawdis browser verify-list 3 ItemA ItemB`
- `clawdis browser verify-value --ref 4 --type textbox --value hello`
- `clawdis browser mouse-move --x 120 --y 240`
- `clawdis browser mouse-click --x 120 --y 240`
- `clawdis browser mouse-drag --start-x 10 --start-y 20 --end-x 200 --end-y 300`
Notes:
- `upload` and `dialog` only work when a file chooser or dialog is present.
- `snapshot --format ai` returns Playwright-for-AI markup used for ref-based actions.
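When the CLI is driven from a script, the JSON passed to `fill --fields` is the argument most likely to be mangled by shell quoting. A hedged sketch of building the argument vector programmatically, so no shell escaping is involved, is shown below; `fill_command` is a hypothetical helper, and it assumes the `clawdis` binary is on `PATH` when the command is eventually run.

```python
import json
import shlex


def fill_command(fields: list[dict]) -> list[str]:
    """Build argv for `clawdis browser fill`, serializing fields as compact JSON."""
    payload = json.dumps(fields, separators=(",", ":"))
    return ["clawdis", "browser", "fill", "--fields", payload]


argv = fill_command([{"ref": "1", "value": "Ada"}])
# shlex.join shows the equivalent, correctly quoted shell command line.
print(shlex.join(argv))
# prints: clawdis browser fill --fields '[{"ref":"1","value":"Ada"}]'
```

Passing `argv` to `subprocess.run(argv, check=True)` hands the JSON to the CLI as a single argument without going through a shell.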
Security & privacy notes
- The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the user explicitly configures a non-loopback URL.
- Never reuse or copy the user's default Chrome profile.
Non-goals (for the first cut)
- Cross-device "sync" of tabs between Mac and Pi.
- Sharing the user's logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for "close-the-loop" verification and interaction.