| summary | read_when |
|---|---|
| Spec: integrated browser control server + action commands | |
Browser (integrated) — clawd-managed Chrome
Status: draft spec · Date: 2025-12-20
Goal: give the clawd persona its own browser that is:
- Visually distinct (lobster-orange, profile labeled "clawd").
- Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
- Non-interfering with the user's own browser (separate profile + dedicated ports).
This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer; the key is the contract and the separation guarantees.
User-facing settings
Add a dedicated settings section (preferably under Skills or its own "Browser" tab):
- Enable clawd browser (default: on)
  - When off: no browser is launched, and browser tools return "disabled".
- Browser control URL (default: `http://127.0.0.1:18791`)
  - Interpreted as the base URL of the local/remote browser-control server.
  - If the URL host is not loopback, Clawdbot must not attempt to launch a local browser; it only connects.
- CDP URL (default: controlUrl + 1)
  - Base URL for Chrome DevTools Protocol (e.g. `http://127.0.0.1:18792`).
  - Set this to a non-loopback host to attach the local control server to a remote Chrome/Chromium CDP endpoint (SSH/Tailscale tunnel recommended).
  - If the CDP URL host is non-loopback, clawd does not auto-launch a local browser.
  - If you tunnel a remote CDP to `localhost`, set Attach to existing only to avoid accidentally launching a local browser.
- Accent color (default: `#FF4500`, "lobster-orange")
  - Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdbot.
Optional (advanced, can be hidden behind Debug initially):
- Use headless browser (default: off)
- Attach to existing only (default: off): if on, never launch; only connect if already running.
- Browser executable path (override, optional)
- No sandbox (default: off): adds `--no-sandbox` + `--disable-setuid-sandbox`
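The launch rules in the settings above can be condensed into one predicate. The following is a minimal sketch, not the actual implementation; the `BrowserSettings` shape and both function names are hypothetical and chosen only for illustration.

```typescript
// Hypothetical settings shape; field names are illustrative, not the real config schema.
interface BrowserSettings {
  enabled: boolean;
  controlUrl: string;
  cdpUrl: string;
  attachOnly: boolean;
}

// True when the URL's host refers to the local machine.
function isLoopbackHost(url: string): boolean {
  const host = new URL(url).hostname.toLowerCase();
  return (
    host === "localhost" ||
    host === "[::1]" || // WHATWG URL keeps the brackets on IPv6 hostnames
    /^127(\.\d{1,3}){3}$/.test(host)
  );
}

// Launch a local browser only when the feature is enabled, attach-only is off,
// and both the control URL and the CDP URL point at loopback.
function shouldLaunchLocalBrowser(s: BrowserSettings): boolean {
  if (!s.enabled || s.attachOnly) return false;
  return isLoopbackHost(s.controlUrl) && isLoopbackHost(s.cdpUrl);
}
```

Keeping the decision in a pure function makes the "never launch when remote" guarantee easy to unit-test without starting a browser.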
Port convention
Clawdbot already uses:
- Gateway WebSocket: `18789`
- Bridge (voice/node): `18790`
For the clawd browser-control server, use "family" ports:
- Browser control HTTP API: `18791` (bridge + 1)
- Browser CDP/debugging port: `18792` (control + 1)
- Canvas host HTTP: `18793` by default, mounted at `/__clawdbot__/canvas/`
The user usually only configures the control URL (port 18791). CDP is an
internal detail.
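Because the CDP port is defined as the control port plus one, the default CDP URL can be derived from the control URL alone. A minimal sketch, assuming a hypothetical helper named `defaultCdpUrl`:

```typescript
// Derive the default CDP base URL by adding 1 to the control URL's port.
// `defaultCdpUrl` is an illustrative name, not an existing Clawdbot function.
function defaultCdpUrl(controlUrl: string): string {
  const url = new URL(controlUrl);
  if (url.port === "") {
    throw new Error("control URL must include an explicit port");
  }
  url.port = String(Number(url.port) + 1);
  return url.origin; // origin has no trailing slash
}

console.log(defaultCdpUrl("http://127.0.0.1:18791")); // http://127.0.0.1:18792
```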
Browser isolation guarantees (non-negotiable)
- Dedicated user data dir
  - Never attach to or reuse the user's default Chrome profile.
  - Store clawd browser state under an app-owned directory, e.g.:
    - `~/Library/Application Support/Clawdbot/browser/clawd/` (mac app)
    - or `~/.clawdbot/browser/clawd/` (gateway/CLI)
- Dedicated ports
  - Never use `9222` (reserved for ad-hoc dev workflows; avoids colliding with `agent-tools/browser-tools`).
  - Default ports are `18791`/`18792` unless overridden.
- Named tab/page management
  - The agent must be able to enumerate and target tabs deterministically (by stable `targetId` or equivalent), not "last tab".
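One way to enforce the first two guarantees is to build the Chrome command line in a single place that rejects the reserved port and always passes a dedicated user data dir. This is a sketch under assumptions: `LaunchOptions` and `buildChromeArgs` are hypothetical names, while the flags themselves are standard Chrome/Chromium switches.

```typescript
// Hypothetical launch options; only the Chrome flags below are real.
interface LaunchOptions {
  userDataDir: string;
  cdpPort: number;
  headless?: boolean;
  noSandbox?: boolean;
}

const RESERVED_DEV_PORT = 9222; // left free for ad-hoc dev workflows

function buildChromeArgs(opts: LaunchOptions): string[] {
  if (opts.cdpPort === RESERVED_DEV_PORT) {
    throw new Error("port 9222 is reserved; choose a dedicated CDP port");
  }
  if (opts.userDataDir.trim() === "") {
    throw new Error("a dedicated user data dir is required");
  }
  const args = [
    `--user-data-dir=${opts.userDataDir}`,
    `--remote-debugging-port=${opts.cdpPort}`,
  ];
  if (opts.headless) args.push("--headless");
  if (opts.noSandbox) args.push("--no-sandbox", "--disable-setuid-sandbox");
  return args;
}
```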
Browser selection (macOS + Linux)
On startup (when enabled + local URL), Clawdbot chooses the browser executable in this order:
- Google Chrome Canary (if installed)
- Chromium (if installed)
- Google Chrome (fallback)
Linux:
- Looks for `google-chrome`/`chromium` in common system paths.
- Use Browser executable path to force a specific binary.
Implementation detail:
- macOS: detection is by existence of the `.app` bundle under `/Applications` (and optionally `~/Applications`), then using the resolved executable path.
- Linux: common `/usr/bin` and `/snap/bin` paths.
Rationale:
- Canary/Chromium are easy to visually distinguish from the user's daily driver.
- Chrome fallback ensures the feature works on a stock machine.
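The selection order can be expressed as "first candidate that exists, unless an override is set". The sketch below is illustrative only: `pickBrowserExecutable` is a hypothetical name, the paths are the standard macOS install locations, and it checks the executable path directly rather than the `.app` bundle.

```typescript
import { existsSync } from "node:fs";

// Candidates in preference order: Canary, Chromium, then stock Chrome.
const MAC_CANDIDATES: string[] = [
  "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary",
  "/Applications/Chromium.app/Contents/MacOS/Chromium",
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
];

// Returns the override if given, else the first candidate that exists.
// `exists` is injectable so the order can be tested without touching the disk.
function pickBrowserExecutable(
  candidates: string[],
  override?: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  if (override) return override;
  return candidates.find((path) => exists(path));
}
```

A caller on macOS would pass `MAC_CANDIDATES`; a Linux caller would pass its own list of `/usr/bin` and `/snap/bin` paths.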
Visual differentiation ("lobster-orange")
The clawd browser should be obviously different at a glance:
- Profile name: clawd
- Profile color: #FF4500
Preferred behavior:
- Seed/patch the profile's preferences on first launch so the color + name persist.
Fallback behavior:
- If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; it must persist because the `userDataDir` is persistent.
Control server contract (vNext)
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user's Chrome.
Basics:
- `GET /`: status payload (enabled/running/pid/cdpPort/etc)
- `POST /start`: start browser
- `POST /stop`: stop browser
- `GET /tabs`: list tabs
- `POST /tabs/open`: open a new tab
- `POST /tabs/focus`: focus a tab by id/prefix
- `DELETE /tabs/:targetId`: close a tab by id/prefix
Inspection:
- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
- `GET /snapshot` `?format=aria|ai&targetId?&limit?`
- `GET /console` `?level?&targetId?`
- `POST /pdf` `{ targetId? }`
Actions:
- `POST /navigate`
- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
  - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
Hooks (arming):
- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
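Several endpoints above accept a tab "by id/prefix". One plausible way to resolve such an argument deterministically is to prefer an exact match and accept a prefix only when it is unique. The `Tab` shape and `resolveTab` below are assumptions for illustration, not the server's actual types.

```typescript
// Hypothetical tab record; only `targetId` comes from the spec.
interface Tab {
  targetId: string;
  url: string;
}

type TabResolution =
  | { ok: true; tab: Tab }
  | { ok: false; reason: "not_found" | "ambiguous" };

// Exact id wins; otherwise a prefix must match exactly one tab.
function resolveTab(tabs: Tab[], idOrPrefix: string): TabResolution {
  if (idOrPrefix === "") return { ok: false, reason: "not_found" };
  const exact = tabs.find((t) => t.targetId === idOrPrefix);
  if (exact) return { ok: true, tab: exact };
  const matches = tabs.filter((t) => t.targetId.startsWith(idOrPrefix));
  const [only] = matches;
  if (matches.length === 1 && only) return { ok: true, tab: only };
  return { ok: false, reason: matches.length === 0 ? "not_found" : "ambiguous" };
}
```

Returning "ambiguous" instead of picking the first match keeps the agent from acting on the wrong tab.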
"Is it open or closed?"
"Open" means:
- the control server is reachable at the configured URL and
- it reports a live browser connection.
"Closed" means:
- control server not reachable, or server reports no browser.
Clawdbot should determine "open/closed" with a health check against the control server (fast path), not by scanning global Chrome processes, which risks false positives.
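The open/closed rule reduces to a small function over the result of one status request. In this sketch the `Probe` type is hypothetical; `running` mirrors the field named in the status payload, and how the probe is obtained (for example, a short-timeout request to the control URL) is left out.

```typescript
// Hypothetical probe result: either the server was unreachable,
// or it answered with a status payload.
type Probe =
  | { reachable: false }
  | { reachable: true; status: { enabled: boolean; running: boolean } };

// "Open" requires both a reachable server and a live browser connection.
function browserState(probe: Probe): "open" | "closed" {
  return probe.reachable && probe.status.running ? "open" : "closed";
}
```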
Multi-profile support
Clawdbot supports multiple named browser profiles, each with:
- Dedicated CDP port (auto-allocated from 18800-18899) or a per-profile CDP URL
- Persistent user data directory (`~/.clawdbot/browser/<name>/user-data/`)
- Unique color for visual distinction
Configuration
{
"browser": {
"enabled": true,
"defaultProfile": "clawd",
"profiles": {
"clawd": { "cdpPort": 18800, "color": "#FF4500" },
"work": { "cdpPort": 18801, "color": "#0066CC" },
"remote": { "cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00" }
}
}
}
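To illustrate how this configuration might be consumed, the sketch below resolves a profile name to a CDP endpoint. The types follow the JSON above, but `resolveProfile`, the `remote` flag, and the choice of `127.0.0.1` for local ports are assumptions made for this example.

```typescript
interface ProfileConfig {
  cdpPort?: number;
  cdpUrl?: string;
  color: string;
}

interface BrowserConfig {
  enabled: boolean;
  defaultProfile: string;
  profiles: Record<string, ProfileConfig>;
}

interface ResolvedProfile {
  name: string;
  cdpUrl: string;
  remote: boolean; // assumption: a profile with cdpUrl is attach-only
}

// Falls back to defaultProfile when no name is given.
function resolveProfile(config: BrowserConfig, name?: string): ResolvedProfile {
  const profileName = name ?? config.defaultProfile;
  const profile = config.profiles[profileName];
  if (!profile) throw new Error(`unknown browser profile: ${profileName}`);
  if (profile.cdpUrl) {
    return { name: profileName, cdpUrl: profile.cdpUrl, remote: true };
  }
  if (profile.cdpPort === undefined) {
    throw new Error(`profile ${profileName} has neither cdpPort nor cdpUrl`);
  }
  return { name: profileName, cdpUrl: `http://127.0.0.1:${profile.cdpPort}`, remote: false };
}
```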
Profile actions
- `GET /profiles`: list all profiles with status
- `POST /profiles/create` `{ name, color?, cdpUrl? }`: create new profile (auto-allocates port if no `cdpUrl`)
- `DELETE /profiles/:name`: delete profile (stops browser + removes user data for local profiles)
- `POST /reset-profile?profile=<name>`: kill orphan process on profile's port (local profiles only)
Profile parameter
All existing endpoints accept an optional `?profile=<name>` query parameter:
- `GET /?profile=work`: status for work profile
- `POST /start?profile=work`: start work profile browser
- `GET /tabs?profile=work`: list tabs for work profile
- `POST /tabs/open?profile=work`: open tab in work profile
- etc.
When `profile` is omitted, the server uses `browser.defaultProfile` (defaults to "clawd").
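A client can thread the profile through every request with one URL helper. This is a minimal sketch; `endpointUrl` is a hypothetical name, and omitting the parameter leaves the default-profile choice to the server.

```typescript
// Build a control-server URL, appending ?profile=<name> only when one is given.
function endpointUrl(controlUrl: string, path: string, profile?: string): string {
  const url = new URL(path, controlUrl);
  if (profile) url.searchParams.set("profile", profile);
  return url.toString();
}

console.log(endpointUrl("http://127.0.0.1:18791", "/tabs", "work"));
// http://127.0.0.1:18791/tabs?profile=work
```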
Agent browser tool
The browser tool accepts an optional profile parameter for all actions:
{
"action": "open",
"targetUrl": "https://example.com",
"profile": "work"
}
This routes the operation to the specified profile's browser instance. Omitting
profile uses the default profile.
Profile naming rules
- Lowercase alphanumeric characters and hyphens only
- Must start with a letter or number (not a hyphen)
- Maximum 64 characters
- Examples: `clawd`, `work`, `my-project-1`
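The naming rules map directly onto a regular expression plus a length check. A minimal sketch, with illustrative constant and function names:

```typescript
// Lowercase letters, digits, and hyphens; first character must not be a hyphen.
const PROFILE_NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/;
const MAX_PROFILE_NAME_LENGTH = 64;

function isValidProfileName(name: string): boolean {
  return name.length <= MAX_PROFILE_NAME_LENGTH && PROFILE_NAME_PATTERN.test(name);
}
```

With this check, `my-project-1` is accepted while `-work`, `Work`, and the empty string are rejected.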
Port allocation
Ports are allocated from the range 18800-18899 (100 ports, so at most 100 local profiles). This is far more than practical use requires; memory and CPU exhaustion occur well before port exhaustion. Ports are allocated once at profile creation and persisted permanently. Remote profiles are attach-only and do not use the local port range.
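Allocation at profile creation could look like the sketch below. The spec only fixes the range; picking the lowest free port and returning `null` on exhaustion are assumptions made for this example.

```typescript
const CDP_PORT_RANGE_START = 18800;
const CDP_PORT_RANGE_END = 18899; // inclusive

// Returns the lowest port in the range not already persisted for another
// profile, or null when all 100 ports are taken.
function allocateCdpPort(usedPorts: Iterable<number>): number | null {
  const used = new Set(usedPorts);
  for (let port = CDP_PORT_RANGE_START; port <= CDP_PORT_RANGE_END; port++) {
    if (!used.has(port)) return port;
  }
  return null;
}
```

Because ports are persisted per profile, `usedPorts` would come from the saved configuration rather than from probing the network.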
Interaction with the agent (clawd)
The agent should use browser tools only when:
- enabled in settings
- control URL is configured
If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").
The agent should not assume tabs are ephemeral. It should:
- call `browser.tabs.list` to discover existing tabs first
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
- avoid opening duplicate tabs unless asked
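The "list first, then reuse" behavior can be sketched as a planning step over the tab list. Everything here is hypothetical: the `TabInfo` shape, the `planOpen` name, and the policy of treating a tab as reusable when its normalized URL matches exactly.

```typescript
interface TabInfo {
  targetId: string;
  url: string;
}

type OpenPlan =
  | { action: "focus"; targetId: string }
  | { action: "open"; url: string };

// Normalize a URL for comparison; unparsable values are returned unchanged.
function normalizeUrl(raw: string): string {
  try {
    return new URL(raw).href;
  } catch {
    return raw;
  }
}

// Focus an existing tab with the same URL; otherwise open a new one.
function planOpen(existingTabs: TabInfo[], url: string): OpenPlan {
  const wanted = normalizeUrl(url);
  const match = existingTabs.find((t) => normalizeUrl(t.url) === wanted);
  return match
    ? { action: "focus", targetId: match.targetId }
    : { action: "open", url };
}
```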
CLI quick reference (one example each)
All commands accept --browser-profile <name> to target a specific profile (default: clawd).
Profile management:
- `clawdbot browser profiles`
- `clawdbot browser create-profile --name work`
- `clawdbot browser create-profile --name remote --cdp-url http://10.0.0.42:9222`
- `clawdbot browser delete-profile --name work`
Basics:
- `clawdbot browser status`
- `clawdbot browser start`
- `clawdbot browser stop`
- `clawdbot browser reset-profile`
- `clawdbot browser tabs`
- `clawdbot browser open https://example.com`
- `clawdbot browser focus abcd1234`
- `clawdbot browser close abcd1234`
Inspection:
- `clawdbot browser screenshot`
- `clawdbot browser screenshot --full-page`
- `clawdbot browser screenshot --ref 12`
- `clawdbot browser snapshot`
- `clawdbot browser snapshot --format aria --limit 200`
Actions:
- `clawdbot browser navigate https://example.com`
- `clawdbot browser resize 1280 720`
- `clawdbot browser click 12 --double`
- `clawdbot browser type 23 "hello" --submit`
- `clawdbot browser press Enter`
- `clawdbot browser hover 44`
- `clawdbot browser drag 10 11`
- `clawdbot browser select 9 OptionA OptionB`
- `clawdbot browser upload /tmp/file.pdf`
- `clawdbot browser fill --fields '[{"ref":"1","value":"Ada"}]'`
- `clawdbot browser dialog --accept`
- `clawdbot browser wait --text "Done"`
- `clawdbot browser evaluate --fn '(el) => el.textContent' --ref 7`
- `clawdbot browser evaluate --fn "document.querySelector('.my-class').click()"`
- `clawdbot browser console --level error`
- `clawdbot browser pdf`
Notes:
- `upload` and `dialog` are arming calls; run them before the click/press that triggers the chooser/dialog.
- `upload` can take a `ref` to auto-click after arming (useful for single-step file uploads).
- `upload` can also take `inputRef` (aria ref) or `element` (CSS selector) to set `<input type="file">` directly without waiting for a file chooser.
- The arm default timeout is 2 minutes (clamped to max 2 minutes); pass `timeoutMs` if you need shorter.
- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging.
- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs.
- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on.
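The arming timeout rule (default 2 minutes, clamped to a 2-minute maximum) can be written as a small helper. This is a sketch: `resolveArmTimeout` is a hypothetical name, and falling back to the default for missing, non-finite, or non-positive values is an assumption.

```typescript
const ARM_TIMEOUT_DEFAULT_MS = 2 * 60 * 1000;
const ARM_TIMEOUT_MAX_MS = 2 * 60 * 1000;

// Use the caller's timeoutMs when it is a positive finite number,
// never exceeding the 2-minute ceiling.
function resolveArmTimeout(timeoutMs?: number): number {
  if (timeoutMs === undefined || !Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    return ARM_TIMEOUT_DEFAULT_MS;
  }
  return Math.min(Math.floor(timeoutMs), ARM_TIMEOUT_MAX_MS);
}
```

Since the default equals the maximum, `timeoutMs` can only shorten the wait, which matches the note above.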
Security & privacy notes
- The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
- The control server must bind to loopback only by default (`127.0.0.1`) unless the user explicitly configures a non-loopback URL.
- Never reuse or copy the user's default Chrome profile.
- Remote CDP endpoints should be tunneled or protected; CDP is highly privileged.
Non-goals (for the first cut)
- Cross-device "sync" of tabs between Mac and Pi.
- Sharing the user's logged-in Chrome sessions automatically.
- General-purpose web scraping; this is primarily for "close-the-loop" verification and interaction.
Troubleshooting
For Linux-specific issues (especially Ubuntu with snap Chromium), see browser-linux-troubleshooting.