Files
clawdbot/docs/browser.md

11 KiB

summary, read_when
summary read_when
Spec: integrated browser control server + action commands
Adding agent-controlled browser automation
Debugging why clawd is interfering with your own Chrome
Implementing browser settings + lifecycle in the macOS app

Browser (integrated) — clawd-managed Chrome

Status: draft spec · Date: 2025-12-20

Goal: give the clawd persona its own browser that is:

  • Visually distinct (lobster-orange, profile labeled "clawd").
  • Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
  • Non-interfering with the user's own browser (separate profile + dedicated ports).

This doc covers the macOS app/gateway side. It intentionally does not mandate Playwright vs Puppeteer; the key is the contract and the separation guarantees.

User-facing settings

Add a dedicated settings section (preferably under Skills or its own "Browser" tab):

  • Enable clawd browser (default: on)
    • When off: no browser is launched, and browser tools return "disabled".
  • Browser control URL (default: http://127.0.0.1:18791)
    • Interpreted as the base URL of the local/remote browser-control server.
    • If the URL host is not loopback, Clawdbot must not attempt to launch a local browser; it only connects.
  • CDP URL (default: controlUrl + 1)
    • Base URL for Chrome DevTools Protocol (e.g. http://127.0.0.1:18792).
    • Set this to a non-loopback host to attach the local control server to a remote Chrome/Chromium CDP endpoint (SSH/Tailscale tunnel recommended).
    • If the CDP URL host is non-loopback, clawd does not auto-launch a local browser.
    • If you tunnel a remote CDP to localhost, set Attach to existing only to avoid accidentally launching a local browser.
  • Accent color (default: #FF4500, "lobster-orange")
    • Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdbot.

Optional (advanced, can be hidden behind Debug initially):

  • Use headless browser (default: off)
  • Attach to existing only (default: off) — if on, never launch; only connect if already running.
  • Browser executable path (override, optional)
  • No sandbox (default: off) — adds --no-sandbox + --disable-setuid-sandbox

Port convention

Clawdbot already uses:

  • Gateway WebSocket: 18789
  • Bridge (voice/node): 18790

For the clawd browser-control server, use "family" ports:

  • Browser control HTTP API: 18791 (bridge + 1)
  • Browser CDP/debugging port: 18792 (control + 1)
  • Canvas host HTTP: 18793 by default, mounted at /__clawdbot__/canvas/

The user usually only configures the control URL (port 18791). CDP is an internal detail.

Browser isolation guarantees (non-negotiable)

  1. Dedicated user data dir

    • Never attach to or reuse the user's default Chrome profile.
    • Store clawd browser state under an app-owned directory, e.g.:
      • ~/Library/Application Support/Clawdbot/browser/clawd/ (mac app)
      • or ~/.clawdbot/browser/clawd/ (gateway/CLI)
  2. Dedicated ports

    • Never use 9222 (reserved for ad-hoc dev workflows; avoids colliding with agent-tools/browser-tools).
    • Default ports are 18791/18792 unless overridden.
  3. Named tab/page management

    • The agent must be able to enumerate and target tabs deterministically (by stable targetId or equivalent), not "last tab".

Browser selection (macOS + Linux)

On startup (when enabled + local URL), Clawdbot chooses the browser executable in this order:

  1. Google Chrome Canary (if installed)
  2. Chromium (if installed)
  3. Google Chrome (fallback)

Linux:

  • Looks for google-chrome / chromium in common system paths.
  • Use Browser executable path to force a specific binary.

Implementation detail:

  • macOS: detection is by existence of the .app bundle under /Applications (and optionally ~/Applications), then using the resolved executable path.
  • Linux: common /usr/bin//snap/bin paths.

Rationale:

  • Canary/Chromium are easy to visually distinguish from the user's daily driver.
  • Chrome fallback ensures the feature works on a stock machine.

Visual differentiation ("lobster-orange")

The clawd browser should be obviously different at a glance:

  • Profile name: clawd
  • Profile color: #FF4500

Preferred behavior:

  • Seed/patch the profile's preferences on first launch so the color + name persist.

Fallback behavior:

  • If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via Chrome UI; it must persist because the userDataDir is persistent.

Control server contract (vNext)

Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user's Chrome.

Basics:

  • GET / status payload (enabled/running/pid/cdpPort/etc)
  • POST /start start browser
  • POST /stop stop browser
  • GET /tabs list tabs
  • POST /tabs/open open a new tab
  • POST /tabs/focus focus a tab by id/prefix
  • DELETE /tabs/:targetId close a tab by id/prefix

Inspection:

  • POST /screenshot { targetId?, fullPage?, ref?, element?, type? }
  • GET /snapshot ?format=aria|ai&targetId?&limit?
  • GET /console ?level?&targetId?
  • POST /pdf { targetId? }

Actions:

  • POST /navigate
  • POST /act { kind, targetId?, ... } where kind is one of:
    • click, type, press, hover, drag, select, fill, wait, resize, close, evaluate

Hooks (arming):

  • POST /hooks/file-chooser { targetId?, paths, timeoutMs? }
  • POST /hooks/dialog { targetId?, accept, promptText?, timeoutMs? }

"Is it open or closed?"

"Open" means:

  • the control server is reachable at the configured URL and
  • it reports a live browser connection.

"Closed" means:

  • control server not reachable, or server reports no browser.

Clawdbot should treat "open/closed" as a health check (fast path), not by scanning global Chrome processes (avoid false positives).

Multi-profile support

Clawdbot supports multiple named browser profiles, each with:

  • Dedicated CDP port (auto-allocated from 18800-18899) or a per-profile CDP URL
  • Persistent user data directory (~/.clawdbot/browser/<name>/user-data/)
  • Unique color for visual distinction

Configuration

{
  "browser": {
    "enabled": true,
    "defaultProfile": "clawd",
    "profiles": {
      "clawd": { "cdpPort": 18800, "color": "#FF4500" },
      "work": { "cdpPort": 18801, "color": "#0066CC" },
      "remote": { "cdpUrl": "http://10.0.0.42:9222", "color": "#00AA00" }
    }
  }
}

Profile actions

  • GET /profiles — list all profiles with status
  • POST /profiles/create { name, color?, cdpUrl? } — create new profile (auto-allocates port if no cdpUrl)
  • DELETE /profiles/:name — delete profile (stops browser + removes user data for local profiles)
  • POST /reset-profile?profile=<name> — kill orphan process on profile's port (local profiles only)

Profile parameter

All existing endpoints accept optional ?profile=<name> query parameter:

  • GET /?profile=work — status for work profile
  • POST /start?profile=work — start work profile browser
  • GET /tabs?profile=work — list tabs for work profile
  • etc.

When profile is omitted, uses browser.defaultProfile (defaults to "clawd").

Profile naming rules

  • Lowercase alphanumeric characters and hyphens only
  • Must start with a letter or number (not a hyphen)
  • Maximum 64 characters
  • Examples: clawd, work, my-project-1

Port allocation

Ports are allocated from range 18800-18899 (~100 profiles max). This is far more than practical use — memory and CPU exhaustion occur well before port exhaustion. Ports are allocated once at profile creation and persisted permanently. Remote profiles are attach-only and do not use the local port range.

Interaction with the agent (clawd)

The agent should use browser tools only when:

  • enabled in settings
  • control URL is configured

If disabled, tools must fail fast with a friendly error ("Browser disabled in settings").

The agent should not assume tabs are ephemeral. It should:

  • call browser.tabs.list to discover existing tabs first
  • reuse an existing tab when appropriate (e.g. a persistent "main" tab)
  • avoid opening duplicate tabs unless asked

CLI quick reference (one example each)

All commands accept --browser-profile <name> to target a specific profile (default: clawd).

Profile management:

  • clawdbot browser profiles
  • clawdbot browser create-profile --name work
  • clawdbot browser create-profile --name remote --cdp-url http://10.0.0.42:9222
  • clawdbot browser delete-profile --name work Basics:
  • clawdbot browser status
  • clawdbot browser start
  • clawdbot browser stop
  • clawdbot browser reset-profile
  • clawdbot browser tabs
  • clawdbot browser open https://example.com
  • clawdbot browser focus abcd1234
  • clawdbot browser close abcd1234

Inspection:

  • clawdbot browser screenshot
  • clawdbot browser screenshot --full-page
  • clawdbot browser screenshot --ref 12
  • clawdbot browser snapshot
  • clawdbot browser snapshot --format aria --limit 200

Actions:

  • clawdbot browser navigate https://example.com
  • clawdbot browser resize 1280 720
  • clawdbot browser click 12 --double
  • clawdbot browser type 23 "hello" --submit
  • clawdbot browser press Enter
  • clawdbot browser hover 44
  • clawdbot browser drag 10 11
  • clawdbot browser select 9 OptionA OptionB
  • clawdbot browser upload /tmp/file.pdf
  • clawdbot browser fill --fields '[{\"ref\":\"1\",\"value\":\"Ada\"}]'
  • clawdbot browser dialog --accept
  • clawdbot browser wait --text "Done"
  • clawdbot browser evaluate --fn '(el) => el.textContent' --ref 7
  • clawdbot browser evaluate --fn "document.querySelector('.my-class').click()"
  • clawdbot browser console --level error
  • clawdbot browser pdf

Notes:

  • upload and dialog are arming calls; run them before the click/press that triggers the chooser/dialog.
  • upload can take a ref to auto-click after arming (useful for single-step file uploads).
  • upload can also take inputRef (aria ref) or element (CSS selector) to set <input type="file"> directly without waiting for a file chooser.
  • The arm default timeout is 2 minutes (clamped to max 2 minutes); pass timeoutMs if you need shorter.
  • snapshot defaults to ai; aria returns an accessibility tree for debugging.
  • click/type require ref from snapshot --format ai; use evaluate for rare CSS selector one-offs.
  • Avoid wait by default; use it only in exceptional cases when there is no reliable UI state to wait on.

Security & privacy notes

  • The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
  • The control server must bind to loopback only by default (127.0.0.1) unless the user explicitly configures a non-loopback URL.
  • Never reuse or copy the user's default Chrome profile.
  • Remote CDP endpoints should be tunneled or protected; CDP is highly privileged.

Non-goals (for the first cut)

  • Cross-device "sync" of tabs between Mac and Pi.
  • Sharing the user's logged-in Chrome sessions automatically.
  • General-purpose web scraping; this is primarily for "close-the-loop" verification and interaction.

Troubleshooting

For Linux-specific issues (especially Ubuntu with snap Chromium), see browser-linux-troubleshooting.