clawdbot/docs/mac/browser.md
Peter Steinberger 875cf9a054 refactor(webchat): SwiftUI-only WebChat UI
2025-12-17 23:05:28 +01:00

summary: Spec: clawd-managed Chrome/Chromium instance (separate profile, lobster-orange, tab management)
read_when:
  • Adding agent-controlled browser automation
  • Debugging why clawd is interfering with your own Chrome
  • Implementing browser settings + lifecycle in the macOS app

Browser (macOS app) — clawd-managed Chrome

Status: draft spec · Date: 2025-12-13

Goal: give the clawd persona its own browser that is:

  • Visually distinct (lobster-orange, profile labeled “clawd”).
  • Fully agent-manageable (start/stop, list tabs, focus/close tabs, open URLs, screenshot).
  • Non-interfering with the user's own browser (separate profile + dedicated ports).

This doc covers the macOS app/gateway side. It intentionally does not yet mandate Playwright or Puppeteer; what matters is the contract and the separation guarantees.

User-facing settings

Add a dedicated settings section (preferably under Tools or its own “Browser” tab):

  • Enable clawd browser (default: on)
    • When off: no browser is launched, and browser tools return “disabled”.
  • Browser control URL (default: http://127.0.0.1:18791)
    • Interpreted as the base URL of the local/remote browser-control server.
    • If the URL host is not loopback, Clawdis must not attempt to launch a local browser; it only connects.
  • Accent color (default: #FF4500, “lobster-orange”)
    • Used to theme the clawd browser profile (best-effort) and to tint UI indicators in Clawdis.

Optional (advanced, can be hidden behind Debug initially):

  • Use headless browser (default: off)
  • Attach to existing only (default: off) — if on, never launch; only connect if already running.
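The settings above imply a three-way decision: do nothing, connect only, or launch locally. A minimal sketch of that decision follows. The names `BrowserSettings`, `is_loopback_host`, and `launch_mode` are illustrative and not part of the spec; treating `localhost` as loopback without a DNS lookup is an assumption.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from urllib.parse import urlsplit


@dataclass(frozen=True)
class BrowserSettings:
    """Hypothetical container for the settings listed above."""
    enabled: bool = True
    control_url: str = "http://127.0.0.1:18791"
    accent_color: str = "#FF4500"
    headless: bool = False
    attach_only: bool = False


def is_loopback_host(host: str) -> bool:
    """True for 'localhost' and for literal loopback IPs (127.0.0.0/8, ::1)."""
    if host.lower() == "localhost":
        return True
    try:
        return ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote; no DNS lookup is attempted.
        return False


def launch_mode(settings: BrowserSettings) -> str:
    """Return 'disabled', 'connect-only', or 'launch-local'."""
    if not settings.enabled:
        return "disabled"
    host = urlsplit(settings.control_url).hostname or ""
    if settings.attach_only or not is_loopback_host(host):
        return "connect-only"
    return "launch-local"
```

With the defaults, `launch_mode(BrowserSettings())` yields `"launch-local"`; pointing the control URL at a LAN address or enabling attach-only yields `"connect-only"`.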

Port convention

Clawdis already uses:

  • Gateway WebSocket: 18789
  • Bridge (voice/iris): 18790

For the clawd browser-control server, use “family” ports:

  • Browser control HTTP API: 18791 (bridge + 1)
  • Browser CDP/debugging port: 18792 (control + 1)

The user usually only configures the control URL (port 18791). CDP is an internal detail.
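The port family can be expressed as a small derivation from the control URL. This is a sketch only: `ports_for` is a hypothetical helper, and two of its behaviors are assumptions rather than spec: keeping CDP at control + 1 when the control port is overridden, and falling back to 18791 when the URL carries no explicit port.

```python
from urllib.parse import urlsplit

GATEWAY_WS_PORT = 18789
BRIDGE_PORT = 18790
DEFAULT_CONTROL_PORT = BRIDGE_PORT + 1       # 18791
DEFAULT_CDP_PORT = DEFAULT_CONTROL_PORT + 1  # 18792


def ports_for(control_url: str) -> dict:
    """Derive the control and CDP ports from the configured control URL.

    Assumptions: CDP stays at control + 1 for overridden ports, and a URL
    without an explicit port falls back to the default control port.
    """
    control = urlsplit(control_url).port or DEFAULT_CONTROL_PORT
    return {"control": control, "cdp": control + 1}
```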

Browser isolation guarantees (non-negotiable)

  1. Dedicated user data dir

    • Never attach to or reuse the user's default Chrome profile.
    • Store clawd browser state under an app-owned directory, e.g.:
      • ~/Library/Application Support/Clawdis/browser/clawd/ (mac app)
      • or ~/.clawdis/browser/clawd/ (gateway/CLI)
  2. Dedicated ports

    • Never use 9222 (reserved for ad-hoc dev workflows; avoids colliding with agent-tools/browser-tools).
    • Default ports are 18791/18792 unless overridden.
  3. Named tab/page management

    • The agent must be able to enumerate and target tabs deterministically (by stable targetId or equivalent), not “last tab”.
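The first two guarantees can be enforced where the launch command is assembled. The sketch below uses Chrome's `--user-data-dir`, `--remote-debugging-port`, `--no-first-run`, `--no-default-browser-check`, and `--headless=new` switches; `build_launch_args` itself is a hypothetical helper, and the default-profile location it checks is the usual macOS path, which is an assumption rather than something this spec states.

```python
from pathlib import Path

RESERVED_CDP_PORT = 9222
# Usual location of the default Chrome profile root on macOS (assumption);
# the clawd profile must never live here.
DEFAULT_CHROME_DIR = Path.home() / "Library/Application Support/Google/Chrome"


def build_launch_args(user_data_dir: Path, cdp_port: int = 18792,
                      headless: bool = False) -> list[str]:
    """Build Chrome command-line flags that honor the isolation rules."""
    if cdp_port == RESERVED_CDP_PORT:
        raise ValueError("port 9222 is reserved for ad-hoc dev workflows")
    resolved = Path(user_data_dir).expanduser().resolve()
    default_dir = DEFAULT_CHROME_DIR.resolve()
    if resolved == default_dir or default_dir in resolved.parents:
        raise ValueError("refusing to use the user's default Chrome profile")
    args = [
        f"--user-data-dir={resolved}",
        f"--remote-debugging-port={cdp_port}",
        "--no-first-run",
        "--no-default-browser-check",
    ]
    if headless:
        args.append("--headless=new")
    return args
```

Rejecting bad input at this point means a misconfigured port or directory fails before any process is spawned.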

Browser selection (macOS)

On startup (when enabled + local URL), Clawdis chooses the browser executable in this order:

  1. Google Chrome Canary (if installed)
  2. Chromium (if installed)
  3. Google Chrome (fallback)

Implementation detail: detection is by existence of the .app bundle under /Applications (and optionally ~/Applications), then using the resolved executable path.
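A sketch of this lookup order, assuming the conventional bundle and executable names for each browser (the spec does not list them). `choose_browser` and `CANDIDATES` are hypothetical names; for brevity the sketch checks the executable inside the bundle directly, and the existence check is injectable so the function can be exercised without touching the filesystem.

```python
from pathlib import Path
from typing import Callable, Optional

# Candidates in priority order. Bundle and executable names follow the usual
# macOS layout; they are assumptions, not taken from the spec.
CANDIDATES = [
    ("Google Chrome Canary",
     "Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"),
    ("Chromium", "Chromium.app/Contents/MacOS/Chromium"),
    ("Google Chrome", "Google Chrome.app/Contents/MacOS/Google Chrome"),
]


def choose_browser(roots=None,
                   exists: Callable[[Path], bool] = Path.exists) -> Optional[tuple]:
    """Return (name, executable_path) for the first installed candidate, else None."""
    if roots is None:
        roots = [Path("/Applications"), Path.home() / "Applications"]
    for name, relative in CANDIDATES:
        for root in roots:
            executable = Path(root) / relative
            if exists(executable):
                return name, executable
    return None
```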

Rationale:

  • Canary/Chromium are easy to visually distinguish from the user's daily driver.
  • Chrome fallback ensures the feature works on a stock machine.

Visual differentiation (“lobster-orange”)

The clawd browser should be obviously different at a glance:

  • Profile name: clawd
  • Profile color: #FF4500

Preferred behavior:

  • Seed/patch the profile's preferences on first launch so the color + name persist.

Fallback behavior:

  • If preferences patching is not reliable, open with the dedicated profile and let the user set the profile color/name once via the Chrome UI; the choice persists because the userDataDir is persistent.

Control server contract (proposed)

Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage state without touching the user's Chrome.

Minimum endpoints/methods (names illustrative):

  • browser.status
    • returns: { enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }
  • browser.start
    • starts the browser-control server + browser (no-op if already running)
  • browser.stop
    • stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
  • browser.tabs.list
    • returns: array of { targetId, title, url, isActive, lastFocusedAt? }
  • browser.tabs.open
    • params: { url, newTab?: true } → returns { targetId }
  • browser.tabs.focus
    • params: { targetId }
  • browser.tabs.close
    • params: { targetId }
  • browser.screenshot
    • params: { targetId?, fullPage?: false } → returns a MEDIA: attachment URL (via the existing Clawdis media host)
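To make the tab portion of the contract concrete, here is an in-memory stand-in for `browser.tabs.*`. It is a sketch only: `TabRegistry` and the `tab-N` id scheme are hypothetical, and a real server would presumably back these calls with the browser's own target ids.

```python
import itertools
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Tab:
    targetId: str
    title: str
    url: str
    isActive: bool = False
    lastFocusedAt: Optional[float] = None


class TabRegistry:
    """In-memory stand-in for browser.tabs.*; ids here are synthetic."""

    def __init__(self):
        self._tabs = {}
        self._ids = itertools.count(1)

    def list(self):
        """browser.tabs.list: one dict per tab, in insertion order."""
        return [asdict(tab) for tab in self._tabs.values()]

    def open(self, url):
        """browser.tabs.open: create a tab, focus it, return its targetId."""
        target_id = f"tab-{next(self._ids)}"
        self._tabs[target_id] = Tab(targetId=target_id, title="", url=url)
        self.focus(target_id)
        return {"targetId": target_id}

    def focus(self, target_id):
        """browser.tabs.focus: mark exactly one tab as active."""
        tab = self._require(target_id)
        for other in self._tabs.values():
            other.isActive = False
        tab.isActive = True
        tab.lastFocusedAt = time.time()

    def close(self, target_id):
        """browser.tabs.close: remove the tab or fail on an unknown id."""
        self._require(target_id)
        del self._tabs[target_id]

    def _require(self, target_id):
        if target_id not in self._tabs:
            raise KeyError(f"unknown targetId: {target_id}")
        return self._tabs[target_id]
```

Every mutating call takes an explicit `targetId`, so there is no implicit "last tab" state for the agent to depend on.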

DOM + inspection (v1):

  • browser.eval
    • params: { js, targetId?, await?: false } → returns the CDP Runtime.evaluate result (best-effort returnByValue)
  • browser.query
    • params: { selector, targetId?, limit? } → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
  • browser.dom
    • params: { format: "html"|"text", targetId?, selector?, maxChars? } → returns a truncated dump (text field)
  • browser.snapshot
    • params: { format: "aria"|"domSnapshot", targetId?, limit? }
    • aria: simplified Accessibility tree with backendDOMNodeId when available (future click/type hooks)
    • domSnapshot: lightweight DOM walk snapshot (tree-ish, bounded by limit)
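The `maxChars` bound on `browser.dom` could be applied as in the sketch below. `dom_response`, the default limit of 20,000 characters, and the extra `truncated` field are all assumptions for illustration; the spec only requires a truncated dump in a `text` field.

```python
DOM_FORMATS = ("html", "text")
DEFAULT_MAX_CHARS = 20_000  # assumed default; the spec does not fix one


def dom_response(content: str, fmt: str = "text",
                 max_chars: int = DEFAULT_MAX_CHARS) -> dict:
    """Validate the format and return a bounded dump in a 'text' field."""
    if fmt not in DOM_FORMATS:
        raise ValueError(f"format must be one of {DOM_FORMATS}, got {fmt!r}")
    if max_chars < 0:
        raise ValueError("maxChars must be non-negative")
    return {
        "format": fmt,
        "text": content[:max_chars],
        "truncated": len(content) > max_chars,  # hypothetical extra field
    }
```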

Nice-to-have (later):

  • browser.click / browser.type / browser.waitFor helpers built atop snapshot refs / backend node ids

“Is it open or closed?”

“Open” means:

  • the control server is reachable at the configured URL and
  • it reports a live browser connection.

“Closed” means:

  • control server not reachable, or server reports no browser.

Clawdis should determine “open/closed” with a health check against the control server (fast path), not by scanning global Chrome processes, which risks false positives.
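A minimal sketch of such a health check. The `/status` path is a placeholder (the spec names `browser.status` but no HTTP route), and `fetch_status` and `browser_state` are hypothetical names; the `running` field is taken from the `browser.status` result shape.

```python
import http.client
import json
import urllib.request


def fetch_status(control_url: str, timeout: float = 1.0) -> dict:
    """GET the status document. The '/status' path is a placeholder."""
    url = control_url.rstrip("/") + "/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def browser_state(control_url: str, fetch=fetch_status) -> str:
    """Return 'open' only if the server is reachable and reports a live browser."""
    try:
        status = fetch(control_url)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors and timeouts are OSError subclasses;
        # malformed JSON raises ValueError.
        return "closed"
    if isinstance(status, dict) and status.get("running"):
        return "open"
    return "closed"
```

The short timeout keeps the check cheap enough to run before every tool call.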

Interaction with the agent (clawd)

The agent should use browser tools only when:

  • enabled in settings
  • control URL is configured

If disabled, tools must fail fast with a friendly error (“Browser disabled in settings”).

The agent should not assume tabs are ephemeral. It should:

  • call browser.tabs.list to discover existing tabs first
  • reuse an existing tab when appropriate (e.g. a persistent “main” tab)
  • avoid opening duplicate tabs unless asked
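The agent-side behavior described above can be sketched as a single helper. Everything here is illustrative: `ensure_tab`, `BrowserDisabledError`, and the client method names (`tabs_list`, `tabs_focus`, `tabs_open`) are hypothetical wrappers around the proposed `browser.tabs.*` calls, and matching tabs by exact URL is a simplifying assumption.

```python
class BrowserDisabledError(RuntimeError):
    """Raised when browser tools are called while the feature is off."""


def ensure_tab(client, url, enabled=True, control_url="http://127.0.0.1:18791"):
    """Return the targetId of a tab showing `url`, reusing one when possible."""
    if not enabled or not control_url:
        # Fail fast with the friendly message from the spec.
        raise BrowserDisabledError("Browser disabled in settings")
    for tab in client.tabs_list():
        if tab["url"] == url:
            client.tabs_focus(tab["targetId"])
            return tab["targetId"]
    return client.tabs_open(url)["targetId"]


class FakeClient:
    """Tiny in-memory client used only to exercise ensure_tab."""

    def __init__(self):
        self.tabs = []
        self.focused = None

    def tabs_list(self):
        return list(self.tabs)

    def tabs_focus(self, target_id):
        self.focused = target_id

    def tabs_open(self, url):
        target_id = f"tab-{len(self.tabs) + 1}"
        self.tabs.append({"targetId": target_id, "url": url})
        self.focused = target_id
        return {"targetId": target_id}
```

Calling `ensure_tab` twice with the same URL opens one tab and then reuses it, which is the duplicate-avoidance behavior listed above.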

Security & privacy notes

  • The clawd browser profile is app-owned; it may contain logged-in sessions. Treat it as sensitive data.
  • The control server must bind to loopback only by default (127.0.0.1) unless the user explicitly configures a non-loopback URL.
  • Never reuse or copy the user's default Chrome profile.

Non-goals (for the first cut)

  • Cross-device “sync” of tabs between Mac and Pi.
  • Sharing the user's logged-in Chrome sessions automatically.
  • General-purpose web scraping; this is primarily for “close-the-loop” verification and interaction.