feat: add browser target selection for sandboxed agents

2026-01-11 01:24:02 +01:00
parent d2098e4492
commit 326fb04d12
16 changed files with 173 additions and 8 deletions
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -1376,6 +1376,7 @@ Legacy: `perSession` is still supported (`true` → `scope: "session"`,
          noVncPort: 6080,
          headless: false,
          enableNoVnc: true,
+          allowHostControl: false,
          autoStart: true,
          autoStartTimeoutMs: 12000
        },
@@ -1418,6 +1419,11 @@ the noVNC URL is injected into the system prompt so the agent can reference it.
 This does not require `browser.enabled` in the main config; the sandbox control
 URL is injected per session.

+`agents.defaults.sandbox.browser.allowHostControl` (default: `false`) allows
+sandboxed sessions to explicitly target the **host** browser control server
+via the browser tool (`target: "host"`). Leave this off if you want strict
+sandbox isolation.
+
 ### `models` (custom providers + base URLs)

 Clawdbot uses the **pi-coding-agent** model catalog. You can add custom providers
--- a/docs/gateway/sandboxing.md
+++ b/docs/gateway/sandboxing.md
@@ -21,6 +21,7 @@ and process access when the model does something dumb.
 - Optional sandboxed browser (`agents.defaults.sandbox.browser`).
  - By default, the sandbox browser auto-starts (ensures CDP is reachable) when the browser tool needs it.
    Configure via `agents.defaults.sandbox.browser.autoStart` and `agents.defaults.sandbox.browser.autoStartTimeoutMs`.
+  - `agents.defaults.sandbox.browser.allowHostControl` lets sandboxed sessions target the host browser explicitly.

 Not sandboxed:
 - The Gateway process itself.
--- a/docs/tools/browser.md
+++ b/docs/tools/browser.md
@@ -12,6 +12,11 @@ Clawdbot can run a **dedicated Chrome/Chromium profile** that the agent controls
 It is isolated from your personal browser and is managed through a small local
 control server.

+Beginner view:
+- Think of it as a **separate, agent-only browser**.
+- It does **not** touch your personal Chrome profile.
+- The agent can **open tabs, read pages, click, and type** inside that isolated profile.
+
 ## What you get

 - A separate browser profile named **clawd** (orange accent by default).
@@ -65,6 +70,7 @@ Notes:
  the default browser ports shift to stay in the same “family” (control = gateway + 2).
 - `cdpUrl` defaults to `controlUrl + 1` when unset.
 - `attachOnly: true` means “never launch Chrome; only attach if it is already running.”
+- The top-level `color` and the per-profile `color` settings tint the browser UI so you can see which profile is active.

 ## Local vs remote control

@@ -75,6 +81,31 @@ Notes:
 - **Remote CDP:** set `browser.profiles.<name>.cdpUrl` (or `browser.cdpUrl`) to
  attach to a remote Chrome. In this case, Clawdbot will not launch a local browser.

+## Remote browser (control server)
+
+You can run the **browser control server** on another machine and point your
+Gateway at it with a remote `controlUrl`. This lets the agent drive a browser
+outside the host (lab box, VM, remote desktop, etc.).
+
+Key points:
+- The **control server** speaks to Chrome/Chromium via **CDP**.
+- The **Gateway** only needs the HTTP control URL.
+- Profiles are resolved on the **control server** side.
+
+Example:
+```json5
+{
+  browser: {
+    enabled: true,
+    controlUrl: "http://10.0.0.42:18791",
+    defaultProfile: "work"
+  }
+}
+```
+
+Use `browser.profiles.<name>.cdpUrl` for **remote CDP** if you want the Gateway to talk
+directly to a Chrome instance without a remote control server.
+
 ## Profiles (multi-browser)

 Clawdbot supports multiple named profiles. Each profile has its own:
@@ -129,6 +160,18 @@ Some features (navigate/act/ai snapshot, element screenshots, PDF) require
 Playwright. In embedded gateway builds, Playwright may be unavailable; those
 endpoints return a clear 501 error. ARIA snapshots and basic screenshots still work.

+## How it works (internal)
+
+High-level flow:
+- A small **control server** accepts HTTP requests.
+- It connects to Chrome/Chromium via **CDP**.
+- For advanced actions (click/type/snapshot/PDF), it uses **Playwright** on top
+  of CDP.
+- When Playwright is missing, only operations that do not depend on it (such as ARIA snapshots and basic screenshots) are available.
+
+This design keeps the agent on a stable, deterministic interface while letting
+you swap local/remote browsers and profiles.
+
 ## CLI quick reference

 All commands accept `--browser-profile <name>` to target a specific profile.
@@ -185,3 +228,21 @@ Notes:

 For Linux-specific issues (especially snap Chromium), see
 [Browser troubleshooting](/tools/browser-linux-troubleshooting).
+
+## Agent tools + how control works
+
+The agent gets **one tool** for browser automation:
+- `browser` — status/start/stop/tabs/open/focus/close/snapshot/screenshot/navigate/act
+
+How it maps:
+- `browser snapshot` returns a stable UI tree (AI or ARIA).
+- `browser act` uses the snapshot `ref` IDs to click/type/drag/select.
+- `browser screenshot` captures pixels (full page or element).
+- `browser` accepts:
+  - `profile` to choose a named browser profile (host or remote control server).
+  - `target` (`sandbox` | `host` | `custom`) to select where the browser lives.
+  - `controlUrl` to point at a remote control server; setting it implies `target: "custom"`.
+  - In sandboxed sessions, `target: "host"` requires `agents.defaults.sandbox.browser.allowHostControl=true`.
+  - If `target` is omitted: sandboxed sessions default to `sandbox`, non-sandboxed sessions default to `host`.
+
+Acting on snapshot `ref` IDs keeps the agent deterministic and avoids brittle selectors.
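
The target rules documented in this diff can be read as a small decision function. The following is a minimal sketch of that reading, not the code from this commit: `resolveBrowserTarget`, `ResolveInput`, and the error message are hypothetical names chosen for illustration. The docs do not say what happens when both `target` and `controlUrl` are passed, so the sketch assumes an explicit `target` wins.

```typescript
// Hypothetical sketch of the documented target rules; not the commit's implementation.
type BrowserTarget = "sandbox" | "host" | "custom";

interface ResolveInput {
  target?: BrowserTarget;    // explicit `target` from the browser tool call, if any
  controlUrl?: string;       // explicit `controlUrl` from the browser tool call, if any
  sandboxed: boolean;        // whether the session is sandboxed
  allowHostControl: boolean; // agents.defaults.sandbox.browser.allowHostControl
}

function resolveBrowserTarget(input: ResolveInput): BrowserTarget {
  let target: BrowserTarget;
  if (input.target !== undefined) {
    // Assumption: an explicit target takes precedence over the controlUrl shortcut.
    target = input.target;
  } else if (input.controlUrl !== undefined) {
    // Documented: passing controlUrl implies target "custom".
    target = "custom";
  } else {
    // Documented: sandboxed sessions default to "sandbox", others to "host".
    target = input.sandboxed ? "sandbox" : "host";
  }

  // Documented: sandboxed sessions may target the host only when allowHostControl is true.
  if (input.sandboxed && target === "host" && !input.allowHostControl) {
    throw new Error("target \"host\" requires sandbox.browser.allowHostControl=true");
  }
  return target;
}
```

Under these assumptions, a sandboxed session that omits `target` resolves to `sandbox`, and the same session asking for `host` fails unless `allowHostControl` is enabled. Cases the docs leave open (for example `target: "sandbox"` in a non-sandboxed session) are passed through unchanged here.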