From 9b8a4d0c76b6798da89de9837580b81d40bad3f1 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sat, 20 Dec 2025 03:27:17 +0000
Subject: [PATCH] docs(browser): simplify control contract

---
 docs/browser.md                               | 43 +++++---------
 .../browser-control-simplification.md         | 58 +++++++++++++++++++
 2 files changed, 71 insertions(+), 30 deletions(-)
 create mode 100644 docs/refactor/browser-control-simplification.md

diff --git a/docs/browser.md b/docs/browser.md
index e41c8299b..48b89e264 100644
--- a/docs/browser.md
+++ b/docs/browser.md
@@ -8,7 +8,7 @@ read_when:
 
 # Browser (integrated) — clawd-managed Chrome
 
-Status: draft spec · Date: 2025-12-19
+Status: draft spec · Date: 2025-12-20
 
 Goal: give the **clawd** persona its own browser that is:
 - Visually distinct (lobster-orange, profile labeled "clawd").
@@ -98,7 +98,7 @@ Fallback behavior:
   the user set the profile color/name once via Chrome UI; it must persist because the
   `userDataDir` is persistent.
 
-## Control server contract (current)
+## Control server contract (vNext)
 
 Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
 state without touching the user's Chrome.
@@ -111,32 +111,21 @@ Basics:
 - `POST /tabs/open` open a new tab
 - `POST /tabs/focus` focus a tab by id/prefix
 - `DELETE /tabs/:targetId` close a tab by id/prefix
-- `POST /close` close the current tab (optional targetId in body)
 
 Inspection:
-- `GET /screenshot` (CDP screenshot)
-- `POST /screenshot` (Playwright screenshot with ref/element)
-- `GET /query`
-- `GET /dom`
-- `GET /snapshot` (`aria` | `domSnapshot` | `ai`)
-
-Debug-only endpoints (intentionally omitted for now):
-- network request log (privacy)
-- tracing export (large + sensitive)
-- locator generation (dev convenience)
+- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
+- `GET /snapshot` `?format=aria|ai&targetId?&limit?`
+- `GET /console` `?level?&targetId?`
+- `POST /pdf` `{ targetId? }`
 
 Actions:
 - `POST /navigate`
-- `POST /resize`
-- `POST /click`, `POST /type`, `POST /press`, `POST /hover`, `POST /drag`, `POST /select`
-- `POST /upload` (arms the next file chooser)
-- `POST /fill` (JSON field descriptors)
-- `POST /dialog` (arms the next alert/confirm/prompt)
-- `POST /wait` (time/text/textGone)
-- `POST /evaluate` (function + optional ref)
-- `GET /console`
-- `POST /pdf`
-- `POST /verify/element`, `POST /verify/text`, `POST /verify/list`, `POST /verify/value`
+- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
+  - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
+
+Hooks (arming):
+- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
+- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
 
 ### "Is it open or closed?"
 
@@ -178,8 +167,6 @@ Inspection:
 - `clawdis browser screenshot`
 - `clawdis browser screenshot --full-page`
 - `clawdis browser screenshot --ref 12`
-- `clawdis browser query "a" --limit 5`
-- `clawdis browser dom --format text --max-chars 5000`
 - `clawdis browser snapshot --format aria --limit 200`
 - `clawdis browser snapshot --format ai`
 
@@ -199,14 +186,10 @@ Actions:
 - `clawdis browser evaluate --fn '(el) => el.textContent' --ref 7`
 - `clawdis browser console --level error`
 - `clawdis browser pdf`
-- `clawdis browser verify-element --role button --name "Submit"`
-- `clawdis browser verify-text "Welcome"`
-- `clawdis browser verify-list 3 ItemA ItemB`
-- `clawdis browser verify-value --ref 4 --type textbox --value hello`
 
 Notes:
 - `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
-- `snapshot --format ai` returns Playwright-for-AI markup used for ref-based actions.
+- `snapshot --format ai` returns AI snapshot markup used for ref-based actions.
 
 ## Security & privacy notes
 
diff --git a/docs/refactor/browser-control-simplification.md b/docs/refactor/browser-control-simplification.md
new file mode 100644
index 000000000..dc76cdba8
--- /dev/null
+++ b/docs/refactor/browser-control-simplification.md
@@ -0,0 +1,58 @@
+---
+summary: "Refactor: simplify browser control API + implementation"
+read_when:
+  - Refactoring browser control routes, client, or CLI
+  - Auditing agent-facing browser tool surface
+date: 2025-12-20
+---
+
+# Refactor: Browser control simplification
+
+Goal: make the browser-control surface **small, stable, and agent-oriented**, and remove “implementation-shaped” APIs (Playwright/CDP specifics, one-off endpoints, and debugging helpers).
+
+## Why
+
+- The previous API accreted many narrow endpoints (`/click`, `/type`, `/press`, …) plus debug utilities.
+- Some actions are inherently racy when modeled as “do X *when* the event is already visible” (file chooser, dialogs).
+- We want a single, coherent contract that keeps “how it’s implemented” private.
+
+## Target contract (vNext)
+
+**Basics**
+- `GET /` status
+- `POST /start`, `POST /stop`
+- `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
+
+**Agent actions**
+- `POST /navigate` `{ url, targetId? }`
+- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
+  - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
+- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type?, filename? }`
+- `GET /snapshot` `?format=ai|aria&targetId?&limit?`
+- `GET /console` `?level?&targetId?`
+- `POST /pdf` `{ targetId? }`
+
+**Hooks (pre-setup / arming)**
+- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
+- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
+
+Semantics:
+- Hook endpoints **arm** the next matching event within `timeoutMs` (default 10s).
+- Last arm wins per page (new arm replaces previous).
+
+## Work checklist
+
+- [x] Replace action endpoints with `POST /act`
+- [x] Remove legacy endpoints (`/click`, `/type`, `/wait`, …) and any CLI wrappers that no longer make sense
+- [x] Remove `/back` and any history-specific routes
+- [x] Convert `upload` + `dialog` to hook/arming endpoints
+- [x] Unify screenshots behind `POST /screenshot` (no GET variant)
+- [x] Trim inspect/debug endpoints (`/query`, `/dom`) unless explicitly needed
+- [x] Update docs/browser.md to describe contract without implementation details
+- [x] Update tests (server + client) to cover vNext contract
+
+## Notes / decisions
+
+- Keep Playwright as an internal implementation detail for now.
+- Prefer ref-based interactions (`aria-ref`) over coordinate-based ones.
+- Keep the code split “routes vs. engine” small and obvious; avoid scattering logic across too many files.
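
Reviewer sketch (not part of the patch): the hook semantics stated in the refactor note (arm the next matching event, expire after `timeoutMs` with a 10s default, last arm wins per page) could be modeled roughly as below. This is a minimal, language-neutral illustration in Python, not the project's code. `HookRegistry`, `ArmedHook`, `arm`, and `consume` are hypothetical names, and keying the store by `(target_id, kind)` is an assumption, since the note only says "per page".

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

DEFAULT_TIMEOUT_MS = 10_000  # the refactor note states a 10s default


@dataclass
class ArmedHook:
    payload: Dict[str, Any]  # e.g. {"accept": True} or {"paths": [...]}
    deadline: float          # clock value (seconds) after which the arm is stale


class HookRegistry:
    """Hypothetical in-memory store for armed hooks (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Assumption: one slot per (page, hook kind).
        self._armed: Dict[Tuple[str, str], ArmedHook] = {}

    def arm(
        self,
        target_id: str,
        kind: str,
        payload: Dict[str, Any],
        timeout_ms: int = DEFAULT_TIMEOUT_MS,
    ) -> None:
        # Last arm wins: a new arm overwrites any previous one in the slot.
        deadline = self._clock() + timeout_ms / 1000.0
        self._armed[(target_id, kind)] = ArmedHook(payload, deadline)

    def consume(self, target_id: str, kind: str) -> Optional[Dict[str, Any]]:
        # Called when the matching event (dialog, file chooser) fires.
        # An arm is single-use, and an expired arm is treated as absent.
        hook = self._armed.pop((target_id, kind), None)
        if hook is None or self._clock() > hook.deadline:
            return None
        return hook.payload
```

With a structure like this, a client would arm first and trigger second: call the hook endpoint, then send the `click` or `press` that opens the dialog or file chooser. That ordering is what avoids the race described under "Why".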