docs(browser): simplify control contract
@@ -8,7 +8,7 @@ read_when:
 
 # Browser (integrated) — clawd-managed Chrome
 
-Status: draft spec · Date: 2025-12-19
+Status: draft spec · Date: 2025-12-20
 
 Goal: give the **clawd** persona its own browser that is:
 - Visually distinct (lobster-orange, profile labeled "clawd").
@@ -98,7 +98,7 @@ Fallback behavior:
 the user set the profile color/name once via Chrome UI; it must persist because
 the `userDataDir` is persistent.
 
-## Control server contract (current)
+## Control server contract (vNext)
 
 Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
 state without touching the user's Chrome.
@@ -111,32 +111,21 @@ Basics:
 - `POST /tabs/open` open a new tab
 - `POST /tabs/focus` focus a tab by id/prefix
 - `DELETE /tabs/:targetId` close a tab by id/prefix
-- `POST /close` close the current tab (optional targetId in body)
 
 Inspection:
-- `GET /screenshot` (CDP screenshot)
-- `POST /screenshot` (Playwright screenshot with ref/element)
-- `GET /query`
-- `GET /dom`
-- `GET /snapshot` (`aria` | `domSnapshot` | `ai`)
-
-Debug-only endpoints (intentionally omitted for now):
-- network request log (privacy)
-- tracing export (large + sensitive)
-- locator generation (dev convenience)
+- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type? }`
+- `GET /snapshot` `?format=aria|ai&targetId?&limit?`
+- `GET /console` `?level?&targetId?`
+- `POST /pdf` `{ targetId? }`
 
 Actions:
 - `POST /navigate`
-- `POST /resize`
-- `POST /click`, `POST /type`, `POST /press`, `POST /hover`, `POST /drag`, `POST /select`
-- `POST /upload` (arms the next file chooser)
-- `POST /fill` (JSON field descriptors)
-- `POST /dialog` (arms the next alert/confirm/prompt)
-- `POST /wait` (time/text/textGone)
-- `POST /evaluate` (function + optional ref)
-- `GET /console`
-- `POST /pdf`
-- `POST /verify/element`, `POST /verify/text`, `POST /verify/list`, `POST /verify/value`
+- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
+  - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
+
+Hooks (arming):
+- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
+- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
 
 ### "Is it open or closed?"
 
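Reviewer sketch (not part of the commit): the hunk above collapses the per-action routes into a single `POST /act` body discriminated by `kind`, and moves snapshot options into query parameters. A minimal TypeScript model of what a client might build is shown here. Only `kind`, `targetId`, `format`, and `limit` come from the doc; the `ref` field and the helper names `buildAct`, `buildSnapshot`, and `HttpCall` are assumptions made for illustration, since the spec elides the kind-specific fields as `...`.

```typescript
// Sketch of typed request builders for the vNext contract.
// `ref` below is a hypothetical stand-in for the fields the spec elides.
type ActKind =
  | "click" | "type" | "press" | "hover" | "drag" | "select"
  | "fill" | "wait" | "resize" | "close" | "evaluate";

interface ActBody {
  kind: ActKind;
  targetId?: string;
  [extra: string]: unknown; // kind-specific fields, not specified in the doc
}

interface HttpCall {
  method: "GET" | "POST";
  path: string;
  body?: string;
}

// Every action goes through the single POST /act route.
function buildAct(body: ActBody): HttpCall {
  return { method: "POST", path: "/act", body: JSON.stringify(body) };
}

// GET /snapshot takes its options as query parameters.
function buildSnapshot(
  format: "aria" | "ai",
  opts: { targetId?: string; limit?: number } = {},
): HttpCall {
  const params: string[] = [`format=${format}`];
  if (opts.targetId !== undefined) {
    params.push(`targetId=${encodeURIComponent(opts.targetId)}`);
  }
  if (opts.limit !== undefined) {
    params.push(`limit=${opts.limit}`);
  }
  return { method: "GET", path: `/snapshot?${params.join("&")}` };
}

const click = buildAct({ kind: "click", ref: "12" }); // `ref` is hypothetical
const snap = buildSnapshot("aria", { limit: 200 });
```

Keeping the `kind` union in one place means an unsupported action fails at compile time on the client instead of as an unknown-route error on the server.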
@@ -178,8 +167,6 @@ Inspection:
 - `clawdis browser screenshot`
 - `clawdis browser screenshot --full-page`
 - `clawdis browser screenshot --ref 12`
-- `clawdis browser query "a" --limit 5`
-- `clawdis browser dom --format text --max-chars 5000`
 - `clawdis browser snapshot --format aria --limit 200`
 - `clawdis browser snapshot --format ai`
 
@@ -199,14 +186,10 @@ Actions:
 - `clawdis browser evaluate --fn '(el) => el.textContent' --ref 7`
 - `clawdis browser console --level error`
 - `clawdis browser pdf`
-- `clawdis browser verify-element --role button --name "Submit"`
-- `clawdis browser verify-text "Welcome"`
-- `clawdis browser verify-list 3 ItemA ItemB`
-- `clawdis browser verify-value --ref 4 --type textbox --value hello`
 
 Notes:
 - `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
-- `snapshot --format ai` returns Playwright-for-AI markup used for ref-based actions.
+- `snapshot --format ai` returns AI snapshot markup used for ref-based actions.
 
 ## Security & privacy notes
 
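Reviewer sketch (not part of the commit): the note about arming calls is an ordering constraint, so a client has to send the hook request before the action that opens the chooser. The TypeScript below plans that two-step sequence as plain data. The helper name `planUpload`, the `Step` shape, and the `ref` field on the click are assumptions for illustration; only the two routes and the `targetId`, `paths`, and `kind` fields come from the doc.

```typescript
// One planned HTTP call; bodies are kept as plain objects for inspection.
interface Step {
  method: "POST";
  path: string;
  body: Record<string, unknown>;
}

// Arm first, then trigger: the hook must be in place before the chooser opens.
function planUpload(paths: string[], triggerRef: string, targetId?: string): Step[] {
  const target = targetId === undefined ? {} : { targetId };
  return [
    { method: "POST", path: "/hooks/file-chooser", body: { ...target, paths } },
    // `ref` is a hypothetical field naming the element that opens the chooser.
    { method: "POST", path: "/act", body: { ...target, kind: "click", ref: triggerRef } },
  ];
}

const steps = planUpload(["/tmp/report.pdf"], "7");
```

Reversing the two steps would reintroduce the race the arming design is meant to avoid, because the chooser could open before anything is waiting for it.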
58
docs/refactor/browser-control-simplification.md
Normal file
@@ -0,0 +1,58 @@
+---
+summary: "Refactor: simplify browser control API + implementation"
+read_when:
+- Refactoring browser control routes, client, or CLI
+- Auditing agent-facing browser tool surface
+date: 2025-12-20
+---
+
+# Refactor: Browser control simplification
+
+Goal: make the browser-control surface **small, stable, and agent-oriented**, and remove “implementation-shaped” APIs (Playwright/CDP specifics, one-off endpoints, and debugging helpers).
+
+## Why
+
+- The previous API accreted many narrow endpoints (`/click`, `/type`, `/press`, …) plus debug utilities.
+- Some actions are inherently racy when modeled as “do X *when* the event is already visible” (file chooser, dialogs).
+- We want a single, coherent contract that keeps “how it’s implemented” private.
+
+## Target contract (vNext)
+
+**Basics**
+- `GET /` status
+- `POST /start`, `POST /stop`
+- `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
+
+**Agent actions**
+- `POST /navigate` `{ url, targetId? }`
+- `POST /act` `{ kind, targetId?, ... }` where `kind` is one of:
+  - `click`, `type`, `press`, `hover`, `drag`, `select`, `fill`, `wait`, `resize`, `close`, `evaluate`
+- `POST /screenshot` `{ targetId?, fullPage?, ref?, element?, type?, filename? }`
+- `GET /snapshot` `?format=ai|aria&targetId?&limit?`
+- `GET /console` `?level?&targetId?`
+- `POST /pdf` `{ targetId? }`
+
+**Hooks (pre-setup / arming)**
+- `POST /hooks/file-chooser` `{ targetId?, paths, timeoutMs? }`
+- `POST /hooks/dialog` `{ targetId?, accept, promptText?, timeoutMs? }`
+
+Semantics:
+- Hook endpoints **arm** the next matching event within `timeoutMs` (default 10s).
+- Last arm wins per page (new arm replaces previous).
+
+## Work checklist
+
+- [x] Replace action endpoints with `POST /act`
+- [x] Remove legacy endpoints (`/click`, `/type`, `/wait`, …) and any CLI wrappers that no longer make sense
+- [x] Remove `/back` and any history-specific routes
+- [x] Convert `upload` + `dialog` to hook/arming endpoints
+- [x] Unify screenshots behind `POST /screenshot` (no GET variant)
+- [x] Trim inspect/debug endpoints (`/query`, `/dom`) unless explicitly needed
+- [x] Update docs/browser.md to describe contract without implementation details
+- [x] Update tests (server + client) to cover vNext contract
+
+## Notes / decisions
+
+- Keep Playwright as an internal implementation detail for now.
+- Prefer ref-based interactions (`aria-ref`) over coordinate-based ones.
+- Keep the code split “routes vs. engine” small and obvious; avoid scattering logic across too many files.
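Reviewer sketch (not part of the commit): the "Semantics" bullets in the new file describe a small state machine, which can be modelled in memory to check the intended behaviour. The TypeScript below is one possible reading, not the project's implementation. It assumes hooks are keyed per page and per event type, which the doc does not spell out. The names `HookRegistry`, `arm`, and `take` are made up for illustration, and time is passed in explicitly so the model stays deterministic.

```typescript
// Minimal model of the arming semantics: at most one armed hook per page
// and event type, replaced by any newer arm, valid until it expires.
type HookEvent = "file-chooser" | "dialog";

interface ArmedHook<T> {
  payload: T;
  expiresAt: number; // milliseconds, same clock as `now`
}

const DEFAULT_TIMEOUT_MS = 10000; // "default 10s" from the doc

class HookRegistry<T> {
  private armed = new Map<string, ArmedHook<T>>();

  private key(targetId: string, event: HookEvent): string {
    return `${targetId}:${event}`;
  }

  // Arm the next matching event; a newer arm replaces the previous one.
  arm(
    targetId: string,
    event: HookEvent,
    payload: T,
    now: number,
    timeoutMs: number = DEFAULT_TIMEOUT_MS,
  ): void {
    this.armed.set(this.key(targetId, event), { payload, expiresAt: now + timeoutMs });
  }

  // Called when the event fires: consume the hook if it has not expired.
  take(targetId: string, event: HookEvent, now: number): T | undefined {
    const k = this.key(targetId, event);
    const hook = this.armed.get(k);
    if (hook === undefined) return undefined;
    this.armed.delete(k);
    return now <= hook.expiresAt ? hook.payload : undefined;
  }
}

const hooks = new HookRegistry<{ accept: boolean }>();
hooks.arm("tab1", "dialog", { accept: false }, 0);
hooks.arm("tab1", "dialog", { accept: true }, 100); // replaces the first arm
const first = hooks.take("tab1", "dialog", 500);    // the second arm's payload
const second = hooks.take("tab1", "dialog", 600);   // nothing left to consume
```

Consuming the hook on first use keeps an armed dialog response from applying to a later, unrelated dialog on the same page.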