2.5 KiB
2.5 KiB
summary, read_when, date
| summary | read_when | date | ||
|---|---|---|---|---|
| Refactor: simplify browser control API + implementation |
|
2025-12-20 |
Refactor: Browser control simplification
Goal: make the browser-control surface small, stable, and agent-oriented, and remove “implementation-shaped” APIs (Playwright/CDP specifics, one-off endpoints, and debugging helpers).
Why
- The previous API accreted many narrow endpoints (
/click,/type,/press, …) plus debug utilities. - Some actions are inherently racy when modeled as “do X when the event is already visible” (file chooser, dialogs).
- We want a single, coherent contract that keeps “how it’s implemented” private.
Target contract (vNext)
Basics
GET /statusPOST /start,POST /stopGET /tabs,POST /tabs/open,POST /tabs/focus,DELETE /tabs/:targetId
Agent actions
POST /navigate{ url, targetId? }POST /act{ kind, targetId?, ... }wherekindis one of:click,type,press,hover,drag,select,fill,wait,resize,close,evaluate
POST /screenshot{ targetId?, fullPage?, ref?, element?, type?, filename? }GET /snapshot?format=ai|aria&targetId?&limit?GET /console?level?&targetId?POST /pdf{ targetId? }
Hooks (pre-setup / arming)
POST /hooks/file-chooser{ targetId?, paths, timeoutMs? }POST /hooks/dialog{ targetId?, accept, promptText?, timeoutMs? }
Semantics:
- Hook endpoints arm the next matching event within
timeoutMs(default 10s). - Last arm wins per page (new arm replaces previous).
Work checklist
- Replace action endpoints with
POST /act - Remove legacy endpoints (
/click,/type,/wait, …) and any CLI wrappers that no longer make sense - Remove
/backand any history-specific routes - Convert
upload+dialogto hook/arming endpoints - Unify screenshots behind
POST /screenshot(no GET variant) - Trim inspect/debug endpoints (
/query,/dom) unless explicitly needed - Update docs/browser.md to describe contract without implementation details
- Update tests (server + client) to cover vNext contract
Notes / decisions
- Keep Playwright as an internal implementation detail for now.
- Prefer ref-based interactions (
aria-ref) over coordinate-based ones. - Keep the code split “routes vs. engine” small and obvious; avoid scattering logic across too many files.