feat(browser): add native action commands

Peter Steinberger
2025-12-20 00:53:45 +00:00
parent d67bec0740
commit a526d3c1f2
26 changed files with 2589 additions and 1234 deletions

@@ -97,4 +97,4 @@ git commit -m "Add Clawd workspace"
- Keep heartbeats enabled so the assistant can schedule reminders, monitor inboxes, and trigger camera captures.
- For browser-driven verification, use `clawdis browser` (tabs/status/screenshot) with the clawd-managed Chrome profile.
- For DOM inspection, use `clawdis browser eval|query|dom|snapshot` (and `--json`/`--out` when you need machine output).
- For advanced actions, use `clawdis browser tool browser_* --args '{...}'` (Playwright MCP parity).
- For interactions, use `clawdis browser click|type|hover|drag|select|upload|press|wait|navigate|back|evaluate|run`.

@@ -1,5 +1,5 @@
---
summary: "Spec: integrated browser control server + MCP tool dispatch"
summary: "Spec: integrated browser control server + action commands"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
@@ -98,45 +98,45 @@ Fallback behavior:
the user set the profile color/name once via Chrome UI; it must persist because
the `userDataDir` is persistent.
## Control server contract (proposed)
## Control server contract (current)
Expose a small local HTTP API (and/or gateway RPC surface) so the agent can manage
state without touching the user's Chrome.
Minimum endpoints/methods (names illustrative):
Basics:
- `GET /` status payload (enabled/running/pid/cdpPort/etc)
- `POST /start` start browser
- `POST /stop` stop browser
- `GET /tabs` list tabs
- `POST /tabs/open` open a new tab
- `POST /tabs/focus` focus a tab by id/prefix
- `DELETE /tabs/:targetId` close a tab by id/prefix
- `POST /close` close the current tab (optional targetId in body)
- `browser.status`
- returns: `{ enabled, url, running, pid?, version?, chosenBrowser?, userDataDir?, ports: { control, cdp } }`
- `browser.start`
- starts the browser-control server + browser (no-op if already running)
- `browser.stop`
- stops the server and closes the clawd browser (best-effort; graceful first, then force if needed)
- `browser.tabs.list`
- returns: array of `{ targetId, title, url, isActive, lastFocusedAt? }`
- `browser.tabs.open`
- params: `{ url, newTab?: true }` → returns `{ targetId }`
- `browser.tabs.focus`
- params: `{ targetId }`
- `browser.tabs.close`
- params: `{ targetId }`
- `browser.screenshot`
- params: `{ targetId?, fullPage?: false }` → returns a `MEDIA:` attachment URL (via the existing Clawdis media host)
Inspection:
- `GET /screenshot` (CDP screenshot)
- `POST /screenshot` (Playwright screenshot with ref/element)
- `POST /eval` (CDP evaluate)
- `GET /query`
- `GET /dom`
- `GET /snapshot` (`aria` | `domSnapshot` | `ai`)
DOM + inspection (v1):
- `browser.eval`
- params: `{ js, targetId?, await?: false }` → returns the CDP `Runtime.evaluate` result (best-effort `returnByValue`)
- `browser.query`
- params: `{ selector, targetId?, limit? }` → returns basic element summaries (tag/id/class/text/value/href/outerHTML)
- `browser.dom`
- params: `{ format: "html"|"text", targetId?, selector?, maxChars? }` → returns a truncated dump (`text` field)
- `browser.snapshot`
- params: `{ format: "aria"|"domSnapshot", targetId?, limit? }`
- `aria`: simplified Accessibility tree with `backendDOMNodeId` when available (future click/type hooks)
- `domSnapshot`: lightweight DOM walk snapshot (tree-ish, bounded by `limit`)
Nice-to-have (later):
- `browser.click` / `browser.type` / `browser.waitFor` helpers built atop snapshot refs / backend node ids
- `browser.tool` dispatch that mirrors Playwright MCP tool names for quick feature parity
Actions:
- `POST /navigate`, `POST /back`
- `POST /resize`
- `POST /click`, `POST /type`, `POST /press`, `POST /hover`, `POST /drag`, `POST /select`
- `POST /upload` (file chooser modal must be open)
- `POST /fill` (JSON field descriptors)
- `POST /dialog` (alert/confirm/prompt)
- `POST /wait` (time/text/textGone)
- `POST /evaluate` (function + optional ref)
- `POST /run` (function(page) → result)
- `GET /console`, `GET /network`
- `POST /trace/start`, `POST /trace/stop`
- `POST /pdf`
- `POST /verify/element`, `POST /verify/text`, `POST /verify/list`, `POST /verify/value`
- `POST /mouse/move`, `POST /mouse/click`, `POST /mouse/drag`
- `POST /locator` (generate Playwright locator)
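Any HTTP client can exercise the routes listed above. The following is a minimal dry-run sketch, not the project's own tooling. The base URL, the `CLAWD_BROWSER_URL` variable, the `browser_request` helper, and the JSON body fields (`url`, `ref`) are assumptions made for illustration; only the routes come from the list. It prints the `curl` commands instead of sending them, so it is safe to try without a running server.

```shell
#!/bin/sh
# Hypothetical base URL: substitute the control port from your own setup.
BASE_URL="${CLAWD_BROWSER_URL:-http://127.0.0.1:8080}"

# Print the curl command for a control-server route (dry run only).
# Usage: browser_request METHOD ROUTE [JSON_BODY]
browser_request() {
  method="$1"
  route="$2"
  body="${3:-}"
  if [ -n "$body" ]; then
    printf "curl -s -X %s -H 'Content-Type: application/json' -d '%s' %s%s\n" \
      "$method" "$body" "$BASE_URL" "$route"
  else
    printf "curl -s -X %s %s%s\n" "$method" "$BASE_URL" "$route"
  fi
}

# Routes are from the list above; the body shapes are assumed, not documented.
browser_request GET /tabs
browser_request POST /navigate '{"url":"https://example.com"}'
browser_request POST /click '{"ref":"12"}'
```

Piping a printed line to `sh` would send the request, but check the real parameter names against the server before relying on these bodies.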
### "Is it open or closed?"
@@ -163,54 +163,60 @@ The agent should not assume tabs are ephemeral. It should:
- reuse an existing tab when appropriate (e.g. a persistent "main" tab)
- avoid opening duplicate tabs unless asked
## Tool dispatch (Playwright MCP parity)
## CLI quick reference (one example each)
Clawdis exposes a generic tool dispatcher for Playwright MCP-style tools:
Basics:
- `clawdis browser status`
- `clawdis browser start`
- `clawdis browser stop`
- `clawdis browser tabs`
- `clawdis browser open https://example.com`
- `clawdis browser focus abcd1234`
- `clawdis browser close abcd1234`
`POST /tool` with JSON `{ name: "browser_*", args: { ... }, targetId?: "..." }`
Inspection:
- `clawdis browser screenshot`
- `clawdis browser screenshot --full-page`
- `clawdis browser screenshot --ref 12`
- `clawdis browser eval "document.title"`
- `clawdis browser query "a" --limit 5`
- `clawdis browser dom --format text --max-chars 5000`
- `clawdis browser snapshot --format aria --limit 200`
- `clawdis browser snapshot --format ai`
CLI helper:
`clawdis browser tool browser_* --args '{...}'`
Supported tool names:
- `browser_close`
- `browser_resize`
- `browser_console_messages`
- `browser_network_requests`
- `browser_handle_dialog`
- `browser_evaluate`
- `browser_file_upload`
- `browser_fill_form`
- `browser_install` (no-op; uses system Chrome/Chromium)
- `browser_press_key`
- `browser_type`
- `browser_navigate`
- `browser_navigate_back`
- `browser_run_code`
- `browser_take_screenshot`
- `browser_snapshot`
- `browser_click`
- `browser_drag`
- `browser_hover`
- `browser_select_option`
- `browser_tabs`
- `browser_wait_for`
- `browser_pdf_save`
- `browser_start_tracing`
- `browser_stop_tracing`
- `browser_verify_element_visible`
- `browser_verify_text_visible`
- `browser_verify_list_visible`
- `browser_verify_value`
- `browser_mouse_move_xy`
- `browser_mouse_click_xy`
- `browser_mouse_drag_xy`
- `browser_generate_locator`
Actions:
- `clawdis browser navigate https://example.com`
- `clawdis browser back`
- `clawdis browser resize 1280 720`
- `clawdis browser click 12 --double`
- `clawdis browser type 23 "hello" --submit`
- `clawdis browser press Enter`
- `clawdis browser hover 44`
- `clawdis browser drag 10 11`
- `clawdis browser select 9 OptionA OptionB`
- `clawdis browser upload /tmp/file.pdf`
- `clawdis browser fill --fields '[{"ref":"1","value":"Ada"}]'`
- `clawdis browser dialog --accept`
- `clawdis browser wait --text "Done"`
- `clawdis browser evaluate --fn '(el) => el.textContent' --ref 7`
- `clawdis browser run --code '(page) => page.title()'`
- `clawdis browser console --level error`
- `clawdis browser network --include-static`
- `clawdis browser trace-start`
- `clawdis browser trace-stop`
- `clawdis browser pdf`
- `clawdis browser verify-element --role button --name "Submit"`
- `clawdis browser verify-text "Welcome"`
- `clawdis browser verify-list 3 ItemA ItemB`
- `clawdis browser verify-value --ref 4 --type textbox --value hello`
- `clawdis browser mouse-move --x 120 --y 240`
- `clawdis browser mouse-click --x 120 --y 240`
- `clawdis browser mouse-drag --start-x 10 --start-y 20 --end-x 200 --end-y 300`
- `clawdis browser locator 77`
Notes:
- `browser_file_upload` and `browser_handle_dialog` are modal-only; they only
work when a file chooser/dialog modal state is present.
- `browser_snapshot` returns a Playwright-for-AI snapshot (use for follow-up actions).
- `upload` and `dialog` only work when a file chooser or dialog is present.
- `snapshot --format ai` returns Playwright-for-AI markup used for ref-based actions.
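The ref-based flow described in the notes can be sketched as a short script: take an `ai` snapshot, act on a ref from it, then wait for a result. This is an illustration under assumptions: the `cb` wrapper and the `DRY_RUN` variable are hypothetical, and ref `12` is a placeholder that would normally be read from the snapshot output. The subcommands and flags are the ones shown in the quick reference.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the command by default (DRY_RUN=1),
# and runs the real CLI only when DRY_RUN=0.
cb() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "clawdis browser $*"
  else
    clawdis browser "$@"
  fi
}

# 1. Capture a snapshot whose refs can drive follow-up actions.
cb snapshot --format ai
# 2. Act on a ref taken from that snapshot (12 is a placeholder).
cb click 12
# 3. Wait until the expected text appears before continuing.
cb wait --text "Done"
```

Running with `DRY_RUN=0` executes the commands against the clawd-managed browser; the default only echoes them.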
## Security & privacy notes

@@ -1,13 +0,0 @@
---
summary: "Redirect: /clawd.md → /clawd"
permalink: /clawd.md
---
<!-- {% raw %} -->
<script>
window.location.replace("{{ \"/clawd\" | relative_url }}");
</script>
If you're not redirected automatically, go to
<a href="{{ \"/clawd\" | relative_url }}">/clawd</a>.
<!-- {% endraw %} -->

@@ -1,11 +0,0 @@
---
summary: "Redirect: mac/browser.md → browser.md"
read_when:
- Adding agent-controlled browser automation
- Debugging why clawd is interfering with your own Chrome
- Implementing browser settings + lifecycle in the macOS app
---
# Browser (macOS app) — moved
This doc moved to `docs/browser.md`.