Files
clawdbot/docs/tools/index.md
2026-01-08 23:25:51 +01:00

307 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
summary: "Agent tool surface for Clawdbot (browser, canvas, nodes, cron) replacing legacy `clawdbot-*` skills"
read_when:
- Adding or modifying agent tools
- Retiring or changing `clawdbot-*` skills
---
# Tools (Clawdbot)
Clawdbot exposes **first-class agent tools** for browser, canvas, nodes, and cron.
These replace the old `clawdbot-*` skills: the tools are typed, no shelling,
and the agent should rely on them directly.
## Disabling tools
You can globally allow/deny tools via `agent.tools` in `clawdbot.json`
(deny wins). This prevents disallowed tools from being sent to providers.
```json5
{
agent: {
tools: {
deny: ["browser"]
}
}
}
```
## Tool inventory
### `bash`
Run shell commands in the workspace.
Core parameters:
- `command` (required)
- `yieldMs` (auto-background after timeout, default 10000)
- `background` (immediate background)
- `timeout` (seconds; kills the process if exceeded, default 1800)
- `elevated` (bool; run on host if elevated mode is enabled/allowed; only changes behavior when the agent is sandboxed)
- Need a real TTY? Use the tmux skill.
Notes:
- Returns `status: "running"` with a `sessionId` when backgrounded.
- Use `process` to poll/log/write/kill/clear background sessions.
- If `process` is disallowed, `bash` runs synchronously and ignores `yieldMs`/`background`.
- `elevated` is gated by `agent.elevated` (global sender allowlist) and runs on the host.
- `elevated` only changes behavior when the agent is sandboxed (otherwise its a no-op).
### `process`
Manage background bash sessions.
Core actions:
- `list`, `poll`, `log`, `write`, `kill`, `clear`, `remove`
Notes:
- `poll` returns new output and exit status when complete.
- `log` supports line-based `offset`/`limit` (omit `offset` to grab the last N lines).
- `process` is scoped per agent; sessions from other agents are not visible.
### `browser`
Control the dedicated clawd browser.
Core actions:
- `status`, `start`, `stop`, `tabs`, `open`, `focus`, `close`
- `snapshot` (aria/ai)
- `screenshot` (returns image block + `MEDIA:<path>`)
- `act` (UI actions: click/type/press/hover/drag/select/fill/resize/wait/evaluate)
- `navigate`, `console`, `pdf`, `upload`, `dialog`
Profile management:
- `profiles` — list all browser profiles with status
- `create-profile` — create new profile with auto-allocated port (or `cdpUrl`)
- `delete-profile` — stop browser, delete user data, remove from config (local only)
- `reset-profile` — kill orphan process on profile's port (local only)
Common parameters:
- `controlUrl` (defaults from config)
- `profile` (optional; defaults to `browser.defaultProfile`)
Notes:
- Requires `browser.enabled=true` (default is `true`; set `false` to disable).
- Uses `browser.controlUrl` unless `controlUrl` is passed explicitly.
- All actions accept optional `profile` parameter for multi-instance support.
- When `profile` is omitted, uses `browser.defaultProfile` (defaults to "clawd").
- Profile names: lowercase alphanumeric + hyphens only (max 64 chars).
- Port range: 18800-18899 (~100 profiles max).
- Remote profiles are attach-only (no start/stop/reset).
- `snapshot` defaults to `ai`; use `aria` for the accessibility tree.
- `act` requires `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector needs.
- Avoid `act``wait` by default; use it only in exceptional cases (no reliable UI state to wait on).
- `upload` can optionally pass a `ref` to auto-click after arming.
- `upload` also supports `inputRef` (aria ref) or `element` (CSS selector) to set `<input type="file">` directly.
### `canvas`
Drive the node Canvas (present, eval, snapshot, A2UI).
Core actions:
- `present`, `hide`, `navigate`, `eval`
- `snapshot` (returns image block + `MEDIA:<path>`)
- `a2ui_push`, `a2ui_reset`
Notes:
- Uses gateway `node.invoke` under the hood.
- If no `node` is provided, the tool picks a default (single connected node or local mac node).
- A2UI is v0.8 only (no `createSurface`); the CLI rejects v0.9 JSONL with line errors.
- Quick smoke: `clawdbot nodes canvas a2ui push --node <id> --text "Hello from A2UI"`.
### `nodes`
Discover and target paired nodes; send notifications; capture camera/screen.
Core actions:
- `status`, `describe`
- `pending`, `approve`, `reject` (pairing)
- `notify` (macOS `system.notify`)
- `run` (macOS `system.run`)
- `camera_snap`, `camera_clip`, `screen_record`
- `location_get`
Notes:
- Camera/screen commands require the node app to be foregrounded.
- Images return image blocks + `MEDIA:<path>`.
- Videos return `FILE:<path>` (mp4).
- Location returns a JSON payload (lat/lon/accuracy/timestamp).
- `run` params: `command` argv array; optional `cwd`, `env` (`KEY=VAL`), `commandTimeoutMs`, `invokeTimeoutMs`, `needsScreenRecording`.
Example (`run`):
```json
{
"action": "run",
"node": "office-mac",
"command": ["echo", "Hello"],
"env": ["FOO=bar"],
"commandTimeoutMs": 12000,
"invokeTimeoutMs": 45000,
"needsScreenRecording": false
}
```
### `image`
Analyze an image with the configured image model.
Core parameters:
- `image` (required path or URL)
- `prompt` (optional; defaults to "Describe the image.")
- `model` (optional override)
- `maxBytesMb` (optional size cap)
Notes:
- Only available when `agent.imageModel` is configured (primary or fallbacks).
- Uses the image model directly (independent of the main chat model).
### `cron`
Manage Gateway cron jobs and wakeups.
Core actions:
- `status`, `list`
- `add`, `update`, `remove`, `run`, `runs`
- `wake` (enqueue system event + optional immediate heartbeat)
Notes:
- `add` expects a full cron job object (same schema as `cron.add` RPC).
- `update` uses `{ id, patch }`.
### `gateway`
Restart or apply updates to the running Gateway process (in-place).
Core actions:
- `restart` (sends `SIGUSR1` to the current process; `clawdbot gateway` restart in-place)
- `config.get` / `config.schema`
- `config.apply` (validate + write config + restart + wake)
- `update.run` (run update + restart + wake)
Notes:
- Use `delayMs` (defaults to 2000) to avoid interrupting an in-flight reply.
### `sessions_list` / `sessions_history` / `sessions_send` / `sessions_spawn`
List sessions, inspect transcript history, or send to another session.
Core parameters:
- `sessions_list`: `kinds?`, `limit?`, `activeMinutes?`, `messageLimit?` (0 = none)
- `sessions_history`: `sessionKey`, `limit?`, `includeTools?`
- `sessions_send`: `sessionKey`, `message`, `timeoutSeconds?` (0 = fire-and-forget)
- `sessions_spawn`: `task`, `label?`, `agentId?`, `model?`, `runTimeoutSeconds?`, `cleanup?`
Notes:
- `main` is the canonical direct-chat key; global/unknown are hidden.
- `messageLimit > 0` fetches last N messages per session (tool messages filtered).
- `sessions_send` waits for final completion when `timeoutSeconds > 0`.
- `sessions_spawn` starts a sub-agent run and posts an announce reply back to the requester chat.
- `sessions_spawn` is non-blocking and returns `status: "accepted"` immediately.
- `sessions_send` runs a replyback pingpong (reply `REPLY_SKIP` to stop; max turns via `session.agentToAgent.maxPingPongTurns`, 05).
- After the pingpong, the target agent runs an **announce step**; reply `ANNOUNCE_SKIP` to suppress the announcement.
### `agents_list`
List agent ids that the current session may target with `sessions_spawn`.
Notes:
- Result is restricted to per-agent allowlists (`routing.agents.<agentId>.subagents.allowAgents`).
- When `["*"]` is configured, the tool includes all configured agents and marks `allowAny: true`.
### `discord`
Send Discord reactions, stickers, or polls.
Core actions:
- `react` (`channelId`, `messageId`, `emoji`)
- `reactions` (`channelId`, `messageId`, optional `limit`)
- `sticker` (`to`, `stickerIds`, optional `content`)
- `poll` (`to`, `question`, `answers`, optional `allowMultiselect`, `durationHours`, `content`)
- `permissions` (`channelId`)
- `readMessages` (`channelId`, optional `limit`/`before`/`after`/`around`)
- `sendMessage` (`to`, `content`, optional `mediaUrl`, `replyTo`)
- `editMessage` (`channelId`, `messageId`, `content`)
- `deleteMessage` (`channelId`, `messageId`)
- `threadCreate` (`channelId`, `name`, optional `messageId`, `autoArchiveMinutes`)
- `threadList` (`guildId`, optional `channelId`, `includeArchived`, `before`, `limit`)
- `threadReply` (`channelId`, `content`, optional `mediaUrl`, `replyTo`)
- `pinMessage`/`unpinMessage` (`channelId`, `messageId`)
- `listPins` (`channelId`)
- `searchMessages` (`guildId`, `content`, optional `channelId`/`channelIds`, `authorId`/`authorIds`, `limit`)
- `memberInfo` (`guildId`, `userId`)
- `roleInfo` (`guildId`)
- `emojiList` (`guildId`)
- `roleAdd`/`roleRemove` (`guildId`, `userId`, `roleId`)
- `channelInfo` (`channelId`)
- `channelList` (`guildId`)
- `voiceStatus` (`guildId`, `userId`)
- `eventList` (`guildId`)
- `eventCreate` (`guildId`, `name`, `startTime`, optional `endTime`, `description`, `channelId`, `entityType`, `location`)
- `timeout` (`guildId`, `userId`, optional `durationMinutes`, `until`, `reason`)
- `kick` (`guildId`, `userId`, optional `reason`)
- `ban` (`guildId`, `userId`, optional `reason`, `deleteMessageDays`)
Notes:
- `to` accepts `channel:<id>` or `user:<id>`.
- Polls require 210 answers and default to 24 hours.
- `reactions` returns per-emoji user lists (limited to 100 per reaction).
- Reaction removal semantics: see [/tools/reactions](/tools/reactions).
- `discord.actions.*` gates Discord tool actions; `roles` + `moderation` default to `false`.
- `searchMessages` follows the Discord preview feature constraints (limit max 25, channel/author filters accept arrays).
- The tool is only exposed when the current provider is Discord.
### `whatsapp`
Send WhatsApp reactions.
Core actions:
- `react` (`chatJid`, `messageId`, `emoji`, optional `remove`, `participant`, `fromMe`, `accountId`)
Notes:
- Reaction removal semantics: see [/tools/reactions](/tools/reactions).
- `whatsapp.actions.*` gates WhatsApp tool actions.
- The tool is only exposed when the current provider is WhatsApp.
### `telegram`
Send Telegram messages or reactions.
Core actions:
- `sendMessage` (`to`, `content`, optional `mediaUrl`, `replyToMessageId`, `messageThreadId`)
- `react` (`chatId`, `messageId`, `emoji`, optional `remove`)
Notes:
- Reaction removal semantics: see [/tools/reactions](/tools/reactions).
- `telegram.actions.*` gates Telegram tool actions.
- The tool is only exposed when the current provider is Telegram.
## Parameters (common)
Gateway-backed tools (`canvas`, `nodes`, `cron`):
- `gatewayUrl` (default `ws://127.0.0.1:18789`)
- `gatewayToken` (if auth enabled)
- `timeoutMs`
Browser tool:
- `controlUrl` (defaults from config)
## Recommended agent flows
Browser automation:
1) `browser``status` / `start`
2) `snapshot` (ai or aria)
3) `act` (click/type/press)
4) `screenshot` if you need visual confirmation
Canvas render:
1) `canvas``present`
2) `a2ui_push` (optional)
3) `snapshot`
Node targeting:
1) `nodes``status`
2) `describe` on the chosen node
3) `notify` / `run` / `camera_snap` / `screen_record`
## Safety
- Avoid direct `system.run`; use `nodes``run` only with explicit user consent.
- Respect user consent for camera/screen capture.
- Use `status/describe` to ensure permissions before invoking media commands.
## How tools are presented to the agent
Tools are exposed in two parallel channels:
1) **System prompt text**: a human-readable list + guidance.
2) **Tool schema**: the structured function definitions sent to the model API.
That means the agent sees both “what tools exist” and “how to call them.” If a tool
doesnt appear in the system prompt or the schema, the model cannot call it.