feat: improve web_fetch readability extraction

Peter Steinberger
2026-01-16 23:17:55 +00:00
parent 9aad6dfe1b
commit 37fa4f7eef
9 changed files with 242 additions and 8 deletions


@@ -1709,11 +1709,12 @@ Legacy: `tools.bash` is still accepted as an alias.
- `tools.web.search.maxResults` (1–10, default 5)
- `tools.web.search.timeoutSeconds` (default 30)
- `tools.web.search.cacheTtlMinutes` (default 15)
-- `tools.web.fetch.enabled` (default false; sandboxed sessions auto-enable unless set to false)
+- `tools.web.fetch.enabled` (default true)
- `tools.web.fetch.maxChars` (default 50000)
- `tools.web.fetch.timeoutSeconds` (default 30)
- `tools.web.fetch.cacheTtlMinutes` (default 15)
- `tools.web.fetch.userAgent` (optional override)
+- `tools.web.fetch.readability` (default true; disable to use basic HTML cleanup only)

`agents.defaults.subagents` configures sub-agent defaults:
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller's model unless overridden per agent or per call.
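For illustration, a sketch of those sub-agent defaults in the same config shape as the `tools.web.fetch` block below; the model names are hypothetical and not part of this commit:

{
  agents: {
    defaults: {
      subagents: {
        model: {
          primary: "example/primary-model",      // hypothetical
          fallbacks: ["example/fallback-model"]  // hypothetical
        }
      }
    }
  }
}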


@@ -116,7 +116,8 @@ Fetch a URL and extract readable content.
        maxChars: 50000,
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
-       userAgent: "clawdbot/2026.1.15"
+       userAgent: "clawdbot/2026.1.15",
+       readability: true
      }
    }
  }
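As a sketch grounded in the `readability` option above: setting it to false skips Readability and uses the basic HTML cleanup path only:

{
  tools: {
    web: {
      fetch: {
        readability: false   // basic HTML cleanup only
      }
    }
  }
}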
@@ -130,7 +131,8 @@ Fetch a URL and extract readable content.
- `maxChars` (truncate long pages)
Notes:
+- `web_fetch` uses Readability (main-content extraction) by default and falls back to basic HTML cleanup if it fails (sketched after these notes).
- `web_fetch` is best-effort extraction; some sites will need the browser tool.
- Responses are cached (default 15 minutes) to reduce repeated fetches.
- If you use tool profiles/allowlists, add `web_search`/`web_fetch` or `group:web`.
- If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
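For context on the Readability note above, a minimal TypeScript sketch of the extract-then-fallback pattern, assuming @mozilla/readability and jsdom; this commit does not show clawdbot's actual implementation:

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

// Try Readability's main-content extraction first; on failure, fall back
// to stripping tags and collapsing whitespace, then truncate to maxChars.
function extractReadable(html: string, url: string, maxChars: number): string {
  let text: string | null = null;
  try {
    const dom = new JSDOM(html, { url });
    const article = new Readability(dom.window.document).parse();
    text = article?.textContent ?? null;
  } catch {
    text = null; // Readability threw; use the basic cleanup path below
  }
  if (!text) {
    text = html
      .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "") // drop script/style blocks
      .replace(/<[^>]+>/g, " ")                        // strip remaining tags
      .replace(/\s+/g, " ")                            // collapse whitespace
      .trim();
  }
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}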