feat: improve web_fetch readability extraction

Peter Steinberger
2026-01-16 23:17:55 +00:00
parent 9aad6dfe1b
commit 37fa4f7eef
9 changed files with 242 additions and 8 deletions


@@ -116,7 +116,8 @@ Fetch a URL and extract readable content.
maxChars: 50000,
timeoutSeconds: 30,
cacheTtlMinutes: 15,
- userAgent: "clawdbot/2026.1.15"
+ userAgent: "clawdbot/2026.1.15",
+ readability: true
}
}
}
@@ -130,7 +131,8 @@ Fetch a URL and extract readable content.
- `maxChars` (truncate long pages)
Notes:
- `web_fetch` uses Readability (main-content extraction) by default and falls back to basic HTML cleanup if it fails.
- `web_fetch` is best-effort extraction; some sites will need the browser tool.
- Responses are cached (default 15 minutes) to reduce repeated fetches.
- If you use tool profiles/allowlists, add `web_search`/`web_fetch` or `group:web`.
- If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
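
The fallback behavior described in the notes can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `extractReadable`, `basicHtmlCleanup`, `Extractor`, and `FetchOptions` are hypothetical names, the Readability step is passed in as a plain function so the sketch stays self-contained, and treating `readability: false` as "skip straight to basic cleanup" is an assumption.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from the clawdbot source.

// Stand-in for a main-content extractor such as Readability.
// Returns null (or throws) when it cannot find readable content.
type Extractor = (html: string) => string | null;

interface FetchOptions {
  maxChars: number; // mirrors the `maxChars` setting (truncate long pages)
  readability: boolean; // mirrors the `readability` setting
}

// Fallback path: drop <script>/<style> blocks, strip remaining tags, collapse whitespace.
function basicHtmlCleanup(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function extractReadable(
  html: string,
  opts: FetchOptions,
  readabilityExtract: Extractor,
): string {
  let text: string | null = null;

  if (opts.readability) {
    try {
      text = readabilityExtract(html);
    } catch {
      // Extraction failed; fall through to basic cleanup.
      text = null;
    }
  }

  // Assumption: an empty result counts as a failure too.
  if (text === null || text.trim() === "") {
    text = basicHtmlCleanup(html);
  }

  // Truncate long pages to the configured limit.
  return text.slice(0, opts.maxChars);
}
```

With an extractor that throws, `extractReadable("<p>Hello <b>world</b></p>", { maxChars: 50000, readability: true }, failing)` falls back to the cleanup path and returns `"Hello world"`. Passing the extractor in as a parameter is only a convenience for the sketch; it keeps the fallback logic testable without a DOM implementation.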