feat: improve web_fetch readability extraction

Peter Steinberger
2026-01-16 23:17:55 +00:00
parent 9aad6dfe1b
commit 37fa4f7eef
9 changed files with 242 additions and 8 deletions


@@ -1709,11 +1709,12 @@ Legacy: `tools.bash` is still accepted as an alias.
- `tools.web.search.maxResults` (1–10, default 5)
- `tools.web.search.timeoutSeconds` (default 30)
- `tools.web.search.cacheTtlMinutes` (default 15)
-- `tools.web.fetch.enabled` (default false; sandboxed sessions auto-enable unless set to false)
+- `tools.web.fetch.enabled` (default true)
- `tools.web.fetch.maxChars` (default 50000)
- `tools.web.fetch.timeoutSeconds` (default 30)
- `tools.web.fetch.cacheTtlMinutes` (default 15)
- `tools.web.fetch.userAgent` (optional override)
+- `tools.web.fetch.readability` (default true; disable to use basic HTML cleanup only)

`agents.defaults.subagents` configures sub-agent defaults:
- `model`: default model for spawned sub-agents (string or `{ primary, fallbacks }`). If omitted, sub-agents inherit the caller's model unless overridden per agent or per call.
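For illustration, a sketch of those sub-agent defaults in the same config shape as the `tools.web.fetch` block below; the model names are hypothetical and not part of this commit:

{
  agents: {
    defaults: {
      subagents: {
        model: {
          primary: "example/primary-model",      // hypothetical
          fallbacks: ["example/fallback-model"]  // hypothetical
        }
      }
    }
  }
}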


@@ -116,7 +116,8 @@ Fetch a URL and extract readable content.
        maxChars: 50000,
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
-       userAgent: "clawdbot/2026.1.15"
+       userAgent: "clawdbot/2026.1.15",
+       readability: true
      }
    }
  }
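As a sketch grounded in the `readability` option above: setting it to false skips Readability and uses the basic HTML cleanup path only:

{
  tools: {
    web: {
      fetch: {
        readability: false   // basic HTML cleanup only
      }
    }
  }
}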
@@ -130,7 +131,8 @@ Fetch a URL and extract readable content.
- `maxChars` (truncate long pages)
Notes:
+- `web_fetch` uses Readability (main-content extraction) by default and falls back to basic HTML cleanup if it fails (sketched after these notes).
- `web_fetch` is best-effort extraction; some sites will need the browser tool.
- Responses are cached (default 15 minutes) to reduce repeated fetches.
- If you use tool profiles/allowlists, add `web_search`/`web_fetch` or `group:web`.
- If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
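For context on the Readability note above, a minimal TypeScript sketch of the extract-then-fallback pattern, assuming @mozilla/readability and jsdom; this commit does not show clawdbot's actual implementation:

import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";

// Try Readability's main-content extraction first; on failure, fall back
// to stripping tags and collapsing whitespace, then truncate to maxChars.
function extractReadable(html: string, url: string, maxChars: number): string {
  let text: string | null = null;
  try {
    const dom = new JSDOM(html, { url });
    const article = new Readability(dom.window.document).parse();
    text = article?.textContent ?? null;
  } catch {
    text = null; // Readability threw; use the basic cleanup path below
  }
  if (!text) {
    text = html
      .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "") // drop script/style blocks
      .replace(/<[^>]+>/g, " ")                        // strip remaining tags
      .replace(/\s+/g, " ")                            // collapse whitespace
      .trim();
  }
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}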