feat: improve web_fetch readability extraction

2026-01-16 23:17:55 +00:00
parent 9aad6dfe1b
commit 37fa4f7eef
9 changed files with 242 additions and 8 deletions
--- a/docs/tools/web.md
+++ b/docs/tools/web.md
@@ -116,7 +116,8 @@ Fetch a URL and extract readable content.
        maxChars: 50000,
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
-        userAgent: "clawdbot/2026.1.15"
+        userAgent: "clawdbot/2026.1.15",
+        readability: true
      }
    }
  }
@@ -130,7 +131,8 @@ Fetch a URL and extract readable content.
 - `maxChars` (truncate long pages)

 Notes:
+- `web_fetch` uses Readability (main-content extraction) by default and falls back to basic HTML cleanup if Readability extraction fails.
 - `web_fetch` is best-effort extraction; some sites will need the browser tool.
 - Responses are cached (default 15 minutes) to reduce repeated fetches.
 - If you use tool profiles/allowlists, add `web_search`/`web_fetch` or `group:web`.
- - If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
+- If the Brave key is missing, `web_search` returns a short setup hint with a docs link.
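
Reviewer note: the fallback described in the added docs line (Readability first, then basic HTML cleanup) could look roughly like the sketch below. It is not the code from this commit. All names here (`Extractor`, `basicHtmlCleanup`, `extractReadable`) are hypothetical, the main-content extractor is injected rather than tied to any real library, and only the `readability` and `maxChars` option names come from the docs above.

```typescript
// Hypothetical sketch; names and structure are assumptions, not clawdbot's code.

// A main-content extractor returns readable text, or null if it finds nothing.
type Extractor = (html: string) => string | null;

interface FetchExtractOptions {
  readability: boolean; // mirrors the `readability` config flag
  maxChars: number;     // mirrors the `maxChars` config value
  extractor: Extractor; // injected stand-in for a Readability-style extractor
}

// Basic HTML cleanup: drop script/style/noscript blocks, strip the remaining
// tags, and collapse runs of whitespace.
function basicHtmlCleanup(html: string): string {
  return html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Try the main-content extractor when enabled; fall back to basic cleanup if
// it throws or yields nothing. The result is truncated to `maxChars`.
function extractReadable(html: string, opts: FetchExtractOptions): string {
  let text: string | null = null;
  if (opts.readability) {
    try {
      text = opts.extractor(html);
    } catch {
      text = null; // treat extractor errors as "no result"
    }
  }
  if (!text) {
    text = basicHtmlCleanup(html);
  }
  return text.slice(0, opts.maxChars);
}
```

Injecting the extractor keeps the fallback path easy to exercise in isolation: a stub that throws or returns `null` should produce the same output as setting `readability: false`.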