---
summary: "Firecrawl fallback for web_fetch (anti-bot + cached extraction)"
read_when:
  - You want Firecrawl-backed web extraction
  - You need a Firecrawl API key
  - You want anti-bot extraction for web_fetch
---

# Firecrawl

Clawdbot can use **Firecrawl** as a fallback extractor for `web_fetch`. It is a hosted content extraction service that supports bot circumvention and caching, which helps with JS-heavy sites or pages that block plain HTTP fetches.

## Get an API key

1) Create a Firecrawl account and generate an API key.
2) Store it in config (`tools.web.fetch.firecrawl.apiKey`) or set `FIRECRAWL_API_KEY` in the gateway environment.

## Configure Firecrawl

```json5
{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE",
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 172800000,
          timeoutSeconds: 60
        }
      }
    }
  }
}
```

Notes:
- `firecrawl.enabled` defaults to true when an API key is present.
- `maxAgeMs` controls how old cached results can be (ms). Default is 2 days (172800000 ms).
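
Because `firecrawl.enabled` switches on automatically once a key is present, a minimal configuration can be as small as the sketch below. This assumes the remaining fields above have workable defaults; only `apiKey` is taken from the full example:

```json5
{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE"
          // enabled: false  // only needed to turn Firecrawl off despite a configured key
        }
      }
    }
  }
}
```
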
## Stealth / bot circumvention

Firecrawl exposes a **proxy mode** parameter for bot circumvention (`basic`, `stealth`, or `auto`). Clawdbot always uses `proxy: "auto"` plus `storeInCache: true` for Firecrawl requests. If `proxy` is omitted, Firecrawl defaults to `auto`, which retries with stealth proxies when a basic attempt fails and may therefore use more credits than basic-only scraping.
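
For illustration, the scrape request Clawdbot ends up sending looks roughly like the sketch below. The `proxy` and `storeInCache` values come from the paragraph above; the endpoint path and the `url`, `formats`, `maxAge`, and `timeout` fields follow Firecrawl's scrape API but are assumptions here, not documented Clawdbot behavior:

```json5
// Hypothetical request body for POST {baseUrl}/v1/scrape
{
  url: "https://example.com/article", // page requested via web_fetch
  formats: ["markdown"],              // assumed output format
  onlyMainContent: true,              // from firecrawl.onlyMainContent
  maxAge: 172800000,                  // from firecrawl.maxAgeMs (accept cached results up to 2 days old)
  timeout: 60000,                     // from firecrawl.timeoutSeconds, converted to ms (assumption)
  proxy: "auto",                      // always set by Clawdbot
  storeInCache: true                  // always set by Clawdbot
}
```
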
## How `web_fetch` uses Firecrawl

`web_fetch` extraction order:
1) Readability (local)
2) Firecrawl (if configured)
3) Basic HTML cleanup (last fallback)

See [Web tools](/tools/web) for the full web tool setup.