feat: add cache-ttl pruning mode

Peter Steinberger
2026-01-21 19:44:20 +00:00
parent c415ccaed5
commit 9f59ff325b
18 changed files with 164 additions and 123 deletions

View File

@@ -9,8 +9,10 @@ read_when:
Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).
## When it runs
- Before each LLM request (context hook).
- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
- Only affects the messages sent to the model for that request.
- Only active for Anthropic API calls (and OpenRouter Anthropic models).
- For best results, match `ttl` to your model's `cacheControlTtl`.
## What can be pruned
- Only `toolResult` messages.
@@ -26,14 +28,10 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
3) `agents.defaults.contextTokens`.
4) Default `200000` tokens.
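A minimal sketch of that estimate (illustrative only; the function name and message accounting are assumptions, not Clawdbot's internals):
```ts
// Sketch: how full is the context window, using chars ≈ tokens × 4.
// The 200_000-token fallback mirrors the default listed above.
function estimatedContextRatio(
  messageChars: number,            // total characters of the messages about to be sent
  contextTokens: number = 200_000, // resolved window size (model override → defaults → fallback)
): number {
  const windowChars = contextTokens * 4; // chars ≈ tokens × 4
  return messageChars / windowChars;     // compared against softTrimRatio / hardClearRatio
}
```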
## Modes
### adaptive
- If estimated context ratio ≥ `softTrimRatio`: soft-trim oversized tool results.
- If still ≥ `hardClearRatio` **and** prunable tool text ≥ `minPrunableToolChars`: hard-clear oldest eligible tool results.
### aggressive
- Always hard-clears eligible tool results before the cutoff.
- Ignores `hardClear.enabled` (always clears when eligible).
## Mode
### cache-ttl
- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
- When it runs: the usual soft-trim + hard-clear behavior applies (see below).
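A hedged sketch of the TTL gate (names such as `lastAnthropicCallAt` are illustrative, not the actual implementation):
```ts
// Sketch: pruning is skipped while the session's previous Anthropic call is
// still within `ttl`, i.e. while the prompt cache entry is likely still warm.
function ttlGateAllowsPruning(
  lastAnthropicCallAt: number | undefined, // ms epoch of the last Anthropic call for this session
  ttlMs: number = 5 * 60_000,              // `ttl` default "5m"
  now: number = Date.now(),
): boolean {
  if (lastAnthropicCallAt === undefined) return false; // assumption in this sketch: no prior call → skip
  return now - lastAnthropicCallAt > ttlMs;
}
```
Matching `ttl` to the model's `cacheControlTtl` keeps this gate aligned with when Anthropic's prompt cache actually expires.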
## Soft vs hard pruning
- **Soft-trim**: only for oversized tool results.
@@ -52,6 +50,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
- Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).
## Defaults (when enabled)
- `ttl`: `"5m"`
- `keepLastAssistants`: `3`
- `softTrimRatio`: `0.3`
- `hardClearRatio`: `0.5`
@@ -60,16 +59,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
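As a rough illustration of the two operations (a sketch using the documented defaults; the real eligibility rules around `keepLastAssistants`, the bootstrap prefix, and image blocks are omitted):
```ts
// Sketch: soft-trim keeps the head and tail of an oversized tool result;
// hard-clear replaces an old tool result with the placeholder wholesale.
function softTrim(text: string, maxChars = 4000, headChars = 1500, tailChars = 1500): string {
  if (text.length <= maxChars) return text; // only oversized tool results are trimmed
  return text.slice(0, headChars) + "\n...\n" + text.slice(-tailChars);
}

function hardClear(placeholder = "[Old tool result content cleared]"): string {
  return placeholder;
}
```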
## Examples
Default (adaptive):
```json5
{
agent: {
contextPruning: { mode: "adaptive" }
}
}
```
To disable:
Default (off):
```json5
{
agent: {
@@ -78,11 +68,11 @@ To disable:
}
```
Aggressive:
Enable TTL-aware pruning:
```json5
{
agent: {
contextPruning: { mode: "aggressive" }
contextPruning: { mode: "cache-ttl", ttl: "5m" }
}
}
```
@@ -92,7 +82,7 @@ Restrict pruning to specific tools:
{
agent: {
contextPruning: {
mode: "adaptive",
mode: "cache-ttl",
tools: { allow: ["exec", "read"], deny: ["*image*"] }
}
}

View File

@@ -1414,7 +1414,7 @@ Each `agents.defaults.models` entry can include:
- `alias` (optional model shortcut, e.g. `/opus`).
- `params` (optional provider-specific API params passed through to the model request).
`params` is also applied to streaming runs (embedded agent + compaction). Supported keys today: `temperature`, `maxTokens`, `cacheControlTtl` (`"5m"` or `"1h"`, Anthropic API + OpenRouter Anthropic models only; ignored for Anthropic OAuth/Claude Code tokens). These merge with call-time options; caller-supplied values win. `temperature` is an advanced knob—leave unset unless you know the model's defaults and need a change. Anthropic API defaults to `"1h"` unless you override (`cacheControlTtl: "5m"`). Clawdbot includes the `extended-cache-ttl-2025-04-11` beta flag for Anthropic API; keep it if you override provider headers.
`params` is also applied to streaming runs (embedded agent + compaction). Supported keys today: `temperature`, `maxTokens`, `cacheControlTtl` (`"5m"`, Anthropic API + OpenRouter Anthropic models only; ignored for Anthropic OAuth/Claude Code tokens). These merge with call-time options; caller-supplied values win. `temperature` is an advanced knob—leave unset unless you know the model's defaults and need a change. Clawdbot includes the `extended-cache-ttl-2025-04-11` beta flag for Anthropic API; keep it if you override provider headers.
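A minimal sketch of the merge order (type and function names are assumptions, not Clawdbot's actual API):
```ts
// Sketch: model `params` merge with call-time options; caller-supplied values win.
type AnthropicParams = { temperature?: number; maxTokens?: number; cacheControlTtl?: "5m" };

function effectiveParams(modelParams: AnthropicParams, callTime: AnthropicParams): AnthropicParams {
  return { ...modelParams, ...callTime }; // later spread (call-time options) overrides model params
}
```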
Example:
@@ -1569,7 +1569,7 @@ Example:
}
```
#### `agents.defaults.contextPruning` (tool-result pruning)
#### `agents.defaults.contextPruning` (TTL-aware tool-result pruning)
`agents.defaults.contextPruning` prunes **old tool results** from the in-memory context right before a request is sent to the LLM.
It does **not** modify the session history on disk (`*.jsonl` remains complete).
@@ -1580,11 +1580,9 @@ High level:
- Never touches user/assistant messages.
- Protects the last `keepLastAssistants` assistant messages (no tool results after that point are pruned).
- Protects the bootstrap prefix (nothing before the first user message is pruned).
- Modes:
- `adaptive`: soft-trims oversized tool results (keep head/tail) when the estimated context ratio crosses `softTrimRatio`.
Then hard-clears the oldest eligible tool results when the estimated context ratio crosses `hardClearRatio` **and**
there's enough prunable tool-result bulk (`minPrunableToolChars`).
- `aggressive`: always replaces eligible tool results before the cutoff with the `hardClear.placeholder` (no ratio checks).
- Mode:
- `cache-ttl`: pruning only runs when the last Anthropic call for the session is **older** than `ttl`.
When it runs, it uses the standard soft-trim + hard-clear behavior described below.
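A simplified sketch of that protection cutoff (the message shape and names are assumptions, not the actual implementation):
```ts
// Sketch: find the index of the Nth-from-last assistant message.
// Tool results at or after this index are protected; if the session has fewer
// than `keepLastAssistants` assistant messages, pruning is skipped entirely.
type Msg = { role: "user" | "assistant" | "toolResult"; text: string };

function protectionCutoff(messages: Msg[], keepLastAssistants = 3): number | null {
  let seen = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant" && ++seen === keepLastAssistants) return i;
  }
  return null; // not enough assistant messages yet → skip pruning
}
```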
Soft vs hard pruning (what changes in the context sent to the LLM):
- **Soft-trim**: only for *oversized* tool results. Keeps the beginning + end and inserts `...` in the middle.
@@ -1598,44 +1596,40 @@ Notes / current limitations:
- Tool results containing **image blocks are skipped** (never trimmed/cleared) right now.
- The estimated “context ratio” is based on **characters** (approximate), not exact tokens.
- If the session doesn't contain at least `keepLastAssistants` assistant messages yet, pruning is skipped.
- In `aggressive` mode, `hardClear.enabled` is ignored (eligible tool results are always replaced with `hardClear.placeholder`).
- `cache-ttl` only activates for Anthropic API calls (and OpenRouter Anthropic models).
- For best results, match `contextPruning.ttl` to the model's `cacheControlTtl` you set in `agents.defaults.models.*.params`.
Default (adaptive):
```json5
{
agents: { defaults: { contextPruning: { mode: "adaptive" } } }
}
```
To disable:
Default (off):
```json5
{
agents: { defaults: { contextPruning: { mode: "off" } } }
}
```
Defaults (when `mode` is `"adaptive"` or `"aggressive"`):
- `keepLastAssistants`: `3`
- `softTrimRatio`: `0.3` (adaptive only)
- `hardClearRatio`: `0.5` (adaptive only)
- `minPrunableToolChars`: `50000` (adaptive only)
- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }` (adaptive only)
- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
Example (aggressive, minimal):
Enable TTL-aware pruning:
```json5
{
agents: { defaults: { contextPruning: { mode: "aggressive" } } }
agents: { defaults: { contextPruning: { mode: "cache-ttl" } } }
}
```
Example (adaptive tuned):
Defaults (when `mode` is `"cache-ttl"`):
- `ttl`: `"5m"`
- `keepLastAssistants`: `3`
- `softTrimRatio`: `0.3`
- `hardClearRatio`: `0.5`
- `minPrunableToolChars`: `50000`
- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }`
- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
Example (cache-ttl tuned):
```json5
{
agents: {
defaults: {
contextPruning: {
mode: "adaptive",
mode: "cache-ttl",
ttl: "5m",
keepLastAssistants: 3,
softTrimRatio: 0.3,
hardClearRatio: 0.5,

View File

@@ -36,10 +36,10 @@ clawdbot onboard --anthropic-api-key "$ANTHROPIC_API_KEY"
## Prompt caching (Anthropic API)
Clawdbot enables **1-hour prompt caching by default** for Anthropic API keys.
Clawdbot does **not** override Anthropic's default cache TTL unless you set it.
This is **API-only**; Claude Code CLI OAuth ignores TTL settings.
To override the TTL per model, set `cacheControlTtl` in the model `params`:
To set the TTL per model, use `cacheControlTtl` in the model `params`:
```json5
{
@@ -47,7 +47,7 @@ To override the TTL per model, set `cacheControlTtl` in the model `params`:
defaults: {
models: {
"anthropic/claude-opus-4-5": {
params: { cacheControlTtl: "5m" } // or "1h"
params: { cacheControlTtl: "5m" }
}
}
}