feat: add cache-ttl pruning mode

This commit is contained in:
Peter Steinberger
2026-01-21 19:44:20 +00:00
parent c415ccaed5
commit 9f59ff325b
18 changed files with 164 additions and 123 deletions

View File

@@ -9,8 +9,10 @@ read_when:
Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).
## When it runs
- Before each LLM request (context hook).
- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
- Only affects the messages sent to the model for that request.
- Only active for Anthropic API calls (and OpenRouter Anthropic models).
- For best results, match `ttl` to your model `cacheControlTtl`.
## What can be pruned
- Only `toolResult` messages.
@@ -26,14 +28,10 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
3) `agents.defaults.contextTokens`.
4) Default `200000` tokens.
## Modes
### adaptive
- If estimated context ratio ≥ `softTrimRatio`: soft-trim oversized tool results.
- If still ≥ `hardClearRatio` **and** prunable tool text ≥ `minPrunableToolChars`: hard-clear oldest eligible tool results.
### aggressive
- Always hard-clears eligible tool results before the cutoff.
- Ignores `hardClear.enabled` (always clears when eligible).
## Mode
### cache-ttl
- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
- When it runs: same soft-trim + hard-clear behavior as before.
## Soft vs hard pruning
- **Soft-trim**: only for oversized tool results.
@@ -52,6 +50,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
- Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).
## Defaults (when enabled)
- `ttl`: `"5m"`
- `keepLastAssistants`: `3`
- `softTrimRatio`: `0.3`
- `hardClearRatio`: `0.5`
@@ -60,16 +59,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
## Examples
Default (adaptive):
```json5
{
agent: {
contextPruning: { mode: "adaptive" }
}
}
```
To disable:
Default (off):
```json5
{
agent: {
@@ -78,11 +68,11 @@ To disable:
}
```
Aggressive:
Enable TTL-aware pruning:
```json5
{
agent: {
contextPruning: { mode: "aggressive" }
contextPruning: { mode: "cache-ttl", ttl: "5m" }
}
}
```
@@ -92,7 +82,7 @@ Restrict pruning to specific tools:
{
agent: {
contextPruning: {
mode: "adaptive",
mode: "cache-ttl",
tools: { allow: ["exec", "read"], deny: ["*image*"] }
}
}