feat: add cache-ttl pruning mode
This commit is contained in:
@@ -9,8 +9,10 @@ read_when:
|
||||
Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).
|
||||
|
||||
## When it runs
|
||||
- Before each LLM request (context hook).
|
||||
- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
|
||||
- Only affects the messages sent to the model for that request.
|
||||
- Only active for Anthropic API calls (and OpenRouter Anthropic models).
|
||||
- For best results, match `ttl` to your model `cacheControlTtl`.
|
||||
|
||||
## What can be pruned
|
||||
- Only `toolResult` messages.
|
||||
@@ -26,14 +28,10 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
|
||||
3) `agents.defaults.contextTokens`.
|
||||
4) Default `200000` tokens.
|
||||
|
||||
## Modes
|
||||
### adaptive
|
||||
- If estimated context ratio ≥ `softTrimRatio`: soft-trim oversized tool results.
|
||||
- If still ≥ `hardClearRatio` **and** prunable tool text ≥ `minPrunableToolChars`: hard-clear oldest eligible tool results.
|
||||
|
||||
### aggressive
|
||||
- Always hard-clears eligible tool results before the cutoff.
|
||||
- Ignores `hardClear.enabled` (always clears when eligible).
|
||||
## Mode
|
||||
### cache-ttl
|
||||
- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
|
||||
- When it runs: same soft-trim + hard-clear behavior as before.
|
||||
|
||||
## Soft vs hard pruning
|
||||
- **Soft-trim**: only for oversized tool results.
|
||||
@@ -52,6 +50,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
|
||||
- Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).
|
||||
|
||||
## Defaults (when enabled)
|
||||
- `ttl`: `"5m"`
|
||||
- `keepLastAssistants`: `3`
|
||||
- `softTrimRatio`: `0.3`
|
||||
- `hardClearRatio`: `0.5`
|
||||
@@ -60,16 +59,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
|
||||
- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
|
||||
|
||||
## Examples
|
||||
Default (adaptive):
|
||||
```json5
|
||||
{
|
||||
agent: {
|
||||
contextPruning: { mode: "adaptive" }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To disable:
|
||||
Default (off):
|
||||
```json5
|
||||
{
|
||||
agent: {
|
||||
@@ -78,11 +68,11 @@ To disable:
|
||||
}
|
||||
```
|
||||
|
||||
Aggressive:
|
||||
Enable TTL-aware pruning:
|
||||
```json5
|
||||
{
|
||||
agent: {
|
||||
contextPruning: { mode: "aggressive" }
|
||||
contextPruning: { mode: "cache-ttl", ttl: "5m" }
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -92,7 +82,7 @@ Restrict pruning to specific tools:
|
||||
{
|
||||
agent: {
|
||||
contextPruning: {
|
||||
mode: "adaptive",
|
||||
mode: "cache-ttl",
|
||||
tools: { allow: ["exec", "read"], deny: ["*image*"] }
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user