feat: add cache-ttl pruning mode

2026-01-21 19:44:20 +00:00
parent c415ccaed5
commit 9f59ff325b
18 changed files with 164 additions and 123 deletions
--- a/docs/concepts/session-pruning.md
+++ b/docs/concepts/session-pruning.md
@@ -9,8 +9,10 @@ read_when:
 Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).

 ## When it runs
- Before each LLM request (context hook).
+- When `mode: "cache-ttl"` is enabled and the last Anthropic call for the session is older than `ttl`.
 - Only affects the messages sent to the model for that request.
+ - Only active for Anthropic API calls (and OpenRouter Anthropic models).
+ - For best results, match `ttl` to your model `cacheControlTtl`.

 ## What can be pruned
 - Only `toolResult` messages.
@@ -26,14 +28,10 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
 3) `agents.defaults.contextTokens`.
 4) Default `200000` tokens.

-## Modes
-### adaptive
- If estimated context ratio ≥ `softTrimRatio`: soft-trim oversized tool results.
- If still ≥ `hardClearRatio` **and** prunable tool text ≥ `minPrunableToolChars`: hard-clear oldest eligible tool results.
-
-### aggressive
- Always hard-clears eligible tool results before the cutoff.
- Ignores `hardClear.enabled` (always clears when eligible).
+## Mode
+### cache-ttl
+- Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).
+- When it runs: same soft-trim + hard-clear behavior as before.

 ## Soft vs hard pruning
 - **Soft-trim**: only for oversized tool results.
@@ -52,6 +50,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
 - Compaction is separate: compaction summarizes and persists, pruning is transient per request. See [/concepts/compaction](/concepts/compaction).

 ## Defaults (when enabled)
+- `ttl`: `"5m"`
 - `keepLastAssistants`: `3`
 - `softTrimRatio`: `0.3`
 - `hardClearRatio`: `0.5`
@@ -60,16 +59,7 @@ Pruning uses an estimated context window (chars ≈ tokens × 4). The window siz
 - `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`

 ## Examples
-Default (adaptive):
-```json5
-{
-  agent: {
-    contextPruning: { mode: "adaptive" }
-  }
-}
-```
-
-To disable:
+Default (off):
 ```json5
 {
  agent: {
@@ -78,11 +68,11 @@ To disable:
 }
 ```

-Aggressive:
+Enable TTL-aware pruning:
 ```json5
 {
  agent: {
-    contextPruning: { mode: "aggressive" }
+    contextPruning: { mode: "cache-ttl", ttl: "5m" }
  }
 }
 ```
@@ -92,7 +82,7 @@ Restrict pruning to specific tools:
 {
  agent: {
    contextPruning: {
-      mode: "adaptive",
+      mode: "cache-ttl",
      tools: { allow: ["exec", "read"], deny: ["*image*"] }
    }
  }