diff --git a/CHANGELOG.md b/CHANGELOG.md
index da414ef0e..e1d8ada0e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,7 @@
 - CLI: add `clawdbot docs` live docs search with pretty output.
 - CLI: add `clawdbot agents` (list/add/delete) with wizarded workspace/setup, provider login, and full prune on delete.
 - Agent: treat compaction retry AbortError as a fallback trigger without swallowing non-abort errors. Thanks @erikpr1994 for PR #341.
+- Agent: add opt-in session pruning for tool results to reduce context bloat. Thanks @maxsumrall for PR #381.
 - Agent: deliver final replies for non-streaming models when block chunking is enabled. Thank you @mneves75 for PR #369!
 - Agent: trim bootstrap context injections and keep group guidance concise (emoji reactions allowed). Thanks @tobiasbischoff for PR #370.
 - Sub-agents: allow `sessions_spawn` model overrides and error on invalid models. Thanks @azade-c for PR #298.
diff --git a/README.md b/README.md
index bc77a3fe4..d15bd5680 100644
--- a/README.md
+++ b/README.md
@@ -454,5 +454,5 @@ Thanks to all clawtributors:
 adamgall jalehman jarvis-medmatic mneves75 regenrek tobiasbischoff MSch obviyus dbhurley Asleep123 Iamadig imfing kitze nachoiacovino VACInc cash-echo-bot claude kiranjd pcty-nextgen-service-account minghinmatthewlam ngutman onutc oswalpalash snopoke ManuelHettich loukotal hugobarauna AbhisekBasu1 emanuelst dantelex erikpr1994 antons RandyVentures
- reeltimeapps fcatuhe
+ reeltimeapps fcatuhe maxsumrall

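Review note on the tool-selection rules in the session-pruning docs added by this change: the page states that `tools.allow` / `tools.deny` support `*` wildcards, that deny wins, and that an empty allow list allows all tools. The sketch below is one way to read those three rules, not the PR's implementation. The names `ToolFilter`, `globToRegExp`, `matchesAny`, and `isToolPrunable` are hypothetical, and case-sensitive matching is an assumption the docs do not confirm.

```typescript
// Hypothetical shape mirroring the documented `tools: { allow, deny }` config keys.
interface ToolFilter {
  allow?: string[];
  deny?: string[];
}

// Turn a pattern with "*" wildcards into an anchored RegExp.
// Every other character is escaped so it matches literally.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// True when the tool name matches at least one pattern.
function matchesAny(name: string, patterns: string[]): boolean {
  return patterns.some((pattern) => globToRegExp(pattern).test(name));
}

// Deny is checked first so it always wins.
// An empty or missing allow list means every non-denied tool is eligible.
function isToolPrunable(name: string, filter: ToolFilter = {}): boolean {
  const deny = filter.deny ?? [];
  const allow = filter.allow ?? [];
  if (matchesAny(name, deny)) return false;
  if (allow.length === 0) return true;
  return matchesAny(name, allow);
}
```

Under this reading, the documented example `tools: { allow: ["bash", "read"], deny: ["*image*"] }` makes `bash` and `read` eligible for pruning, and any tool whose name contains `image` stays untouched even if an allow pattern also matches it.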
diff --git a/docs/concepts/session-pruning.md b/docs/concepts/session-pruning.md
new file mode 100644
index 000000000..784f97619
--- /dev/null
+++ b/docs/concepts/session-pruning.md
@@ -0,0 +1,92 @@
+---
+summary: "Session pruning: opt-in tool-result trimming to reduce context bloat"
+read_when:
+  - You want to reduce LLM context growth from tool outputs
+  - You are tuning agent.contextPruning
+---
+# Session Pruning
+
+Session pruning trims **old tool results** from the in-memory context right before each LLM call. It is **opt-in** and does **not** rewrite the on-disk session history (`*.jsonl`).
+
+## When it runs
+- Before each LLM request (context hook).
+- Only affects the messages sent to the model for that request.
+
+## What can be pruned
+- Only `toolResult` messages.
+- User + assistant messages are **never** modified.
+- The last `keepLastAssistants` assistant messages are protected; tool results after that cutoff are not pruned.
+- If there aren’t enough assistant messages to establish the cutoff, pruning is skipped.
+- Tool results containing **image blocks** are skipped (never trimmed/cleared).
+
+## Context window estimation
+Pruning uses an estimated context window (chars ≈ tokens × 4). The window size is resolved in this order:
+1) Model definition `contextWindow` (from the model registry).
+2) `models.providers.*.models[].contextWindow` override.
+3) `agent.contextTokens`.
+4) Default `200000` tokens.
+
+## Modes
+### adaptive
+- If estimated context ratio ≥ `softTrimRatio`: soft-trim oversized tool results.
+- If still ≥ `hardClearRatio` **and** prunable tool text ≥ `minPrunableToolChars`: hard-clear oldest eligible tool results.
+
+### aggressive
+- Always hard-clears eligible tool results before the cutoff.
+- Ignores `hardClear.enabled` (always clears when eligible).
+
+## Soft vs hard pruning
+- **Soft-trim**: only for oversized tool results.
+  - Keeps head + tail, inserts `...`, and appends a note with the original size.
+  - Skips results with image blocks.
+- **Hard-clear**: replaces the entire tool result with `hardClear.placeholder`.
+
+## Tool selection
+- `tools.allow` / `tools.deny` support `*` wildcards.
+- Deny wins.
+- Empty allow list => all tools allowed.
+
+## Interaction with other limits
+- Built-in tools already truncate their own output; session pruning is an extra layer that prevents long-running chats from accumulating too much tool output in the model context.
+- Compaction is separate: compaction summarizes and persists, pruning is transient per request.
+
+## Defaults (when enabled)
+- `keepLastAssistants`: `3`
+- `softTrimRatio`: `0.3`
+- `hardClearRatio`: `0.5`
+- `minPrunableToolChars`: `50000`
+- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }`
+- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
+
+## Examples
+Minimal (adaptive):
+```json5
+{
+  agent: {
+    contextPruning: { mode: "adaptive" }
+  }
+}
+```
+
+Aggressive:
+```json5
+{
+  agent: {
+    contextPruning: { mode: "aggressive" }
+  }
+}
+```
+
+Restrict pruning to specific tools:
+```json5
+{
+  agent: {
+    contextPruning: {
+      mode: "adaptive",
+      tools: { allow: ["bash", "read"], deny: ["*image*"] }
+    }
+  }
+}
+```
+
+See config reference: [Gateway Configuration](/gateway/configuration)
diff --git a/docs/concepts/session.md b/docs/concepts/session.md
index 8cd144201..0a075f46c 100644
--- a/docs/concepts/session.md
+++ b/docs/concepts/session.md
@@ -21,6 +21,10 @@ All session state is **owned by the gateway** (the “master” Clawdbot). UI cl
 - Group entries may include `displayName`, `provider`, `subject`, `room`, and `space` to label sessions in UIs.
 - Clawdbot does **not** read legacy Pi/Tau session folders.
 
+## Session pruning (optional)
+Clawdbot can trim **old tool results** from the in-memory context right before LLM calls (opt-in).
+This does **not** rewrite JSONL history. See [/concepts/session-pruning](/concepts/session-pruning).
+
 ## Mapping transports → session keys
 - Direct chats collapse to the per-agent primary key: `agent::`.
 - Multiple phone numbers and providers can map to the same agent main key; they act as transports into one conversation.
diff --git a/docs/docs.json b/docs/docs.json
index e29cfd9c3..2bb5314b9 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -546,6 +546,7 @@
  "concepts/agent-workspace",
  "concepts/multi-agent",
  "concepts/session",
+ "concepts/session-pruning",
  "concepts/sessions",
  "concepts/session-tool",
  "concepts/presence",
diff --git a/docs/gateway/configuration.md b/docs/gateway/configuration.md
index 0b39a9580..c2d5ed22f 100644
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -813,6 +813,87 @@ If you configure the same alias name (case-insensitive) yourself, your value win
 }
 ```
 
+#### `agent.contextPruning` (opt-in tool-result pruning)
+
+`agent.contextPruning` prunes **old tool results** from the in-memory context right before a request is sent to the LLM.
+It does **not** modify the session history on disk (`*.jsonl` remains complete).
+
+This is intended to reduce token usage for chatty agents that accumulate large tool outputs over time.
+
+High level:
+- Never touches user/assistant messages.
+- Protects the last `keepLastAssistants` assistant messages (no tool results after that point are pruned).
+- Modes:
+  - `adaptive`: soft-trims oversized tool results (keep head/tail) when the estimated context ratio crosses `softTrimRatio`.
+    Then hard-clears the oldest eligible tool results when the estimated context ratio crosses `hardClearRatio` **and**
+    there’s enough prunable tool-result bulk (`minPrunableToolChars`).
+  - `aggressive`: always replaces eligible tool results before the cutoff with the `hardClear.placeholder` (no ratio checks).
+
+Soft vs hard pruning (what changes in the context sent to the LLM):
+- **Soft-trim**: only for *oversized* tool results. Keeps the beginning + end and inserts `...` in the middle.
+  - Before: `toolResult("…very long output…")`
+  - After: `toolResult("HEAD…\n...\n…TAIL\n\n[Tool result trimmed: …]")`
+- **Hard-clear**: replaces the entire tool result with the placeholder.
+  - Before: `toolResult("…very long output…")`
+  - After: `toolResult("[Old tool result content cleared]")`
+
+Notes / current limitations:
+- Tool results containing **image blocks are skipped** (never trimmed/cleared) right now.
+- The estimated “context ratio” is based on **characters** (approximate), not exact tokens.
+- If the session doesn’t contain at least `keepLastAssistants` assistant messages yet, pruning is skipped.
+- In `aggressive` mode, `hardClear.enabled` is ignored (eligible tool results are always replaced with `hardClear.placeholder`).
+
+Example (minimal):
+```json5
+{
+  agent: {
+    contextPruning: {
+      mode: "adaptive"
+    }
+  }
+}
+```
+
+Defaults (when `mode` is `"adaptive"` or `"aggressive"`):
+- `keepLastAssistants`: `3`
+- `softTrimRatio`: `0.3` (adaptive only)
+- `hardClearRatio`: `0.5` (adaptive only)
+- `minPrunableToolChars`: `50000` (adaptive only)
+- `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }` (adaptive only)
+- `hardClear`: `{ enabled: true, placeholder: "[Old tool result content cleared]" }`
+
+Example (aggressive, minimal):
+```json5
+{
+  agent: {
+    contextPruning: {
+      mode: "aggressive"
+    }
+  }
+}
+```
+
+Example (adaptive tuned):
+```json5
+{
+  agent: {
+    contextPruning: {
+      mode: "adaptive",
+      keepLastAssistants: 3,
+      softTrimRatio: 0.3,
+      hardClearRatio: 0.5,
+      minPrunableToolChars: 50000,
+      softTrim: { maxChars: 4000, headChars: 1500, tailChars: 1500 },
+      hardClear: { enabled: true, placeholder: "[Old tool result content cleared]" },
+      // Optional: restrict pruning to specific tools (deny wins; supports "*" wildcards)
+      tools: { deny: ["browser", "canvas"] },
+    }
+  }
+}
+```
+
+See [/concepts/session-pruning](/concepts/session-pruning) for behavior details.
+
 Block streaming:
 - `agent.blockStreamingDefault`: `"on"`/`"off"` (default on).
 - `agent.blockStreamingBreak`: `"text_end"` or `"message_end"` (default: text_end).
diff --git a/docs/start/hubs.md b/docs/start/hubs.md
index 77b943b47..31350b3b9 100644
--- a/docs/start/hubs.md
+++ b/docs/start/hubs.md
@@ -38,6 +38,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
 - [Multi-agent routing](https://docs.clawd.bot/concepts/multi-agent)
 - [Sessions](https://docs.clawd.bot/concepts/session)
 - [Sessions (alias)](https://docs.clawd.bot/concepts/sessions)
+- [Session pruning](https://docs.clawd.bot/concepts/session-pruning)
 - [Session tools](https://docs.clawd.bot/concepts/session-tool)
 - [Queue](https://docs.clawd.bot/concepts/queue)
 - [Slash commands](https://docs.clawd.bot/tools/slash-commands)
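
Review note on the adaptive thresholds documented in this change: the docs estimate the context in characters (chars ≈ tokens × 4), soft-trim once the ratio reaches `softTrimRatio`, and hard-clear only if the ratio is still at or above `hardClearRatio` and the prunable tool text reaches `minPrunableToolChars`. The sketch below works through that arithmetic with the documented defaults. It is not the PR's implementation: all function names are hypothetical, and "first defined value wins" is an assumed reading of the documented resolution order.

```typescript
// Documented approximation: chars ≈ tokens × 4.
const CHARS_PER_TOKEN = 4;

// Field names mirror the documented config keys; values are the documented defaults.
interface AdaptiveSettings {
  softTrimRatio: number;
  hardClearRatio: number;
  minPrunableToolChars: number;
}

const ADAPTIVE_DEFAULTS: AdaptiveSettings = {
  softTrimRatio: 0.3,
  hardClearRatio: 0.5,
  minPrunableToolChars: 50000,
};

// Assumed reading of the documented order: model definition, then provider
// override, then agent.contextTokens, then the 200000-token default.
function resolveContextWindow(
  modelContextWindow?: number,
  providerOverride?: number,
  agentContextTokens?: number,
): number {
  return modelContextWindow ?? providerOverride ?? agentContextTokens ?? 200000;
}

// Estimated share of the context window used, measured in characters.
function contextRatio(totalChars: number, windowTokens: number): number {
  return totalChars / (windowTokens * CHARS_PER_TOKEN);
}

// Step 1: soft-trim oversized tool results once the ratio reaches softTrimRatio.
function shouldSoftTrim(
  totalChars: number,
  windowTokens: number,
  settings: AdaptiveSettings = ADAPTIVE_DEFAULTS,
): boolean {
  return contextRatio(totalChars, windowTokens) >= settings.softTrimRatio;
}

// Step 2: hard-clear only if the context is still large after soft-trimming
// and there is enough prunable tool text to make clearing worthwhile.
function shouldHardClear(
  charsAfterSoftTrim: number,
  prunableToolChars: number,
  windowTokens: number,
  settings: AdaptiveSettings = ADAPTIVE_DEFAULTS,
): boolean {
  return (
    contextRatio(charsAfterSoftTrim, windowTokens) >= settings.hardClearRatio &&
    prunableToolChars >= settings.minPrunableToolChars
  );
}
```

With the default 200000-token window, the estimated budget is 800000 characters, so under these assumptions soft-trimming starts at about 240000 characters of context and hard-clearing is considered at about 400000 characters, provided at least 50000 characters of tool text are prunable.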