feat: stream reply blocks immediately

2026-01-03 00:28:33 +01:00
parent 9dd613edf7
commit 9616f4b2b1
14 changed files with 323 additions and 8 deletions
--- a/docs/agent.md
+++ b/docs/agent.md
@@ -71,6 +71,8 @@ Legacy Pi/Tau session folders are **not** read.
 ## Steering while streaming

 Incoming user messages are queued while the agent is streaming. The queue is checked **after each tool call**. If a queued message is present, remaining tool calls from the current assistant message are skipped (error tool results with "Skipped due to queued user message."), then the queued user message is injected before the next assistant response.
+Block streaming sends each completed assistant block as soon as it finishes; set
+`agent.blockStreamingDefault: "off"` to receive only the final response.

 ## Configuration (minimal)

--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -330,6 +330,7 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
    },
    thinkingDefault: "low",
    verboseDefault: "off",
+    blockStreamingDefault: "on",
    timeoutSeconds: 600,
    mediaMaxMb: 5,
    heartbeat: {
@@ -354,6 +355,10 @@ deprecation fallback.
 Z.AI models are available as `zai/<model>` (e.g. `zai/glm-4.7`) and require
 `ZAI_API_KEY` (or legacy `Z_AI_API_KEY`) in the environment.

+`agent.blockStreamingDefault` controls whether completed assistant blocks
+(`message_end` chunks) are sent immediately (default: `on`). Set it to `off` to
+deliver only the final consolidated reply.
+
 `agent.heartbeat` configures periodic heartbeat runs:
 - `every`: duration string (`ms`, `s`, `m`, `h`); default unit minutes. Omit or set
  `0m` to disable.