feat: stream reply blocks immediately

Peter Steinberger
2026-01-03 00:28:33 +01:00
parent 9dd613edf7
commit 9616f4b2b1
14 changed files with 323 additions and 8 deletions

@@ -71,6 +71,8 @@ Legacy Pi/Tau session folders are **not** read.
## Steering while streaming
Incoming user messages are queued while the agent is streaming. The queue is checked **after each tool call**. If a queued message is present, the remaining tool calls from the current assistant message are skipped (each one returns an error tool result with the text "Skipped due to queued user message."), and the queued user message is then injected before the next assistant response.
Block streaming sends completed assistant blocks as soon as they finish; disable
via `agent.blockStreamingDefault: "off"` if you only want the final response.
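
The steering rule described in this section (check the queue after each tool call, then skip the rest) can be sketched as follows. This is a minimal illustration, not the project's implementation: `ToolCall`, `ToolResult`, `runToolCalls`, and the `execute` / `hasQueuedMessage` callbacks are hypothetical names; only the skip message text comes from the docs above.

```typescript
// Hypothetical types and names, used only to illustrate the documented rule.
type ToolCall = { id: string; name: string };
type ToolResult = { id: string; isError: boolean; text: string };

// The skip message is quoted from the documentation.
const SKIPPED = "Skipped due to queued user message.";

function runToolCalls(
  calls: ToolCall[],
  execute: (call: ToolCall) => string,
  hasQueuedMessage: () => boolean,
): ToolResult[] {
  const results: ToolResult[] = [];
  let skipping = false;
  for (const call of calls) {
    if (skipping) {
      // Remaining calls are not executed; each gets an error tool result.
      results.push({ id: call.id, isError: true, text: SKIPPED });
      continue;
    }
    results.push({ id: call.id, isError: false, text: execute(call) });
    // The queue is checked after each tool call.
    if (hasQueuedMessage()) skipping = true;
  }
  return results;
}
```

In this sketch the first tool call always runs, because the check happens after a call rather than before it. The caller would then inject the queued user message before requesting the next assistant response.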
## Configuration (minimal)

@@ -330,6 +330,7 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
},
thinkingDefault: "low",
verboseDefault: "off",
blockStreamingDefault: "on",
timeoutSeconds: 600,
mediaMaxMb: 5,
heartbeat: {
@@ -354,6 +355,10 @@ deprecation fallback.
Z.AI models are available as `zai/<model>` (e.g. `zai/glm-4.7`) and require
`ZAI_API_KEY` (or legacy `Z_AI_API_KEY`) in the environment.
`agent.blockStreamingDefault` controls whether completed assistant blocks
(`message_end` chunks) are sent immediately (default: `on`). Set to `off` to
only deliver the final consolidated reply.
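
One way to picture the two modes is the sketch below. It is an assumption-laden illustration rather than the actual runtime code: `createReplyDispatcher`, `onBlockEnd`, `onRunEnd`, and `send` are hypothetical names, and joining blocks with a blank line is only a guess at what "consolidated" means.

```typescript
// Hypothetical dispatcher; only the "on" / "off" values come from the docs.
type BlockStreaming = "on" | "off";

function createReplyDispatcher(
  mode: BlockStreaming,
  send: (text: string) => void,
) {
  // Blocks held back until the run ends (only used when mode is "off").
  const pending: string[] = [];
  return {
    // Called when an assistant block completes (a `message_end` chunk).
    onBlockEnd(text: string): void {
      if (mode === "on") {
        send(text); // deliver immediately
      } else {
        pending.push(text); // hold for the final reply
      }
    },
    // Called once when the agent run finishes.
    onRunEnd(): void {
      if (pending.length > 0) {
        // Assumption: the consolidated reply is the blocks joined by blank lines.
        send(pending.join("\n\n"));
        pending.length = 0;
      }
    },
  };
}
```

With `"on"`, a run that produces three blocks results in three `send` calls as the blocks finish; with `"off"`, the same run results in a single `send` call at the end.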
`agent.heartbeat` configures periodic heartbeat runs:
- `every`: duration string (`ms`, `s`, `m`, `h`); the default unit is minutes. Omit it or set it to
`0m` to disable.