fix(auto-reply): tighten block streaming defaults

Peter Steinberger
2026-01-09 22:40:58 +01:00
parent f8bf041396
commit 79af03ba5e
13 changed files with 71 additions and 25 deletions

@@ -71,6 +71,8 @@
- Auto-reply: preserve spacing when stripping inline directives. (#539) — thanks @joshp123 - Auto-reply: preserve spacing when stripping inline directives. (#539) — thanks @joshp123
- Auto-reply: relax reply tag parsing to allow whitespace. (#560) — thanks @mcinteerj - Auto-reply: relax reply tag parsing to allow whitespace. (#560) — thanks @mcinteerj
- Auto-reply: add per-provider block streaming toggles and coalesce streamed blocks to reduce line spam. (#536) — thanks @mcinteerj - Auto-reply: add per-provider block streaming toggles and coalesce streamed blocks to reduce line spam. (#536) — thanks @mcinteerj
- Auto-reply: default block streaming off for non-Telegram providers unless explicitly enabled, and avoid splitting on forced flushes below max.
- Auto-reply: raise default coalesce minChars for Signal/Slack/Discord and clarify streaming vs draft streaming in docs.
- Auto-reply: default block streaming coalesce idle to 1s to reduce tiny chunks. — thanks @steipete - Auto-reply: default block streaming coalesce idle to 1s to reduce tiny chunks. — thanks @steipete
- Auto-reply: fix /status usage summary filtering for the active provider. - Auto-reply: fix /status usage summary filtering for the active provider.
- Auto-reply: deduplicate followup queue entries using message id/routing to avoid duplicate replies. (#600) — thanks @samratjha96 - Auto-reply: deduplicate followup queue entries using message id/routing to avoid duplicate replies. (#600) — thanks @samratjha96

@@ -83,13 +83,14 @@ When queue mode is `followup` or `collect`, inbound messages are held until the
current turn ends, then a new agent turn starts with the queued payloads. See current turn ends, then a new agent turn starts with the queued payloads. See
[`docs/queue.md`](/concepts/queue) for mode + debounce/cap behavior. [`docs/queue.md`](/concepts/queue) for mode + debounce/cap behavior.
Block streaming sends completed assistant blocks as soon as they finish; disable Block streaming sends completed assistant blocks as soon as they finish; it is
via `agents.defaults.blockStreamingDefault: "off"` if you only want the final response. **off by default** (`agents.defaults.blockStreamingDefault: "off"`).
Tune the boundary via `agents.defaults.blockStreamingBreak` (`text_end` vs `message_end`; defaults to text_end). Tune the boundary via `agents.defaults.blockStreamingBreak` (`text_end` vs `message_end`; defaults to text_end).
Control soft block chunking with `agents.defaults.blockStreamingChunk` (defaults to Control soft block chunking with `agents.defaults.blockStreamingChunk` (defaults to
800–1200 chars; prefers paragraph breaks, then newlines; sentences last). 800–1200 chars; prefers paragraph breaks, then newlines; sentences last).
Coalesce streamed chunks with `agents.defaults.blockStreamingCoalesce` to reduce Coalesce streamed chunks with `agents.defaults.blockStreamingCoalesce` to reduce
single-line spam (idle-based merging before send). single-line spam (idle-based merging before send). Non-Telegram providers require
explicit `*.blockStreaming: true` to enable block replies.
Verbose tool summaries are emitted at tool start (no debounce); Control UI Verbose tool summaries are emitted at tool start (no debounce); Control UI
streams tool output via agent events when available. streams tool output via agent events when available.
More details: [Streaming + chunking](/concepts/streaming). More details: [Streaming + chunking](/concepts/streaming).

@@ -57,11 +57,11 @@ Block streaming sends partial replies as the model produces text blocks.
Chunking respects provider text limits and avoids splitting fenced code. Chunking respects provider text limits and avoids splitting fenced code.
Key settings: Key settings:
- `agents.defaults.blockStreamingDefault` (`on|off`) - `agents.defaults.blockStreamingDefault` (`on|off`, default off)
- `agents.defaults.blockStreamingBreak` (`text_end|message_end`) - `agents.defaults.blockStreamingBreak` (`text_end|message_end`)
- `agents.defaults.blockStreamingChunk` (`minChars|maxChars|breakPreference`) - `agents.defaults.blockStreamingChunk` (`minChars|maxChars|breakPreference`)
- `agents.defaults.blockStreamingCoalesce` (idle-based batching) - `agents.defaults.blockStreamingCoalesce` (idle-based batching)
- Provider overrides: `*.blockStreaming` and `*.blockStreamingCoalesce` - Provider overrides: `*.blockStreaming` and `*.blockStreamingCoalesce` (non-Telegram providers require explicit `*.blockStreaming: true`)
Details: [Streaming + chunking](/concepts/streaming). Details: [Streaming + chunking](/concepts/streaming).
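
As a rough illustration of how the key settings above fit together, here is a minimal sketch written as a TypeScript object literal. The real config lives in a JSON5 file; the `slack` block and the specific values are assumptions chosen for the example (only the key names and the documented defaults come from the text above).

```typescript
// Hypothetical config shape for illustration; not the project's actual schema types.
const exampleConfig = {
  agents: {
    defaults: {
      blockStreamingDefault: "on", // documented default is "off"
      blockStreamingBreak: "text_end", // or "message_end"
      blockStreamingChunk: {
        minChars: 800,
        maxChars: 1200,
        breakPreference: "paragraph",
      },
      blockStreamingCoalesce: { idleMs: 1000 }, // idle-based batching
    },
  },
  // Non-Telegram providers need an explicit opt-in to send block replies.
  slack: {
    blockStreaming: true,
    blockStreamingCoalesce: { minChars: 1500, idleMs: 1000 },
  },
};
```

With `blockStreamingDefault` left at `"off"`, the explicit `slack.blockStreaming: true` would presumably still be what enables block replies for that provider.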

@@ -32,7 +32,7 @@ Legend:
- `provider send`: actual outbound messages (block replies). - `provider send`: actual outbound messages (block replies).
**Controls:** **Controls:**
- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default on). - `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
- Provider overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per provider. - Provider overrides: `*.blockStreaming` (and per-account variants) to force `"on"`/`"off"` per provider.
- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`. - `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"`.
- `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`. - `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
@@ -69,14 +69,19 @@ progressive output.
- Joiner is derived from `blockStreamingChunk.breakPreference` - Joiner is derived from `blockStreamingChunk.breakPreference`
(`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space). (`paragraph` → `\n\n`, `newline` → `\n`, `sentence` → space).
- Provider overrides are available via `*.blockStreamingCoalesce` (including per-account configs). - Provider overrides are available via `*.blockStreamingCoalesce` (including per-account configs).
- Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.
## “Stream chunks or everything” ## “Stream chunks or everything”
This maps to: This maps to:
- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). - **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go). Non-Telegram providers also need `*.blockStreaming: true`.
- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long). - **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
- **No block streaming:** `blockStreamingDefault: "off"` (only final reply). - **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
**Provider note:** For non-Telegram providers, block streaming is **off unless**
`*.blockStreaming` is explicitly set to `true`. Telegram can stream drafts
(`telegram.streamMode`) without block replies.
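
The provider note can be read as a small decision function. The sketch below is a simplified, standalone restatement of the gate in this commit's `getReplyFromConfig` change; the function name `isBlockStreamingEnabled` and the `GateInput` shape are hypothetical, and it assumes that a provider-level `*.blockStreaming: true` arrives as `disableBlockStreaming === false`.

```typescript
type BlockStreamingDefault = "on" | "off";

// Hypothetical input shape for illustration only.
interface GateInput {
  provider?: string; // e.g. "telegram", "slack"
  blockStreamingDefault?: BlockStreamingDefault; // agents.defaults.blockStreamingDefault
  disableBlockStreaming?: boolean; // assumed: false means explicitly enabled per provider
}

function isBlockStreamingEnabled(input: GateInput): boolean {
  const providerKey = input.provider?.trim().toLowerCase();
  const explicitEnable = input.disableBlockStreaming === false;

  // Explicit overrides win; otherwise only an explicit "on" default turns streaming on.
  const resolved: BlockStreamingDefault =
    input.disableBlockStreaming === true
      ? "off"
      : explicitEnable
        ? "on"
        : input.blockStreamingDefault === "on"
          ? "on"
          : "off";

  // Non-Telegram providers additionally require the explicit enable.
  const allowed = providerKey === "telegram" || explicitEnable;
  return resolved === "on" && allowed;
}
```

Under these assumptions, a Slack session with `blockStreamingDefault: "on"` but no provider override stays off, while Telegram follows the agent default.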
## Telegram draft streaming (token-ish) ## Telegram draft streaming (token-ish)
Telegram is the only provider with draft streaming: Telegram is the only provider with draft streaming:
@@ -85,6 +90,7 @@ Telegram is the only provider with draft streaming:
- `partial`: draft updates with the latest stream text. - `partial`: draft updates with the latest stream text.
- `block`: draft updates in chunked blocks (same chunker rules). - `block`: draft updates in chunked blocks (same chunker rules).
- `off`: no draft streaming. - `off`: no draft streaming.
- Draft streaming is separate from block streaming; block replies are off by default and only enabled by `*.blockStreaming: true` on non-Telegram providers.
- Final reply is still a normal message. - Final reply is still a normal message.
- `/reasoning stream` writes reasoning into the draft bubble (Telegram only). - `/reasoning stream` writes reasoning into the draft bubble (Telegram only).

@@ -208,7 +208,7 @@ Save to `~/.clawdbot/clawdbot.json` and you can DM the bot from that number.
thinkingDefault: "low", thinkingDefault: "low",
verboseDefault: "off", verboseDefault: "off",
elevatedDefault: "on", elevatedDefault: "on",
blockStreamingDefault: "on", blockStreamingDefault: "off",
blockStreamingBreak: "text_end", blockStreamingBreak: "text_end",
blockStreamingChunk: { blockStreamingChunk: {
minChars: 800, minChars: 800,

@@ -675,7 +675,7 @@ Multi-account support lives under `telegram.accounts` (see the multi-account sec
} }
}, },
replyToMode: "first", // off | first | all replyToMode: "first", // off | first | all
streamMode: "partial", // off | partial | block (draft streaming) streamMode: "partial", // off | partial | block (draft streaming; separate from block streaming)
actions: { reactions: true, sendMessage: true }, // tool action gates (false disables) actions: { reactions: true, sendMessage: true }, // tool action gates (false disables)
mediaMaxMb: 5, mediaMaxMb: 5,
retry: { // outbound retry policy retry: { // outbound retry policy
@@ -1143,8 +1143,9 @@ Example (adaptive tuned):
See [/concepts/session-pruning](/concepts/session-pruning) for behavior details. See [/concepts/session-pruning](/concepts/session-pruning) for behavior details.
Block streaming: Block streaming:
- `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default on). - `agents.defaults.blockStreamingDefault`: `"on"`/`"off"` (default off).
- Provider overrides: `*.blockStreaming` (and per-account variants) to force block streaming on/off. - Provider overrides: `*.blockStreaming` (and per-account variants) to force block streaming on/off.
Non-Telegram providers require an explicit `*.blockStreaming: true` to enable block replies.
- `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"` (default: text_end). - `agents.defaults.blockStreamingBreak`: `"text_end"` or `"message_end"` (default: text_end).
- `agents.defaults.blockStreamingChunk`: soft chunking for streamed blocks. Defaults to - `agents.defaults.blockStreamingChunk`: soft chunking for streamed blocks. Defaults to
800–1200 chars, prefers paragraph breaks (`\n\n`), then newlines, then sentences. 800–1200 chars, prefers paragraph breaks (`\n\n`), then newlines, then sentences.
@@ -1156,7 +1157,8 @@ Block streaming:
``` ```
- `agents.defaults.blockStreamingCoalesce`: merge streamed blocks before sending. - `agents.defaults.blockStreamingCoalesce`: merge streamed blocks before sending.
Defaults to `{ idleMs: 1000 }` and inherits `minChars` from `blockStreamingChunk` Defaults to `{ idleMs: 1000 }` and inherits `minChars` from `blockStreamingChunk`
with `maxChars` capped to the provider text limit. with `maxChars` capped to the provider text limit. Signal/Slack/Discord default
to `minChars: 1500` unless overridden.
Provider overrides: `whatsapp.blockStreamingCoalesce`, `telegram.blockStreamingCoalesce`, Provider overrides: `whatsapp.blockStreamingCoalesce`, `telegram.blockStreamingCoalesce`,
`discord.blockStreamingCoalesce`, `slack.blockStreamingCoalesce`, `signal.blockStreamingCoalesce`, `discord.blockStreamingCoalesce`, `slack.blockStreamingCoalesce`, `signal.blockStreamingCoalesce`,
`imessage.blockStreamingCoalesce`, `msteams.blockStreamingCoalesce` (and per-account variants). `imessage.blockStreamingCoalesce`, `msteams.blockStreamingCoalesce` (and per-account variants).
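
The precedence described for `blockStreamingCoalesce` can be sketched as a small resolver. This is a simplified illustration, not the project's `resolveBlockStreamingCoalescing`: the function name `resolveCoalesceSketch` and its parameters are hypothetical, and `maxChars` is passed in directly rather than derived from the provider text limit.

```typescript
type Provider =
  | "whatsapp" | "telegram" | "discord" | "slack"
  | "signal" | "imessage" | "msteams";

interface CoalesceOverride { minChars?: number; idleMs?: number }

// Per-provider defaults, mirroring the values documented above.
const PROVIDER_DEFAULTS: Partial<Record<Provider, { minChars: number; idleMs: number }>> = {
  signal: { minChars: 1500, idleMs: 1000 },
  slack: { minChars: 1500, idleMs: 1000 },
  discord: { minChars: 1500, idleMs: 1000 },
};

function resolveCoalesceSketch(
  provider: Provider,
  override: CoalesceOverride | undefined, // explicit blockStreamingCoalesce config
  chunkMinChars: number | undefined, // blockStreamingChunk.minChars
  maxChars: number, // assumed to be already capped to the provider text limit
): { minChars: number; idleMs: number } {
  const defaults = PROVIDER_DEFAULTS[provider];
  // Precedence: explicit override, provider default, chunk minChars, global default.
  const minRequested = Math.max(
    1,
    Math.floor(override?.minChars ?? defaults?.minChars ?? chunkMinChars ?? 800),
  );
  const minChars = Math.min(minRequested, maxChars);
  const idleMs = Math.max(0, Math.floor(override?.idleMs ?? defaults?.idleMs ?? 1000));
  return { minChars, idleMs };
}
```

An explicit override always wins, so `slack.blockStreamingCoalesce: { minChars: 200 }` would bypass the 1500 provider default in this sketch.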

@@ -18,7 +18,7 @@ read_when:
- **Webhook support:** `webhook-set.ts` wraps `setWebhook/deleteWebhook`; `webhook.ts` hosts the callback with health + graceful shutdown. Gateway enables webhook mode when `telegram.webhookUrl` is set (otherwise it long-polls). - **Webhook support:** `webhook-set.ts` wraps `setWebhook/deleteWebhook`; `webhook.ts` hosts the callback with health + graceful shutdown. Gateway enables webhook mode when `telegram.webhookUrl` is set (otherwise it long-polls).
- **Sessions:** direct chats collapse into the agent main session (`agent:<agentId>:<mainKey>`); groups use `agent:<agentId>:telegram:group:<chatId>`; replies route back to the same provider. - **Sessions:** direct chats collapse into the agent main session (`agent:<agentId>:<mainKey>`); groups use `agent:<agentId>:telegram:group:<chatId>`; replies route back to the same provider.
- **Config knobs:** `telegram.botToken`, `telegram.dmPolicy`, `telegram.groups` (allowlist + mention defaults), `telegram.allowFrom`, `telegram.groupAllowFrom`, `telegram.groupPolicy`, `telegram.mediaMaxMb`, `telegram.proxy`, `telegram.webhookSecret`, `telegram.webhookUrl`. - **Config knobs:** `telegram.botToken`, `telegram.dmPolicy`, `telegram.groups` (allowlist + mention defaults), `telegram.allowFrom`, `telegram.groupAllowFrom`, `telegram.groupPolicy`, `telegram.mediaMaxMb`, `telegram.proxy`, `telegram.webhookSecret`, `telegram.webhookUrl`.
- **Draft streaming:** optional `telegram.streamMode` uses `sendMessageDraft` in private topic chats (Bot API 9.3+). - **Draft streaming:** optional `telegram.streamMode` uses `sendMessageDraft` in private topic chats (Bot API 9.3+). This is separate from provider block streaming.
- **Tests:** grammy mocks cover DM + group mention gating and outbound send; more media/webhook fixtures still welcome. - **Tests:** grammy mocks cover DM + group mention gating and outbound send; more media/webhook fixtures still welcome.
Open questions Open questions

@@ -206,6 +206,10 @@ Config:
- `block`: update the draft bubble in larger blocks (chunked). - `block`: update the draft bubble in larger blocks (chunked).
- `off`: disable draft streaming. - `off`: disable draft streaming.
Note: draft streaming is separate from **block streaming** (provider messages).
Block streaming is off by default and requires `telegram.blockStreaming: true`
if you want early Telegram messages instead of draft updates.
Reasoning stream (Telegram only): Reasoning stream (Telegram only):
- `/reasoning stream` streams reasoning into the draft bubble while the reply is - `/reasoning stream` streams reasoning into the draft bubble while the reply is
generating, then sends the final answer without reasoning. generating, then sends the final answer without reasoning.

@@ -53,6 +53,14 @@ export class EmbeddedBlockChunker {
const maxChars = Math.max(minChars, Math.floor(this.#chunking.maxChars)); const maxChars = Math.max(minChars, Math.floor(this.#chunking.maxChars));
if (this.#buffer.length < minChars && !force) return; if (this.#buffer.length < minChars && !force) return;
if (force && this.#buffer.length <= maxChars) {
if (this.#buffer.trim().length > 0) {
emit(this.#buffer);
}
this.#buffer = "";
return;
}
while ( while (
this.#buffer.length >= minChars || this.#buffer.length >= minChars ||
(force && this.#buffer.length > 0) (force && this.#buffer.length > 0)
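
To make the effect of the new early return concrete, here is a standalone sketch of the drain behavior. It is not the `EmbeddedBlockChunker` implementation: `drainSketch` is a hypothetical helper, and its break selection is reduced to "last paragraph break inside the window, else hard cut at `maxChars`", which is simpler than the real chunker's rules.

```typescript
function drainSketch(
  buffer: string,
  minCharsRaw: number,
  maxCharsRaw: number,
  force: boolean,
): { chunks: string[]; rest: string } {
  const minChars = Math.max(1, Math.floor(minCharsRaw));
  const maxChars = Math.max(minChars, Math.floor(maxCharsRaw));
  const chunks: string[] = [];

  if (buffer.length < minChars && !force) return { chunks, rest: buffer };

  // Forced flush at or below maxChars: emit the buffer whole instead of splitting it.
  if (force && buffer.length <= maxChars) {
    if (buffer.trim().length > 0) chunks.push(buffer);
    return { chunks, rest: "" };
  }

  let rest = buffer;
  while (rest.length >= minChars || (force && rest.length > 0)) {
    const para = rest.slice(0, maxChars).lastIndexOf("\n\n");
    let cut: number;
    if (para >= minChars) cut = para; // preferred: paragraph break
    else if (rest.length >= maxChars) cut = maxChars; // hard cut at the limit
    else if (force) cut = rest.length; // forced: flush the tail
    else break; // wait for more text
    const chunk = rest.slice(0, cut);
    if (chunk.trim().length > 0) chunks.push(chunk);
    rest = rest.slice(cut).replace(/^\n+/, ""); // drop the consumed break
  }
  return { chunks, rest };
}
```

In this sketch, a forced flush of a 35-character two-paragraph buffer yields one chunk at `maxChars: 40` and two chunks at `maxChars: 25`.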

@@ -784,7 +784,7 @@ describe("subscribeEmbeddedPiSession", () => {
blockReplyBreak: "message_end", blockReplyBreak: "message_end",
blockReplyChunking: { blockReplyChunking: {
minChars: 5, minChars: 5,
maxChars: 40, maxChars: 25,
breakPreference: "paragraph", breakPreference: "paragraph",
}, },
}); });
@@ -836,7 +836,7 @@ describe("subscribeEmbeddedPiSession", () => {
blockReplyBreak: "message_end", blockReplyBreak: "message_end",
blockReplyChunking: { blockReplyChunking: {
minChars: 5, minChars: 5,
maxChars: 50, maxChars: 25,
breakPreference: "paragraph", breakPreference: "paragraph",
}, },
}); });
@@ -939,7 +939,7 @@ describe("subscribeEmbeddedPiSession", () => {
blockReplyBreak: "message_end", blockReplyBreak: "message_end",
blockReplyChunking: { blockReplyChunking: {
minChars: 5, minChars: 5,
maxChars: 40, maxChars: 25,
breakPreference: "paragraph", breakPreference: "paragraph",
}, },
}); });
@@ -986,7 +986,7 @@ describe("subscribeEmbeddedPiSession", () => {
blockReplyBreak: "message_end", blockReplyBreak: "message_end",
blockReplyChunking: { blockReplyChunking: {
minChars: 5, minChars: 5,
maxChars: 45, maxChars: 30,
breakPreference: "paragraph", breakPreference: "paragraph",
}, },
}); });
@@ -1035,7 +1035,7 @@ describe("subscribeEmbeddedPiSession", () => {
blockReplyBreak: "message_end", blockReplyBreak: "message_end",
blockReplyChunking: { blockReplyChunking: {
minChars: 10, minChars: 10,
maxChars: 50, maxChars: 30,
breakPreference: "paragraph", breakPreference: "paragraph",
}, },
}); });

@@ -470,20 +470,26 @@ export async function getReplyFromConfig(
(agentCfg?.elevatedDefault as ElevatedLevel | undefined) ?? (agentCfg?.elevatedDefault as ElevatedLevel | undefined) ??
"on") "on")
: "off"; : "off";
const providerKey = sessionCtx.Provider?.trim().toLowerCase();
const explicitBlockStreamingEnable = opts?.disableBlockStreaming === false;
const resolvedBlockStreaming = const resolvedBlockStreaming =
opts?.disableBlockStreaming === true opts?.disableBlockStreaming === true
? "off" ? "off"
: opts?.disableBlockStreaming === false : opts?.disableBlockStreaming === false
? "on" ? "on"
: agentCfg?.blockStreamingDefault === "off" : agentCfg?.blockStreamingDefault === "on"
? "off" ? "on"
: "on"; : "off";
const resolvedBlockStreamingBreak: "text_end" | "message_end" = const resolvedBlockStreamingBreak: "text_end" | "message_end" =
agentCfg?.blockStreamingBreak === "message_end" agentCfg?.blockStreamingBreak === "message_end"
? "message_end" ? "message_end"
: "text_end"; : "text_end";
const allowBlockStreaming =
providerKey === "telegram" || explicitBlockStreamingEnable;
const blockStreamingEnabled = const blockStreamingEnabled =
resolvedBlockStreaming === "on" && opts?.disableBlockStreaming !== true; resolvedBlockStreaming === "on" &&
opts?.disableBlockStreaming !== true &&
allowBlockStreaming;
const blockReplyChunking = blockStreamingEnabled const blockReplyChunking = blockStreamingEnabled
? resolveBlockStreamingChunking( ? resolveBlockStreamingChunking(
cfg, cfg,

@@ -5,6 +5,13 @@ import { resolveTextChunkLimit, type TextChunkProvider } from "../chunk.js";
const DEFAULT_BLOCK_STREAM_MIN = 800; const DEFAULT_BLOCK_STREAM_MIN = 800;
const DEFAULT_BLOCK_STREAM_MAX = 1200; const DEFAULT_BLOCK_STREAM_MAX = 1200;
const DEFAULT_BLOCK_STREAM_COALESCE_IDLE_MS = 1000; const DEFAULT_BLOCK_STREAM_COALESCE_IDLE_MS = 1000;
const PROVIDER_COALESCE_DEFAULTS: Partial<
Record<TextChunkProvider, { minChars: number; idleMs: number }>
> = {
signal: { minChars: 1500, idleMs: 1000 },
slack: { minChars: 1500, idleMs: 1000 },
discord: { minChars: 1500, idleMs: 1000 },
};
const BLOCK_CHUNK_PROVIDERS = new Set<TextChunkProvider>([ const BLOCK_CHUNK_PROVIDERS = new Set<TextChunkProvider>([
"whatsapp", "whatsapp",
@@ -77,6 +84,9 @@ export function resolveBlockStreamingCoalescing(
const providerKey = normalizeChunkProvider(provider); const providerKey = normalizeChunkProvider(provider);
const textLimit = resolveTextChunkLimit(cfg, providerKey, accountId); const textLimit = resolveTextChunkLimit(cfg, providerKey, accountId);
const normalizedAccountId = normalizeAccountId(accountId); const normalizedAccountId = normalizeAccountId(accountId);
const providerDefaults = providerKey
? PROVIDER_COALESCE_DEFAULTS[providerKey]
: undefined;
const providerCfg = (() => { const providerCfg = (() => {
if (!cfg || !providerKey) return undefined; if (!cfg || !providerKey) return undefined;
if (providerKey === "whatsapp") { if (providerKey === "whatsapp") {
@@ -125,7 +135,10 @@ export function resolveBlockStreamingCoalescing(
const minRequested = Math.max( const minRequested = Math.max(
1, 1,
Math.floor( Math.floor(
coalesceCfg?.minChars ?? chunking?.minChars ?? DEFAULT_BLOCK_STREAM_MIN, coalesceCfg?.minChars ??
providerDefaults?.minChars ??
chunking?.minChars ??
DEFAULT_BLOCK_STREAM_MIN,
), ),
); );
const maxRequested = Math.max( const maxRequested = Math.max(
@@ -136,7 +149,11 @@ export function resolveBlockStreamingCoalescing(
const minChars = Math.min(minRequested, maxChars); const minChars = Math.min(minRequested, maxChars);
const idleMs = Math.max( const idleMs = Math.max(
0, 0,
Math.floor(coalesceCfg?.idleMs ?? DEFAULT_BLOCK_STREAM_COALESCE_IDLE_MS), Math.floor(
coalesceCfg?.idleMs ??
providerDefaults?.idleMs ??
DEFAULT_BLOCK_STREAM_COALESCE_IDLE_MS,
),
); );
const preference = chunking?.breakPreference ?? "paragraph"; const preference = chunking?.breakPreference ?? "paragraph";
const joiner = const joiner =

@@ -116,7 +116,7 @@ const FIELD_LABELS: Record<string, string> = {
"talk.apiKey": "Talk API Key", "talk.apiKey": "Talk API Key",
"telegram.botToken": "Telegram Bot Token", "telegram.botToken": "Telegram Bot Token",
"telegram.dmPolicy": "Telegram DM Policy", "telegram.dmPolicy": "Telegram DM Policy",
"telegram.streamMode": "Telegram Stream Mode", "telegram.streamMode": "Telegram Draft Stream Mode",
"telegram.retry.attempts": "Telegram Retry Attempts", "telegram.retry.attempts": "Telegram Retry Attempts",
"telegram.retry.minDelayMs": "Telegram Retry Min Delay (ms)", "telegram.retry.minDelayMs": "Telegram Retry Min Delay (ms)",
"telegram.retry.maxDelayMs": "Telegram Retry Max Delay (ms)", "telegram.retry.maxDelayMs": "Telegram Retry Max Delay (ms)",
@@ -193,7 +193,7 @@ const FIELD_HELP: Record<string, string> = {
"telegram.dmPolicy": "telegram.dmPolicy":
'Direct message access control ("pairing" recommended). "open" requires telegram.allowFrom=["*"].', 'Direct message access control ("pairing" recommended). "open" requires telegram.allowFrom=["*"].',
"telegram.streamMode": "telegram.streamMode":
"Draft streaming mode for Telegram replies (off | partial | block). Requires private topics + sendMessageDraft.", "Draft streaming mode for Telegram replies (off | partial | block). Separate from block streaming; requires private topics + sendMessageDraft.",
"telegram.retry.attempts": "telegram.retry.attempts":
"Max retry attempts for outbound Telegram API calls (default: 3).", "Max retry attempts for outbound Telegram API calls (default: 3).",
"telegram.retry.minDelayMs": "telegram.retry.minDelayMs":