| summary | read_when |
|---|---|
| Streaming + chunking behavior (block replies, draft streaming, limits) | |
# Streaming + chunking
Clawdbot has two separate “streaming” layers:
- Block streaming (providers): emit completed blocks as the assistant writes. These are normal provider messages (not token deltas).
- Token-ish streaming (Telegram only): update a draft bubble with partial text while generating; final message is sent at the end.
There is no true token-level streaming to provider messages today; Telegram draft streaming is the only partial-stream surface.
## Block streaming (provider messages)
Block streaming sends assistant output in coarse chunks as it becomes available.
Model output
└─ text_delta/events
   ├─ (blockStreamingBreak=text_end)
   │  └─ chunker emits blocks as the buffer grows
   └─ (blockStreamingBreak=message_end)
      └─ chunker flushes at message_end
         └─ provider send (block replies)
Legend:
- `text_delta`/events: model stream events (may be sparse for non-streaming models).
- chunker: `EmbeddedBlockChunker` applying min/max bounds + break preference.
- provider send: actual outbound messages (block replies).
Controls:
- `agent.blockStreamingDefault`: `"on"` / `"off"` (default on).
- `agent.blockStreamingBreak`: `"text_end"` or `"message_end"`.
- `agent.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
- Provider hard cap: `*.textChunkLimit` (e.g., `whatsapp.textChunkLimit`).
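Taken together, a config using these controls might look like the following sketch; the values and file layout here are illustrative assumptions, not documented defaults:

```json5
{
  agent: {
    blockStreamingDefault: "on",     // block streaming enabled (the default)
    blockStreamingBreak: "text_end", // emit as each text block ends
    blockStreamingChunk: { minChars: 200, maxChars: 1500 },
  },
  whatsapp: {
    textChunkLimit: 4096, // provider hard cap; maxChars is clamped to this
  },
}
```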
Boundary semantics:
- `text_end`: stream blocks as soon as the chunker emits them; flush on each `text_end`.
- `message_end`: wait until the assistant message finishes, then flush the buffered output.

`message_end` still uses the chunker when the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
## Chunking algorithm (low/high bounds)
Block chunking is implemented by `EmbeddedBlockChunker`:
- Low bound: don’t emit until the buffer reaches `minChars` (unless forced).
- High bound: prefer splits before `maxChars`; if forced, split at `maxChars`.
- Break preference: paragraph → newline → sentence → whitespace → hard break.
- Code fences: never split inside a fence; when forced at `maxChars`, close and reopen the fence to keep the Markdown valid.

`maxChars` is clamped to the provider `textChunkLimit`, so you can’t exceed per-provider caps.
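The low/high-bound rules above can be sketched as follows. The names and the exact break heuristics here are illustrative, not Clawdbot's real `EmbeddedBlockChunker`, and fence handling is omitted for brevity:

```typescript
type ChunkOpts = { minChars: number; maxChars: number };

// Pick a split point <= max, preferring paragraph > newline > sentence > whitespace.
function splitPoint(text: string, max: number): number {
  const window = text.slice(0, max);
  for (const sep of ["\n\n", "\n", ". ", " "]) {
    const i = window.lastIndexOf(sep);
    if (i > 0) return i + sep.length;
  }
  return max; // no acceptable break found: hard split at maxChars
}

// Drain the buffer into chunks. Mid-stream (text_end), text under minChars is
// held back; at a flush (message_end) everything is emitted, possibly as
// several chunks when the buffer exceeds maxChars.
function drain(
  buffer: string,
  opts: ChunkOpts,
  flush: boolean,
): { chunks: string[]; rest: string } {
  const chunks: string[] = [];
  while (buffer.length > opts.maxChars) {
    const cut = splitPoint(buffer, opts.maxChars);
    chunks.push(buffer.slice(0, cut).trimEnd());
    buffer = buffer.slice(cut);
  }
  if (buffer.length >= opts.minChars || (flush && buffer.length > 0)) {
    chunks.push(buffer);
    buffer = "";
  }
  return { chunks, rest: buffer };
}
```

Note the low bound only delays emission: a `message_end` flush overrides it so no text is ever dropped.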
## “Stream chunks or everything”

This maps to:
- Stream chunks: `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go).
- Stream everything at the end: `blockStreamingBreak: "message_end"` (flush once, possibly as multiple chunks if very long).
- No block streaming: `blockStreamingDefault: "off"` (only the final reply).
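As config fragments, these three behaviors might look like the following sketch (assuming the `agent.*` keys listed earlier; file layout is illustrative):

```json5
// Stream chunks as they complete:
{ agent: { blockStreamingDefault: "on", blockStreamingBreak: "text_end" } }

// Everything at the end (still possibly several chunks if long):
{ agent: { blockStreamingBreak: "message_end" } }

// No block streaming; only the final reply:
{ agent: { blockStreamingDefault: "off" } }
```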
## Telegram draft streaming (token-ish)

Telegram is the only provider with draft streaming:
- Uses the Bot API `sendMessageDraft` in private chats with topics.
- `telegram.streamMode`: `"partial" | "block" | "off"`.
  - `partial`: the draft updates with the latest stream text.
  - `block`: the draft updates in chunked blocks (same chunker rules).
  - `off`: no draft streaming.
- The final reply is still a normal message.
- `/reasoning stream` writes reasoning into the draft bubble (Telegram only).
When draft streaming is active, Clawdbot disables block streaming for that reply to avoid double-streaming.
Telegram (private + topics)
└─ sendMessageDraft (draft bubble)
   ├─ streamMode=partial → update latest text
   └─ streamMode=block → chunker updates draft
      └─ final reply → normal message

Legend:
- `sendMessageDraft`: the Telegram draft bubble (not a real message).
- final reply: a normal Telegram message send.