From d81627da72d51f2ec2b1af6f7b5ed64c2180fd60 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Wed, 7 Jan 2026 17:15:53 +0100
Subject: [PATCH] docs: document streaming + chunking

---
 docs/concepts/agent.md        |  1 +
 docs/concepts/streaming.md    | 85 +++++++++++++++++++++++++++++++++++
 docs/gateway/configuration.md |  1 +
 docs/index.md                 |  1 +
 docs/providers/telegram.md    |  1 +
 docs/start/hubs.md            |  1 +
 6 files changed, 90 insertions(+)
 create mode 100644 docs/concepts/streaming.md

diff --git a/docs/concepts/agent.md b/docs/concepts/agent.md
index 307aaa534..a4d4bb780 100644
--- a/docs/concepts/agent.md
+++ b/docs/concepts/agent.md
@@ -102,6 +102,7 @@
 Control soft block chunking with `agent.blockStreamingChunk` (defaults to 800–1200 chars; prefers paragraph breaks, then newlines; sentences last).
 Verbose tool summaries are emitted at tool start (no debounce); Control UI streams tool output via agent events when available.
+More details: [Streaming + chunking](/concepts/streaming).
 
 ## Configuration (minimal)
 
diff --git a/docs/concepts/streaming.md b/docs/concepts/streaming.md
new file mode 100644
index 000000000..26c216b82
--- /dev/null
+++ b/docs/concepts/streaming.md
@@ -0,0 +1,85 @@
+---
+summary: "Streaming + chunking behavior (block replies, draft streaming, limits)"
+read_when:
+  - Explaining how streaming or chunking works on providers
+  - Changing block streaming or provider chunking behavior
+  - Debugging duplicate/early block replies or draft streaming
+---
+# Streaming + chunking
+
+Clawdbot has two separate “streaming” layers:
+- **Block streaming (providers):** emit completed **blocks** as the assistant writes. These are normal provider messages (not token deltas).
+- **Token-ish streaming (Telegram only):** update a **draft bubble** with partial text while generating; the final message is sent at the end.
+
+There is **no real token streaming** to external provider messages today. Telegram draft streaming is the only partial-stream surface.
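The two layers are controlled by separate config knobs. A minimal sketch combining them (key names are the ones documented on this page and in /gateway/configuration; the surrounding JSON shape and all values are illustrative):

```json
{
  "agent": {
    "blockStreamingDefault": "on",
    "blockStreamingBreak": "text_end",
    "blockStreamingChunk": { "minChars": 800, "maxChars": 1200 }
  },
  "telegram": {
    "streamMode": "partial"
  },
  "whatsapp": {
    "textChunkLimit": 1600
  }
}
```

The `whatsapp.textChunkLimit` value here is an assumed example of a provider hard cap, not a documented default.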
+
+## Block streaming (provider messages)
+
+Block streaming sends assistant output in coarse chunks as it becomes available.
+
+```
+Model output
+ └─ text_delta/events
+    ├─ (blockStreamingBreak=text_end)
+    │   └─ chunker emits blocks as buffer grows
+    └─ (blockStreamingBreak=message_end)
+       └─ chunker flushes at message_end
+ └─ provider send (block replies)
+```
+Legend:
+- `text_delta/events`: model stream events (may be sparse for non-streaming models).
+- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
+- `provider send`: actual outbound messages (block replies).
+
+**Controls:**
+- `agent.blockStreamingDefault`: `"on"`/`"off"` (default on).
+- `agent.blockStreamingBreak`: `"text_end"` or `"message_end"`.
+- `agent.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
+- Provider hard cap: `*.textChunkLimit` (e.g., `whatsapp.textChunkLimit`).
+
+**Boundary semantics:**
+- `text_end`: stream blocks as soon as the chunker emits; flush on each `text_end`.
+- `message_end`: wait until the assistant message finishes, then flush buffered output.
+
+`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
+
+## Chunking algorithm (low/high bounds)
+
+Block chunking is implemented by `EmbeddedBlockChunker`:
+- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
+- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
+- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
+- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
+
+`maxChars` is clamped to the provider `textChunkLimit`, so you can’t exceed per-provider caps.
+
+## “Stream chunks or everything”
+
+This maps to:
+- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go).
+- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
+- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
+
+## Telegram draft streaming (token-ish)
+
+Telegram is the only provider with draft streaming:
+- Uses Bot API `sendMessageDraft` in **private chats with topics**.
+- `telegram.streamMode: "partial" | "block" | "off"`.
+  - `partial`: draft updates with the latest stream text.
+  - `block`: draft updates in chunked blocks (same chunker rules).
+  - `off`: no draft streaming.
+- Final reply is still a normal message.
+- `/reasoning stream` writes reasoning into the draft bubble (Telegram only).
+
+When draft streaming is active, Clawdbot disables block streaming for that reply to avoid double-streaming.
+
+```
+Telegram (private + topics)
+ └─ sendMessageDraft (draft bubble)
+    ├─ streamMode=partial → update latest text
+    └─ streamMode=block → chunker updates draft
+ └─ final reply → normal message
+```
+Legend:
+- `sendMessageDraft`: Telegram draft bubble (not a real message).
+- `final reply`: normal Telegram message send.
diff --git a/docs/gateway/configuration.md b/docs/gateway/configuration.md
index 8a5be2d8c..0b39a9580 100644
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -826,6 +826,7 @@ Block streaming:
   }
 }
 ```
+See [/concepts/streaming](/concepts/streaming) for behavior + chunking details.
 
 `agent.model.primary` should be set as `provider/model` (e.g. `anthropic/claude-opus-4-5`).
 Aliases come from `agent.models.*.alias` (e.g. `Opus`).
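The low/high-bound chunking described above can be sketched as follows. This is illustrative only, not the actual `EmbeddedBlockChunker`: it keeps the documented `minChars`/`maxChars` bounds, the paragraph-then-newline preference, the forced hard split at the cap, and the clamp to the provider `textChunkLimit`, but omits the sentence/whitespace preferences and code-fence handling.

```typescript
// Sketch of min/max block chunking (names and structure are assumptions).
function chunkBlocks(
  text: string,
  minChars: number,
  maxChars: number,
  textChunkLimit = Infinity, // provider hard cap; maxChars is clamped to it
): string[] {
  const max = Math.min(maxChars, textChunkLimit);
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > max) {
    const window = rest.slice(0, max);
    let cut = window.lastIndexOf("\n\n");               // prefer paragraph break
    if (cut < minChars) cut = window.lastIndexOf("\n"); // then plain newline
    if (cut < minChars) cut = max;                      // forced: hard split at the cap
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

chunkBlocks("para one.\n\npara two is a bit longer.\n\npara three.", 5, 30);
// → ["para one.", "para two is a bit longer.", "para three."]
```

With `blockStreamingBreak: "text_end"` this would run as the buffer grows; with `"message_end"` it would run once over the full buffered text, which is why very long replies can still arrive as multiple chunks.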
diff --git a/docs/index.md b/docs/index.md
index cd3b4be50..410ff452d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -80,6 +80,7 @@ Most operations flow through the **Gateway** (`clawdbot gateway`), a single long
 - 🎮 **Discord Bot** — DMs + guild channels via discord.js
 - 💬 **iMessage** — Local imsg CLI integration (macOS)
 - 🤖 **Agent bridge** — Pi (RPC mode) with tool streaming
+- ⏱️ **Streaming + chunking** — Block streaming + Telegram draft streaming details ([/concepts/streaming](/concepts/streaming))
 - 🧠 **Multi-agent routing** — Route provider accounts/peers to isolated agents (workspace + per-agent sessions)
 - 🔐 **Subscription auth** — Anthropic (Claude Pro/Max) + OpenAI (ChatGPT/Codex) via OAuth
 - 💬 **Sessions** — Direct chats collapse into shared `main` (default); groups are isolated
diff --git a/docs/providers/telegram.md b/docs/providers/telegram.md
index ef8b7bac8..3772ebc81 100644
--- a/docs/providers/telegram.md
+++ b/docs/providers/telegram.md
@@ -94,6 +94,7 @@
 Reasoning stream (Telegram only):
 - `/reasoning stream` streams reasoning into the draft bubble while the reply is generating, then sends the final answer without reasoning.
 - If `telegram.streamMode` is `off`, reasoning stream is disabled.
+More context: [Streaming + chunking](/concepts/streaming).
 
 ## Agent tool (reactions)
 - Tool: `telegram` with `react` action (`chatId`, `messageId`, `emoji`).
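The precedence rules documented above (draft streaming wins over block streaming for a reply, and reasoning streaming is disabled when `telegram.streamMode` is `off`) can be sketched as a small decision function. All names here are illustrative assumptions, not Clawdbot internals:

```typescript
type StreamMode = "partial" | "block" | "off";

// Sketch of the documented stream selection for one reply.
function streamingPlan(opts: {
  provider: string;
  streamMode: StreamMode;          // telegram.streamMode
  blockStreamingDefault: "on" | "off"; // agent.blockStreamingDefault
  reasoningStream: boolean;        // /reasoning stream requested
}) {
  // Draft streaming exists only on Telegram, and only when streamMode is not "off".
  const draft = opts.provider === "telegram" && opts.streamMode !== "off";
  return {
    draftStreaming: draft,
    // Block streaming is suppressed while draft streaming to avoid double-streaming.
    blockStreaming: !draft && opts.blockStreamingDefault === "on",
    // Reasoning can only stream into the draft bubble, so it needs draft streaming.
    reasoningStream: draft && opts.reasoningStream,
  };
}

streamingPlan({ provider: "telegram", streamMode: "partial", blockStreamingDefault: "on", reasoningStream: true });
// → { draftStreaming: true, blockStreaming: false, reasoningStream: true }
```

For a non-Telegram provider (or `streamMode: "off"`), the same function falls back to plain block streaming whenever `blockStreamingDefault` is `"on"`.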
diff --git a/docs/start/hubs.md b/docs/start/hubs.md
index bddfaea85..77b943b47 100644
--- a/docs/start/hubs.md
+++ b/docs/start/hubs.md
@@ -34,6 +34,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
 - [Agent runtime](https://docs.clawd.bot/concepts/agent)
 - [Agent workspace](https://docs.clawd.bot/concepts/agent-workspace)
 - [Agent loop](https://docs.clawd.bot/concepts/agent-loop)
+- [Streaming + chunking](https://docs.clawd.bot/concepts/streaming)
 - [Multi-agent routing](https://docs.clawd.bot/concepts/multi-agent)
 - [Sessions](https://docs.clawd.bot/concepts/session)
 - [Sessions (alias)](https://docs.clawd.bot/concepts/sessions)