From d81627da72d51f2ec2b1af6f7b5ed64c2180fd60 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Wed, 7 Jan 2026 17:15:53 +0100
Subject: [PATCH] docs: document streaming + chunking

---
 docs/concepts/agent.md        |  1 +
 docs/concepts/streaming.md    | 85 +++++++++++++++++++++++++++++++++++
 docs/gateway/configuration.md |  1 +
 docs/index.md                 |  1 +
 docs/providers/telegram.md    |  1 +
 docs/start/hubs.md            |  1 +
 6 files changed, 90 insertions(+)
 create mode 100644 docs/concepts/streaming.md

diff --git a/docs/concepts/agent.md b/docs/concepts/agent.md
index 307aaa534..a4d4bb780 100644
--- a/docs/concepts/agent.md
+++ b/docs/concepts/agent.md
@@ -102,6 +102,7 @@
 Control soft block chunking with `agent.blockStreamingChunk` (defaults to 800–1200 chars; prefers paragraph breaks, then newlines; sentences last).
 Verbose tool summaries are emitted at tool start (no debounce); Control UI streams tool output via agent events when available.
+More details: [Streaming + chunking](/concepts/streaming).
 
 ## Configuration (minimal)
 
diff --git a/docs/concepts/streaming.md b/docs/concepts/streaming.md
new file mode 100644
index 000000000..26c216b82
--- /dev/null
+++ b/docs/concepts/streaming.md
@@ -0,0 +1,85 @@
+---
+summary: "Streaming + chunking behavior (block replies, draft streaming, limits)"
+read_when:
+  - Explaining how streaming or chunking works on providers
+  - Changing block streaming or provider chunking behavior
+  - Debugging duplicate/early block replies or draft streaming
+---
+# Streaming + chunking
+
+Clawdbot has two separate “streaming” layers:
+- **Block streaming (providers):** emit completed **blocks** as the assistant writes. These are normal provider messages (not token deltas).
+- **Token-ish streaming (Telegram only):** update a **draft bubble** with partial text while generating; the final message is sent at the end.
+
+There is **no real token streaming** to external provider messages today. Telegram draft streaming is the only partial-stream surface.
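The two layers are controlled by separate config knobs. A minimal sketch combining them (key names are the ones documented on this page and in /gateway/configuration; the surrounding JSON shape and all values are illustrative):

```json
{
  "agent": {
    "blockStreamingDefault": "on",
    "blockStreamingBreak": "text_end",
    "blockStreamingChunk": { "minChars": 800, "maxChars": 1200 }
  },
  "telegram": {
    "streamMode": "partial"
  },
  "whatsapp": {
    "textChunkLimit": 1600
  }
}
```

The `whatsapp.textChunkLimit` value here is an assumed example of a provider hard cap, not a documented default.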
+
+## Block streaming (provider messages)
+
+Block streaming sends assistant output in coarse chunks as it becomes available.
+
+```
+Model output
+ └─ text_delta/events
+    ├─ (blockStreamingBreak=text_end)
+    │   └─ chunker emits blocks as buffer grows
+    └─ (blockStreamingBreak=message_end)
+       └─ chunker flushes at message_end
+ └─ provider send (block replies)
+```
+Legend:
+- `text_delta/events`: model stream events (may be sparse for non-streaming models).
+- `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.
+- `provider send`: actual outbound messages (block replies).
+
+**Controls:**
+- `agent.blockStreamingDefault`: `"on"`/`"off"` (default on).
+- `agent.blockStreamingBreak`: `"text_end"` or `"message_end"`.
+- `agent.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.
+- Provider hard cap: `*.textChunkLimit` (e.g., `whatsapp.textChunkLimit`).
+
+**Boundary semantics:**
+- `text_end`: stream blocks as soon as the chunker emits; flush on each `text_end`.
+- `message_end`: wait until the assistant message finishes, then flush buffered output.
+
+`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.
+
+## Chunking algorithm (low/high bounds)
+
+Block chunking is implemented by `EmbeddedBlockChunker`:
+- **Low bound:** don’t emit until buffer >= `minChars` (unless forced).
+- **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.
+- **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.
+- **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.
+
+`maxChars` is clamped to the provider `textChunkLimit`, so you can’t exceed per-provider caps.
+
+## “Stream chunks or everything”
+
+This maps to:
+- **Stream chunks:** `blockStreamingDefault: "on"` + `blockStreamingBreak: "text_end"` (emit as you go).
+- **Stream everything at end:** `blockStreamingBreak: "message_end"` (flush once, possibly multiple chunks if very long).
+- **No block streaming:** `blockStreamingDefault: "off"` (only final reply).
+
+## Telegram draft streaming (token-ish)
+
+Telegram is the only provider with draft streaming:
+- Uses Bot API `sendMessageDraft` in **private chats with topics**.
+- `telegram.streamMode: "partial" | "block" | "off"`.
+  - `partial`: draft updates with the latest stream text.
+  - `block`: draft updates in chunked blocks (same chunker rules).
+  - `off`: no draft streaming.
+- Final reply is still a normal message.
+- `/reasoning stream` writes reasoning into the draft bubble (Telegram only).
+
+When draft streaming is active, Clawdbot disables block streaming for that reply to avoid double-streaming.
+
+```
+Telegram (private + topics)
+ └─ sendMessageDraft (draft bubble)
+    ├─ streamMode=partial → update latest text
+    └─ streamMode=block → chunker updates draft
+ └─ final reply → normal message
+```
+Legend:
+- `sendMessageDraft`: Telegram draft bubble (not a real message).
+- `final reply`: normal Telegram message send.
diff --git a/docs/gateway/configuration.md b/docs/gateway/configuration.md
index 8a5be2d8c..0b39a9580 100644
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -826,6 +826,7 @@ Block streaming:
   }
 }
 ```
+See [/concepts/streaming](/concepts/streaming) for behavior + chunking details.
 
 `agent.model.primary` should be set as `provider/model` (e.g. `anthropic/claude-opus-4-5`).
 Aliases come from `agent.models.*.alias` (e.g. `Opus`).
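The low/high-bound chunking described above can be sketched as follows. This is illustrative only, not the actual `EmbeddedBlockChunker`: it keeps the documented `minChars`/`maxChars` bounds, the paragraph-then-newline preference, the forced hard split at the cap, and the clamp to the provider `textChunkLimit`, but omits the sentence/whitespace preferences and code-fence handling.

```typescript
// Sketch of min/max block chunking (names and structure are assumptions).
function chunkBlocks(
  text: string,
  minChars: number,
  maxChars: number,
  textChunkLimit = Infinity, // provider hard cap; maxChars is clamped to it
): string[] {
  const max = Math.min(maxChars, textChunkLimit);
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > max) {
    const window = rest.slice(0, max);
    let cut = window.lastIndexOf("\n\n");               // prefer paragraph break
    if (cut < minChars) cut = window.lastIndexOf("\n"); // then plain newline
    if (cut < minChars) cut = max;                      // forced: hard split at the cap
    chunks.push(rest.slice(0, cut).trimEnd());
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

chunkBlocks("para one.\n\npara two is a bit longer.\n\npara three.", 5, 30);
// → ["para one.", "para two is a bit longer.", "para three."]
```

With `blockStreamingBreak: "text_end"` this would run as the buffer grows; with `"message_end"` it would run once over the full buffered text, which is why very long replies can still arrive as multiple chunks.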
diff --git a/docs/index.md b/docs/index.md
index cd3b4be50..410ff452d 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -80,6 +80,7 @@ Most operations flow through the **Gateway** (`clawdbot gateway`), a single long
 - 🎮 **Discord Bot** — DMs + guild channels via discord.js
 - 💬 **iMessage** — Local imsg CLI integration (macOS)
 - 🤖 **Agent bridge** — Pi (RPC mode) with tool streaming
+- ⏱️ **Streaming + chunking** — Block streaming + Telegram draft streaming details ([/concepts/streaming](/concepts/streaming))
 - 🧠 **Multi-agent routing** — Route provider accounts/peers to isolated agents (workspace + per-agent sessions)
 - 🔐 **Subscription auth** — Anthropic (Claude Pro/Max) + OpenAI (ChatGPT/Codex) via OAuth
 - 💬 **Sessions** — Direct chats collapse into shared `main` (default); groups are isolated
diff --git a/docs/providers/telegram.md b/docs/providers/telegram.md
index ef8b7bac8..3772ebc81 100644
--- a/docs/providers/telegram.md
+++ b/docs/providers/telegram.md
@@ -94,6 +94,7 @@
 Reasoning stream (Telegram only):
 - `/reasoning stream` streams reasoning into the draft bubble while the reply is generating, then sends the final answer without reasoning.
 - If `telegram.streamMode` is `off`, reasoning stream is disabled.
+More context: [Streaming + chunking](/concepts/streaming).
 
 ## Agent tool (reactions)
 - Tool: `telegram` with `react` action (`chatId`, `messageId`, `emoji`).
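The precedence rules documented above (draft streaming wins over block streaming for a reply, and reasoning streaming is disabled when `telegram.streamMode` is `off`) can be sketched as a small decision function. All names here are illustrative assumptions, not Clawdbot internals:

```typescript
type StreamMode = "partial" | "block" | "off";

// Sketch of the documented stream selection for one reply.
function streamingPlan(opts: {
  provider: string;
  streamMode: StreamMode;          // telegram.streamMode
  blockStreamingDefault: "on" | "off"; // agent.blockStreamingDefault
  reasoningStream: boolean;        // /reasoning stream requested
}) {
  // Draft streaming exists only on Telegram, and only when streamMode is not "off".
  const draft = opts.provider === "telegram" && opts.streamMode !== "off";
  return {
    draftStreaming: draft,
    // Block streaming is suppressed while draft streaming to avoid double-streaming.
    blockStreaming: !draft && opts.blockStreamingDefault === "on",
    // Reasoning can only stream into the draft bubble, so it needs draft streaming.
    reasoningStream: draft && opts.reasoningStream,
  };
}

streamingPlan({ provider: "telegram", streamMode: "partial", blockStreamingDefault: "on", reasoningStream: true });
// → { draftStreaming: true, blockStreaming: false, reasoningStream: true }
```

For a non-Telegram provider (or `streamMode: "off"`), the same function falls back to plain block streaming whenever `blockStreamingDefault` is `"on"`.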
diff --git a/docs/start/hubs.md b/docs/start/hubs.md
index bddfaea85..77b943b47 100644
--- a/docs/start/hubs.md
+++ b/docs/start/hubs.md
@@ -34,6 +34,7 @@ Use these hubs to discover every page, including deep dives and reference docs t
 - [Agent runtime](https://docs.clawd.bot/concepts/agent)
 - [Agent workspace](https://docs.clawd.bot/concepts/agent-workspace)
 - [Agent loop](https://docs.clawd.bot/concepts/agent-loop)
+- [Streaming + chunking](https://docs.clawd.bot/concepts/streaming)
 - [Multi-agent routing](https://docs.clawd.bot/concepts/multi-agent)
 - [Sessions](https://docs.clawd.bot/concepts/session)
 - [Sessions (alias)](https://docs.clawd.bot/concepts/sessions)