From d2b76acb728990b11708a14488d727b0fff203e2 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Thu, 15 Jan 2026 00:34:34 +0000
Subject: [PATCH] docs: expand markdown formatting pipeline

---
 docs/concepts/markdown-formatting.md | 58 ++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/docs/concepts/markdown-formatting.md b/docs/concepts/markdown-formatting.md
index 1d23a0f0b..ae8889b73 100644
--- a/docs/concepts/markdown-formatting.md
+++ b/docs/concepts/markdown-formatting.md
@@ -3,11 +3,22 @@ summary: "Markdown formatting pipeline for outbound channels"
 read_when:
   - You are changing markdown formatting or chunking for outbound channels
   - You are adding a new channel formatter or style mapping
+  - You are debugging formatting regressions across channels
 ---
 # Markdown formatting
 
 Clawdbot formats outbound Markdown by converting it into a shared intermediate
-representation (IR) before rendering channel-specific output.
+representation (IR) before rendering channel-specific output. The IR keeps the
+source text intact while carrying style/link spans so chunking and rendering can
+stay consistent across channels.
+
+## Goals
+
+- **Consistency:** one parse step, multiple renderers.
+- **Safe chunking:** split text before rendering so inline formatting never
+  breaks across chunks.
+- **Channel fit:** map the same IR to Slack mrkdwn, Telegram HTML, and Signal
+  style ranges without re-parsing Markdown.
 
 ## Pipeline
 
@@ -22,13 +33,54 @@ representation (IR) before rendering channel-specific output.
    - **Telegram:** HTML tags (`<b>`, `<i>`, `<s>`, `<code>`, `<pre><code>`, `<a href>`).
    - **Signal:** plain text + `text-style` ranges; links become `label (url)` when label differs.
 
+## Where it is used
+
+- Slack, Telegram, and Signal outbound adapters render from the IR.
+- Other channels (WhatsApp, iMessage, MS Teams, Discord) still use plain text or
+  their own formatting rules.
+
+## Chunking rules
+
+- Chunk limits come from channel adapters/config and are applied to the IR text.
+- Code fences are preserved as a single block with a trailing newline so channels
+  render them correctly.
+- List prefixes and blockquote prefixes are part of the IR text, so chunking
+  does not split mid-prefix.
+- Inline style markers (bold/italic/strike/inline-code/spoiler) are never split
+  across chunks; when a styled span crosses a chunk boundary, the renderer
+  reopens the style inside each chunk.
+
+For more on chunking behavior across channels, see
+[Streaming + chunking](/concepts/streaming).
+
 ## Link policy
 
-- **Slack:** `[label](url)` -> `<url|label>`; bare URLs are left as-is.
+- **Slack:** `[label](url)` -> `<url|label>`; bare URLs remain bare. Autolink
+  is disabled during parse to avoid double-linking.
 - **Telegram:** `[label](url)` -> `<a href="url">label</a>` (HTML parse mode).
-- **Signal:** `[label](url)` -> `label (url)` unless label matches url.
+- **Signal:** `[label](url)` -> `label (url)` unless label matches the URL.
 
 ## Spoilers
 
 Spoiler markers (`||spoiler||`) are parsed only for Signal, where they map to
 SPOILER style ranges. Other channels treat them as plain text.
+
+## How to add or update a channel formatter
+
+1. **Parse once:** use the shared `markdownToIR(...)` helper with channel-appropriate
+   options (autolink, heading style, blockquote prefix).
+2. **Render:** implement a renderer with `renderMarkdownWithMarkers(...)` and a
+   style marker map (or Signal style ranges).
+3. **Chunk:** call `chunkMarkdownIR(...)` before rendering; render each chunk.
+4. **Wire adapter:** update the channel outbound adapter to use the new chunker
+   and renderer.
+5. **Test:** add or update format tests and an outbound delivery test if the
+   channel uses chunking.
+
+## Common gotchas
+
+- Slack angle-bracket tokens (`<@U123>`, `<#C123>`, `<https://...>`) must be
+  preserved; escape raw HTML safely.
+- Telegram HTML requires escaping text outside tags to avoid broken markup.
+- Signal style ranges depend on UTF-16 offsets; do not use code point offsets.
+- Preserve trailing newlines for fenced code blocks so closing markers land on
+  their own line.
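
The chunking rule for inline styles and the UTF-16 gotcha in the patch above can be illustrated with a small sketch. It is not Clawdbot's implementation: the `MarkdownIR` and `StyleSpan` types, `chunkIR`, and `codePointToUtf16Offset` are hypothetical names introduced here, and the chunker splits at fixed offsets rather than at the word or block boundaries a real chunker would likely prefer. It assumes the IR stores span offsets as JavaScript string indices, which are UTF-16 code units and so match what Signal style ranges expect.

```typescript
// Hypothetical IR shapes for illustration only; not Clawdbot's actual types.
type StyleName = "bold" | "italic" | "strike" | "code" | "spoiler";

interface StyleSpan {
  start: number; // inclusive offset in UTF-16 code units (JS string index)
  end: number; // exclusive offset in UTF-16 code units
  style: StyleName;
}

interface MarkdownIR {
  text: string;
  styles: StyleSpan[];
}

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

// Convert a code point offset into a UTF-16 offset (assumes well-formed text).
function codePointToUtf16Offset(text: string, cpOffset: number): number {
  let units = 0;
  let seen = 0;
  while (seen < cpOffset && units < text.length) {
    const isPair =
      isHighSurrogate(text.charCodeAt(units)) && units + 1 < text.length;
    units += isPair ? 2 : 1;
    seen += 1;
  }
  return units;
}

// Split the IR text into chunks of at most `limit` UTF-16 units and clip each
// span to its chunk, so a style crossing a boundary is reopened in the next one.
function chunkIR(ir: MarkdownIR, limit: number): MarkdownIR[] {
  if (limit <= 0) throw new RangeError("limit must be positive");
  const chunks: MarkdownIR[] = [];
  let from = 0;
  while (from < ir.text.length) {
    let to = Math.min(from + limit, ir.text.length);
    // Back off by one unit rather than cut a surrogate pair in half.
    if (
      to < ir.text.length &&
      to - from > 1 &&
      isHighSurrogate(ir.text.charCodeAt(to - 1))
    ) {
      to -= 1;
    }
    const styles = ir.styles
      .filter((s) => s.start < to && s.end > from)
      .map((s) => ({
        start: Math.max(s.start, from) - from,
        end: Math.min(s.end, to) - from,
        style: s.style,
      }));
    chunks.push({ text: ir.text.slice(from, to), styles });
    from = to;
  }
  return chunks;
}

// "\uD83D\uDC4D" is a thumbs-up emoji: one code point, two UTF-16 units.
const text = "\uD83D\uDC4D bold text";
const boldStart = codePointToUtf16Offset(text, 2); // 3, not 2
const ir: MarkdownIR = {
  text,
  styles: [{ start: boldStart, end: boldStart + 4, style: "bold" }],
};
const chunks = chunkIR(ir, 5);
console.log(chunks.map((c) => JSON.stringify(c)).join("\n"));
```

With a limit of 5, the bold span over `bold` is cut after `bo`; the second chunk starts with a fresh bold span over `ld`, so a renderer working one chunk at a time never sees an unclosed style.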