docs: reorganize documentation structure

2026-01-07 00:41:31 +01:00
parent b8db8502aa
commit db4d0b8e75
126 changed files with 881 additions and 270 deletions
--- a/docs/nodes/audio.md
+++ b/docs/nodes/audio.md
@@ -0,0 +1,48 @@
+---
+summary: "How inbound audio/voice notes are downloaded, transcribed, and injected into replies"
+read_when:
+  - Changing audio transcription or media handling
+---
+# Audio / Voice Notes — 2025-12-05
+
+## What works
+- **Optional transcription**: If `routing.transcribeAudio.command` is set in `~/.clawdbot/clawdbot.json`, CLAWDBOT will:
+  1) Download inbound audio to a temp path when WhatsApp only provides a URL.
+  2) Run the configured CLI (templated with `{{MediaPath}}`), expecting the transcript on stdout.
+  3) Replace `Body` with the transcript, set `{{Transcript}}`, and prepend the original media path plus a `Transcript:` section to the command prompt so models see both.
+  4) Continue through the normal auto-reply pipeline (templating, sessions, Pi command).
+- **Verbose logging**: With `--verbose`, we log when transcription runs and when the transcript replaces the body.
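
The transcription steps above can be sketched roughly as follows. This is an illustrative Python sketch, not CLAWDBOT's actual implementation: the function names (`render_command`, `transcribe`, `apply_transcript`) and the shape of the returned context dict are assumptions made for the example, and the prompt formatting (media path plus `Transcript:` section) is left out. It shows the `{{MediaPath}}` substitution, the timeout, and the fall-back to the original body when the CLI fails.

```python
import subprocess
import sys
from typing import Dict, List, Optional, Sequence


def render_command(template: Sequence[str], media_path: str) -> List[str]:
    """Substitute the {{MediaPath}} placeholder in every argv element."""
    return [arg.replace("{{MediaPath}}", media_path) for arg in template]


def transcribe(
    template: Sequence[str], media_path: str, timeout_seconds: float = 45
) -> Optional[str]:
    """Run the configured CLI and return its stdout, or None on any failure.

    Failure covers a missing binary, a non-zero exit code, a timeout,
    and empty output.
    """
    argv = render_command(template, media_path)
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=True,  # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        return None
    transcript = result.stdout.strip()
    return transcript or None


def apply_transcript(
    body: str, media_path: str, transcript: Optional[str]
) -> Dict[str, str]:
    """Build a (hypothetical) template context for the reply pipeline."""
    if transcript is None:
        # Fall back to the original body; the reply still goes through.
        return {"Body": body, "MediaPath": media_path}
    return {"Body": transcript, "Transcript": transcript, "MediaPath": media_path}


if __name__ == "__main__":
    # Stand-in transcriber: a Python one-liner that echoes the path it was given.
    fake_cli = [
        sys.executable, "-c",
        "import sys; print('heard ' + sys.argv[1])",
        "{{MediaPath}}",
    ]
    transcript = transcribe(fake_cli, "/tmp/note.ogg")
    print(apply_transcript("<media:audio>", "/tmp/note.ogg", transcript))
```

Passing the command as an argv list (rather than a shell string) means the media path is never interpreted by a shell, so unusual characters in temp paths need no escaping.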
+
+## Config example (OpenAI Whisper CLI)
+Requires `OPENAI_API_KEY` in the environment and the `openai` CLI installed:
+```json5
+{
+  routing: {
+    transcribeAudio: {
+      command: [
+        "openai",
+        "api",
+        "audio.transcriptions.create",
+        "-m",
+        "whisper-1",
+        "-f",
+        "{{MediaPath}}",
+        "--response-format",
+        "text"
+      ],
+      timeoutSeconds: 45
+    }
+  }
+}
+```
+
+## Notes & limits
+- We don’t ship a transcriber; you opt in with any CLI that prints text to stdout (Whisper cloud, whisper.cpp, vosk, Deepgram, etc.).
+- Size guard: inbound audio must be ≤5 MB (matches the temp media store and transcript pipeline).
+- Outbound caps: web send supports audio/voice up to 16 MB (sent as a voice note with `ptt: true`).
+- If transcription fails, we fall back to the original body/media note; replies still go through.
+- Transcript is available to templates as `{{Transcript}}`; models get both the media path and a `Transcript:` block in the prompt when using command mode.
+
+## Gotchas
+- Ensure your CLI exits 0 and prints plain text; if it emits JSON, extract the text first (for example by piping through `jq -r .text`).
+- Keep timeouts reasonable (`timeoutSeconds`, default 45s) to avoid blocking the reply queue.
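
Where `jq` is not available, a small wrapper can do the same job as `jq -r .text`. The sketch below is an assumption-laden example rather than anything shipped with CLAWDBOT: it assumes the transcriber emits a JSON object with a top-level string field named `text` (the field the `jq` filter above selects), and the function names are hypothetical. It exits non-zero on malformed input so the fall-back path is taken instead of an empty or garbage transcript.

```python
import json
import sys


def extract_text(raw: str) -> str:
    """Return the top-level "text" field of a JSON object as plain text."""
    payload = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError('expected a JSON object with a string "text" field')
    return payload["text"].strip()


def main() -> int:
    """Read JSON from stdin, print the transcript, return a process exit code."""
    try:
        print(extract_text(sys.stdin.read()))
    except ValueError as exc:
        print(f"could not extract transcript: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use it as a filter, save it as a script, add `raise SystemExit(main())` at the bottom, and pipe the transcriber's JSON output into it.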