docs: reorganize documentation structure

2026-01-07 00:41:31 +01:00
parent b8db8502aa
commit db4d0b8e75
126 changed files with 881 additions and 270 deletions
--- a/docs/nodes/talk.md
+++ b/docs/nodes/talk.md
@@ -0,0 +1,79 @@
+---
+summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
+read_when:
+  - Implementing Talk mode on macOS/iOS/Android
+  - Changing voice/TTS/interrupt behavior
+---
+# Talk Mode
+
+Talk mode is a continuous voice conversation loop:
+1) Listen for speech
+2) Send transcript to the model (main session, chat.send)
+3) Wait for the response
+4) Speak it via ElevenLabs (streaming playback)
+
+## Behavior (macOS)
+- **Always-on overlay** while Talk mode is enabled.
+- **Listening → Thinking → Speaking** phase transitions.
+- On a **short pause** (silence window), the current transcript is sent.
+- Replies are **written to WebChat** (same as typing).
+- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
+
+## Voice directives in replies
+The assistant may prefix its reply with a **single JSON line** to control voice:
+
+```json
+{"voice":"<voice-id>","once":true}
+```
+
+Rules:
+- First non-empty line only.
+- Unknown keys are ignored.
+- `once: true` applies to the current reply only.
+- Without `once`, the voice becomes the new default for Talk mode.
+- The JSON line is stripped before TTS playback.
+
+Supported keys:
+- `voice` / `voice_id` / `voiceId`
+- `model` / `model_id` / `modelId`
+- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
+- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
+- `once`
+
+## Config (clawdbot.json)
+```json5
+{
+  "talk": {
+    "voiceId": "elevenlabs_voice_id",
+    "modelId": "eleven_v3",
+    "outputFormat": "mp3_44100_128",
+    "apiKey": "elevenlabs_api_key",
+    "interruptOnSpeech": true
+  }
+}
+```
+
+Defaults:
+- `interruptOnSpeech`: true
+- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
+- `modelId`: defaults to `eleven_v3` when unset
+- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
+- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
+
+## macOS UI
+- Menu bar toggle: **Talk**
+- Config tab: **Talk Mode** group (voice id + interrupt toggle)
+- Overlay:
+  - **Listening**: cloud pulses with mic level
+  - **Thinking**: sinking animation
+  - **Speaking**: radiating rings
+  - Click cloud: stop speaking
+  - Click X: exit Talk mode
+
+## Notes
+- Requires Speech + Microphone permissions.
+- Uses `chat.send` against session key `main`.
+- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
+- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
+- `latency_tier` is validated to `0..4` when set.
+- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.