---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
  - Implementing Talk mode on macOS/iOS/Android
  - Changing voice/TTS/interrupt behavior
---
# Talk Mode

Talk mode is a continuous voice conversation loop:
1) Listen for speech
2) Send transcript to the model (main session, chat.send)
3) Wait for the response
4) Speak it via ElevenLabs (streaming playback)

## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.

## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:

```json
{"voice":"","once":true}
```

Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.

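The directive rules above can be illustrated with a minimal Python sketch. It is not the actual Talk mode implementation: `split_directive` and `SUPPORTED_KEYS` are hypothetical names, and the sketch assumes that any JSON object on the first non-empty line counts as a directive (and is stripped) even when none of its keys are recognized.

```python
import json

# Keys this document lists as supported; anything else is ignored.
SUPPORTED_KEYS = {
    "voice", "voice_id", "voiceId",
    "model", "model_id", "modelId",
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier",
    "once",
}


def split_directive(reply: str) -> tuple[dict, str]:
    """Return (directive, text_to_speak) for one assistant reply.

    Hypothetical helper: only the first non-empty line is inspected.
    """
    lines = reply.splitlines()
    for index, line in enumerate(lines):
        candidate = line.strip()
        if not candidate:
            continue  # skip leading blank lines
        # The first non-empty line decides; later lines are never parsed.
        if not candidate.startswith("{"):
            return {}, reply
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            return {}, reply
        if not isinstance(parsed, dict):
            return {}, reply
        # Assumption: unknown keys are dropped silently.
        directive = {k: v for k, v in parsed.items() if k in SUPPORTED_KEYS}
        # Strip the JSON line so it is never sent to TTS.
        spoken = "\n".join(lines[:index] + lines[index + 1:]).strip()
        return directive, spoken
    return {}, reply
```

A caller would then check `directive.get("once")` to decide whether the settings apply to this reply only or replace the Talk mode defaults.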

Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`

## Config (`~/.clawdbot/clawdbot.json`)
```json5
{
  "talk": {
    "voiceId": "elevenlabs_voice_id",
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "apiKey": "elevenlabs_api_key",
    "interruptOnSpeech": true
  }
}
```

Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)

## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
  - **Listening**: cloud pulses with mic level
  - **Thinking**: sinking animation
  - **Speaking**: radiating rings
  - Click cloud: stop speaking
  - Click X: exit Talk mode

## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
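
The `stability` and `latency_tier` constraints in the notes above could be checked along these lines. This is a sketch under stated assumptions, not the shipped validator: the function names are hypothetical, and the document does not say whether out-of-range values are rejected, clamped, or ignored, so the sketch only reports validity.

```python
from typing import Optional

# Per the notes: eleven_v3 accepts only these discrete stability values.
ELEVEN_V3_STABILITY = (0.0, 0.5, 1.0)


def is_valid_stability(model_id: str, stability: float) -> bool:
    """Hypothetical check mirroring the documented stability rule."""
    if model_id == "eleven_v3":
        return stability in ELEVEN_V3_STABILITY
    # Other models accept any value in the closed range 0..1.
    return 0.0 <= stability <= 1.0


def is_valid_latency_tier(tier: Optional[int]) -> bool:
    """Hypothetical check: latency_tier must be 0..4 when set."""
    if tier is None:
        return True  # unset is allowed; the rule applies only "when set"
    # Assumption: the tier is an integer (bool is excluded explicitly).
    if isinstance(tier, bool) or not isinstance(tier, int):
        return False
    return 0 <= tier <= 4
```

For example, `is_valid_stability("eleven_v3", 0.3)` is false, while the same value passes for any model id other than `eleven_v3`.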