Files
clawdbot/docs/talk.md
2025-12-30 01:57:58 +01:00

75 lines
2.3 KiB
Markdown

---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
- Implementing Talk mode on macOS/iOS/Android
- Changing voice/TTS/interrupt behavior
---
# Talk Mode
Talk mode is a continuous voice conversation loop:
1) Listen for speech
2) Send transcript to the model (main session, chat.send)
3) Wait for the response
4) Speak it via ElevenLabs
## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:
```json
{"voice":"<voice-id>","once":true}
```
Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.
Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
## Config (clawdis.json)
```json5
{
"talk": {
"voiceId": "elevenlabs_voice_id",
"modelId": "eleven_v3",
"outputFormat": "mp3_44100_128",
"apiKey": "elevenlabs_api_key",
"interruptOnSpeech": true
}
}
```
Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID`
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
- **Listening**: cloud pulses with mic level
- **Thinking**: sinking animation
- **Speaking**: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode
## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs API with `ELEVENLABS_API_KEY`.