docs: reorganize documentation structure
79
docs/nodes/talk.md
Normal file
@@ -0,0 +1,79 @@
---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
  - Implementing Talk mode on macOS/iOS/Android
  - Changing voice/TTS/interrupt behavior
---
# Talk Mode

Talk mode is a continuous voice conversation loop:
1) Listen for speech
2) Send the transcript to the model (`main` session, via `chat.send`)
3) Wait for the response
4) Speak it via ElevenLabs (streaming playback)
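
The four steps above can be sketched as a plain loop. This is only an illustration: `listen`, `send`, and `speak` are hypothetical callables standing in for speech recognition, `chat.send`, and ElevenLabs playback, and skipping blank transcripts is an assumption, not documented behavior.

```python
from typing import Callable, Optional


def run_talk_loop(
    listen: Callable[[], Optional[str]],
    send: Callable[[str], str],
    speak: Callable[[str], None],
) -> int:
    """Run listen -> send -> wait -> speak until listen() returns None.

    All three callables are hypothetical stand-ins:
    listen() returns the next transcript (or None to leave Talk mode),
    send() blocks until the model has replied, and speak() plays the reply.
    Returns the number of completed turns.
    """
    turns = 0
    while True:
        transcript = listen()         # 1) listen for speech
        if transcript is None:
            break
        if not transcript.strip():
            continue                  # assumption: blank transcripts are skipped
        reply = send(transcript)      # 2) + 3) send and wait for the response
        speak(reply)                  # 4) speak it
        turns += 1
    return turns
```

Passing the three stages in as callables keeps the loop testable without a microphone, a gateway, or an API key.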

## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
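
One way to picture the phase transitions and the interrupt rule is a small state machine. This is a sketch under assumptions: the class and method names are hypothetical, and the "interruption timestamp" is modeled as seconds into playback, which the document does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


@dataclass
class TalkState:
    interrupt_on_speech: bool = True
    phase: Phase = Phase.LISTENING
    interrupted_at: Optional[float] = None  # assumption: seconds into playback

    def on_silence(self) -> None:
        # A short pause ends the utterance; the transcript is sent.
        if self.phase is Phase.LISTENING:
            self.phase = Phase.THINKING

    def on_reply(self) -> None:
        # The model has answered; playback starts.
        if self.phase is Phase.THINKING:
            self.phase = Phase.SPEAKING

    def on_playback_finished(self) -> None:
        if self.phase is Phase.SPEAKING:
            self.phase = Phase.LISTENING

    def on_user_speech(self, playback_position: float) -> bool:
        """Return True when the caller should stop playback."""
        if self.phase is Phase.SPEAKING and self.interrupt_on_speech:
            self.interrupted_at = playback_position
            self.phase = Phase.LISTENING
            return True
        return False

    def take_interruption(self) -> Optional[float]:
        """Return the noted timestamp once, for inclusion in the next prompt."""
        noted, self.interrupted_at = self.interrupted_at, None
        return noted
```

With `interrupt_on_speech=False`, `on_user_speech` returns `False` and the state stays in `SPEAKING` until playback finishes.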

## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:

```json
{"voice":"<voice-id>","once":true}
```

Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.

Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
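
The rules above could be implemented roughly as follows. This is a hedged sketch, not the actual parser: `split_directive` and `resolve_voice` are hypothetical names, the aliases are folded into canonical `voice` / `model` keys for convenience, and leaving the reply untouched when the first line holds no supported key is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Alias -> canonical key, taken from the supported-keys list.
ALIASES = {
    "voice": "voice", "voice_id": "voice", "voiceId": "voice",
    "model": "model", "model_id": "model", "modelId": "model",
}
PLAIN_KEYS = {
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier", "once",
}


def split_directive(reply: str) -> Tuple[Dict[str, Any], str]:
    """Return (directive, text_for_tts) for an assistant reply."""
    lines = reply.split("\n")
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        # Only the first non-empty line may carry a directive.
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return {}, reply
        if not isinstance(parsed, dict):
            return {}, reply
        directive: Dict[str, Any] = {}
        for key, value in parsed.items():
            if key in ALIASES:
                directive[ALIASES[key]] = value
            elif key in PLAIN_KEYS:
                directive[key] = value
            # Unknown keys are ignored.
        if not directive:
            return {}, reply  # assumption: no supported key, so nothing is stripped
        # Strip the JSON line before TTS playback.
        text = "\n".join(lines[:i] + lines[i + 1:]).lstrip("\n")
        return directive, text
    return {}, reply


def resolve_voice(
    default_voice: Optional[str], directive: Dict[str, Any]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (voice for this reply, new default voice)."""
    voice = directive.get("voice", default_voice)
    new_default = default_voice if directive.get("once") else voice
    return voice, new_default
```

For example, `resolve_voice("a", {"voice": "b", "once": True})` speaks the reply with `b` but keeps `a` as the default, while the same directive without `once` makes `b` the new default.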

## Config (clawdbot.json)
```json5
{
  "talk": {
    "voiceId": "elevenlabs_voice_id",
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "apiKey": "elevenlabs_api_key",
    "interruptOnSpeech": true
  }
}
```

Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
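
The fallback chain can be sketched as a single resolver. `resolve_talk_config` is a hypothetical helper: it assumes `ELEVENLABS_VOICE_ID` is checked before `SAG_VOICE_ID` (the order they are listed in), and it leaves out the "first ElevenLabs voice" lookup and the gateway shell profile fallback, since both need external calls.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_talk_config(
    talk: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    platform: str = "macos",
) -> Dict[str, Any]:
    """Fill unset `talk` keys from the environment and documented defaults.

    `platform` is a hypothetical parameter: "macos", "ios", or "android".
    """
    env = os.environ if env is None else env
    # PCM default differs per platform; an explicit mp3_* value overrides it.
    default_format = "pcm_24000" if platform == "android" else "pcm_44100"
    return {
        "voiceId": (
            talk.get("voiceId")
            or env.get("ELEVENLABS_VOICE_ID")
            or env.get("SAG_VOICE_ID")
        ),
        "modelId": talk.get("modelId") or "eleven_v3",
        "apiKey": talk.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
        "outputFormat": talk.get("outputFormat") or default_format,
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
    }
```

A `voiceId` or `apiKey` of `None` in the result means neither the config nor the environment supplied one, which is where the remaining fallbacks would apply.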

## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
  - **Listening**: cloud pulses with mic level
  - **Thinking**: sinking animation
  - **Speaking**: radiating rings
  - Click cloud: stop speaking
  - Click X: exit Talk mode

## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses the ElevenLabs streaming API with `ELEVENLABS_API_KEY`, with incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
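
The validation notes above might look like this in code. It is a sketch with a hypothetical function name, and it assumes out-of-range values are reported as errors; the real implementation may clamp or snap them instead.

```python
from typing import List, Optional

# PCM formats listed above for Android AudioTrack streaming.
ANDROID_PCM_FORMATS = {"pcm_16000", "pcm_22050", "pcm_24000", "pcm_44100"}


def validate_voice_settings(
    model_id: str,
    stability: Optional[float] = None,
    latency_tier: Optional[int] = None,
) -> List[str]:
    """Return a list of human-readable problems (empty when valid)."""
    errors: List[str] = []
    if stability is not None:
        if model_id == "eleven_v3":
            # eleven_v3 only accepts three discrete stability values.
            if stability not in (0.0, 0.5, 1.0):
                errors.append("stability must be 0.0, 0.5, or 1.0 for eleven_v3")
        elif not 0.0 <= stability <= 1.0:
            errors.append("stability must be within 0..1")
    if latency_tier is not None:
        if not (isinstance(latency_tier, int) and 0 <= latency_tier <= 4):
            errors.append("latency_tier must be an integer within 0..4")
    return errors
```

A value such as `stability=0.3` is therefore rejected for `eleven_v3` but accepted for any other model id.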