Peter Steinberger e119a82334 feat: talk mode key distribution and tts polling

2025-12-30 01:57:58 +01:00


Summary: Talk mode: continuous speech conversations with ElevenLabs TTS

Read when:

Implementing Talk mode on macOS/iOS/Android
Changing voice/TTS/interrupt behavior

Talk Mode

Talk mode is a continuous voice conversation loop:

Listen for speech
Send transcript to the model (main session, chat.send)
Wait for the response
Speak it via ElevenLabs
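
The four steps above can be sketched as a single turn function. This is a minimal illustration, not the app's actual code: `talk_turn`, `listen`, `send`, and `speak` are hypothetical names, with `send` standing in for `chat.send` on the main session and `speak` standing in for ElevenLabs playback.

```python
from typing import Callable, Optional


def talk_turn(
    listen: Callable[[], str],
    send: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one Listen -> Send -> Wait -> Speak turn.

    Returns the reply text, or None when nothing was heard.
    """
    transcript = listen().strip()
    if not transcript:
        return None  # nothing heard; the caller simply listens again
    reply = send(transcript)  # hypothetical stand-in for chat.send; blocks until the response arrives
    speak(reply)  # hypothetical stand-in for TTS playback
    return reply


# Usage with fake components: the "model" echoes and the "speaker" records text.
spoken = []
talk_turn(lambda: " hello ", lambda text: "echo: " + text, spoken.append)
```

Calling `talk_turn` repeatedly gives the continuous loop; passing the three stages in as callables keeps the sketch independent of any particular speech or TTS library.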

Behavior (macOS)

Always-on overlay while Talk mode is enabled.
Listening → Thinking → Speaking phase transitions.
On a short pause (silence window), the current transcript is sent.
Replies are written to WebChat (same as typing).
Interrupt on speech (default on): if the user starts talking while the assistant is speaking, playback stops and the interruption timestamp is recorded for the next prompt.
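
One way to picture the phase transitions and the interrupt rule is a small state machine. The sketch below is an assumption-laden illustration: the class, method names, and the use of a float timestamp are hypothetical and are not taken from the macOS implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


@dataclass
class TalkState:
    phase: Phase = Phase.LISTENING
    interrupt_on_speech: bool = True  # mirrors the documented default
    interrupted_at: Optional[float] = None  # hypothetical: timestamp noted on interrupt

    def on_silence_window(self) -> None:
        """Short pause detected: the transcript is sent and we wait for a reply."""
        if self.phase is Phase.LISTENING:
            self.phase = Phase.THINKING

    def on_reply_ready(self) -> None:
        if self.phase is Phase.THINKING:
            self.phase = Phase.SPEAKING

    def on_playback_finished(self) -> None:
        if self.phase is Phase.SPEAKING:
            self.phase = Phase.LISTENING

    def on_user_speech(self, now: float) -> bool:
        """Return True when the caller should stop playback."""
        if self.phase is Phase.SPEAKING and self.interrupt_on_speech:
            self.interrupted_at = now
            self.phase = Phase.LISTENING
            return True
        return False

    def take_interruption(self) -> Optional[float]:
        """Hand the noted timestamp to the next prompt, then clear it."""
        noted, self.interrupted_at = self.interrupted_at, None
        return noted
```

With `interrupt_on_speech` set to false, `on_user_speech` returns False and the phase stays at Speaking, so playback runs to completion.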

Voice directives in replies

The assistant may prefix its reply with a single JSON line to control voice:

{"voice":"<voice-id>","once":true}

Rules:

First non-empty line only.
Unknown keys are ignored.
once: true applies to the current reply only.
Without once, the voice becomes the new default for Talk mode.
The JSON line is stripped before TTS playback.
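
The rules above can be sketched in a few lines. This is a hedged illustration only: `split_directive` and `choose_voice` are hypothetical helpers, and treating a non-object JSON value (such as a bare number) as ordinary text is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple


def split_directive(reply: str) -> Tuple[Optional[Dict[str, Any]], str]:
    """Return (directive, text_to_speak).

    Only the first non-empty line is considered. If it is not a JSON
    object, the reply is returned unchanged with no directive.
    """
    lines = reply.splitlines()
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # skip leading blank lines
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            return None, reply
        if not isinstance(parsed, dict):
            return None, reply  # assumption: only JSON objects count as directives
        # Strip the directive line so it is never spoken.
        remaining = "\n".join(lines[:index] + lines[index + 1:]).strip()
        return parsed, remaining
    return None, reply


def choose_voice(default_voice: str, directive: Optional[Dict[str, Any]]) -> Tuple[str, str]:
    """Return (voice_for_this_reply, new_default_voice)."""
    if not directive or "voice" not in directive:
        return default_voice, default_voice
    voice = str(directive["voice"])
    if directive.get("once") is True:
        return voice, default_voice  # applies to the current reply only
    return voice, voice  # becomes the new default


directive, text = split_directive('{"voice":"abc","once":true}\nHello there.')
```

Here `directive` holds the parsed object and `text` holds only `Hello there.`, which is what would be passed to TTS.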

Supported keys:

voice / voice_id / voiceId
model / model_id / modelId
speed, rate (WPM), stability, similarity, style, speakerBoost
seed, normalize, lang, output_format, latency_tier
once
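
Because several keys have aliases, a consumer would likely normalize a directive before using it. The sketch below assumes `voiceId` and `modelId` as the canonical names (matching the config keys) and assumes the first alias encountered wins; neither choice is confirmed by this document, and `normalize_directive` is a hypothetical helper.

```python
from typing import Any, Dict

# Alias -> canonical name (the canonical names are an assumption).
ALIASES = {
    "voice": "voiceId", "voice_id": "voiceId", "voiceId": "voiceId",
    "model": "modelId", "model_id": "modelId", "modelId": "modelId",
}

# Supported keys that have no aliases.
PLAIN_KEYS = {
    "speed", "rate", "stability", "similarity", "style", "speakerBoost",
    "seed", "normalize", "lang", "output_format", "latency_tier", "once",
}


def normalize_directive(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map aliases to one name and drop unknown keys."""
    normalized: Dict[str, Any] = {}
    for key, value in raw.items():
        if key in ALIASES:
            # Assumption: if several aliases appear, the first one seen wins.
            normalized.setdefault(ALIASES[key], value)
        elif key in PLAIN_KEYS:
            normalized[key] = value
        # Anything else is an unknown key and is ignored.
    return normalized


example = normalize_directive(
    {"voice_id": "v1", "model": "eleven_v3", "speed": 1.1, "mood": "happy"}
)
```

In this example the unsupported `mood` key is dropped, while `voice_id` and `model` are renamed to their canonical forms.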

Config (clawdis.json)

{
  "talk": {
    "voiceId": "elevenlabs_voice_id",
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "apiKey": "elevenlabs_api_key",
    "interruptOnSpeech": true
  }
}

Defaults:

interruptOnSpeech: true
voiceId: falls back to ELEVENLABS_VOICE_ID / SAG_VOICE_ID
apiKey: falls back to ELEVENLABS_API_KEY (or gateway shell profile if available)
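
The fallback chain can be sketched as a small resolver. This is illustrative only: `resolve_talk_config` is a hypothetical helper, checking `ELEVENLABS_VOICE_ID` before `SAG_VOICE_ID` is an assumption based on the order they are listed, and the gateway shell profile fallback is not modeled.

```python
import os
from typing import Any, Dict, Mapping


def resolve_talk_config(
    config: Mapping[str, Any],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Apply the documented defaults and environment fallbacks to the talk section."""
    talk = config.get("talk") or {}
    return {
        # Config value first, then environment variables (order is an assumption).
        "voiceId": talk.get("voiceId")
        or env.get("ELEVENLABS_VOICE_ID")
        or env.get("SAG_VOICE_ID"),
        "apiKey": talk.get("apiKey") or env.get("ELEVENLABS_API_KEY"),
        # Defaults to true unless explicitly set in the config.
        "interruptOnSpeech": talk.get("interruptOnSpeech", True),
    }


resolved = resolve_talk_config(
    {"talk": {"interruptOnSpeech": False}},
    env={"SAG_VOICE_ID": "sag-voice", "ELEVENLABS_API_KEY": "key-from-env"},
)
```

Passing `env` explicitly, as in the usage line, keeps the resolver easy to test without touching the real process environment.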

macOS UI

Menu bar toggle: Talk
Config tab: Talk Mode group (voice id + interrupt toggle)
Overlay:
- Listening: cloud pulses with mic level
- Thinking: sinking animation
- Speaking: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode

Notes

Requires Speech + Microphone permissions.
Uses chat.send against session key main.
TTS uses the ElevenLabs API; the key comes from talk.apiKey, falling back to ELEVENLABS_API_KEY.
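
For orientation, a TTS request to ElevenLabs might be assembled as below. This is a sketch under assumptions rather than the app's actual client: it targets the public text-to-speech endpoint with the `xi-api-key` header, `build_tts_request` is a hypothetical helper, the default values are copied from the config example, and the request is only built, never sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_v3",
    output_format: str = "mp3_44100_128",
) -> Request:
    """Build (but do not send) a text-to-speech request."""
    url = (
        "https://api.elevenlabs.io/v1/text-to-speech/"
        + quote(voice_id, safe="")
        + "?"
        + urlencode({"output_format": output_format})
    )
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


request = build_tts_request("Hello there.", voice_id="voice123", api_key="secret")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and reading the audio bytes from the response; check the ElevenLabs API reference before relying on any field shown here.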