feat: move TTS into core (#1559) (thanks @Glucksberg)

This commit is contained in:
Peter Steinberger
2026-01-24 07:57:46 +00:00
parent aef88cd9f1
commit d9a467fe3b
26 changed files with 1522 additions and 1649 deletions

View File

@@ -1446,6 +1446,44 @@ active agents `identity.emoji` when set, otherwise `"👀"`. Set it to `""` t
`removeAckAfterReply` removes the bots ack reaction after a reply is sent
(Slack/Discord/Telegram only). Default: `false`.
#### `messages.tts`
Enable text-to-speech for outbound replies. When on, Clawdbot generates audio
using ElevenLabs or OpenAI and attaches it to responses. Telegram uses Opus
voice notes; other channels send MP3 audio.
```json5
{
messages: {
tts: {
enabled: true,
mode: "final", // final | all (include tool/block replies)
provider: "elevenlabs",
maxTextLength: 4000,
timeoutMs: 30000,
prefsPath: "~/.clawdbot/settings/tts.json",
elevenlabs: {
apiKey: "elevenlabs_api_key",
voiceId: "voice_id",
modelId: "eleven_multilingual_v2"
},
openai: {
apiKey: "openai_api_key",
model: "gpt-4o-mini-tts",
voice: "alloy"
}
}
}
}
```
Notes:
- `messages.tts.enabled` can be overridden by local user prefs (see `/tts_on`, `/tts_off`).
- `prefsPath` stores local overrides (enabled/provider/limit/summarize).
- `maxTextLength` is a hard cap for TTS input; summaries are truncated to fit.
- `/tts_limit` and `/tts_summary` control per-user summarization settings.
- `apiKey` values fall back to `ELEVENLABS_API_KEY`/`XI_API_KEY` and `OPENAI_API_KEY`.
### `talk`
Defaults for Talk mode (macOS/iOS/Android). Voice IDs fall back to `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID` when unset.

View File

@@ -67,6 +67,13 @@ Text + native (when enabled):
- `/config show|get|set|unset` (persist config to disk, owner-only; requires `commands.config: true`)
- `/debug show|set|unset|reset` (runtime overrides, owner-only; requires `commands.debug: true`)
- `/usage off|tokens|full|cost` (per-response usage footer or local cost summary)
- `/tts_on` (enable TTS replies)
- `/tts_off` (disable TTS replies)
- `/tts_provider [openai|elevenlabs]` (set or show TTS provider)
- `/tts_limit <chars>` (max chars before TTS summarization)
- `/tts_summary on|off` (toggle TTS auto-summary)
- `/tts_status` (show TTS status)
- `/audio <text>` (convert text to a TTS audio reply)
- `/stop`
- `/restart`
- `/dock-telegram` (alias: `/dock_telegram`) (switch replies to Telegram)