TTS: gate auto audio on inbound voice notes (#1667)

Co-authored-by: Sebastian <sebslight@gmail.com>
This commit is contained in:
Seb Slight
2026-01-24 23:35:20 -05:00
committed by GitHub
parent ede5145191
commit d4f60bf16a
20 changed files with 433 additions and 63 deletions

View File

@@ -1509,7 +1509,7 @@ voice notes; other channels send MP3 audio.
{
messages: {
tts: {
enabled: true,
auto: "always", // off | always | inbound | tagged
mode: "final", // final | all (include tool/block replies)
provider: "elevenlabs",
summaryModel: "openai/gpt-4.1-mini",
@@ -1546,8 +1546,10 @@ voice notes; other channels send MP3 audio.
```
Notes:
- `messages.tts.enabled` can be overridden by local user prefs (see `/tts on`, `/tts off`).
- `prefsPath` stores local overrides (enabled/provider/limit/summarize).
- `messages.tts.auto` controls autoTTS (`off`, `always`, `inbound`, `tagged`).
- `/tts off|always|inbound|tagged` sets the persession auto mode (overrides config).
- `messages.tts.enabled` is legacy; doctor migrates it to `messages.tts.auto`.
- `prefsPath` stores local overrides (provider/limit/summarize).
- `maxTextLength` is a hard cap for TTS input; summaries are truncated to fit.
- `summaryModel` overrides `agents.defaults.model.primary` for auto-summary.
- Accepts `provider/model` or an alias from `agents.defaults.models`.

View File

@@ -68,7 +68,7 @@ Text + native (when enabled):
- `/config show|get|set|unset` (persist config to disk, owner-only; requires `commands.config: true`)
- `/debug show|set|unset|reset` (runtime overrides, owner-only; requires `commands.debug: true`)
- `/usage off|tokens|full|cost` (per-response usage footer or local cost summary)
- `/tts on|off|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts))
- `/tts off|always|inbound|tagged|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts))
- Discord: native command is `/voice` (Discord reserves `/tts`); text `/tts` still works.
- `/stop`
- `/restart`

View File

@@ -53,8 +53,8 @@ so that provider must also be authenticated if you enable summaries.
## Is it enabled by default?
No. TTS is **disabled** by default. Enable it in config or with `/tts on`,
which writes a local preference override.
No. AutoTTS is **off** by default. Enable it in config with
`messages.tts.auto` or per session with `/tts always` (alias: `/tts on`).
Edge TTS **is** enabled by default once TTS is on, and is used automatically
when no OpenAI or ElevenLabs API keys are available.
@@ -70,7 +70,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{
messages: {
tts: {
enabled: true,
auto: "always",
provider: "elevenlabs"
}
}
@@ -83,7 +83,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{
messages: {
tts: {
enabled: true,
auto: "always",
provider: "openai",
summaryModel: "openai/gpt-4.1-mini",
modelOverrides: {
@@ -121,7 +121,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{
messages: {
tts: {
enabled: true,
auto: "always",
provider: "edge",
edge: {
enabled: true,
@@ -156,7 +156,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{
messages: {
tts: {
enabled: true,
auto: "always",
maxTextLength: 4000,
timeoutMs: 30000,
prefsPath: "~/.clawdbot/settings/tts.json"
@@ -165,13 +165,25 @@ Full schema is in [Gateway configuration](/gateway/configuration).
}
```
### Only reply with audio after an inbound voice note
```json5
{
messages: {
tts: {
auto: "inbound"
}
}
}
```
### Disable auto-summary for long replies
```json5
{
messages: {
tts: {
enabled: true
auto: "always"
}
}
}
@@ -185,7 +197,10 @@ Then run:
### Notes on fields
- `enabled`: master toggle (default `false`; local prefs can override).
- `auto`: autoTTS mode (`off`, `always`, `inbound`, `tagged`).
- `inbound` only sends audio after an inbound voice note.
- `tagged` only sends audio when the reply includes `[[tts]]` tags.
- `enabled`: legacy toggle (doctor migrates this to `auto`).
- `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
- `provider`: `"elevenlabs"`, `"openai"`, or `"edge"` (fallback is automatic).
- If `provider` is **unset**, Clawdbot prefers `openai` (if key), then `elevenlabs` (if key),
@@ -195,7 +210,7 @@ Then run:
- `modelOverrides`: allow the model to emit TTS directives (on by default).
- `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
- `timeoutMs`: request timeout (ms).
- `prefsPath`: override the local prefs JSON path.
- `prefsPath`: override the local prefs JSON path (provider/limit/summary).
- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`).
- `elevenlabs.baseUrl`: override ElevenLabs API base URL.
- `elevenlabs.voiceSettings`:
@@ -218,6 +233,7 @@ Then run:
## Model-driven overrides (default on)
By default, the model **can** emit TTS directives for a single reply.
When `messages.tts.auto` is `tagged`, these directives are required to trigger audio.
When enabled, the model can emit `[[tts:...]]` directives to override the voice
for a single reply, plus an optional `[[tts:text]]...[[/tts:text]]` block to
@@ -338,8 +354,10 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
`/voice` as the native command there. Text `/tts ...` still works.
```
/tts on
/tts off
/tts always
/tts inbound
/tts tagged
/tts status
/tts provider openai
/tts limit 2000
@@ -350,6 +368,7 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
Notes:
- Commands require an authorized sender (allowlist/owner rules still apply).
- `commands.text` or native command registration must be enabled.
- `off|always|inbound|tagged` are persession toggles (`/tts on` is an alias for `/tts always`).
- `limit` and `summary` are stored in local prefs, not the main config.
- `/tts audio` generates a one-off audio reply (does not toggle TTS on).