TTS: gate auto audio on inbound voice notes (#1667)

Co-authored-by: Sebastian <sebslight@gmail.com>
2026-01-24 23:35:20 -05:00
parent ede5145191
commit d4f60bf16a
20 changed files with 433 additions and 63 deletions
--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -1509,7 +1509,7 @@ voice notes; other channels send MP3 audio.
 {
  messages: {
    tts: {
-      enabled: true,
+      auto: "always", // off | always | inbound | tagged
      mode: "final", // final | all (include tool/block replies)
      provider: "elevenlabs",
      summaryModel: "openai/gpt-4.1-mini",
@@ -1546,8 +1546,10 @@ voice notes; other channels send MP3 audio.
 ```

 Notes:
- `messages.tts.enabled` can be overridden by local user prefs (see `/tts on`, `/tts off`).
- `prefsPath` stores local overrides (enabled/provider/limit/summarize).
+- `messages.tts.auto` controls auto‑TTS (`off`, `always`, `inbound`, `tagged`).
+- `/tts off|always|inbound|tagged` sets the per‑session auto mode (overrides config).
+- `messages.tts.enabled` is legacy; doctor migrates it to `messages.tts.auto`.
+- `prefsPath` stores local overrides (provider/limit/summarize).
 - `maxTextLength` is a hard cap for TTS input; summaries are truncated to fit.
 - `summaryModel` overrides `agents.defaults.model.primary` for auto-summary.
  - Accepts `provider/model` or an alias from `agents.defaults.models`.
--- a/docs/tools/slash-commands.md
+++ b/docs/tools/slash-commands.md
@@ -68,7 +68,7 @@ Text + native (when enabled):
 - `/config show|get|set|unset` (persist config to disk, owner-only; requires `commands.config: true`)
 - `/debug show|set|unset|reset` (runtime overrides, owner-only; requires `commands.debug: true`)
 - `/usage off|tokens|full|cost` (per-response usage footer or local cost summary)
- `/tts on|off|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts))
+- `/tts off|always|inbound|tagged|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts))
  - Discord: native command is `/voice` (Discord reserves `/tts`); text `/tts` still works.
 - `/stop`
 - `/restart`
--- a/docs/tts.md
+++ b/docs/tts.md
@@ -53,8 +53,8 @@ so that provider must also be authenticated if you enable summaries.

 ## Is it enabled by default?

-No. TTS is **disabled** by default. Enable it in config or with `/tts on`,
-which writes a local preference override.
+No. Auto‑TTS is **off** by default. Enable it in config with
+`messages.tts.auto` or per session with `/tts always` (alias: `/tts on`).

 Edge TTS **is** enabled by default once TTS is on, and is used automatically
 when no OpenAI or ElevenLabs API keys are available.
@@ -70,7 +70,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 {
  messages: {
    tts: {
-      enabled: true,
+      auto: "always",
      provider: "elevenlabs"
    }
  }
@@ -83,7 +83,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 {
  messages: {
    tts: {
-      enabled: true,
+      auto: "always",
      provider: "openai",
      summaryModel: "openai/gpt-4.1-mini",
      modelOverrides: {
@@ -121,7 +121,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 {
  messages: {
    tts: {
-      enabled: true,
+      auto: "always",
      provider: "edge",
      edge: {
        enabled: true,
@@ -156,7 +156,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 {
  messages: {
    tts: {
-      enabled: true,
+      auto: "always",
      maxTextLength: 4000,
      timeoutMs: 30000,
      prefsPath: "~/.clawdbot/settings/tts.json"
@@ -165,13 +165,25 @@ Full schema is in [Gateway configuration](/gateway/configuration).
 }
 ```

+### Only reply with audio after an inbound voice note
+
+```json5
+{
+  messages: {
+    tts: {
+      auto: "inbound"
+    }
+  }
+}
+```
+
 ### Disable auto-summary for long replies

 ```json5
 {
  messages: {
    tts: {
-      enabled: true
+      auto: "always"
    }
  }
 }
@@ -185,7 +197,10 @@ Then run:

 ### Notes on fields

- `enabled`: master toggle (default `false`; local prefs can override).
+- `auto`: auto‑TTS mode (`off`, `always`, `inbound`, `tagged`).
+  - `inbound` only sends audio after an inbound voice note.
+  - `tagged` only sends audio when the reply includes `[[tts]]` tags.
+- `enabled`: legacy toggle (doctor migrates this to `auto`).
 - `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
 - `provider`: `"elevenlabs"`, `"openai"`, or `"edge"` (fallback is automatic).
 - If `provider` is **unset**, Clawdbot prefers `openai` (if key), then `elevenlabs` (if key),
@@ -195,7 +210,7 @@ Then run:
 - `modelOverrides`: allow the model to emit TTS directives (on by default).
 - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
 - `timeoutMs`: request timeout (ms).
- `prefsPath`: override the local prefs JSON path.
+- `prefsPath`: override the local prefs JSON path (provider/limit/summary).
 - `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`).
 - `elevenlabs.baseUrl`: override ElevenLabs API base URL.
 - `elevenlabs.voiceSettings`:
@@ -218,6 +233,7 @@ Then run:
 ## Model-driven overrides (default on)

 By default, the model **can** emit TTS directives for a single reply.
+When `messages.tts.auto` is `tagged`, these directives are required to trigger audio.

 When enabled, the model can emit `[[tts:...]]` directives to override the voice
 for a single reply, plus an optional `[[tts:text]]...[[/tts:text]]` block to
@@ -338,8 +354,10 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
 `/voice` as the native command there. Text `/tts ...` still works.

 ```
-/tts on
 /tts off
+/tts always
+/tts inbound
+/tts tagged
 /tts status
 /tts provider openai
 /tts limit 2000
@@ -350,6 +368,7 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
 Notes:
 - Commands require an authorized sender (allowlist/owner rules still apply).
 - `commands.text` or native command registration must be enabled.
+- `off|always|inbound|tagged` are per‑session toggles (`/tts on` is an alias for `/tts always`).
 - `limit` and `summary` are stored in local prefs, not the main config.
 - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
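
Review note: the four `messages.tts.auto` modes and the per-session `/tts` override described in the doc changes above can be read as a small decision function. The following is a minimal sketch of that reading, not the code from this commit. The names `TtsAutoMode`, `TtsDecisionInput`, `resolveAutoMode`, and `shouldSendAudio` are hypothetical, the mapping of legacy `enabled: true` to `always` is an assumption about what the doctor migration does, and the regex used to detect `[[tts]]` tags is only illustrative.

```typescript
// Hypothetical types and helpers for illustration; not taken from the Clawdbot source.
type TtsAutoMode = "off" | "always" | "inbound" | "tagged";

interface TtsDecisionInput {
  configAuto?: TtsAutoMode;     // messages.tts.auto from config
  legacyEnabled?: boolean;      // messages.tts.enabled (legacy field)
  sessionAuto?: TtsAutoMode;    // set per session via /tts off|always|inbound|tagged
  inboundWasVoiceNote: boolean; // whether the triggering message was a voice note
  replyText: string;            // outgoing reply text, possibly containing [[tts...]] tags
}

// The session override wins over config, as the docs state.
// ASSUMPTION: legacy `enabled: true` is treated as "always"; otherwise the default is "off".
function resolveAutoMode(input: TtsDecisionInput): TtsAutoMode {
  if (input.sessionAuto !== undefined) return input.sessionAuto;
  if (input.configAuto !== undefined) return input.configAuto;
  return input.legacyEnabled ? "always" : "off";
}

// Decide whether a reply should be sent as audio under the resolved mode.
function shouldSendAudio(input: TtsDecisionInput): boolean {
  const mode = resolveAutoMode(input);
  switch (mode) {
    case "off":
      return false;
    case "always":
      return true;
    case "inbound":
      // Only answer with audio when the user sent a voice note.
      return input.inboundWasVoiceNote;
    case "tagged":
      // ASSUMPTION: any [[tts]] or [[tts:...]] tag in the reply counts as a trigger.
      return /\[\[tts(?::[^\]]*)?\]\]/.test(input.replyText);
  }
}
```

Under these assumptions, a session that ran `/tts off` stays silent even when config says `auto: "always"`, and `auto: "inbound"` produces audio only for replies to voice notes.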