TTS: gate auto audio on inbound voice notes (#1667)

Co-authored-by: Sebastian <sebslight@gmail.com>
Seb Slight
2026-01-24 23:35:20 -05:00
committed by GitHub
parent ede5145191
commit d4f60bf16a
20 changed files with 433 additions and 63 deletions


@@ -11,6 +11,7 @@ Docs: https://docs.clawd.bot
### Changes ### Changes
- TTS: add Edge TTS provider fallback, defaulting to keyless Edge with MP3 retry on format failures. (#1668) Thanks @steipete. https://docs.clawd.bot/tts - TTS: add Edge TTS provider fallback, defaulting to keyless Edge with MP3 retry on format failures. (#1668) Thanks @steipete. https://docs.clawd.bot/tts
- Web search: add Brave freshness filter parameter for time-scoped results. (#1688) Thanks @JonUleis. https://docs.clawd.bot/tools/web - Web search: add Brave freshness filter parameter for time-scoped results. (#1688) Thanks @JonUleis. https://docs.clawd.bot/tools/web
- TTS: add auto mode enum (off/always/inbound/tagged) with per-session `/tts` override. (#1667) Thanks @sebslight. https://docs.clawd.bot/tts
- Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround). - Docs: expand FAQ (migration, scheduling, concurrency, model recommendations, OpenAI subscription auth, Pi sizing, hackable install, docs SSL workaround).
- Docs: add verbose installer troubleshooting guidance. - Docs: add verbose installer troubleshooting guidance.
- Docs: update Fly.io guide notes. - Docs: update Fly.io guide notes.


@@ -1509,7 +1509,7 @@ voice notes; other channels send MP3 audio.
{ {
messages: { messages: {
tts: { tts: {
enabled: true, auto: "always", // off | always | inbound | tagged
mode: "final", // final | all (include tool/block replies) mode: "final", // final | all (include tool/block replies)
provider: "elevenlabs", provider: "elevenlabs",
summaryModel: "openai/gpt-4.1-mini", summaryModel: "openai/gpt-4.1-mini",
@@ -1546,8 +1546,10 @@ voice notes; other channels send MP3 audio.
``` ```
Notes: Notes:
- `messages.tts.enabled` can be overridden by local user prefs (see `/tts on`, `/tts off`). - `messages.tts.auto` controls auto-TTS (`off`, `always`, `inbound`, `tagged`).
- `prefsPath` stores local overrides (enabled/provider/limit/summarize). - `/tts off|always|inbound|tagged` sets the per-session auto mode (overrides config).
- `messages.tts.enabled` is legacy; doctor migrates it to `messages.tts.auto`.
- `prefsPath` stores local overrides (provider/limit/summarize).
- `maxTextLength` is a hard cap for TTS input; summaries are truncated to fit. - `maxTextLength` is a hard cap for TTS input; summaries are truncated to fit.
- `summaryModel` overrides `agents.defaults.model.primary` for auto-summary. - `summaryModel` overrides `agents.defaults.model.primary` for auto-summary.
- Accepts `provider/model` or an alias from `agents.defaults.models`. - Accepts `provider/model` or an alias from `agents.defaults.models`.


@@ -68,7 +68,7 @@ Text + native (when enabled):
- `/config show|get|set|unset` (persist config to disk, owner-only; requires `commands.config: true`) - `/config show|get|set|unset` (persist config to disk, owner-only; requires `commands.config: true`)
- `/debug show|set|unset|reset` (runtime overrides, owner-only; requires `commands.debug: true`) - `/debug show|set|unset|reset` (runtime overrides, owner-only; requires `commands.debug: true`)
- `/usage off|tokens|full|cost` (per-response usage footer or local cost summary) - `/usage off|tokens|full|cost` (per-response usage footer or local cost summary)
- `/tts on|off|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts)) - `/tts off|always|inbound|tagged|status|provider|limit|summary|audio` (control TTS; see [/tts](/tts))
- Discord: native command is `/voice` (Discord reserves `/tts`); text `/tts` still works. - Discord: native command is `/voice` (Discord reserves `/tts`); text `/tts` still works.
- `/stop` - `/stop`
- `/restart` - `/restart`


@@ -53,8 +53,8 @@ so that provider must also be authenticated if you enable summaries.
## Is it enabled by default? ## Is it enabled by default?
No. TTS is **disabled** by default. Enable it in config or with `/tts on`, No. Auto-TTS is **off** by default. Enable it in config with
which writes a local preference override. `messages.tts.auto` or per session with `/tts always` (alias: `/tts on`).
Edge TTS **is** enabled by default once TTS is on, and is used automatically Edge TTS **is** enabled by default once TTS is on, and is used automatically
when no OpenAI or ElevenLabs API keys are available. when no OpenAI or ElevenLabs API keys are available.
@@ -70,7 +70,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{ {
messages: { messages: {
tts: { tts: {
enabled: true, auto: "always",
provider: "elevenlabs" provider: "elevenlabs"
} }
} }
@@ -83,7 +83,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{ {
messages: { messages: {
tts: { tts: {
enabled: true, auto: "always",
provider: "openai", provider: "openai",
summaryModel: "openai/gpt-4.1-mini", summaryModel: "openai/gpt-4.1-mini",
modelOverrides: { modelOverrides: {
@@ -121,7 +121,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{ {
messages: { messages: {
tts: { tts: {
enabled: true, auto: "always",
provider: "edge", provider: "edge",
edge: { edge: {
enabled: true, enabled: true,
@@ -156,7 +156,7 @@ Full schema is in [Gateway configuration](/gateway/configuration).
{ {
messages: { messages: {
tts: { tts: {
enabled: true, auto: "always",
maxTextLength: 4000, maxTextLength: 4000,
timeoutMs: 30000, timeoutMs: 30000,
prefsPath: "~/.clawdbot/settings/tts.json" prefsPath: "~/.clawdbot/settings/tts.json"
@@ -165,13 +165,25 @@ Full schema is in [Gateway configuration](/gateway/configuration).
} }
``` ```
### Only reply with audio after an inbound voice note
```json5
{
messages: {
tts: {
auto: "inbound"
}
}
}
```
### Disable auto-summary for long replies ### Disable auto-summary for long replies
```json5 ```json5
{ {
messages: { messages: {
tts: { tts: {
enabled: true auto: "always"
} }
} }
} }
@@ -185,7 +197,10 @@ Then run:
### Notes on fields ### Notes on fields
- `enabled`: master toggle (default `false`; local prefs can override). - `auto`: auto-TTS mode (`off`, `always`, `inbound`, `tagged`).
- `inbound` only sends audio after an inbound voice note.
- `tagged` only sends audio when the reply includes `[[tts]]` tags.
- `enabled`: legacy toggle (doctor migrates this to `auto`).
- `mode`: `"final"` (default) or `"all"` (includes tool/block replies). - `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
- `provider`: `"elevenlabs"`, `"openai"`, or `"edge"` (fallback is automatic). - `provider`: `"elevenlabs"`, `"openai"`, or `"edge"` (fallback is automatic).
- If `provider` is **unset**, Clawdbot prefers `openai` (if key), then `elevenlabs` (if key), - If `provider` is **unset**, Clawdbot prefers `openai` (if key), then `elevenlabs` (if key),
@@ -195,7 +210,7 @@ Then run:
- `modelOverrides`: allow the model to emit TTS directives (on by default). - `modelOverrides`: allow the model to emit TTS directives (on by default).
- `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded. - `maxTextLength`: hard cap for TTS input (chars). `/tts audio` fails if exceeded.
- `timeoutMs`: request timeout (ms). - `timeoutMs`: request timeout (ms).
- `prefsPath`: override the local prefs JSON path. - `prefsPath`: override the local prefs JSON path (provider/limit/summary).
- `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`). - `apiKey` values fall back to env vars (`ELEVENLABS_API_KEY`/`XI_API_KEY`, `OPENAI_API_KEY`).
- `elevenlabs.baseUrl`: override ElevenLabs API base URL. - `elevenlabs.baseUrl`: override ElevenLabs API base URL.
- `elevenlabs.voiceSettings`: - `elevenlabs.voiceSettings`:
@@ -218,6 +233,7 @@ Then run:
## Model-driven overrides (default on) ## Model-driven overrides (default on)
By default, the model **can** emit TTS directives for a single reply. By default, the model **can** emit TTS directives for a single reply.
When `messages.tts.auto` is `tagged`, these directives are required to trigger audio.
When enabled, the model can emit `[[tts:...]]` directives to override the voice When enabled, the model can emit `[[tts:...]]` directives to override the voice
for a single reply, plus an optional `[[tts:text]]...[[/tts:text]]` block to for a single reply, plus an optional `[[tts:text]]...[[/tts:text]]` block to
@@ -338,8 +354,10 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
`/voice` as the native command there. Text `/tts ...` still works. `/voice` as the native command there. Text `/tts ...` still works.
``` ```
/tts on
/tts off /tts off
/tts always
/tts inbound
/tts tagged
/tts status /tts status
/tts provider openai /tts provider openai
/tts limit 2000 /tts limit 2000
@@ -350,6 +368,7 @@ Discord note: `/tts` is a built-in Discord command, so Clawdbot registers
Notes: Notes:
- Commands require an authorized sender (allowlist/owner rules still apply). - Commands require an authorized sender (allowlist/owner rules still apply).
- `commands.text` or native command registration must be enabled. - `commands.text` or native command registration must be enabled.
- `off|always|inbound|tagged` are per-session toggles (`/tts on` is an alias for `/tts always`).
- `limit` and `summary` are stored in local prefs, not the main config. - `limit` and `summary` are stored in local prefs, not the main config.
- `/tts audio` generates a one-off audio reply (does not toggle TTS on). - `/tts audio` generates a one-off audio reply (does not toggle TTS on).
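
The four auto modes documented above can be read as a small decision table. The sketch below is only an illustration of that table, not the project's implementation: `shouldAutoSpeak` and `TTS_TAG_RE` are hypothetical names, and the tag pattern is an assumption based on the `[[tts]]` and `[[tts:...]]` forms mentioned in the docs.

```typescript
type TtsAutoMode = "off" | "always" | "inbound" | "tagged";

// Assumption: a reply counts as "tagged" when it contains [[tts]] or a
// [[tts:...]] directive. The real directive parser is not shown in this diff.
const TTS_TAG_RE = /\[\[tts(?::[^\]]*)?\]\]/i;

// Hypothetical helper: decides whether a reply should get automatic audio.
function shouldAutoSpeak(
  auto: TtsAutoMode,
  inboundAudio: boolean,
  replyText: string,
): boolean {
  if (auto === "off") return false; // never send audio automatically
  if (auto === "always") return true; // every eligible reply gets audio
  if (auto === "inbound") return inboundAudio; // only after a voice note
  return TTS_TAG_RE.test(replyText); // "tagged": the reply must opt in
}
```

With `auto: "inbound"`, a text-only message would leave the reply as text, while a voice note would make the reply eligible for audio.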


@@ -6,19 +6,20 @@ import {
getTtsMaxLength, getTtsMaxLength,
getTtsProvider, getTtsProvider,
isSummarizationEnabled, isSummarizationEnabled,
isTtsEnabled,
isTtsProviderConfigured, isTtsProviderConfigured,
normalizeTtsAutoMode,
resolveTtsAutoMode,
resolveTtsApiKey, resolveTtsApiKey,
resolveTtsConfig, resolveTtsConfig,
resolveTtsPrefsPath, resolveTtsPrefsPath,
resolveTtsProviderOrder, resolveTtsProviderOrder,
setLastTtsAttempt, setLastTtsAttempt,
setSummarizationEnabled, setSummarizationEnabled,
setTtsEnabled,
setTtsMaxLength, setTtsMaxLength,
setTtsProvider, setTtsProvider,
textToSpeech, textToSpeech,
} from "../../tts/tts.js"; } from "../../tts/tts.js";
import { updateSessionStore } from "../../config/sessions.js";
type ParsedTtsCommand = { type ParsedTtsCommand = {
action: string; action: string;
@@ -39,9 +40,9 @@ function ttsUsage(): ReplyPayload {
// Keep usage in one place so help/validation stays consistent. // Keep usage in one place so help/validation stays consistent.
return { return {
text: text:
"⚙️ Usage: /tts <on|off|status|provider|limit|summary|audio> [value]" + "⚙️ Usage: /tts <off|always|inbound|tagged|status|provider|limit|summary|audio> [value]" +
"\nExamples:\n" + "\nExamples:\n" +
"/tts on\n" + "/tts always\n" +
"/tts provider openai\n" + "/tts provider openai\n" +
"/tts provider edge\n" + "/tts provider edge\n" +
"/tts limit 2000\n" + "/tts limit 2000\n" +
@@ -71,14 +72,30 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
return { shouldContinue: false, reply: ttsUsage() }; return { shouldContinue: false, reply: ttsUsage() };
} }
if (action === "on") { const requestedAuto = normalizeTtsAutoMode(
setTtsEnabled(prefsPath, true); action === "on" ? "always" : action === "off" ? "off" : action,
return { shouldContinue: false, reply: { text: "🔊 TTS enabled." } }; );
} if (requestedAuto) {
const entry = params.sessionEntry;
if (action === "off") { const sessionKey = params.sessionKey;
setTtsEnabled(prefsPath, false); const store = params.sessionStore;
return { shouldContinue: false, reply: { text: "🔇 TTS disabled." } }; if (entry && store && sessionKey) {
entry.ttsAuto = requestedAuto;
entry.updatedAt = Date.now();
store[sessionKey] = entry;
if (params.storePath) {
await updateSessionStore(params.storePath, (store) => {
store[sessionKey] = entry;
});
}
}
const label = requestedAuto === "always" ? "enabled (always)" : requestedAuto;
return {
shouldContinue: false,
reply: {
text: requestedAuto === "off" ? "🔇 TTS disabled." : `🔊 TTS ${label}.`,
},
};
} }
if (action === "audio") { if (action === "audio") {
@@ -212,7 +229,9 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
} }
if (action === "status") { if (action === "status") {
const enabled = isTtsEnabled(config, prefsPath); const sessionAuto = params.sessionEntry?.ttsAuto;
const autoMode = resolveTtsAutoMode({ config, prefsPath, sessionAuto });
const enabled = autoMode !== "off";
const provider = getTtsProvider(config, prefsPath); const provider = getTtsProvider(config, prefsPath);
const hasKey = isTtsProviderConfigured(config, provider); const hasKey = isTtsProviderConfigured(config, provider);
const providerStatus = const providerStatus =
@@ -226,9 +245,10 @@ export const handleTtsCommands: CommandHandler = async (params, allowTextCommand
const maxLength = getTtsMaxLength(prefsPath); const maxLength = getTtsMaxLength(prefsPath);
const summarize = isSummarizationEnabled(prefsPath); const summarize = isSummarizationEnabled(prefsPath);
const last = getLastTtsAttempt(); const last = getLastTtsAttempt();
const autoLabel = sessionAuto ? `${autoMode} (session)` : autoMode;
const lines = [ const lines = [
"📊 TTS status", "📊 TTS status",
`State: ${enabled ? "✅ enabled" : "❌ disabled"}`, `Auto: ${enabled ? autoLabel : "off"}`,
`Provider: ${provider} (${providerStatus})`, `Provider: ${provider} (${providerStatus})`,
`Text limit: ${maxLength} chars`, `Text limit: ${maxLength} chars`,
`Auto-summary: ${summarize ? "on" : "off"}`, `Auto-summary: ${summarize ? "on" : "off"}`,


@@ -1,4 +1,6 @@
import type { ClawdbotConfig } from "../../config/config.js"; import type { ClawdbotConfig } from "../../config/config.js";
import { resolveSessionAgentId } from "../../agents/agent-scope.js";
import { loadSessionStore, resolveStorePath } from "../../config/sessions.js";
import { logVerbose } from "../../globals.js"; import { logVerbose } from "../../globals.js";
import { isDiagnosticsEnabled } from "../../infra/diagnostic-events.js"; import { isDiagnosticsEnabled } from "../../infra/diagnostic-events.js";
import { import {
@@ -14,7 +16,55 @@ import { formatAbortReplyText, tryFastAbortFromMessage } from "./abort.js";
import { shouldSkipDuplicateInbound } from "./inbound-dedupe.js"; import { shouldSkipDuplicateInbound } from "./inbound-dedupe.js";
import type { ReplyDispatcher, ReplyDispatchKind } from "./reply-dispatcher.js"; import type { ReplyDispatcher, ReplyDispatchKind } from "./reply-dispatcher.js";
import { isRoutableChannel, routeReply } from "./route-reply.js"; import { isRoutableChannel, routeReply } from "./route-reply.js";
import { maybeApplyTtsToPayload } from "../../tts/tts.js"; import { maybeApplyTtsToPayload, normalizeTtsAutoMode } from "../../tts/tts.js";
const AUDIO_PLACEHOLDER_RE = /^<media:audio>(\s*\([^)]*\))?$/i;
const AUDIO_HEADER_RE = /^\[Audio\b/i;
const normalizeMediaType = (value: string): string => value.split(";")[0]?.trim().toLowerCase();
const isInboundAudioContext = (ctx: FinalizedMsgContext): boolean => {
const rawTypes = [
typeof ctx.MediaType === "string" ? ctx.MediaType : undefined,
...(Array.isArray(ctx.MediaTypes) ? ctx.MediaTypes : []),
].filter(Boolean) as string[];
const types = rawTypes.map((type) => normalizeMediaType(type));
if (types.some((type) => type === "audio" || type.startsWith("audio/"))) return true;
const body =
typeof ctx.BodyForCommands === "string"
? ctx.BodyForCommands
: typeof ctx.CommandBody === "string"
? ctx.CommandBody
: typeof ctx.RawBody === "string"
? ctx.RawBody
: typeof ctx.Body === "string"
? ctx.Body
: "";
const trimmed = body.trim();
if (!trimmed) return false;
if (AUDIO_PLACEHOLDER_RE.test(trimmed)) return true;
return AUDIO_HEADER_RE.test(trimmed);
};
const resolveSessionTtsAuto = (
ctx: FinalizedMsgContext,
cfg: ClawdbotConfig,
): string | undefined => {
const targetSessionKey =
ctx.CommandSource === "native" ? ctx.CommandTargetSessionKey?.trim() : undefined;
const sessionKey = (targetSessionKey ?? ctx.SessionKey)?.trim();
if (!sessionKey) return undefined;
const agentId = resolveSessionAgentId({ sessionKey, config: cfg });
const storePath = resolveStorePath(cfg.session?.store, { agentId });
try {
const store = loadSessionStore(storePath);
const entry = store[sessionKey.toLowerCase()] ?? store[sessionKey];
return normalizeTtsAutoMode(entry?.ttsAuto);
} catch {
return undefined;
}
};
export type DispatchFromConfigResult = { export type DispatchFromConfigResult = {
queuedFinal: boolean; queuedFinal: boolean;
@@ -81,6 +131,8 @@ export async function dispatchReplyFromConfig(params: {
return { queuedFinal: false, counts: dispatcher.getQueuedCounts() }; return { queuedFinal: false, counts: dispatcher.getQueuedCounts() };
} }
const inboundAudio = isInboundAudioContext(ctx);
const sessionTtsAuto = resolveSessionTtsAuto(ctx, cfg);
const hookRunner = getGlobalHookRunner(); const hookRunner = getGlobalHookRunner();
if (hookRunner?.hasHooks("message_received")) { if (hookRunner?.hasHooks("message_received")) {
const timestamp = const timestamp =
@@ -223,6 +275,8 @@ export async function dispatchReplyFromConfig(params: {
cfg, cfg,
channel: ttsChannel, channel: ttsChannel,
kind: "tool", kind: "tool",
inboundAudio,
ttsAuto: sessionTtsAuto,
}); });
if (shouldRouteToOriginating) { if (shouldRouteToOriginating) {
await sendPayloadAsync(ttsPayload); await sendPayloadAsync(ttsPayload);
@@ -239,6 +293,8 @@ export async function dispatchReplyFromConfig(params: {
cfg, cfg,
channel: ttsChannel, channel: ttsChannel,
kind: "block", kind: "block",
inboundAudio,
ttsAuto: sessionTtsAuto,
}); });
if (shouldRouteToOriginating) { if (shouldRouteToOriginating) {
await sendPayloadAsync(ttsPayload, context?.abortSignal); await sendPayloadAsync(ttsPayload, context?.abortSignal);
@@ -262,6 +318,8 @@ export async function dispatchReplyFromConfig(params: {
cfg, cfg,
channel: ttsChannel, channel: ttsChannel,
kind: "final", kind: "final",
inboundAudio,
ttsAuto: sessionTtsAuto,
}); });
if (shouldRouteToOriginating && originatingChannel && originatingTo) { if (shouldRouteToOriginating && originatingChannel && originatingTo) {
// Route final reply to originating channel. // Route final reply to originating channel.
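
To make the inbound-audio check in this file easier to follow, here is a simplified, self-contained sketch of the same idea. The two regular expressions are copied from the diff; `looksLikeInboundAudio` and its flat `(mediaTypes, body)` signature are hypothetical simplifications of `isInboundAudioContext`, which reads these values from a `FinalizedMsgContext`.

```typescript
const AUDIO_PLACEHOLDER_RE = /^<media:audio>(\s*\([^)]*\))?$/i;
const AUDIO_HEADER_RE = /^\[Audio\b/i;

// Drops MIME parameters such as "; codecs=opus" and lowercases the type.
const normalizeMediaType = (value: string): string =>
  (value.split(";")[0] ?? "").trim().toLowerCase();

// Hypothetical simplified check: media types first, then body placeholders.
function looksLikeInboundAudio(mediaTypes: string[], body: string): boolean {
  const types = mediaTypes.map(normalizeMediaType);
  if (types.some((type) => type === "audio" || type.startsWith("audio/"))) return true;
  const trimmed = body.trim();
  if (!trimmed) return false;
  return AUDIO_PLACEHOLDER_RE.test(trimmed) || AUDIO_HEADER_RE.test(trimmed);
}
```

The body fallback covers cases where the media type is missing but the message text is an audio placeholder or starts with an `[Audio` header.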


@@ -5,6 +5,7 @@ import path from "node:path";
import { CURRENT_SESSION_VERSION, SessionManager } from "@mariozechner/pi-coding-agent"; import { CURRENT_SESSION_VERSION, SessionManager } from "@mariozechner/pi-coding-agent";
import { resolveSessionAgentId } from "../../agents/agent-scope.js"; import { resolveSessionAgentId } from "../../agents/agent-scope.js";
import type { ClawdbotConfig } from "../../config/config.js"; import type { ClawdbotConfig } from "../../config/config.js";
import type { TtsAutoMode } from "../../config/types.tts.js";
import { import {
DEFAULT_RESET_TRIGGERS, DEFAULT_RESET_TRIGGERS,
deriveSessionMetaPatch, deriveSessionMetaPatch,
@@ -128,6 +129,7 @@ export async function initSessionState(params: {
let persistedThinking: string | undefined; let persistedThinking: string | undefined;
let persistedVerbose: string | undefined; let persistedVerbose: string | undefined;
let persistedReasoning: string | undefined; let persistedReasoning: string | undefined;
let persistedTtsAuto: TtsAutoMode | undefined;
let persistedModelOverride: string | undefined; let persistedModelOverride: string | undefined;
let persistedProviderOverride: string | undefined; let persistedProviderOverride: string | undefined;
@@ -220,6 +222,7 @@ export async function initSessionState(params: {
persistedThinking = entry.thinkingLevel; persistedThinking = entry.thinkingLevel;
persistedVerbose = entry.verboseLevel; persistedVerbose = entry.verboseLevel;
persistedReasoning = entry.reasoningLevel; persistedReasoning = entry.reasoningLevel;
persistedTtsAuto = entry.ttsAuto;
persistedModelOverride = entry.modelOverride; persistedModelOverride = entry.modelOverride;
persistedProviderOverride = entry.providerOverride; persistedProviderOverride = entry.providerOverride;
} else { } else {
@@ -258,6 +261,7 @@ export async function initSessionState(params: {
thinkingLevel: persistedThinking ?? baseEntry?.thinkingLevel, thinkingLevel: persistedThinking ?? baseEntry?.thinkingLevel,
verboseLevel: persistedVerbose ?? baseEntry?.verboseLevel, verboseLevel: persistedVerbose ?? baseEntry?.verboseLevel,
reasoningLevel: persistedReasoning ?? baseEntry?.reasoningLevel, reasoningLevel: persistedReasoning ?? baseEntry?.reasoningLevel,
ttsAuto: persistedTtsAuto ?? baseEntry?.ttsAuto,
responseUsage: baseEntry?.responseUsage, responseUsage: baseEntry?.responseUsage,
modelOverride: persistedModelOverride ?? baseEntry?.modelOverride, modelOverride: persistedModelOverride ?? baseEntry?.modelOverride,
providerOverride: persistedProviderOverride ?? baseEntry?.providerOverride, providerOverride: persistedProviderOverride ?? baseEntry?.providerOverride,


@@ -17,7 +17,7 @@ import {
getTtsMaxLength, getTtsMaxLength,
getTtsProvider, getTtsProvider,
isSummarizationEnabled, isSummarizationEnabled,
isTtsEnabled, resolveTtsAutoMode,
resolveTtsConfig, resolveTtsConfig,
resolveTtsPrefsPath, resolveTtsPrefsPath,
} from "../tts/tts.js"; } from "../tts/tts.js";
@@ -252,15 +252,23 @@ const formatMediaUnderstandingLine = (decisions?: MediaUnderstandingDecision[])
return `📎 Media: ${parts.join(" · ")}`; return `📎 Media: ${parts.join(" · ")}`;
}; };
const formatVoiceModeLine = (config?: ClawdbotConfig): string | null => { const formatVoiceModeLine = (
config?: ClawdbotConfig,
sessionEntry?: SessionEntry,
): string | null => {
if (!config) return null; if (!config) return null;
const ttsConfig = resolveTtsConfig(config); const ttsConfig = resolveTtsConfig(config);
const prefsPath = resolveTtsPrefsPath(ttsConfig); const prefsPath = resolveTtsPrefsPath(ttsConfig);
if (!isTtsEnabled(ttsConfig, prefsPath)) return null; const autoMode = resolveTtsAutoMode({
config: ttsConfig,
prefsPath,
sessionAuto: sessionEntry?.ttsAuto,
});
if (autoMode === "off") return null;
const provider = getTtsProvider(ttsConfig, prefsPath); const provider = getTtsProvider(ttsConfig, prefsPath);
const maxLength = getTtsMaxLength(prefsPath); const maxLength = getTtsMaxLength(prefsPath);
const summarize = isSummarizationEnabled(prefsPath) ? "on" : "off"; const summarize = isSummarizationEnabled(prefsPath) ? "on" : "off";
return `🔊 Voice: on · provider=${provider} · limit=${maxLength} · summary=${summarize}`; return `🔊 Voice: ${autoMode} · provider=${provider} · limit=${maxLength} · summary=${summarize}`;
}; };
export function buildStatusMessage(args: StatusArgs): string { export function buildStatusMessage(args: StatusArgs): string {
@@ -398,7 +406,7 @@ export function buildStatusMessage(args: StatusArgs): string {
const usageCostLine = const usageCostLine =
usagePair && costLine ? `${usagePair} · ${costLine}` : (usagePair ?? costLine); usagePair && costLine ? `${usagePair} · ${costLine}` : (usagePair ?? costLine);
const mediaLine = formatMediaUnderstandingLine(args.mediaDecisions); const mediaLine = formatMediaUnderstandingLine(args.mediaDecisions);
const voiceLine = formatVoiceModeLine(args.config); const voiceLine = formatVoiceModeLine(args.config, args.sessionEntry);
return [ return [
versionLine, versionLine,


@@ -138,6 +138,16 @@ describe("legacy config detection", () => {
expect(res.config?.channels?.telegram?.groups?.["*"]?.requireMention).toBe(false); expect(res.config?.channels?.telegram?.groups?.["*"]?.requireMention).toBe(false);
expect(res.config?.channels?.telegram?.requireMention).toBeUndefined(); expect(res.config?.channels?.telegram?.requireMention).toBeUndefined();
}); });
it("migrates messages.tts.enabled to messages.tts.auto", async () => {
vi.resetModules();
const { migrateLegacyConfig } = await import("./config.js");
const res = migrateLegacyConfig({
messages: { tts: { enabled: true } },
});
expect(res.changes).toContain("Moved messages.tts.enabled → messages.tts.auto (always).");
expect(res.config?.messages?.tts?.auto).toBe("always");
expect(res.config?.messages?.tts?.enabled).toBeUndefined();
});
it("migrates legacy model config to agent.models + model lists", async () => { it("migrates legacy model config to agent.models + model lists", async () => {
vi.resetModules(); vi.resetModules();
const { migrateLegacyConfig } = await import("./config.js"); const { migrateLegacyConfig } = await import("./config.js");


@@ -40,6 +40,26 @@ export const LEGACY_CONFIG_MIGRATIONS_PART_3: LegacyConfigMigration[] = [
delete tools.bash; delete tools.bash;
}, },
}, },
{
id: "messages.tts.enabled->auto",
describe: "Move messages.tts.enabled to messages.tts.auto",
apply: (raw, changes) => {
const messages = getRecord(raw.messages);
const tts = getRecord(messages?.tts);
if (!tts) return;
if (tts.auto !== undefined) {
if ("enabled" in tts) {
delete tts.enabled;
changes.push("Removed messages.tts.enabled (messages.tts.auto already set).");
}
return;
}
if (typeof tts.enabled !== "boolean") return;
tts.auto = tts.enabled ? "always" : "off";
delete tts.enabled;
changes.push(`Moved messages.tts.enabled → messages.tts.auto (${String(tts.auto)}).`);
},
},
{ {
id: "agent.defaults-v2", id: "agent.defaults-v2",
describe: "Move agent config to agents.defaults and tools", describe: "Move agent config to agents.defaults and tools",
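
The `messages.tts.enabled->auto` migration added in this hunk can be exercised in isolation. The sketch below restates its logic as a standalone function so the three outcomes are easy to try. `migrateTtsEnabled` is a hypothetical name, and the body of `getRecord` is an assumption (the diff only shows it being called).

```typescript
type Rec = Record<string, unknown>;

// Assumption: getRecord returns the value only when it is a plain object.
const getRecord = (value: unknown): Rec | undefined =>
  value !== null && typeof value === "object" && !Array.isArray(value)
    ? (value as Rec)
    : undefined;

// Hypothetical standalone version of the migration's apply step.
function migrateTtsEnabled(raw: Rec, changes: string[]): void {
  const tts = getRecord(getRecord(raw.messages)?.tts);
  if (!tts) return;
  if (tts.auto !== undefined) {
    // `auto` wins; a leftover legacy flag is simply dropped.
    if ("enabled" in tts) {
      delete tts.enabled;
      changes.push("Removed messages.tts.enabled (messages.tts.auto already set).");
    }
    return;
  }
  if (typeof tts.enabled !== "boolean") return;
  tts.auto = tts.enabled ? "always" : "off";
  delete tts.enabled;
  changes.push(`Moved messages.tts.enabled → messages.tts.auto (${String(tts.auto)}).`);
}
```

For example, `{ messages: { tts: { enabled: false } } }` would end up as `{ messages: { tts: { auto: "off" } } }`, and a config that already sets `auto` keeps its value.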


@@ -120,6 +120,10 @@ export const LEGACY_CONFIG_RULES: LegacyConfigRule[] = [
message: message:
"agent.imageModelFallbacks was replaced by agents.defaults.imageModel.fallbacks (auto-migrated on load).", "agent.imageModelFallbacks was replaced by agents.defaults.imageModel.fallbacks (auto-migrated on load).",
}, },
{
path: ["messages", "tts", "enabled"],
message: "messages.tts.enabled was replaced by messages.tts.auto (auto-migrated on load).",
},
{ {
path: ["gateway", "token"], path: ["gateway", "token"],
message: "gateway.token is ignored; use gateway.auth.token instead (auto-migrated on load).", message: "gateway.token is ignored; use gateway.auth.token instead (auto-migrated on load).",


@@ -4,6 +4,7 @@ import type { Skill } from "@mariozechner/pi-coding-agent";
import type { NormalizedChatType } from "../../channels/chat-type.js"; import type { NormalizedChatType } from "../../channels/chat-type.js";
import type { ChannelId } from "../../channels/plugins/types.js"; import type { ChannelId } from "../../channels/plugins/types.js";
import type { DeliveryContext } from "../../utils/delivery-context.js"; import type { DeliveryContext } from "../../utils/delivery-context.js";
import type { TtsAutoMode } from "../types.tts.js";
export type SessionScope = "per-sender" | "global"; export type SessionScope = "per-sender" | "global";
@@ -42,6 +43,7 @@ export type SessionEntry = {
verboseLevel?: string; verboseLevel?: string;
reasoningLevel?: string; reasoningLevel?: string;
elevatedLevel?: string; elevatedLevel?: string;
ttsAuto?: TtsAutoMode;
execHost?: string; execHost?: string;
execSecurity?: string; execSecurity?: string;
execAsk?: string; execAsk?: string;


@@ -2,6 +2,8 @@ export type TtsProvider = "elevenlabs" | "openai" | "edge";
export type TtsMode = "final" | "all"; export type TtsMode = "final" | "all";
export type TtsAutoMode = "off" | "always" | "inbound" | "tagged";
export type TtsModelOverrideConfig = { export type TtsModelOverrideConfig = {
/** Enable model-provided overrides for TTS. */ /** Enable model-provided overrides for TTS. */
enabled?: boolean; enabled?: boolean;
@@ -22,7 +24,9 @@ export type TtsModelOverrideConfig = {
}; };
export type TtsConfig = { export type TtsConfig = {
/** Enable auto-TTS (can be overridden by local prefs). */ /** Auto-TTS mode (preferred). */
auto?: TtsAutoMode;
/** Legacy: enable auto-TTS when `auto` is not set. */
enabled?: boolean; enabled?: boolean;
/** Apply TTS to final replies only or to all replies (tool/block/final). */ /** Apply TTS to final replies only or to all replies (tool/block/final). */
mode?: TtsMode; mode?: TtsMode;


@@ -158,8 +158,10 @@ export const MarkdownConfigSchema = z
export const TtsProviderSchema = z.enum(["elevenlabs", "openai", "edge"]); export const TtsProviderSchema = z.enum(["elevenlabs", "openai", "edge"]);
export const TtsModeSchema = z.enum(["final", "all"]); export const TtsModeSchema = z.enum(["final", "all"]);
export const TtsAutoSchema = z.enum(["off", "always", "inbound", "tagged"]);
export const TtsConfigSchema = z export const TtsConfigSchema = z
.object({ .object({
auto: TtsAutoSchema.optional(),
enabled: z.boolean().optional(), enabled: z.boolean().optional(),
mode: TtsModeSchema.optional(), mode: TtsModeSchema.optional(),
provider: TtsProviderSchema.optional(), provider: TtsProviderSchema.optional(),


@@ -136,9 +136,8 @@ export async function processDiscordMessage(ctx: DiscordMessagePreflightContext)
const forumParentSlug = const forumParentSlug =
isForumParent && threadParentName ? normalizeDiscordSlug(threadParentName) : ""; isForumParent && threadParentName ? normalizeDiscordSlug(threadParentName) : "";
const threadChannelId = threadChannel?.id; const threadChannelId = threadChannel?.id;
const isForumStarter = Boolean( const isForumStarter =
threadChannelId && isForumParent && forumParentSlug && message.id === threadChannelId, Boolean(threadChannelId && isForumParent && forumParentSlug) && message.id === threadChannelId;
);
const forumContextLine = isForumStarter ? `[Forum parent: #${forumParentSlug}]` : null; const forumContextLine = isForumStarter ? `[Forum parent: #${forumParentSlug}]` : null;
const groupChannel = isGuildMessage && displayChannelSlug ? `#${displayChannelSlug}` : undefined; const groupChannel = isGuildMessage && displayChannelSlug ? `#${displayChannelSlug}` : undefined;
const groupSubject = isDirectMessage ? undefined : groupChannel; const groupSubject = isDirectMessage ? undefined : groupChannel;


@@ -5,6 +5,7 @@ import {
getTtsProvider, getTtsProvider,
isTtsEnabled, isTtsEnabled,
isTtsProviderConfigured, isTtsProviderConfigured,
resolveTtsAutoMode,
resolveTtsApiKey, resolveTtsApiKey,
resolveTtsConfig, resolveTtsConfig,
resolveTtsPrefsPath, resolveTtsPrefsPath,
@@ -24,11 +25,13 @@ export const ttsHandlers: GatewayRequestHandlers = {
const config = resolveTtsConfig(cfg); const config = resolveTtsConfig(cfg);
const prefsPath = resolveTtsPrefsPath(config); const prefsPath = resolveTtsPrefsPath(config);
const provider = getTtsProvider(config, prefsPath); const provider = getTtsProvider(config, prefsPath);
const autoMode = resolveTtsAutoMode({ config, prefsPath });
const fallbackProviders = resolveTtsProviderOrder(provider) const fallbackProviders = resolveTtsProviderOrder(provider)
.slice(1) .slice(1)
.filter((candidate) => isTtsProviderConfigured(config, candidate)); .filter((candidate) => isTtsProviderConfigured(config, candidate));
respond(true, { respond(true, {
enabled: isTtsEnabled(config, prefsPath), enabled: isTtsEnabled(config, prefsPath),
auto: autoMode,
provider, provider,
fallbackProvider: fallbackProviders[0] ?? null, fallbackProvider: fallbackProviders[0] ?? null,
fallbackProviders, fallbackProviders,


@@ -4,7 +4,7 @@ import { completeSimple } from "@mariozechner/pi-ai";
import { getApiKeyForModel } from "../agents/model-auth.js"; import { getApiKeyForModel } from "../agents/model-auth.js";
import { resolveModel } from "../agents/pi-embedded-runner/model.js"; import { resolveModel } from "../agents/pi-embedded-runner/model.js";
import { _test, getTtsProvider, resolveTtsConfig } from "./tts.js"; import * as tts from "./tts.js";
vi.mock("@mariozechner/pi-ai", () => ({ vi.mock("@mariozechner/pi-ai", () => ({
completeSimple: vi.fn(), completeSimple: vi.fn(),
@@ -37,6 +37,8 @@ vi.mock("../agents/model-auth.js", () => ({
 requireApiKey: vi.fn((auth: { apiKey?: string }) => auth.apiKey ?? ""),
 }));
+const { _test, resolveTtsConfig, maybeApplyTtsToPayload, getTtsProvider } = tts;
 const {
 isValidVoiceId,
 isValidOpenAIVoice,
@@ -431,4 +433,129 @@ describe("tts", () => {
 );
 });
 });
+describe("maybeApplyTtsToPayload", () => {
+const baseCfg = {
+agents: { defaults: { model: { primary: "openai/gpt-4o-mini" } } },
+messages: {
+tts: {
+auto: "inbound",
+provider: "openai",
+openai: { apiKey: "test-key", model: "gpt-4o-mini-tts", voice: "alloy" },
+},
+},
+};
+it("skips auto-TTS when inbound audio gating is on and the message is not audio", async () => {
+const prevPrefs = process.env.CLAWDBOT_TTS_PREFS;
+process.env.CLAWDBOT_TTS_PREFS = `/tmp/tts-test-${Date.now()}.json`;
+const originalFetch = globalThis.fetch;
+const fetchMock = vi.fn(async () => ({
+ok: true,
+arrayBuffer: async () => new ArrayBuffer(1),
+}));
+globalThis.fetch = fetchMock as unknown as typeof fetch;
+const payload = { text: "Hello world" };
+const result = await maybeApplyTtsToPayload({
+payload,
+cfg: baseCfg,
+kind: "final",
+inboundAudio: false,
+});
+expect(result).toBe(payload);
+expect(fetchMock).not.toHaveBeenCalled();
+globalThis.fetch = originalFetch;
+process.env.CLAWDBOT_TTS_PREFS = prevPrefs;
+});
+it("attempts auto-TTS when inbound audio gating is on and the message is audio", async () => {
+const prevPrefs = process.env.CLAWDBOT_TTS_PREFS;
+process.env.CLAWDBOT_TTS_PREFS = `/tmp/tts-test-${Date.now()}.json`;
+const originalFetch = globalThis.fetch;
+const fetchMock = vi.fn(async () => ({
+ok: true,
+arrayBuffer: async () => new ArrayBuffer(1),
+}));
+globalThis.fetch = fetchMock as unknown as typeof fetch;
+const result = await maybeApplyTtsToPayload({
+payload: { text: "Hello world" },
+cfg: baseCfg,
+kind: "final",
+inboundAudio: true,
+});
+expect(result.mediaUrl).toBeDefined();
+expect(fetchMock).toHaveBeenCalledTimes(1);
+globalThis.fetch = originalFetch;
+process.env.CLAWDBOT_TTS_PREFS = prevPrefs;
+});
+it("skips auto-TTS in tagged mode unless a tts tag is present", async () => {
+const prevPrefs = process.env.CLAWDBOT_TTS_PREFS;
+process.env.CLAWDBOT_TTS_PREFS = `/tmp/tts-test-${Date.now()}.json`;
+const originalFetch = globalThis.fetch;
+const fetchMock = vi.fn(async () => ({
+ok: true,
+arrayBuffer: async () => new ArrayBuffer(1),
+}));
+globalThis.fetch = fetchMock as unknown as typeof fetch;
+const cfg = {
+...baseCfg,
+messages: {
+...baseCfg.messages,
+tts: { ...baseCfg.messages.tts, auto: "tagged" },
+},
+};
+const payload = { text: "Hello world" };
+const result = await maybeApplyTtsToPayload({
+payload,
+cfg,
+kind: "final",
+});
+expect(result).toBe(payload);
+expect(fetchMock).not.toHaveBeenCalled();
+globalThis.fetch = originalFetch;
+process.env.CLAWDBOT_TTS_PREFS = prevPrefs;
+});
+it("runs auto-TTS in tagged mode when tags are present", async () => {
+const prevPrefs = process.env.CLAWDBOT_TTS_PREFS;
+process.env.CLAWDBOT_TTS_PREFS = `/tmp/tts-test-${Date.now()}.json`;
+const originalFetch = globalThis.fetch;
+const fetchMock = vi.fn(async () => ({
+ok: true,
+arrayBuffer: async () => new ArrayBuffer(1),
+}));
+globalThis.fetch = fetchMock as unknown as typeof fetch;
+const cfg = {
+...baseCfg,
+messages: {
+...baseCfg.messages,
+tts: { ...baseCfg.messages.tts, auto: "tagged" },
+},
+};
+const result = await maybeApplyTtsToPayload({
+payload: { text: "[[tts:text]]Hello world[[/tts:text]]" },
+cfg,
+kind: "final",
+});
+expect(result.mediaUrl).toBeDefined();
+expect(fetchMock).toHaveBeenCalledTimes(1);
+globalThis.fetch = originalFetch;
+process.env.CLAWDBOT_TTS_PREFS = prevPrefs;
+});
+});
 });
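
Review note on the new `maybeApplyTtsToPayload` tests: each test repeats the same setup and restores state only on the success path, so a failing assertion would leave the mocked `fetch` and the env var in place for later tests. In Node.js, assigning `undefined` to a `process.env` key also stores the string `"undefined"` rather than unsetting it, which is what `process.env.CLAWDBOT_TTS_PREFS = prevPrefs` does when the variable was not set beforehand. One possible way to tighten this is sketched below; `withTempValue` is a hypothetical helper that is not part of this commit, written against a plain record so the sketch stays self-contained.

```typescript
// Hypothetical test helper (not part of the commit): temporarily set a key on a
// string map such as process.env, run the callback, then restore the old state.
async function withTempValue<T>(
  target: Record<string, string | undefined>,
  key: string,
  value: string,
  run: () => Promise<T>,
): Promise<T> {
  const hadKey = key in target;
  const previous = target[key];
  target[key] = value;
  try {
    return await run();
  } finally {
    // Delete instead of assigning undefined: on process.env an assignment of
    // undefined would be stored as the string "undefined".
    if (!hadKey || previous === undefined) {
      delete target[key];
    } else {
      target[key] = previous;
    }
  }
}

// Possible use inside a test body (assumes a Node.js test environment):
//   await withTempValue(process.env, "CLAWDBOT_TTS_PREFS", prefsPath, async () => {
//     ...call maybeApplyTtsToPayload and assert...
//   });
```

The same `try`/`finally` shape would apply to restoring `globalThis.fetch`.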

@@ -20,6 +20,7 @@ import type { ChannelId } from "../channels/plugins/types.js";
 import type { ClawdbotConfig } from "../config/config.js";
 import type {
 TtsConfig,
+TtsAutoMode,
 TtsMode,
 TtsProvider,
 TtsModelOverrideConfig,
@@ -75,8 +76,10 @@ const DEFAULT_OUTPUT = {
 voiceCompatible: false,
 };
+const TTS_AUTO_MODES = new Set<TtsAutoMode>(["off", "always", "inbound", "tagged"]);
 export type ResolvedTtsConfig = {
-enabled: boolean;
+auto: TtsAutoMode;
 mode: TtsMode;
 provider: TtsProvider;
 providerSource: "config" | "default";
@@ -123,6 +126,7 @@ export type ResolvedTtsConfig = {
 type TtsUserPrefs = {
 tts?: {
+auto?: TtsAutoMode;
 enabled?: boolean;
 provider?: TtsProvider;
 maxLength?: number;
@@ -161,6 +165,7 @@ type TtsDirectiveOverrides = {
 type TtsDirectiveParseResult = {
 cleanedText: string;
 ttsText?: string;
+hasDirective: boolean;
 overrides: TtsDirectiveOverrides;
 warnings: string[];
 };
@@ -187,6 +192,15 @@ type TtsStatusEntry = {
 let lastTtsAttempt: TtsStatusEntry | undefined;
+export function normalizeTtsAutoMode(value: unknown): TtsAutoMode | undefined {
+if (typeof value !== "string") return undefined;
+const normalized = value.trim().toLowerCase();
+if (TTS_AUTO_MODES.has(normalized as TtsAutoMode)) {
+return normalized as TtsAutoMode;
+}
+return undefined;
+}
 function resolveModelOverridePolicy(
 overrides: TtsModelOverrideConfig | undefined,
 ): ResolvedTtsModelOverrides {
@@ -220,8 +234,9 @@ export function resolveTtsConfig(cfg: ClawdbotConfig): ResolvedTtsConfig {
 const raw: TtsConfig = cfg.messages?.tts ?? {};
 const providerSource = raw.provider ? "config" : "default";
 const edgeOutputFormat = raw.edge?.outputFormat?.trim();
+const auto = normalizeTtsAutoMode(raw.auto) ?? (raw.enabled ? "always" : "off");
 return {
-enabled: raw.enabled ?? false,
+auto,
 mode: raw.mode ?? "final",
 provider: raw.provider ?? "edge",
 providerSource,
@@ -279,17 +294,48 @@ export function resolveTtsPrefsPath(config: ResolvedTtsConfig): string {
 return path.join(CONFIG_DIR, "settings", "tts.json");
 }
+function resolveTtsAutoModeFromPrefs(prefs: TtsUserPrefs): TtsAutoMode | undefined {
+const auto = normalizeTtsAutoMode(prefs.tts?.auto);
+if (auto) return auto;
+if (typeof prefs.tts?.enabled === "boolean") {
+return prefs.tts.enabled ? "always" : "off";
+}
+return undefined;
+}
+export function resolveTtsAutoMode(params: {
+config: ResolvedTtsConfig;
+prefsPath: string;
+sessionAuto?: string;
+}): TtsAutoMode {
+const sessionAuto = normalizeTtsAutoMode(params.sessionAuto);
+if (sessionAuto) return sessionAuto;
+const prefsAuto = resolveTtsAutoModeFromPrefs(readPrefs(params.prefsPath));
+if (prefsAuto) return prefsAuto;
+return params.config.auto;
+}
 export function buildTtsSystemPromptHint(cfg: ClawdbotConfig): string | undefined {
 const config = resolveTtsConfig(cfg);
 const prefsPath = resolveTtsPrefsPath(config);
-if (!isTtsEnabled(config, prefsPath)) return undefined;
+const autoMode = resolveTtsAutoMode({ config, prefsPath });
+if (autoMode === "off") return undefined;
 const maxLength = getTtsMaxLength(prefsPath);
 const summarize = isSummarizationEnabled(prefsPath) ? "on" : "off";
+const autoHint =
+autoMode === "inbound"
+? "Only use TTS when the user's last message includes audio/voice."
+: autoMode === "tagged"
+? "Only use TTS when you include [[tts]] or [[tts:text]] tags."
+: undefined;
 return [
 "Voice (TTS) is enabled.",
+autoHint,
 `Keep spoken text ≤${maxLength} chars to avoid auto-summary (summary ${summarize}).`,
 "Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.",
-].join("\n");
+]
+.filter(Boolean)
+.join("\n");
 }
 function readPrefs(prefsPath: string): TtsUserPrefs {
@@ -323,16 +369,25 @@ function updatePrefs(prefsPath: string, update: (prefs: TtsUserPrefs) => void):
 atomicWriteFileSync(prefsPath, JSON.stringify(prefs, null, 2));
 }
-export function isTtsEnabled(config: ResolvedTtsConfig, prefsPath: string): boolean {
-const prefs = readPrefs(prefsPath);
-if (prefs.tts?.enabled !== undefined) return prefs.tts.enabled === true;
-return config.enabled;
+export function isTtsEnabled(
+config: ResolvedTtsConfig,
+prefsPath: string,
+sessionAuto?: string,
+): boolean {
+return resolveTtsAutoMode({ config, prefsPath, sessionAuto }) !== "off";
+}
+export function setTtsAutoMode(prefsPath: string, mode: TtsAutoMode): void {
+updatePrefs(prefsPath, (prefs) => {
+const next = { ...prefs.tts };
+delete next.enabled;
+next.auto = mode;
+prefs.tts = next;
+});
 }
 export function setTtsEnabled(prefsPath: string, enabled: boolean): void {
-updatePrefs(prefsPath, (prefs) => {
-prefs.tts = { ...prefs.tts, enabled };
-});
+setTtsAutoMode(prefsPath, enabled ? "always" : "off");
 }
 export function getTtsProvider(config: ResolvedTtsConfig, prefsPath: string): TtsProvider {
@@ -485,15 +540,17 @@ function parseTtsDirectives(
 policy: ResolvedTtsModelOverrides,
 ): TtsDirectiveParseResult {
 if (!policy.enabled) {
-return { cleanedText: text, overrides: {}, warnings: [] };
+return { cleanedText: text, overrides: {}, warnings: [], hasDirective: false };
 }
 const overrides: TtsDirectiveOverrides = {};
 const warnings: string[] = [];
 let cleanedText = text;
+let hasDirective = false;
 const blockRegex = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/gi;
 cleanedText = cleanedText.replace(blockRegex, (_match, inner: string) => {
+hasDirective = true;
 if (policy.allowText && overrides.ttsText == null) {
 overrides.ttsText = inner.trim();
 }
@@ -502,6 +559,7 @@ function parseTtsDirectives(
 const directiveRegex = /\[\[tts:([^\]]+)\]\]/gi;
 cleanedText = cleanedText.replace(directiveRegex, (_match, body: string) => {
+hasDirective = true;
 const tokens = body.split(/\s+/).filter(Boolean);
 for (const token of tokens) {
 const eqIndex = token.indexOf("=");
@@ -672,6 +730,7 @@ function parseTtsDirectives(
 return {
 cleanedText,
 ttsText: overrides.ttsText,
+hasDirective,
 overrides,
 warnings,
 };
@@ -1156,13 +1215,17 @@ export async function maybeApplyTtsToPayload(params: {
 cfg: ClawdbotConfig;
 channel?: string;
 kind?: "tool" | "block" | "final";
+inboundAudio?: boolean;
+ttsAuto?: string;
 }): Promise<ReplyPayload> {
 const config = resolveTtsConfig(params.cfg);
 const prefsPath = resolveTtsPrefsPath(config);
-if (!isTtsEnabled(config, prefsPath)) return params.payload;
-const mode = config.mode ?? "final";
-if (mode === "final" && params.kind && params.kind !== "final") return params.payload;
+const autoMode = resolveTtsAutoMode({
+config,
+prefsPath,
+sessionAuto: params.ttsAuto,
+});
+if (autoMode === "off") return params.payload;
 const text = params.payload.text ?? "";
 const directives = parseTtsDirectives(text, config.modelOverrides);
@@ -1183,6 +1246,12 @@ export async function maybeApplyTtsToPayload(params: {
 text: visibleText.length > 0 ? visibleText : undefined,
 };
+if (autoMode === "tagged" && !directives.hasDirective) return nextPayload;
+if (autoMode === "inbound" && params.inboundAudio !== true) return nextPayload;
+const mode = config.mode ?? "final";
+if (mode === "final" && params.kind && params.kind !== "final") return nextPayload;
 if (!ttsText.trim()) return nextPayload;
 if (params.payload.mediaUrl || (params.payload.mediaUrls?.length ?? 0) > 0) return nextPayload;
 if (text.includes("MEDIA:")) return nextPayload;
@@ -1197,7 +1266,7 @@ export async function maybeApplyTtsToPayload(params: {
 logVerbose(
 `TTS: skipping long text (${textForAudio.length} > ${maxLength}), summarization disabled.`,
 );
-return params.payload;
+return nextPayload;
 }
 try {
@@ -1219,7 +1288,7 @@ export async function maybeApplyTtsToPayload(params: {
 } catch (err) {
 const error = err as Error;
 logVerbose(`TTS: summarization failed: ${error.message}`);
-return params.payload;
+return nextPayload;
 }
 }
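
Review note on the `tts.ts` hunks: the change comes down to two decisions. The first is which auto mode applies: a valid session override wins, then stored prefs, then config, with the legacy `enabled` boolean mapped to `always` or `off`. The second is whether a reply qualifies for synthesis under that mode. The sketch below restates both in a self-contained form. It is not the code from this commit: `pickAutoMode`, `fromSource`, and `shouldSynthesize` are hypothetical names, prefs are passed in as a plain object instead of being read from disk, and the later `mode`/`kind`, media, and length checks are left out.

```typescript
type TtsAutoMode = "off" | "always" | "inbound" | "tagged";

const TTS_AUTO_MODES: ReadonlySet<string> = new Set(["off", "always", "inbound", "tagged"]);

// Accept only known mode strings, ignoring case and surrounding whitespace.
function normalizeTtsAutoMode(value: unknown): TtsAutoMode | undefined {
  if (typeof value !== "string") return undefined;
  const normalized = value.trim().toLowerCase();
  return TTS_AUTO_MODES.has(normalized) ? (normalized as TtsAutoMode) : undefined;
}

// Hypothetical shape shared by prefs and config in this sketch.
type AutoSource = { auto?: unknown; enabled?: boolean };

// Prefer an explicit auto mode; otherwise map the legacy boolean.
function fromSource(source: AutoSource | undefined): TtsAutoMode | undefined {
  const auto = normalizeTtsAutoMode(source?.auto);
  if (auto) return auto;
  if (typeof source?.enabled === "boolean") return source.enabled ? "always" : "off";
  return undefined;
}

// Precedence: session override, then prefs, then config, then "off".
function pickAutoMode(params: {
  sessionAuto?: string;
  prefs?: AutoSource;
  config?: AutoSource;
}): TtsAutoMode {
  return (
    normalizeTtsAutoMode(params.sessionAuto) ??
    fromSource(params.prefs) ??
    fromSource(params.config) ??
    "off"
  );
}

// Gate on the resolved mode only; other skip conditions are out of scope here.
function shouldSynthesize(
  mode: TtsAutoMode,
  ctx: { inboundAudio?: boolean; hasDirective?: boolean },
): boolean {
  switch (mode) {
    case "off":
      return false;
    case "always":
      return true;
    case "inbound":
      return ctx.inboundAudio === true;
    case "tagged":
      return ctx.hasDirective === true;
  }
}

const mode = pickAutoMode({ sessionAuto: "inbound", config: { enabled: true } });
console.log(mode, shouldSynthesize(mode, { inboundAudio: false }));
```

An invalid session value falls through to prefs and config instead of disabling TTS, which matches how `resolveTtsAutoMode` treats a failed `normalizeTtsAutoMode` call in the hunks above.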

src/types/node-edge-tts.d.ts vendored Normal file
@@ -0,0 +1,18 @@
+declare module "node-edge-tts" {
+export type EdgeTTSOptions = {
+voice?: string;
+lang?: string;
+outputFormat?: string;
+saveSubtitles?: boolean;
+proxy?: string;
+rate?: string;
+pitch?: string;
+volume?: string;
+timeout?: number;
+};
+export class EdgeTTS {
+constructor(options?: EdgeTTSOptions);
+ttsPromise(text: string, outputPath: string): Promise<void>;
+}
+}

@@ -127,9 +127,9 @@ describe("web inbound media saves with extension", () => {
 realSock.ev.emit("messages.upsert", upsert);
 // Allow a brief window for the async handler to fire on slower hosts.
-for (let i = 0; i < 10; i++) {
+for (let i = 0; i < 50; i++) {
 if (onMessage.mock.calls.length > 0) break;
-await new Promise((resolve) => setTimeout(resolve, 5));
+await new Promise((resolve) => setTimeout(resolve, 10));
 }
 expect(onMessage).toHaveBeenCalledTimes(1);
@@ -178,9 +178,9 @@ describe("web inbound media saves with extension", () => {
 realSock.ev.emit("messages.upsert", upsert);
-for (let i = 0; i < 10; i++) {
+for (let i = 0; i < 50; i++) {
 if (onMessage.mock.calls.length > 0) break;
-await new Promise((resolve) => setTimeout(resolve, 5));
+await new Promise((resolve) => setTimeout(resolve, 10));
 }
 expect(onMessage).toHaveBeenCalledTimes(1);
@@ -218,9 +218,9 @@ describe("web inbound media saves with extension", () => {
 realSock.ev.emit("messages.upsert", upsert);
-for (let i = 0; i < 10; i++) {
+for (let i = 0; i < 50; i++) {
 if (onMessage.mock.calls.length > 0) break;
-await new Promise((resolve) => setTimeout(resolve, 5));
+await new Promise((resolve) => setTimeout(resolve, 10));
 }
 expect(onMessage).toHaveBeenCalledTimes(1);
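
Review note on the three hunks above: they widen the same hand-rolled polling loop from 10 iterations of 5 ms to 50 iterations of 10 ms, roughly 500 ms in total. If the loop keeps being tuned, a shared helper might keep the budget in one place. The sketch below is one possible shape under that assumption; `waitFor` and its option names are hypothetical and do not exist in this commit.

```typescript
// Hypothetical polling helper: resolves true once the condition holds,
// or false when the time budget runs out.
async function waitFor(
  condition: () => boolean,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<boolean> {
  const { timeoutMs = 500, intervalMs = 10 } = options;
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return true;
}

// Possible use in the tests above:
//   await waitFor(() => onMessage.mock.calls.length > 0);
//   expect(onMessage).toHaveBeenCalledTimes(1);
```

A deadline based on elapsed time bounds the wait even when timers fire late on a loaded host, whereas a fixed iteration count only bounds the number of checks.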