feat(telegram-tts): add latency logging, status tracking, and unit tests

- Add latency metrics to summarizeText and textToSpeech functions
- Add /tts_status command showing config and last attempt result
- Add /tts_summary command for feature flag control
- Fix atomic write to clean up temp file on rename failure
- Add timer.unref() to prevent blocking process shutdown
- Add unit tests for validation functions (13 tests)
- Update README with new commands and features

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 03:12:37 +00:00
parent 4b24753be7
commit 104d977d12
3 changed files with 361 additions and 52 deletions
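
Reviewer note: the README changes in this commit describe how a response is routed depending on its length and on whether auto-summarization is enabled. The following is a minimal TypeScript sketch of that decision, not the plugin's actual code. The names `planTts`, `applyHardLimit`, `TtsPrefs`, and `TtsPlan` are hypothetical. Treating a text exactly at the limit as "under" is an assumption, and so is implementing the post-summary hard limit as plain truncation.

```typescript
// Hypothetical shapes for illustration only; the plugin's real types may differ.
interface TtsPrefs {
  limit: number;           // value set via /tts_limit (README default: 1500)
  summaryEnabled: boolean; // value set via /tts_summary on|off
  hardLimit: number;       // assumed cap applied after summarization
}

type TtsPlan =
  | { action: "convert" }    // send the text to TTS as-is
  | { action: "summarize" }  // summarize first, then convert
  | { action: "skip" };      // produce no audio

// Decide what to do with a response before any API call is made.
function planTts(text: string, prefs: TtsPrefs): TtsPlan {
  if (text.length <= prefs.limit) {
    return { action: "convert" };
  }
  if (!prefs.summaryEnabled) {
    // README: with summarization off, long texts are skipped (no audio).
    return { action: "skip" };
  }
  return { action: "summarize" };
}

// Assumed form of the post-summary hard limit: truncate to hardLimit characters.
function applyHardLimit(summary: string, hardLimit: number): string {
  return summary.length > hardLimit ? summary.slice(0, hardLimit) : summary;
}
```

Returning a plan instead of calling the summarization or TTS APIs directly keeps the branching testable without network access.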
--- a/extensions/telegram-tts/README.md
+++ b/extensions/telegram-tts/README.md
@@ -4,15 +4,18 @@ Automatic text-to-speech for chat responses using ElevenLabs or OpenAI.

 ## Features

+- **Auto-TTS**: Automatically converts all text responses to voice when enabled
 - **`speak` Tool**: Converts text to speech and sends as voice message
 - **RPC Methods**: Control TTS via Gateway (`tts.status`, `tts.enable`, `tts.disable`, `tts.convert`, `tts.providers`)
- **User Preferences**: Persistent TTS state via JSON file
- **Multi-provider**: ElevenLabs and OpenAI TTS support
+- **User Commands**: `/tts_on`, `/tts_off`, `/tts_provider`, `/tts_limit`, `/tts_summary`, `/tts_status`
+- **Auto-Summarization**: Long texts are automatically summarized before TTS conversion
+- **Multi-provider**: ElevenLabs and OpenAI TTS with automatic fallback
 - **Self-contained**: No external CLI dependencies - calls APIs directly

 ## Requirements

- ElevenLabs API key OR OpenAI API key
+- **For TTS**: ElevenLabs API key OR OpenAI API key
+- **For Auto-Summarization**: OpenAI API key (uses gpt-4o-mini to summarize long texts)

 ## Installation

@@ -70,19 +73,20 @@ export OPENAI_API_KEY=your-api-key
 | Option | Type | Default | Description |
 |--------|------|---------|-------------|
 | `enabled` | boolean | `false` | Enable the plugin |
-| `provider` | string | `"elevenlabs"` | TTS provider (`elevenlabs` or `openai`) |
+| `provider` | string | `"openai"` | TTS provider (`elevenlabs` or `openai`) |
 | `elevenlabs.apiKey` | string | - | ElevenLabs API key |
 | `elevenlabs.voiceId` | string | `"pMsXgVXv3BLzUgSXRplE"` | ElevenLabs Voice ID |
 | `elevenlabs.modelId` | string | `"eleven_multilingual_v2"` | ElevenLabs Model ID |
 | `openai.apiKey` | string | - | OpenAI API key |
-| `openai.model` | string | `"tts-1"` | OpenAI model (`tts-1` or `tts-1-hd`) |
+| `openai.model` | string | `"gpt-4o-mini-tts"` | OpenAI model (`gpt-4o-mini-tts`, `tts-1`, or `tts-1-hd`) |
 | `openai.voice` | string | `"alloy"` | OpenAI voice |
 | `prefsPath` | string | `~/clawd/.user-preferences.json` | User preferences file |
 | `maxTextLength` | number | `4000` | Max characters for TTS |
+| `timeoutMs` | number | `30000` | API request timeout in milliseconds |

 ### OpenAI Voices

-Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
+Available voices: `alloy`, `ash`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`

 ## Usage

@@ -114,23 +118,28 @@ clawdbot gateway call tts.providers

 ### Telegram Commands

-Add custom commands to toggle TTS mode:
+The plugin registers the following commands automatically:

-```json
-{
-  "channels": {
-    "telegram": {
-      "customCommands": [
-        {"command": "tts_on", "description": "Enable voice responses"},
-        {"command": "tts_off", "description": "Disable voice responses"},
-        {"command": "audio", "description": "Send response as voice message"}
-      ]
-    }
-  }
-}
-```
+| Command | Description |
+|---------|-------------|
+| `/tts_on` | Enable auto-TTS for all responses |
+| `/tts_off` | Disable auto-TTS |
+| `/tts_provider [openai\|elevenlabs]` | Switch TTS provider (with fallback) |
+| `/tts_limit [chars]` | Set max text length before summarization (default: 1500) |
+| `/tts_summary [on\|off]` | Enable/disable auto-summarization for long texts |
+| `/tts_status` | Show TTS status, config, and last attempt result |

-Then add handling instructions to your agent workspace (CLAUDE.md or TOOLS.md).
+## Auto-Summarization
+
+When enabled (default), texts exceeding the configured limit are automatically summarized using OpenAI's gpt-4o-mini before TTS conversion. This ensures long responses can still be converted to audio.
+
+**Requirements**: An OpenAI API key must be configured for summarization to work, even when ElevenLabs is the TTS provider.
+
+**Behavior**:
+- Texts at or under the limit are converted directly
+- Texts over the limit are summarized first, then converted
+- If summarization is disabled (`/tts_summary off`), long texts are skipped (no audio)
+- After summarization, a hard limit is applied to prevent oversized TTS requests

 ## License