- Add latency metrics to summarizeText and textToSpeech functions - Add /tts_status command showing config and last attempt result - Add /tts_summary command for feature flag control - Fix atomic write to clean up temp file on rename failure - Add timer.unref() to prevent blocking process shutdown - Add unit tests for validation functions (13 tests) - Update README with new commands and features Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
4.2 KiB
4.2 KiB
Telegram TTS Extension
Automatic text-to-speech for chat responses using ElevenLabs or OpenAI.
Features
- Auto-TTS: Automatically converts all text responses to voice when enabled
speakTool: Converts text to speech and sends as voice message- RPC Methods: Control TTS via Gateway (
tts.status,tts.enable,tts.disable,tts.convert,tts.providers) - User Commands:
/tts_on,/tts_off,/tts_provider,/tts_limit,/tts_summary,/tts_status - Auto-Summarization: Long texts are automatically summarized before TTS conversion
- Multi-provider: ElevenLabs and OpenAI TTS with automatic fallback
- Self-contained: No external CLI dependencies - calls APIs directly
Requirements
- For TTS: ElevenLabs API key OR OpenAI API key
- For Auto-Summarization: OpenAI API key (uses gpt-4o-mini to summarize long texts)
Installation
The extension is bundled with Clawdbot. Enable it in your config:
{
"plugins": {
"entries": {
"telegram-tts": {
"enabled": true,
"provider": "elevenlabs",
"elevenlabs": {
"apiKey": "your-api-key"
}
}
}
}
}
Or use OpenAI:
{
"plugins": {
"entries": {
"telegram-tts": {
"enabled": true,
"provider": "openai",
"openai": {
"apiKey": "your-api-key",
"voice": "nova"
}
}
}
}
}
Or set API keys via environment variables:
# For ElevenLabs
export ELEVENLABS_API_KEY=your-api-key
# or
export XI_API_KEY=your-api-key
# For OpenAI
export OPENAI_API_KEY=your-api-key
Configuration
| Option | Type | Default | Description |
|---|---|---|---|
enabled |
boolean | false |
Enable the plugin |
provider |
string | "openai" |
TTS provider (elevenlabs or openai) |
elevenlabs.apiKey |
string | - | ElevenLabs API key |
elevenlabs.voiceId |
string | "pMsXgVXv3BLzUgSXRplE" |
ElevenLabs Voice ID |
elevenlabs.modelId |
string | "eleven_multilingual_v2" |
ElevenLabs Model ID |
openai.apiKey |
string | - | OpenAI API key |
openai.model |
string | "gpt-4o-mini-tts" |
OpenAI model (gpt-4o-mini-tts, tts-1, or tts-1-hd) |
openai.voice |
string | "alloy" |
OpenAI voice |
prefsPath |
string | ~/clawd/.user-preferences.json |
User preferences file |
maxTextLength |
number | 4000 |
Max characters for TTS |
timeoutMs |
number | 30000 |
API request timeout in milliseconds |
OpenAI Voices
Available voices: alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer
Usage
Agent Tool
The agent can use the speak tool to send voice messages:
User: Send me a voice message saying hello
Agent: [calls speak({ text: "Hello! How can I help you today?" })]
RPC Methods
# Check TTS status
clawdbot gateway call tts.status
# Enable/disable TTS
clawdbot gateway call tts.enable
clawdbot gateway call tts.disable
# Convert text to audio
clawdbot gateway call tts.convert '{"text": "Hello world"}'
# List available providers
clawdbot gateway call tts.providers
Telegram Commands
The plugin registers the following commands automatically:
| Command | Description |
|---|---|
/tts_on |
Enable auto-TTS for all responses |
/tts_off |
Disable auto-TTS |
/tts_provider [openai|elevenlabs] |
Switch TTS provider (with fallback) |
/tts_limit [chars] |
Set max text length before summarization (default: 1500) |
/tts_summary [on|off] |
Enable/disable auto-summarization for long texts |
/tts_status |
Show TTS status, config, and last attempt result |
Auto-Summarization
When enabled (default), texts exceeding the configured limit are automatically summarized using OpenAI's gpt-4o-mini before TTS conversion. This ensures long responses can still be converted to audio.
Requirements: OpenAI API key must be configured for summarization to work, even if using ElevenLabs for TTS.
Behavior:
- Texts under the limit are converted directly
- Texts over the limit are summarized first, then converted
- If summarization is disabled (
/tts_summary off), long texts are skipped (no audio) - After summarization, a hard limit is applied to prevent oversized TTS requests
License
MIT