Files

Glucksberg 104d977d12 feat(telegram-tts): add latency logging, status tracking, and unit tests

- Add latency metrics to summarizeText and textToSpeech functions
- Add /tts_status command showing config and last attempt result
- Add /tts_summary command for feature flag control
- Fix atomic write to clean up temp file on rename failure
- Add timer.unref() to prevent blocking process shutdown
- Add unit tests for validation functions (13 tests)
- Update README with new commands and features

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-24 08:00:44 +00:00

4.2 KiB

Raw Blame History

Telegram TTS Extension

Automatic text-to-speech for chat responses using ElevenLabs or OpenAI.

Features

Auto-TTS: Automatically converts all text responses to voice when enabled
speak Tool: Converts text to speech and sends as voice message
RPC Methods: Control TTS via Gateway (tts.status, tts.enable, tts.disable, tts.convert, tts.providers)
User Commands: /tts_on, /tts_off, /tts_provider, /tts_limit, /tts_summary, /tts_status
Auto-Summarization: Long texts are automatically summarized before TTS conversion
Multi-provider: ElevenLabs and OpenAI TTS with automatic fallback
Self-contained: No external CLI dependencies - calls APIs directly

Requirements

For TTS: ElevenLabs API key OR OpenAI API key
For Auto-Summarization: OpenAI API key (uses gpt-4o-mini to summarize long texts)

Installation

The extension is bundled with Clawdbot. Enable it in your config:

{
  "plugins": {
    "entries": {
      "telegram-tts": {
        "enabled": true,
        "provider": "elevenlabs",
        "elevenlabs": {
          "apiKey": "your-api-key"
        }
      }
    }
  }
}

Or use OpenAI:

{
  "plugins": {
    "entries": {
      "telegram-tts": {
        "enabled": true,
        "provider": "openai",
        "openai": {
          "apiKey": "your-api-key",
          "voice": "nova"
        }
      }
    }
  }
}

Or set API keys via environment variables:

# For ElevenLabs
export ELEVENLABS_API_KEY=your-api-key
# or
export XI_API_KEY=your-api-key

# For OpenAI
export OPENAI_API_KEY=your-api-key

Configuration

Option	Type	Default	Description
`enabled`	boolean	`false`	Enable the plugin
`provider`	string	`"openai"`	TTS provider (`elevenlabs` or `openai`)
`elevenlabs.apiKey`	string	-	ElevenLabs API key
`elevenlabs.voiceId`	string	`"pMsXgVXv3BLzUgSXRplE"`	ElevenLabs Voice ID
`elevenlabs.modelId`	string	`"eleven_multilingual_v2"`	ElevenLabs Model ID
`openai.apiKey`	string	-	OpenAI API key
`openai.model`	string	`"gpt-4o-mini-tts"`	OpenAI model (`gpt-4o-mini-tts`, `tts-1`, or `tts-1-hd`)
`openai.voice`	string	`"alloy"`	OpenAI voice
`prefsPath`	string	`~/clawd/.user-preferences.json`	User preferences file
`maxTextLength`	number	`4000`	Max characters for TTS
`timeoutMs`	number	`30000`	API request timeout in milliseconds

OpenAI Voices

Available voices: alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer

Usage

Agent Tool

The agent can use the speak tool to send voice messages:

User: Send me a voice message saying hello
Agent: [calls speak({ text: "Hello! How can I help you today?" })]

RPC Methods

# Check TTS status
clawdbot gateway call tts.status

# Enable/disable TTS
clawdbot gateway call tts.enable
clawdbot gateway call tts.disable

# Convert text to audio
clawdbot gateway call tts.convert '{"text": "Hello world"}'

# List available providers
clawdbot gateway call tts.providers

Telegram Commands

The plugin registers the following commands automatically:

Command	Description
`/tts_on`	Enable auto-TTS for all responses
`/tts_off`	Disable auto-TTS
`/tts_provider [openai\|elevenlabs]`	Switch TTS provider (with fallback)
`/tts_limit [chars]`	Set max text length before summarization (default: 1500)
`/tts_summary [on\|off]`	Enable/disable auto-summarization for long texts
`/tts_status`	Show TTS status, config, and last attempt result

Auto-Summarization

When enabled (default), texts exceeding the configured limit are automatically summarized using OpenAI's gpt-4o-mini before TTS conversion. This ensures long responses can still be converted to audio.

Requirements: OpenAI API key must be configured for summarization to work, even if using ElevenLabs for TTS.

Behavior:

Texts under the limit are converted directly
Texts over the limit are summarized first, then converted
If summarization is disabled (/tts_summary off), long texts are skipped (no audio)
After summarization, a hard limit is applied to prevent oversized TTS requests

License

MIT

4.2 KiB Raw Blame History