feat: move TTS into core (#1559) (thanks @Glucksberg)
@@ -1,146 +0,0 @@
# Telegram TTS Extension

Automatic text-to-speech for chat responses using ElevenLabs or OpenAI.

## Features

- **Auto-TTS**: Automatically converts all text responses to voice when enabled
- **`speak` Tool**: Converts text to speech and sends it as a voice message
- **RPC Methods**: Control TTS via the Gateway (`tts.status`, `tts.enable`, `tts.disable`, `tts.convert`, `tts.providers`)
- **User Commands**: `/tts_on`, `/tts_off`, `/tts_provider`, `/tts_limit`, `/tts_summary`, `/tts_status`
- **Auto-Summarization**: Long texts are automatically summarized before TTS conversion
- **Multi-provider**: ElevenLabs and OpenAI TTS with automatic fallback
- **Self-contained**: No external CLI dependencies; calls the provider APIs directly

## Requirements

- **For TTS**: ElevenLabs API key OR OpenAI API key
- **For Auto-Summarization**: OpenAI API key (uses gpt-4o-mini to summarize long texts)

## Installation

The extension is bundled with Clawdbot. Enable it in your config:

```json
{
  "plugins": {
    "entries": {
      "telegram-tts": {
        "enabled": true,
        "provider": "elevenlabs",
        "elevenlabs": {
          "apiKey": "your-api-key"
        }
      }
    }
  }
}
```

Or use OpenAI:

```json
{
  "plugins": {
    "entries": {
      "telegram-tts": {
        "enabled": true,
        "provider": "openai",
        "openai": {
          "apiKey": "your-api-key",
          "voice": "nova"
        }
      }
    }
  }
}
```

Or set API keys via environment variables:

```bash
# For ElevenLabs
export ELEVENLABS_API_KEY=your-api-key
# or
export XI_API_KEY=your-api-key

# For OpenAI
export OPENAI_API_KEY=your-api-key
```

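As a rough illustration of how the two configuration sources might be combined, the sketch below resolves an API key from the plugin config first and falls back to the environment variables listed above. The `resolveApiKey` helper and the config-over-environment precedence are illustrative assumptions, not the extension's actual implementation.

```ts
// Hypothetical helper: pick an API key from plugin config, falling back to env vars.
// The precedence shown here (config first, then environment) is an assumption.
type TtsProvider = "elevenlabs" | "openai";

function resolveApiKey(
  provider: TtsProvider,
  config: { elevenlabs?: { apiKey?: string }; openai?: { apiKey?: string } },
): string | undefined {
  if (provider === "elevenlabs") {
    return (
      config.elevenlabs?.apiKey ??
      process.env.ELEVENLABS_API_KEY ??
      process.env.XI_API_KEY
    );
  }
  return config.openai?.apiKey ?? process.env.OPENAI_API_KEY;
}
```
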
## Configuration

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable the plugin |
| `provider` | string | `"elevenlabs"` | TTS provider (`elevenlabs` or `openai`) |
| `elevenlabs.apiKey` | string | - | ElevenLabs API key |
| `elevenlabs.voiceId` | string | `"pMsXgVXv3BLzUgSXRplE"` | ElevenLabs Voice ID |
| `elevenlabs.modelId` | string | `"eleven_multilingual_v2"` | ElevenLabs Model ID |
| `openai.apiKey` | string | - | OpenAI API key |
| `openai.model` | string | `"gpt-4o-mini-tts"` | OpenAI TTS model (only `gpt-4o-mini-tts` is accepted by the config schema) |
| `openai.voice` | string | `"alloy"` | OpenAI voice |
| `prefsPath` | string | `~/clawd/.user-preferences.json` | User preferences file |
| `maxTextLength` | number | `4000` | Max characters for TTS |
| `timeoutMs` | number | `30000` | API request timeout in milliseconds |

### OpenAI Voices

Available voices: `alloy`, `ash`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`

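The unit tests included in this commit exercise the input validators for voice IDs, voice names, and models. A minimal sketch of rules consistent with those tests (not the extension's actual code) looks like this:

```ts
// Sketch of validation rules implied by the unit tests below; the real
// implementation lives in the extension's index.ts (not shown in this diff).

// ElevenLabs voice IDs: 10-40 alphanumeric characters, nothing else.
const isValidVoiceId = (id: string): boolean => /^[A-Za-z0-9]{10,40}$/.test(id);

// OpenAI voices accepted by the speak path (case-sensitive, no surrounding whitespace).
const OPENAI_VOICES = new Set([
  "alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer",
]);
const isValidOpenAIVoice = (voice: string): boolean => OPENAI_VOICES.has(voice);

// Only one OpenAI TTS model is currently allowed.
const isValidOpenAIModel = (model: string): boolean => model === "gpt-4o-mini-tts";
```
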
## Usage

### Agent Tool

The agent can use the `speak` tool to send voice messages:

```
User: Send me a voice message saying hello
Agent: [calls speak({ text: "Hello! How can I help you today?" })]
```

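For a sense of the tool's shape, here is a minimal sketch of how a `speak` tool with a single `text` parameter might be declared. The `ToolDefinition` shape and handler signature are hypothetical; only the tool name and the `text` argument come from the transcript above.

```ts
// Hypothetical declaration of the `speak` tool; illustrative only, not the plugin API.
interface ToolDefinition<Args> {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the tool arguments
  handler: (args: Args) => Promise<void>;
}

const speakTool: ToolDefinition<{ text: string }> = {
  name: "speak",
  description: "Convert text to speech and send it as a voice message",
  parameters: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  handler: async ({ text }) => {
    // 1. Convert `text` to audio with the configured provider (ElevenLabs or OpenAI).
    // 2. Send the resulting audio to the chat as a Telegram voice message.
    void text;
  },
};
```
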
### RPC Methods

```bash
# Check TTS status
clawdbot gateway call tts.status

# Enable/disable TTS
clawdbot gateway call tts.enable
clawdbot gateway call tts.disable

# Convert text to audio
clawdbot gateway call tts.convert '{"text": "Hello world"}'

# List available providers
clawdbot gateway call tts.providers
```

### Telegram Commands

The plugin registers the following commands automatically:

| Command | Description |
|---------|-------------|
| `/tts_on` | Enable auto-TTS for all responses |
| `/tts_off` | Disable auto-TTS |
| `/tts_provider [openai\|elevenlabs]` | Switch TTS provider (with fallback) |
| `/tts_limit [chars]` | Set max text length before summarization (default: 1500) |
| `/tts_summary [on\|off]` | Enable/disable auto-summarization for long texts |
| `/tts_status` | Show TTS status, config, and last attempt result |

## Auto-Summarization

When enabled (the default), texts exceeding the configured limit are automatically summarized using OpenAI's gpt-4o-mini before TTS conversion. This ensures long responses can still be converted to audio.

**Requirements**: An OpenAI API key must be configured for summarization to work, even if using ElevenLabs for TTS.

**Behavior** (see the sketch below):
- Texts under the limit are converted directly
- Texts over the limit are summarized first, then converted
- If summarization is disabled (`/tts_summary off`), long texts are skipped (no audio)
- After summarization, a hard limit is applied to prevent oversized TTS requests

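A minimal sketch of this decision flow, assuming the `summarizeText(text, targetLength, apiKey)` contract exercised by the unit tests in this commit; `prepareForTts`, the preference shape, and the truncation step are illustrative assumptions rather than the extension's actual code:

```ts
// `summarizeText` follows the contract shown in the unit tests
// (returns { summary, inputLength, outputLength, latencyMs }).
declare function summarizeText(
  text: string,
  targetLength: number,
  apiKey: string,
): Promise<{ summary: string; inputLength: number; outputLength: number; latencyMs: number }>;

interface TtsPrefs {
  limit: number;          // /tts_limit value (default 1500)
  summarize: boolean;     // /tts_summary toggle
  maxTextLength: number;  // hard cap from config (default 4000)
}

async function prepareForTts(
  text: string,
  prefs: TtsPrefs,
  openaiApiKey: string,
): Promise<string | null> {
  // Under the limit: convert directly.
  if (text.length <= prefs.limit) return text;

  // Over the limit with summarization disabled: skip (no audio).
  if (!prefs.summarize) return null;

  // Over the limit with summarization enabled: summarize first...
  const { summary } = await summarizeText(text, prefs.limit, openaiApiKey);

  // ...then apply the hard limit so the TTS request can never be oversized
  // (simple truncation here is an assumption).
  return summary.slice(0, prefs.maxTextLength);
}
```
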
## License

MIT
@@ -1,117 +0,0 @@
{
  "id": "telegram-tts",
  "uiHints": {
    "enabled": {
      "label": "Enable TTS",
      "help": "Automatically convert text responses to voice messages"
    },
    "provider": {
      "label": "TTS Provider",
      "help": "Choose between ElevenLabs or OpenAI for voice synthesis"
    },
    "elevenlabs.apiKey": {
      "label": "ElevenLabs API Key",
      "sensitive": true
    },
    "elevenlabs.voiceId": {
      "label": "ElevenLabs Voice ID",
      "help": "Default: pMsXgVXv3BLzUgSXRplE (Borislav)"
    },
    "elevenlabs.modelId": {
      "label": "ElevenLabs Model ID",
      "help": "Default: eleven_multilingual_v2"
    },
    "openai.apiKey": {
      "label": "OpenAI API Key",
      "sensitive": true
    },
    "openai.model": {
      "label": "OpenAI TTS Model",
      "help": "gpt-4o-mini-tts (recommended)"
    },
    "openai.voice": {
      "label": "OpenAI Voice",
      "help": "alloy, echo, fable, onyx, nova, or shimmer"
    },
    "prefsPath": {
      "label": "User Preferences File",
      "help": "Path to JSON file storing TTS state",
      "advanced": true
    },
    "maxTextLength": {
      "label": "Max Text Length",
      "help": "Maximum characters to convert to speech",
      "advanced": true
    },
    "timeoutMs": {
      "label": "Request Timeout (ms)",
      "help": "Maximum time to wait for TTS API response (default: 30000)",
      "advanced": true
    }
  },
  "configSchema": {
    "type": "object",
    "additionalProperties": false,
    "properties": {
      "enabled": {
        "type": "boolean",
        "default": false
      },
      "provider": {
        "type": "string",
        "enum": ["elevenlabs", "openai"],
        "default": "elevenlabs"
      },
      "elevenlabs": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "apiKey": {
            "type": "string"
          },
          "voiceId": {
            "type": "string",
            "default": "pMsXgVXv3BLzUgSXRplE"
          },
          "modelId": {
            "type": "string",
            "default": "eleven_multilingual_v2"
          }
        }
      },
      "openai": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "apiKey": {
            "type": "string"
          },
          "model": {
            "type": "string",
            "enum": ["gpt-4o-mini-tts"],
            "default": "gpt-4o-mini-tts"
          },
          "voice": {
            "type": "string",
            "enum": ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
            "default": "alloy"
          }
        }
      },
      "prefsPath": {
        "type": "string"
      },
      "maxTextLength": {
        "type": "integer",
        "minimum": 1,
        "default": 4000
      },
      "timeoutMs": {
        "type": "integer",
        "minimum": 1000,
        "maximum": 120000,
        "default": 30000
      }
    }
  }
}
@@ -1,218 +0,0 @@
/**
 * Unit tests for telegram-tts extension
 */

import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
import { _test, meta } from "./index.js";

const { isValidVoiceId, isValidOpenAIVoice, isValidOpenAIModel, OPENAI_TTS_MODELS, summarizeText } = _test;

describe("telegram-tts", () => {
  describe("meta", () => {
    it("should have correct plugin metadata", () => {
      expect(meta.id).toBe("telegram-tts");
      expect(meta.name).toBe("Telegram TTS");
      expect(meta.version).toMatch(/^\d+\.\d+\.\d+$/);
    });
  });

  describe("isValidVoiceId", () => {
    it("should accept valid ElevenLabs voice IDs", () => {
      // Real ElevenLabs voice ID format (20 alphanumeric chars)
      expect(isValidVoiceId("pMsXgVXv3BLzUgSXRplE")).toBe(true);
      expect(isValidVoiceId("21m00Tcm4TlvDq8ikWAM")).toBe(true);
      expect(isValidVoiceId("EXAVITQu4vr4xnSDxMaL")).toBe(true);
    });

    it("should accept voice IDs of varying valid lengths", () => {
      expect(isValidVoiceId("a1b2c3d4e5")).toBe(true); // 10 chars (min)
      expect(isValidVoiceId("a".repeat(40))).toBe(true); // 40 chars (max)
    });

    it("should reject too short voice IDs", () => {
      expect(isValidVoiceId("")).toBe(false);
      expect(isValidVoiceId("abc")).toBe(false);
      expect(isValidVoiceId("123456789")).toBe(false); // 9 chars
    });

    it("should reject too long voice IDs", () => {
      expect(isValidVoiceId("a".repeat(41))).toBe(false);
      expect(isValidVoiceId("a".repeat(100))).toBe(false);
    });

    it("should reject voice IDs with invalid characters", () => {
      expect(isValidVoiceId("pMsXgVXv3BLz-gSXRplE")).toBe(false); // hyphen
      expect(isValidVoiceId("pMsXgVXv3BLz_gSXRplE")).toBe(false); // underscore
      expect(isValidVoiceId("pMsXgVXv3BLz gSXRplE")).toBe(false); // space
      expect(isValidVoiceId("../../../etc/passwd")).toBe(false); // path traversal
      expect(isValidVoiceId("voice?param=value")).toBe(false); // query string
    });
  });

  describe("isValidOpenAIVoice", () => {
    it("should accept all valid OpenAI voices", () => {
      const validVoices = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];
      for (const voice of validVoices) {
        expect(isValidOpenAIVoice(voice)).toBe(true);
      }
    });

    it("should reject invalid voice names", () => {
      expect(isValidOpenAIVoice("invalid")).toBe(false);
      expect(isValidOpenAIVoice("")).toBe(false);
      expect(isValidOpenAIVoice("ALLOY")).toBe(false); // case sensitive
      expect(isValidOpenAIVoice("alloy ")).toBe(false); // trailing space
      expect(isValidOpenAIVoice(" alloy")).toBe(false); // leading space
    });
  });

  describe("isValidOpenAIModel", () => {
    it("should accept gpt-4o-mini-tts model", () => {
      expect(isValidOpenAIModel("gpt-4o-mini-tts")).toBe(true);
    });

    it("should reject other models", () => {
      expect(isValidOpenAIModel("tts-1")).toBe(false);
      expect(isValidOpenAIModel("tts-1-hd")).toBe(false);
      expect(isValidOpenAIModel("invalid")).toBe(false);
      expect(isValidOpenAIModel("")).toBe(false);
      expect(isValidOpenAIModel("gpt-4")).toBe(false);
    });
  });

  describe("OPENAI_TTS_MODELS", () => {
    it("should contain only gpt-4o-mini-tts", () => {
      expect(OPENAI_TTS_MODELS).toContain("gpt-4o-mini-tts");
      expect(OPENAI_TTS_MODELS).toHaveLength(1);
    });

    it("should be a non-empty array", () => {
      expect(Array.isArray(OPENAI_TTS_MODELS)).toBe(true);
      expect(OPENAI_TTS_MODELS.length).toBeGreaterThan(0);
    });
  });

  describe("summarizeText", () => {
    const mockApiKey = "test-api-key";
    const originalFetch = globalThis.fetch;

    beforeEach(() => {
      vi.useFakeTimers({ shouldAdvanceTime: true });
    });

    afterEach(() => {
      globalThis.fetch = originalFetch;
      vi.useRealTimers();
    });

    it("should summarize text and return result with metrics", async () => {
      const mockSummary = "This is a summarized version of the text.";
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: true,
        json: () => Promise.resolve({
          choices: [{ message: { content: mockSummary } }],
        }),
      });

      const longText = "A".repeat(2000); // Text longer than default limit
      const result = await summarizeText(longText, 1500, mockApiKey);

      expect(result.summary).toBe(mockSummary);
      expect(result.inputLength).toBe(2000);
      expect(result.outputLength).toBe(mockSummary.length);
      expect(result.latencyMs).toBeGreaterThanOrEqual(0);
      expect(globalThis.fetch).toHaveBeenCalledTimes(1);
    });

    it("should call OpenAI API with correct parameters", async () => {
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: true,
        json: () => Promise.resolve({
          choices: [{ message: { content: "Summary" } }],
        }),
      });

      await summarizeText("Long text to summarize", 500, mockApiKey);

      expect(globalThis.fetch).toHaveBeenCalledWith(
        "https://api.openai.com/v1/chat/completions",
        expect.objectContaining({
          method: "POST",
          headers: {
            Authorization: `Bearer ${mockApiKey}`,
            "Content-Type": "application/json",
          },
        })
      );

      const callArgs = (globalThis.fetch as ReturnType<typeof vi.fn>).mock.calls[0];
      const body = JSON.parse(callArgs[1].body);
      expect(body.model).toBe("gpt-4o-mini");
      expect(body.temperature).toBe(0.3);
      expect(body.max_tokens).toBe(250); // Math.ceil(500 / 2)
    });

    it("should reject targetLength below minimum (100)", async () => {
      await expect(summarizeText("text", 99, mockApiKey)).rejects.toThrow(
        "Invalid targetLength: 99"
      );
    });

    it("should reject targetLength above maximum (10000)", async () => {
      await expect(summarizeText("text", 10001, mockApiKey)).rejects.toThrow(
        "Invalid targetLength: 10001"
      );
    });

    it("should accept targetLength at boundaries", async () => {
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: true,
        json: () => Promise.resolve({
          choices: [{ message: { content: "Summary" } }],
        }),
      });

      // Min boundary
      await expect(summarizeText("text", 100, mockApiKey)).resolves.toBeDefined();
      // Max boundary
      await expect(summarizeText("text", 10000, mockApiKey)).resolves.toBeDefined();
    });

    it("should throw error when API returns non-ok response", async () => {
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: false,
        status: 500,
      });

      await expect(summarizeText("text", 500, mockApiKey)).rejects.toThrow(
        "Summarization service unavailable"
      );
    });

    it("should throw error when no summary is returned", async () => {
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: true,
        json: () => Promise.resolve({
          choices: [],
        }),
      });

      await expect(summarizeText("text", 500, mockApiKey)).rejects.toThrow(
        "No summary returned"
      );
    });

    it("should throw error when summary content is empty", async () => {
      globalThis.fetch = vi.fn().mockResolvedValue({
        ok: true,
        json: () => Promise.resolve({
          choices: [{ message: { content: " " } }], // whitespace only
        }),
      });

      await expect(summarizeText("text", 500, mockApiKey)).rejects.toThrow(
        "No summary returned"
      );
    });
  });
});
File diff suppressed because it is too large
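The suppressed diff above most likely belongs to the extension's main `index.ts` (the package manifest below points `main` at `index.ts`, and the tests import from `./index.js`). As a rough reconstruction of one piece of it, here is a `summarizeText` sketch consistent with the unit tests above; everything the tests do not pin down (prompt wording, timeout handling) is an assumption.

```ts
// Sketch only: NOT the suppressed index.ts. Behavior follows the unit tests above;
// the prompt text and the absence of timeout handling are assumptions.
interface SummarizeResult {
  summary: string;
  inputLength: number;
  outputLength: number;
  latencyMs: number;
}

async function summarizeText(
  text: string,
  targetLength: number,
  apiKey: string,
): Promise<SummarizeResult> {
  // Tests require targetLength in [100, 10000].
  if (targetLength < 100 || targetLength > 10000) {
    throw new Error(`Invalid targetLength: ${targetLength}`);
  }

  const started = Date.now();
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      temperature: 0.3,
      max_tokens: Math.ceil(targetLength / 2),
      messages: [
        // The prompt wording below is an assumption.
        { role: "system", content: `Summarize the user's text in at most ${targetLength} characters.` },
        { role: "user", content: text },
      ],
    }),
  });

  if (!response.ok) {
    throw new Error("Summarization service unavailable");
  }

  const data = await response.json();
  const summary = data?.choices?.[0]?.message?.content?.trim();
  if (!summary) {
    throw new Error("No summary returned");
  }

  return {
    summary,
    inputLength: text.length,
    outputLength: summary.length,
    latencyMs: Date.now() - started,
  };
}
```
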
@@ -1,8 +0,0 @@
{
  "name": "@clawdbot/telegram-tts",
  "version": "0.3.0",
  "private": true,
  "description": "Text-to-speech for chat responses using ElevenLabs or OpenAI",
  "main": "index.ts",
  "keywords": ["clawdbot", "tts", "elevenlabs", "openai", "telegram", "voice"]
}