# Pixelle-Video Capabilities Guide

Complete guide to using LLM, TTS, and Image generation capabilities.

## Overview

Pixelle-Video provides three core AI capabilities:

- **LLM**: Text generation using LiteLLM (supports 100+ models)
- **TTS**: Text-to-speech using Edge TTS (free, 400+ voices)
- **Image**: Image generation using ComfyKit (local or cloud)
## Quick Start

```python
from pixelle_video.service import pixelle_video

# LLM - Generate text
answer = await pixelle_video.llm("Summarize 'Atomic Habits' in 3 sentences")

# TTS - Generate speech
audio_path = await pixelle_video.tts("Hello, world!")

# Image - Generate images
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design"
)
```
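All three calls are coroutines, so outside of an async context they need an event loop. A minimal sketch of the same calls wrapped in `asyncio.run` for a standalone script:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def main():
    # The same three calls as above, awaited inside one coroutine
    answer = await pixelle_video.llm("Summarize 'Atomic Habits' in 3 sentences")
    audio_path = await pixelle_video.tts("Hello, world!")
    image_url = await pixelle_video.image(
        workflow="workflows/book_cover_simple.json",
        prompt="minimalist book cover design"
    )
    print(answer, audio_path, image_url)


asyncio.run(main())
```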
## 1. LLM (Large Language Model)

### Configuration

Edit `config.yaml`:

```yaml
llm:
  default: qwen  # Choose: qwen, openai, deepseek, ollama

  qwen:
    api_key: "your-dashscope-api-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "openai/qwen-max"

  openai:
    api_key: "your-openai-api-key"
    model: "gpt-4"

  deepseek:
    api_key: "your-deepseek-api-key"
    base_url: "https://api.deepseek.com"
    model: "openai/deepseek-chat"

  ollama:
    base_url: "http://localhost:11434"
    model: "ollama/llama3.2"
```
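Because the LLM layer runs on LiteLLM, one way to sanity-check a provider's credentials outside of Pixelle-Video is to call LiteLLM directly with the same values as the `qwen` block above. A sketch, assuming `litellm` is importable in your environment (it backs the LLM capability):

```python
# Standalone credential check; mirrors the qwen entry in config.yaml
from litellm import completion

response = completion(
    model="openai/qwen-max",
    messages=[{"role": "user", "content": "ping"}],
    api_key="your-dashscope-api-key",
    api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
print(response.choices[0].message.content)
```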
### Usage

```python
# Basic usage
answer = await pixelle_video.llm("What is machine learning?")

# With parameters
answer = await pixelle_video.llm(
    prompt="Explain atomic habits",
    temperature=0.7,  # 0.0-2.0 (lower = more deterministic)
    max_tokens=2000
)
```
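When a later step needs structured text (for example, a list of scenes), one pattern is to ask for numbered lines and split the reply yourself. A sketch that uses only the `llm()` call shown above; the prompt wording and the parsing are illustrative, not part of the API:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def outline_scenes(topic: str) -> list[str]:
    # A low temperature keeps the numbered format stable
    answer = await pixelle_video.llm(
        prompt=f"List 5 short scene descriptions for a video about {topic}, "
               "one per line, numbered 1-5.",
        temperature=0.2,
        max_tokens=500
    )
    # Strip the leading "N." from each non-empty line
    return [
        line.split(".", 1)[-1].strip()
        for line in answer.splitlines()
        if line.strip()
    ]


print(asyncio.run(outline_scenes("atomic habits")))
```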
### Environment Variables (Alternative)

Instead of `config.yaml`, you can use environment variables:

```bash
# Qwen
export DASHSCOPE_API_KEY="your-key"

# OpenAI
export OPENAI_API_KEY="your-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-key"
```
## 2. TTS (Text-to-Speech)

### Configuration

Edit `config.yaml`:

```yaml
tts:
  default: edge

  edge:
    # No configuration needed - free to use!
```
### Usage

```python
# Basic usage (auto-generates a temp path)
audio_path = await pixelle_video.tts("Hello, world!")
# Returns: "temp/abc123def456.mp3"

# With Chinese text
audio_path = await pixelle_video.tts(
    text="你好,世界!",
    voice="zh-CN-YunjianNeural"
)

# With custom parameters
audio_path = await pixelle_video.tts(
    text="Welcome to Pixelle-Video",
    voice="en-US-JennyNeural",
    rate="+20%",   # Speed: +50% = faster, -20% = slower
    volume="+0%",
    pitch="+0Hz"
)

# Specify an output path
audio_path = await pixelle_video.tts(
    text="Hello",
    output_path="output/greeting.mp3"
)
```
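For multi-paragraph narration, generating one clip per paragraph keeps the audio easy to re-record and stitch later. A sketch built only on the `tts()` parameters shown above; the file-naming scheme is illustrative:

```python
import asyncio

from pixelle_video.service import pixelle_video

PARAGRAPHS = [
    "Small habits compound over time.",
    "Systems matter more than goals.",
]


async def narrate(paragraphs: list[str]) -> list[str]:
    paths = []
    for i, text in enumerate(paragraphs):
        # One MP3 per paragraph, named by position
        path = await pixelle_video.tts(
            text=text,
            voice="en-US-JennyNeural",
            output_path=f"output/narration_{i:02d}.mp3"
        )
        paths.append(path)
    return paths


print(asyncio.run(narrate(PARAGRAPHS)))
```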
### Popular Voices

Chinese:

- `zh-CN-YunjianNeural` (male, default)
- `zh-CN-XiaoxiaoNeural` (female)
- `zh-CN-YunxiNeural` (male)
- `zh-CN-XiaoyiNeural` (female)

English:

- `en-US-JennyNeural` (female)
- `en-US-GuyNeural` (male)
- `en-GB-SoniaNeural` (female, British)
### List All Voices

```python
# Get all available voices
voices = await pixelle_video.tts.list_voices()

# Get Chinese voices only
voices = await pixelle_video.tts.list_voices(locale="zh-CN")

# Get English voices only
voices = await pixelle_video.tts.list_voices(locale="en-US")
```
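The shape of each returned entry isn't documented here. If it follows Edge TTS's own voice metadata (dicts with `ShortName` and `Gender` keys), filtering could look like the sketch below; treat those field names as an assumption and inspect a real entry first:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def female_chinese_voices() -> list[str]:
    voices = await pixelle_video.tts.list_voices(locale="zh-CN")
    if voices:
        print(voices[0])  # Inspect the actual shape before relying on field names
    # ASSUMPTION: entries are dicts shaped like Edge TTS voice metadata
    return [v["ShortName"] for v in voices if v.get("Gender") == "Female"]


print(asyncio.run(female_chinese_voices()))
```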
## 3. Image Generation

### Configuration

Edit `config.yaml`:

```yaml
image:
  default: comfykit

  comfykit:
    # Local ComfyUI (optional, default: http://127.0.0.1:8188)
    comfyui_url: "http://127.0.0.1:8188"

    # RunningHub cloud (optional)
    runninghub_api_key: "rh-key-xxx"
```
### Usage

```python
# Basic usage (local ComfyUI)
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design, blue and white"
)

# With full parameters
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="book cover for 'Atomic Habits', professional, minimalist",
    negative_prompt="ugly, blurry, low quality",
    width=1024,
    height=1536,
    steps=20,
    seed=42
)

# Using RunningHub cloud
image_url = await pixelle_video.image(
    workflow="12345",  # RunningHub workflow ID
    prompt="a beautiful landscape"
)

# Check available workflows
workflows = pixelle_video.image.list_workflows()
print(f"Available workflows: {workflows}")
```
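Because `seed` is an explicit parameter, producing several candidate images to pick from is just a loop over seeds. A sketch reusing the workflow and dimensions from the examples above:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def cover_variants(prompt: str, seeds: list[int]) -> list[str]:
    urls = []
    for seed in seeds:
        # Same prompt, different seed -> a different composition each time
        url = await pixelle_video.image(
            workflow="workflows/book_cover_simple.json",
            prompt=prompt,
            width=1024,
            height=1536,
            seed=seed
        )
        urls.append(url)
    return urls


print(asyncio.run(cover_variants("minimalist book cover, blue and white", [1, 2, 3])))
```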
### Environment Variables (Alternative)

```bash
# Local ComfyUI
export COMFYUI_BASE_URL="http://127.0.0.1:8188"

# RunningHub cloud
export RUNNINGHUB_API_KEY="rh-key-xxx"
```
### Workflow DSL

Pixelle-Video uses ComfyKit's DSL for workflow parameters:

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "$prompt!"
    },
    "inputs": {
      "text": "default prompt",
      "clip": ["4", 1]
    }
  }
}
```
DSL Markers:

- `$param!` - Required parameter
- `$param` - Optional parameter
- `$param~` - Upload parameter (for images/audio/video)
- `$output.name` - Output variable
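In practice the markers surface as keyword arguments on the capability call: a node titled `$prompt!` must receive `prompt=...`, while a node titled `$seed` could be omitted. A sketch under that assumption; the workflow file name is hypothetical:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def run():
    # "$prompt!" in the workflow -> prompt= is required here
    # "$seed" in the workflow    -> seed= is optional and may be dropped
    return await pixelle_video.image(
        workflow="workflows/example_with_seed.json",  # hypothetical file
        prompt="a beautiful landscape",
        seed=42
    )


print(asyncio.run(run()))
```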
## Combined Workflow Example

Generate a complete book cover with narration:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def create_book_content(book_title, author):
    """Generate book summary, audio, and cover image."""
    # 1. Generate book summary with LLM
    summary = await pixelle_video.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )
    print(f"Summary: {summary}")

    # 2. Generate audio narration with TTS
    audio_path = await pixelle_video.tts(
        text=summary,
        voice="en-US-JennyNeural"
    )
    print(f"Audio: {audio_path}")

    # 3. Generate book cover image
    image_url = await pixelle_video.image(
        workflow="workflows/book_cover_simple.json",
        prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
        width=1024,
        height=1536
    )
    print(f"Cover: {image_url}")

    return {
        "summary": summary,
        "audio": audio_path,
        "cover": image_url
    }


# Run
result = asyncio.run(create_book_content("Atomic Habits", "James Clear"))
```
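The narration and the cover both depend only on the summary, so those two steps can run concurrently. The same pipeline sketched with `asyncio.gather`:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def create_book_content_concurrent(book_title, author):
    summary = await pixelle_video.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )

    # TTS and image generation are independent once the summary exists
    audio_path, image_url = await asyncio.gather(
        pixelle_video.tts(text=summary, voice="en-US-JennyNeural"),
        pixelle_video.image(
            workflow="workflows/book_cover_simple.json",
            prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
            width=1024,
            height=1536
        )
    )
    return {"summary": summary, "audio": audio_path, "cover": image_url}


result = asyncio.run(create_book_content_concurrent("Atomic Habits", "James Clear"))
```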
## Troubleshooting

### LLM Issues

**"API key not found"**

- Make sure you've set the API key in `config.yaml` or environment variables
- For Qwen: `DASHSCOPE_API_KEY`
- For OpenAI: `OPENAI_API_KEY`
- For DeepSeek: `DEEPSEEK_API_KEY`

**"Connection error"**

- Check `base_url` in config
- Verify the API endpoint is accessible
- For Ollama, make sure the server is running (`ollama serve`)
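For Ollama specifically, its REST API's `/api/tags` endpoint lists installed models, which separates "server down" from "model missing". A standard-library probe sketch:

```python
import json
import urllib.request

# Matches the ollama base_url from config.yaml
url = "http://localhost:11434/api/tags"
try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]
        print("Ollama is up; installed models:", models)
except OSError as exc:
    print("Ollama is unreachable - is `ollama serve` running?", exc)
```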
### TTS Issues

**"SSL error"**

- Edge TTS is free but requires an internet connection
- SSL verification is disabled by default for development
### Image Issues

**"ComfyUI connection refused"**

- Make sure ComfyUI is running at http://127.0.0.1:8188
- Or configure a RunningHub API key for cloud execution
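For the connection-refused case, ComfyUI's `/system_stats` endpoint makes a quick health check easy. A standard-library sketch:

```python
import json
import urllib.request

# Matches the default comfyui_url from config.yaml
url = "http://127.0.0.1:8188/system_stats"
try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        print("ComfyUI is up:", json.load(resp).get("system", {}))
except OSError as exc:
    print("ComfyUI is unreachable - start it or configure RunningHub.", exc)
```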
"Workflow file not found"
- Check workflow path is correct
- Use relative path from project root:
workflows/your_workflow.json
"No images generated"
- Check workflow has
SaveImagenode - Verify workflow parameters are correct
- Check ComfyUI logs for errors
## Next Steps

- See the `/examples/` directory for complete examples
- Run `python test_integration.py` to test all capabilities
- Create custom workflows in the `/workflows/` directory
- Check the ComfyKit documentation: https://puke3615.github.io/ComfyKit
Happy creating with Pixelle-Video! 📚🎬