# Pixelle-Video Capabilities Guide

Complete guide to using LLM, TTS, and Image generation capabilities.

## Overview

Pixelle-Video provides three core AI capabilities:

- **LLM**: Text generation using LiteLLM (supports 100+ models)
- **TTS**: Text-to-speech using Edge TTS (free, 400+ voices)
- **Image**: Image generation using ComfyKit (local or cloud)
## Quick Start

```python
from pixelle_video.service import pixelle_video

# LLM - Generate text
answer = await pixelle_video.llm("Summarize 'Atomic Habits' in 3 sentences")

# TTS - Generate speech
audio_path = await pixelle_video.tts("Hello, world!")

# Image - Generate images
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design"
)
```
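All three calls are coroutines, so outside of an async context they need an event loop. A minimal sketch of the same calls wrapped in `asyncio.run` for a standalone script:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def main():
    # The same three calls as above, awaited inside one coroutine
    answer = await pixelle_video.llm("Summarize 'Atomic Habits' in 3 sentences")
    audio_path = await pixelle_video.tts("Hello, world!")
    image_url = await pixelle_video.image(
        workflow="workflows/book_cover_simple.json",
        prompt="minimalist book cover design"
    )
    print(answer, audio_path, image_url)


asyncio.run(main())
```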
## 1. LLM (Large Language Model)

### Configuration

Edit `config.yaml`:

```yaml
llm:
  default: qwen  # Choose: qwen, openai, deepseek, ollama

  qwen:
    api_key: "your-dashscope-api-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "openai/qwen-max"

  openai:
    api_key: "your-openai-api-key"
    model: "gpt-4"

  deepseek:
    api_key: "your-deepseek-api-key"
    base_url: "https://api.deepseek.com"
    model: "openai/deepseek-chat"

  ollama:
    base_url: "http://localhost:11434"
    model: "ollama/llama3.2"
```
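Because the LLM layer runs on LiteLLM, one way to sanity-check a provider's credentials outside of Pixelle-Video is to call LiteLLM directly with the same values as the `qwen` block above. A sketch, assuming `litellm` is importable in your environment (it backs the LLM capability):

```python
# Standalone credential check; mirrors the qwen entry in config.yaml
from litellm import completion

response = completion(
    model="openai/qwen-max",
    messages=[{"role": "user", "content": "ping"}],
    api_key="your-dashscope-api-key",
    api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
print(response.choices[0].message.content)
```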
### Usage

```python
# Basic usage
answer = await pixelle_video.llm("What is machine learning?")

# With parameters
answer = await pixelle_video.llm(
    prompt="Explain atomic habits",
    temperature=0.7,  # 0.0-2.0 (lower = more deterministic)
    max_tokens=2000
)
```
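When a later step needs structured text (for example, a list of scenes), one pattern is to ask for numbered lines and split the reply yourself. A sketch that uses only the `llm()` call shown above; the prompt wording and the parsing are illustrative, not part of the API:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def outline_scenes(topic: str) -> list[str]:
    # A low temperature keeps the numbered format stable
    answer = await pixelle_video.llm(
        prompt=f"List 5 short scene descriptions for a video about {topic}, "
               "one per line, numbered 1-5.",
        temperature=0.2,
        max_tokens=500
    )
    # Strip the leading "N." from each non-empty line
    return [
        line.split(".", 1)[-1].strip()
        for line in answer.splitlines()
        if line.strip()
    ]


print(asyncio.run(outline_scenes("atomic habits")))
```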
### Environment Variables (Alternative)

Instead of `config.yaml`, you can use environment variables:

```bash
# Qwen
export DASHSCOPE_API_KEY="your-key"

# OpenAI
export OPENAI_API_KEY="your-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-key"
```
## 2. TTS (Text-to-Speech)

### Configuration

Edit `config.yaml`:

```yaml
tts:
  default: edge

  edge:
    # No configuration needed - free to use!
```
### Usage

```python
# Basic usage (auto-generates a temp path)
audio_path = await pixelle_video.tts("Hello, world!")
# Returns: "temp/abc123def456.mp3"

# With Chinese text
audio_path = await pixelle_video.tts(
    text="你好,世界!",
    voice="zh-CN-YunjianNeural"
)

# With custom parameters
audio_path = await pixelle_video.tts(
    text="Welcome to Pixelle-Video",
    voice="en-US-JennyNeural",
    rate="+20%",   # Speed: +50% = faster, -20% = slower
    volume="+0%",
    pitch="+0Hz"
)

# Specify an output path
audio_path = await pixelle_video.tts(
    text="Hello",
    output_path="output/greeting.mp3"
)
```
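For multi-paragraph narration, generating one clip per paragraph keeps the audio easy to re-record and stitch later. A sketch built only on the `tts()` parameters shown above; the file-naming scheme is illustrative:

```python
import asyncio

from pixelle_video.service import pixelle_video

PARAGRAPHS = [
    "Small habits compound over time.",
    "Systems matter more than goals.",
]


async def narrate(paragraphs: list[str]) -> list[str]:
    paths = []
    for i, text in enumerate(paragraphs):
        # One MP3 per paragraph, named by position
        path = await pixelle_video.tts(
            text=text,
            voice="en-US-JennyNeural",
            output_path=f"output/narration_{i:02d}.mp3"
        )
        paths.append(path)
    return paths


print(asyncio.run(narrate(PARAGRAPHS)))
```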
### Popular Voices

Chinese:

- `zh-CN-YunjianNeural` (male, default)
- `zh-CN-XiaoxiaoNeural` (female)
- `zh-CN-YunxiNeural` (male)
- `zh-CN-XiaoyiNeural` (female)

English:

- `en-US-JennyNeural` (female)
- `en-US-GuyNeural` (male)
- `en-GB-SoniaNeural` (female, British)
### List All Voices

```python
# Get all available voices
voices = await pixelle_video.tts.list_voices()

# Get Chinese voices only
voices = await pixelle_video.tts.list_voices(locale="zh-CN")

# Get English voices only
voices = await pixelle_video.tts.list_voices(locale="en-US")
```
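The shape of each returned entry isn't documented here. If it follows Edge TTS's own voice metadata (dicts with `ShortName` and `Gender` keys), filtering could look like the sketch below; treat those field names as an assumption and inspect a real entry first:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def female_chinese_voices() -> list[str]:
    voices = await pixelle_video.tts.list_voices(locale="zh-CN")
    if voices:
        print(voices[0])  # Inspect the actual shape before relying on field names
    # ASSUMPTION: entries are dicts shaped like Edge TTS voice metadata
    return [v["ShortName"] for v in voices if v.get("Gender") == "Female"]


print(asyncio.run(female_chinese_voices()))
```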
## 3. Image Generation

### Configuration

Edit `config.yaml`:

```yaml
image:
  default: comfykit

  comfykit:
    # Local ComfyUI (optional, default: http://127.0.0.1:8188)
    comfyui_url: "http://127.0.0.1:8188"

    # RunningHub cloud (optional)
    runninghub_api_key: "rh-key-xxx"
```
### Usage

```python
# Basic usage (local ComfyUI)
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design, blue and white"
)

# With full parameters
image_url = await pixelle_video.image(
    workflow="workflows/book_cover_simple.json",
    prompt="book cover for 'Atomic Habits', professional, minimalist",
    negative_prompt="ugly, blurry, low quality",
    width=1024,
    height=1536,
    steps=20,
    seed=42
)

# Using RunningHub cloud
image_url = await pixelle_video.image(
    workflow="12345",  # RunningHub workflow ID
    prompt="a beautiful landscape"
)

# Check available workflows
workflows = pixelle_video.image.list_workflows()
print(f"Available workflows: {workflows}")
```
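Because `seed` is an explicit parameter, producing several candidate images to pick from is just a loop over seeds. A sketch reusing the workflow and dimensions from the examples above:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def cover_variants(prompt: str, seeds: list[int]) -> list[str]:
    urls = []
    for seed in seeds:
        # Same prompt, different seed -> a different composition each time
        url = await pixelle_video.image(
            workflow="workflows/book_cover_simple.json",
            prompt=prompt,
            width=1024,
            height=1536,
            seed=seed
        )
        urls.append(url)
    return urls


print(asyncio.run(cover_variants("minimalist book cover, blue and white", [1, 2, 3])))
```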
### Environment Variables (Alternative)

```bash
# Local ComfyUI
export COMFYUI_BASE_URL="http://127.0.0.1:8188"

# RunningHub cloud
export RUNNINGHUB_API_KEY="rh-key-xxx"
```
### Workflow DSL

Pixelle-Video uses ComfyKit's DSL for workflow parameters:

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "$prompt!"
    },
    "inputs": {
      "text": "default prompt",
      "clip": ["4", 1]
    }
  }
}
```
DSL Markers:

- `$param!` - Required parameter
- `$param` - Optional parameter
- `$param~` - Upload parameter (for images/audio/video)
- `$output.name` - Output variable
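In practice the markers surface as keyword arguments on the capability call: a node titled `$prompt!` must receive `prompt=...`, while a node titled `$seed` could be omitted. A sketch under that assumption; the workflow file name is hypothetical:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def run():
    # "$prompt!" in the workflow -> prompt= is required here
    # "$seed" in the workflow    -> seed= is optional and may be dropped
    return await pixelle_video.image(
        workflow="workflows/example_with_seed.json",  # hypothetical file
        prompt="a beautiful landscape",
        seed=42
    )


print(asyncio.run(run()))
```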
## Combined Workflow Example

Generate a complete book cover with narration:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def create_book_content(book_title, author):
    """Generate book summary, audio, and cover image."""
    # 1. Generate book summary with LLM
    summary = await pixelle_video.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )
    print(f"Summary: {summary}")

    # 2. Generate audio narration with TTS
    audio_path = await pixelle_video.tts(
        text=summary,
        voice="en-US-JennyNeural"
    )
    print(f"Audio: {audio_path}")

    # 3. Generate book cover image
    image_url = await pixelle_video.image(
        workflow="workflows/book_cover_simple.json",
        prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
        width=1024,
        height=1536
    )
    print(f"Cover: {image_url}")

    return {
        "summary": summary,
        "audio": audio_path,
        "cover": image_url
    }


# Run
result = asyncio.run(create_book_content("Atomic Habits", "James Clear"))
```
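The narration and the cover both depend only on the summary, so those two steps can run concurrently. The same pipeline sketched with `asyncio.gather`:

```python
import asyncio

from pixelle_video.service import pixelle_video


async def create_book_content_concurrent(book_title, author):
    summary = await pixelle_video.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )

    # TTS and image generation are independent once the summary exists
    audio_path, image_url = await asyncio.gather(
        pixelle_video.tts(text=summary, voice="en-US-JennyNeural"),
        pixelle_video.image(
            workflow="workflows/book_cover_simple.json",
            prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
            width=1024,
            height=1536
        )
    )
    return {"summary": summary, "audio": audio_path, "cover": image_url}


result = asyncio.run(create_book_content_concurrent("Atomic Habits", "James Clear"))
```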
## Troubleshooting

### LLM Issues

**"API key not found"**

- Make sure you've set the API key in `config.yaml` or environment variables
- For Qwen: `DASHSCOPE_API_KEY`
- For OpenAI: `OPENAI_API_KEY`
- For DeepSeek: `DEEPSEEK_API_KEY`

**"Connection error"**

- Check `base_url` in config
- Verify the API endpoint is accessible
- For Ollama, make sure the server is running (`ollama serve`)
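For Ollama specifically, its REST API's `/api/tags` endpoint lists installed models, which separates "server down" from "model missing". A standard-library probe sketch:

```python
import json
import urllib.request

# Matches the ollama base_url from config.yaml
url = "http://localhost:11434/api/tags"
try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]
        print("Ollama is up; installed models:", models)
except OSError as exc:
    print("Ollama is unreachable - is `ollama serve` running?", exc)
```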
### TTS Issues

**"SSL error"**

- Edge TTS is free but requires an internet connection
- SSL verification is disabled by default for development
### Image Issues

**"ComfyUI connection refused"**

- Make sure ComfyUI is running at http://127.0.0.1:8188
- Or configure a RunningHub API key for cloud execution
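For the connection-refused case, ComfyUI's `/system_stats` endpoint makes a quick health check easy. A standard-library sketch:

```python
import json
import urllib.request

# Matches the default comfyui_url from config.yaml
url = "http://127.0.0.1:8188/system_stats"
try:
    with urllib.request.urlopen(url, timeout=5) as resp:
        print("ComfyUI is up:", json.load(resp).get("system", {}))
except OSError as exc:
    print("ComfyUI is unreachable - start it or configure RunningHub.", exc)
```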
"Workflow file not found"
- Check workflow path is correct
- Use relative path from project root:
workflows/your_workflow.json
"No images generated"
- Check workflow has
SaveImagenode - Verify workflow parameters are correct
- Check ComfyUI logs for errors
## Next Steps

- See the `/examples/` directory for complete examples
- Run `python test_integration.py` to test all capabilities
- Create custom workflows in the `/workflows/` directory
- Check the ComfyKit documentation: https://puke3615.github.io/ComfyKit
Happy creating with Pixelle-Video! 📚🎬