Files
AI-Video/docs/capabilities-guide.md
2025-11-07 16:59:12 +08:00

7.1 KiB

ReelForge Capabilities Guide

Complete guide to using LLM, TTS, and Image generation capabilities

Overview

ReelForge provides three core AI capabilities:

  • LLM: Text generation using LiteLLM (supports 100+ models)
  • TTS: Text-to-speech using Edge TTS (free, 400+ voices)
  • Image: Image generation using ComfyKit (local or cloud)

Quick Start

from reelforge.service import reelforge

# LLM - Generate text
answer = await reelforge.llm("Summarize 'Atomic Habits' in 3 sentences")

# TTS - Generate speech
audio_path = await reelforge.tts("Hello, world!")

# Image - Generate images
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design"
)

1. LLM (Large Language Model)

Configuration

Edit config.yaml:

llm:
  default: qwen  # Choose: qwen, openai, deepseek, ollama
  
  qwen:
    api_key: "your-dashscope-api-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "openai/qwen-max"
  
  openai:
    api_key: "your-openai-api-key"
    model: "gpt-4"
  
  deepseek:
    api_key: "your-deepseek-api-key"
    base_url: "https://api.deepseek.com"
    model: "openai/deepseek-chat"
  
  ollama:
    base_url: "http://localhost:11434"
    model: "ollama/llama3.2"

Usage

# Basic usage
answer = await reelforge.llm("What is machine learning?")

# With parameters
answer = await reelforge.llm(
    prompt="Explain atomic habits",
    temperature=0.7,  # 0.0-2.0 (lower = more deterministic)
    max_tokens=2000
)

Environment Variables (Alternative)

Instead of config.yaml, you can use environment variables:

# Qwen
export DASHSCOPE_API_KEY="your-key"

# OpenAI
export OPENAI_API_KEY="your-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-key"

2. TTS (Text-to-Speech)

Configuration

Edit config.yaml:

tts:
  default: edge
  
  edge:
    # No configuration needed - free to use!

Usage

# Basic usage (auto-generates temp path)
audio_path = await reelforge.tts("Hello, world!")
# Returns: "temp/abc123def456.mp3"

# With Chinese text
audio_path = await reelforge.tts(
    text="你好,世界!",
    voice="zh-CN-YunjianNeural"
)

# With custom parameters
audio_path = await reelforge.tts(
    text="Welcome to ReelForge",
    voice="en-US-JennyNeural",
    rate="+20%",  # Speed: +50% = faster, -20% = slower
    volume="+0%",
    pitch="+0Hz"
)

# Specify output path
audio_path = await reelforge.tts(
    text="Hello",
    output_path="output/greeting.mp3"
)

Chinese:

  • zh-CN-YunjianNeural (male, default)
  • zh-CN-XiaoxiaoNeural (female)
  • zh-CN-YunxiNeural (male)
  • zh-CN-XiaoyiNeural (female)

English:

  • en-US-JennyNeural (female)
  • en-US-GuyNeural (male)
  • en-GB-SoniaNeural (female, British)

List All Voices

# Get all available voices
voices = await reelforge.tts.list_voices()

# Get Chinese voices only
voices = await reelforge.tts.list_voices(locale="zh-CN")

# Get English voices only
voices = await reelforge.tts.list_voices(locale="en-US")

3. Image Generation

Configuration

Edit config.yaml:

image:
  default: comfykit
  
  comfykit:
    # Local ComfyUI (optional, default: http://127.0.0.1:8188)
    comfyui_url: "http://127.0.0.1:8188"
    
    # RunningHub cloud (optional)
    runninghub_api_key: "rh-key-xxx"

Usage

# Basic usage (local ComfyUI)
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design, blue and white"
)

# With full parameters
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="book cover for 'Atomic Habits', professional, minimalist",
    negative_prompt="ugly, blurry, low quality",
    width=1024,
    height=1536,
    steps=20,
    seed=42
)

# Using RunningHub cloud
image_url = await reelforge.image(
    workflow="12345",  # RunningHub workflow ID
    prompt="a beautiful landscape"
)

# Check available workflows
workflows = reelforge.image.list_workflows()
print(f"Available workflows: {workflows}")

Environment Variables (Alternative)

# Local ComfyUI
export COMFYUI_BASE_URL="http://127.0.0.1:8188"

# RunningHub cloud
export RUNNINGHUB_API_KEY="rh-key-xxx"

Workflow DSL

ReelForge uses ComfyKit's DSL for workflow parameters:

{
  "6": {
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "$prompt!"
    },
    "inputs": {
      "text": "default prompt",
      "clip": ["4", 1]
    }
  }
}

DSL Markers:

  • $param! - Required parameter
  • $param - Optional parameter
  • $param~ - Upload parameter (for images/audio/video)
  • $output.name - Output variable

Combined Workflow Example

Generate a complete book cover with narration:

import asyncio
from reelforge.service import reelforge

async def create_book_content(book_title, author):
    """Generate book summary, audio, and cover image"""
    
    # 1. Generate book summary with LLM
    summary = await reelforge.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )
    print(f"Summary: {summary}")
    
    # 2. Generate audio narration with TTS
    audio_path = await reelforge.tts(
        text=summary,
        voice="en-US-JennyNeural"
    )
    print(f"Audio: {audio_path}")
    
    # 3. Generate book cover image
    image_url = await reelforge.image(
        workflow="workflows/book_cover_simple.json",
        prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
        width=1024,
        height=1536
    )
    print(f"Cover: {image_url}")
    
    return {
        "summary": summary,
        "audio": audio_path,
        "cover": image_url
    }

# Run
result = asyncio.run(create_book_content("Atomic Habits", "James Clear"))

Troubleshooting

LLM Issues

"API key not found"

  • Make sure you've set the API key in config.yaml or environment variables
  • For Qwen: DASHSCOPE_API_KEY
  • For OpenAI: OPENAI_API_KEY
  • For DeepSeek: DEEPSEEK_API_KEY

"Connection error"

  • Check base_url in config
  • Verify API endpoint is accessible
  • For Ollama, make sure server is running (ollama serve)

TTS Issues

"SSL error"

  • Edge TTS is free but requires internet connection
  • SSL verification is disabled by default for development

Image Issues

"ComfyUI connection refused"

  • Make sure ComfyUI is running at http://127.0.0.1:8188
  • Or configure RunningHub API key for cloud execution

"Workflow file not found"

  • Check workflow path is correct
  • Use relative path from project root: workflows/your_workflow.json

"No images generated"

  • Check workflow has SaveImage node
  • Verify workflow parameters are correct
  • Check ComfyUI logs for errors

Next Steps

  • See /examples/ directory for complete examples
  • Run python test_integration.py to test all capabilities
  • Create custom workflows in /workflows/ directory
  • Check ComfyKit documentation: https://puke3615.github.io/ComfyKit

Happy creating with ReelForge! 📚🎬