init

docs/capabilities-guide.md (new file, 346 lines)

# ReelForge Capabilities Guide

> Complete guide to using LLM, TTS, and Image generation capabilities

## Overview

ReelForge provides three core AI capabilities:
- **LLM**: Text generation using LiteLLM (supports 100+ models)
- **TTS**: Text-to-speech using Edge TTS (free, 400+ voices)
- **Image**: Image generation using ComfyKit (local or cloud)

## Quick Start

```python
from reelforge.service import reelforge

# LLM - Generate text
answer = await reelforge.llm("Summarize 'Atomic Habits' in 3 sentences")

# TTS - Generate speech
audio_path = await reelforge.tts("Hello, world!")

# Image - Generate images
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design"
)
```
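
The quick-start calls above use `await`, so they have to run inside an async function. A minimal runnable sketch using only the calls shown above (the `main` function name is just illustrative):

```python
import asyncio

from reelforge.service import reelforge


async def main():
    # Any of the capability calls above can go here.
    answer = await reelforge.llm("Summarize 'Atomic Habits' in 3 sentences")
    print(answer)


asyncio.run(main())
```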

---

## 1. LLM (Large Language Model)

### Configuration

Edit `config.yaml`:

```yaml
llm:
  default: qwen  # Choose: qwen, openai, deepseek, ollama

  qwen:
    api_key: "your-dashscope-api-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "openai/qwen-max"

  openai:
    api_key: "your-openai-api-key"
    model: "gpt-4"

  deepseek:
    api_key: "your-deepseek-api-key"
    base_url: "https://api.deepseek.com"
    model: "openai/deepseek-chat"

  ollama:
    base_url: "http://localhost:11434"
    model: "ollama/llama3.2"
```

### Usage

```python
# Basic usage
answer = await reelforge.llm("What is machine learning?")

# With parameters
answer = await reelforge.llm(
    prompt="Explain atomic habits",
    temperature=0.7,  # 0.0-2.0 (lower = more deterministic)
    max_tokens=2000
)

# Check active LLM
print(f"Using: {reelforge.llm.active}")
print(f"Available: {reelforge.llm.available}")
```
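
Provider calls can fail transiently (network hiccups, rate limits; see Troubleshooting below). A minimal retry sketch around the documented `reelforge.llm` call; the `llm_with_retry` helper, attempt count, and delay are illustrative, not part of ReelForge:

```python
import asyncio

from reelforge.service import reelforge


async def llm_with_retry(prompt: str, attempts: int = 3, delay: float = 2.0) -> str:
    """Call reelforge.llm, retrying a few times on transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            return await reelforge.llm(prompt)
        except Exception as exc:  # narrow this to your provider's error types if you can
            if attempt == attempts:
                raise
            print(f"LLM call failed ({exc}), retrying in {delay}s...")
            await asyncio.sleep(delay)


# Usage (inside an async function):
# answer = await llm_with_retry("What is machine learning?")
```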

### Environment Variables (Alternative)

Instead of `config.yaml`, you can use environment variables:

```bash
# Qwen
export DASHSCOPE_API_KEY="your-key"

# OpenAI
export OPENAI_API_KEY="your-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-key"
```

---

## 2. TTS (Text-to-Speech)

### Configuration

Edit `config.yaml`:

```yaml
tts:
  default: edge

  edge:
    # No configuration needed - free to use!
```

### Usage

```python
# Basic usage (auto-generates temp path)
audio_path = await reelforge.tts("Hello, world!")
# Returns: "temp/abc123def456.mp3"

# With Chinese text
audio_path = await reelforge.tts(
    text="你好,世界!",
    voice="zh-CN-YunjianNeural"
)

# With custom parameters
audio_path = await reelforge.tts(
    text="Welcome to ReelForge",
    voice="en-US-JennyNeural",
    rate="+20%",  # Speed: +50% = faster, -20% = slower
    volume="+0%",
    pitch="+0Hz"
)

# Specify output path
audio_path = await reelforge.tts(
    text="Hello",
    output_path="output/greeting.mp3"
)
```
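
Narration for several script lines can be generated concurrently with `asyncio.gather`, assuming the TTS backend tolerates parallel requests. A sketch using only the documented `reelforge.tts` arguments; the `narrate_lines` helper and the sample sentences are made up for illustration:

```python
import asyncio

from reelforge.service import reelforge


async def narrate_lines(lines, voice="en-US-JennyNeural"):
    """Generate one audio file per line, concurrently."""
    tasks = [reelforge.tts(text=line, voice=voice) for line in lines]
    return await asyncio.gather(*tasks)


paths = asyncio.run(narrate_lines([
    "Chapter one: small habits, big results.",
    "Chapter two: habits shape identity.",
]))
print(paths)
```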

### Popular Voices

**Chinese:**
- `zh-CN-YunjianNeural` (male, default)
- `zh-CN-XiaoxiaoNeural` (female)
- `zh-CN-YunxiNeural` (male)
- `zh-CN-XiaoyiNeural` (female)

**English:**
- `en-US-JennyNeural` (female)
- `en-US-GuyNeural` (male)
- `en-GB-SoniaNeural` (female, British)

### List All Voices

```python
# Get all available voices
voices = await reelforge.tts.list_voices()

# Get Chinese voices only
voices = await reelforge.tts.list_voices(locale="zh-CN")

# Get English voices only
voices = await reelforge.tts.list_voices(locale="en-US")
```
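
To pick a voice from the results, print a few entries. This sketch assumes each entry is an Edge TTS-style voice record with `ShortName` and `Gender` keys; adjust it to whatever `list_voices` actually returns:

```python
voices = await reelforge.tts.list_voices(locale="en-US")
for voice in voices[:5]:
    # "ShortName" and "Gender" are assumed Edge TTS keys.
    print(voice.get("ShortName"), voice.get("Gender"))
```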

---

## 3. Image Generation

### Configuration

Edit `config.yaml`:

```yaml
image:
  default: comfykit

  comfykit:
    # Local ComfyUI (optional, default: http://127.0.0.1:8188)
    comfyui_url: "http://127.0.0.1:8188"

    # RunningHub cloud (optional)
    runninghub_api_key: "rh-key-xxx"
```

### Usage

```python
# Basic usage (local ComfyUI)
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design, blue and white"
)

# With full parameters
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="book cover for 'Atomic Habits', professional, minimalist",
    negative_prompt="ugly, blurry, low quality",
    width=1024,
    height=1536,
    steps=20,
    seed=42
)

# Using RunningHub cloud
image_url = await reelforge.image(
    workflow="12345",  # RunningHub workflow ID
    prompt="a beautiful landscape"
)

# Check active generator
print(f"Using: {reelforge.image.active}")
```
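
To compare variants of the same prompt, you can sweep the `seed` parameter. A sketch using only the documented `reelforge.image` arguments (run it inside an async function); the seed values are arbitrary:

```python
seeds = [101, 202, 303]
variants = []
for seed in seeds:
    url = await reelforge.image(
        workflow="workflows/book_cover_simple.json",
        prompt="minimalist book cover design, blue and white",
        seed=seed
    )
    variants.append(url)
print(variants)
```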

### Environment Variables (Alternative)

```bash
# Local ComfyUI
export COMFYUI_BASE_URL="http://127.0.0.1:8188"

# RunningHub cloud
export RUNNINGHUB_API_KEY="rh-key-xxx"
```

### Workflow DSL

ReelForge uses ComfyKit's DSL for workflow parameters:

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "$prompt!"
    },
    "inputs": {
      "text": "default prompt",
      "clip": ["4", 1]
    }
  }
}
```

**DSL Markers:**
- `$param!` - Required parameter
- `$param` - Optional parameter
- `$param~` - Upload parameter (for images/audio/video)
- `$output.name` - Output variable
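
As an illustration, suppose a workflow also tags a sampler node with `$seed` (optional) and an image-loading node with `$image~`. Presumably these map to keyword arguments by name, the same way `$prompt!` maps to `prompt` in the examples above; the workflow path and the extra parameters below are hypothetical:

```python
image_url = await reelforge.image(
    workflow="workflows/your_workflow.json",  # hypothetical workflow using $seed and $image~ markers
    prompt="a beautiful landscape",           # fills the required $prompt! parameter
    seed=42,                                  # assumed to fill the optional $seed parameter
    image="inputs/reference.png"              # assumed to be uploaded for the $image~ parameter
)
```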

---

## Combined Workflow Example

Generate a complete book cover with narration:

```python
import asyncio
from reelforge.service import reelforge

async def create_book_content(book_title, author):
    """Generate book summary, audio, and cover image"""

    # 1. Generate book summary with LLM
    summary = await reelforge.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )
    print(f"Summary: {summary}")

    # 2. Generate audio narration with TTS
    audio_path = await reelforge.tts(
        text=summary,
        voice="en-US-JennyNeural"
    )
    print(f"Audio: {audio_path}")

    # 3. Generate book cover image
    image_url = await reelforge.image(
        workflow="workflows/book_cover_simple.json",
        prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
        width=1024,
        height=1536
    )
    print(f"Cover: {image_url}")

    return {
        "summary": summary,
        "audio": audio_path,
        "cover": image_url
    }

# Run
result = asyncio.run(create_book_content("Atomic Habits", "James Clear"))
```
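
Several books can be processed concurrently by fanning the same coroutine out with `asyncio.gather`. A short sketch that reuses `create_book_content` and the imports from the example above; the `create_many` helper is illustrative and assumes your providers tolerate parallel requests:

```python
async def create_many(books):
    """Run create_book_content for several (title, author) pairs concurrently."""
    tasks = [create_book_content(title, author) for title, author in books]
    return await asyncio.gather(*tasks)


results = asyncio.run(create_many([
    ("Atomic Habits", "James Clear"),
    ("Deep Work", "Cal Newport"),
]))
```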

---

## Troubleshooting

### LLM Issues

**"API key not found"**
- Make sure you've set the API key in `config.yaml` or in environment variables
- For Qwen: `DASHSCOPE_API_KEY`
- For OpenAI: `OPENAI_API_KEY`
- For DeepSeek: `DEEPSEEK_API_KEY`

**"Connection error"**
- Check `base_url` in the config
- Verify the API endpoint is accessible
- For Ollama, make sure the server is running (`ollama serve`)

### TTS Issues

**"SSL error"**
- Edge TTS is free but requires an internet connection
- SSL verification is disabled by default for development

### Image Issues

**"ComfyUI connection refused"**
- Make sure ComfyUI is running at http://127.0.0.1:8188 (a quick check is sketched below)
- Or configure a RunningHub API key for cloud execution
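
A quick way to confirm that ComfyUI is reachable before digging into the workflow itself; a minimal sketch using only the Python standard library against the default local URL:

```python
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"

try:
    with urllib.request.urlopen(COMFYUI_URL, timeout=5) as response:
        print(f"ComfyUI is reachable (HTTP {response.status})")
except OSError as exc:
    print(f"ComfyUI is not reachable at {COMFYUI_URL}: {exc}")
```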

**"Workflow file not found"**
- Check that the workflow path is correct
- Use a relative path from the project root: `workflows/your_workflow.json`

**"No images generated"**
- Check that the workflow has a `SaveImage` node
- Verify the workflow parameters are correct
- Check the ComfyUI logs for errors

---

## Next Steps

- See the `/examples/` directory for complete examples
- Run `python test_integration.py` to test all capabilities
- Create custom workflows in the `/workflows/` directory
- Check the ComfyKit documentation: https://puke3615.github.io/ComfyKit

---

**Happy creating with ReelForge!** 📚🎬