init

docs/capabilities-guide.md (new file, 346 lines)

# ReelForge Capabilities Guide

> Complete guide to using LLM, TTS, and Image generation capabilities

## Overview

ReelForge provides three core AI capabilities:
- **LLM**: Text generation using LiteLLM (supports 100+ models)
- **TTS**: Text-to-speech using Edge TTS (free, 400+ voices)
- **Image**: Image generation using ComfyKit (local or cloud)

## Quick Start

```python
from reelforge.service import reelforge

# LLM - Generate text
answer = await reelforge.llm("Summarize 'Atomic Habits' in 3 sentences")

# TTS - Generate speech
audio_path = await reelforge.tts("Hello, world!")

# Image - Generate images
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design"
)
```
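
The quick-start calls above use `await`, so they have to run inside an async function. A minimal runnable sketch using only the calls shown above (the `main` function name is just illustrative):

```python
import asyncio

from reelforge.service import reelforge


async def main():
    # Any of the capability calls above can go here.
    answer = await reelforge.llm("Summarize 'Atomic Habits' in 3 sentences")
    print(answer)


asyncio.run(main())
```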

---

## 1. LLM (Large Language Model)

### Configuration

Edit `config.yaml`:

```yaml
llm:
  default: qwen  # Choose: qwen, openai, deepseek, ollama

  qwen:
    api_key: "your-dashscope-api-key"
    base_url: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    model: "openai/qwen-max"

  openai:
    api_key: "your-openai-api-key"
    model: "gpt-4"

  deepseek:
    api_key: "your-deepseek-api-key"
    base_url: "https://api.deepseek.com"
    model: "openai/deepseek-chat"

  ollama:
    base_url: "http://localhost:11434"
    model: "ollama/llama3.2"
```

### Usage

```python
# Basic usage
answer = await reelforge.llm("What is machine learning?")

# With parameters
answer = await reelforge.llm(
    prompt="Explain atomic habits",
    temperature=0.7,  # 0.0-2.0 (lower = more deterministic)
    max_tokens=2000
)

# Check active LLM
print(f"Using: {reelforge.llm.active}")
print(f"Available: {reelforge.llm.available}")
```
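
Provider calls can fail transiently (network hiccups, rate limits; see Troubleshooting below). A minimal retry sketch around the documented `reelforge.llm` call; the `llm_with_retry` helper, attempt count, and delay are illustrative, not part of ReelForge:

```python
import asyncio

from reelforge.service import reelforge


async def llm_with_retry(prompt: str, attempts: int = 3, delay: float = 2.0) -> str:
    """Call reelforge.llm, retrying a few times on transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            return await reelforge.llm(prompt)
        except Exception as exc:  # narrow this to your provider's error types if you can
            if attempt == attempts:
                raise
            print(f"LLM call failed ({exc}), retrying in {delay}s...")
            await asyncio.sleep(delay)


# Usage (inside an async function):
# answer = await llm_with_retry("What is machine learning?")
```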

### Environment Variables (Alternative)

Instead of `config.yaml`, you can use environment variables:

```bash
# Qwen
export DASHSCOPE_API_KEY="your-key"

# OpenAI
export OPENAI_API_KEY="your-key"

# DeepSeek
export DEEPSEEK_API_KEY="your-key"
```

---

## 2. TTS (Text-to-Speech)

### Configuration

Edit `config.yaml`:

```yaml
tts:
  default: edge

  edge:
    # No configuration needed - free to use!
```

### Usage

```python
# Basic usage (auto-generates temp path)
audio_path = await reelforge.tts("Hello, world!")
# Returns: "temp/abc123def456.mp3"

# With Chinese text
audio_path = await reelforge.tts(
    text="你好,世界!",
    voice="zh-CN-YunjianNeural"
)

# With custom parameters
audio_path = await reelforge.tts(
    text="Welcome to ReelForge",
    voice="en-US-JennyNeural",
    rate="+20%",  # Speed: +50% = faster, -20% = slower
    volume="+0%",
    pitch="+0Hz"
)

# Specify output path
audio_path = await reelforge.tts(
    text="Hello",
    output_path="output/greeting.mp3"
)
```
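
Narration for several script lines can be generated concurrently with `asyncio.gather`, assuming the TTS backend tolerates parallel requests. A sketch using only the documented `reelforge.tts` arguments; the `narrate_lines` helper and the sample sentences are made up for illustration:

```python
import asyncio

from reelforge.service import reelforge


async def narrate_lines(lines, voice="en-US-JennyNeural"):
    """Generate one audio file per line, concurrently."""
    tasks = [reelforge.tts(text=line, voice=voice) for line in lines]
    return await asyncio.gather(*tasks)


paths = asyncio.run(narrate_lines([
    "Chapter one: small habits, big results.",
    "Chapter two: habits shape identity.",
]))
print(paths)
```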

### Popular Voices

**Chinese:**
- `zh-CN-YunjianNeural` (male, default)
- `zh-CN-XiaoxiaoNeural` (female)
- `zh-CN-YunxiNeural` (male)
- `zh-CN-XiaoyiNeural` (female)

**English:**
- `en-US-JennyNeural` (female)
- `en-US-GuyNeural` (male)
- `en-GB-SoniaNeural` (female, British)

### List All Voices

```python
# Get all available voices
voices = await reelforge.tts.list_voices()

# Get Chinese voices only
voices = await reelforge.tts.list_voices(locale="zh-CN")

# Get English voices only
voices = await reelforge.tts.list_voices(locale="en-US")
```
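
To pick a voice from the results, print a few entries. This sketch assumes each entry is an Edge TTS-style voice record with `ShortName` and `Gender` keys; adjust it to whatever `list_voices` actually returns:

```python
voices = await reelforge.tts.list_voices(locale="en-US")
for voice in voices[:5]:
    # "ShortName" and "Gender" are assumed Edge TTS keys.
    print(voice.get("ShortName"), voice.get("Gender"))
```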

---

## 3. Image Generation

### Configuration

Edit `config.yaml`:

```yaml
image:
  default: comfykit

  comfykit:
    # Local ComfyUI (optional, default: http://127.0.0.1:8188)
    comfyui_url: "http://127.0.0.1:8188"

    # RunningHub cloud (optional)
    runninghub_api_key: "rh-key-xxx"
```

### Usage

```python
# Basic usage (local ComfyUI)
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="minimalist book cover design, blue and white"
)

# With full parameters
image_url = await reelforge.image(
    workflow="workflows/book_cover_simple.json",
    prompt="book cover for 'Atomic Habits', professional, minimalist",
    negative_prompt="ugly, blurry, low quality",
    width=1024,
    height=1536,
    steps=20,
    seed=42
)

# Using RunningHub cloud
image_url = await reelforge.image(
    workflow="12345",  # RunningHub workflow ID
    prompt="a beautiful landscape"
)

# Check active generator
print(f"Using: {reelforge.image.active}")
```
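
To compare variants of the same prompt, you can sweep the `seed` parameter. A sketch using only the documented `reelforge.image` arguments (run it inside an async function); the seed values are arbitrary:

```python
seeds = [101, 202, 303]
variants = []
for seed in seeds:
    url = await reelforge.image(
        workflow="workflows/book_cover_simple.json",
        prompt="minimalist book cover design, blue and white",
        seed=seed
    )
    variants.append(url)
print(variants)
```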

### Environment Variables (Alternative)

```bash
# Local ComfyUI
export COMFYUI_BASE_URL="http://127.0.0.1:8188"

# RunningHub cloud
export RUNNINGHUB_API_KEY="rh-key-xxx"
```

### Workflow DSL

ReelForge uses ComfyKit's DSL for workflow parameters:

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "$prompt!"
    },
    "inputs": {
      "text": "default prompt",
      "clip": ["4", 1]
    }
  }
}
```

**DSL Markers:**
- `$param!` - Required parameter
- `$param` - Optional parameter
- `$param~` - Upload parameter (for images/audio/video)
- `$output.name` - Output variable
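
As an illustration, suppose a workflow also tags a sampler node with `$seed` (optional) and an image-loading node with `$image~`. Presumably these map to keyword arguments by name, the same way `$prompt!` maps to `prompt` in the examples above; the workflow path and the extra parameters below are hypothetical:

```python
image_url = await reelforge.image(
    workflow="workflows/your_workflow.json",  # hypothetical workflow using $seed and $image~ markers
    prompt="a beautiful landscape",           # fills the required $prompt! parameter
    seed=42,                                  # assumed to fill the optional $seed parameter
    image="inputs/reference.png"              # assumed to be uploaded for the $image~ parameter
)
```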

---

## Combined Workflow Example

Generate a complete book cover with narration:

```python
import asyncio
from reelforge.service import reelforge

async def create_book_content(book_title, author):
    """Generate book summary, audio, and cover image"""

    # 1. Generate book summary with LLM
    summary = await reelforge.llm(
        prompt=f"Write a compelling 2-sentence summary for a book titled '{book_title}' by {author}",
        max_tokens=100
    )
    print(f"Summary: {summary}")

    # 2. Generate audio narration with TTS
    audio_path = await reelforge.tts(
        text=summary,
        voice="en-US-JennyNeural"
    )
    print(f"Audio: {audio_path}")

    # 3. Generate book cover image
    image_url = await reelforge.image(
        workflow="workflows/book_cover_simple.json",
        prompt=f"book cover for '{book_title}' by {author}, professional, modern design",
        width=1024,
        height=1536
    )
    print(f"Cover: {image_url}")

    return {
        "summary": summary,
        "audio": audio_path,
        "cover": image_url
    }

# Run
result = asyncio.run(create_book_content("Atomic Habits", "James Clear"))
```
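
Several books can be processed concurrently by fanning the same coroutine out with `asyncio.gather`. A short sketch that reuses `create_book_content` and the imports from the example above; the `create_many` helper is illustrative and assumes your providers tolerate parallel requests:

```python
async def create_many(books):
    """Run create_book_content for several (title, author) pairs concurrently."""
    tasks = [create_book_content(title, author) for title, author in books]
    return await asyncio.gather(*tasks)


results = asyncio.run(create_many([
    ("Atomic Habits", "James Clear"),
    ("Deep Work", "Cal Newport"),
]))
```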

---

## Troubleshooting

### LLM Issues

**"API key not found"**
- Make sure you've set the API key in `config.yaml` or in environment variables
- For Qwen: `DASHSCOPE_API_KEY`
- For OpenAI: `OPENAI_API_KEY`
- For DeepSeek: `DEEPSEEK_API_KEY`

**"Connection error"**
- Check `base_url` in the config
- Verify the API endpoint is accessible
- For Ollama, make sure the server is running (`ollama serve`)

### TTS Issues

**"SSL error"**
- Edge TTS is free but requires an internet connection
- SSL verification is disabled by default for development

### Image Issues

**"ComfyUI connection refused"**
- Make sure ComfyUI is running at http://127.0.0.1:8188 (a quick check is sketched below)
- Or configure a RunningHub API key for cloud execution
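
A quick way to confirm that ComfyUI is reachable before digging into the workflow itself; a minimal sketch using only the Python standard library against the default local URL:

```python
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"

try:
    with urllib.request.urlopen(COMFYUI_URL, timeout=5) as response:
        print(f"ComfyUI is reachable (HTTP {response.status})")
except OSError as exc:
    print(f"ComfyUI is not reachable at {COMFYUI_URL}: {exc}")
```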

**"Workflow file not found"**
- Check that the workflow path is correct
- Use a relative path from the project root: `workflows/your_workflow.json`

**"No images generated"**
- Check that the workflow has a `SaveImage` node
- Verify the workflow parameters are correct
- Check the ComfyUI logs for errors

---

## Next Steps

- See the `/examples/` directory for complete examples
- Run `python test_integration.py` to test all capabilities
- Create custom workflows in the `/workflows/` directory
- Check the ComfyKit documentation: https://puke3615.github.io/ComfyKit

---

**Happy creating with ReelForge!** 📚🎬