🙋♀️ Pixelle-Video Frequently Asked Questions
What is Pixelle-Video and how does it work?
Pixelle-Video is an AI-powered video generation tool that creates complete videos from a single topic input. The workflow is:
- Script Generation → Image Planning → Frame Processing → Video Synthesis
Simply input a topic keyword, and Pixelle-Video automatically handles scriptwriting, image generation, voice synthesis, background music, and final video compilation, requiring zero video-editing experience.
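The four stages above can be pictured as a simple sequential pipeline. The sketch below uses stub functions; the stage names and the data passed between them are illustrative assumptions, not Pixelle-Video's actual internals:

```python
# Minimal sketch of the four-stage workflow described above.
# Every stage here is a stub standing in for the real implementation.

def generate_script(topic):
    return f"script for: {topic}"              # stub LLM scriptwriting

def plan_images(script):
    return [f"image plan from {script}"]       # stub image planning

def process_frames(plans):
    return [f"frame from {p}" for p in plans]  # stub frame processing

def synthesize_video(frames):
    return {"frames": len(frames)}             # stub video synthesis

def run_pipeline(topic):
    script = generate_script(topic)
    plans = plan_images(script)
    frames = process_frames(plans)
    return synthesize_video(frames)

result = run_pipeline("Why develop a reading habit")
```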
What installation methods are supported?
Pixelle-Video is installed from source:
- Prerequisites:
  - Install the `uv` package manager (see the official documentation for your system)
  - Install `ffmpeg` for video processing:
    - macOS: `brew install ffmpeg`
    - Ubuntu/Debian: `sudo apt update && sudo apt install ffmpeg`
    - Windows: download a build from ffmpeg.org and add it to your PATH
- Standard installation:

  ```
  git clone https://github.com/AIDC-AI/Pixelle-Video.git
  cd Pixelle-Video
  uv run streamlit run web/app.py
  ```
What are the system requirements?
- Basic: Python 3.10+, uv package manager, ffmpeg
- For Image Generation: ComfyUI server (local or cloud)
- For LLM Integration: API keys for supported models (optional for local models)
- Hardware: GPU recommended for local image generation, but cloud options available
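A quick way to verify the basic requirements before launching is a small check script. This is a sketch; the tool names simply match the prerequisites listed above:

```python
import shutil
import sys

def missing_prereqs(required=("uv", "ffmpeg")):
    """Return the required executables not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]

def python_ok():
    """Pixelle-Video needs Python 3.10 or newer."""
    return sys.version_info >= (3, 10)

if __name__ == "__main__":
    missing = missing_prereqs()
    if missing or not python_ok():
        print("Missing tools:", missing, "| Python >= 3.10:", python_ok())
    else:
        print("All basic requirements satisfied.")
```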
How to configure the system for first use?
- Open the web interface at http://localhost:8501
- Expand the "⚙️ System Configuration" panel
- Configure the two main sections:
  - LLM Configuration:
    - Select a preset model (Qwen, GPT-4o, DeepSeek, etc.)
    - Enter an API key, or configure a local model (Ollama)
  - Image Configuration:
    - Local deployment: set the ComfyUI URL (default: http://127.0.0.1:8188)
    - Cloud deployment: enter a RunningHub API key
- Click "Save Configuration" to complete the setup
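The same settings can be pictured as a small configuration mapping. The key names below are illustrative only, not Pixelle-Video's actual configuration schema:

```python
# Hypothetical configuration mirroring the panel above; the key
# names are illustrative, not Pixelle-Video's real schema.
config = {
    "llm": {
        "model": "qwen",      # preset: Qwen, GPT-4o, DeepSeek, ...
        "api_key": "sk-...",  # or leave empty for a local Ollama model
    },
    "image": {
        "comfyui_url": "http://127.0.0.1:8188",  # local deployment default
        "runninghub_api_key": "",  # set this instead for cloud deployment
    },
}

def validate(cfg):
    """Check that each section carries the fields the panel asks for."""
    return ("model" in cfg.get("llm", {})
            and ("comfyui_url" in cfg.get("image", {})
                 or bool(cfg.get("image", {}).get("runninghub_api_key"))))
```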
What generation modes are available?
Pixelle-Video offers two main generation modes:
- AI Generated Content:
  - Input just a topic keyword
  - The AI automatically writes the script and creates the video
  - Example: "Why develop a reading habit"
- Fixed Script Content:
  - Provide your complete script text
  - Skip AI scriptwriting and go directly to video generation
  - Ideal when you already have prepared content
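The two modes differ only in whether the scriptwriting stage runs. A minimal sketch of that branch (the function and its stub scriptwriter are illustrative, not Pixelle-Video's real API):

```python
def make_video(topic=None, script=None,
               write_script=lambda t: f"script({t})"):
    """Sketch of the two modes: AI-generated vs. fixed script content.

    Exactly one of `topic` or `script` should be given.  The
    `write_script` stub stands in for the LLM scriptwriting stage.
    """
    if (topic is None) == (script is None):
        raise ValueError("provide exactly one of topic or script")
    if script is None:                  # AI Generated Content mode
        script = write_script(topic)
    return f"video from: {script}"      # remaining stages elided

ai_mode = make_video(topic="Why develop a reading habit")
fixed_mode = make_video(script="My prepared script text")
```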
How to customize the audio settings?
Audio customization options include:
- Background Music (BGM):
  - No BGM: pure voice narration
  - Built-in music: select from preset tracks (e.g., default.mp3)
  - Custom music: place your MP3/WAV files in the `bgm/` folder
- Text-to-Speech (TTS):
  - Select from the available TTS workflows (Edge-TTS, Index-TTS, etc.)
  - The system automatically scans the `workflows/` folder for available options
  - Preview the voice effect with test text
- Voice Cloning:
  - Upload reference audio (MP3/WAV/FLAC) for supported TTS workflows
  - The reference audio is used during preview and generation
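The folder scanning mentioned above can be approximated with `pathlib`. This is a sketch; the real scanner may filter or validate files differently:

```python
from pathlib import Path

def scan_options(folder, patterns):
    """List matching files in a folder, roughly as the UI does for
    workflows/ (TTS and image workflows) and bgm/ (custom music)."""
    root = Path(folder)
    if not root.is_dir():
        return []
    found = []
    for pattern in patterns:
        found.extend(p.name for p in root.glob(pattern))
    return sorted(found)

# e.g. scan_options("workflows", ["*.json"]) for workflow files,
#      scan_options("bgm", ["*.mp3", "*.wav"]) for background music.
```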
How to customize the visual style?
Visual customization includes:
- Image Generation Workflow:
  - Select from the available ComfyUI workflows (local or cloud)
  - Default workflow: `image_flux.json`
  - Custom workflows can be added to the `workflows/` folder
- Image Dimensions:
  - Set the width and height in pixels (default: 1024x1024)
  - Note: different models have different size limitations
- Style Prompt Prefix:
  - Controls the overall image style (must be in English)
  - Example: "Minimalist black-and-white matchstick figure style illustration, clean lines, simple sketch style"
  - Click "Preview Style" to test the effect
- Video Templates:
  - Choose from multiple templates grouped by aspect ratio (vertical/horizontal/square)
  - Preview templates with custom parameters
  - Advanced users can create custom HTML templates in the `templates/` folder
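A style prompt prefix works by being prepended to every scene's image prompt, which keeps the look consistent across scenes. A minimal sketch of that composition (the comma-joined format is an assumption, not the tool's exact rule):

```python
STYLE_PREFIX = ("Minimalist black-and-white matchstick figure style "
                "illustration, clean lines, simple sketch style")

def build_image_prompt(scene_prompt, prefix=STYLE_PREFIX):
    """Prepend the style prefix so every scene shares one look.
    The comma-joined format is an assumption, not the tool's exact rule."""
    return f"{prefix}, {scene_prompt}" if prefix else scene_prompt

prompt = build_image_prompt("a person reading under a tree")
```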
What AI models are supported?
Pixelle-Video supports multiple AI model providers:
- LLM Models: GPT, Qwen (Tongyi Qianwen), DeepSeek, Ollama (local)
- Image Generation: ComfyUI with various models (FLUX, SDXL, etc.)
- TTS Engines: Edge-TTS, Index-TTS, ChatTTS, and more
The modular architecture allows flexible replacement of any component; for example, you can swap the image generation model for FLUX or switch the TTS engine to ChatTTS.
What are the cost options for running Pixelle-Video?
Pixelle-Video offers three cost tiers:
- Completely Free:
  - LLM: Ollama (local)
  - Image Generation: local ComfyUI deployment
  - Total cost: $0
- Recommended Balanced Option:
  - LLM: Qwen (Tongyi Qianwen), very low cost and high value
  - Image Generation: local ComfyUI deployment
  - Cost: minimal API fees for text generation only
- Cloud-Only Option:
  - LLM: OpenAI API
  - Image Generation: RunningHub cloud service
  - Cost: higher, but requires no local hardware
Recommendation: use local deployment if you have a GPU; otherwise Qwen + local ComfyUI offers the best value.
How long does video generation take?
Generation time depends on several factors:
- Number of scenes in the script
- Network speed for API calls
- AI inference speed (local vs cloud)
- Video length and resolution
Typical generation time: 2-10 minutes for most videos. The interface shows real-time progress through each stage (script → images → audio → final video).
What to do if the video quality is unsatisfactory?
Try these adjustments:
- Script Quality:
  - Switch to a different LLM model (different models have different writing styles)
  - Use "Fixed Script Content" mode with your own refined script
- Image Quality:
  - Adjust the image dimensions to match the model's requirements
  - Modify the prompt prefix to change the visual style
  - Try different ComfyUI workflows
- Audio Quality:
  - Switch the TTS workflow (Edge-TTS vs. Index-TTS vs. others)
  - Upload reference audio for voice cloning
  - Adjust the TTS parameters
- Video Layout:
  - Try different video templates
  - Change the video dimensions (vertical/horizontal/square)
Where are the generated videos saved?
All generated videos are automatically saved to the `output/` folder in the project directory. The interface displays detailed information after generation:
- Video duration
- File size
- Number of scenes
- Download link
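You can also inspect the saved results programmatically. A sketch that lists each video and its size (this assumes MP4 output, which the FAQ does not state):

```python
from pathlib import Path

def list_outputs(output_dir="output"):
    """Return (filename, size in MiB) for each video in the output folder.
    Assumes .mp4 files; the actual container format may differ."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        (p.name, round(p.stat().st_size / 2**20, 2))
        for p in root.glob("*.mp4")
    )
```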
How to troubleshoot common errors?
- FFmpeg Errors:
  - Verify the ffmpeg installation with `ffmpeg -version`
  - Ensure ffmpeg is in your system PATH
- API Connection Issues:
  - Verify that the API keys are correct
  - Test the LLM connection in the system configuration
  - For ComfyUI: click "Test Connection" in the image configuration
- Image Generation Failures:
  - Ensure the ComfyUI server is running
  - Check that the image dimensions are supported by your model
  - Verify that the workflow files exist in the `workflows/` folder
- Audio Generation Issues:
  - Ensure the selected TTS workflow is properly configured
  - For voice cloning: verify that the reference audio format is supported
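The first two checks can be scripted. This is a sketch: the ffmpeg check only confirms the binary is on PATH, and the ComfyUI probe only tests TCP reachability of the configured URL, not the API itself:

```python
import shutil
import socket
from urllib.parse import urlparse

def ffmpeg_installed():
    """Rough equivalent of running `ffmpeg -version`: is ffmpeg on PATH?"""
    return shutil.which("ffmpeg") is not None

def comfyui_reachable(url="http://127.0.0.1:8188", timeout=2.0):
    """Rough 'Test Connection': can we open a TCP socket to the server?"""
    parsed = urlparse(url)
    host = parsed.hostname or "127.0.0.1"
    port = parsed.port or 8188
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```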
How to extend Pixelle-Video with custom features?
Pixelle-Video is built on the ComfyUI architecture, allowing deep customization:
- Custom Workflows: add your own ComfyUI workflows to the `workflows/` folder
- Custom Templates: create HTML templates in the `templates/` folder
- Custom BGM: add your music files to the `bgm/` folder
- Advanced Integration: since it is based on ComfyUI, you can integrate any ComfyUI custom nodes
The atomic capability design means you can mix and match any component, replacing text generation, image models, TTS engines, or video templates independently.
What community resources are available?
- GitHub Repository: https://github.com/AIDC-AI/Pixelle-Video
- Issue Tracking: Submit bugs or feature requests via GitHub Issues
- Community Support: Join discussion groups for help and sharing
- Template Gallery: View all available templates and their effects
- Contributions: The project welcomes contributions under MIT license
💡 Tip: If your question isn't answered here, please submit an issue on GitHub or join our community discussions. We continuously update this FAQ based on user feedback!