feat(P0): Enhance image prompt generation with core imagery extraction for better text-image alignment
This commit is contained in:
@@ -58,31 +58,37 @@ Based on the existing video script, create corresponding **English** image promp
|
||||
# Input Content
|
||||
{narrations_json}
|
||||
|
||||
# ⭐ Core Imagery Extraction (Critical for Relevance)
|
||||
For EACH narration, you MUST:
|
||||
1. **Extract 2-3 core visual imagery/metaphors** that best represent the narration's meaning
|
||||
2. **Identify the emotional tone** (hopeful, melancholic, inspiring, etc.)
|
||||
3. **Determine concrete visual elements** that embody these abstract concepts
|
||||
|
||||
Example thought process:
|
||||
- Narration: "给自己一个不设限的探索时间"
|
||||
- Core Imagery: exploration, freedom, open paths
|
||||
- Emotional Tone: hopeful, adventurous
|
||||
- Visual Elements: open road, person looking at horizon, map with unmarked routes
|
||||
|
||||
# Output Requirements
|
||||
|
||||
## Image Prompt Specifications
|
||||
- Language: **Must use English** (for AI image generation models)
|
||||
- Description structure: scene + character action + emotion + symbolic elements
|
||||
- Description length: Ensure clear, complete, and creative descriptions (recommended 50-100 English words)
|
||||
- **REQUIRED Structure**: [Core imagery] + [Scene description] + [Character action] + [Emotional atmosphere]
|
||||
- Description length: 50-100 English words
|
||||
- **The image prompt MUST directly reflect the extracted core imagery from the narration**
|
||||
|
||||
## Visual Creative Requirements
|
||||
- Each image must accurately reflect the specific content and emotion of the corresponding narration
|
||||
- Use symbolic techniques to visualize abstract concepts (e.g., use paths to represent life choices, chains to represent constraints, etc.)
|
||||
- **Prioritize core visual metaphors** - the main visual elements must embody the narration's key message
|
||||
- Use symbolic techniques to visualize abstract concepts (e.g., paths=choices, chains=constraints, open doors=opportunities)
|
||||
- Scenes should express rich emotions and actions to enhance visual impact
|
||||
- Highlight themes through composition and element arrangement, avoid overly literal representations
|
||||
|
||||
## Key English Vocabulary Reference
|
||||
- Symbolic elements: symbolic elements
|
||||
- Expression: expression / facial expression
|
||||
- Action: action / gesture / movement
|
||||
- Scene: scene / setting
|
||||
- Atmosphere: atmosphere / mood
|
||||
|
||||
## Visual and Copy Coordination Principles
|
||||
- Images should serve the copy, becoming a visual extension of the copy content
|
||||
- Avoid visual elements unrelated to or contradicting the copy content
|
||||
- Choose visual presentation methods that best enhance the persuasiveness of the copy
|
||||
- Ensure the audience can quickly understand the core viewpoint of the copy through images
|
||||
## Visual and Narration Coordination Principles (Most Important)
|
||||
- **Direct semantic connection**: The main visual elements MUST represent the narration's core meaning
|
||||
- **Avoid decorative scenes**: Don't add unrelated beautiful scenery that doesn't support the message
|
||||
- **Ask yourself**: If someone saw only the image, could they guess what the narration is about?
|
||||
- **Test question**: What is the ONE THING this narration is about? Make sure that thing is visible in the image.
|
||||
|
||||
## Creative Guidance
|
||||
1. **Phenomenon Description Copy**: Use intuitive scenes to represent social phenomena
|
||||
@@ -97,8 +103,8 @@ Strictly output in the following JSON format, **image prompts must be in English
|
||||
```json
|
||||
{{
|
||||
"image_prompts": [
|
||||
"[detailed English image prompt following the style requirements]",
|
||||
"[detailed English image prompt following the style requirements]"
|
||||
"[Core imagery visible] + [Scene with semantic connection to narration] + [Character/action reflecting the message] + [Emotional atmosphere]",
|
||||
"[Next image prompt following the same structure]"
|
||||
]
|
||||
}}
|
||||
```
|
||||
@@ -109,14 +115,15 @@ Strictly output in the following JSON format, **image prompts must be in English
|
||||
3. Input is {{"narrations": [narration array]}} format, output is {{"image_prompts": [image prompt array]}} format
|
||||
4. **The output image_prompts array must contain exactly {narrations_count} elements, corresponding one-to-one with the input narrations array**
|
||||
5. **Image prompts must use English** (for AI image generation models)
|
||||
6. Image prompts must accurately reflect the specific content and emotion of the corresponding narration
|
||||
7. Each image must be creative and visually impactful, avoid being monotonous
|
||||
8. Ensure visual scenes can enhance the persuasiveness of the copy and audience understanding
|
||||
6. **⭐ Most Critical: Each image prompt must have DIRECT semantic relevance to its narration**
|
||||
7. Before writing each prompt, mentally extract the core visual metaphor from the narration
|
||||
8. Verify: Could someone understand the narration's message from the image alone?
|
||||
|
||||
Now, please create {narrations_count} corresponding **English** image prompts for the above {narrations_count} narrations. Only output JSON, no other content.
|
||||
"""
|
||||
|
||||
|
||||
|
||||
def build_image_prompt_prompt(
|
||||
narrations: List[str],
|
||||
min_words: int,
|
||||
|
||||
Reference in New Issue
Block a user