feat(P0): Enhance image prompt generation with core imagery extraction for better text-image alignment
@@ -58,31 +58,37 @@ Based on the existing video script, create corresponding **English** image promp
 # Input Content
 {narrations_json}
 
+# ⭐ Core Imagery Extraction (Critical for Relevance)
+For EACH narration, you MUST:
+1. **Extract 2-3 core visual imagery/metaphors** that best represent the narration's meaning
+2. **Identify the emotional tone** (hopeful, melancholic, inspiring, etc.)
+3. **Determine concrete visual elements** that embody these abstract concepts
+
+Example thought process:
+- Narration: "给自己一个不设限的探索时间"
+- Core Imagery: exploration, freedom, open paths
+- Emotional Tone: hopeful, adventurous
+- Visual Elements: open road, person looking at horizon, map with unmarked routes
+
 # Output Requirements
 
 ## Image Prompt Specifications
 - Language: **Must use English** (for AI image generation models)
-- Description structure: scene + character action + emotion + symbolic elements
-- Description length: Ensure clear, complete, and creative descriptions (recommended 50-100 English words)
+- **REQUIRED Structure**: [Core imagery] + [Scene description] + [Character action] + [Emotional atmosphere]
+- Description length: 50-100 English words
+- **The image prompt MUST directly reflect the extracted core imagery from the narration**
 
 ## Visual Creative Requirements
 - Each image must accurately reflect the specific content and emotion of the corresponding narration
-- Use symbolic techniques to visualize abstract concepts (e.g., use paths to represent life choices, chains to represent constraints, etc.)
+- **Prioritize core visual metaphors** - the main visual elements must embody the narration's key message
+- Use symbolic techniques to visualize abstract concepts (e.g., paths=choices, chains=constraints, open doors=opportunities)
 - Scenes should express rich emotions and actions to enhance visual impact
-- Highlight themes through composition and element arrangement, avoid overly literal representations
 
-## Key English Vocabulary Reference
-- Symbolic elements: symbolic elements
-- Expression: expression / facial expression
-- Action: action / gesture / movement
-- Scene: scene / setting
-- Atmosphere: atmosphere / mood
-
-## Visual and Copy Coordination Principles
-- Images should serve the copy, becoming a visual extension of the copy content
-- Avoid visual elements unrelated to or contradicting the copy content
-- Choose visual presentation methods that best enhance the persuasiveness of the copy
-- Ensure the audience can quickly understand the core viewpoint of the copy through images
+## Visual and Narration Coordination Principles (Most Important)
+- **Direct semantic connection**: The main visual elements MUST represent the narration's core meaning
+- **Avoid decorative scenes**: Don't add unrelated beautiful scenery that doesn't support the message
+- **Ask yourself**: If someone saw only the image, could they guess what the narration is about?
+- **Test question**: What is the ONE THING this narration is about? Make sure that thing is visible in the image.
 
 ## Creative Guidance
 1. **Phenomenon Description Copy**: Use intuitive scenes to represent social phenomena
@@ -97,8 +103,8 @@ Strictly output in the following JSON format, **image prompts must be in English
 ```json
 {{
   "image_prompts": [
-    "[detailed English image prompt following the style requirements]",
-    "[detailed English image prompt following the style requirements]"
+    "[Core imagery visible] + [Scene with semantic connection to narration] + [Character/action reflecting the message] + [Emotional atmosphere]",
+    "[Next image prompt following the same structure]"
   ]
 }}
 ```
@@ -109,14 +115,15 @@ Strictly output in the following JSON format, **image prompts must be in English
 3. Input is {{"narrations": [narration array]}} format, output is {{"image_prompts": [image prompt array]}} format
 4. **The output image_prompts array must contain exactly {narrations_count} elements, corresponding one-to-one with the input narrations array**
 5. **Image prompts must use English** (for AI image generation models)
-6. Image prompts must accurately reflect the specific content and emotion of the corresponding narration
-7. Each image must be creative and visually impactful, avoid being monotonous
-8. Ensure visual scenes can enhance the persuasiveness of the copy and audience understanding
+6. **⭐ Most Critical: Each image prompt must have DIRECT semantic relevance to its narration**
+7. Before writing each prompt, mentally extract the core visual metaphor from the narration
+8. Verify: Could someone understand the narration's message from the image alone?
 
 Now, please create {narrations_count} corresponding **English** image prompts for the above {narrations_count} narrations. Only output JSON, no other content.
 """
 
 
+
 def build_image_prompt_prompt(
     narrations: List[str],
     min_words: int,
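
The template in this diff doubles its literal braces (`{{`, `}}`) and leaves `{narrations_json}` and `{narrations_count}` single, which suggests it is filled with Python's `str.format`. The body of `build_image_prompt_prompt` is not shown here, so the following is only a minimal sketch of that mechanism under that assumption. `MINI_TEMPLATE` and `fill_template` are hypothetical names, and the template text is a shortened stand-in, not the project's actual prompt.

```python
import json
from typing import List

# Hypothetical, heavily shortened stand-in for the template edited in this diff.
MINI_TEMPLATE = """# Input Content
{narrations_json}

Strictly output in the following JSON format:
{{
  "image_prompts": ["[image prompt 1]"]
}}

Now, please create {narrations_count} corresponding **English** image prompts."""


def fill_template(narrations: List[str]) -> str:
    """Fill the stand-in template (not the project's build_image_prompt_prompt)."""
    # ensure_ascii=False keeps non-ASCII narrations readable inside the prompt.
    narrations_json = json.dumps(
        {"narrations": narrations}, ensure_ascii=False, indent=2
    )
    # str.format substitutes the named fields and turns {{ and }} into
    # literal { and }; substituted values are not re-parsed for braces.
    return MINI_TEMPLATE.format(
        narrations_json=narrations_json,
        narrations_count=len(narrations),
    )


if __name__ == "__main__":
    print(fill_template(["给自己一个不设限的探索时间"]))
```

A single unescaped `{` in the literal JSON example would make `str.format` raise an error or look for a field of that name, which is presumably why the template escapes every literal brace.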
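
Several of the rules in the revised prompt can be checked mechanically on the model's reply: valid JSON, an `image_prompts` array with exactly `narrations_count` entries, and a length of 50-100 English words per prompt. The sketch below shows one way such a check might look. `check_image_prompts` is a hypothetical helper that does not appear in the commit, and the ASCII test is only a rough proxy for "English". The semantic-relevance rule (item 6) still needs human or model review.

```python
import json
from typing import List


def check_image_prompts(
    raw_reply: str,
    narrations_count: int,
    min_words: int = 50,
    max_words: int = 100,
) -> List[str]:
    """Return a list of problems found; an empty list means the reply passed."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return [f"reply is not valid JSON: {exc}"]

    prompts = data.get("image_prompts") if isinstance(data, dict) else None
    if not isinstance(prompts, list):
        return ['missing "image_prompts" array']

    problems: List[str] = []
    # Rule 4: one prompt per narration, in the same order.
    if len(prompts) != narrations_count:
        problems.append(f"expected {narrations_count} prompts, got {len(prompts)}")

    for i, prompt in enumerate(prompts):
        if not isinstance(prompt, str):
            problems.append(f"prompt {i} is not a string")
            continue
        # Whitespace-separated tokens approximate the word count.
        n_words = len(prompt.split())
        if not min_words <= n_words <= max_words:
            problems.append(
                f"prompt {i} has {n_words} words, outside {min_words}-{max_words}"
            )
        # Rough heuristic only: non-ASCII text suggests a non-English prompt.
        if not prompt.isascii():
            problems.append(f"prompt {i} contains non-ASCII text")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller decide whether to retry the generation or pass the whole list back to the model as feedback.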