Lessons Learned - The Builders v2
Post-mortem from the OpenAI episode generation session (Jan 6, 2026).
Issues & Fixes
1. Wrong Model Used
Issue: Used GPT-4o instead of GPT-5.2 as requested
Result: 47 turns instead of 120+
Fix: Always confirm model before running. Add -m flag to CLI.
2. Script Too Short in Single Pass
Issue: Even GPT-5.2 only generated 90 turns when asked for 120+
Result: Episode too shallow (7 min instead of 25 min)
Fix: Section-by-section generation. LLMs follow length instructions better on smaller chunks with explicit turn targets.
3. TTS Chunked by Size, Not by Speaker
Issue: Audio was chunked by character count, so voices alternated per chunk rather than per speaker
Result: Sounded like a confused monologue
Fix: Parse script by <Person1>/<Person2> tags, assign consistent voice per speaker.
4. v3 Style Guide Not Integrated
Issue: The Acquired Bible existed but wasn't in the prompts
Result: Generic dialogue, not Acquired-style discovery
Fix: Include the Bible in EVERY section prompt. The patterns matter:
- Interruptions ("Wait—", "Hold on—")
- Discovery moments ("That's insane", "Wait, really?")
- Distinct perspectives (Maya=technical, James=strategy)
5. No Intro/Greeting
Issue: Jumped straight into hook with no "Welcome to..."
Result: Felt robotic, not like real hosts
Fix: Add INTRO section (8 turns) with:
- Welcome
- Casual banter
- "Today we're covering X"
- Tease the story
6. Voices Too Similar
Issue: Used nova/onyx, which sound too similar
Result: Hard to distinguish hosts
Fix: Use more distinct voices: alloy (female) + echo (male)
What Works
Section-by-Section
Each section has explicit turn targets:
INTRO: 8 turns
HOOK: 12 turns
ORIGIN: 20 turns
INFLECTION1: 18 turns
INFLECTION2: 18 turns
MESSY_MIDDLE: 14 turns
NOW: 12 turns
TAKEAWAYS: 10 turns
─────────────────────
Total: 112 turns
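The table above can be driven by a small config plus a per-section prompt builder. This is a minimal sketch; `sectionPrompt` and its exact wording are hypothetical, but the point is that each model call carries its own explicit turn target instead of one request for all 112 turns:

```javascript
// Section names and turn targets from the table above.
const SECTIONS = [
  ['INTRO', 8], ['HOOK', 12], ['ORIGIN', 20],
  ['INFLECTION1', 18], ['INFLECTION2', 18],
  ['MESSY_MIDDLE', 14], ['NOW', 12], ['TAKEAWAYS', 10],
];

// Build one prompt per section with an explicit turn count.
// (sectionPrompt and the wording here are hypothetical.)
function sectionPrompt(name, turns, research) {
  return [
    `Write the ${name} section of the episode.`,
    `Produce exactly ${turns} turns, alternating <Person1> and <Person2>.`,
    `Research context:\n${research}`,
  ].join('\n\n');
}

const totalTurns = SECTIONS.reduce((sum, [, t]) => sum + t, 0); // 112
```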
Speaker-Aware TTS
// WRONG - chunks by character count; voice alternates per chunk, not per speaker
chunksOf(script, 3500).forEach((chunk, i) => { // chunksOf: split every 3500 chars
  const voice = i % 2 === 0 ? 'nova' : 'onyx'; // speaker boundaries ignored
});
// RIGHT - parses by speaker tag; each speaker keeps one voice
for (const turn of script.matchAll(/<(Person[12])>(.+?)<\/\1>/gs)) {
  const voice = turn[1] === 'Person1' ? 'alloy' : 'echo';
}
Bible in Every Prompt
The acquired-bible.md template contains:
- Host personalities
- Conversation patterns
- Language to use
- What NOT to do
Including it in every section prompt ensures consistency.
Pipeline Summary
1. Gemini Deep Research (~5 min)
2. Hook Generation (~15s)
3. Section Generation (7 sections × ~20s = ~2.5 min)
4. Speaker-Aware TTS (~45s for 112 turns)
5. FFmpeg Merge (~2s)
────────────────────────────────────
Total: ~8-10 min for 20-25 min episode
7. Facts Repeated Across Sections
Issue: Same facts (hot dog $1.50, renewal rate 93.3%, etc.) repeated 10-20 times
Costco example: Hot dog mentioned 19×, renewal rate 20×, chicken 15×
Root cause: Each section generated independently with same research context
Fix: Deduplication system
- Extract key facts after each section
- Pass "DO NOT REPEAT" list to subsequent sections
- Track up to 100 facts across sections
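A sketch of that bookkeeping, with hypothetical helper names (`updateUsedFacts`, `doNotRepeatBlock`); the actual fact extraction would be another model call over the finished section:

```javascript
// Cap from the fix above: track up to 100 facts across sections.
const MAX_FACTS = 100;

// Merge newly extracted facts into the running list, deduped and capped.
function updateUsedFacts(usedFacts, newFacts) {
  return [...new Set([...usedFacts, ...newFacts])].slice(0, MAX_FACTS);
}

// Render the blocklist that gets prepended to the next section's prompt.
function doNotRepeatBlock(usedFacts) {
  if (usedFacts.length === 0) return '';
  return 'DO NOT REPEAT these facts (already covered):\n' +
    usedFacts.map((f) => `- ${f}`).join('\n');
}
```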
8. Process Dies Mid-Generation
Issue: Long-running generation killed by OOM or gateway timeout
Result: Lost 30+ minutes of work, had to restart
Fix: Checkpoint system
- Save each section immediately after generation
- Save state (usedFacts, prevContext) to JSON
- On restart, detect checkpoint and resume from last section
- Location:
<output>/checkpoints/
What Works (v2 Additions)
Checkpoint System
<output>/checkpoints/
├── section-0.txt # INTRO
├── section-1.txt # HOOK
├── ...
└── state.json # { completedSections: 5, usedFacts: [...] }
If process dies at section 7, re-run same command → resumes from section 7.
Deduplication
Section 0: Extract 17 facts → pass to Section 1 as "DO NOT USE"
Section 1: Extract 16 facts → 33 total blocked
Section 2: Extract 14 facts → 47 total blocked
...
Result: 100 unique facts tracked, no repetition.
Pipeline Summary (Updated)
1. Gemini Deep Research (~5 min)
2. Hook Generation (~15s)
3. Section Generation (11 sections × ~25s = ~4.5 min) [with checkpoints]
4. Speaker-Aware TTS (~2 min for 250+ turns)
5. FFmpeg Merge (~2s)
────────────────────────────────────
Total: ~12-15 min for 45-60 min deep dive
~8-10 min for 20 min quick dive
Checklist for Future Episodes
- Confirm model (gpt-5.2 or better)
- Run deep research first
- Generate section-by-section
- Include Bible in all prompts
- Parse by speaker tags for TTS
- Use alloy + echo voices
- Verify intro section exists
- Listen to first 30s to check voice distinction
- Verify no repeated facts (deduplication working)
- If process dies, just re-run same command (checkpoint resume)