Files
clawdbot/skills/thebuilders-v2/LESSONS.md

198 lines
5.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Lessons Learned - The Builders v2
Post-mortem from the OpenAI episode generation session (Jan 6, 2026).
## Issues & Fixes
### 1. Wrong Model Used
**Issue:** Used GPT-4o instead of GPT-5.2 as requested
**Result:** 47 turns instead of 120+
**Fix:** Always confirm model before running. Add `-m` flag to CLI.
### 2. Script Too Short in Single Pass
**Issue:** Even GPT-5.2 only generated 90 turns when asked for 120+
**Result:** Episode too shallow (7 min instead of 25 min)
**Fix:** **Section-by-section generation**. LLMs follow length instructions better on smaller chunks with explicit turn targets.
### 3. TTS Chunked by Size, Not by Speaker
**Issue:** Audio chunks alternated voices randomly instead of per speaker
**Result:** Sounded like a confused monologue
**Fix:** Parse script by `<Person1>`/`<Person2>` tags, assign consistent voice per speaker.
### 4. v3 Style Guide Not Integrated
**Issue:** The Acquired Bible existed but wasn't in the prompts
**Result:** Generic dialogue, not Acquired-style discovery
**Fix:** Include the Bible in EVERY section prompt. The patterns matter:
- Interruptions ("Wait—", "Hold on—")
- Discovery moments ("That's insane", "Wait, really?")
- Distinct perspectives (Maya=technical, James=strategy)
### 5. No Intro/Greeting
**Issue:** Jumped straight into hook with no "Welcome to..."
**Result:** Felt robotic, not like real hosts
**Fix:** Add INTRO section (8 turns) with:
- Welcome
- Casual banter
- "Today we're covering X"
- Tease the story
### 6. Voices Too Similar
**Issue:** Used nova/onyx which sound somewhat similar
**Result:** Hard to distinguish hosts
**Fix:** Use more distinct voices: **alloy** (female) + **echo** (male)
## What Works
### Section-by-Section
Each section has explicit turn targets:
```
INTRO: 8 turns
HOOK: 12 turns
ORIGIN: 20 turns
INFLECTION1: 18 turns
INFLECTION2: 18 turns
MESSY_MIDDLE: 14 turns
NOW: 12 turns
TAKEAWAYS: 10 turns
─────────────────────
Total: 112 turns
```
### Speaker-Aware TTS
```javascript
// WRONG - chunks by size
for (chunk of script.split(3500)) {
voice = i % 2 === 0 ? 'nova' : 'onyx';
}
// RIGHT - parses by speaker
for (turn of script.matchAll(/<(Person[12])>(.+?)<\/Person[12]>/)) {
voice = turn[1] === 'Person1' ? 'alloy' : 'echo';
}
```
### Bible in Every Prompt
The `acquired-bible.md` template contains:
- Host personalities
- Conversation patterns
- Language to use
- What NOT to do
Including it in every section prompt ensures consistency.
## Pipeline Summary
```
1. Gemini Deep Research (~5 min)
2. Hook Generation (~15s)
3. Section Generation (7 sections × ~20s = ~2.5 min)
4. Speaker-Aware TTS (~45s for 112 turns)
5. FFmpeg Merge (~2s)
────────────────────────────────────
Total: ~8-10 min for 20-25 min episode
```
### 7. Facts Repeated Across Sections
**Issue:** Same facts (hot dog $1.50, renewal rate 93.3%, etc.) repeated 10-20 times
**Costco example:** Hot dog mentioned 19×, renewal rate 20×, chicken 15×
**Root cause:** Each section generated independently with same research context
**Fix:** **Deduplication system**
- Extract key facts after each section
- Pass "DO NOT REPEAT" list to subsequent sections
- Track up to 100 facts across sections
### 8. Process Dies Mid-Generation
**Issue:** Long-running generation killed by OOM or gateway timeout
**Result:** Lost 30+ minutes of work, had to restart
**Fix:** **Checkpoint system**
- Save each section immediately after generation
- Save state (usedFacts, prevContext) to JSON
- On restart, detect checkpoint and resume from last section
- Location: `<output>/checkpoints/`
## What Works
### Checkpoint System
```
<output>/checkpoints/
├── section-0.txt # INTRO
├── section-1.txt # HOOK
├── ...
└── state.json # { completedSections: 5, usedFacts: [...] }
```
If process dies at section 7, re-run same command → resumes from section 7.
### Deduplication
```
Section 0: Extract 17 facts → pass to Section 1 as "DO NOT USE"
Section 1: Extract 16 facts → 33 total blocked
Section 2: Extract 14 facts → 47 total blocked
...
```
Result: 100 unique facts tracked, no repetition.
### Section-by-Section
Each section has explicit turn targets:
```
INTRO: 8 turns
HOOK: 12 turns
ORIGIN: 20 turns
INFLECTION1: 18 turns
INFLECTION2: 18 turns
MESSY_MIDDLE: 14 turns
NOW: 12 turns
TAKEAWAYS: 10 turns
─────────────────────
Total: 112 turns
```
### Speaker-Aware TTS
```javascript
// WRONG - chunks by size
for (chunk of script.split(3500)) {
voice = i % 2 === 0 ? 'nova' : 'onyx';
}
// RIGHT - parses by speaker
for (turn of script.matchAll(/<(Person[12])>(.+?)<\/Person[12]>/)) {
voice = turn[1] === 'Person1' ? 'alloy' : 'echo';
}
```
### Bible in Every Prompt
The `acquired-bible.md` template contains:
- Host personalities
- Conversation patterns
- Language to use
- What NOT to do
Including it in every section prompt ensures consistency.
## Pipeline Summary
```
1. Gemini Deep Research (~5 min)
2. Hook Generation (~15s)
3. Section Generation (11 sections × ~25s = ~4.5 min) [with checkpoints]
4. Speaker-Aware TTS (~2 min for 250+ turns)
5. FFmpeg Merge (~2s)
────────────────────────────────────
Total: ~12-15 min for 45-60 min deep dive
~8-10 min for 20 min quick dive
```
## Checklist for Future Episodes
- [ ] Confirm model (gpt-5.2 or better)
- [ ] Run deep research first
- [ ] Generate section-by-section
- [ ] Include Bible in all prompts
- [ ] Parse by speaker tags for TTS
- [ ] Use alloy + echo voices
- [ ] Verify intro section exists
- [ ] Listen to first 30s to check voice distinction
- [ ] Verify no repeated facts (deduplication working)
- [ ] If process dies, just re-run same command (checkpoint resume)