refactor: unify media understanding pipeline
@@ -6,7 +6,7 @@ read_when:
# Audio / Voice Notes — 2026-01-17
## What works
-- **Media understanding (audio)**: If `tools.media.audio` is enabled and has `models`, Clawdbot:
+- **Media understanding (audio)**: If `tools.media.audio` is enabled (or a shared `tools.media.models` entry supports audio), Clawdbot:
1) Locates the first audio attachment (local path or URL) and downloads it if needed.
2) Enforces `maxBytes` before sending to each model entry.
3) Runs the first eligible model entry in order (provider or CLI).
@@ -66,6 +66,7 @@ read_when:
- Provider auth follows the standard model auth order (auth profiles, env vars, `models.providers.*.apiKey`).
- Default size cap is 20MB (`tools.media.audio.maxBytes`). Oversize audio is skipped for that model and the next entry is tried.
- Default `maxChars` for audio is **unset** (full transcript). Set `tools.media.audio.maxChars` or per-entry `maxChars` to trim output.
- Use `tools.media.audio.attachments` to process multiple voice notes (`mode: "all"` + `maxAttachments`).
- Transcript is available to templates as `{{Transcript}}`.
- CLI stdout is capped (5MB); keep CLI output concise.
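
The size-cap, fallback, and `maxChars` rules in the notes above can be sketched as follows. This is a minimal illustration, not Clawdbot's actual code: the `ModelEntry` and `AudioConfig` shapes, the `transcribeWithFallback` helper, the per-entry `maxBytes` override, the exact byte value used for "20MB", and the synchronous `transcribe` callback are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only; not Clawdbot's real types.
interface ModelEntry {
  name: string;
  maxBytes?: number; // per-entry size cap (assumed to override the shared cap)
  maxChars?: number; // per-entry transcript trim
  transcribe: (audio: Uint8Array) => string; // synchronous stand-in for a provider or CLI call
}

interface AudioConfig {
  maxBytes?: number; // stands in for tools.media.audio.maxBytes
  maxChars?: number; // stands in for tools.media.audio.maxChars
  models: ModelEntry[];
}

// "20MB" per the notes; interpreting it as 20 * 1024 * 1024 bytes is an assumption.
const DEFAULT_MAX_BYTES = 20 * 1024 * 1024;

function transcribeWithFallback(
  audio: Uint8Array,
  config: AudioConfig,
): string | undefined {
  for (const entry of config.models) {
    const cap = entry.maxBytes ?? config.maxBytes ?? DEFAULT_MAX_BYTES;
    // Oversize audio is skipped for this entry; the next entry is tried.
    if (audio.byteLength > cap) continue;

    const transcript = entry.transcribe(audio);

    // maxChars is unset by default, which keeps the full transcript.
    const maxChars = entry.maxChars ?? config.maxChars;
    return maxChars === undefined ? transcript : transcript.slice(0, maxChars);
  }
  // No entry accepted the audio.
  return undefined;
}
```

With a 10-byte clip, a first entry capped at 4 bytes, and a shared `maxChars` of 5, the sketch skips the first entry and returns the first five characters of the second entry's transcript. Error handling, when a model call itself fails, is left out because the notes do not describe it.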