feat: add deepgram audio options

Peter Steinberger
2026-01-17 08:50:28 +00:00
parent e637bbdfb5
commit ae6792522d
9 changed files with 110 additions and 3 deletions

@@ -80,6 +80,7 @@ read_when:
- Provider auth follows the standard model auth order (auth profiles, env vars, `models.providers.*.apiKey`).
- Deepgram picks up `DEEPGRAM_API_KEY` when `provider: "deepgram"` is used.
- Deepgram setup details: [Deepgram (audio transcription)](/providers/deepgram).
- Audio providers can override `baseUrl`/`headers` via `tools.media.audio`.
- Default size cap is 20MB (`tools.media.audio.maxBytes`). Oversize audio is skipped for that model and the next entry is tried.
- Default `maxChars` for audio is **unset** (full transcript). Set `tools.media.audio.maxChars` or per-entry `maxChars` to trim output.
- Use `tools.media.audio.attachments` to process multiple voice notes (`mode: "all"` + `maxAttachments`).
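
As a rough illustration of the limits above, a config along these lines would lower the size cap, trim long transcripts, and process several voice notes per message. The numeric values are placeholders, and treating `maxBytes` as a plain byte count is an assumption rather than something stated here.

```json5
{
  tools: {
    media: {
      audio: {
        // Assumed to be a byte count; 10485760 = 10 * 1024 * 1024 (illustrative, below the 20MB default).
        maxBytes: 10485760,
        // Trim transcripts to an illustrative 4000 characters; leave unset for the full transcript.
        maxChars: 4000,
        // Process up to three voice notes per message (value is illustrative).
        attachments: { mode: "all", maxAttachments: 3 },
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}
```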

@@ -32,6 +32,8 @@ If understanding fails or is disabled, **the reply flow continues** with the ori
- `tools.media.models`: shared model list (use `capabilities` to gate).
- `tools.media.image` / `tools.media.audio` / `tools.media.video`:
  - defaults (`prompt`, `maxChars`, `maxBytes`, `timeoutSeconds`, `language`)
  - provider overrides (`baseUrl`, `headers`)
  - Deepgram audio options (`deepgram` in `tools.media.audio`)
  - optional **per-capability `models` list** (preferred before shared models)
  - `attachments` policy (`mode`, `maxAttachments`, `prefer`)
  - `scope` (optional gating by channel/chatType/session key)
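
To show how these keys might nest, here is a minimal sketch combining a shared model list with a per-capability audio block. The array-of-strings shape for `capabilities` and the default values are assumptions for illustration; only the key names come from the list above.

```json5
{
  tools: {
    media: {
      // Shared list. The array-of-strings shape for `capabilities` is an assumption.
      models: [{ provider: "deepgram", model: "nova-3", capabilities: ["audio"] }],
      audio: {
        // Defaults for this capability (values are illustrative).
        language: "en",
        timeoutSeconds: 60,
        // Deepgram-only options live under `deepgram`.
        deepgram: { punctuate: true },
        // Per-capability list, preferred before the shared list above.
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}
```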

@@ -41,6 +41,9 @@ DEEPGRAM_API_KEY=dg_...
- `model`: Deepgram model id (default: `nova-3`)
- `language`: language hint (optional)
- `tools.media.audio.deepgram.detectLanguage`: enable language detection (optional)
- `tools.media.audio.deepgram.punctuate`: enable punctuation (optional)
- `tools.media.audio.deepgram.smartFormat`: enable smart formatting (optional)
Example with language:
```json5
@@ -58,7 +61,27 @@ Example with language:
}
```
Example with Deepgram options:
```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        deepgram: {
          detectLanguage: true,
          punctuate: true,
          smartFormat: true
        },
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}
```
## Notes
- Authentication follows the standard provider auth order; `DEEPGRAM_API_KEY` is the simplest path.
- Override endpoints or headers with `tools.media.audio.baseUrl` and `tools.media.audio.headers` when using a proxy.
- Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).
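
A proxy setup might look roughly like the following. The endpoint URL, header name, and header value are hypothetical placeholders, and the object-map shape for `headers` is an assumption.

```json5
{
  tools: {
    media: {
      audio: {
        // Hypothetical proxy endpoint; replace with your own.
        baseUrl: "https://deepgram-proxy.example.com",
        // Hypothetical header name and value, shown only to illustrate the shape.
        headers: { "X-Proxy-Token": "replace-me" },
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}
```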