feat: add Deepgram audio transcription

Co-authored-by: Safzan Pirani <safzanpirani@users.noreply.github.com>
2026-01-17 08:46:40 +00:00
parent 869ef0c5ba
commit e637bbdfb5
12 changed files with 303 additions and 2 deletions
--- a/docs/nodes/audio.md
+++ b/docs/nodes/audio.md
@@ -62,8 +62,24 @@ read_when:
 }
 ```

+### Provider-only (Deepgram)
+```json5
+{
+  tools: {
+    media: {
+      audio: {
+        enabled: true,
+        models: [{ provider: "deepgram", model: "nova-3" }]
+      }
+    }
+  }
+}
+```
+
 ## Notes & limits
 - Provider auth follows the standard model auth order (auth profiles, env vars, `models.providers.*.apiKey`).
+- Deepgram picks up `DEEPGRAM_API_KEY` when `provider: "deepgram"` is used.
+- Deepgram setup details: [Deepgram (audio transcription)](/providers/deepgram).
 - Default size cap is 20MB (`tools.media.audio.maxBytes`). Oversize audio is skipped for that model and the next entry is tried.
 - Default `maxChars` for audio is **unset** (full transcript). Set `tools.media.audio.maxChars` or per-entry `maxChars` to trim output.
 - Use `tools.media.audio.attachments` to process multiple voice notes (`mode: "all"` + `maxAttachments`).
--- a/docs/nodes/media-understanding.md
+++ b/docs/nodes/media-understanding.md
@@ -108,6 +108,7 @@ lists, Clawdbot can infer defaults:
 - `openai`, `anthropic`, `minimax`: **image**
 - `google` (Gemini API): **image + audio + video**
 - `groq`: **audio**
+- `deepgram`: **audio**

 For CLI entries, **set `capabilities` explicitly** to avoid surprising matches.
 If you omit `capabilities`, the entry is eligible for the list it appears in.
@@ -116,7 +117,7 @@ If you omit `capabilities`, the entry is eligible for the list it appears in.
 | Capability | Provider integration | Notes |
 |------------|----------------------|-------|
 | Image | OpenAI / Anthropic / Google / others via `pi-ai` | Any image-capable model in the registry works. |
-| Audio | OpenAI, Groq | Provider transcription (Whisper). |
+| Audio | OpenAI, Groq, Deepgram | Provider transcription (Whisper/Deepgram). |
 | Video | Google (Gemini API) | Provider video understanding. |

 ## Recommended providers
@@ -125,8 +126,9 @@ If you omit `capabilities`, the entry is eligible for the list it appears in.
 - Good defaults: `openai/gpt-5.2`, `anthropic/claude-opus-4-5`, `google/gemini-3-pro-preview`.

 **Audio**
-- `openai/whisper-1` or `groq/whisper-large-v3-turbo`.
+- `openai/whisper-1`, `groq/whisper-large-v3-turbo`, or `deepgram/nova-3`.
 - CLI fallback: `whisper` binary.
+- Deepgram setup: [Deepgram (audio transcription)](/providers/deepgram).

 **Video**
 - `google/gemini-3-flash-preview` (fast), `google/gemini-3-pro-preview` (richer).