clawdbot/docs/providers/deepgram.md
2026-01-17 08:53:42 +00:00

summary: Deepgram transcription for inbound voice notes
read_when:
  • You want Deepgram speech-to-text for audio attachments
  • You need a quick Deepgram config example

Deepgram (Audio Transcription)

Deepgram is a speech-to-text API. In Clawdbot it is used for inbound audio/voice note transcription via tools.media.audio.

When enabled, Clawdbot uploads the audio file to Deepgram and injects the transcript into the reply pipeline as {{Transcript}} plus an [Audio] block. Transcription is not streamed: the file is uploaded in full to Deepgram's pre-recorded transcription endpoint.
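To make the upload-then-transcribe flow concrete, here is a minimal TypeScript sketch of a call to Deepgram's pre-recorded endpoint (`POST /v1/listen`). It is an illustration, not Clawdbot's actual implementation: the function names `transcribeFile` and `extractTranscript` are hypothetical, and the sketch assumes a runtime with global `fetch` and `Blob` (for example Node 18+).

```typescript
// Sketch only: not Clawdbot's internal code. Function names are hypothetical.

// The subset of Deepgram's pre-recorded response that this sketch reads.
interface DeepgramResponse {
  results?: {
    channels?: Array<{
      alternatives?: Array<{ transcript?: string }>;
    }>;
  };
}

// Return the top alternative of the first channel, or "" if it is absent.
function extractTranscript(body: DeepgramResponse): string {
  return body.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

// Send one complete audio file in a single request (no streaming).
async function transcribeFile(
  audio: Blob,
  apiKey: string,
  model: string = "nova-3",
): Promise<string> {
  const url = `https://api.deepgram.com/v1/listen?model=${encodeURIComponent(model)}`;
  const res = await fetch(url, {
    method: "POST",
    headers: {
      // Deepgram API keys use the "Token" authorization scheme.
      Authorization: `Token ${apiKey}`,
      "Content-Type": audio.type || "application/octet-stream",
    },
    body: audio,
  });
  if (!res.ok) {
    throw new Error(`Deepgram request failed with status ${res.status}`);
  }
  return extractTranscript((await res.json()) as DeepgramResponse);
}
```

Because the whole file goes out in one request, the transcript only becomes available once Deepgram has processed the full upload.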

Website: https://deepgram.com
Docs: https://developers.deepgram.com

Quick start

  1. Set your API key:
DEEPGRAM_API_KEY=dg_...
  2. Enable the provider:
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}

Options

  • model: Deepgram model id, set on the models entry (default: nova-3)
  • language: language hint, set on the models entry (optional)
  • tools.media.audio.deepgram.detectLanguage: enable language detection (optional)
  • tools.media.audio.deepgram.punctuate: enable punctuation (optional)
  • tools.media.audio.deepgram.smartFormat: enable smart formatting (optional)
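The options above correspond to query parameters on Deepgram's pre-recorded endpoint (`model`, `language`, `detect_language`, `punctuate`, `smart_format`). The sketch below shows one plausible translation from the camelCase config names to those parameters. How Clawdbot performs this mapping internally is an assumption, and `DeepgramOptions` and `toDeepgramQuery` are hypothetical names used only for illustration.

```typescript
// Sketch only: hypothetical helper, not part of Clawdbot's API.

// Mirrors the option names documented in the list above.
interface DeepgramOptions {
  model?: string;
  language?: string;
  detectLanguage?: boolean;
  punctuate?: boolean;
  smartFormat?: boolean;
}

// Build a query string using Deepgram's snake_case parameter names.
// Options that are not set are left out so Deepgram's defaults apply.
function toDeepgramQuery(opts: DeepgramOptions): string {
  const params = new URLSearchParams();
  params.set("model", opts.model ?? "nova-3");
  if (opts.language) {
    params.set("language", opts.language);
  }
  if (opts.detectLanguage !== undefined) {
    params.set("detect_language", String(opts.detectLanguage));
  }
  if (opts.punctuate !== undefined) {
    params.set("punctuate", String(opts.punctuate));
  }
  if (opts.smartFormat !== undefined) {
    params.set("smart_format", String(opts.smartFormat));
  }
  return params.toString();
}
```

Omitting unset options, rather than sending `false`, keeps the request minimal and defers to Deepgram's own defaults.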

Example with language:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "deepgram", model: "nova-3", language: "en" }
        ]
      }
    }
  }
}

Example with Deepgram options:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        deepgram: {
          detectLanguage: true,
          punctuate: true,
          smartFormat: true
        },
        models: [{ provider: "deepgram", model: "nova-3" }]
      }
    }
  }
}

Notes

  • Authentication follows the standard provider auth order; DEEPGRAM_API_KEY is the simplest path.
  • Override endpoints or headers with tools.media.audio.baseUrl and tools.media.audio.headers when using a proxy.
  • Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).
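The proxy note can be illustrated with a small sketch of how a base URL and extra headers might be combined into the final request target. This is an assumption about the behavior, not a description of Clawdbot's code: `AudioOverrides` and `resolveRequest` are hypothetical names, the default base URL is Deepgram's public API, and the precedence shown (override headers win over the default `Authorization` header) is a guess.

```typescript
// Sketch only: hypothetical names and merge rules, labeled as assumptions.

// Stand-in for tools.media.audio.baseUrl and tools.media.audio.headers.
interface AudioOverrides {
  baseUrl?: string;
  headers?: Record<string, string>;
}

const DEFAULT_BASE_URL = "https://api.deepgram.com/v1";

interface ResolvedRequest {
  url: string;
  headers: Record<string, string>;
}

// Combine the API key with optional proxy overrides.
function resolveRequest(apiKey: string, overrides: AudioOverrides = {}): ResolvedRequest {
  // Strip trailing slashes so the joined path has exactly one separator.
  const base = (overrides.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return {
    url: `${base}/listen`,
    // Assumption: headers from the config are applied last and may replace
    // the default Authorization header (useful if the proxy injects the key).
    headers: { Authorization: `Token ${apiKey}`, ...overrides.headers },
  };
}
```

For example, `resolveRequest(key, { baseUrl: "https://proxy.example.com/deepgram/" })` would target `https://proxy.example.com/deepgram/listen`, where the proxy host is a placeholder.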