Audio / Voice Notes — 2025-12-05
What works
- Optional transcription: If inbound.transcribeAudio.command is set in ~/.clawdis/clawdis.json, CLAWDIS will:
  - Download inbound audio to a temp path when WhatsApp only provides a URL.
  - Run the configured CLI (templated with {{MediaPath}}), expecting the transcript on stdout.
  - Replace Body with the transcript, set {{Transcript}}, and prepend the original media path plus a Transcript: section in the command prompt so models see both.
  - Continue through the normal auto-reply pipeline (templating, sessions, Pi command).
- Verbose logging: In --verbose, we log when transcription runs and when the transcript replaces the body.
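The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not CLAWDIS's actual implementation: the helper names (render_command, transcribe, apply_transcript), the ctx dictionary, and the demo command are all assumptions made for the example. Treating empty output as a failure is also an assumption.

```python
import subprocess
import sys

# Stand-in "transcriber" for demonstration only: it echoes the media path.
# A real config would point at a speech-to-text CLI instead.
DEMO_TEMPLATE = [
    sys.executable,
    "-c",
    "import sys; print('transcript of ' + sys.argv[1])",
    "{{MediaPath}}",
]


def render_command(template, media_path):
    """Substitute {{MediaPath}} in every argv element (hypothetical helper)."""
    return [arg.replace("{{MediaPath}}", media_path) for arg in template]


def transcribe(template, media_path, timeout_seconds=45):
    """Run the templated CLI; return stdout as the transcript, or None on failure."""
    argv = render_command(template, media_path)
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # missing binary or timeout
    if result.returncode != 0:
        return None  # non-zero exit counts as a failed transcription
    transcript = result.stdout.strip()
    return transcript or None  # assumption: empty output is treated as failure


def apply_transcript(ctx, template):
    """Return a copy of ctx with Body replaced and Transcript set on success.

    On failure the original context is returned unchanged, mirroring the
    documented fallback to the original body.
    """
    transcript = transcribe(template, ctx["MediaPath"])
    if transcript is None:
        return dict(ctx)
    return {**ctx, "Body": transcript, "Transcript": transcript}
```

Calling apply_transcript({"Body": "<media:audio>", "MediaPath": "/tmp/voice.ogg"}, DEMO_TEMPLATE) returns a new context whose Body and Transcript both hold the CLI's stdout, while a failing command leaves the context as it was.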
Config example (OpenAI Whisper CLI)
Requires OPENAI_API_KEY in env and openai CLI installed:
{
  inbound: {
    transcribeAudio: {
      command: [
        "openai",
        "api",
        "audio.transcriptions.create",
        "-m",
        "whisper-1",
        "-f",
        "{{MediaPath}}",
        "--response-format",
        "text"
      ],
      timeoutSeconds: 45
    },
    reply: {
      mode: "command",
      command: ["pi", "{{Body}}"],
      agent: { kind: "pi" }
    }
  }
}
Notes & limits
- We don’t ship a transcriber; you opt in with any CLI that prints text to stdout (Whisper cloud, whisper.cpp, vosk, Deepgram, etc.).
- Size guard: inbound audio must be ≤5 MB (matches the temp media store and transcript pipeline).
- Outbound caps: web send supports audio/voice up to 16 MB (sent as a voice note with ptt: true).
- If transcription fails, we fall back to the original body/media note; replies still go through.
- Transcript is available to templates as {{Transcript}}; models get both the media path and a Transcript: block in the prompt when using command mode.
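As a rough illustration of the size guard and the combined prompt, here is a small Python sketch. The function names, the reading of "5 MB" as 5 MiB, and the exact prompt layout are assumptions for the example; the notes above only state that the media path and a Transcript: block both reach the model.

```python
# Assumption: "5 MB" is interpreted as 5 * 1024 * 1024 bytes.
MAX_INBOUND_AUDIO_BYTES = 5 * 1024 * 1024


def within_inbound_limit(size_bytes):
    """Return True when inbound audio is small enough to be transcribed."""
    return size_bytes <= MAX_INBOUND_AUDIO_BYTES


def build_prompt(media_path, transcript):
    """Prepend the media path, then a Transcript: block (hypothetical layout)."""
    return f"{media_path}\nTranscript:\n{transcript}"
```

With this layout, a template that only reads {{Body}} still shows the model where the audio came from, and one that reads {{Transcript}} gets the bare text.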
Gotchas
- Ensure your CLI exits 0 and prints plain text; JSON needs to be massaged via jq -r .text.
- Keep timeouts reasonable (timeoutSeconds, default 45s) to avoid blocking the reply queue.
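If jq is not available, a small wrapper can do the same massaging. The sketch below is a hypothetical Python filter (the names extract_text and run are invented for this example) that reads a JSON object with a text field and prints only the text, returning a non-zero status on malformed input so the fallback path still applies.

```python
import json
import sys


def extract_text(raw):
    """Return the "text" field of a JSON object, similar to jq -r .text."""
    data = json.loads(raw)
    return str(data["text"])


def run(stdin, stdout):
    """Filter JSON on stdin to plain text on stdout; return an exit status."""
    try:
        text = extract_text(stdin.read())
    except (ValueError, KeyError, TypeError):
        return 1  # malformed JSON or missing "text": signal failure
    stdout.write(text + "\n")
    return 0
```

To use it as a script, end the file with sys.exit(run(sys.stdin, sys.stdout)) and pipe the transcriber's JSON output into it.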