diff --git a/CHANGELOG.md b/CHANGELOG.md
index c91af534e..a105657ea 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,7 @@
 ### Pending
 - Web auto-replies now resize/recompress media and honor `inbound.reply.mediaMaxMb` in `~/.warelay/warelay.json` (default 5 MB) to avoid provider/API limits.
+- Web provider now detects media kind (image/audio/video/document), logs the source path, and enforces provider caps: images ≤6 MB, audio/video ≤16 MB, documents ≤100 MB; images still target the configurable cap above with resize + JPEG recompress.
 - Optional voice-note transcription: set `inbound.transcribeAudio.command` (e.g., OpenAI Whisper CLI) to turn inbound audio into text before templating/Claude; verbose logs surface when transcription runs. Prompts now include the original media path plus a `Transcript:` block so models see both.
 ## 1.0.4 — 2025-11-25
diff --git a/README.md b/README.md
index 49bbfbce0..4e143d4a4 100644
--- a/README.md
+++ b/README.md
@@ -46,10 +46,10 @@ Install from npm (global): `npm install -g warelay` (Node 22+). Then choose **on
 | `warelay webhook` | Run inbound webhook (`ingress=tailscale` updates Twilio; `none` is local-only) | `--ingress tailscale\|none` `--port ` `--path ` `--reply ` `--verbose` `--yes` `--dry-run` |
 | `warelay login` | Link personal WhatsApp Web via QR | `--verbose` |
-### Sending images
-- Twilio: `warelay send --to +1... --message "Hi" --media ./pic.jpg --serve-media` (needs `warelay webhook --ingress tailscale` or `--serve-media` to auto-host via Funnel; max 5 MB).
-- Web: `warelay send --provider web --media ./pic.jpg --message "Hi"` (local path or URL; no hosting needed).
-- Auto-replies can attach `mediaUrl` in `~/.warelay/warelay.json` (used alongside `text` when present). Web auto-replies now auto-resize/recompress images and cap size by config: set `inbound.reply.mediaMaxMb` (default 5) to control the post-compression limit; images are resized (max side 2048px) and JPEG-compressed to fit.
+### Sending media
+- Twilio: `warelay send --to +1... --message "Hi" --media ./pic.jpg --serve-media` (needs `warelay webhook --ingress tailscale` or `--serve-media` to auto-host via Funnel; max 5 MB per file because of the built-in host).
+- Web: `warelay send --provider web --media ./pic.jpg --message "Hi"` (local path or URL; no hosting needed). Web auto-detects media kind: images (≤6 MB), audio/voice or video (≤16 MB), other docs (≤100 MB). Images are resized to max 2048px and JPEG recompressed when the cap would be exceeded.
+- Auto-replies can attach `mediaUrl` in `~/.warelay/warelay.json` (used alongside `text` when present). Web auto-replies honor `inbound.reply.mediaMaxMb` (default 5 MB) as a post-compression target but will never exceed the provider hard limits above.
 ### Voice notes (optional transcription)
 - If you set `inbound.transcribeAudio.command`, warelay will run that CLI when inbound audio arrives (e.g., WhatsApp voice notes) and replace the Body with the transcript before templating/Claude.
diff --git a/docs/audio.md b/docs/audio.md
index c92e93cf7..1efcb5336 100644
--- a/docs/audio.md
+++ b/docs/audio.md
@@ -37,7 +37,8 @@ Requires `OPENAI_API_KEY` in env and `openai` CLI installed:
 ## Notes & limits
 - We don’t ship a transcriber; you opt in with any CLI that prints text to stdout (Whisper cloud, whisper.cpp, vosk, Deepgram, etc.).
-- Size guard: inbound audio must be ≤5 MB (same as other media).
+- Size guard: inbound audio must be ≤5 MB (matches the temp media store and transcript pipeline).
+- Outbound caps: Web can send audio/voice up to 16 MB (sends as a voice note with `ptt: true`); Twilio still uses the 5 MB media host guard.
 - If transcription fails, we fall back to the original body/media note; replies still go through.
 - Transcript is available to templates as `{{Transcript}}`; models get both the media path and a `Transcript:` block in the prompt when using command mode.
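Reviewer note: the per-kind caps described in the README and docs/audio.md hunks above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this change: every name here (`MediaKind`, `mediaKindFromMime`, `effectiveCapBytes`, `capError`) is hypothetical, classification by MIME prefix is an assumed detection strategy, and 1 MB is assumed to mean 1024 × 1024 bytes.

```typescript
// Hypothetical sketch; none of these names come from the warelay source.
type MediaKind = "image" | "audio" | "video" | "document";

// Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Web provider hard caps as stated in the docs in this diff.
const WEB_HARD_CAP_BYTES: Record<MediaKind, number> = {
  image: 6 * MB,
  audio: 16 * MB,
  video: 16 * MB,
  document: 100 * MB,
};

// Assumed detection strategy: classify by MIME type prefix,
// falling back to "document" for anything unknown or missing.
function mediaKindFromMime(mime?: string): MediaKind {
  if (!mime) return "document";
  if (mime.startsWith("image/")) return "image";
  if (mime.startsWith("audio/")) return "audio";
  if (mime.startsWith("video/")) return "video";
  return "document";
}

// Images target the configurable cap (inbound.reply.mediaMaxMb, default 5)
// but never exceed the hard cap; other kinds use the hard cap directly.
function effectiveCapBytes(kind: MediaKind, mediaMaxMb = 5): number {
  const hard = WEB_HARD_CAP_BYTES[kind];
  return kind === "image" ? Math.min(mediaMaxMb * MB, hard) : hard;
}

// Returns an error message naming the applicable cap, or null if the size fits.
function capError(kind: MediaKind, sizeBytes: number, mediaMaxMb = 5): string | null {
  const cap = effectiveCapBytes(kind, mediaMaxMb);
  if (sizeBytes <= cap) return null;
  return `Media exceeds ${cap / MB} MB limit for ${kind}`;
}
```

With the defaults, `effectiveCapBytes("image")` gives the 5 MB target, while a configured `mediaMaxMb` of 10 is clamped to the 6 MB image hard cap. For images the cap is a recompression target rather than an immediate rejection threshold, so a check like `capError` would only make sense after resizing.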
diff --git a/docs/claude-config.md b/docs/claude-config.md
index be94e1475..63e3f612e 100644
--- a/docs/claude-config.md
+++ b/docs/claude-config.md
@@ -58,7 +58,8 @@ Notes on this configuration:
 - To send an image from Claude, include a line like `MEDIA:https://example.com/pic.jpg` in the output. warelay will:
 - Host local paths for Twilio using the media server/Tailscale Funnel.
 - Send buffers directly for the Web provider.
-- Inbound media is downloaded (≤5 MB) and exposed to your templates as `{{MediaPath}}`, `{{MediaUrl}}`, and `{{MediaType}}`. You can mention this in your prompt if you want Claude to reason about the attachment. Outbound media from Claude (via `MEDIA:`) is resized/recompressed on the Web provider path; control the cap with `inbound.reply.mediaMaxMb` (default 5).
+- Inbound media is downloaded (≤5 MB) and exposed to your templates as `{{MediaPath}}`, `{{MediaUrl}}`, and `{{MediaType}}`. You can mention this in your prompt if you want Claude to reason about the attachment.
+- Outbound media from Claude (via `MEDIA:`) follows provider caps: Web resizes images to the configured target (`inbound.reply.mediaMaxMb`, default 5 MB) within hard limits of 6 MB (image), 16 MB (audio/video voice notes), and 100 MB (documents); Twilio still uses the Funnel host with a 5 MB guard.
 - Voice notes: set `inbound.transcribeAudio.command` to run a CLI that emits the transcript to stdout (e.g., OpenAI Whisper: `openai api audio.transcriptions.create -m whisper-1 -f {{MediaPath}} --response-format text`). If it succeeds, warelay replaces `Body` with the transcript and adds the original media path plus a `Transcript:` block into the prompt before invoking Claude.
 ## Testing the setup
diff --git a/docs/images.md b/docs/images.md
index 6a622cb30..59d982c77 100644
--- a/docs/images.md
+++ b/docs/images.md
@@ -5,6 +5,7 @@ This document defines how `warelay` should handle sending and replying with imag
 ## Goals
 - Allow sending an image with an optional caption via `warelay send` for both providers.
 - Allow auto-replies (Twilio webhook, Twilio poller, Web inbox) to return an image (optionally with text) when configured.
+- For the Web provider, also support audio/voice, video, and generic documents with sensible per-type limits.
 - Keep the “one command at a time” queue intact; media fetch/serve must not block other replies longer than necessary.
 - Avoid introducing new external services: reuse the existing Tailscale Funnel port to host media for Twilio.
@@ -21,10 +22,12 @@ This document defines how `warelay` should handle sending and replying with imag
 ## Provider Behavior
 ### Web (Baileys)
 - Input: local file path **or** HTTP(S) URL.
-- Flow: load into Buffer, **resize + recompress to JPEG** (max side 2048px, quality step-down) to fit under a configurable cap, then send via `sock.sendMessage(jid, { image: buffer, caption })`.
-- Size cap: default 5 MB; override with `inbound.reply.mediaMaxMb` in `~/.warelay/warelay.json`.
+- Flow: load into Buffer, detect media kind, and apply the right payload:
+ - Images: **resize + recompress to JPEG** (max side 2048px, quality step-down) to fit under `inbound.reply.mediaMaxMb` (default 5 MB) but never above the Web hard cap (6 MB).
+ - Audio/voice and video: pass through up to 16 MB; set `ptt: true` for audio to send as a voice note.
+ - Everything else becomes a document with filename, up to 100 MB.
 - Caption uses `--message` or `reply.text`; if caption is empty, send media-only.
-- Logging: non-verbose shows `↩️`/`✅` with caption; verbose includes `(media, B, ms fetch)`.
+- Logging: non-verbose shows `↩️`/`✅` with caption; verbose includes `(media, B, ms fetch)` and the local/remote path.
 ### Twilio
 - Twilio API requires a public HTTPS `MediaUrl`; it will not accept local paths.
@@ -32,7 +35,7 @@ This document defines how `warelay` should handle sending and replying with imag
 - When `--media` is a local path, copy to temp dir (`~/.warelay/media/`), serve at `/media/` on the existing Express app started for webhook, or spin up a short-lived server on demand for `send`.
 - `MediaUrl` = `https://.ts.net/media/`.
 - Files auto-removed after TTL (default 2 minutes) or after first successful fetch (best-effort).
- - Enforce size limit 5 MB; reject early with clear error.
+ - Enforce size limit 5 MB (matches the media host guard); reject early with clear error.
 - If `--media` is already an HTTPS URL, pass through unchanged.
 - Fallback: if Funnel is not enabled (or host unknown) and a local path is provided, fail with guidance to run `warelay webhook --ingress tailscale` (or pass a URL instead).
@@ -56,12 +59,12 @@ This document defines how `warelay` should handle sending and replying with imag
 - For completeness: when inbound Twilio/Web messages include media, download to temp file, expose templating variables:
 - `{{MediaUrl}}` original URL (Twilio) or pseudo-URL (web).
 - `{{MediaPath}}` local temp path written before running the command.
-- Size guard: only download if ≤5 MB; else skip and log.
+- Size guard: only download if ≤5 MB; else skip and log (aligns with the temp media store limit).
 - Audio/voice notes: if you set `inbound.transcribeAudio.command`, warelay will run that CLI (templated with `{{MediaPath}}`) and replace `Body` with the transcript before continuing the reply flow; verbose logs indicate when transcription runs. The command prompt includes the original media path plus a `Transcript:` section so the model sees both.
 ## Errors & Messaging
 - Local path with twilio + Funnel disabled → error: “Twilio media needs a public URL; start `warelay webhook --ingress tailscale` or pass an https:// URL.”
-- File too large (>5 MB) → “Media exceeds 5 MB limit; resize or host elsewhere.”
+- File too large → error mentions the applicable cap (5 MB for Twilio host, 6/16/100 MB for Web image/audio-video/doc respectively).
 - Download failure for web provider → “Failed to load media from ; skipping send.”
 ## Tests to Add
diff --git a/src/provider-web.test.ts b/src/provider-web.test.ts
index c06b309b0..4ef875524 100644
--- a/src/provider-web.test.ts
+++ b/src/provider-web.test.ts
@@ -423,7 +423,7 @@ describe("provider-web", () => {
 sendMedia,
 });
- expect(sendMedia).toHaveBeenCalled();
+ expect(sendMedia).toHaveBeenCalledTimes(1);
 expect(reply).toHaveBeenCalledWith("hi");
 fetchMock.mockRestore();
 });
@@ -501,42 +501,42 @@ describe("provider-web", () => {
 fetchMock.mockRestore();
 });
- it(
- "compresses common formats to jpeg under the cap",
- { timeout: 15_000 },
- async () => {
- const formats = [
- {
- name: "png",
- mime: "image/png",
- make: (buf: Buffer, opts: { width: number; height: number }) =>
- sharp(buf, {
- raw: { width: opts.width, height: opts.height, channels: 3 },
- })
- .png({ compressionLevel: 0 })
- .toBuffer(),
- },
- {
- name: "jpeg",
- mime: "image/jpeg",
- make: (buf: Buffer, opts: { width: number; height: number }) =>
- sharp(buf, {
- raw: { width: opts.width, height: opts.height, channels: 3 },
- })
- .jpeg({ quality: 100, chromaSubsampling: "4:4:4" })
- .toBuffer(),
- },
- {
- name: "webp",
- mime: "image/webp",
- make: (buf: Buffer, opts: { width: number; height: number }) =>
- sharp(buf, {
- raw: { width: opts.width, height: opts.height, channels: 3 },
- })
- .webp({ quality: 100 })
- .toBuffer(),
- },
- ] as const;
+ it(
+ "compresses common formats to jpeg under the cap",
+ { timeout: 15_000 },
+ async () => {
+ const formats = [
+ {
+ name: "png",
+ mime: "image/png",
+ make: (buf: Buffer, opts: { width: number; height: number }) =>
+ sharp(buf, {
+ raw: { width: opts.width, height: opts.height, channels: 3 },
+ })
+ .png({ compressionLevel: 0 })
+ .toBuffer(),
+ },
+ {
+ name: "jpeg",
+ mime: "image/jpeg",
+ make: (buf: Buffer, opts: { width: number; height: number }) =>
+ sharp(buf, {
+ raw: { width: opts.width, height: opts.height, channels: 3 },
+ })
+ .jpeg({ quality: 100, chromaSubsampling: "4:4:4" })
+ .toBuffer(),
+ },
+ {
+ name: "webp",
+ mime: "image/webp",
+ make: (buf: Buffer, opts: { width: number; height: number }) =>
+ sharp(buf, {
+ raw: { width: opts.width, height: opts.height, channels: 3 },
+ })
+ .webp({ quality: 100 })
+ .toBuffer(),
+ },
+ ] as const;
 for (const fmt of formats) {
 // Force a small cap to ensure compression is exercised for every format.