Files
clawdbot/docs/images.md
2025-11-25 23:43:57 +01:00

5.9 KiB
Raw Blame History

Image Support Specification — 2025-11-25

This document defines how warelay should handle sending and replying with images across both providers. It is intentionally implementation-ready and keeps the UX consistent with existing CLI patterns and Tailscale Funnel usage.

Goals

  • Allow sending an image with an optional caption via warelay send for both providers.
  • Allow auto-replies (Twilio webhook, Twilio poller, Web inbox) to return an image (optionally with text) when configured.
  • For the Web provider, also support audio/voice, video, and generic documents with sensible per-type limits.
  • Keep the “one command at a time” queue intact; media fetch/serve must not block other replies longer than necessary.
  • Avoid introducing new external services: reuse the existing Tailscale Funnel port to host media for Twilio.

CLI & Config Surface

  • warelay send --media <path-or-url> [--message <caption>] [--provider twilio|web]
    • --media optional; --message remains required for now (caption can be empty string to send only media).
    • --dry-run prints the resolved payload including hosted URL (twilio) or file path (web).
    • --json emits { provider, to, sid/messageId, mediaUrl, caption }.
  • Config auto-reply (~/.warelay/warelay.json):
    • Add inbound.reply.mediaUrl?: string (templated like reply.text).
    • Return shape from getReplyFromConfig becomes { text?: string; mediaUrl?: string }.
    • Both text and mediaUrl optional; at least one must be present to send a reply.

Provider Behavior

Web (Baileys)

  • Input: local file path or HTTP(S) URL.
  • Flow: load into Buffer, detect media kind, and apply the right payload:
    • Images: resize + recompress to JPEG (max side 2048px, quality step-down) to fit under inbound.reply.mediaMaxMb (default 5MB) but never above the Web hard cap (6MB).
    • Audio/voice and video: pass through up to 16MB; set ptt: true for audio to send as a voice note.
    • Everything else becomes a document with filename, up to 100MB.
  • Caption uses --message or reply.text; if caption is empty, send media-only.
  • Logging: non-verbose shows ↩️/ with caption; verbose includes (media, <bytes>B, <ms>ms fetch) and the local/remote path.

Twilio

  • Twilio API requires a public HTTPS MediaUrl; it will not accept local paths.
  • Hosting strategy: reuse the webhook/Funnel port.
  • When --media is a local path, copy to temp dir (~/.warelay/media/<uuid>), serve at /media/<uuid> on the existing Express app started for webhook, or spin up a short-lived server on demand for send.
    • MediaUrl = https://<tailnet-host>.ts.net/media/<uuid>.
    • Files auto-removed after TTL (default 2 minutes) or after first successful fetch (best-effort).
    • Enforce size limit 5MB (matches the media host guard); reject early with clear error.
  • If --media is already an HTTPS URL, pass through unchanged.
  • Fallback: if Funnel is not enabled (or host unknown) and a local path is provided, fail with guidance to run warelay webhook --ingress tailscale (or pass a URL instead).

Hosting/Server Details

  • Extend startWebhook Express app:
    • Static media route /media/:id reading from temp dir.
    • 404/410 if expired or missing.
    • Optional ?delete=1 to self-delete after fetch (used by Twilio fetch hook if we detect first hit).
  • Temp storage: ~/.warelay/media; cleaned on startup (remove files older than 15 minutes) and during TTL eviction.
  • Security: no directory listing; only UUID file names; CORS open (Twilio fetch); content-type derived from mime-types lookup by extension or content-type header on download, else application/octet-stream.

Auto-Reply Pipeline

  • getReplyFromConfig returns { text?, mediaUrl? }.
  • Webhook / Twilio poller:
    • If mediaUrl present, include mediaUrl in Twilio message payload; caption = text (may be empty).
    • If only text, behave as today.
  • Web inbox:
    • If mediaUrl present, fetch/resolve same as send (local path or URL), send via Baileys with caption.

Inbound Media to Commands (Claude etc.)

  • For completeness: when inbound Twilio/Web messages include media, download to temp file, expose templating variables:
    • {{MediaUrl}} original URL (Twilio) or pseudo-URL (web).
    • {{MediaPath}} local temp path written before running the command.
  • Size guard: only download if ≤5MB; else skip and log (aligns with the temp media store limit).
  • Audio/voice notes: if you set inbound.transcribeAudio.command, warelay will run that CLI (templated with {{MediaPath}}) and replace Body with the transcript before continuing the reply flow; verbose logs indicate when transcription runs. The command prompt includes the original media path plus a Transcript: section so the model sees both.

Errors & Messaging

  • Local path with twilio + Funnel disabled → error: “Twilio media needs a public URL; start warelay webhook --ingress tailscale or pass an https:// URL.”
  • File too large → error mentions the applicable cap (5MB for Twilio host, 6/16/100MB for Web image/audio-video/doc respectively).
  • Download failure for web provider → “Failed to load media from ; skipping send.”

Tests to Add

  • Twilio: dry-run shows hosted URL; send payload includes mediaUrl; rejects when Funnel host missing.
  • Web: local path sends image (mock Baileys buffer assertion).
  • Config: zod allows mediaUrl, returns combined object; command auto-reply handles text+media, media-only.
  • Media server: serves file, enforces TTL, returns 404 after cleanup.

Open Decisions (confirm before coding)

  • TTL for temp media (proposal: 2 minutes, cleanup at start + interval).
  • One-file-per-send vs. batching: default to one-file-per-send; multi-attach not supported.
  • Should warelay send --provider twilio --media implicitly start the media server (even if webhook not running), or require warelay webhook already active? (Proposal: auto-start lightweight server on demand, auto-stop after media is fetched or TTL.)