docs: update for web-only pi rpc

2025-12-05 19:04:09 +00:00
parent 7c7314f673
commit c25b0c1a66
10 changed files with 54 additions and 100 deletions
--- a/docs/RELEASING.md
+++ b/docs/RELEASING.md
@@ -20,7 +20,7 @@ Use `pnpm` (Node 22+) from the repo root. Keep the working tree clean before tag
 - [ ] `pnpm lint`
 - [ ] `pnpm test` (or `pnpm test:coverage` if you need coverage output)
 - [ ] `pnpm run build` (last sanity check after tests)
- [ ] (Optional) Spot-check a Twilio/Web flow if your changes affect send/receive paths.
+- [ ] (Optional) Spot-check the web relay if your changes affect send/receive paths.

 5) **Publish**
 - [ ] Confirm git status is clean; commit and push as needed.
--- a/docs/audio.md
+++ b/docs/audio.md
@@ -1,8 +1,8 @@
-# Audio / Voice Notes — 2025-11-25
+# Audio / Voice Notes — 2025-12-05

 ## What works
- **Optional transcription**: If `inbound.transcribeAudio.command` is set in `~/.warelay/warelay.json`, warelay will:
-  1) Download inbound audio (Web or Twilio) to a temp path if only a URL is present.
+- **Optional transcription**: If `inbound.transcribeAudio.command` is set in `~/.clawdis/clawdis.json`, CLAWDIS will:
+  1) Download inbound audio to a temp path when WhatsApp only provides a URL.
  2) Run the configured CLI (templated with `{{MediaPath}}`), expecting transcript on stdout.
  3) Replace `Body` with the transcript, set `{{Transcript}}`, and prepend the original media path plus a `Transcript:` section in the command prompt so models see both.
  4) Continue through the normal auto-reply pipeline (templating, sessions, Pi command).
@@ -39,11 +39,10 @@ Requires `OPENAI_API_KEY` in env and `openai` CLI installed:
 ## Notes & limits
 - We don’t ship a transcriber; you opt in with any CLI that prints text to stdout (Whisper cloud, whisper.cpp, vosk, Deepgram, etc.).
 - Size guard: inbound audio must be ≤5 MB (matches the temp media store and transcript pipeline).
- Outbound caps: Web can send audio/voice up to 16 MB (sends as a voice note with `ptt: true`); Twilio still uses the 5 MB media host guard.
+- Outbound caps: web send supports audio/voice up to 16 MB (sent as a voice note with `ptt: true`).
 - If transcription fails, we fall back to the original body/media note; replies still go through.
 - Transcript is available to templates as `{{Transcript}}`; models get both the media path and a `Transcript:` block in the prompt when using command mode.

 ## Gotchas
 - Ensure your CLI exits 0 and prints plain text; JSON needs to be massaged via `jq -r .text`.
 - Keep timeouts reasonable (`timeoutSeconds`, default 45s) to avoid blocking the reply queue.
- Twilio paths are hosted URLs; Web paths are local. The temp download uses HTTPS for Twilio and a temp file for Web-only media.
--- a/docs/clawd.md
+++ b/docs/clawd.md
@@ -87,7 +87,7 @@ The magic is in the combination: WhatsApp's ubiquity + Claude's intelligence + w

 ## The Config That Powers Clawd

-This is the actual config running on @steipete's Mac (`~/.warelay/warelay.json`):
+This is the actual config running on @steipete's Mac (`~/.clawdis/clawdis.json`):

 ```json5
 {
--- a/docs/group-messages.md
+++ b/docs/group-messages.md
@@ -12,7 +12,7 @@ Goal: let Clawd sit in WhatsApp groups, wake up only when pinged, and keep that
 - New session primer: on the first turn of a group session we now prepend a short blurb to the model like `You are replying inside the WhatsApp group "<subject>". Group members: +44..., +43..., … Address the specific sender noted in the message context.` If metadata isn’t available we still tell the agent it’s a group chat.

 ## Config for Clawd UK (+447511247203)
-Add a `groupChat` block to `~/.warelay/warelay.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:
+Add a `groupChat` block to `~/.clawdis/clawdis.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:

 ```json5
 {
@@ -46,7 +46,7 @@ Notes:
 - Manual smoke:
  - Send an `@clawd` ping in the group and confirm a reply that references the sender name.
  - Send a second ping and verify the history block is included then cleared on the next turn.
-  - Check `/tmp/warelay/warelay.log` at level `trace` (run relay with `--verbose`) to see `inbound web message (batched)` entries showing `from: <groupJid>` and the `[from: …]` suffix.
+  - Check relay logs (run with `--verbose`) to see `inbound web message (batched)` entries showing `from: <groupJid>` and the `[from: …]` suffix.

 ## Known considerations
 - Heartbeats are intentionally skipped for groups to avoid noisy broadcasts.
--- a/docs/images.md
+++ b/docs/images.md
@@ -1,81 +1,44 @@
-# Image Support Specification — 2025-11-25
+# Image & Media Support — 2025-12-05

-This document defines how `warelay` should handle sending and replying with images across both providers. It is intentionally implementation-ready and keeps the UX consistent with existing CLI patterns and Tailscale Funnel usage.
+CLAWDIS is now **web-only** (Baileys). This document captures the current media handling rules for send, relay, and agent replies.

 ## Goals
- Allow sending an image with an optional caption via `warelay send` for both providers.
- Allow auto-replies (Twilio webhook, Twilio poller, Web inbox) to return an image (optionally with text) when configured.
- For the Web provider, also support audio/voice, video, and generic documents with sensible per-type limits.
- Keep the “one command at a time” queue intact; media fetch/serve must not block other replies longer than necessary.
- Avoid introducing new external services: reuse the existing Tailscale Funnel port to host media for Twilio.
+- Send media with optional captions via `clawdis send --media`.
+- Allow auto-replies from the web inbox to include media alongside text.
+- Keep per-type limits sane and predictable.

-## CLI & Config Surface
- `warelay send --media <path-or-url> [--message <caption>] [--provider twilio|web]`
-  - `--media` optional; `--message` remains required for now (caption can be empty string to send only media).
-  - `--dry-run` prints the resolved payload including hosted URL (twilio) or file path (web).
-  - `--json` emits `{ provider, to, sid/messageId, mediaUrl, caption }`.
- Config auto-reply (`~/.warelay/warelay.json`):
-  - Add `inbound.reply.mediaUrl?: string` (templated like `reply.text`).
-  - Return shape from `getReplyFromConfig` becomes `{ text?: string; mediaUrl?: string }`.
-  - Both `text` and `mediaUrl` optional; at least one must be present to send a reply.
+## CLI Surface
+- `clawdis send --media <path-or-url> [--message <caption>]`
+  - `--media` optional; caption can be empty for media-only sends.
+  - `--dry-run` prints the resolved payload; `--json` emits `{ provider, to, messageId, mediaUrl, caption }`.

-## Provider Behavior
-### Web (Baileys)
+## Web Provider Behavior
 - Input: local file path **or** HTTP(S) URL.
- Flow: load into Buffer, detect media kind, and apply the right payload:
-  - Images: **resize + recompress to JPEG** (max side 2048px, quality step-down) to fit under `inbound.reply.mediaMaxMb` (default 5 MB) but never above the Web hard cap (6 MB).
-  - Audio/voice and video: pass through up to 16 MB; set `ptt: true` for audio to send as a voice note.
-  - Everything else becomes a document with filename, up to 100 MB.
- MIME is detected by magic bytes first (then header, then path); wrong file extensions are tolerated and the detected MIME drives payload kind and recompression.
- Caption uses `--message` or `reply.text`; if caption is empty, send media-only.
- Logging: non-verbose shows `↩️`/`✅` with caption; verbose includes `(media, <bytes>B, <ms>ms fetch)` and the local/remote path.
-
-### Twilio
- Twilio API requires a public HTTPS `MediaUrl`; it will not accept local paths.
- Hosting strategy: reuse the webhook/Funnel port.
- When `--media` is a local path, copy to temp dir (`~/.warelay/media/<uuid>`), serve at `/media/<uuid>` on the existing Express app started for webhook, or spin up a short-lived server on demand for `send`.
-  - `MediaUrl` = `https://<tailnet-host>.ts.net/media/<uuid>`.
-  - Files auto-removed after TTL (default 2 minutes) or after first successful fetch (best-effort).
-  - Enforce size limit 5 MB (matches the media host guard); reject early with clear error.
- If `--media` is already an HTTPS URL, pass through unchanged.
- Fallback: if Funnel is not enabled (or host unknown) and a local path is provided, fail with guidance to run `warelay webhook --ingress tailscale` (or pass a URL instead).
-
-## Hosting/Server Details
- Extend `startWebhook` Express app:
-  - Static media route `/media/:id` reading from temp dir.
-  - 404/410 if expired or missing.
-  - Optional `?delete=1` to self-delete after fetch (used by Twilio fetch hook if we detect first hit).
- Temp storage: `~/.warelay/media`; cleaned on startup (remove files older than 15 minutes) and during TTL eviction.
- Security: no directory listing; only UUID file names; CORS open (Twilio fetch); content-type derived from sniffed bytes (fallback to header, then extension). Saved files are renamed with an extension that matches the detected MIME so downstream fetches present the correct type.
+- Flow: load into a Buffer, detect media kind, and build the correct payload:
+  - **Images:** resize & recompress to JPEG (max side 2048px) targeting `inbound.reply.mediaMaxMb` (default 5 MB), capped at 6 MB.
+  - **Audio/Voice/Video:** pass-through up to 16 MB; audio is sent as a voice note (`ptt: true`).
+  - **Documents:** anything else, up to 100 MB, with filename preserved when available.
+- MIME detection prefers magic bytes, then headers, then file extension.
+- Caption comes from `--message` or `reply.text`; empty caption is allowed.
+- Logging: non-verbose shows `↩️`/`✅`; verbose includes size and source path/URL.

 ## Auto-Reply Pipeline
- `getReplyFromConfig` returns `{ text?, mediaUrl? }`.
- Webhook / Twilio poller:
-  - If `mediaUrl` present, include `mediaUrl` in Twilio message payload; caption = `text` (may be empty).
-  - If only `text`, behave as today.
- Web inbox:
-  - If `mediaUrl` present, fetch/resolve same as send (local path or URL), send via Baileys with caption.
+- `getReplyFromConfig` returns `{ text?, mediaUrl?, mediaUrls? }`.
+- When media is present, the web sender resolves local paths or URLs using the same pipeline as `clawdis send`.
+- Multiple media entries are sent sequentially if provided.

 ## Inbound Media to Commands (Pi/Tau)
- For completeness: when inbound Twilio/Web messages include media, download to temp file, expose templating variables:
-  - `{{MediaUrl}}` original URL (Twilio) or pseudo-URL (web).
+- When inbound web messages include media, CLAWDIS downloads to a temp file and exposes templating variables:
+  - `{{MediaUrl}}` pseudo-URL for the inbound media.
  - `{{MediaPath}}` local temp path written before running the command.
- Size guard: only download if ≤5 MB; else skip and log (aligns with the temp media store limit).
- Saved inbound media is named with the detected MIME-based extension (e.g., `.jpg`), so later CLI sends reuse a correct filename/content-type even if WhatsApp omitted an extension.
- Audio/voice notes: if you set `inbound.transcribeAudio.command`, warelay will run that CLI (templated with `{{MediaPath}}`) and replace `Body` with the transcript before continuing the reply flow; verbose logs indicate when transcription runs. The command prompt includes the original media path plus a `Transcript:` section so the model sees both.
+- Audio transcription (if configured) runs before templating and can replace `Body` with the transcript.

-## Errors & Messaging
- Local path with twilio + Funnel disabled → error: “Twilio media needs a public URL; start `warelay webhook --ingress tailscale` or pass an https:// URL.”
- File too large → error mentions the applicable cap (5 MB for Twilio host, 6/16/100 MB for Web image/audio-video/doc respectively).
- Download failure for web provider → “Failed to load media from <source>; skipping send.”
+## Limits & Errors
+- Images: ~6 MB cap after recompression.
+- Audio/voice/video: 16 MB cap; documents: 100 MB cap.
+- Oversize or unreadable media → clear error in logs and the reply is skipped.

-## Tests to Add
- Twilio: dry-run shows hosted URL; send payload includes `mediaUrl`; rejects when Funnel host missing.
- Web: local path sends image (mock Baileys buffer assertion).
- Config: zod allows `mediaUrl`, returns combined object; command auto-reply handles `text+media`, `media-only`.
- Media server: serves file, enforces TTL, returns 404 after cleanup.
-
-## Open Decisions (confirm before coding)
- TTL for temp media (proposal: 2 minutes, cleanup at start + interval).
- One-file-per-send vs. batching: default to one-file-per-send; multi-attach not supported.
- Should `warelay send --provider twilio --media` implicitly start the media server (even if webhook not running), or require `warelay webhook` already active? (Proposal: auto-start lightweight server on demand, auto-stop after media is fetched or TTL.)
+## Notes for Tests
+- Cover send + reply flows for image/audio/document cases.
+- Validate recompression for images (size bound) and voice-note flag for audio.
+- Ensure multi-media replies fan out as sequential sends.
--- a/docs/queue.md
+++ b/docs/queue.md
@@ -1,6 +1,6 @@
 # Command Queue (2025-11-25)

-We now serialize all command-based auto-replies (Twilio webhook + poller + WhatsApp Web listener) through a tiny in-process queue to prevent multiple commands from running at once.
+We now serialize all command-based auto-replies (WhatsApp Web listener) through a tiny in-process queue to prevent multiple commands from running at once.

 ## Why
 - Some auto-reply commands are expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
@@ -14,7 +14,7 @@ We now serialize all command-based auto-replies (Twilio webhook + poller + Whats

 ## Scope and guarantees
 - Applies only to config-driven command replies; plain text replies are unaffected.
- Queue is process-wide, so webhook handlers, Twilio polling, and the web inbox listener all respect the same lock.
+- Queue is process-wide, so the web inbox listener (and any future entrypoints) all respect the same lock.
 - No external dependencies or background worker threads; pure TypeScript + promises.

 ## Troubleshooting