diff --git a/AGENTS.md b/AGENTS.md
index 81a204b69..b80a259da 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,8 +1,8 @@
 # Repository Guidelines
 
 ## Project Structure & Module Organization
-- Source code: `src/` (CLI wiring in `src/cli`, commands in `src/commands`, Twilio in `src/twilio`, Web provider in `src/provider-web.ts`, infra in `src/infra`, media pipeline in `src/media`).
-- Tests: colocated `*.test.ts` plus e2e in `src/cli/relay.e2e.test.ts`.
+- Source code: `src/` (CLI wiring in `src/cli`, commands in `src/commands`, web provider in `src/provider-web.ts`, infra in `src/infra`, media pipeline in `src/media`).
+- Tests: colocated `*.test.ts`.
 - Docs: `docs/` (images, queue, Pi config). Built output lives in `dist/`.
 
 ## Build, Test, and Development Commands
@@ -29,9 +29,8 @@
 - PRs should summarize scope, note testing performed, and mention any user-facing changes or new flags.
 
 ## Security & Configuration Tips
-- Environment: copy `.env.example`; set Twilio creds and WhatsApp sender (`TWILIO_WHATSAPP_FROM`).
-- Web provider stores creds at `~/.clawdis/credentials/` (legacy fallback: `~/.warelay/credentials/`); rerun `clawdis login` if logged out.
-- Media hosting relies on Tailscale Funnel when using Twilio; use `clawdis webhook --ingress tailscale` or `--serve-media` for local hosting.
+- Web provider stores creds at `~/.clawdis/credentials/`; rerun `clawdis login` if logged out.
+- Pi/Tau sessions live under `~/.clawdis/sessions/` by default; the base directory is not configurable.
 
 ## Agent-Specific Notes
 - Relay is managed by launchctl (label `com.steipete.clawdis`). After code changes restart with `launchctl kickstart -k gui/$UID/com.steipete.clawdis` and verify via `launchctl list | grep clawdis`. Legacy label `com.steipete.warelay` still exists for rollback; prefer the new one. Use tmux only if you spin up a temporary relay yourself and clean it up afterward.
@@ -43,10 +42,10 @@ The Claude Code Bash tool escapes `!` to `\\!` in command arguments. When using
 
 ```bash
 # WRONG - will send "Hello\\!" with backslash
-clawdis send --provider web --to "+1234" --message 'Hello!'
+clawdis send --to "+1234" --message 'Hello!'
 
 # CORRECT - use heredoc to avoid escaping
-clawdis send --provider web --to "+1234" --message "$(cat <<'EOF'
+clawdis send --to "+1234" --message "$(cat <<'EOF'
 Hello!
 EOF
 )"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index d273f3e3a..be8199312 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,10 +4,13 @@
 
 ### Breaking
 - Dropped all non-Pi agents (Claude, Codex, Gemini, Opencode); `inbound.reply.agent.kind` now only accepts `"pi"` and related CLI helpers have been removed.
+- Removed Twilio support and all related commands/options (webhook/up/provider flags/wait-poll); CLAWDIS is Baileys Web-only.
 
 ### Changes
 - Default agent handling now favors Pi RPC while falling back to the plain command runner for non-Pi invocations, keeping heartbeat/session plumbing intact.
 - Documentation updated to reflect Pi-only support and to mark legacy Claude paths as historical.
+- Status command reports web session health + session recipients; config paths are locked to `~/.clawdis` with session metadata stored under `~/.clawdis/sessions/`.
+- Simplified send/agent/relay/heartbeat to web-only delivery; removed Twilio mocks/tests and dead code.
 
 ## 1.4.1 — 2025-12-04
 
diff --git a/README.md b/README.md
index a70507e09..3b86e329e 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ Because every space lobster needs a time-and-space machine. The Doctor has a TAR
 
 ## Features
 
-- 📱 **WhatsApp Integration** — Personal WhatsApp Web or Twilio
+- 📱 **WhatsApp Integration** — Personal WhatsApp Web (Baileys)
 - 🤖 **AI Agent Gateway** — Pi/Tau only (Pi CLI in RPC mode)
 - 💬 **Session Management** — Per-sender conversation context
 - 🔔 **Heartbeats** — Periodic check-ins for proactive AI
@@ -100,24 +100,14 @@ CLAWDIS was built for **Clawd**, a space lobster AI assistant. See the full setu
 - 👨‍💻 **Peter's Blog:** [steipete.me](https://steipete.me)
 - 🐦 **Twitter:** [@steipete](https://twitter.com/steipete)
 
-## Providers
+## Provider
 
-### WhatsApp Web (Recommended)
+### WhatsApp Web
 ```bash
 clawdis login # Scan QR code
 clawdis relay # Start listening
 ```
 
-### Twilio
-```bash
-# Set environment variables
-export TWILIO_ACCOUNT_SID=...
-export TWILIO_AUTH_TOKEN=...
-export TWILIO_WHATSAPP_FROM=whatsapp:+1234567890
-
-clawdis relay --provider twilio
-```
-
 ## Commands
 
 | Command | Description |
diff --git a/docs/RELEASING.md b/docs/RELEASING.md
index c42b0f42f..f074134f6 100644
--- a/docs/RELEASING.md
+++ b/docs/RELEASING.md
@@ -20,7 +20,7 @@ Use `pnpm` (Node 22+) from the repo root. Keep the working tree clean before tag
 - [ ] `pnpm lint`
 - [ ] `pnpm test` (or `pnpm test:coverage` if you need coverage output)
 - [ ] `pnpm run build` (last sanity check after tests)
-- [ ] (Optional) Spot-check a Twilio/Web flow if your changes affect send/receive paths.
+- [ ] (Optional) Spot-check the web relay if your changes affect send/receive paths.
 
 5) **Publish**
 - [ ] Confirm git status is clean; commit and push as needed.
diff --git a/docs/audio.md b/docs/audio.md
index fc90b5618..a0c9a059e 100644
--- a/docs/audio.md
+++ b/docs/audio.md
@@ -1,8 +1,8 @@
-# Audio / Voice Notes — 2025-11-25
+# Audio / Voice Notes — 2025-12-05
 
 ## What works
-- **Optional transcription**: If `inbound.transcribeAudio.command` is set in `~/.warelay/warelay.json`, warelay will:
-  1) Download inbound audio (Web or Twilio) to a temp path if only a URL is present.
+- **Optional transcription**: If `inbound.transcribeAudio.command` is set in `~/.clawdis/clawdis.json`, CLAWDIS will:
+  1) Download inbound audio to a temp path when WhatsApp only provides a URL.
   2) Run the configured CLI (templated with `{{MediaPath}}`), expecting transcript on stdout.
   3) Replace `Body` with the transcript, set `{{Transcript}}`, and prepend the original media path plus a `Transcript:` section in the command prompt so models see both.
   4) Continue through the normal auto-reply pipeline (templating, sessions, Pi command).
@@ -39,11 +39,10 @@ Requires `OPENAI_API_KEY` in env and `openai` CLI installed:
 ## Notes & limits
 - We don’t ship a transcriber; you opt in with any CLI that prints text to stdout (Whisper cloud, whisper.cpp, vosk, Deepgram, etc.).
 - Size guard: inbound audio must be ≤5 MB (matches the temp media store and transcript pipeline).
-- Outbound caps: Web can send audio/voice up to 16 MB (sends as a voice note with `ptt: true`); Twilio still uses the 5 MB media host guard.
+- Outbound caps: web send supports audio/voice up to 16 MB (sent as a voice note with `ptt: true`).
 - If transcription fails, we fall back to the original body/media note; replies still go through.
 - Transcript is available to templates as `{{Transcript}}`; models get both the media path and a `Transcript:` block in the prompt when using command mode.
 
 ## Gotchas
 - Ensure your CLI exits 0 and prints plain text; JSON needs to be massaged via `jq -r .text`.
 - Keep timeouts reasonable (`timeoutSeconds`, default 45s) to avoid blocking the reply queue.
-- Twilio paths are hosted URLs; Web paths are local. The temp download uses HTTPS for Twilio and a temp file for Web-only media.
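The transcription steps described in `docs/audio.md` can be sketched in TypeScript roughly as follows. This is a minimal illustration under assumptions, not the CLAWDIS implementation: it assumes `inbound.transcribeAudio.command` is an argv array, and the helpers `renderArgs`, `transcribe`, `applyTranscript`, and `commandPrompt` (plus the exact prompt layout) are hypothetical. It keeps the documented behavior: `{{MediaPath}}` templating, transcript read from stdout, a `timeoutSeconds` default of 45, and a fallback to the original body when the CLI fails.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical shape of the inbound context handed to templating.
interface InboundContext {
  Body: string;
  MediaPath: string;
  Transcript?: string;
}

// Substitute {{MediaPath}} into every argv element of the configured command.
function renderArgs(template: string[], mediaPath: string): string[] {
  return template.map((arg) => arg.split("{{MediaPath}}").join(mediaPath));
}

// Run the transcriber; return trimmed stdout, or null on a non-zero exit,
// timeout, spawn error, or empty output so the caller can fall back.
async function transcribe(
  template: string[],
  mediaPath: string,
  timeoutSeconds = 45,
): Promise<string | null> {
  const [cmd, ...args] = renderArgs(template, mediaPath);
  if (!cmd) return null;
  try {
    const { stdout } = await execFileAsync(cmd, args, {
      timeout: timeoutSeconds * 1000,
    });
    const text = stdout.trim();
    return text.length > 0 ? text : null;
  } catch {
    return null;
  }
}

// On success, Body becomes the transcript and {{Transcript}} is set;
// on failure the original context passes through unchanged.
function applyTranscript(ctx: InboundContext, transcript: string | null): InboundContext {
  if (transcript === null) return ctx;
  return { ...ctx, Body: transcript, Transcript: transcript };
}

// Command-mode prompt: original media path plus a "Transcript:" section
// (this exact layout is an assumption).
function commandPrompt(ctx: InboundContext): string {
  if (ctx.Transcript === undefined) return ctx.Body;
  return `${ctx.MediaPath}\nTranscript:\n${ctx.Transcript}`;
}
```

Because `transcribe` converts every failure into `null`, a broken or slow transcriber never blocks the reply; the message simply continues with its original body, matching the documented fallback.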
diff --git a/docs/clawd.md b/docs/clawd.md
index 1ef4da97d..5264be1cd 100644
--- a/docs/clawd.md
+++ b/docs/clawd.md
@@ -87,7 +87,7 @@ The magic is in the combination: WhatsApp's ubiquity + Claude's intelligence + w
 
 ## The Config That Powers Clawd
 
-This is the actual config running on @steipete's Mac (`~/.warelay/warelay.json`):
+This is the actual config running on @steipete's Mac (`~/.clawdis/clawdis.json`):
 
 ```json5
 {
diff --git a/docs/group-messages.md b/docs/group-messages.md
index 5b315727f..d2ce3bc23 100644
--- a/docs/group-messages.md
+++ b/docs/group-messages.md
@@ -12,7 +12,7 @@ Goal: let Clawd sit in WhatsApp groups, wake up only when pinged, and keep that
 - New session primer: on the first turn of a group session we now prepend a short blurb to the model like `You are replying inside the WhatsApp group "". Group members: +44..., +43..., … Address the specific sender noted in the message context.` If metadata isn’t available we still tell the agent it’s a group chat.
 
 ## Config for Clawd UK (+447511247203)
-Add a `groupChat` block to `~/.warelay/warelay.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:
+Add a `groupChat` block to `~/.clawdis/clawdis.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:
 
 ```json5
 {
@@ -46,7 +46,7 @@ Notes:
 - Manual smoke:
   - Send an `@clawd` ping in the group and confirm a reply that references the sender name.
   - Send a second ping and verify the history block is included then cleared on the next turn.
-  - Check `/tmp/warelay/warelay.log` at level `trace` (run relay with `--verbose`) to see `inbound web message (batched)` entries showing `from: ` and the `[from: …]` suffix.
+  - Check relay logs (run with `--verbose`) to see `inbound web message (batched)` entries showing `from: ` and the `[from: …]` suffix.
 
 ## Known considerations
 - Heartbeats are intentionally skipped for groups to avoid noisy broadcasts.
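The first-turn group primer described in `docs/group-messages.md` could look roughly like this. It is a sketch only: `GroupMeta`, `groupPrimer`, and `withPrimer` are hypothetical names, and the wording used when metadata is missing is an assumption (the doc only says the agent is still told it is a group chat).

```typescript
// Hypothetical group metadata; either field may be missing.
interface GroupMeta {
  subject?: string;
  participants?: string[];
}

// Build the one-off blurb prepended on the first turn of a group session.
function groupPrimer(meta?: GroupMeta): string {
  const parts: string[] = [];
  if (meta?.subject) {
    parts.push(`You are replying inside the WhatsApp group "${meta.subject}".`);
  } else {
    // Assumed fallback wording when group metadata is unavailable.
    parts.push("You are replying inside a WhatsApp group chat.");
  }
  if (meta?.participants && meta.participants.length > 0) {
    parts.push(`Group members: ${meta.participants.join(", ")}.`);
  }
  parts.push("Address the specific sender noted in the message context.");
  return parts.join(" ");
}

// Only the first turn of a group session gets the primer.
function withPrimer(body: string, isFirstTurn: boolean, meta?: GroupMeta): string {
  return isFirstTurn ? `${groupPrimer(meta)}\n\n${body}` : body;
}
```

Keeping the primer a pure function of the metadata makes the "metadata missing" path easy to cover in a unit test alongside the manual smoke checks.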
diff --git a/docs/images.md b/docs/images.md
index 6aba8aabc..0afd41a87 100644
--- a/docs/images.md
+++ b/docs/images.md
@@ -1,81 +1,44 @@
-# Image Support Specification — 2025-11-25
+# Image & Media Support — 2025-12-05
 
-This document defines how `warelay` should handle sending and replying with images across both providers. It is intentionally implementation-ready and keeps the UX consistent with existing CLI patterns and Tailscale Funnel usage.
+CLAWDIS is now **web-only** (Baileys). This document captures the current media handling rules for send, relay, and agent replies.
 
 ## Goals
-- Allow sending an image with an optional caption via `warelay send` for both providers.
-- Allow auto-replies (Twilio webhook, Twilio poller, Web inbox) to return an image (optionally with text) when configured.
-- For the Web provider, also support audio/voice, video, and generic documents with sensible per-type limits.
-- Keep the “one command at a time” queue intact; media fetch/serve must not block other replies longer than necessary.
-- Avoid introducing new external services: reuse the existing Tailscale Funnel port to host media for Twilio.
+- Send media with optional captions via `clawdis send --media`.
+- Allow auto-replies from the web inbox to include media alongside text.
+- Keep per-type limits sane and predictable.
 
-## CLI & Config Surface
-- `warelay send --media [--message ] [--provider twilio|web]`
-  - `--media` optional; `--message` remains required for now (caption can be empty string to send only media).
-  - `--dry-run` prints the resolved payload including hosted URL (twilio) or file path (web).
-  - `--json` emits `{ provider, to, sid/messageId, mediaUrl, caption }`.
-- Config auto-reply (`~/.warelay/warelay.json`):
-  - Add `inbound.reply.mediaUrl?: string` (templated like `reply.text`).
-  - Return shape from `getReplyFromConfig` becomes `{ text?: string; mediaUrl?: string }`.
-  - Both `text` and `mediaUrl` optional; at least one must be present to send a reply.
+## CLI Surface
+- `clawdis send --media [--message ]`
+  - `--media` optional; caption can be empty for media-only sends.
+  - `--dry-run` prints the resolved payload; `--json` emits `{ provider, to, messageId, mediaUrl, caption }`.
 
-## Provider Behavior
-### Web (Baileys)
+## Web Provider Behavior
 - Input: local file path **or** HTTP(S) URL.
-- Flow: load into Buffer, detect media kind, and apply the right payload:
-  - Images: **resize + recompress to JPEG** (max side 2048px, quality step-down) to fit under `inbound.reply.mediaMaxMb` (default 5 MB) but never above the Web hard cap (6 MB).
-  - Audio/voice and video: pass through up to 16 MB; set `ptt: true` for audio to send as a voice note.
-  - Everything else becomes a document with filename, up to 100 MB.
-- MIME is detected by magic bytes first (then header, then path); wrong file extensions are tolerated and the detected MIME drives payload kind and recompression.
-- Caption uses `--message` or `reply.text`; if caption is empty, send media-only.
-- Logging: non-verbose shows `↩️`/`✅` with caption; verbose includes `(media, B, ms fetch)` and the local/remote path.
-
-### Twilio
-- Twilio API requires a public HTTPS `MediaUrl`; it will not accept local paths.
-- Hosting strategy: reuse the webhook/Funnel port.
-- When `--media` is a local path, copy to temp dir (`~/.warelay/media/`), serve at `/media/` on the existing Express app started for webhook, or spin up a short-lived server on demand for `send`.
-  - `MediaUrl` = `https://.ts.net/media/`.
-  - Files auto-removed after TTL (default 2 minutes) or after first successful fetch (best-effort).
-  - Enforce size limit 5 MB (matches the media host guard); reject early with clear error.
-- If `--media` is already an HTTPS URL, pass through unchanged.
-- Fallback: if Funnel is not enabled (or host unknown) and a local path is provided, fail with guidance to run `warelay webhook --ingress tailscale` (or pass a URL instead).
-
-## Hosting/Server Details
-- Extend `startWebhook` Express app:
-  - Static media route `/media/:id` reading from temp dir.
-  - 404/410 if expired or missing.
-  - Optional `?delete=1` to self-delete after fetch (used by Twilio fetch hook if we detect first hit).
-- Temp storage: `~/.warelay/media`; cleaned on startup (remove files older than 15 minutes) and during TTL eviction.
-- Security: no directory listing; only UUID file names; CORS open (Twilio fetch); content-type derived from sniffed bytes (fallback to header, then extension). Saved files are renamed with an extension that matches the detected MIME so downstream fetches present the correct type.
+- Flow: load into a Buffer, detect media kind, and build the correct payload:
+  - **Images:** resize & recompress to JPEG (max side 2048px) targeting `inbound.reply.mediaMaxMb` (default 5 MB), capped at 6 MB.
+  - **Audio/Voice/Video:** pass-through up to 16 MB; audio is sent as a voice note (`ptt: true`).
+  - **Documents:** anything else, up to 100 MB, with filename preserved when available.
+- MIME detection prefers magic bytes, then headers, then file extension.
+- Caption comes from `--message` or `reply.text`; empty caption is allowed.
+- Logging: non-verbose shows `↩️`/`✅`; verbose includes size and source path/URL.
 
 ## Auto-Reply Pipeline
-- `getReplyFromConfig` returns `{ text?, mediaUrl? }`.
-- Webhook / Twilio poller:
-  - If `mediaUrl` present, include `mediaUrl` in Twilio message payload; caption = `text` (may be empty).
-  - If only `text`, behave as today.
-- Web inbox:
-  - If `mediaUrl` present, fetch/resolve same as send (local path or URL), send via Baileys with caption.
+- `getReplyFromConfig` returns `{ text?, mediaUrl?, mediaUrls? }`.
+- When media is present, the web sender resolves local paths or URLs using the same pipeline as `clawdis send`.
+- Multiple media entries are sent sequentially if provided.
 
 ## Inbound Media to Commands (Pi/Tau)
-- For completeness: when inbound Twilio/Web messages include media, download to temp file, expose templating variables:
-  - `{{MediaUrl}}` original URL (Twilio) or pseudo-URL (web).
+- When inbound web messages include media, CLAWDIS downloads to a temp file and exposes templating variables:
+  - `{{MediaUrl}}` pseudo-URL for the inbound media.
   - `{{MediaPath}}` local temp path written before running the command.
-- Size guard: only download if ≤5 MB; else skip and log (aligns with the temp media store limit).
-- Saved inbound media is named with the detected MIME-based extension (e.g., `.jpg`), so later CLI sends reuse a correct filename/content-type even if WhatsApp omitted an extension.
-- Audio/voice notes: if you set `inbound.transcribeAudio.command`, warelay will run that CLI (templated with `{{MediaPath}}`) and replace `Body` with the transcript before continuing the reply flow; verbose logs indicate when transcription runs. The command prompt includes the original media path plus a `Transcript:` section so the model sees both.
+- Audio transcription (if configured) runs before templating and can replace `Body` with the transcript.
 
-## Errors & Messaging
-- Local path with twilio + Funnel disabled → error: “Twilio media needs a public URL; start `warelay webhook --ingress tailscale` or pass an https:// URL.”
-- File too large → error mentions the applicable cap (5 MB for Twilio host, 6/16/100 MB for Web image/audio-video/doc respectively).
-- Download failure for web provider → “Failed to load media from ; skipping send.”
+## Limits & Errors
+- Images: ~6 MB cap after recompression.
+- Audio/voice/video: 16 MB cap; documents: 100 MB cap.
+- Oversize or unreadable media → clear error in logs and the reply is skipped.
 
-## Tests to Add
-- Twilio: dry-run shows hosted URL; send payload includes `mediaUrl`; rejects when Funnel host missing.
-- Web: local path sends image (mock Baileys buffer assertion).
-- Config: zod allows `mediaUrl`, returns combined object; command auto-reply handles `text+media`, `media-only`.
-- Media server: serves file, enforces TTL, returns 404 after cleanup.
-
-## Open Decisions (confirm before coding)
-- TTL for temp media (proposal: 2 minutes, cleanup at start + interval).
-- One-file-per-send vs. batching: default to one-file-per-send; multi-attach not supported.
-- Should `warelay send --provider twilio --media` implicitly start the media server (even if webhook not running), or require `warelay webhook` already active? (Proposal: auto-start lightweight server on demand, auto-stop after media is fetched or TTL.)
+## Notes for Tests
+- Cover send + reply flows for image/audio/document cases.
+- Validate recompression for images (size bound) and voice-note flag for audio.
+- Ensure multi-media replies fan out as sequential sends.
diff --git a/docs/queue.md b/docs/queue.md
index 811605728..8faf853a7 100644
--- a/docs/queue.md
+++ b/docs/queue.md
@@ -1,6 +1,6 @@
 # Command Queue (2025-11-25)
 
-We now serialize all command-based auto-replies (Twilio webhook + poller + WhatsApp Web listener) through a tiny in-process queue to prevent multiple commands from running at once.
+We now serialize all command-based auto-replies (WhatsApp Web listener) through a tiny in-process queue to prevent multiple commands from running at once.
 
 ## Why
 - Some auto-reply commands are expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
@@ -14,7 +14,7 @@ We now serialize all command-based auto-replies (Twilio webhook + poller + Whats
 
 ## Scope and guarantees
 - Applies only to config-driven command replies; plain text replies are unaffected.
-- Queue is process-wide, so webhook handlers, Twilio polling, and the web inbox listener all respect the same lock.
+- Queue is process-wide, so the web inbox listener (and any future entrypoints) all respect the same lock.
 - No external dependencies or background worker threads; pure TypeScript + promises.
 
 ## Troubleshooting
diff --git a/package.json b/package.json
index edd38e2ab..13ee691fb 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "warelay",
   "version": "1.4.0",
-  "description": "WhatsApp relay CLI (send, monitor, webhook, auto-reply) using Twilio",
+  "description": "WhatsApp relay CLI (Baileys web) with Pi RPC agent",
   "type": "module",
   "main": "dist/index.js",
   "bin": {
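The process-wide command queue in `docs/queue.md` ("pure TypeScript + promises") can be illustrated with a single promise chain. This is a sketch under assumptions: `enqueueCommand` is a hypothetical name, not the module's actual export. Each task starts only after the previous one settles, and a failing task does not block the tasks queued behind it.

```typescript
// Tail of the chain; it is never left rejected, so the chain cannot jam.
let tail: Promise<void> = Promise.resolve();

// Run `task` after every previously enqueued task has settled.
// Hypothetical helper name; one chain per process, no worker threads.
function enqueueCommand<T>(task: () => Promise<T>): Promise<T> {
  const run = tail.then(() => task());
  // Absorb the outcome for the chain only, so one failure cannot block later
  // tasks; the caller still sees any rejection through `run`.
  tail = run.then(
    () => undefined,
    () => undefined,
  );
  return run;
}
```

Following the doc's scope rule, only config-driven command replies would be wrapped in `enqueueCommand`; plain text replies would bypass it entirely.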