clawdbot/docs/whatsapp.md
2026-01-03 23:56:40 +00:00

summary: WhatsApp (web provider) integration: login, inbox, replies, media, and ops
read_when: Working on WhatsApp/web provider behavior or inbox routing

WhatsApp (web provider)

Updated: 2025-12-23

Status: WhatsApp Web via Baileys only. Gateway owns the single session.

Goals

  • One WhatsApp identity, one gateway session.
  • Deterministic routing: replies always return to WhatsApp; the model does not choose the destination.
  • Model sees enough context to understand quoted replies.

Architecture (who owns what)

  • Gateway owns the Baileys socket and inbox loop.
  • CLI / macOS app talk to the gateway; no direct Baileys use.
  • Active listener is required for outbound sends; otherwise send fails fast.
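
The fail-fast rule for outbound sends can be sketched as follows. This is a minimal illustration, not the gateway's actual code: `SendFn`, `setActiveListener`, and `sendViaGateway` are hypothetical names, and the real send path is asynchronous.

```typescript
// Hypothetical sketch: outbound sends go through whichever listener is active.
// A synchronous SendFn keeps the example small; a real socket send would be async.
type SendFn = (to: string, text: string) => string; // returns a message id

let activeListener: SendFn | null = null;

function setActiveListener(fn: SendFn | null): void {
  activeListener = fn;
}

function sendViaGateway(to: string, text: string): string {
  // Fail fast instead of queueing when no listener owns the session.
  if (activeListener === null) {
    throw new Error("No active WhatsApp listener; is the gateway running?");
  }
  return activeListener(to, text);
}
```

Throwing immediately keeps CLI and app callers from waiting on a session that does not exist.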

Getting a phone number

WhatsApp requires a real mobile number for verification. VoIP and virtual numbers are usually blocked.

Recommended approaches:

  • Local eSIM from your country's mobile carrier (most reliable)
  • Prepaid SIM — cheap, just needs to receive one SMS for verification

Avoid: TextNow, Google Voice, most "free SMS" services — WhatsApp blocks these aggressively.

Tip: The number only needs to receive one verification SMS. After that, WhatsApp Web sessions persist via creds.json.

Login + credentials

  • Login command: clawdis login (QR via Linked Devices).
  • Credentials stored in ~/.clawdis/credentials/creds.json.
  • Backup copy at creds.json.bak (restored on corruption).
  • Logout: clawdis logout deletes creds and session store.
  • A logged-out socket raises an error that instructs the user to re-link.
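
The backup-restore behaviour can be sketched as a pure function over the two files' contents. This assumes the credentials are stored as a JSON object; `tryParse`, `loadCreds`, and the `source` field are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: choose usable credentials from creds.json text,
// falling back to creds.json.bak text when the primary is missing or corrupt.
type CredsResult =
  | { source: "primary" | "backup"; creds: Record<string, unknown> }
  | { source: "none" };

function tryParse(text: string | null): Record<string, unknown> | null {
  if (text === null) return null;
  try {
    const parsed: unknown = JSON.parse(text);
    // Only a plain JSON object counts as valid credentials in this sketch.
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
    return null;
  } catch {
    return null; // corrupt JSON
  }
}

function loadCreds(primaryText: string | null, backupText: string | null): CredsResult {
  const primary = tryParse(primaryText);
  if (primary !== null) return { source: "primary", creds: primary };
  const backup = tryParse(backupText);
  if (backup !== null) return { source: "backup", creds: backup };
  return { source: "none" }; // caller should ask the user to re-link
}
```

Passing file contents rather than paths keeps the fallback logic testable without touching the filesystem.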

Inbound flow (DM + group)

  • WhatsApp events come from messages.upsert (Baileys).
  • Inbox listeners are detached on shutdown to avoid accumulating event handlers in tests/restarts.
  • Status/broadcast chats are ignored.
  • Direct chats use E.164; groups use group JID.
  • Allowlist: whatsapp.allowFrom enforced for direct chats only.
    • If whatsapp.allowFrom is empty, default allowlist = self number (self-chat mode).
  • Self-chat mode: avoids auto read receipts and ignores mention JIDs.
  • Read receipts sent for non-self-chat DMs.
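
The allowlist rule above can be sketched as a small predicate. The type and function names are hypothetical, and the sketch assumes `whatsapp.allowFrom` is a list of E.164 strings matched exactly.

```typescript
// Hypothetical sketch of the direct-chat allowlist rule.
type InboundChat =
  | { kind: "direct"; senderE164: string }
  | { kind: "group"; jid: string };

function effectiveAllowFrom(allowFrom: string[], selfE164: string): string[] {
  // Empty allowlist => self-chat mode: only the gateway's own number is allowed.
  return allowFrom.length > 0 ? allowFrom : [selfE164];
}

function isInboundAllowed(chat: InboundChat, allowFrom: string[], selfE164: string): boolean {
  // The allowlist applies to direct chats only; groups are gated by activation mode.
  if (chat.kind === "group") return true;
  return effectiveAllowFrom(allowFrom, selfE164).indexOf(chat.senderE164) !== -1;
}
```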

Message normalization (what the model sees)

  • Body is the current message body with envelope.
  • When a message quotes an earlier one, the quoted context is always appended:
    [Replying to +1555 id:ABC123]
    <quoted text or <media:...>>
    [/Replying]
    
  • Reply metadata also set:
    • ReplyToId = stanzaId
    • ReplyToBody = quoted body or media placeholder
    • ReplyToSender = E.164 when known
  • Media-only inbound messages use placeholders:
    • <media:image|video|audio|document|sticker>
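
A minimal sketch of how the reply block might be assembled. `QuotedReply`, `quotedBody`, and `appendReplyContext` are hypothetical names, and the `"unknown"` fallback for an unresolved sender is an assumption of this example rather than documented behaviour.

```typescript
// Hypothetical sketch: append quoted-reply context to the message body.
interface QuotedReply {
  stanzaId: string;
  senderE164?: string; // present only when the sender can be resolved
  text?: string;
  mediaKind?: "image" | "video" | "audio" | "document" | "sticker";
}

function quotedBody(q: QuotedReply): string {
  // Prefer the quoted text; otherwise fall back to a media placeholder.
  if (q.text !== undefined && q.text.length > 0) return q.text;
  return q.mediaKind !== undefined ? `<media:${q.mediaKind}>` : "";
}

function appendReplyContext(body: string, q: QuotedReply): string {
  const sender = q.senderE164 ?? "unknown"; // fallback label is an assumption
  return [
    body,
    `[Replying to ${sender} id:${q.stanzaId}]`,
    quotedBody(q),
    "[/Replying]",
  ].join("\n");
}
```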

Groups

  • Groups map to whatsapp:group:<jid> sessions.
  • Activation modes:
    • mention (default): requires @mention or regex match.
    • always: always triggers.
  • /activation mention|always is owner-only.
  • Owner = whatsapp.allowFrom (or self E.164 if unset).
  • History injection:
    • Recent messages (default 50) inserted under: [Chat messages since your last reply - for context]
    • Current message under: [Current message - respond to this]
    • Sender suffix appended: [from: Name (+E164)]
  • Group metadata cached 5 min (subject + participants).
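
The activation check and history injection can be sketched together. All names here are hypothetical, and the exact line layout (blank line between sections, suffix on every line) is an assumption; only the bracketed markers and the default of 50 come from the notes above.

```typescript
// Hypothetical sketch of group gating and prompt assembly.
type ActivationMode = "mention" | "always";

interface GroupLine {
  name: string;
  e164: string;
  body: string;
}

function shouldTrigger(
  mode: ActivationMode,
  text: string,
  wasMentioned: boolean,
  mentionPatterns: RegExp[],
): boolean {
  if (mode === "always") return true;
  // mention mode: an explicit @mention or any configured pattern match.
  // Patterns should not use the g flag, since RegExp.test is then stateful.
  return wasMentioned || mentionPatterns.some((re) => re.test(text));
}

function formatLine(m: GroupLine): string {
  return `${m.body} [from: ${m.name} (${m.e164})]`;
}

function buildGroupPrompt(
  history: GroupLine[],
  current: GroupLine,
  historyLimit: number = 50,
): string {
  // slice(-0) would return everything, so handle a zero limit explicitly.
  const recent = historyLimit > 0 ? history.slice(-historyLimit) : [];
  return [
    "[Chat messages since your last reply - for context]",
    ...recent.map(formatLine),
    "",
    "[Current message - respond to this]",
    formatLine(current),
  ].join("\n");
}
```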

Reply delivery (threading)

  • WhatsApp Web sends standard messages (no quoted reply threading in the current gateway).
  • Reply tags are ignored on this surface.

Outbound send (text + media)

  • Uses active web listener; error if gateway not running.
  • Text chunking: 4k max per message.
  • Media:
    • Image/video/audio/document supported.
    • Audio is sent as PTT (voice note); MIME type audio/ogg is rewritten to audio/ogg; codecs=opus.
    • Caption only on first media item.
    • Media fetch supports HTTP(S) and local paths.
    • Animated GIFs: WhatsApp expects MP4 with gifPlayback: true for inline looping.
      • CLI: clawdis send --media <mp4> --gif-playback
      • Gateway: send params include gifPlayback: true
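
Text chunking could look roughly like this. The notes say only "4k", so the 4000-character default and the preference for breaking at whitespace are assumptions of this sketch; `chunkText` is a hypothetical name.

```typescript
// Hypothetical sketch: split long text into chunks of at most maxLen characters,
// preferring to break at the last newline or space inside each window.
function chunkText(text: string, maxLen: number = 4000): string[] {
  if (maxLen <= 0) throw new RangeError("maxLen must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    const breakAt = Math.max(head.lastIndexOf("\n"), head.lastIndexOf(" "));
    // Hard-cut at maxLen when the window contains no usable whitespace.
    const cut = breakAt > 0 ? breakAt : maxLen;
    chunks.push(rest.slice(0, cut));
    // Drop the whitespace at the break so the next chunk starts cleanly.
    rest = rest.slice(cut).replace(/^\s+/, "");
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

For example, `chunkText("hello world", 8)` yields `["hello", "world"]`.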

Media limits + optimization

  • Default cap: 5 MB (per media item).
  • Override: agent.mediaMaxMb.
  • Images are auto-optimized to JPEG under cap (resize + quality sweep).
  • Oversize media raises an error; a media reply falls back to a text warning.
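
The quality-sweep half of the image optimization can be sketched as below (resizing is omitted). `encodeAt` is a hypothetical stand-in for a real JPEG encoder, the quality steps are illustrative, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical sketch: find the highest JPEG quality that fits under the cap.
const DEFAULT_MEDIA_MAX_MB = 5;

function capBytes(mediaMaxMb: number = DEFAULT_MEDIA_MAX_MB): number {
  return mediaMaxMb * 1024 * 1024; // assumes binary megabytes
}

// encodeAt(quality) returns the encoded size in bytes for that quality.
function pickJpegQuality(
  encodeAt: (quality: number) => number,
  maxBytes: number,
  qualities: number[] = [85, 75, 65, 55, 45],
): number | null {
  for (const q of qualities) {
    if (encodeAt(q) <= maxBytes) return q;
  }
  return null; // still oversize: the caller reports an error or a text warning
}
```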

Heartbeats

  • Gateway heartbeat logs connection health (web.heartbeatSeconds, default 60s).
  • Agent heartbeat is global (agent.heartbeat.*) and runs in the main session.
    • Uses HEARTBEAT prompt + HEARTBEAT_OK skip behavior.
    • Delivery defaults to the last used channel (or configured target).
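
One plausible reading of the HEARTBEAT_OK skip behaviour is sketched here. It assumes the skip applies when the trimmed reply is empty or exactly `HEARTBEAT_OK`; the real matching rule may differ, and `heartbeatDelivery` is a hypothetical name.

```typescript
// Hypothetical sketch: decide whether a heartbeat reply should be delivered.
const HEARTBEAT_OK = "HEARTBEAT_OK";

type HeartbeatDecision = { deliver: false } | { deliver: true; text: string };

function heartbeatDelivery(reply: string): HeartbeatDecision {
  const text = reply.trim();
  // Nothing to report: skip delivery entirely (exact-match rule is an assumption).
  if (text === "" || text === HEARTBEAT_OK) return { deliver: false };
  return { deliver: true, text };
}
```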

Reconnect behavior

  • Backoff policy: web.reconnect:
    • initialMs, maxMs, factor, jitter, maxAttempts.
  • If maxAttempts is reached, web monitoring stops (degraded).
  • Logged-out: stop and require re-link.
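
A sketch of how the five `web.reconnect` fields might combine into a delay. The field names come from the list above; treating `jitter` as a symmetric fraction of the base delay and counting attempts from 1 are assumptions, and `nextDelayMs` is a hypothetical name.

```typescript
// Hypothetical sketch of exponential backoff with jitter.
interface ReconnectPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  jitter: number; // assumed to be a fraction, e.g. 0.2 for +/-20%
  maxAttempts: number;
}

// Returns the delay before the given 1-based attempt, or null to stop retrying.
function nextDelayMs(
  policy: ReconnectPolicy,
  attempt: number,
  random: () => number = Math.random,
): number | null {
  if (attempt > policy.maxAttempts) return null; // give up: monitoring stops
  const base = Math.min(
    policy.maxMs,
    policy.initialMs * Math.pow(policy.factor, attempt - 1),
  );
  // Spread the delay uniformly in [base * (1 - jitter), base * (1 + jitter)].
  const spread = base * policy.jitter * (2 * random() - 1);
  return Math.max(0, Math.round(base + spread));
}
```

Injecting `random` makes the schedule deterministic in tests.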

Config quick map

  • whatsapp.allowFrom (DM allowlist)
  • whatsapp.groups (group mention gating defaults/overrides)
  • routing.groupChat.mentionPatterns
  • routing.groupChat.historyLimit
  • messages.messagePrefix (inbound prefix)
  • messages.responsePrefix (outbound prefix)
  • agent.mediaMaxMb
  • agent.heartbeat.every
  • agent.heartbeat.model (optional override)
  • agent.heartbeat.target
  • agent.heartbeat.to
  • session.* (scope, idle, store; mainKey is ignored)
  • web.enabled (disable provider startup when false)
  • web.heartbeatSeconds
  • web.reconnect.*
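
For orientation, a few of these keys are shown below as a nested object. This is illustrative only: the nesting simply mirrors the dotted key names, every value is a placeholder (the 50, 5, and 60 echo the defaults stated above), the reconnect numbers and the string form of `mentionPatterns` are assumptions, and the on-disk config format is not specified here.

```typescript
// Illustrative placeholder values only; not a recommended or verified config.
const exampleConfig = {
  whatsapp: {
    allowFrom: ["+15555550123"], // DM allowlist (E.164); empty => self-chat mode
  },
  routing: {
    groupChat: {
      mentionPatterns: ["\\bclawd\\b"], // hypothetical regex source
      historyLimit: 50,
    },
  },
  agent: {
    mediaMaxMb: 5,
  },
  web: {
    enabled: true,
    heartbeatSeconds: 60,
    // Hypothetical backoff values.
    reconnect: { initialMs: 2000, maxMs: 30000, factor: 2, jitter: 0.2, maxAttempts: 10 },
  },
};
```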

Logs + troubleshooting

  • Subsystems: whatsapp/inbound, whatsapp/outbound, web-heartbeat, web-reconnect.
  • Log file: /tmp/clawdis/clawdis-YYYY-MM-DD.log (configurable).
  • Troubleshooting guide: docs/refactor/web-gateway-troubleshooting.md.
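
To locate the log for a given day, the default path pattern can be expanded as in this sketch. `logFilePath` is a hypothetical helper, and using the UTC date is an assumption; the gateway may roll files by local date.

```typescript
// Hypothetical helper: expand /tmp/clawdis/clawdis-YYYY-MM-DD.log for a date.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

function logFilePath(date: Date, dir: string = "/tmp/clawdis"): string {
  // UTC is an assumption of this sketch.
  const yyyy = date.getUTCFullYear();
  const mm = pad2(date.getUTCMonth() + 1); // getUTCMonth is zero-based
  const dd = pad2(date.getUTCDate());
  return `${dir}/clawdis-${yyyy}-${mm}-${dd}.log`;
}
```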

Tests

  • src/web/auto-reply.test.ts (mention gating, history injection, reply flow)
  • src/web/monitor-inbox.test.ts (inbound parsing + reply context)
  • src/web/outbound.test.ts (send mapping + media)