Merge origin/main

This commit is contained in:
Peter Steinberger
2025-12-14 00:52:40 +00:00
34 changed files with 1862 additions and 81 deletions

@@ -9,7 +9,7 @@ read_when:
## What Clawdis Does
- Runs WhatsApp gateway + Pi coding agent so the assistant can read/write chats, fetch context, and run tools via the host Mac.
- macOS app manages permissions (screen recording, notifications, microphone) and exposes a CLI helper `clawdis-mac` for scripts.
- Sessions are per-sender; heartbeats keep background tasks alive.
- Direct chats collapse into the shared `main` session by default; groups stay isolated as `group:<jid>`; heartbeats keep background tasks alive.
## Core Tools (enable in Settings → Tools)
- **mcporter** — MCP runtime/CLI to list, call, and sync Model Context Protocol servers.

@@ -122,8 +122,8 @@
<span class="footer__sep">·</span>
<a href="https://github.com/steipete/clawdis">source</a>
<span class="footer__sep">·</span>
<a href="https://www.npmjs.com/package/clawdis">npm</a>
</div>
<a href="https://github.com/steipete/clawdis/releases">releases</a>
</div>
<div class="footer__hint" aria-hidden="true">
tip: press <kbd>F2</kbd> (Mac: <kbd>fn</kbd>+<kbd>F2</kbd>) to flip
the universe

docs/camera.md Normal file

@@ -0,0 +1,98 @@
---
summary: "Camera capture (iOS node + macOS app) for agent use: photos (jpg) and short video clips (mp4)"
read_when:
- Adding or modifying camera capture on iOS nodes or macOS
- Extending agent-accessible MEDIA temp-file workflows
---
# Camera capture (agent)
Clawdis supports **camera capture** for agent workflows:
- **iOS node** (paired via Gateway): capture a **photo** (`jpg`) or **short video clip** (`mp4`, with optional audio) via `node.invoke`.
- **macOS app** (local control socket): capture a **photo** (`jpg`) or **short video clip** (`mp4`, with optional audio) via `clawdis-mac`.
All camera access is gated behind **user-controlled settings**.
## iOS node
### User setting (default on)
- iOS Settings tab → **Camera** → **Allow Camera** (`camera.enabled`)
- Default: **on** (missing key is treated as enabled).
- When off: `camera.*` commands return `CAMERA_DISABLED`.
### Commands (via Gateway `node.invoke`)
- `camera.snap`
- Params:
- `facing`: `front|back` (default: `front`)
- `maxWidth`: number (optional)
- `quality`: `0..1` (optional; default `0.9`)
- `format`: currently `jpg`
- Response payload:
- `format: "jpg"`
- `base64: "<...>"`
- `width`, `height`
- `camera.clip`
- Params:
- `facing`: `front|back` (default: `front`)
- `durationMs`: number (default `3000`, clamped to a max)
- `includeAudio`: boolean (default `true`)
- `format`: currently `mp4`
- Response payload:
- `format: "mp4"`
- `base64: "<...>"`
- `durationMs`
- `hasAudio`
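The parameter defaults listed above can be sketched as a small normalization step on the caller side. This is only an illustration: the helper names are hypothetical, and the clip maximum is a placeholder because the docs only say the duration is "clamped to a max" without giving the value.

```typescript
type Facing = "front" | "back";

interface SnapParams {
  facing: Facing;
  maxWidth?: number;
  quality: number;
  format: "jpg";
}

interface ClipParams {
  facing: Facing;
  durationMs: number;
  includeAudio: boolean;
  format: "mp4";
}

// ASSUMPTION: placeholder value; the real maximum is not documented here.
const ASSUMED_MAX_CLIP_MS = 10000;

// Returns `value` when it is a finite number, otherwise `fallback`.
function finiteOr(value: number | undefined, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Hypothetical helper: applies the documented `camera.snap` defaults.
function normalizeSnapParams(input: Partial<SnapParams> = {}): SnapParams {
  const out: SnapParams = {
    facing: input.facing ?? "front",
    quality: clamp(finiteOr(input.quality, 0.9), 0, 1),
    format: "jpg",
  };
  if (input.maxWidth !== undefined) out.maxWidth = input.maxWidth;
  return out;
}

// Hypothetical helper: applies the documented `camera.clip` defaults.
function normalizeClipParams(input: Partial<ClipParams> = {}): ClipParams {
  return {
    facing: input.facing ?? "front",
    durationMs: clamp(Math.round(finiteOr(input.durationMs, 3000)), 1, ASSUMED_MAX_CLIP_MS),
    includeAudio: input.includeAudio ?? true,
    format: "mp4",
  };
}
```

The node itself is described as clamping `durationMs`, so a client-side clamp like this would only be a convenience, not a substitute for the node's own validation.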
### Foreground requirement
Like `screen.*`, the iOS node only allows `camera.*` commands in the **foreground**. Background invocations return `NODE_BACKGROUND_UNAVAILABLE`.
### CLI helper (temp files + MEDIA)
The easiest way to get attachments is via the CLI helper, which writes decoded media to a temp file and prints `MEDIA:<path>`.
Examples:
```bash
clawdis nodes camera snap --node <id> # default: both front + back (2 MEDIA lines)
clawdis nodes camera snap --node <id> --facing front
clawdis nodes camera clip --node <id> --duration 3000
clawdis nodes camera clip --node <id> --no-audio
```
Notes:
- `nodes camera snap` defaults to **both** facings to give the agent both views.
- Output files are temporary (in the OS temp directory) unless you build your own wrapper.
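A wrapper script that consumes the helper's output might extract the temp-file paths as follows. This is a minimal sketch that assumes each `MEDIA:<path>` marker sits at the start of its own line; `parseMediaLines` is a hypothetical name, not part of the CLI.

```typescript
// Hypothetical helper: collects the paths from `MEDIA:<path>` lines in CLI output.
// ASSUMPTION: one marker per line, at the start of the line.
function parseMediaLines(stdout: string): string[] {
  const prefix = "MEDIA:";
  const paths: string[] = [];
  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith(prefix)) continue; // skip log noise and blank lines
    const path = line.slice(prefix.length).trim();
    if (path.length > 0) paths.push(path);
  }
  return paths;
}
```

With the default `snap` behavior (both facings), the returned array would be expected to hold two paths.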
## macOS app
### User setting (default off)
The macOS companion app exposes a checkbox:
- **Settings → Debug → Camera → Allow Camera (agent)** (`clawdis.cameraEnabled`)
- Default: **off**
- When off: camera requests return “Camera disabled by user”.
### CLI helper (local control socket)
The `clawdis-mac` helper talks to the running menu bar app over the local control socket.
Examples:
```bash
clawdis-mac camera snap # prints MEDIA:<path>
clawdis-mac camera snap --max-width 1280
clawdis-mac camera clip --duration-ms 3000 # prints MEDIA:<path>
clawdis-mac camera clip --no-audio
```
## Safety + practical limits
- Camera and microphone access trigger the usual OS permission prompts (and require usage strings in Info.plist).
- Video clips are intentionally short to avoid oversized bridge payloads (base64 overhead + WebSocket message limits).
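The base64 overhead mentioned above is easy to quantify: padded base64 emits 4 characters for every 3 input bytes, so payloads grow by roughly a third. The sketch below shows the arithmetic; the message limit passed to `fitsInMessage` is a caller-supplied value, since the actual bridge limit is not stated here.

```typescript
// Length in characters of the padded base64 encoding of `rawBytes` bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Hypothetical check: would the encoded payload fit under a given message limit?
// ASSUMPTION: `limitBytes` is supplied by the caller; the real limit is not documented here.
function fitsInMessage(rawBytes: number, limitBytes: number): boolean {
  return base64Length(rawBytes) <= limitBytes;
}
```

For example, a 3,000,000-byte clip encodes to 4,000,000 base64 characters before any JSON framing is added.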

@@ -24,9 +24,17 @@ Start conservative:
## Prerequisites
- Node **22+**
- CLAWDIS installed: `npm install -g clawdis`
- CLAWDIS available on PATH (recommended during development: from source + global link)
- A second phone number (SIM/eSIM/prepaid) for the assistant
From source (recommended while the npm package is still settling):
```bash
pnpm install
pnpm build
pnpm link --global
```
## The two-phone setup (recommended)
You want this:
@@ -121,7 +129,7 @@ Example:
## Sessions and memory
- Session files: `~/.clawdis/sessions/{{SessionId}}.jsonl`
- Session metadata (token usage, last route, etc): `~/.clawdis/sessions.json`
- Session metadata (token usage, last route, etc): `~/.clawdis/sessions/sessions.json` (legacy: `~/.clawdis/sessions.json`)
- `/new` starts a fresh session for that chat (configurable via `resetTriggers`)
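Tooling that reads the session metadata may need to cope with both the current and the legacy location. The sketch below shows one possible lookup order; the order itself (config override, then current path, then legacy path) is an assumption, and `resolveSessionStorePath` is a hypothetical helper. The existence check is injected so the function stays free of filesystem calls.

```typescript
// Hypothetical helper: picks the session store path to read.
// ASSUMPTION: override wins, then the current path, then the legacy path.
function resolveSessionStorePath(
  home: string,
  exists: (path: string) => boolean,
  override?: string,
): string {
  if (override) return override;
  const current = `${home}/.clawdis/sessions/sessions.json`;
  const legacy = `${home}/.clawdis/sessions.json`;
  if (exists(current)) return current;
  if (exists(legacy)) return legacy;
  return current; // nothing on disk yet: default to the current location
}
```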
## Heartbeats (proactive mode)

@@ -5,9 +5,10 @@ read_when:
---
# Control channel API (newline-delimited JSON)
**Deprecated:** superseded by the WebSocket Gateway protocol (`clawdis gateway`, see `docs/architecture.md` and `docs/gateway.md`). Use only for legacy builds predating the Gateway rollout.
**Deprecated (historical):** superseded by the WebSocket Gateway protocol (`clawdis gateway`, see `docs/architecture.md` and `docs/gateway.md`).
Current builds use a WebSocket server on `ws://127.0.0.1:18789` and do **not** expose this TCP control channel.
Endpoint: `127.0.0.1:18789` (TCP, localhost only). Clients reach it via SSH port forward in remote mode.
Legacy endpoint (if present in an older build): `127.0.0.1:18789` (TCP, localhost only), typically reached via SSH port forward in remote mode.
## Frame format
Each line is a JSON object. Two shapes exist:
@@ -45,4 +46,4 @@ Each line is a JSON object. Two shapes exist:
4) For user toggles, send `set-heartbeats` and await response.
## Backward compatibility
- If the control port is unavailable (older gateway), the client may fall back to the legacy CLI path, but the intended path is to rely solely on this API.
- If the control channel is unavailable: that's expected on modern builds. Use the Gateway WS protocol instead.

@@ -56,4 +56,4 @@ Notes:
## Known considerations
- Heartbeats are intentionally skipped for groups to avoid noisy broadcasts.
- Echo suppression uses the combined batch string; if you send identical text twice without mentions, only the first will get a response.
- Session store entries will appear as `group:<jid>` in `sessions.json`; a missing entry just means the group hasn't triggered a run yet.
- Session store entries will appear as `group:<jid>` in the session store (`~/.clawdis/sessions/sessions.json` by default); a missing entry just means the group hasn't triggered a run yet.
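The session-key convention described in these docs (direct chats share `main`, groups get `group:<jid>`, and heartbeats skip groups) can be sketched as two small functions. The function names and the example JID in the usage note are illustrative only, and the sketch covers just the default configuration.

```typescript
type ChatType = "direct" | "group";

// Hypothetical helper: maps an inbound chat to its session key under the default config.
function sessionKeyFor(chatType: ChatType, jid: string): string {
  return chatType === "group" ? `group:${jid}` : "main";
}

// Heartbeats are skipped for group sessions to avoid noisy broadcasts.
function shouldSendHeartbeat(sessionKey: string): boolean {
  return !sessionKey.startsWith("group:");
}
```

For instance, `sessionKeyFor("group", "123@g.us")` yields `group:123@g.us`, which is the key to look for in the session store.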

@@ -16,7 +16,7 @@ Short guide to verify the WhatsApp Web / Baileys stack without guessing.
## Deep diagnostics
- Creds on disk: `ls -l ~/.clawdis/credentials/creds.json` (mtime should be recent).
- Session store: `ls -l ~/.clawdis/sessions.json` (path can be overridden in config). Count and recent recipients are surfaced via `status`.
- Session store: `ls -l ~/.clawdis/sessions/sessions.json` (legacy: `~/.clawdis/sessions.json`; path can be overridden in config). Count and recent recipients are surfaced via `status`.
- Relink flow: `clawdis logout && clawdis login --verbose` when status codes 409–515 or `loggedOut` appear in logs.
## When something fails

@@ -19,7 +19,7 @@ read_when:
<p align="center">
<a href="https://github.com/steipete/clawdis">GitHub</a> ·
<a href="https://www.npmjs.com/package/clawdis">npm</a> ·
<a href="https://github.com/steipete/clawdis/releases">Releases</a> ·
<a href="./clawd">Clawd setup</a>
</p>
@@ -29,25 +29,41 @@ It's built for [Clawd](https://clawd.me), a space lobster who needed a TARDIS.
## How it works
```
┌─────────────┐      ┌──────────┐      ┌─────────────┐
│  WhatsApp   │ ───▶ │ CLAWDIS  │ ───▶ │  AI Agent   │
│  Telegram   │ ───▶ │  🦞⏱️💙  │ ◀─── │    (Pi)     │
│   (You)     │ ◀─── │          │      │             │
└─────────────┘      └──────────┘      └─────────────┘
WhatsApp / Telegram
        │
        ▼
┌───────────────────────────┐
│          Gateway          │  ws://127.0.0.1:18789 (loopback-only)
│      (single source)      │  tcp://0.0.0.0:18790 (optional Bridge)
└───────────┬───────────────┘
            │
            ├─ Pi agent (RPC)
            ├─ CLI (clawdis …)
            ├─ WebChat (loopback UI)
            ├─ macOS app (Clawdis.app)
            └─ iOS node (Iris) via Bridge + pairing
```
Most operations flow through the **Gateway** (`clawdis gateway`), a single long-running process that owns provider connections and the WebSocket control plane.
## Network model
- **One Gateway per host**: it is the only process allowed to own the WhatsApp Web session.
- **Loopback-first**: Gateway WS is `ws://127.0.0.1:18789` (not exposed on the LAN).
- **Bridge for nodes**: optional LAN/tailnet-facing bridge on `tcp://0.0.0.0:18790` for paired nodes (Bonjour-discoverable).
- **Remote use**: SSH tunnel or tailnet/VPN; see `docs/remote.md` and `docs/discovery.md`.
## Features (high level)
- 📱 **WhatsApp Integration** — Uses Baileys for WhatsApp Web protocol
- ✈️ **Telegram Bot** — DMs + groups via grammY
- 🤖 **Agent bridge** — Pi (RPC mode) with tool streaming
- 💬 **Sessions** — Per-sender (or shared `main`) conversation context
- 💬 **Sessions** — Direct chats collapse into shared `main` (default); groups are isolated
- 👥 **Group Chat Support** — Mention-based triggering in group chats
- 📎 **Media Support** — Send and receive images, audio, documents
- 🎤 **Voice notes** — Optional transcription hook
- 🖥️ **WebChat + macOS app** — A local UI + menu bar companion for ops and voice wake
- 🖥️ **WebChat + macOS app** — Local UI + menu bar companion for ops and voice wake
- 📱 **iOS node (Iris)** — Pairs as a node and exposes a Canvas surface
Note: legacy Claude/Codex/Gemini/Opencode paths have been removed; Pi is the only coding-agent path.
@@ -56,8 +72,10 @@ Note: legacy Claude/Codex/Gemini/Opencode paths have been removed; Pi is the onl
Runtime requirement: **Node ≥ 22**.
```bash
# Install
npm install -g clawdis
# From source (recommended while the npm package is still settling)
pnpm install
pnpm build
pnpm link --global
# Pair WhatsApp Web (shows QR)
clawdis login
@@ -95,18 +113,23 @@ Example:
## Docs
- [Configuration](./configuration.md)
- [Gateway runbook](./gateway.md)
- [WebChat](./webchat.md)
- [Agent integration](./agents.md)
- [Telegram](./telegram.md)
- [Group messages](./group-messages.md)
- [Media: images](./images.md)
- [Media: audio](./audio.md)
- [Sessions](./session.md)
- [Cron + wakeups](./cron.md)
- [Security](./security.md)
- [Troubleshooting](./troubleshooting.md)
- Start here:
- [Configuration](./configuration.md)
- [Clawd personal assistant setup](./clawd.md)
- [Gateway runbook](./gateway.md)
- [Discovery + transports](./discovery.md)
- [Remote access](./remote.md)
- Providers and UX:
- [WebChat](./webchat.md)
- [Telegram](./telegram.md)
- [Group messages](./group-messages.md)
- [Media: images](./images.md)
- [Media: audio](./audio.md)
- Ops and safety:
- [Sessions](./session.md)
- [Cron + wakeups](./cron.md)
- [Security](./security.md)
- [Troubleshooting](./troubleshooting.md)
## The name

@@ -54,13 +54,13 @@ More debugging notes: `docs/bonjour.md`.
In Iris:
- Pick the discovered bridge (or hit refresh).
- If not paired yet, Iris will initiate pairing automatically.
- After the first successful pairing, Iris will auto-reconnect to the **last bridge** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
- After the first successful pairing, Iris will auto-reconnect **strictly to the last discovered gateway** on launch (including after reinstall), as long as the iOS Keychain entry is still present.
### Connection indicator (always visible)
The Settings tab icon shows a small status dot:
- **Green**: connected to the bridge
- **Yellow**: connecting
- **Yellow**: connecting (subtle pulse)
- **Red**: not connected / error
## 4) Approve pairing (CLI)

@@ -10,7 +10,7 @@ Context: web chat currently lives in a WKWebView that loads the pi-web bundle. S
## Target state
- Gateway WS adds methods:
- `chat.history { sessionKey }` → `{ sessionKey, messages[], thinkingLevel }` (reads the existing JSONL + sessions.json).
- `chat.history { sessionKey }` → `{ sessionKey, messages[], thinkingLevel }` (reads the existing JSONL + session store).
- `chat.send { sessionKey, message, attachments?, thinking?, deliver?, timeoutMs<=30000, idempotencyKey }` → `res { runId, status:"accepted" }` or `res ok:false` on validation/timeout.
- Gateway WS emits `chat` events `{ runId, sessionKey, seq, state:"delta"|"final"|"error", message?, errorMessage?, usage?, stopReason? }`. Streaming is optional; minimum is a single `state:"final"` per send.
- Client consumes only WS: bootstrap via `chat.history`, send via `chat.send`, live updates via `chat` events. No file watchers.
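On the client side, the `chat` event stream above could be folded into per-run state with a small reducer. The sketch below makes several assumptions that the plan does not confirm: `message` is treated as plain text, `delta` events are treated as incremental chunks, a `final` event that carries a `message` is treated as the full text, and `seq` is treated as strictly increasing per run. `RunView` and `applyChatEvent` are hypothetical names.

```typescript
type ChatState = "delta" | "final" | "error";

// Subset of the documented event fields; `usage` and `stopReason` are omitted.
interface ChatEvent {
  runId: string;
  sessionKey: string;
  seq: number;
  state: ChatState;
  message?: string; // ASSUMPTION: plain text
  errorMessage?: string;
}

// Hypothetical client-side view of one run.
interface RunView {
  text: string;
  lastSeq: number;
  done: boolean;
  error?: string;
}

function applyChatEvent(runs: Map<string, RunView>, ev: ChatEvent): void {
  const run: RunView = runs.get(ev.runId) ?? { text: "", lastSeq: -1, done: false };
  // Ignore duplicates, stale frames, and anything after the run finished.
  if (run.done || ev.seq <= run.lastSeq) return;
  run.lastSeq = ev.seq;
  if (ev.state === "delta") {
    run.text += ev.message ?? ""; // ASSUMPTION: deltas are incremental chunks
  } else if (ev.state === "final") {
    if (ev.message !== undefined) run.text = ev.message; // ASSUMPTION: final carries full text
    run.done = true;
  } else {
    run.error = ev.errorMessage ?? "unknown error";
    run.done = true;
  }
  runs.set(ev.runId, run);
}
```

Because streaming is optional, a run that only ever receives a single `state:"final"` event is handled by the same path.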

@@ -3,48 +3,50 @@ summary: "Remote mode topology using SSH control channels between gateway and ma
read_when:
- Running or troubleshooting remote gateway setups
---
# Remote mode with control channel
# Remote access (SSH, tunnels, and tailnets)
This repo supports “remote over SSH” by keeping a single gateway (the master) running on a host (e.g., your Mac Studio) and connecting one or more macOS menu bar clients to it. The menu app no longer shells out to `pnpm clawdis …`; it talks to the gateway over a persistent control channel that is tunneled through SSH.
This repo supports “remote over SSH” by keeping a single Gateway (the master) running on a host (e.g., your Mac Studio) and connecting clients to it.
Remote mode is the SSH fallback transport. As Clawdis adds a direct “bridge” transport for LAN/tailnet setups, SSH remains supported for universal reach.
See `docs/discovery.md` for how clients choose between direct vs SSH.
- For **operators (you / the macOS app)**: SSH tunneling is the universal fallback.
- For **nodes (Iris/iOS and future devices)**: prefer the Gateway **Bridge** when on the same LAN/tailnet (see `docs/discovery.md`).
## Topology
- Master: runs the gateway + control server on `127.0.0.1:18789` (in-process TCP server).
- Clients: when “Remote over SSH” is selected, the app opens one SSH tunnel:
- `ssh -N -L <localPort>:127.0.0.1:18789 <user>@<host>`
- The app then connects to `localhost:<localPort>` and keeps that socket open.
- Messages are newline-delimited JSON (documented in `docs/control-api.md`).
## The core idea
## Connection flow (clients)
1) Establish SSH tunnel.
2) Open TCP socket to the local forwarded port.
3) Send `ping` to verify connectivity.
4) Issue `health`, `status`, and `last-heartbeat` requests to seed UI.
5) Listen for `event` frames (heartbeat updates, gateway status).
- The Gateway WebSocket binds to **loopback**: `ws://127.0.0.1:18789`.
- For remote use, you forward that loopback port over SSH (or use a tailnet/VPN and tunnel less).
## Heartbeats
- Heartbeats always run on the master gateway.
- The control server emits `event: "heartbeat"` after each heartbeat attempt and keeps the latest in memory for `last-heartbeat` requests.
- No file-based heartbeat logs/state are required when the control stream is available.
## SSH tunnel (CLI + tools)
## Local mode
- The menu app skips SSH and connects directly to `127.0.0.1:18789` with the same protocol.
Create a local tunnel to the remote Gateway WS:
## Failure handling
- If the tunnel drops, the client reconnects and re-issues `ping`, `health`, and `last-heartbeat` to refresh state (the mac app shows “Control channel disconnected”).
- If the control port is unavailable (older gateway), the app can optionally fall back to the legacy CLI path, but the goal is to rely solely on the control channel.
```bash
ssh -N -L 18789:127.0.0.1:18789 user@host
```
## Test Remote (in the mac app)
1) SSH reachability check (`ssh -o BatchMode=yes … echo ok`).
2) If SSH succeeds, the app opens the control tunnel and issues a `health` request; success marks the remote as ready.
With the tunnel up:
- `clawdis health` and `clawdis status --deep` now reach the remote gateway via `ws://127.0.0.1:18789`.
- `clawdis gateway {status,health,send,agent,call}` can also target the forwarded URL via `--url` when needed.
## Security
- Control server listens only on localhost.
- SSH tunneling reuses existing keys/agent; no additional auth is added by the control server.
## WebChat over SSH
## Files to keep in sync
- Protocol definition: `docs/control-api.md`.
- App connection logic: macOS `Remote over SSH` plumbing.
- Gateway control server: lives inside the Node gateway process.
Forward both the WebChat HTTP port and the Gateway WS port:
```bash
ssh -N \
-L 18788:127.0.0.1:18788 \
-L 18789:127.0.0.1:18789 \
user@host
```
Then open `http://127.0.0.1:18788/webchat/` locally. (Details: `docs/webchat.md`.)
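Scripts that open these tunnels programmatically can build the same argument list the commands above use. This is a minimal sketch: `sshForwardArgs` is a hypothetical helper, and it assumes each port is forwarded to the same port number on the remote loopback interface, as in the examples.

```typescript
// Hypothetical helper: builds the `ssh` arguments for loopback port forwards.
// ASSUMPTION: local and remote port numbers are identical, as in the examples above.
function sshForwardArgs(target: string, ports: number[]): string[] {
  const args: string[] = ["-N"];
  for (const port of ports) {
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`invalid port: ${port}`);
    }
    args.push("-L", `${port}:127.0.0.1:${port}`);
  }
  args.push(target);
  return args;
}
```

Passing the result as an argument array to a process spawner (rather than joining it into a shell string) avoids quoting problems with unusual host names.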
## macOS app “Remote over SSH”
The macOS menu bar app can drive the same setup end-to-end (remote status checks, WebChat, and Voice Wake forwarding).
Runbook: `docs/mac/remote.md`.
## Legacy control channel
Older builds experimented with a newline-delimited TCP control channel on the same port.
That API is deprecated and should not be relied on. (Historical reference: `docs/control-api.md`.)

@@ -7,7 +7,7 @@ read_when:
Updated: 2025-12-07
Status: ready for bot-mode use with grammY (long-poll + webhook). Text + media send, proxy, and webhook helpers all ship in-tree.
Status: ready for bot-mode use with grammY (long-polling by default; webhook supported when configured). Text + media send, mention-gated group replies, and optional proxy support are implemented.
## Goals
- Let you talk to Clawdis via a Telegram bot in DMs and groups.
@@ -17,7 +17,11 @@ Status: ready for bot-mode use with grammY (long-poll + webhook). Text + media s
## How it will work (Bot API)
1) Create a bot with @BotFather and grab the token.
2) Configure Clawdis with `TELEGRAM_BOT_TOKEN` (or `telegram.botToken` in `~/.clawdis/clawdis.json`).
3) Run the gateway; it auto-starts Telegram when the bot token is set. To force Telegram-only: `clawdis gateway --provider telegram`. Webhook mode: `clawdis gateway --provider telegram --webhook --port 8787 --webhook-secret <secret>` (optionally `--webhook-url` when the public URL differs).
3) Run the gateway; it auto-starts Telegram when the bot token is set.
- **Long-polling** is the default.
- **Webhook mode** is enabled by setting `telegram.webhookUrl` (optionally `telegram.webhookSecret` / `telegram.webhookPath`).
- The webhook listener currently binds to `0.0.0.0:8787` and serves `POST /telegram-webhook` by default.
- If you need a different public port/host, set `telegram.webhookUrl` to the externally reachable URL and use a reverse proxy to forward to `:8787`.
4) Direct chats: user sends the first message; all subsequent turns land in the shared `main` session (default, no extra config).
5) Groups: add the bot, disable privacy mode (or make it admin) so it can read messages; group threads stay on `group:<chatId>` and require mention/command to trigger replies.
6) Optional allowlist: reuse `inbound.allowFrom` for direct chats by chat id (`123456789` or `telegram:123456789`).
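The choice between long-polling and webhook mode in step 3 can be sketched as a pure function over the config. The listener host, port, and default path come from the text above; the type names, the `disabled` state, and the precedence of the environment token over `telegram.botToken` are assumptions made for illustration.

```typescript
interface TelegramConfig {
  botToken?: string;
  webhookUrl?: string;
  webhookSecret?: string;
  webhookPath?: string;
}

type TelegramMode =
  | { mode: "disabled" }
  | { mode: "long-polling" }
  | {
      mode: "webhook";
      publicUrl: string;
      listenHost: string;
      listenPort: number;
      path: string;
      secret?: string;
    };

// Hypothetical helper: decides how the Telegram provider would start.
// ASSUMPTION: the env token takes precedence over the config token.
function resolveTelegramMode(cfg: TelegramConfig, envToken?: string): TelegramMode {
  const token = envToken ?? cfg.botToken;
  if (!token) return { mode: "disabled" }; // no token: Telegram is not auto-started
  if (!cfg.webhookUrl) return { mode: "long-polling" }; // documented default
  return {
    mode: "webhook",
    publicUrl: cfg.webhookUrl,
    listenHost: "0.0.0.0",
    listenPort: 8787,
    path: cfg.webhookPath ?? "/telegram-webhook",
    ...(cfg.webhookSecret ? { secret: cfg.webhookSecret } : {}),
  };
}
```

Note that `publicUrl` and the local listener are independent: when the public host or port differs, a reverse proxy is expected to forward to `:8787`.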
@@ -32,7 +36,7 @@ Status: ready for bot-mode use with grammY (long-poll + webhook). Text + media s
- Library: grammY is the only client for send + gateway (fetch fallback removed); grammY throttler is enabled by default to stay under Bot API limits.
- Inbound normalization: maps Bot API updates to `MsgContext` with `Surface: "telegram"`, `ChatType: direct|group`, `SenderName`, `MediaPath`/`MediaType` when attachments arrive, and `Timestamp`; groups require @bot mention by default.
- Outbound: text and media (photo/video/audio/document) with optional caption; chunked to limits. Typing cue sent best-effort.
- Config: `TELEGRAM_BOT_TOKEN` env or `telegram.botToken` required; `telegram.requireMention`, `telegram.allowFrom`, `telegram.mediaMaxMb`, `telegram.proxy`, `telegram.webhookSecret`, `telegram.webhookUrl` supported.
- Config: `TELEGRAM_BOT_TOKEN` env or `telegram.botToken` required; `telegram.requireMention`, `telegram.allowFrom`, `telegram.mediaMaxMb`, `telegram.proxy`, `telegram.webhookSecret`, `telegram.webhookUrl`, `telegram.webhookPath` supported.
Example config:
```json5
@@ -44,6 +48,7 @@ Example config:
mediaMaxMb: 5,
proxy: "socks5://localhost:9050",
webhookSecret: "mysecret",
webhookPath: "/telegram-webhook",
webhookUrl: "https://yourdomain.com/telegram-webhook"
}
}
@@ -62,6 +67,6 @@ Example config:
- ⏳ Add more grammY coverage (webhook payloads, media edge cases)
## Safety & ops
- Treat the bot token as a secret (equivalent to account control); store under `~/.clawdis/credentials/` with 0600 perms.
- Respect Telegram rate limits (429s); we'll add throttling in the provider to stay below flood thresholds.
- Treat the bot token as a secret (equivalent to account control); prefer `TELEGRAM_BOT_TOKEN` or a locked-down config file (`chmod 600 ~/.clawdis/clawdis.json`).
- Respect Telegram rate limits (429s); grammY throttling is enabled by default.
- Use a test bot for development to avoid hitting production chats.