docs: reorganize documentation structure

Peter Steinberger
2026-01-07 00:41:31 +01:00
parent b8db8502aa
commit db4d0b8e75
126 changed files with 881 additions and 270 deletions


@@ -0,0 +1,61 @@
---
summary: "Agent loop lifecycle, streams, and wait semantics"
read_when:
- You need an exact walkthrough of the agent loop or lifecycle events
---
# Agent Loop (Clawdis)
Short, exact flow of one agent run. Source of truth: current code in `src/`.
## Entry points
- Gateway RPC: `agent` and `agent.wait` in [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts).
- CLI: `agentCommand` in [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts).
## High-level flow
1) `agent` RPC validates params, resolves session (sessionKey/sessionId), persists session metadata, returns `{ runId, acceptedAt }` immediately.
2) `agentCommand` runs the agent:
   - resolves model + thinking/verbose defaults
   - loads skills snapshot
   - calls `runEmbeddedPiAgent` (pi-agent-core runtime)
   - emits **lifecycle end/error** if the embedded loop does not emit one
3) `runEmbeddedPiAgent`:
   - builds `AgentSession` and subscribes to pi events
   - streams assistant deltas + tool events
   - enforces timeout -> aborts run if exceeded
   - returns payloads + usage metadata
4) `subscribeEmbeddedPiSession` bridges pi-agent-core events to Clawdis `agent` stream:
   - tool events => `stream: "tool"`
   - assistant deltas => `stream: "assistant"`
   - lifecycle events => `stream: "lifecycle"` (`phase: "start" | "end" | "error"`)
5) `agent.wait` uses `waitForAgentJob`:
   - waits for **lifecycle end/error** for `runId`
   - returns `{ status: ok|error|timeout, startedAt, endedAt, error? }`
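The wait semantics in step 5 can be modeled as a pure function over the lifecycle events observed for a run. This is a minimal sketch, not the actual `waitForAgentJob` code: the `LifecycleEvent` and `WaitResult` shapes, the `ts` field, and the `resolveWait` name are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; not the real Clawdis types.
type LifecycleEvent = {
  runId: string;
  phase: "start" | "end" | "error";
  ts: number; // epoch milliseconds (assumed field)
  error?: string;
};

type WaitResult = {
  status: "ok" | "error" | "timeout";
  startedAt?: number;
  endedAt?: number;
  error?: string;
};

// Scan lifecycle events for runId in arrival order and stop at the first
// end/error seen before the deadline. A timeout only ends the wait; it says
// nothing about whether the agent itself is still running.
function resolveWait(
  events: LifecycleEvent[],
  runId: string,
  waitStartedAt: number,
  timeoutMs = 30_000, // documented agent.wait default
): WaitResult {
  const deadline = waitStartedAt + timeoutMs;
  let startedAt: number | undefined;
  for (const evt of events) {
    if (evt.runId !== runId) continue; // ignore other runs
    if (evt.ts > deadline) break; // arrived after the wait gave up
    if (evt.phase === "start") {
      startedAt = evt.ts;
      continue;
    }
    return {
      status: evt.phase === "end" ? "ok" : "error",
      startedAt,
      endedAt: evt.ts,
      error: evt.error,
    };
  }
  return { status: "timeout", startedAt };
}
```

The point of the sketch is the separation of concerns: a `timeout` result describes the wait only, so a caller may issue another wait for the same `runId`.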
## Event streams (today)
- `lifecycle`: emitted by `subscribeEmbeddedPiSession` (and as a fallback by `agentCommand`)
- `assistant`: streamed deltas from pi-agent-core
- `tool`: streamed tool events from pi-agent-core
## Chat provider handling
- `createAgentEventHandler` in [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts):
  - buffers assistant deltas
  - emits chat `delta` messages
  - emits chat `final` when **lifecycle end/error** arrives
## Timeouts
- `agent.wait` default: 30s (just the wait). `timeoutMs` param overrides.
- Agent runtime: `agent.timeoutSeconds` default 600s; enforced in `runEmbeddedPiAgent` abort timer.
## Where things can end early
- Agent timeout (abort)
- AbortSignal (cancel)
- Gateway disconnect or RPC timeout
- `agent.wait` timeout (wait-only, does not stop agent)
## Files
- [`src/gateway/server-methods/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent.ts)
- [`src/gateway/server-methods/agent-job.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-methods/agent-job.ts)
- [`src/commands/agent.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/commands/agent.ts)
- [`src/agents/pi-embedded-runner.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-runner.ts)
- [`src/agents/pi-embedded-subscribe.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/agents/pi-embedded-subscribe.ts)
- [`src/gateway/server-chat.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server-chat.ts)


@@ -0,0 +1,187 @@
---
summary: "Agent workspace: location, layout, and backup strategy"
read_when:
- You need to explain the agent workspace or its file layout
- You want to back up or migrate an agent workspace
---
# Agent workspace
The workspace is the agent's home. It is the only working directory used for
file tools and for workspace context. Keep it private and treat it as memory.
This is separate from `~/.clawdbot/`, which stores config, credentials, and
sessions.
## Default location
- Default: `~/clawd`
- If `CLAWDBOT_PROFILE` is set and not `"default"`, the default becomes
`~/clawd-<profile>`.
- Override in `~/.clawdbot/clawdbot.json`:
```json5
{
  agent: {
    workspace: "~/clawd"
  }
}
```
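The default-location rules above can be sketched as a small resolver. This is an illustration under assumptions, not Clawdbot's implementation: the function name, the explicit `homeDir` and `env` parameters, and the `~/` expansion are hypothetical.

```typescript
// Sketch of the documented default-location rules. The function name and the
// explicit homeDir/env parameters are assumptions made for testability.
function resolveDefaultWorkspace(
  homeDir: string,
  env: Record<string, string | undefined>,
  configured?: string,
): string {
  // An explicit agent.workspace value wins; expanding a leading "~/" is assumed.
  if (configured) {
    return configured.startsWith("~/") ? `${homeDir}/${configured.slice(2)}` : configured;
  }
  // A non-default profile gets its own workspace directory.
  const profile = env.CLAWDBOT_PROFILE;
  if (profile && profile !== "default") {
    return `${homeDir}/clawd-${profile}`;
  }
  return `${homeDir}/clawd`;
}
```

For example, with `CLAWDBOT_PROFILE=work` and no override, this sketch resolves to `<home>/clawd-work`.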
`clawdbot onboard`, `clawdbot configure`, or `clawdbot setup` will create the
workspace and seed the bootstrap files if they are missing.
If you already manage the workspace files yourself, you can disable bootstrap
file creation:
```json5
{ agent: { skipBootstrap: true } }
```
## Workspace file map (what each file means)
These are the standard files Clawdbot expects inside the workspace:
- `AGENTS.md`
  - Operating instructions for the agent and how it should use memory.
  - Loaded at the start of every session.
  - Good place for rules, priorities, and "how to behave" details.
- `SOUL.md`
  - Persona, tone, and boundaries.
  - Loaded every session.
- `USER.md`
  - Who the user is and how to address them.
  - Loaded every session.
- `IDENTITY.md`
  - The agent's name, vibe, and emoji.
  - Created/updated during the bootstrap ritual.
- `TOOLS.md`
  - Notes about your local tools and conventions.
  - Does not control tool availability; it is only guidance.
- `HEARTBEAT.md`
  - Optional tiny checklist for heartbeat runs.
  - Keep it short to avoid token burn.
- `BOOTSTRAP.md`
  - One-time first-run ritual.
  - Only created for a brand-new workspace.
  - Delete it after the ritual is complete.
- `memory/YYYY-MM-DD.md`
  - Daily memory log (one file per day).
  - Recommended to read today + yesterday on session start.
- `MEMORY.md` (optional)
  - Curated long-term memory.
  - Only load in the main, private session (not shared/group contexts).
- `skills/` (optional)
  - Workspace-specific skills.
  - Overrides managed/bundled skills when names collide.
- `canvas/` (optional)
  - Canvas UI files for node displays (for example `canvas/index.html`).
If any bootstrap file is missing, Clawdbot injects a "missing file" marker into
the session and continues. `clawdbot setup` can recreate missing defaults
without overwriting existing files.
## What is NOT in the workspace
These live under `~/.clawdbot/` and should NOT be committed to the workspace repo:
- `~/.clawdbot/clawdbot.json` (config)
- `~/.clawdbot/credentials/` (OAuth tokens, API keys)
- `~/.clawdbot/agents/<agentId>/sessions/` (session transcripts + metadata)
- `~/.clawdbot/skills/` (managed skills)
If you need to migrate sessions or config, copy them separately and keep them
out of version control.
## Git backup (recommended, private)
Treat the workspace as private memory. Put it in a **private** git repo so it is
backed up and recoverable.
Run these steps on the machine where the Gateway runs (that is where the
workspace lives).
### 1) Initialize the repo
```bash
cd ~/clawd
git init
git add AGENTS.md SOUL.md TOOLS.md IDENTITY.md USER.md HEARTBEAT.md memory/
git commit -m "Add agent workspace"
```
### 2) Add a private remote (beginner-friendly options)
Option A: GitHub web UI
1. Create a new **private** repository on GitHub.
2. Do not initialize with a README (avoids merge conflicts).
3. Copy the HTTPS remote URL.
4. Add the remote and push:
```bash
git branch -M main
git remote add origin <https-url>
git push -u origin main
```
Option B: GitHub CLI (`gh`)
```bash
gh auth login
gh repo create clawd-workspace --private --source . --remote origin --push
```
### 3) Ongoing updates
```bash
git status
git add .
git commit -m "Update memory"
git push
```
## Do not commit secrets
Even in a private repo, avoid storing secrets in the workspace:
- API keys, OAuth tokens, passwords, or private credentials.
- Anything under `~/.clawdbot/`.
- Raw dumps of chats or sensitive attachments.
If you must store sensitive references, use placeholders and keep the real
secret elsewhere (password manager, environment variables, or `~/.clawdbot/`).
Suggested `.gitignore` starter:
```gitignore
.DS_Store
.env
**/*.key
**/*.pem
**/secrets*
```
## Moving the workspace to a new machine
1. Clone the repo to the desired path (default `~/clawd`).
2. Set `agent.workspace` to that path in `~/.clawdbot/clawdbot.json`.
3. Run `clawdbot setup --workspace <path>` to seed any missing files.
4. If you need sessions, copy `~/.clawdbot/agents/<agentId>/sessions/` from the
old machine separately.
## Advanced notes
- Multi-agent routing can use different workspaces per agent. See
`docs/provider-routing.md` for routing configuration.
- If `agent.sandbox` is enabled, non-main sessions can use per-session sandbox
workspaces under `agent.sandbox.workspaceRoot`.

docs/concepts/agent.md Normal file

@@ -0,0 +1,112 @@
---
summary: "Agent runtime (embedded p-mono), workspace contract, and session bootstrap"
read_when:
- Changing agent runtime, workspace bootstrap, or session behavior
---
# Agent Runtime 🤖
CLAWDBOT runs a single embedded agent runtime derived from **p-mono** (internal name: **p**).
## Workspace (required)
CLAWDBOT uses a single agent workspace directory (`agent.workspace`) as the agent's **only** working directory (`cwd`) for tools and context.
Recommended: use `clawdbot setup` to create `~/.clawdbot/clawdbot.json` if missing and initialize the workspace files.
Full workspace layout + backup guide: [`docs/agent-workspace.md`](/agent-workspace)
If `agent.sandbox` is enabled, non-main sessions can override this with
per-session workspaces under `agent.sandbox.workspaceRoot` (see
[`docs/configuration.md`](/configuration)).
## Bootstrap files (injected)
Inside `agent.workspace`, CLAWDBOT expects these user-editable files:
- `AGENTS.md` — operating instructions + “memory”
- `SOUL.md` — persona, boundaries, tone
- `TOOLS.md` — user-maintained tool notes (e.g. `imsg`, `sag`, conventions)
- `BOOTSTRAP.md` — one-time first-run ritual (deleted after completion)
- `IDENTITY.md` — agent name/vibe/emoji
- `USER.md` — user profile + preferred address
On the first turn of a new session, CLAWDBOT injects the contents of these files directly into the agent context.
If a file is missing, CLAWDBOT injects a single “missing file” marker line (and `clawdbot setup` will create a safe default template).
`BOOTSTRAP.md` is only created for a **brand new workspace** (no other bootstrap files present). If you delete it after completing the ritual, it should not be recreated on later restarts.
To disable bootstrap file creation entirely (for pre-seeded workspaces), set:
```json5
{ agent: { skipBootstrap: true } }
```
## Built-in tools (internal)
p's embedded core tools (read/bash/edit/write and related internals) are defined in code and always available. `TOOLS.md` does **not** control which tools exist; it's guidance for how *you* want them used.
## Skills
Clawdbot loads skills from three locations (workspace wins on name conflict):
- Bundled (shipped with the install)
- Managed/local: `~/.clawdbot/skills`
- Workspace: `<workspace>/skills`
Skills can be gated by config/env (see `skills` in [`docs/configuration.md`](/configuration)).
## p-mono integration
Clawdbot reuses pieces of the p-mono codebase (models/tools), but **session management, discovery, and tool wiring are Clawdbot-owned**.
- No p-coding agent runtime.
- No `~/.pi/agent` or `<workspace>/.pi` settings are consulted.
## Peter @ steipete (only)
Apply these notes **only** when the user is Peter Steinberger at steipete.
- Gateway runs on the **Mac Studio in London**.
- Primary work computer: **MacBook Pro**.
- Peter travels between **Vienna** and **London**; there are two networks bridged via **Tailscale**.
- For debugging, connect to the Mac Studio (London) or MacBook Pro (primary).
- There is also an **M1 MacBook Pro** on the Vienna tailnet you can use to access the Vienna network.
- Nodes can be accessed via the `clawdbot` binary (`pnpm clawdbot` in `~/Projects/clawdbot`).
- See also `skills/clawdbot*` for node/browser/canvas/cron usage.
## Sessions
Session transcripts are stored as JSONL at:
- `~/.clawdbot/agents/<agentId>/sessions/<SessionId>.jsonl`
The session ID is stable and chosen by CLAWDBOT.
Legacy Pi/Tau session folders are **not** read.
## Steering while streaming
When queue mode is `steer`, inbound messages are injected into the current run.
The queue is checked **after each tool call**; if a queued message is present,
remaining tool calls from the current assistant message are skipped (error tool
results with "Skipped due to queued user message."), then the queued user
message is injected before the next assistant response.
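The `steer` behavior described above can be sketched as a loop over the tool calls of one assistant message. This is a minimal model, not the runtime's code: `ToolResult`, `runToolCallsWithSteering`, and the callback parameters are hypothetical names; only the skip message text comes from the description above.

```typescript
// Hypothetical result shape for illustration.
type ToolResult = { tool: string; isError: boolean; text: string };

// Run the tool calls of one assistant message in order. After each executed
// call the queue is checked; once a queued user message is seen, every
// remaining call is reported as a skipped error result instead of executing.
function runToolCallsWithSteering(
  toolCalls: string[],
  execute: (tool: string) => string,
  hasQueuedMessage: () => boolean,
): ToolResult[] {
  const results: ToolResult[] = [];
  let steered = false;
  for (const tool of toolCalls) {
    if (steered) {
      results.push({ tool, isError: true, text: "Skipped due to queued user message." });
      continue;
    }
    results.push({ tool, isError: false, text: execute(tool) });
    steered = hasQueuedMessage(); // checked after each tool call
  }
  return results;
}
```

In this model the queued message would then be injected by the caller before the next assistant response.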
When queue mode is `followup` or `collect`, inbound messages are held until the
current turn ends, then a new agent turn starts with the queued payloads. See
[`docs/queue.md`](/queue) for mode + debounce/cap behavior.
Block streaming sends completed assistant blocks as soon as they finish; disable
via `agent.blockStreamingDefault: "off"` if you only want the final response.
Tune the boundary via `agent.blockStreamingBreak` (`text_end` vs `message_end`; defaults to text_end).
Control soft block chunking with `agent.blockStreamingChunk` (defaults to
800-1200 chars; prefers paragraph breaks, then newlines; sentences last).
Verbose tool summaries are emitted at tool start (no debounce); Control UI
streams tool output via agent events when available.
## Configuration (minimal)
At minimum, set:
- `agent.workspace`
- `whatsapp.allowFrom` (strongly recommended)
---
*Next: [Group Chats](/group-messages)* 🦞


@@ -0,0 +1,114 @@
---
summary: "WebSocket gateway architecture, components, and client flows"
read_when:
- Working on gateway protocol, clients, or transports
---
# Gateway Architecture
Last updated: 2026-01-05
## Overview
- A single long-lived **Gateway** process owns all messaging surfaces (WhatsApp via Baileys, Telegram via grammY, Slack via Bolt, Discord via discord.js, Signal via signal-cli, iMessage via imsg, WebChat) and the control/event plane.
- All clients (macOS app, CLI, web UI, automations) connect to the Gateway over one transport: **WebSocket on the configured bind host** (default `127.0.0.1:18789`; tunnel or VPN for remote).
- One Gateway per host; it is the only place that is allowed to open a WhatsApp session. All sends/agent runs go through it.
- By default: the Gateway exposes a Canvas host on `canvasHost.port` (default `18793`), serving `~/clawd/canvas` at `/__clawdbot__/canvas/` with live-reload; disable via `canvasHost.enabled=false` or `CLAWDBOT_SKIP_CANVAS_HOST=1`.
## Implementation snapshot (current code)
### TypeScript Gateway ([`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts))
- Single HTTP + WebSocket server (default `18789`); bind policy `loopback|lan|tailnet|auto`. Refuses non-loopback binds without auth; Tailscale serve/funnel requires loopback.
- Handshake: first frame must be a `connect` request; AJV validates request + params against TypeBox schemas; protocol negotiated via `minProtocol`/`maxProtocol`.
- `hello-ok` includes snapshot (presence/health/stateVersion/uptime/configPath/stateDir), features (methods/events), policy (max payload/buffer/tick), and `canvasHostUrl` when available.
- Events emitted: `agent`, `chat`, `presence`, `tick`, `health`, `heartbeat`, `cron`, `talk.mode`, `node.pair.requested`, `node.pair.resolved`, `voicewake.changed`, `shutdown`.
- Idempotency keys are required for `send`, `agent`, `chat.send`, and node invokes; the dedupe cache avoids double-sends on reconnects. Payload sizes are capped per connection.
- Optional node bridge ([`src/infra/bridge/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/bridge/server.ts)): TCP JSONL frames (`hello`, `pair-request`, `req/res`, `event`, `invoke`, `ping`). Node connect/disconnect updates presence and flows into the session bus.
- Control UI + Canvas host: HTTP serves Control UI (base path configurable) and can host the A2UI canvas via [`src/canvas-host/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/canvas-host/server.ts) (live reload). Canvas host URL is advertised to nodes + clients.
### iOS node (`apps/ios`)
- Discovery + pairing: `BridgeDiscoveryModel` uses `NWBrowser` Bonjour discovery and reads TXT fields for LAN/tailnet host hints plus gateway/bridge/canvas ports.
- Auto-connect: `BridgeConnectionController` uses stored `node.instanceId` + Keychain token; supports manual host/port; performs `pair-and-hello`.
- Bridge runtime: `BridgeSession` actor owns an `NWConnection`, JSONL frames, `hello`/`hello-ok`, ping/pong, `req/res`, server `event`s, and `invoke` callbacks; stores `canvasHostUrl`.
- Commands: `NodeAppModel` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, `screen.record`, `location.get`. Canvas/camera/screen are blocked when backgrounded.
- Canvas + actions: `WKWebView` with A2UI action bridge; accepts actions from local-network or trusted file URLs; intercepts `clawdbot://` deep links and forwards `agent.request` to the bridge.
- Voice/talk: voice wake sends `voice.transcript` events and syncs triggers via `voicewake.get` + `voicewake.changed`; Talk Mode attaches to the bridge.
### Android node (`apps/android`)
- Discovery + pairing: `BridgeDiscovery` uses mDNS/NSD to find `_clawdbot-bridge._tcp`, with manual host/port fallback.
- Auto-connect: `NodeRuntime` restores a stored token, performs `pair-and-hello`, and reconnects to the last discovered or manual bridge.
- Bridge runtime: `BridgeSession` owns the TCP JSONL session (`hello`/`hello-ok`, ping/pong, `req/res`, `event`, `invoke`); stores `canvasHostUrl`.
- Commands: `NodeRuntime` executes `canvas.*`, `canvas.a2ui.*`, `camera.*`, and chat/session events; foreground-only for canvas/camera.
## Components and flows
- **Gateway (daemon)**
  - Maintains WhatsApp (Baileys), Telegram (grammY), Slack (Bolt), Discord (discord.js), Signal (signal-cli), and iMessage (imsg) connections.
  - Exposes a typed WS API (req/resp + server push events).
  - Validates every inbound frame against JSON Schema; rejects anything before a mandatory `connect`.
- **Clients (mac app / CLI / web admin)**
  - One WS connection per client.
  - Send requests (`health`, `status`, `send`, `agent`, `system-presence`, toggles) and subscribe to events (`tick`, `agent`, `presence`, `shutdown`).
  - On macOS, the app can also be invoked via deep links (`clawdbot://agent?...`) which translate into the same Gateway `agent` request path (see [`docs/macos.md`](/macos)).
- **Agent process (Pi)**
  - Spawned by the Gateway on demand for `agent` calls; streams events back over the same WS connection.
- **WebChat**
  - Serves static assets locally.
  - Holds a single WS connection to the Gateway for control/data; all sends/agent runs go through the Gateway WS.
  - Remote use goes through the same SSH/Tailscale tunnel as other clients.
## Connection lifecycle (single client)
```
Client                     Gateway
  |                          |
  |---- req:connect -------->|
  |<------ res (ok) ---------|  (or res error + close)
  |   (payload=hello-ok carries snapshot: presence + health)
  |                          |
  |<------ event:presence ---|  (deltas)
  |<------ event:tick -------|  (keepalive/no-op)
  |                          |
  |------- req:agent ------->|
  |<------ res:agent --------|  (ack: {runId,status:"accepted"})
  |<------ event:agent ------|  (streaming)
  |<------ res:agent --------|  (final: {runId,status,summary})
  |                          |
```
## Wire protocol (summary)
- Transport: WebSocket, text frames with JSON payloads.
- First frame must be `req {type:"req", id, method:"connect", params:{minProtocol, maxProtocol, client:{name,version,platform,mode,instanceId}, caps, auth?, locale?, userAgent? } }`.
- Server replies `res {type:"res", id, ok:true, payload: hello-ok }` or `ok:false` then closes.
- After handshake:
  - Requests: `{type:"req", id, method, params}` → `{type:"res", id, ok, payload|error}`
  - Events: `{type:"event", event:"agent"|"presence"|"tick"|"shutdown", payload, seq?, stateVersion?}`
- If `CLAWDBOT_GATEWAY_TOKEN` (or `--token`) is set, `connect.params.auth.token` must match; otherwise the socket closes with policy violation.
- Presence payload is structured, not free text: `{host, ip, version, mode, lastInputSeconds?, ts, reason?, tags?[], instanceId? }`.
- Agent runs are acked `{runId,status:"accepted"}` then complete with a final res `{runId,status,summary}`; streamed output arrives as `event:"agent"`.
- Protocol versions are bumped on breaking changes; clients must match `minClient`; Gateway chooses within the client's min/max.
- Idempotency keys are required for side-effecting methods (`send`, `agent`) to safely retry; server keeps a short-lived dedupe cache.
- Policy in `hello-ok` communicates payload/queue limits and tick interval.
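The frame shapes in the summary above can be written down as TypeScript types with a small builder for the mandatory first frame. This is a hedged sketch and not the generated protocol types: `buildConnectFrame`, `parseFrame`, the string `id`, the empty `caps` array, and all client field values and protocol numbers are placeholders assumed for illustration.

```typescript
// Frame shapes transcribed from the summary; optionality and the string id
// are assumptions, not the real TypeBox definitions.
type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown; error?: unknown };
type EventFrame = {
  type: "event";
  event: string;
  payload?: unknown;
  seq?: number;
  stateVersion?: number;
};

// Build the first frame a client must send. All concrete values below are
// placeholders, not real protocol versions or client names.
function buildConnectFrame(id: string, token?: string): ReqFrame {
  return {
    type: "req",
    id,
    method: "connect",
    params: {
      minProtocol: 1,
      maxProtocol: 1,
      client: {
        name: "example-cli",
        version: "0.0.0",
        platform: "node",
        mode: "cli",
        instanceId: "example-1",
      },
      caps: [],
      // Only sent when a shared token is configured.
      ...(token ? { auth: { token } } : {}),
    },
  };
}

// Parse a text frame and reject anything that is not one of the three kinds.
function parseFrame(text: string): ReqFrame | ResFrame | EventFrame {
  const frame = JSON.parse(text);
  if (frame?.type === "req" || frame?.type === "res" || frame?.type === "event") {
    return frame;
  }
  throw new Error("unknown frame type");
}
```

A client would send `JSON.stringify(buildConnectFrame(...))` as its first text frame and treat anything other than an `ok:true` response as a failed handshake.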
## Type system and codegen
- Source of truth: TypeBox (or ArkType) definitions in `protocol/` on the server.
- Build step emits JSON Schema.
- Clients:
  - TypeScript: uses the same TypeBox types directly.
  - Swift: generated `Codable` models via quicktype from the JSON Schema.
- Validation: AJV on the server for every inbound frame; optional client-side validation for defensive programming.
## Invariants
- Exactly one Gateway controls a single Baileys session per host. No fallbacks to ad-hoc direct Baileys sends.
- Handshake is mandatory; any non-JSON or non-connect first frame is a hard close.
- All methods and events are versioned; new fields are additive; breaking changes increment `protocol`.
- No event replay: on seq gaps, clients must refresh (`health` + `system-presence`) and continue; presence is bounded via TTL/max entries.
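Because there is no event replay, a client has to notice sequence gaps itself. The following is a minimal sketch of such a check; `SeqTracker` is a hypothetical name, and it assumes `seq` increases by exactly one per event, which this document does not state.

```typescript
// Tracks the last seen event seq and reports gaps. Hypothetical helper;
// assumes seq increments by one per event.
class SeqTracker {
  private last: number | undefined;

  // Returns true when a gap was detected and the caller should refresh
  // state (for example by requesting `health` and `system-presence`).
  observe(seq: number | undefined): boolean {
    if (seq === undefined) return false; // seq is optional on events
    const gap = this.last !== undefined && seq > this.last + 1;
    this.last = seq;
    return gap;
  }
}
```

After a refresh the tracker simply continues from the latest `seq` it has seen.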
## Remote access
- Preferred: Tailscale or VPN; alternate: SSH tunnel `ssh -N -L 18789:127.0.0.1:18789 user@host`.
- Same protocol over the tunnel; same handshake. If a shared token is configured, clients must send it in `connect.params.auth.token` even over the tunnel.
## Operations snapshot
- Start: `clawdbot gateway` (foreground, logs to stdout).
Supervise with launchd/systemd for restarts.
- Health: request `health` over WS; also surfaced in `hello-ok.health`.
- Metrics/logging: keep outside this spec; gateway should expose Prometheus text or structured logs separately.
## Migration notes
- This architecture supersedes the legacy stdin RPC and the ad-hoc TCP control port. New clients should speak only the WS protocol. Legacy compatibility is intentionally dropped.


@@ -0,0 +1,73 @@
---
summary: "Behavior and config for WhatsApp group message handling (mentionPatterns are shared across surfaces)"
read_when:
- Changing group message rules or mentions
---
# Group messages (web provider)
Goal: let Clawd sit in WhatsApp groups, wake up only when pinged, and keep that thread separate from the personal DM session.
Note: `routing.groupChat.mentionPatterns` is now used by Telegram/Discord/Slack/iMessage as well; this doc focuses on WhatsApp-specific behavior.
## What's implemented (2025-12-03)
- Activation modes: `mention` (default) or `always`. `mention` requires a ping (real WhatsApp @-mentions via `mentionedJids`, regex patterns, or the bot's E.164 anywhere in the text). `always` wakes the agent on every message, but it should reply only when it can add meaningful value; otherwise it returns the silent token `NO_REPLY`. Defaults can be set in config (`whatsapp.groups`) and overridden per group via `/activation`. When `whatsapp.groups` is set, it also acts as a group allowlist (include `"*"` to allow all).
- Group policy: `whatsapp.groupPolicy` controls whether group messages are accepted (`open|disabled|allowlist`). `allowlist` uses `whatsapp.groupAllowFrom` (fallback: explicit `whatsapp.allowFrom`).
- Per-group sessions: session keys look like `whatsapp:group:<jid>` so commands such as `/verbose on` or `/think high` (sent as standalone messages) are scoped to that group; personal DM state is untouched. Heartbeats are skipped for group threads.
- Context injection: last N (default 50) group messages are prefixed under `[Chat messages since your last reply - for context]`, with the triggering line under `[Current message - respond to this]`.
- Sender surfacing: every group batch now ends with `[from: Sender Name (+E164)]` so Pi knows who is speaking.
- Ephemeral/view-once: we unwrap those before extracting text/mentions, so pings inside them still trigger.
- Group system prompt: on the first turn of a group session (and whenever `/activation` changes the mode) we inject a short blurb into the system prompt like `You are replying inside the WhatsApp group "<subject>". Group members: Alice (+44...), Bob (+43...), … Activation: trigger-only … Address the specific sender noted in the message context.` If metadata isn't available we still tell the agent it's a group chat.
## Config for Clawd UK (+447700900123)
Add a `groupChat` block to `~/.clawdbot/clawdbot.json` so display-name pings work even when WhatsApp strips the visual `@` in the text body:
```json5
{
  "whatsapp": {
    "groups": {
      "*": { "requireMention": true }
    }
  },
  "routing": {
    "groupChat": {
      "historyLimit": 50,
      "mentionPatterns": [
        "@?clawd",
        "@?clawd\\s*uk",
        "@?clawdbot",
        "\\+?447700900123"
      ]
    }
  }
}
```
Notes:
- The regexes are case-insensitive; they cover `@clawd`, `@clawd uk`, `clawdbot`, and the raw number with or without `+`/spaces.
- WhatsApp still sends canonical mentions via `mentionedJids` when someone taps the contact, so the number fallback is rarely needed but is a good safety net.
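To see how the patterns above behave, here is a small sketch that compiles them as case-insensitive regexes and tests a few messages. The `wasMentioned` helper is a hypothetical name; the real mention check also considers `mentionedJids`, which this sketch does not model.

```typescript
// The mentionPatterns from the config above, as JavaScript string literals.
const mentionPatterns = ["@?clawd", "@?clawd\\s*uk", "@?clawdbot", "\\+?447700900123"];

// Compile once, case-insensitively, as the notes describe. No "g" flag, so
// RegExp.prototype.test stays stateless between calls.
const compiledMentionPatterns = mentionPatterns.map((p) => new RegExp(p, "i"));

// Hypothetical helper: true when any pattern matches anywhere in the text.
function wasMentioned(text: string): boolean {
  return compiledMentionPatterns.some((re) => re.test(text));
}
```

Note that these regexes on their own do not match the number written with internal spaces (for example `+44 7700 900123`); whether the text is normalized before matching is not stated here.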
### Activation command (owner-only)
Use the group chat command:
- `/activation mention`
- `/activation always`
Only the owner number (from `whatsapp.allowFrom`, or the bot's own E.164 when unset) can change this. Send `/status` as a standalone message in the group to see the current activation mode.
## How to use
1) Add Clawd UK (`+447700900123`) to the group.
2) Say `@clawd …` (or `@clawd uk`, `@clawdbot`, or include the number). Anyone in the group can trigger it.
3) The agent prompt will include recent group context plus the trailing `[from: …]` marker so it can address the right person.
4) Session-level directives (`/verbose on`, `/think high`, `/new` or `/reset`, `/compact`) apply only to that group's session; send them as standalone messages so they register. Your personal DM session remains independent.
## Testing / verification
- Automated: `pnpm test -- src/web/auto-reply.test.ts --runInBand` (covers mention gating, history injection, sender suffix).
- Manual smoke:
  - Send an `@clawd` ping in the group and confirm a reply that references the sender name.
  - Send a second ping and verify the history block is included then cleared on the next turn.
  - Check gateway logs (run with `--verbose`) to see `inbound web message` entries showing `from: <groupJid>` and the `[from: …]` suffix.
## Known considerations
- Heartbeats are intentionally skipped for groups to avoid noisy broadcasts.
- Echo suppression uses the combined batch string; if you send identical text twice without mentions, only the first will get a response.
- Entries appear as `agent:<agentId>:whatsapp:group:<jid>` in the session store (`~/.clawdbot/agents/<agentId>/sessions/sessions.json` by default); a missing entry just means the group hasn't triggered a run yet.

docs/concepts/groups.md Normal file

@@ -0,0 +1,130 @@
---
summary: "Group chat behavior across surfaces (WhatsApp/Telegram/Discord/Slack/Signal/iMessage)"
read_when:
- Changing group chat behavior or mention gating
---
# Groups
Clawdbot treats group chats consistently across surfaces: WhatsApp, Telegram, Discord, Slack, Signal, iMessage.
## Session keys
- Group sessions use `agent:<agentId>:<provider>:group:<id>` session keys (rooms/channels use `agent:<agentId>:<provider>:channel:<id>`).
- Direct chats use the main session (or per-sender if configured).
- Heartbeats are skipped for group sessions.
## Display labels
- UI labels use `displayName` when available, formatted as `<provider>:<token>`.
- `#room` is reserved for rooms/channels; group chats use `g-<slug>` (lowercase, spaces -> `-`, keep `#@+._-`).
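The `g-<slug>` rule above can be sketched as a small function. This is an illustration only: `groupLabelToken` is a hypothetical name, and two details are assumptions, namely that runs of whitespace collapse into a single `-` and that characters outside the kept set are dropped.

```typescript
// Hypothetical helper for the g-<slug> label rule. Assumes whitespace runs
// collapse to one "-" and that characters outside a-z, 0-9 and #@+._- are
// dropped (the document only says which characters are kept).
function groupLabelToken(displayName: string): string {
  const slug = displayName
    .toLowerCase()
    .replace(/\s+/g, "-")
    .replace(/[^a-z0-9#@+._-]/g, "");
  return `g-${slug}`;
}
```

Under these assumptions a group named `Family Chat!` would be labeled `g-family-chat`.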
## Group policy
Control how group/room messages are handled per provider:
```json5
{
  whatsapp: {
    groupPolicy: "disabled", // "open" | "disabled" | "allowlist"
    groupAllowFrom: ["+15551234567"]
  },
  telegram: {
    groupPolicy: "disabled",
    groupAllowFrom: ["123456789", "@username"]
  },
  signal: {
    groupPolicy: "disabled",
    groupAllowFrom: ["+15551234567"]
  },
  imessage: {
    groupPolicy: "disabled",
    groupAllowFrom: ["chat_id:123"]
  },
  discord: {
    groupPolicy: "allowlist",
    guilds: {
      "GUILD_ID": { channels: { help: { allow: true } } }
    }
  },
  slack: {
    groupPolicy: "allowlist",
    channels: { "#general": { allow: true } }
  }
}
```
| Policy | Behavior |
|--------|----------|
| `"open"` | Default. Groups bypass allowlists; mention-gating still applies. |
| `"disabled"` | Block all group messages entirely. |
| `"allowlist"` | Only allow groups/rooms that match the configured allowlist. |
Notes:
- `groupPolicy` is separate from mention-gating (which requires @mentions).
- WhatsApp/Telegram/Signal/iMessage: use `groupAllowFrom` (fallback: explicit `allowFrom`).
- Discord: allowlist uses `discord.guilds.<id>.channels`.
- Slack: allowlist uses `slack.channels`.
- Group DMs are controlled separately (`discord.dm.*`, `slack.dm.*`).
- Telegram allowlist can match user IDs (`"123456789"`, `"telegram:123456789"`, `"tg:123456789"`) or usernames (`"@alice"` or `"alice"`); prefixes are case-insensitive.
## Mention gating (default)
Group messages require a mention unless overridden per group. Defaults live per subsystem under `*.groups."*"`.
```json5
{
  whatsapp: {
    groups: {
      "*": { requireMention: true },
      "123@g.us": { requireMention: false }
    }
  },
  telegram: {
    groups: {
      "*": { requireMention: true },
      "123456789": { requireMention: false }
    }
  },
  imessage: {
    groups: {
      "*": { requireMention: true },
      "123": { requireMention: false }
    }
  },
  routing: {
    groupChat: {
      mentionPatterns: ["@clawd", "clawdbot", "\\+15555550123"],
      historyLimit: 50
    }
  }
}
```
Notes:
- `mentionPatterns` are case-insensitive regexes.
- Surfaces that provide explicit mentions still pass; patterns are a fallback.
- Mention gating is only enforced when mention detection is possible (native mentions or `mentionPatterns` are configured).
- Discord defaults live in `discord.guilds."*"` (overridable per guild/channel).
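Group policy and mention gating are separate checks, and the order in which they combine can be sketched as one decision function. This is a simplified model, not Clawdbot's code: the `GroupDecisionInput` shape and `shouldHandleGroupMessage` name are hypothetical, and how `allowlisted` is computed (which varies by provider) is left to the caller.

```typescript
type GroupPolicy = "open" | "disabled" | "allowlist";

// Hypothetical input shape; each flag is assumed to be computed upstream.
type GroupDecisionInput = {
  policy: GroupPolicy;
  allowlisted: boolean; // provider-specific allowlist match
  requireMention: boolean; // from *.groups entries
  canDetectMentions: boolean; // native mentions or mentionPatterns configured
  wasMentioned: boolean;
};

// Policy is applied first; mention gating applies to whatever policy lets
// through, and only when mention detection is possible at all.
function shouldHandleGroupMessage(input: GroupDecisionInput): boolean {
  if (input.policy === "disabled") return false;
  if (input.policy === "allowlist" && !input.allowlisted) return false;
  if (input.requireMention && input.canDetectMentions) return input.wasMentioned;
  return true;
}
```

In this model an `"open"` policy never consults the allowlist, but an unmentioned message in a mention-gated group is still ignored.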
## Group allowlists
When `whatsapp.groups`, `telegram.groups`, or `imessage.groups` is configured, the keys act as a group allowlist. Use `"*"` to allow all groups while still setting default mention behavior.
## Activation (owner-only)
Group owners can toggle per-group activation:
- `/activation mention`
- `/activation always`
Owner is determined by `whatsapp.allowFrom` (or the bot's own E.164 when unset). Send the command as a standalone message. Other surfaces currently ignore `/activation`.
## Context fields
Group inbound payloads set:
- `ChatType=group`
- `GroupSubject` (if known)
- `GroupMembers` (if known)
- `WasMentioned` (mention gating result)
The agent system prompt includes a group intro on the first turn of a new group session.
## iMessage specifics
- Prefer `chat_id:<id>` when routing or allowlisting.
- List chats: `imsg chats --limit 20`.
- Group replies always go back to the same `chat_id`.
## WhatsApp specifics
See [`docs/group-messages.md`](/group-messages) for WhatsApp-only behavior (history injection, mention handling details).


@@ -0,0 +1,93 @@
---
summary: "How Clawdbot rotates auth profiles and falls back across models"
read_when:
- Diagnosing auth profile rotation, cooldowns, or model fallback behavior
- Updating failover rules for auth profiles or models
---
# Model failover
Clawdbot handles failures in two stages:
1) **Auth profile rotation** within the current provider.
2) **Model fallback** to the next model in `agent.model.fallbacks`.
This doc explains the runtime rules and the data that backs them.
## Auth storage (keys + OAuth)
Clawdbot uses **auth profiles** for both API keys and OAuth tokens.
- Secrets live in `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json` (legacy: `~/.clawdbot/agent/auth-profiles.json`).
- Config `auth.profiles` / `auth.order` are **metadata + routing only** (no secrets).
- Legacy import-only OAuth file: `~/.clawdbot/credentials/oauth.json` (imported into `auth-profiles.json` on first use).
Credential types:
- `type: "api_key"` → `{ provider, key }`
- `type: "oauth"` → `{ provider, access, refresh, expires, email? }` (+ `projectId`/`enterpriseUrl` for some providers)
## Profile IDs
OAuth logins create distinct profiles so multiple accounts can coexist.
- Default: `provider:default` when no email is available.
- OAuth with email: `provider:<email>` (for example `google-antigravity:user@gmail.com`).
Profiles live in `~/.clawdbot/agents/<agentId>/agent/auth-profiles.json` under `profiles`.
## Rotation order
When a provider has multiple profiles, Clawdbot chooses an order like this:
1) **Explicit config**: `auth.order[provider]` (if set).
2) **Configured profiles**: `auth.profiles` filtered by provider.
3) **Stored profiles**: entries in `auth-profiles.json` for the provider.
If no explicit order is configured, Clawdbot uses a round-robin order:
- **Primary key:** profile type (**OAuth before API keys**).
- **Secondary key:** `usageStats.lastUsed` (oldest first, within each type).
- **Cooldown profiles** are moved to the end, ordered by soonest cooldown expiry.
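The ordering rules above can be sketched as a small sort. This is a minimal illustration only: the `ProfileEntry` shape and the `orderProfiles` name are hypothetical, and treating a never-used profile as "oldest" is an assumption rather than something the source confirms.

```typescript
// Hypothetical shapes for illustration; the real store types may differ.
type ProfileType = "oauth" | "api_key";

interface ProfileEntry {
  id: string;
  type: ProfileType;
  lastUsed?: number; // ms since epoch
  cooldownUntil?: number; // ms since epoch
}

function orderProfiles(profiles: ProfileEntry[], now: number): string[] {
  const inCooldown = (p: ProfileEntry) => (p.cooldownUntil ?? 0) > now;

  const ready = profiles
    .filter((p) => !inCooldown(p))
    .sort((a, b) => {
      // Primary key: OAuth before API keys.
      if (a.type !== b.type) return a.type === "oauth" ? -1 : 1;
      // Secondary key: oldest lastUsed first (assumption: never-used sorts first).
      return (a.lastUsed ?? 0) - (b.lastUsed ?? 0);
    });

  // Profiles in cooldown go last, soonest expiry first.
  const cooling = profiles
    .filter(inCooldown)
    .sort((a, b) => (a.cooldownUntil ?? 0) - (b.cooldownUntil ?? 0));

  return [...ready, ...cooling].map((p) => p.id);
}
```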
### Why OAuth can “look lost”
If you have both an OAuth profile and an API key profile for the same provider, round-robin can switch between them across messages unless pinned. To force a single profile:
- Pin with `auth.order[provider] = ["provider:profileId"]`, or
- Use a per-session override via `/model …` with a profile override (when supported by your UI/chat surface).
## Cooldowns
When a profile fails due to auth/rate-limit errors (or a timeout that looks
like rate limiting), Clawdbot marks it in cooldown and moves to the next profile.
Cooldowns use exponential backoff:
- 1 minute
- 5 minutes
- 25 minutes
- 1 hour (cap)
State is stored in `auth-profiles.json` under `usageStats`:
```json
{
  "usageStats": {
    "provider:profile": {
      "lastUsed": 1736160000000,
      "cooldownUntil": 1736160600000,
      "errorCount": 2
    }
  }
}
```
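The backoff steps listed above (1, 5, 25 minutes, then a 1 hour cap) are consistent with multiplying by 5 on each consecutive failure. The sketch below assumes that factor and assumes `errorCount` counts consecutive failures starting at 1; `cooldownMs` is a hypothetical helper name, not the actual implementation.

```typescript
const MINUTE_MS = 60_000;
const COOLDOWN_CAP_MS = 60 * MINUTE_MS; // 1 hour cap

// Assumption: errorCount is the number of consecutive failures (1 = first failure).
function cooldownMs(errorCount: number): number {
  const step = Math.max(0, errorCount - 1);
  // 1 min, 5 min, 25 min, then capped at 1 hour.
  return Math.min(MINUTE_MS * 5 ** step, COOLDOWN_CAP_MS);
}
```

A caller would then store `cooldownUntil = now + cooldownMs(errorCount)` for the failing profile.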
## Model fallback
If all profiles for a provider fail, Clawdbot moves to the next model in
`agent.model.fallbacks`. This applies to auth failures, rate limits, and
timeouts that exhausted profile rotation.
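The two stages can be pictured as nested loops: profiles inside, models outside. This is a simplified synchronous sketch; `runWithFailover`, `profilesFor`, and `tryRun` are hypothetical names, and the real runtime is asynchronous and also tracks cooldown state.

```typescript
// `tryRun` is a hypothetical callback that returns true when the run succeeds.
function runWithFailover(
  models: string[], // primary first, then fallbacks in order
  profilesFor: (model: string) => string[],
  tryRun: (model: string, profileId: string) => boolean,
): { model: string; profileId: string } | undefined {
  for (const model of models) {
    // Stage 1: rotate through every auth profile for this model's provider.
    for (const profileId of profilesFor(model)) {
      if (tryRun(model, profileId)) return { model, profileId };
    }
    // Stage 2: all profiles failed, so fall through to the next model.
  }
  return undefined;
}
```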
## Related config
See [`docs/configuration.md`](/configuration) for:
- `auth.profiles` / `auth.order`
- `agent.model.primary` / `agent.model.fallbacks`
- `agent.imageModel` routing
See [`docs/models.md`](/models) for the broader model selection and fallback overview.

97
docs/concepts/models.md Normal file
View File

@@ -0,0 +1,97 @@
---
summary: "Plan for models CLI: scan, list, aliases, fallbacks, status"
read_when:
- Adding or modifying models CLI (models list/set/scan/aliases/fallbacks)
- Changing model fallback behavior or selection UX
- Updating model scan probes (tools/images)
---
# Models CLI plan
See [`docs/model-failover.md`](/model-failover) for how auth profiles rotate (OAuth vs API keys), cooldowns, and how that interacts with model fallbacks.
Goal: give clear model visibility + control (configured vs available), plus scan tooling
that prefers tool-call + image-capable models and maintains ordered fallbacks.
## Model recommendations
Through testing, we've found Anthropic Opus 4.5 is the most useful general-purpose model for anything coding-related. We suggest GPT 5.2 Codex for coding and sub-agents. For personal assistant work, nothing comes close to Opus. If you're going all-in on Claude, we recommend the Max $200 subscription: https://claude.com/pricing
## Command tree (draft)
- `clawdbot models list`
  - default: configured models only
  - flags: `--all` (full catalog), `--local`, `--provider <name>`, `--json`, `--plain`
- `clawdbot models status`
  - show default model + aliases + fallbacks + configured models
- `clawdbot models set <modelOrAlias>`
  - writes `agent.model.primary` and ensures `agent.models` entry
- `clawdbot models set-image <modelOrAlias>`
  - writes `agent.imageModel.primary` and ensures `agent.models` entry
- `clawdbot models aliases list|add|remove`
  - writes `agent.models.*.alias`
- `clawdbot models fallbacks list|add|remove|clear`
  - writes `agent.model.fallbacks`
- `clawdbot models image-fallbacks list|add|remove|clear`
  - writes `agent.imageModel.fallbacks`
- `clawdbot models scan`
  - OpenRouter `:free` scan; probe tool-call + image; interactive selection
## Config changes
- `agent.models` (configured model catalog + aliases).
- `agent.model.primary` + `agent.model.fallbacks`.
- `agent.imageModel.primary` + `agent.imageModel.fallbacks` (optional).
- `auth.profiles` + `auth.order` for per-provider auth failover.
## Scan behavior (models scan)
Input
- OpenRouter `/models` list (filter `:free`)
- Requires OpenRouter API key from auth profiles or `OPENROUTER_API_KEY`
- Optional filters: `--max-age-days`, `--min-params`, `--provider`, `--max-candidates`
- Probe controls: `--timeout`, `--concurrency`
Probes (direct pi-ai complete)
- Tool-call probe (required):
  - Provide a dummy tool, verify tool call emitted.
- Image probe (preferred):
  - Prompt includes 1x1 PNG; success if no "unsupported image" error.
Scoring/selection
- Prefer models passing tool + image for text/tool fallbacks.
- Prefer image-only models for image tool fallback (even if tool probe fails).
- Rank by: image ok, then lower tool latency, then larger context, then params.
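The ranking rule can be sketched as a chained comparator. The `ScanResult` fields and the `rankScanResults` name are hypothetical, and "then params" is read here as "larger inferred parameter count first", which is an assumption.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface ScanResult {
  id: string;
  imageOk: boolean;
  toolLatencyMs: number;
  contextTokens: number;
  params: number; // inferred parameter count
}

function rankScanResults(results: ScanResult[]): ScanResult[] {
  // Copy first so the caller's array is not reordered.
  return [...results].sort(
    (a, b) =>
      Number(b.imageOk) - Number(a.imageOk) || // image ok first
      a.toolLatencyMs - b.toolLatencyMs || // then lower tool latency
      b.contextTokens - a.contextTokens || // then larger context
      b.params - a.params, // then more params (assumption)
  );
}
```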
Interactive selection (TTY)
- Multiselect list with per-model stats:
  - model id, tool ok, image ok, median latency, context, inferred params.
- Pre-select top N (default 6).
- Non-TTY: auto-select; require `--yes`/`--no-input` to apply.
Output
- Writes `agent.model.fallbacks` ordered.
- Writes `agent.imageModel.fallbacks` ordered (image-capable models).
- Ensures `agent.models` entries exist for selected models.
- Optional `--set-default` to set `agent.model.primary`.
- Optional `--set-image` to set `agent.imageModel.primary`.
## Runtime fallback
- On model failure: try `agent.model.fallbacks` in order.
- Per-provider auth failover uses `auth.order` (or stored profile order) **before**
moving to the next model.
- Image routing uses `agent.imageModel` **only when configured** and the primary
model lacks image input.
- Persist last successful provider/model to session entry; auth profile success is global.
- See [`docs/model-failover.md`](/model-failover) for auth profile rotation, cooldowns, and timeout handling.
## Tests
- Unit: scan selection ordering + probe classification.
- CLI: list/aliases/fallbacks add/remove + scan writes config.
- Status: shows last used model + fallbacks.
## Docs
- Update [`docs/configuration.md`](/configuration) with `agent.models` + `agent.model` + `agent.imageModel`.
- Keep this doc current when CLI surface or scan logic changes.

View File

@@ -0,0 +1,121 @@
---
summary: "Multi-agent routing: isolated agents, provider accounts, and bindings"
title: Multi-Agent Routing
read_when: "You want multiple isolated agents (workspaces + auth) in one gateway process."
status: active
---
# Multi-Agent Routing
Goal: multiple *isolated* agents (separate workspace + `agentDir` + sessions), plus multiple provider accounts (e.g. two WhatsApps) in one running Gateway. Inbound is routed to an agent via bindings.
## What is “one agent”?
An **agent** is a fully scoped brain with its own:
- **Workspace** (files, AGENTS.md/SOUL.md/USER.md, local notes, persona rules).
- **State directory** (`agentDir`) for auth profiles, model registry, and per-agent config.
- **Session store** (chat history + routing state) under `~/.clawdbot/agents/<agentId>/sessions`.
The Gateway can host **one agent** (default) or **many agents** side-by-side.
### Single-agent mode (default)
If you do nothing, Clawdbot runs a single agent:
- `agentId` defaults to **`main`**.
- Sessions are keyed as `agent:main:<mainKey>`.
- Workspace defaults to `~/clawd` (or `~/clawd-<profile>` when `CLAWDBOT_PROFILE` is set).
- State defaults to `~/.clawdbot/agents/main/agent`.
## Multiple agents = multiple people, multiple personalities
With **multiple agents**, each `agentId` becomes a **fully isolated persona**:
- **Different phone numbers/accounts** (per provider `accountId`).
- **Different personalities** (per-agent workspace files like `AGENTS.md` and `SOUL.md`).
- **Separate auth + sessions** (no cross-talk unless explicitly enabled).
This lets **multiple people** share one Gateway server while keeping their AI “brains” and data isolated.
## Routing rules (how messages pick an agent)
Bindings are **deterministic** and **most-specific wins**:
1. `peer` match (exact DM/group/channel id)
2. `guildId` (Discord)
3. `teamId` (Slack)
4. `accountId` match for a provider
5. provider-level match (`accountId: "*"`)
6. fallback to `routing.defaultAgentId` (default: `main`)
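As a rough sketch of "most-specific wins", each binding can be given a rank that mirrors the numbered list, and the lowest rank is selected. The types and the `resolveAgent` / `rankBinding` names below are hypothetical, and treating a missing `accountId` like `"*"` is an assumption.

```typescript
// Hypothetical shapes modeled on the config fields named in this doc.
interface Peer { kind: string; id: string }
interface Match {
  provider: string;
  accountId?: string; // "*" (or, by assumption, unset) matches any account
  peer?: Peer;
  guildId?: string;
  teamId?: string;
}
interface Binding { agentId: string; match: Match }
interface Inbound {
  provider: string;
  accountId: string;
  peer?: Peer;
  guildId?: string;
  teamId?: string;
}

// Lower rank = more specific; undefined = binding does not apply.
function rankBinding(m: Match, msg: Inbound): number | undefined {
  if (m.provider !== msg.provider) return undefined;
  const wildcard = m.accountId === undefined || m.accountId === "*";
  if (!wildcard && m.accountId !== msg.accountId) return undefined;
  if (m.peer) {
    return m.peer.kind === msg.peer?.kind && m.peer.id === msg.peer?.id ? 1 : undefined;
  }
  if (m.guildId) return m.guildId === msg.guildId ? 2 : undefined;
  if (m.teamId) return m.teamId === msg.teamId ? 3 : undefined;
  return wildcard ? 5 : 4;
}

function resolveAgent(bindings: Binding[], msg: Inbound, defaultAgentId = "main"): string {
  let best: { agentId: string; rank: number } | undefined;
  for (const b of bindings) {
    const r = rankBinding(b.match, msg);
    if (r !== undefined && (best === undefined || r < best.rank)) {
      best = { agentId: b.agentId, rank: r };
    }
  }
  // Step 6: nothing matched, fall back to the default agent.
  return best?.agentId ?? defaultAgentId;
}
```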
## Multiple accounts / phone numbers
Providers that support **multiple accounts** (e.g. WhatsApp) use `accountId` to identify
each login. Each `accountId` can be routed to a different agent, so one server can host
multiple phone numbers without mixing sessions.
## Concepts
- `agentId`: one “brain” (workspace, per-agent auth, per-agent session store).
- `accountId`: one provider account instance (e.g. WhatsApp account `"personal"` vs `"biz"`).
- `binding`: routes inbound messages to an `agentId` by `(provider, accountId, peer)` and optionally guild/team ids.
- Direct chats collapse to `agent:<agentId>:<mainKey>` (per-agent “main”; `session.mainKey`).
## Example: two WhatsApps → two agents
`~/.clawdbot/clawdbot.json` (JSON5):
```js
{
  routing: {
    defaultAgentId: "home",
    agents: {
      home: {
        workspace: "~/clawd-home",
        agentDir: "~/.clawdbot/agents/home/agent",
      },
      work: {
        workspace: "~/clawd-work",
        agentDir: "~/.clawdbot/agents/work/agent",
      },
    },
    // Deterministic routing: first match wins (most-specific first).
    bindings: [
      { agentId: "home", match: { provider: "whatsapp", accountId: "personal" } },
      { agentId: "work", match: { provider: "whatsapp", accountId: "biz" } },
      // Optional per-peer override (example: send a specific group to work agent).
      {
        agentId: "work",
        match: {
          provider: "whatsapp",
          accountId: "personal",
          peer: { kind: "group", id: "1203630...@g.us" },
        },
      },
    ],
    // Off by default: agent-to-agent messaging must be explicitly enabled + allowlisted.
    agentToAgent: {
      enabled: false,
      allow: ["home", "work"],
    },
  },
  whatsapp: {
    accounts: {
      personal: {
        // Optional override. Default: ~/.clawdbot/credentials/whatsapp/personal
        // authDir: "~/.clawdbot/credentials/whatsapp/personal",
      },
      biz: {
        // Optional override. Default: ~/.clawdbot/credentials/whatsapp/biz
        // authDir: "~/.clawdbot/credentials/whatsapp/biz",
      },
    },
  },
}
```

133
docs/concepts/presence.md Normal file
View File

@@ -0,0 +1,133 @@
---
summary: "How Clawdbot presence entries are produced, merged, and displayed"
read_when:
- Debugging the Instances tab
- Investigating duplicate or stale instance rows
- Changing gateway WS connect or system-event beacons
---
# Presence
Clawdbot “presence” is a lightweight, best-effort view of:
- The **Gateway** itself (one per host), and
- The **clients connected to the Gateway** (mac app, WebChat, CLI, etc.).
Presence is used primarily to render the mac app's **Instances** tab and to provide quick operator visibility.
## The data model
Presence entries are structured objects with (some) fields:
- `instanceId` (optional but strongly recommended): stable client identity used for dedupe
- `host`: a human-readable name (often the machine name)
- `ip`: best-effort IP address (may be missing or stale)
- `version`: client version string
- `deviceFamily` (optional): hardware family like `iPad`, `iPhone`, `Mac`
- `modelIdentifier` (optional): hardware model identifier like `iPad16,6` or `Mac16,6`
- `mode`: e.g. `gateway`, `app`, `webchat`, `cli`
- `lastInputSeconds` (optional): “seconds since last user input” for that client machine
- `reason`: a short marker like `self`, `connect`, `node-connected`, `node-disconnected`, `periodic`, `instances-refresh`
- `text`: legacy/debug summary string (kept for backwards compatibility and UI display)
- `ts`: last update timestamp (ms since epoch)
## Producers (where presence comes from)
Presence entries are produced by multiple sources and then **merged**.
### 1) Gateway self entry
The Gateway seeds a “self” entry at startup so UIs always show at least the current gateway host.
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`initSelfPresence()`).
### 2) WebSocket connect (connection-derived presence)
Every WS client must begin with a `connect` request. On successful handshake, the Gateway upserts a presence entry for that connection.
This is meant to answer: “Which clients are currently connected?”
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (connect handling uses `connect.params.client.instanceId` when provided; otherwise falls back to `connId`).
#### Why one-off CLI commands do not show up
The CLI connects to the Gateway to execute one-off commands (health/status/send/agent/etc.). These are not “nodes” and would spam the Instances list, so the Gateway does not create presence entries for clients with `client.mode === "cli"`.
### 3) `system-event` beacons (client-reported presence)
Clients can publish richer periodic beacons via the `system-event` method. The mac app uses this to report:
- a human-friendly host name
- its best-known IP address
- `lastInputSeconds`
Implementation:
- Gateway: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) handles method `system-event` by calling `updateSystemPresence(...)`.
- mac app beaconing: [`apps/macos/Sources/Clawdbot/PresenceReporter.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/PresenceReporter.swift).
### 4) Node bridge beacons (gateway-owned presence)
When a node bridge connection authenticates, the Gateway emits a presence entry
for that node and starts periodic refresh beacons so it does not expire.
- Connect/disconnect markers: `node-connected`, `node-disconnected`
- Periodic heartbeat: every 3 minutes (`reason: periodic`)
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (node bridge handlers + timer beacons).
## Merge + dedupe rules (why `instanceId` matters)
All producers write into a single in-memory presence map.
Key points:
- Entries are **keyed** by a “presence key”. If two producers use the same key, they update the same entry.
- The best key is a stable, opaque `instanceId` that does not change across restarts.
- Keys are treated case-insensitively.
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`normalizePresenceKey()`).
### mac app identity (stable UUID)
The mac app uses a persisted UUID as `instanceId` so:
- restarts/reconnects do not create duplicates
- renaming the Mac does not create a new “instance”
- debug/release builds can share the same identity
Implementation: [`apps/macos/Sources/Clawdbot/InstanceIdentity.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstanceIdentity.swift).
`displayName` (machine name) is used for UI, while `instanceId` is used for dedupe.
## TTL and bounded size (why stale rows disappear)
Presence entries are not permanent:
- TTL: entries older than 5 minutes are pruned
- Max: map is capped at 200 entries (LRU by `ts`)
Implementation: [`src/infra/system-presence.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/infra/system-presence.ts) (`TTL_MS`, `MAX_ENTRIES`, pruning in `listSystemPresence()`).
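A minimal sketch of the key normalization and pruning described above. The constant and function names `TTL_MS`, `MAX_ENTRIES`, and `normalizePresenceKey` come from this doc; the bodies, the `PresenceEntry` shape, and the `prunePresence` helper are illustrative assumptions, not the actual code in `system-presence.ts`.

```typescript
const TTL_MS = 5 * 60_000; // 5 minutes
const MAX_ENTRIES = 200;

// Reduced, hypothetical entry shape; real entries carry more fields.
interface PresenceEntry { host?: string; ts: number }

// Keys are treated case-insensitively.
function normalizePresenceKey(key: string): string {
  return key.toLowerCase();
}

function prunePresence(map: Map<string, PresenceEntry>, now: number): void {
  // TTL: drop entries whose last update is older than 5 minutes.
  Array.from(map.entries()).forEach(([key, entry]) => {
    if (now - entry.ts > TTL_MS) map.delete(key);
  });
  // Cap: if still over the limit, evict the entries with the oldest ts.
  const overflow = map.size - MAX_ENTRIES;
  if (overflow > 0) {
    Array.from(map.entries())
      .sort((a, b) => a[1].ts - b[1].ts)
      .slice(0, overflow)
      .forEach(([key]) => map.delete(key));
  }
}
```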
## Remote/tunnel caveat (loopback IPs)
When a client connects over an SSH tunnel / local port forward, the Gateway may see the remote address as loopback (`127.0.0.1`).
To avoid degrading an otherwise-correct client beacon IP, the Gateway avoids writing loopback remote addresses into presence entries.
Implementation: [`src/gateway/server.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/server.ts) (`isLoopbackAddress()`).
## Consumers (who reads presence)
### macOS Instances tab
The mac app's Instances tab renders the result of `system-presence`.
Implementation:
- View: [`apps/macos/Sources/Clawdbot/InstancesSettings.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesSettings.swift)
- Store: [`apps/macos/Sources/Clawdbot/InstancesStore.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/Clawdbot/InstancesStore.swift)
The Instances rows show a small presence indicator (Active/Idle/Stale) based on
the last beacon age. The label is derived from the entry timestamp (`ts`).
The store refreshes periodically and also applies `presence` WS events.
## Debugging tips
- To see the raw list, call `system-presence` against the gateway.
- If you see duplicates:
  - confirm clients send a stable `instanceId` in the handshake (`connect.params.client.instanceId`)
  - confirm beaconing uses the same `instanceId`
  - check whether the connection-derived entry is missing `instanceId` (then it will be keyed by `connId` and duplicates are expected on reconnect)

View File

@@ -0,0 +1,25 @@
---
summary: "Routing rules per provider (WhatsApp, Telegram, Discord, web) and shared context"
read_when:
- Changing provider routing or inbox behavior
---
# Providers & Routing
Updated: 2026-01-06
Goal: deterministic replies per provider, while supporting multi-agent + multi-account routing.
- **Provider**: provider label (`whatsapp`, `webchat`, `telegram`, `discord`, `signal`, `imessage`, …). Routing is fixed: replies go back to the origin provider; the model doesn't choose.
- **AccountId**: provider account instance (e.g. WhatsApp account `"default"` vs `"work"`). Not every provider supports multi-account yet.
- **AgentId**: one isolated “brain” (workspace + per-agent agentDir + per-agent session store).
- **Reply context:** inbound replies include `ReplyToId`, `ReplyToBody`, and `ReplyToSender`, and the quoted context is appended to `Body` as a `[Replying to ...]` block.
- **Canonical direct session (per agent):** direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`). Groups/channels stay isolated per agent:
  - group: `agent:<agentId>:<provider>:group:<id>`
  - channel/room: `agent:<agentId>:<provider>:channel:<id>`
- **Session store:** per-agent store lives under `~/.clawdbot/agents/<agentId>/sessions/sessions.json` (override via `session.store` with `{agentId}` templating). JSONL transcripts live next to it.
- **WebChat:** attaches to the selected agent's main session (so desktop reflects cross-provider history for that agent).
- **Implementation hints:**
  - Set `Provider` + `AccountId` in each ingress.
  - Route inbound to an agent via `routing.bindings` (match on `provider`, `accountId`, plus optional peer/guild/team).
  - Keep routing deterministic: originate → same provider. Use the gateway WebSocket for sends; avoid side channels.
  - Do not let the agent emit “send to X” decisions; keep that policy in the host code.

77
docs/concepts/queue.md Normal file
View File

@@ -0,0 +1,77 @@
---
summary: "Command queue design that serializes auto-reply command execution"
read_when:
- Changing auto-reply execution or concurrency
---
# Command Queue (2026-01-03)
We now serialize command-based auto-replies (WhatsApp Web listener) through a tiny in-process queue to prevent multiple commands from running at once, while allowing safe parallelism across sessions.
## Why
- Some auto-reply commands are expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
- Serializing avoids competing for terminal/stdin, keeps logs readable, and reduces the chance of rate limits from upstream tools.
## How it works
- [`src/process/command-queue.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/process/command-queue.ts) holds a lane-aware FIFO queue and drains each lane synchronously.
- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
- Each session run is then queued into a **global lane** (`main` by default) so overall parallelism is capped by `agent.maxConcurrent`.
- When verbose logging is enabled, queued commands emit a short notice if they waited more than ~2s before starting.
- Typing indicators (`onReplyStart`) still fire immediately on enqueue so user experience is unchanged while we wait our turn.
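The per-session serialization can be illustrated with promise chaining: each lane keeps a tail promise, and new work is appended to it. This is a sketch of the general technique under a hypothetical `LaneQueue` name; it does not model the global lane's `agent.maxConcurrent` cap or the verbose wait notice, and it is not the code in `command-queue.ts`.

```typescript
type Task<T> = () => Promise<T>;

// Hypothetical lane-aware FIFO: tasks in one lane run one at a time,
// while different lanes proceed independently.
class LaneQueue {
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(lane: string, task: Task<T>): Promise<T> {
    const tail = this.tails.get(lane) ?? Promise.resolve();
    // Start this task only after the lane's previous task has settled.
    const run = tail.then(() => task());
    // Store a tail that never rejects so one failure does not block the lane.
    this.tails.set(lane, run.catch(() => undefined));
    return run;
  }
}
```

With this shape, two runs enqueued on `session:<key>` execute in order, while a run on a different session key is not held up by them.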
## Queue modes (per provider)
Inbound messages can steer the current run, wait for a followup turn, or do both:
- `steer`: inject immediately into the current run (cancels pending tool calls after the next tool boundary). If not streaming, falls back to followup.
- `followup`: enqueue for the next agent turn after the current run ends.
- `collect`: coalesce all queued messages into a **single** followup turn (default).
- `steer-backlog` (aka `steer+backlog`): steer now **and** preserve the message for a followup turn.
- `interrupt` (legacy): abort the active run for that session, then run the newest message.
- `queue` (legacy alias): same as `steer`.
Steer-backlog means you can get a followup response after the steered run, so
streaming surfaces can look like duplicates. Prefer `collect`/`steer` if you want
one response per inbound message.
Send `/queue collect` as a standalone command (per-session) or set `routing.queue.byProvider.discord: "collect"`.
Defaults (when unset in config):
- All surfaces → `collect`
Configure globally or per provider via `routing.queue`:
```json5
{
  routing: {
    queue: {
      mode: "collect",
      debounceMs: 1000,
      cap: 20,
      drop: "summarize",
      byProvider: { discord: "collect" }
    }
  }
}
```
## Queue options
Options apply to `followup`, `collect`, and `steer-backlog` (and to `steer` when it falls back to followup):
- `debounceMs`: wait for quiet before starting a followup turn (prevents “continue, continue”).
- `cap`: max queued messages per session.
- `drop`: overflow policy (`old`, `new`, `summarize`).
Summarize keeps a short bullet list of dropped messages and injects it as a synthetic followup prompt.
Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.
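One way to picture `cap` and `drop` is the sketch below. The meaning of `old` and `new` is assumed here (`old` evicts the oldest queued message, `new` discards the incoming one); the source does not spell this out. `QueueState` and `enqueueWithCap` are hypothetical names, and the sketch assumes `cap >= 1`.

```typescript
type DropPolicy = "old" | "new" | "summarize";

// Hypothetical per-session queue state.
interface QueueState {
  items: string[];
  droppedSummary: string[]; // bullet lines, only filled by "summarize"
}

function enqueueWithCap(state: QueueState, message: string, cap: number, drop: DropPolicy): void {
  if (state.items.length < cap) {
    state.items.push(message);
    return;
  }
  // Assumption: "new" discards the incoming message.
  if (drop === "new") return;
  // Assumption: "old" and "summarize" both evict the oldest queued message.
  const evicted = state.items.shift();
  if (drop === "summarize" && evicted !== undefined) {
    // Keep a short bullet for the synthetic followup prompt.
    state.droppedSummary.push(`- ${evicted.slice(0, 80)}`);
  }
  state.items.push(message);
}
```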
## Per-session overrides
- Send `/queue <mode>` as a standalone command to store the mode for the current session.
- Options can be combined: `/queue collect debounce:2s cap:25 drop:summarize`
- `/queue default` or `/queue reset` clears the session override.
## Scope and guarantees
- Applies only to config-driven command replies; plain text replies are unaffected.
- Default lane (`main`) is process-wide for inbound + main heartbeats; set `agent.maxConcurrent` to allow multiple sessions in parallel.
- Additional lanes may exist (e.g. `cron`) so background jobs can run in parallel without blocking inbound replies.
- Per-session lanes guarantee that only one agent run touches a given session at a time.
- No external dependencies or background worker threads; pure TypeScript + promises.
## Troubleshooting
- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
- `enqueueCommand` exposes a lightweight `getQueueSize()` helper if you need to surface queue depth in future diagnostics.

View File

@@ -0,0 +1,155 @@
---
summary: "Agent session tools for listing sessions, fetching history, and sending cross-session messages"
read_when:
- Adding or modifying session tools
---
# Session Tools
Goal: small, hard-to-misuse tool set so agents can list sessions, fetch history, and send to another session.
## Tool Names
- `sessions_list`
- `sessions_history`
- `sessions_send`
- `sessions_spawn`
## Key Model
- Main direct chat bucket is always the literal key `"main"`.
- Group chats use `<provider>:group:<id>` or `<provider>:channel:<id>`.
- Cron jobs use `cron:<job.id>`.
- Hooks use `hook:<uuid>` unless explicitly set.
- Node bridge uses `node-<nodeId>` unless explicitly set.
`global` and `unknown` are internal-only and never listed. If `session.scope = "global"`, we alias it to `main` for all tools so callers never see `global`.
## sessions_list
List sessions as an array of rows.
Parameters:
- `kinds?: string[]` filter: any of `"main" | "group" | "cron" | "hook" | "node" | "other"`
- `limit?: number` max rows (default: server default, clamp e.g. 200)
- `activeMinutes?: number` only sessions updated within N minutes
- `messageLimit?: number` 0 = no messages (default 0); >0 = include last N messages
Behavior:
- `messageLimit > 0` fetches `chat.history` per session and includes the last N messages.
- Tool results are filtered out in list output; use `sessions_history` for tool messages.
- When running in a **sandboxed** agent session, session tools default to **spawned-only visibility** (see below).
Row shape (JSON):
- `key`: session key (string)
- `kind`: `main | group | cron | hook | node | other`
- `provider`: `whatsapp | telegram | discord | signal | imessage | webchat | internal | unknown`
- `displayName` (group display label if available)
- `updatedAt` (ms)
- `sessionId`
- `model`, `contextTokens`, `totalTokens`
- `thinkingLevel`, `verboseLevel`, `systemSent`, `abortedLastRun`
- `sendPolicy` (session override if set)
- `lastProvider`, `lastTo`
- `transcriptPath` (best-effort path derived from store dir + sessionId)
- `messages?` (only when `messageLimit > 0`)
## sessions_history
Fetch transcript for one session.
Parameters:
- `sessionKey` (required)
- `limit?: number` max messages (server clamps)
- `includeTools?: boolean` (default false)
Behavior:
- `includeTools=false` filters `role: "toolResult"` messages.
- Returns messages array in the raw transcript format.
## sessions_send
Send a message into another session.
Parameters:
- `sessionKey` (required)
- `message` (required)
- `timeoutSeconds?: number` (default >0; 0 = fire-and-forget)
Behavior:
- `timeoutSeconds = 0`: enqueue and return `{ runId, status: "accepted" }`.
- `timeoutSeconds > 0`: wait up to N seconds for completion, then return `{ runId, status: "ok", reply }`.
- If wait times out: `{ runId, status: "timeout", error }`. Run continues; call `sessions_history` later.
- If the run fails: `{ runId, status: "error", error }`.
- Waits via gateway `agent.wait` (server-side) so reconnects don't drop the wait.
- Agent-to-agent message context is injected for the primary run.
- After the primary run completes, Clawdbot runs a **reply-back loop**:
  - Round 2+ alternates between requester and target agents.
  - Reply exactly `REPLY_SKIP` to stop the ping-pong.
  - Max turns is `session.agentToAgent.maxPingPongTurns` (0-5, default 5).
- Once the loop ends, Clawdbot runs the **agent-to-agent announce step** (target agent only):
  - Reply exactly `ANNOUNCE_SKIP` to stay silent.
  - Any other reply is sent to the target provider.
  - Announce step includes the original request + round-1 reply + latest ping-pong reply.
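The four result shapes listed above map naturally onto a discriminated union keyed on `status`. The type and function names are hypothetical, and typing `reply` and `error` as strings is an assumption.

```typescript
// Hypothetical typing of the sessions_send results described in this section.
type SessionsSendResult =
  | { runId: string; status: "accepted" }
  | { runId: string; status: "ok"; reply: string }
  | { runId: string; status: "timeout"; error: string }
  | { runId: string; status: "error"; error: string };

function describeSendResult(r: SessionsSendResult): string {
  switch (r.status) {
    case "accepted":
      return `run ${r.runId} queued (fire-and-forget)`;
    case "ok":
      return r.reply;
    case "timeout":
      // The run continues; the transcript can be fetched later.
      return `run ${r.runId} still running; check sessions_history later`;
    case "error":
      return `run ${r.runId} failed: ${r.error}`;
  }
}
```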
## Provider Field
- For groups, `provider` is the provider recorded on the session entry.
- For direct chats, `provider` maps from `lastProvider`.
- For cron/hook/node, `provider` is `internal`.
- If missing, `provider` is `unknown`.
## Security / Send Policy
Policy-based blocking by provider/chat type (not per session id).
```json
{
  "session": {
    "sendPolicy": {
      "rules": [
        {
          "match": { "provider": "discord", "chatType": "group" },
          "action": "deny"
        }
      ],
      "default": "allow"
    }
  }
}
```
Runtime override (per session entry):
- `sendPolicy: "allow" | "deny"` (unset = inherit config)
- Settable via `sessions.patch` or owner-only `/send on|off|inherit` (standalone message).
Enforcement points:
- `chat.send` / `agent` (gateway)
- auto-reply delivery logic
## sessions_spawn
Spawn a sub-agent run in an isolated session and announce the result back to the requester chat provider.
Parameters:
- `task` (required)
- `label?` (optional; used for logs/UI)
- `model?` (optional; overrides the sub-agent model; invalid values error)
- `timeoutSeconds?` (default 0; 0 = fire-and-forget)
- `cleanup?` (`delete|keep`, default `delete`)
Behavior:
- Starts a new `subagent:<uuid>` session with `deliver: false`.
- Sub-agents default to the full tool set **minus session tools** (configurable via `agent.subagents.tools`).
- Sub-agents are not allowed to call `sessions_spawn` (no sub-agent → sub-agent spawning).
- After completion (or best-effort wait), Clawdbot runs a sub-agent **announce step** and posts the result to the requester chat provider.
- Reply exactly `ANNOUNCE_SKIP` during the announce step to stay silent.
## Sandbox Session Visibility
Sandboxed sessions can use session tools, but by default they only see sessions they spawned via `sessions_spawn`.
Config:
```json5
{
  agent: {
    sandbox: {
      // default: "spawned"
      sessionToolsVisibility: "spawned" // or "all"
    }
  }
}
```

87
docs/concepts/session.md Normal file
View File

@@ -0,0 +1,87 @@
---
summary: "Session management rules, keys, and persistence for chats"
read_when:
- Modifying session handling or storage
---
# Session Management
Clawdbot treats **one direct-chat session per agent** as primary. Direct chats collapse to `agent:<agentId>:<mainKey>` (default `main`), while group/channel chats get their own keys. `session.mainKey` is honored.
## Gateway is the source of truth
All session state is **owned by the gateway** (the “master” Clawdbot). UI clients (macOS app, WebChat, etc.) must query the gateway for session lists and token counts instead of reading local files.
- In **remote mode**, the session store you care about lives on the remote gateway host, not your Mac.
- Token counts shown in UIs come from the gateway's store fields (`inputTokens`, `outputTokens`, `totalTokens`, `contextTokens`). Clients do not parse JSONL transcripts to “fix up” totals.
## Where state lives
- On the **gateway host**:
  - Store file: `~/.clawdbot/agents/<agentId>/sessions/sessions.json` (per agent).
  - Transcripts: `~/.clawdbot/agents/<agentId>/sessions/<SessionId>.jsonl` (one file per session id).
- The store is a map `sessionKey -> { sessionId, updatedAt, ... }`. Deleting entries is safe; they are recreated on demand.
- Group entries may include `displayName`, `provider`, `subject`, `room`, and `space` to label sessions in UIs.
- Clawdbot does **not** read legacy Pi/Tau session folders.
## Mapping transports → session keys
- Direct chats collapse to the per-agent primary key: `agent:<agentId>:<mainKey>`.
- Multiple phone numbers and providers can map to the same agent main key; they act as transports into one conversation.
- Group chats isolate state: `agent:<agentId>:<provider>:group:<id>` (rooms/channels use `agent:<agentId>:<provider>:channel:<id>`).
- Legacy `group:<id>` keys are still recognized for migration.
- Other sources:
  - Cron jobs: `cron:<job.id>`
  - Webhooks: `hook:<uuid>` (unless explicitly set by the hook)
  - Node bridge runs: `node-<nodeId>`
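The direct/group/channel key formats above can be summarized in a small builder. `sessionKey` and `ChatKind` are hypothetical names used only for illustration; cron, hook, and node keys are left out.

```typescript
type ChatKind = "direct" | "group" | "channel";

function sessionKey(
  agentId: string,
  provider: string,
  kind: ChatKind,
  id: string,
  mainKey = "main", // corresponds to session.mainKey
): string {
  // Direct chats collapse to the per-agent primary key, whatever the provider.
  if (kind === "direct") return `agent:${agentId}:${mainKey}`;
  // Groups and channels keep isolated, provider-scoped keys.
  return `agent:${agentId}:${provider}:${kind}:${id}`;
}
```

For example, direct messages from WhatsApp and Telegram to the same agent both resolve to `agent:<agentId>:main`, while a Discord channel gets its own key.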
## Lifecycle
- Idle expiry: `session.idleMinutes` (default 60). After the timeout a new `sessionId` is minted on the next message.
- Reset triggers: exact `/new` or `/reset` (plus any extras in `resetTriggers`) start a fresh session id and pass the remainder of the message through. If `/new` or `/reset` is sent alone, Clawdbot runs a short “hello” greeting turn to confirm the reset.
- Manual reset: delete specific keys from the store or remove the JSONL transcript; the next message recreates them.
## Send policy (optional)
Block delivery for specific session types without listing individual ids.
```json5
{
  session: {
    sendPolicy: {
      rules: [
        { action: "deny", match: { provider: "discord", chatType: "group" } },
        { action: "deny", match: { keyPrefix: "cron:" } }
      ],
      default: "allow"
    }
  }
}
```
Runtime override (owner only):
- `/send on` → allow for this session
- `/send off` → deny for this session
- `/send inherit` → clear override and use config rules
Send these as standalone messages so they register.
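A sketch of how the rules and the runtime override might combine. It assumes the per-session override takes precedence, that rules are evaluated in order with the first match winning, and that every field present in `match` must agree; none of this is confirmed by the source, and all type and function names are hypothetical.

```typescript
type SendAction = "allow" | "deny";

interface SendRule {
  action: SendAction;
  match: { provider?: string; chatType?: string; keyPrefix?: string };
}
interface SendPolicy { rules: SendRule[]; default: SendAction }
interface SessionInfo {
  key: string;
  provider: string;
  chatType: string;
  sendPolicy?: SendAction; // runtime override; unset = inherit config
}

function resolveSendPolicy(policy: SendPolicy, s: SessionInfo): SendAction {
  // Assumption: a per-session override wins over config rules.
  if (s.sendPolicy) return s.sendPolicy;
  for (const rule of policy.rules) {
    const m = rule.match;
    if (m.provider !== undefined && m.provider !== s.provider) continue;
    if (m.chatType !== undefined && m.chatType !== s.chatType) continue;
    if (m.keyPrefix !== undefined && s.key.indexOf(m.keyPrefix) !== 0) continue;
    // Assumption: first matching rule decides.
    return rule.action;
  }
  return policy.default;
}
```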
## Configuration (optional rename example)
```json5
// ~/.clawdbot/clawdbot.json
{
  session: {
    scope: "per-sender", // keep group keys separate
    idleMinutes: 120,
    resetTriggers: ["/new", "/reset"],
    store: "~/.clawdbot/agents/{agentId}/sessions/sessions.json",
    mainKey: "main",
  }
}
```
## Inspecting
- `pnpm clawdbot status` — shows store path and recent sessions.
- `pnpm clawdbot sessions --json` — dumps every entry (filter with `--active <minutes>`).
- `clawdbot gateway call sessions.list --params '{}'` — fetch sessions from the running gateway (use `--url`/`--token` for remote gateway access).
- Send `/status` as a standalone message in chat to see whether the agent is reachable, how much of the session context is used, current thinking/verbose toggles, and when your WhatsApp web creds were last refreshed (helps spot relink needs).
- Send `/stop` as a standalone message to abort the current run.
- Send `/compact` (optional instructions) as a standalone message to summarize older context and free up window space.
- JSONL transcripts can be opened directly to review full turns.
## Tips
- Keep the primary key dedicated to 1:1 traffic; let groups keep their own keys.
- When automating cleanup, delete individual keys instead of the whole store to preserve context elsewhere.


@@ -0,0 +1,8 @@
---
summary: "Alias for session management docs"
read_when:
- You looked for docs/sessions.md; canonical doc lives in docs/session.md
---
# Sessions
Canonical session management docs live in [`docs/session.md`](/session).

40
docs/concepts/timezone.md Normal file

@@ -0,0 +1,40 @@
---
summary: "Timezone handling for agents, envelopes, and prompts"
read_when:
- You need to understand how timestamps are normalized for the model
- Configuring the user timezone for system prompts
---
# Timezones
Clawdbot standardizes timestamps so the model sees a **single reference time**.
## Message envelopes (UTC)
Inbound messages are wrapped in an envelope like:
```
[Provider ... 2026-01-05T21:26Z] message text
```
The timestamp in the envelope is **always UTC**, with minute precision.
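
As a rough sketch, a timestamp in that shape can be derived from a `Date` as follows. `envelopeTimestamp` is a hypothetical helper name, and truncating (rather than rounding) the seconds is an assumption.

```typescript
// Formats a Date as UTC with minute precision, e.g. 2026-01-05T21:26Z.
function envelopeTimestamp(date: Date): string {
  // toISOString() always returns UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ,
  // so the first 16 characters cover the date through the minutes.
  return date.toISOString().slice(0, 16) + "Z";
}
```

Because `toISOString()` ignores the host timezone, the output is the same wherever the gateway runs.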
## Tool payloads (raw provider data)
Tool calls (`discord.readMessages`, `slack.readMessages`, etc.) return **raw provider timestamps**.
These are typically UTC ISO strings (Discord) or UTC epoch strings (Slack). We do not rewrite them.
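
Since the payloads are passed through untouched, any code that consumes them has to handle both shapes. The helper below is a hedged sketch of that; `parseProviderTimestamp` is a hypothetical name, and the assumption that epoch strings are in seconds with an optional fractional part is based on the Slack format described above.

```typescript
// Converts a raw provider timestamp into a Date.
// Assumes two shapes: an epoch-seconds decimal string (Slack-style,
// e.g. "1700000000.000100") or an ISO 8601 string (Discord-style).
function parseProviderTimestamp(raw: string): Date {
  if (/^\d+(\.\d+)?$/.test(raw)) {
    // Epoch seconds, possibly fractional: convert to whole milliseconds.
    return new Date(Math.round(Number(raw) * 1000));
  }
  const parsed = new Date(raw);
  if (Number.isNaN(parsed.getTime())) {
    throw new Error(`Unrecognized timestamp: ${raw}`);
  }
  return parsed;
}
```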
## User timezone for the system prompt
Set `agent.userTimezone` to tell the model the user's local time zone. If it is
unset, Clawdbot resolves the **host timezone at runtime** (no config write).
```json5
{
agent: { userTimezone: "America/Chicago" }
}
```
The system prompt includes:
- `User timezone: America/Chicago`
- `Current user time: 2026-01-05 15:26`
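
The fallback and the local-time line can be approximated with the standard `Intl` API. This is a sketch under stated assumptions: `resolveUserTimezone` and `formatUserTime` are hypothetical names, and the output format simply mirrors the example above.

```typescript
// Uses the configured zone if present, otherwise the host's zone at runtime.
function resolveUserTimezone(configured?: string): string {
  return configured ?? Intl.DateTimeFormat().resolvedOptions().timeZone;
}

// Formats `now` as "YYYY-MM-DD HH:mm" in the given IANA time zone.
function formatUserTime(now: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hourCycle: "h23", // 24-hour clock, with midnight rendered as 00
  }).formatToParts(now);
  // Pick parts by type so the locale's field order does not matter.
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")}`;
}
```

For instance, `formatUserTime(new Date("2026-01-05T21:26:00Z"), "America/Chicago")` yields `2026-01-05 15:26`, which lines up with the UTC envelope timestamp shown earlier.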

42
docs/concepts/typebox.md Normal file

@@ -0,0 +1,42 @@
---
summary: "TypeBox schemas as the single source of truth for the gateway protocol"
read_when:
- Updating protocol schemas or codegen
---
# TypeBox as Protocol Source of Truth
Last updated: 2025-12-09
We use TypeBox schemas in [`src/gateway/protocol/schema.ts`](https://github.com/clawdbot/clawdbot/blob/main/src/gateway/protocol/schema.ts) as the single source of truth for the Gateway control plane (connect/req/res/event frames and payloads). All derived artifacts should be generated from these schemas, not edited by hand.
## Current pipeline
- **TypeBox → JSON Schema**: `pnpm protocol:gen` writes [`dist/protocol.schema.json`](https://github.com/clawdbot/clawdbot/blob/main/dist/protocol.schema.json) (draft-07) and runs AJV in the server tests.
- **TypeBox → Swift**: `pnpm protocol:gen:swift` generates [`apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift`](https://github.com/clawdbot/clawdbot/blob/main/apps/macos/Sources/ClawdbotProtocol/GatewayModels.swift).
## Problem
- We want strong typing in Swift, including a sealed `GatewayFrame` enum with a discriminator and a forward-compatible `unknown` case.
## Preferred plan (next step)
- Add a small, custom Swift generator driven directly by the TypeBox schemas:
- Emit a sealed `enum GatewayFrame: Codable { case req(RequestFrame), res(ResponseFrame), event(EventFrame) }`.
- Emit strongly typed payload structs/enums (`ConnectParams`, `HelloOk`, `RequestFrame`, `ResponseFrame`, `EventFrame`, `PresenceEntry`, `Snapshot`, `StateVersion`, `ErrorShape`, `AgentEvent`, `TickEvent`, `ShutdownEvent`, `SendParams`, `AgentParams`, `ErrorCode`, `PROTOCOL_VERSION`).
- Custom `init(from:)` / `encode(to:)` enforces the `type` discriminator and can include an `unknown` case for forward compatibility.
- Wire a new script (e.g., `pnpm protocol:gen:swift`) into `protocol:check` so CI fails if the generated Swift is stale.
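
To make the discriminator idea concrete, here is the same decoding logic sketched in TypeScript rather than Swift. It is only an illustration of the pattern the generated `init(from:)` would follow: the `DecodedFrame` type and `decodeFrame` are hypothetical, and the wire values `"req"`, `"res"`, and `"event"` are assumed from the enum case names.

```typescript
type KnownFrameType = "req" | "res" | "event";

// A decoded frame is either one of the known kinds or an `unknown` fallback
// that preserves the original discriminator for forward compatibility.
type DecodedFrame =
  | { kind: KnownFrameType; raw: Record<string, unknown> }
  | { kind: "unknown"; type: string; raw: Record<string, unknown> };

function decodeFrame(json: string): DecodedFrame {
  const value: unknown = JSON.parse(json);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("frame must be a JSON object");
  }
  const raw = value as Record<string, unknown>;
  const type = raw.type;
  if (typeof type !== "string") {
    // Enforce the discriminator: a frame without `type` is malformed.
    throw new Error("frame is missing a string `type` discriminator");
  }
  if (type === "req" || type === "res" || type === "event") {
    return { kind: type, raw };
  }
  // Newer servers may send frame types this client does not know yet.
  return { kind: "unknown", type, raw };
}
```

The key design point is that a missing discriminator is an error, while an unrecognized one is not, so older clients keep working when new frame types are added.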
Why this path:
- Single source of truth stays TypeBox; no new IDL to maintain.
- Predictable, strongly typed Swift (no optional soup).
- Small deterministic codegen (~150-200 LOC script) we control.
## Alternative (if we want off-the-shelf codegen)
- Wrap the existing JSON Schema into an OpenAPI 3.1 doc (auto-generated) and use **swift-openapi-generator** or **openapi-generator swift5**. More moving parts, but also yields enums with discriminator support. Keep this as a fallback if we don't want a custom emitter.
## Action items
- Implement `protocol:gen:swift` that reads the TypeBox schemas and emits the sealed Swift enum + payload structs.
- Update `protocol:check` to include the Swift generator output in the diff check.
- Remove quicktype output once the custom generator is in place (or keep it for docs only).