chore: make pi-only rpc with fixed sessions

This commit is contained in:
Peter Steinberger
2025-12-05 17:50:02 +00:00
parent b3e50cbb33
commit fcf0c28132
33 changed files with 217 additions and 1565 deletions

View File

@@ -1,78 +0,0 @@
# Agent Abstraction Refactor Plan
Goal: support multiple agent CLIs (Claude, Codex, Pi, Opencode, Gemini) cleanly, without legacy flags, and make parsing/injection per-agent. Keep WhatsApp/Twilio plumbing intact.
## Overview
- Introduce a pluggable agent layer (`src/agents/*`), selected by config.
- Normalize config (`agent` block) and remove `claudeOutputFormat` legacy knobs.
- Provide per-agent argv builders and output parsers (including NDJSON streams).
- Preserve MEDIA-token handling and shared queue/heartbeat behavior.
## Configuration
- New shape (no backward compat):
```json5
inbound: {
reply: {
mode: "command",
agent: {
kind: "claude" | "opencode" | "pi" | "codex" | "gemini",
format?: "text" | "json",
identityPrefix?: string
},
command: ["claude", "{{Body}}"],
cwd?: string,
session?: { ... },
timeoutSeconds?: number,
bodyPrefix?: string,
mediaUrl?: string,
mediaMaxMb?: number,
typingIntervalSeconds?: number,
heartbeatMinutes?: number
}
}
```
- Validation moves to `config.ts` (new `AgentKind`/`AgentConfig` types).
- If `agent` is missing → config error.
## Agent modules
- `src/agents/types.ts` `AgentKind`, `AgentSpec`:
- `buildArgs(argv: string[], body: string, ctx: { sessionId?, isNewSession?, sendSystemOnce?, systemSent?, identityPrefix? }): string[]`
- `parse(stdout: string): { text?: string; mediaUrls?: string[]; meta?: AgentMeta }`
- `src/agents/claude.ts` current flag injection (`--output-format`, `-p`), identity prepend.
- `src/agents/opencode.ts` reuse `parseOpencodeJson` (from PR #5), inject `--format json`, session flag `--session` defaults, identity prefix.
- `src/agents/pi.ts` parse NDJSON `AssistantMessageEvent` (final `message_end.message.content[text]`), inject `--mode json`/`-p` defaults, session flags.
- `src/agents/codex.ts` parse Codex JSONL (last `item` with `type:"agent_message"`; usage from `turn.completed`), inject `codex exec --json --skip-git-repo-check`, sandbox default read-only.
- `src/agents/gemini.ts` minimal parsing (plain text), identity prepend, honors `--output-format` when `format` is set, and defaults to `--resume {{SessionId}}` for session resume (new sessions need no flag). Override `sessionArgNew/sessionArgResume` if you use a different session strategy.
- Shared MEDIA extraction stays in `media/parse.ts`.
## Command runner changes
- `runCommandReply`:
- Resolve agent spec from config.
- Apply `buildArgs` (handles identity prepend and session args per agent).
- Run command; send stdout to `spec.parse` → `text`, `mediaUrls`, `meta` (stored as `agentMeta`).
- Remove `claudeMeta` naming; tests updated to `agentMeta`.
## Sessions
- Session arg defaults become agent-specific (Claude: `--resume/--session-id`; Opencode/Pi/Codex: `--session`).
- Still overridable via `sessionArgNew/sessionArgResume` in config.
## Tests
- Update existing tests to new config (no `claudeOutputFormat`).
- Add fixtures:
- Opencode NDJSON sample (from PR #5) → parsed text + meta.
- Codex NDJSON sample (captured: thread/turn/item/usage) → parsed text.
- Pi NDJSON sample (AssistantMessageEvent) → parsed text.
- Ensure MEDIA token parsing works on agent text output.
## Docs
- README: rename “Claude-aware” → “Multi-agent (Claude, Codex, Pi, Opencode)”.
- New short guide per agent (Opencode doc from PR #5; add Codex/Pi snippets).
- Mention identityPrefix override and session arg differences.
## Migration
- Breaking change: configs must specify `agent`. Remove old `claudeOutputFormat` keys.
- Provide migration note in CHANGELOG 1.3.x.
## Out of scope
- No media binary support; still relies on MEDIA tokens in text.
- No UI changes; WhatsApp/Twilio plumbing unchanged.

View File

@@ -1,12 +1,10 @@
# Agent Integration 🤖
CLAWDIS can work with any AI agent that accepts prompts via CLI. Here's how to set them up.
CLAWDIS now ships with a single coding agent: Pi (the Tau CLI). Legacy Claude/Codex/Gemini/Opencode paths have been removed.
## Supported Agents
## Pi / Tau
### Tau / Pi
The recommended agent for CLAWDIS. Built by Mario Zechner, forked with love.
The recommended (and only) agent for CLAWDIS. Built by Mario Zechner, forked with love.
```json
{
@@ -41,40 +39,11 @@ For streaming tool output and better integration:
}
```
RPC mode gives you:
RPC mode is enforced by CLAWDIS (we rewrite `--mode` to `rpc` for Pi invocations). It gives you:
- 💻 Real-time tool execution display
- 📊 Token usage tracking
- 🔄 Streaming responses
### Claude Code
```json
{
"command": [
"claude",
"-p",
"{{BodyStripped}}"
]
}
```
### Custom Agents
Any CLI that:
1. Accepts a prompt as an argument
2. Outputs text to stdout
3. Exits when done
```json
{
"command": [
"/path/to/my-agent",
"--prompt", "{{Body}}",
"--format", "text"
]
}
```
## Session Management
### Per-Sender Sessions
@@ -90,6 +59,7 @@ Each phone number gets its own conversation history:
}
}
```
By default CLAWDIS stores sessions under `~/.clawdis/sessions` and will create the folder automatically.
### Global Session

View File

@@ -5,7 +5,7 @@
1) Download inbound audio (Web or Twilio) to a temp path if only a URL is present.
2) Run the configured CLI (templated with `{{MediaPath}}`), expecting transcript on stdout.
3) Replace `Body` with the transcript, set `{{Transcript}}`, and prepend the original media path plus a `Transcript:` section in the command prompt so models see both.
4) Continue through the normal auto-reply pipeline (templating, sessions, Claude/command).
4) Continue through the normal auto-reply pipeline (templating, sessions, Pi command).
- **Verbose logging**: In `--verbose`, we log when transcription runs and when the transcript replaces the body.
## Config example (OpenAI Whisper CLI)
@@ -29,7 +29,8 @@ Requires `OPENAI_API_KEY` in env and `openai` CLI installed:
},
reply: {
mode: "command",
command: ["claude", "{{Body}}"]
command: ["pi", "{{Body}}"],
agent: { kind: "pi" }
}
}
}

View File

@@ -1,6 +1,8 @@
# Building Your Own AI Personal Assistant with warelay
> **TL;DR:** warelay lets you turn Claude into a proactive personal assistant that lives in your pocket via WhatsApp. It can check in on you, remember context across conversations, run commands on your Mac, and even wake you up with music. This doc shows you how.
> **TL;DR:** CLAWDIS (Pi/Tau only) lets you run a proactive assistant over WhatsApp. It can check in on you, remember context across conversations, run commands on your Mac, and even wake you up with music. This doc was originally written for Claude Code; where you see `claude ...`, use `pi --mode rpc ...` instead. A Pi-specific rewrite is coming soon.
⚠️ **Note (2025-12-05):** CLAWDIS now ships with only the Pi/Tau agent. The walkthrough below references Claude Code; swap those commands for `pi`/`tau` if you follow along. A Pi-specific guide is coming soon.
---

View File

@@ -7,7 +7,7 @@ Goal: let Clawd sit in WhatsApp groups, wake up only when pinged, and keep that
- Group allowlist bypass: we still enforce `allowFrom` on the participant at inbox ingest, but group JIDs themselves no longer block replies.
- Per-group sessions: session keys look like `group:<jid>` so commands such as `/verbose on` or `/think:high` are scoped to that group; personal DM state is untouched. Heartbeats are skipped for group threads.
- Context injection: last N (default 50) group messages are prefixed under `[Chat messages since your last reply - for context]`, with the triggering line under `[Current message - respond to this]`.
- Sender surfacing: every group batch now ends with `[from: Sender Name (+E164)]` so Tau/Claude know who is speaking.
- Sender surfacing: every group batch now ends with `[from: Sender Name (+E164)]` so Tau/Pi know who is speaking.
- Ephemeral/view-once: we unwrap those before extracting text/mentions, so pings inside them still trigger.
- New session primer: on the first turn of a group session we now prepend a short blurb to the model like `You are replying inside the WhatsApp group "<subject>". Group members: +44..., +43..., … Address the specific sender noted in the message context.` If metadata isnt available we still tell the agent its a group chat.

View File

@@ -1,9 +1,9 @@
# Heartbeat polling plan (2025-11-26)
Goal: add a simple heartbeat poll for command-based auto-replies (Claude-driven) that only notifies users when something matters, using the `HEARTBEAT_OK` sentinel. The heartbeat body we send is `HEARTBEAT /think:high` so the model can easily spot it.
Goal: add a simple heartbeat poll for command-based auto-replies (Pi/Tau) that only notifies users when something matters, using the `HEARTBEAT_OK` sentinel. The heartbeat body we send is `HEARTBEAT /think:high` so the model can easily spot it.
## Prompt contract
- Extend the Claude system/identity text to explain: “If this is a heartbeat poll and nothing needs attention, reply exactly `HEARTBEAT_OK` and nothing else. For any alert, do **not** include `HEARTBEAT_OK`; just return the alert text.” Heartbeat prompt body is `HEARTBEAT /think:high`.
- Extend the Pi/Tau system/identity text to explain: “If this is a heartbeat poll and nothing needs attention, reply exactly `HEARTBEAT_OK` and nothing else. For any alert, do **not** include `HEARTBEAT_OK`; just return the alert text.” Heartbeat prompt body is `HEARTBEAT /think:high`.
- Keep existing WhatsApp length guidance; forbid burying the sentinel inside alerts.
## Config & defaults
@@ -13,8 +13,8 @@ Goal: add a simple heartbeat poll for command-based auto-replies (Claude-driven)
## Poller behavior
- When relay runs with command-mode auto-reply, start a timer with the resolved heartbeat interval.
- Each tick invokes the configured command with a short heartbeat body (e.g., “(heartbeat) summarize any important changes since last turn”) while reusing the active session args so Claude context stays warm.
- Heartbeats never create a new session implicitly: if theres no stored session for the target (fallback path), the heartbeat is skipped instead of starting a fresh Claude session.
- Each tick invokes the configured command with a short heartbeat body (e.g., “(heartbeat) summarize any important changes since last turn”) while reusing the active session args so Pi context stays warm.
- Heartbeats never create a new session implicitly: if theres no stored session for the target (fallback path), the heartbeat is skipped instead of starting a fresh Pi session.
- Abort timer on SIGINT/abort of the relay.
## Sentinel handling

View File

@@ -56,7 +56,7 @@ This document defines how `warelay` should handle sending and replying with imag
- Web inbox:
- If `mediaUrl` present, fetch/resolve same as send (local path or URL), send via Baileys with caption.
## Inbound Media to Commands (Claude etc.)
## Inbound Media to Commands (Pi/Tau)
- For completeness: when inbound Twilio/Web messages include media, download to temp file, expose templating variables:
- `{{MediaUrl}}` original URL (Twilio) or pseudo-URL (web).
- `{{MediaPath}}` local temp path written before running the command.

View File

@@ -18,7 +18,7 @@ CLAWDIS (née Warelay) bridges WhatsApp to AI coding agents like [Tau/Pi](https:
## Features
- 📱 **WhatsApp Integration** — Uses Baileys for WhatsApp Web protocol
- 🤖 **AI Agent Gateway**Spawns coding agents (Tau, Claude, etc.) per message
- 🤖 **AI Agent Gateway**Pi/Tau only (Pi CLI in RPC mode)
- 💬 **Session Management** — Maintains conversation context across messages
- 🔔 **Heartbeats** — Periodic check-ins so your AI doesn't feel lonely
- 👥 **Group Chat Support** — Mention-based triggering in group chats
@@ -26,6 +26,8 @@ CLAWDIS (née Warelay) bridges WhatsApp to AI coding agents like [Tau/Pi](https:
- 🎤 **Voice Messages** — Transcription via Whisper
- 🔧 **Tool Streaming** — Real-time display of AI tool usage (💻📄✍️📝)
Note: support for Claude, Codex, Gemini, and Opencode has been removed; Pi/Tau is now the only coding agent path.
## The Name
**CLAWDIS** = CLAW + TARDIS

View File

@@ -21,8 +21,7 @@
- Confirmation reply is sent (`Thinking level set to high.` / `Thinking disabled.`). If the level is invalid (e.g. `/thinking big`), the command is rejected with a hint and the session state is left unchanged.
## Application by agent
- **Pi/Tau**: injects `--thinking <level>` (skipped for `off`).
- **Claude & other text agents**: appends the cue word to the prompt text as above.
- **Pi/Tau**: injects `--thinking <level>` (skipped for `off`). Other agent paths have been removed.
## Verbose directives (/verbose or /v)
- Levels: `on|full` or `off` (default).

View File

@@ -8,7 +8,7 @@
## Commands
- `warelay relay:tmux` — restarts the `warelay-relay` session running `pnpm warelay relay --verbose`, then attaches (skips attach when stdout isnt a TTY).
- `warelay relay:tmux:attach` — attach to the existing session without restarting it.
- `warelay relay:heartbeat:tmux` — same as `relay:tmux` but adds `--heartbeat-now` so Claude is pinged immediately on startup.
- `warelay relay:heartbeat:tmux` — same as `relay:tmux` but adds `--heartbeat-now` so Pi is pinged immediately on startup.
All helpers use the fixed session name `warelay-relay`.