Files
clawdbot/docs/agent.md
2025-12-02 20:09:51 +00:00

79 lines
3.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Agent Abstraction Refactor Plan
Goal: support multiple agent CLIs (Claude, Codex, Pi, Opencode, Gemini) cleanly, without legacy flags, and make parsing/injection per-agent. Keep WhatsApp/Twilio plumbing intact.
## Overview
- Introduce a pluggable agent layer (`src/agents/*`), selected by config.
- Normalize config (`agent` block) and remove `claudeOutputFormat` legacy knobs.
- Provide per-agent argv builders and output parsers (including NDJSON streams).
- Preserve MEDIA-token handling and shared queue/heartbeat behavior.
## Configuration
- New shape (no backward compat):
```json5
inbound: {
reply: {
mode: "command",
agent: {
kind: "claude" | "opencode" | "pi" | "codex" | "gemini",
format?: "text" | "json",
identityPrefix?: string
},
command: ["claude", "{{Body}}"],
cwd?: string,
session?: { ... },
timeoutSeconds?: number,
bodyPrefix?: string,
mediaUrl?: string,
mediaMaxMb?: number,
typingIntervalSeconds?: number,
heartbeatMinutes?: number
}
}
```
- Validation moves to `config.ts` (new `AgentKind`/`AgentConfig` types).
- If `agent` is missing → config error.
## Agent modules
- `src/agents/types.ts` `AgentKind`, `AgentSpec`:
- `buildArgs(argv: string[], body: string, ctx: { sessionId?, isNewSession?, sendSystemOnce?, systemSent?, identityPrefix? }): string[]`
- `parse(stdout: string): { text?: string; mediaUrls?: string[]; meta?: AgentMeta }`
- `src/agents/claude.ts` current flag injection (`--output-format`, `-p`), identity prepend.
- `src/agents/opencode.ts` reuse `parseOpencodeJson` (from PR #5), inject `--format json`, session flag `--session` defaults, identity prefix.
- `src/agents/pi.ts` parse NDJSON `AssistantMessageEvent` (final `message_end.message.content[text]`), inject `--mode json`/`-p` defaults, session flags.
- `src/agents/codex.ts` parse Codex JSONL (last `item` with `type:"agent_message"`; usage from `turn.completed`), inject `codex exec --json --skip-git-repo-check`, sandbox default read-only.
- `src/agents/gemini.ts` minimal parsing (plain text), identity prepend, honors `--output-format` when `format` is set, and defaults to `--resume {{SessionId}}` for session resume (new sessions need no flag). Override `sessionArgNew/sessionArgResume` if you use a different session strategy.
- Shared MEDIA extraction stays in `media/parse.ts`.
## Command runner changes
- `runCommandReply`:
- Resolve agent spec from config.
- Apply `buildArgs` (handles identity prepend and session args per agent).
- Run command; send stdout to `spec.parse` → `text`, `mediaUrls`, `meta` (stored as `agentMeta`).
- Remove `claudeMeta` naming; tests updated to `agentMeta`.
## Sessions
- Session arg defaults become agent-specific (Claude: `--resume/--session-id`; Opencode/Pi/Codex: `--session`).
- Still overridable via `sessionArgNew/sessionArgResume` in config.
## Tests
- Update existing tests to new config (no `claudeOutputFormat`).
- Add fixtures:
- Opencode NDJSON sample (from PR #5) → parsed text + meta.
- Codex NDJSON sample (captured: thread/turn/item/usage) → parsed text.
- Pi NDJSON sample (AssistantMessageEvent) → parsed text.
- Ensure MEDIA token parsing works on agent text output.
## Docs
- README: rename “Claude-aware” → “Multi-agent (Claude, Codex, Pi, Opencode)”.
- New short guide per agent (Opencode doc from PR #5; add Codex/Pi snippets).
- Mention identityPrefix override and session arg differences.
## Migration
- Breaking change: configs must specify `agent`. Remove old `claudeOutputFormat` keys.
- Provide migration note in CHANGELOG 1.3.x.
## Out of scope
- No media binary support; still relies on MEDIA tokens in text.
- No UI changes; WhatsApp/Twilio plumbing unchanged.