3.7 KiB
3.7 KiB
Agent Abstraction Refactor Plan
Goal: support multiple agent CLIs (Claude, Codex, Pi, Opencode, Gemini) cleanly, without legacy flags, and make parsing/injection per-agent. Keep WhatsApp/Twilio plumbing intact.
Overview
- Introduce a pluggable agent layer (
src/agents/*), selected by config. - Normalize config (
agentblock) and removeclaudeOutputFormatlegacy knobs. - Provide per-agent argv builders and output parsers (including NDJSON streams).
- Preserve MEDIA-token handling and shared queue/heartbeat behavior.
Configuration
- New shape (no backward compat):
inbound: { reply: { mode: "command", agent: { kind: "claude" | "opencode" | "pi" | "codex" | "gemini", format?: "text" | "json", identityPrefix?: string }, command: ["claude", "{{Body}}"], cwd?: string, session?: { ... }, timeoutSeconds?: number, bodyPrefix?: string, mediaUrl?: string, mediaMaxMb?: number, typingIntervalSeconds?: number, heartbeatMinutes?: number } } - Validation moves to
config.ts(newAgentKind/AgentConfigtypes). - If
agentis missing → config error.
Agent modules
src/agents/types.ts–AgentKind,AgentSpec:buildArgs(argv: string[], body: string, ctx: { sessionId?, isNewSession?, sendSystemOnce?, systemSent?, identityPrefix? }): string[]parse(stdout: string): { text?: string; mediaUrls?: string[]; meta?: AgentMeta }
src/agents/claude.ts– current flag injection (--output-format,-p), identity prepend.src/agents/opencode.ts– reuseparseOpencodeJson(from PR #5), inject--format json, session flag--sessiondefaults, identity prefix.src/agents/pi.ts– parse NDJSONAssistantMessageEvent(finalmessage_end.message.content[text]), inject--mode json/-pdefaults, session flags.src/agents/codex.ts– parse Codex JSONL (lastitemwithtype:"agent_message"; usage fromturn.completed), injectcodex exec --json --skip-git-repo-check, sandbox default read-only.src/agents/gemini.ts– minimal parsing (plain text), identity prepend, honors--output-formatwhenformatis set, and defaults to--resume {{SessionId}}for session resume (new sessions need no flag). OverridesessionArgNew/sessionArgResumeif you use a different session strategy.- Shared MEDIA extraction stays in
media/parse.ts.
Command runner changes
runCommandReply:- Resolve agent spec from config.
- Apply
buildArgs(handles identity prepend and session args per agent). - Run command; send stdout to
spec.parse→text,mediaUrls,meta(stored asagentMeta). - Remove
claudeMetanaming; tests updated toagentMeta.
Sessions
- Session arg defaults become agent-specific (Claude:
--resume/--session-id; Opencode/Pi/Codex:--session). - Still overridable via
sessionArgNew/sessionArgResumein config.
Tests
- Update existing tests to new config (no
claudeOutputFormat). - Add fixtures:
- Opencode NDJSON sample (from PR #5) → parsed text + meta.
- Codex NDJSON sample (captured: thread/turn/item/usage) → parsed text.
- Pi NDJSON sample (AssistantMessageEvent) → parsed text.
- Ensure MEDIA token parsing works on agent text output.
Docs
- README: rename “Claude-aware” → “Multi-agent (Claude, Codex, Pi, Opencode)”.
- New short guide per agent (Opencode doc from PR #5; add Codex/Pi snippets).
- Mention identityPrefix override and session arg differences.
Migration
- Breaking change: configs must specify
agent. Remove oldclaudeOutputFormatkeys. - Provide migration note in CHANGELOG 1.3.x.
Out of scope
- No media binary support; still relies on MEDIA tokens in text.
- No UI changes; WhatsApp/Twilio plumbing unchanged.