Agent Abstraction Refactor Plan

Goal: support multiple agent CLIs (Claude, Codex, Pi, Opencode, Gemini) cleanly, without legacy flags, and make parsing/injection per-agent. Keep WhatsApp/Twilio plumbing intact.

Overview

Introduce a pluggable agent layer (src/agents/*), selected by config.
Normalize config (agent block) and remove claudeOutputFormat legacy knobs.
Provide per-agent argv builders and output parsers (including NDJSON streams).
Preserve MEDIA-token handling and shared queue/heartbeat behavior.

Configuration

New shape (no backward compat):

inbound: {
  reply: {
    mode: "command",
    agent: {
      kind: "claude" | "opencode" | "pi" | "codex" | "gemini",
      format?: "text" | "json",
      identityPrefix?: string
    },
    command: ["claude", "{{Body}}"],
    cwd?: string,
    session?: { ... },
    timeoutSeconds?: number,
    bodyPrefix?: string,
    mediaUrl?: string,
    mediaMaxMb?: number,
    typingIntervalSeconds?: number,
    heartbeatMinutes?: number
  }
}

Validation moves to config.ts (new AgentKind/AgentConfig types).
If agent is missing → config error.
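
A minimal sketch of what the validation in config.ts might look like, assuming the field names from the shape above. The names AGENT_KINDS, isAgentKind, isFormat and parseAgentConfig are hypothetical and only illustrate the "missing agent → config error" rule; the real module may be structured differently.

```typescript
// Allowed agent kinds, taken from the config shape in this plan.
const AGENT_KINDS = ["claude", "opencode", "pi", "codex", "gemini"] as const;
type AgentKind = (typeof AGENT_KINDS)[number];

interface AgentConfig {
  kind: AgentKind;
  format?: "text" | "json";
  identityPrefix?: string;
}

// Hypothetical type guards used by the validator below.
function isAgentKind(value: unknown): value is AgentKind {
  return AGENT_KINDS.some((k) => k === value);
}

function isFormat(value: unknown): value is "text" | "json" {
  return value === "text" || value === "json";
}

// Hypothetical helper: validates the raw `inbound.reply.agent` block and
// throws a config error when it is missing or malformed.
function parseAgentConfig(raw: unknown): AgentConfig {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("inbound.reply.agent is required");
  }
  const { kind, format, identityPrefix } = raw as Record<string, unknown>;
  if (!isAgentKind(kind)) {
    throw new Error(
      "inbound.reply.agent.kind must be one of: " + AGENT_KINDS.join(", ")
    );
  }
  const agent: AgentConfig = { kind };
  if (format !== undefined) {
    if (!isFormat(format)) {
      throw new Error('inbound.reply.agent.format must be "text" or "json"');
    }
    agent.format = format;
  }
  if (identityPrefix !== undefined) {
    if (typeof identityPrefix !== "string") {
      throw new Error("inbound.reply.agent.identityPrefix must be a string");
    }
    agent.identityPrefix = identityPrefix;
  }
  return agent;
}
```

Because there is no backward compatibility, the sketch deliberately has no fallback to a default agent: a config without an agent block fails fast.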

Agent modules

src/agents/types.ts – AgentKind, AgentSpec:
- buildArgs(argv: string[], body: string, ctx: { sessionId?: string; isNewSession?: boolean; sendSystemOnce?: boolean; systemSent?: boolean; identityPrefix?: string }): string[]
- parse(stdout: string): { text?: string; mediaUrls?: string[]; meta?: AgentMeta }
src/agents/claude.ts – current flag injection (--output-format, -p), identity prepend.
src/agents/opencode.ts – reuse parseOpencodeJson (from PR #5), inject --format json, session flag --session defaults, identity prefix.
src/agents/pi.ts – parse NDJSON AssistantMessageEvent (final message_end.message.content[text]), inject --mode json/-p defaults, session flags.
src/agents/codex.ts – parse Codex JSONL (last item with type:"agent_message"; usage from turn.completed), inject codex exec --json --skip-git-repo-check, sandbox default read-only.
src/agents/gemini.ts – minimal parsing (plain text), identity prepend, honors --output-format when format is set, and defaults to --resume {{SessionId}} for session resume (new sessions need no flag). Override sessionArgNew/sessionArgResume if you use a different session strategy.
Shared MEDIA extraction stays in media/parse.ts.
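
As an illustration of a per-agent parser, here is a rough sketch of the Codex case described above (last agent_message item wins, usage taken from turn.completed). The event shapes shown in the comments are assumptions inferred from this plan's wording, not a confirmed description of Codex output, and the names parseCodexJsonl, ParsedReply and the usage field on AgentMeta are hypothetical.

```typescript
interface AgentMeta {
  usage?: Record<string, number>;
}

interface ParsedReply {
  text?: string;
  mediaUrls?: string[];
  meta?: AgentMeta;
}

// Assumed JSONL event shapes (one JSON object per line):
//   {"type":"item.completed","item":{"type":"agent_message","text":"..."}}
//   {"type":"turn.completed","usage":{"input_tokens":10,"output_tokens":3}}
function parseCodexJsonl(stdout: string): ParsedReply {
  let text: string | undefined;
  let usage: Record<string, number> | undefined;

  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;

    let event: any;
    try {
      event = JSON.parse(trimmed);
    } catch (err) {
      continue; // skip non-JSON noise such as banners or warnings
    }

    if (
      event?.item?.type === "agent_message" &&
      typeof event.item.text === "string"
    ) {
      text = event.item.text; // a later agent_message overwrites an earlier one
    } else if (event?.type === "turn.completed" && event.usage) {
      usage = event.usage;
    }
  }

  return { text, meta: usage ? { usage } : undefined };
}
```

Skipping unparseable lines instead of throwing keeps the parser tolerant of stray stderr-style output mixed into stdout; whether that tolerance is desirable is a design choice for the real module.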

Command runner changes

runCommandReply:
- Resolve agent spec from config.
- Apply buildArgs (handles identity prepend and session args per agent).
- Run command; send stdout to spec.parse → text, mediaUrls, meta (stored as agentMeta).
- Remove claudeMeta naming; tests updated to agentMeta.
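
The flow above can be sketched roughly as follows. This is a simplified, synchronous illustration: the process runner is injected as a plain function so the sketch stays self-contained, whereas the real runCommandReply presumably spawns the command asynchronously and also handles timeouts, queueing, and MEDIA extraction. AgentContext, Runner and echoSpec are hypothetical names.

```typescript
interface AgentContext {
  sessionId?: string;
  isNewSession?: boolean;
  identityPrefix?: string;
}

interface AgentSpec {
  buildArgs(argv: string[], body: string, ctx: AgentContext): string[];
  parse(stdout: string): {
    text?: string;
    mediaUrls?: string[];
    meta?: Record<string, unknown>;
  };
}

// Hypothetical stand-in for process spawning: takes the final argv, returns stdout.
type Runner = (argv: string[]) => string;

function runCommandReply(
  spec: AgentSpec,
  argv: string[],
  body: string,
  ctx: AgentContext,
  run: Runner
) {
  const finalArgv = spec.buildArgs(argv, body, ctx); // identity + session args
  const stdout = run(finalArgv);
  const { text, mediaUrls, meta } = spec.parse(stdout);
  return { text, mediaUrls, agentMeta: meta }; // agentMeta replaces claudeMeta
}

// Toy spec for illustration only; it does not model any real agent CLI.
const echoSpec: AgentSpec = {
  buildArgs(argv, body, ctx) {
    const prompt = ctx.identityPrefix
      ? ctx.identityPrefix + "\n\n" + body
      : body;
    return argv.concat([prompt]);
  },
  parse(stdout) {
    return { text: stdout.trim(), meta: { rawLength: stdout.length } };
  },
};
```

With this split, the runner never needs to know which agent it is talking to; everything agent-specific lives behind buildArgs and parse.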

Sessions

Session arg defaults become agent-specific (Claude: --resume/--session-id; Opencode/Pi/Codex: --session; Gemini: --resume, with no flag for new sessions).
Still overridable via sessionArgNew/sessionArgResume in config.
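
One way to express these defaults is a lookup table keyed by agent kind, with config overrides taking precedence. The sketch below makes two assumptions that this plan does not spell out: that Claude uses --session-id for new sessions and --resume for resumed ones, and that Opencode, Pi, and Codex pass --session in both cases. SESSION_DEFAULTS and resolveSessionArgs are hypothetical names.

```typescript
type AgentKind = "claude" | "opencode" | "pi" | "codex" | "gemini";

interface SessionArgs {
  sessionArgNew: string[];
  sessionArgResume: string[];
}

const SESSION_FLAG: SessionArgs = {
  sessionArgNew: ["--session", "{{SessionId}}"],
  sessionArgResume: ["--session", "{{SessionId}}"],
};

// Assumed per-agent defaults; the new/resume split for Claude is a guess.
const SESSION_DEFAULTS: Record<AgentKind, SessionArgs> = {
  claude: {
    sessionArgNew: ["--session-id", "{{SessionId}}"],
    sessionArgResume: ["--resume", "{{SessionId}}"],
  },
  opencode: SESSION_FLAG,
  pi: SESSION_FLAG,
  codex: SESSION_FLAG,
  // Gemini: new sessions need no flag; resume uses --resume.
  gemini: {
    sessionArgNew: [],
    sessionArgResume: ["--resume", "{{SessionId}}"],
  },
};

// Picks the override if present, otherwise the agent default, then fills in
// the {{SessionId}} placeholder.
function resolveSessionArgs(
  kind: AgentKind,
  sessionId: string,
  isNewSession: boolean,
  overrides: Partial<SessionArgs> = {}
): string[] {
  const defaults = SESSION_DEFAULTS[kind];
  const template = isNewSession
    ? overrides.sessionArgNew ?? defaults.sessionArgNew
    : overrides.sessionArgResume ?? defaults.sessionArgResume;
  return template.map((arg) => arg.split("{{SessionId}}").join(sessionId));
}
```

Using split/join for the placeholder avoids the special replacement patterns that String.prototype.replace applies to "$" sequences in the session id.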

Tests

Update existing tests to new config (no claudeOutputFormat).
Add fixtures:
- Opencode NDJSON sample (from PR #5) → parsed text + meta.
- Codex NDJSON sample (captured: thread/turn/item/usage) → parsed text.
- Pi NDJSON sample (AssistantMessageEvent) → parsed text.
Ensure MEDIA token parsing works on agent text output.
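
A fixture-style check for the Pi parser might look like the sketch below. The NDJSON event shape (a message_end event whose message.content is an array of typed blocks) is an assumption based on the description in the agent modules section, and parsePiNdjson is a hypothetical name; a real fixture would be captured from actual Pi output rather than written by hand.

```typescript
// Assumed event shape:
//   {"type":"message_end","message":{"content":[{"type":"text","text":"..."}]}}
function parsePiNdjson(stdout: string): string | undefined {
  let text: string | undefined;

  for (const line of stdout.split("\n")) {
    if (!line.trim()) continue;

    let event: any;
    try {
      event = JSON.parse(line);
    } catch (err) {
      continue; // ignore lines that are not JSON
    }

    if (event?.type !== "message_end") continue;
    const content = event.message?.content;
    if (!Array.isArray(content)) continue;

    // Join all text blocks of this message; non-text blocks are skipped.
    const parts: string[] = [];
    for (const block of content) {
      if (block?.type === "text" && typeof block.text === "string") {
        parts.push(block.text);
      }
    }
    if (parts.length > 0) text = parts.join(""); // the final message_end wins
  }

  return text;
}

// Hand-written stand-in for a captured fixture file.
const piFixture = [
  '{"type":"message_start","message":{"content":[]}}',
  '{"type":"message_end","message":{"content":[{"type":"text","text":"first"}]}}',
  '{"type":"message_end","message":{"content":[{"type":"thinking","text":"hmm"},{"type":"text","text":"final answer"}]}}',
].join("\n");

const piText = parsePiNdjson(piFixture);
```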

Docs

README: rename “Claude-aware” → “Multi-agent (Claude, Codex, Pi, Opencode, Gemini)”.
New short guide per agent (Opencode doc from PR #5; add Codex/Pi snippets).
Mention identityPrefix override and session arg differences.

Migration

Breaking change: configs must specify agent. Remove old claudeOutputFormat keys.
Provide migration note in CHANGELOG 1.3.x.

Out of scope

No media binary support; still relies on MEDIA tokens in text.
No UI changes; WhatsApp/Twilio plumbing unchanged.
