| summary | read_when |
|---|---|
| Refactor plan: unify agent lifecycle events and wait semantics | |
Refactor: Agent Loop
Goal: align Clawdis run lifecycle with pi/mom semantics, remove ambiguity between "job" and "agent_end".
Problem
- Two lifecycles today:
  - `job` (gateway wrapper) => used by `agent.wait` + chat final
  - pi-agent `agent_end` (inner loop) => only logged
- This can finalize early (job done) while late assistant deltas still arrive.
- `afterMs` and timeouts can cause false timeouts in `agent.wait`.
Reference (mom)
- Single lifecycle: `agent_start`/`agent_end` from pi-agent-core event stream.
- `waitForIdle()` resolves on `agent_end`.
- No separate job state exposed to clients.
Proposed refactor (breaking allowed)
- Replace public `job` stream with `lifecycle` stream
  - `stream: "lifecycle"`
  - `data: { phase: "start" | "end" | "error", startedAt, endedAt, error? }`
- `agent.wait` waits on lifecycle end/error only
  - remove `afterMs`
  - return `{ runId, status, startedAt, endedAt, error? }`
- Chat final emitted on lifecycle end only
  - deltas still from `assistant` stream
- Centralize run registry
  - one map keyed by runId: sessionKey, startedAt, lastSeq, bufferedText
  - clear on lifecycle end
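
A minimal sketch of the centralized run registry described above, assuming events carry `runId` and `seq` fields. The names `RunRegistry`, `register`, `handle`, and `has` are hypothetical and not part of the existing codebase; only the `lifecycle`/`assistant` stream names and the registry fields come from this plan.

```typescript
// Hypothetical event shapes: stream names and data fields follow the plan,
// while `runId`, `seq`, and `text` are assumptions made for this sketch.
interface LifecycleEvent {
  runId: string;
  seq: number;
  stream: "lifecycle";
  data: { phase: "start" | "end" | "error"; startedAt?: number; endedAt?: number; error?: string };
}

interface AssistantEvent {
  runId: string;
  seq: number;
  stream: "assistant";
  data: { text: string };
}

type AgentEvent = LifecycleEvent | AssistantEvent;

interface RunEntry {
  sessionKey: string;
  startedAt: number;
  lastSeq: number;
  bufferedText: string;
}

class RunRegistry {
  // One map keyed by runId, as proposed.
  private readonly runs = new Map<string, RunEntry>();

  register(runId: string, sessionKey: string, startedAt: number): void {
    this.runs.set(runId, { sessionKey, startedAt, lastSeq: 0, bufferedText: "" });
  }

  has(runId: string): boolean {
    return this.runs.has(runId);
  }

  // Returns the chat final when the run ends or errors, otherwise undefined.
  handle(evt: AgentEvent): { sessionKey: string; text: string } | undefined {
    const run = this.runs.get(evt.runId);
    // Unknown run (already cleared) or out-of-order event: drop it.
    if (!run || evt.seq <= run.lastSeq) return undefined;
    run.lastSeq = evt.seq;

    if (evt.stream === "assistant") {
      run.bufferedText += evt.data.text; // deltas only accumulate
      return undefined;
    }
    if (evt.data.phase === "start") return undefined;

    // Lifecycle "end" or "error": finalize once and clear the entry.
    this.runs.delete(evt.runId);
    return { sessionKey: run.sessionKey, text: run.bufferedText };
  }
}
```

Because the entry is deleted on lifecycle end, any delta that arrives afterwards finds no run and is dropped, so the chat final cannot be emitted twice or before the last accepted delta.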
Implementation outline
- `src/agents/pi-embedded-subscribe.ts`
  - emit lifecycle start/end events (translate pi `agent_start`/`agent_end`)
- `src/infra/agent-events.ts`
  - add `"lifecycle"` to stream type
- `src/gateway/protocol/schema.ts`
  - update AgentEvent schema; update AgentWait params (remove afterMs, add status)
- `src/gateway/server-methods/agent-job.ts`
  - rename to `agent-wait.ts` or similar; wait on lifecycle end/error
- `src/gateway/server-chat.ts`
  - finalize on lifecycle end (not job)
- `src/commands/agent.ts`
  - stop emitting `job` externally (keep internal log if needed)
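
The translation step in `pi-embedded-subscribe.ts` could look roughly like the sketch below. The inner-loop event shape (`type` plus an optional `error` string) is an assumption, since the real pi-agent-core types are not shown here, and `createLifecycleTranslator` is a hypothetical helper name.

```typescript
// Assumed shape of inner-loop events; the real pi-agent-core types may differ,
// in particular in how a failed run is signalled.
interface PiAgentEvent {
  type: string;
  error?: string;
}

interface LifecycleData {
  phase: "start" | "end" | "error";
  startedAt: number;
  endedAt?: number;
  error?: string;
}

// Returns a function that maps agent_start/agent_end to lifecycle data and
// ignores every other inner-loop event. `now` is injectable for testing.
function createLifecycleTranslator(now: () => number = Date.now) {
  let startedAt: number | undefined;

  return (evt: PiAgentEvent): LifecycleData | undefined => {
    if (evt.type === "agent_start") {
      startedAt = now();
      return { phase: "start", startedAt };
    }
    if (evt.type === "agent_end") {
      const endedAt = now();
      // Fall back to endedAt if no start was observed for this run.
      const base = { startedAt: startedAt ?? endedAt, endedAt };
      return evt.error
        ? { phase: "error", ...base, error: evt.error }
        : { phase: "end", ...base };
    }
    return undefined;
  };
}
```

One translator instance per run keeps `startedAt` local to that run, so the emitted end event can carry both timestamps without consulting shared state.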
Migration notes (breaking)
- Update all callers of `agent.wait` to new response shape.
- Update tests that expect `timeout` based on job events.
- If any UI relies on job state, map lifecycle instead.
Risks
- If lifecycle events are dropped, wait/chat could hang; add timeout in `agent.wait` to fail fast.
- Late deltas after lifecycle end should be ignored; keep seq tracking + drop.
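
A minimal sketch of the fail-fast timeout in `agent.wait`, assuming an in-process listener set as the event source. The `status` values (`"ok"`, `"error"`, `"timeout"`) and the names `listeners`, `emitLifecycle`, and `waitForRun` are hypothetical; the plan only fixes the response fields.

```typescript
interface LifecycleData {
  phase: "start" | "end" | "error";
  startedAt?: number;
  endedAt?: number;
  error?: string;
}

// Response fields follow the plan; the concrete status values are assumed.
interface WaitResult {
  runId: string;
  status: "ok" | "error" | "timeout";
  startedAt?: number;
  endedAt?: number;
  error?: string;
}

type Listener = (runId: string, data: LifecycleData) => void;
const listeners = new Set<Listener>();

function emitLifecycle(runId: string, data: LifecycleData): void {
  // Copy first so listeners can unsubscribe while we iterate.
  for (const listener of [...listeners]) listener(runId, data);
}

function waitForRun(runId: string, timeoutMs: number): Promise<WaitResult> {
  return new Promise((resolve) => {
    const finish = (result: WaitResult): void => {
      clearTimeout(timer);
      listeners.delete(onEvent);
      resolve(result);
    };
    const onEvent: Listener = (id, data) => {
      // Only end/error of this run resolves the wait.
      if (id !== runId || data.phase === "start") return;
      finish({
        runId,
        status: data.phase === "end" ? "ok" : "error",
        startedAt: data.startedAt,
        endedAt: data.endedAt,
        error: data.error,
      });
    };
    // If the lifecycle end is dropped, resolve with a timeout instead of hanging.
    const timer = setTimeout(() => finish({ runId, status: "timeout" }), timeoutMs);
    listeners.add(onEvent);
  });
}
```

Resolving with a `timeout` status rather than rejecting keeps the response shape uniform, so callers branch on `status` in one place.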
Acceptance
- One lifecycle visible to clients.
- `agent.wait` resolves when agent loop ends, not wrapper completion.
- Chat final never emits before last assistant delta.
Rollout (if we wanted safety)
- Gate with config flag `agent.lifecycleMode = "legacy"|"refactor"`.
- Remove legacy after one release.
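
If the flag is adopted, the gating could be sketched as follows. The default of `"legacy"` for missing or unrecognized values is an assumption, and `resolveLifecycleMode` and `terminalStream` are hypothetical helper names.

```typescript
type LifecycleMode = "legacy" | "refactor";

// Assumption: anything other than an explicit "refactor" keeps legacy behavior.
function resolveLifecycleMode(raw: unknown): LifecycleMode {
  return raw === "refactor" ? "refactor" : "legacy";
}

// Which stream drives agent.wait and the chat final under each mode.
function terminalStream(mode: LifecycleMode): "job" | "lifecycle" {
  return mode === "refactor" ? "lifecycle" : "job";
}
```

Keeping the decision in one function means removing the legacy path after one release is a single-site change.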