---
summary: "Refactor plan: unify agent lifecycle events and wait semantics"
read_when:
  - Refactoring agent lifecycle events or wait behavior
---

# Refactor: Agent Loop

Goal: align the Clawdis run lifecycle with pi/mom semantics and remove the ambiguity between `job` and `agent_end`.

## Problem

- There are two lifecycles today:
  - `job` (gateway wrapper): used by `agent.wait` and the chat final.
  - pi-agent `agent_end` (inner loop): only logged.
- As a result, a run can be finalized early (the job is done) while late assistant deltas are still arriving.
- `afterMs` and timeout handling can make `agent.wait` report false timeouts.

## Reference (mom)

- Single lifecycle: `agent_start`/`agent_end` from the pi-agent-core event stream.
- `waitForIdle()` resolves on `agent_end`.
- No separate job state is exposed to clients.

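For comparison, the mom model can be approximated with a single tracker fed by `agent_start`/`agent_end`, where waiting for idle simply means waiting for the next `agent_end`. This is a hypothetical sketch, not mom's or pi-agent-core's actual code; the `PiEvent` shape and the `IdleTracker` class are invented for illustration.

```typescript
// Hypothetical event shape; real pi-agent-core events carry more fields.
type PiEvent = { type: "agent_start" } | { type: "agent_end" };

// One lifecycle: busy between agent_start and agent_end, idle otherwise.
class IdleTracker {
  private running = false;
  private waiters: Array<() => void> = [];

  handle(event: PiEvent): void {
    if (event.type === "agent_start") {
      this.running = true;
      return;
    }
    // agent_end: the run is over, so release everyone waiting for idle.
    this.running = false;
    const waiters = this.waiters;
    this.waiters = [];
    for (const release of waiters) release();
  }

  isIdle(): boolean {
    return !this.running;
  }

  waitForIdle(): Promise<void> {
    // Already idle: resolve now rather than waiting for a future agent_end.
    if (!this.running) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}
```

There is no second "job" state to consult: the only thing a waiter can observe is the inner loop's own end event.
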
## Proposed refactor (breaking changes allowed)

1) Replace the public `job` stream with a `lifecycle` stream.
   - `stream: "lifecycle"`
   - `data: { phase: "start" | "end" | "error", startedAt, endedAt, error? }`
2) `agent.wait` waits only on lifecycle `end`/`error`.
   - Remove `afterMs`.
   - Return `{ runId, status, startedAt, endedAt, error? }`.
3) Emit the chat final only on lifecycle `end`.
   - Deltas still come from the `assistant` stream.
4) Centralize the run registry.
   - One map keyed by `runId`: `sessionKey`, `startedAt`, `lastSeq`, `bufferedText`.
   - Clear the entry on lifecycle `end`.

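A minimal sketch of items 1, 3 and 4, assuming epoch-millisecond timestamps, an in-memory map, and `seq` values that start at 0. The `AssistantDelta` shape and the `RunRegistry` class and method names are hypothetical; only the field names `phase`, `startedAt`, `endedAt`, `error`, `sessionKey`, `lastSeq` and `bufferedText` come from the plan.

```typescript
// Item 1: the public lifecycle event (timestamps assumed to be epoch ms).
interface LifecycleEvent {
  runId: string;
  stream: "lifecycle";
  data: { phase: "start" | "end" | "error"; startedAt: number; endedAt?: number; error?: string };
}

// Hypothetical shape for deltas on the existing `assistant` stream.
interface AssistantDelta {
  runId: string;
  stream: "assistant";
  seq: number;
  text: string;
}

// Item 4: one entry per run, keyed by runId.
interface RunEntry {
  sessionKey: string;
  startedAt: number;
  lastSeq: number;
  bufferedText: string;
}

class RunRegistry {
  private runs = new Map<string, RunEntry>();

  // Called on lifecycle "start"; sessionKey comes from the caller, not the event.
  start(event: LifecycleEvent, sessionKey: string): void {
    this.runs.set(event.runId, {
      sessionKey,
      startedAt: event.data.startedAt,
      lastSeq: -1, // assumes the first delta has seq 0
      bufferedText: "",
    });
  }

  // Buffers an assistant delta. Deltas for unknown (already finished) runs and
  // stale or duplicate seq values are dropped and reported as false.
  appendDelta(delta: AssistantDelta): boolean {
    const run = this.runs.get(delta.runId);
    if (!run || delta.seq <= run.lastSeq) return false;
    run.lastSeq = delta.seq;
    run.bufferedText += delta.text;
    return true;
  }

  // Item 3: called on lifecycle "end" or "error". Returns the buffered text as
  // the chat final only for "end" and clears the entry in both cases (clearing
  // on "error" is an assumption; the plan only says "clear on lifecycle end").
  finish(event: LifecycleEvent): string | undefined {
    const run = this.runs.get(event.runId);
    this.runs.delete(event.runId);
    if (!run || event.data.phase !== "end") return undefined;
    return run.bufferedText;
  }

  has(runId: string): boolean {
    return this.runs.has(runId);
  }
}
```

Because the chat final is built from `bufferedText` only when the `end` event arrives, and the entry is deleted at that point, a delta that shows up afterwards finds no entry and is dropped instead of leaking into a finished chat.
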
## Implementation outline

- `src/agents/pi-embedded-subscribe.ts`
  - Emit lifecycle start/end events (translate pi `agent_start`/`agent_end`).
- `src/infra/agent-events.ts`
  - Add `"lifecycle"` to the stream type.
- `src/gateway/protocol/schema.ts`
  - Update the `AgentEvent` schema; update the `AgentWait` schema (remove `afterMs` from the params, add `status` to the result).
- `src/gateway/server-methods/agent-job.ts`
  - Rename to `agent-wait.ts` (or similar); wait on lifecycle `end`/`error`.
- `src/gateway/server-chat.ts`
  - Finalize on lifecycle `end` (not `job`).
- `src/commands/agent.ts`
  - Stop emitting `job` externally (keep an internal log if needed).

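One way `pi-embedded-subscribe.ts` could translate the inner loop's events, sketched under the assumption that pi events can be reduced to `{ type: "agent_start" }` and `{ type: "agent_end" }` and that a failed run surfaces as a thrown value the caller catches. `createLifecycleTranslator` and its `fail` hook are hypothetical names, not existing Clawdis code.

```typescript
// Hypothetical inner-loop event shapes; other pi events are not handled here.
type PiAgentEvent = { type: "agent_start" } | { type: "agent_end" };

interface LifecycleEvent {
  runId: string;
  stream: "lifecycle";
  data: { phase: "start" | "end" | "error"; startedAt: number; endedAt?: number; error?: string };
}

// Translates pi agent_start/agent_end into public lifecycle events for one run.
// `now` is injectable so timestamps are deterministic in tests.
function createLifecycleTranslator(
  runId: string,
  emit: (event: LifecycleEvent) => void,
  now: () => number = Date.now,
) {
  let startedAt: number | undefined;

  return {
    handle(event: PiAgentEvent): void {
      if (event.type === "agent_start") {
        const t = now();
        startedAt = t;
        emit({ runId, stream: "lifecycle", data: { phase: "start", startedAt: t } });
        return;
      }
      // agent_end: fall back to the end time if no agent_start was seen.
      const endedAt = now();
      emit({ runId, stream: "lifecycle", data: { phase: "end", startedAt: startedAt ?? endedAt, endedAt } });
    },

    // Hypothetical hook for a run that throws instead of reaching agent_end.
    fail(error: unknown): void {
      const endedAt = now();
      emit({
        runId,
        stream: "lifecycle",
        data: { phase: "error", startedAt: startedAt ?? endedAt, endedAt, error: String(error) },
      });
    },
  };
}
```

Keeping the translation in one place means the gateway never needs to know about pi's event names; it only consumes the `lifecycle` stream.
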
## Migration notes (breaking)

- Update all callers of `agent.wait` to the new response shape.
- Update tests that expect a `timeout` based on `job` events.
- If any UI relies on job state, derive it from lifecycle events instead.

## Risks

- If lifecycle events are dropped, wait and chat could hang; add a timeout to `agent.wait` so it fails fast.
- Late deltas that arrive after lifecycle `end` should be ignored; keep tracking `seq` and drop them.

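The timeout mitigation for `agent.wait` could look like the following sketch. `LifecycleBus`, `waitForRun` and the `"ok" | "error" | "timeout"` status values are assumptions for illustration. It does not cover a run that already ended before `agent.wait` was called; a real version would check the run registry first.

```typescript
interface LifecycleEvent {
  runId: string;
  stream: "lifecycle";
  data: { phase: "start" | "end" | "error"; startedAt: number; endedAt?: number; error?: string };
}

// Assumed result shape; the status values are not defined by the plan.
interface AgentWaitResult {
  runId: string;
  status: "ok" | "error" | "timeout";
  startedAt?: number;
  endedAt?: number;
  error?: string;
}

// Minimal in-process fan-out of lifecycle events.
class LifecycleBus {
  private listeners = new Set<(event: LifecycleEvent) => void>();

  on(listener: (event: LifecycleEvent) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(event: LifecycleEvent): void {
    // Copy first so listeners can unsubscribe while being notified.
    for (const listener of Array.from(this.listeners)) listener(event);
  }
}

// Resolves on lifecycle end/error for runId; a missing terminal event becomes
// a "timeout" result after timeoutMs instead of hanging forever.
function waitForRun(bus: LifecycleBus, runId: string, timeoutMs: number): Promise<AgentWaitResult> {
  return new Promise((resolve) => {
    let unsubscribe: () => void = () => {};
    const timer = setTimeout(() => {
      unsubscribe();
      resolve({ runId, status: "timeout" });
    }, timeoutMs);

    unsubscribe = bus.on((event) => {
      if (event.runId !== runId || event.data.phase === "start") return;
      clearTimeout(timer);
      unsubscribe();
      const { startedAt, endedAt, error } = event.data;
      resolve({ runId, status: event.data.phase === "end" ? "ok" : "error", startedAt, endedAt, error });
    });
  });
}
```

Only terminal lifecycle phases settle the wait, so a finished wrapper `job` can no longer resolve it early.
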
## Acceptance

- Only one lifecycle is visible to clients.
- `agent.wait` resolves when the agent loop ends, not when the wrapper completes.
- The chat final is never emitted before the last assistant delta.

## Rollout (if we want a safety net)

- Gate behind a config flag: `agent.lifecycleMode = "legacy" | "refactor"`.
- Remove the legacy mode after one release.

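A sketch of how the flag might select the completion signal during the transition release. Only the key `agent.lifecycleMode` and its two values come from the plan; the `AgentConfig` shape, the `completionStream` helper, and defaulting to `"legacy"` when the flag is unset are assumptions.

```typescript
type LifecycleMode = "legacy" | "refactor";

// Hypothetical config shape; only agent.lifecycleMode comes from the plan.
interface AgentConfig {
  agent?: { lifecycleMode?: LifecycleMode };
}

// Which stream signals "run finished" for agent.wait and the chat final.
function completionStream(config: AgentConfig): "job" | "lifecycle" {
  // Defaulting to "legacy" while the flag exists is an assumption.
  const mode = config.agent?.lifecycleMode ?? "legacy";
  return mode === "refactor" ? "lifecycle" : "job";
}
```

Funnelling the decision through one helper keeps the removal of the legacy path to a single call site once the flag is retired.
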