---
summary: "Command queue design that serializes auto-reply command execution"
read_when:
  - Changing auto-reply execution or concurrency
---

# Command Queue (2026-01-03)

We now serialize command-based auto-replies (WhatsApp Web listener) through a tiny in-process queue, preventing multiple commands from running at once while still allowing safe parallelism across sessions.

## Why

- Some auto-reply commands are expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
- Serializing avoids competing for terminal/stdin, keeps logs readable, and reduces the chance of rate limits from upstream tools.

## How it works

- `src/process/command-queue.ts` holds a lane-aware FIFO queue and drains each lane sequentially.
- `runEmbeddedPiAgent` enqueues by **session key** (lane `session:<key>`) to guarantee only one active run per session.
- Each session run is then queued into a **global lane** (`main` by default), so overall parallelism is capped by `agent.maxConcurrent`.
- When verbose logging is enabled, queued commands emit a short notice if they waited more than ~2s before starting.
- Typing indicators (`onReplyStart`) still fire immediately on enqueue, so the user experience is unchanged while a command waits its turn.

## Queue modes (per surface)

Inbound messages can steer the current run, wait for a followup turn, or do both (a dispatch sketch follows the config example below):

- `steer`: inject immediately into the current run (cancels pending tool calls after the next tool boundary). If not streaming, falls back to followup.
- `followup`: enqueue for the next agent turn after the current run ends.
- `collect`: coalesce all queued messages into a **single** followup turn (default).
- `steer-backlog` (aka `steer+backlog`): steer now **and** preserve the message for a followup turn.
- `interrupt` (legacy): abort the active run for that session, then run the newest message.
- `queue` (legacy alias): same as `steer`.

Because `steer-backlog` can produce a second, followup response after the steered run, streaming surfaces may appear to reply twice. Prefer `collect`/`steer` if you want one response per inbound message. Inline fix: `/queue collect` (per-session) or set `routing.queue.bySurface.discord: "collect"`.

Defaults (when unset in config):

- All surfaces → `collect`

Configure globally or per surface via `routing.queue`:

```json5
{
  routing: {
    queue: {
      mode: "collect",
      debounceMs: 1000,
      cap: 20,
      drop: "summarize",
      bySurface: { discord: "collect" }
    }
  }
}
```
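To make the mode semantics concrete, here is a minimal dispatch sketch. It only restates the bullet definitions above as code; every name in it (`QueueMode`, `ActiveRun`, `routeInbound`, `inject`, `abort`) is hypothetical, not the real API of `src/process/command-queue.ts`.

```ts
// Hypothetical restatement of the mode table above; names are illustrative.
type QueueMode =
  | "steer"
  | "followup"
  | "collect"
  | "steer-backlog"
  | "interrupt" // legacy
  | "queue";    // legacy alias for steer

interface ActiveRun {
  isStreaming: boolean;
  /** Steer: inject into the live run; pending tool calls cancel at the next tool boundary. */
  inject(message: string): void;
  /** Interrupt: abort the whole run for this session. */
  abort(): void;
}

function routeInbound(
  mode: QueueMode,
  message: string,
  run: ActiveRun | null,
  backlog: string[], // messages awaiting a followup turn
): void {
  switch (mode) {
    case "queue": // legacy alias
    case "steer":
      if (run?.isStreaming) run.inject(message);
      else backlog.push(message); // not streaming: fall back to followup
      return;
    case "steer-backlog":
      if (run?.isStreaming) run.inject(message);
      backlog.push(message); // steer now AND keep it for a followup turn
      return;
    case "interrupt":
      run?.abort();
      backlog.splice(0, backlog.length, message); // only the newest message runs
      return;
    case "followup":
    case "collect":
      backlog.push(message); // collect later coalesces the whole backlog into one turn
      return;
  }
}
```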
## Queue options

Options apply to `followup`, `collect`, and `steer-backlog` (and to `steer` when it falls back to followup):

- `debounceMs`: wait for quiet before starting a followup turn (prevents a rapid “continue, continue” from triggering separate turns).
- `cap`: max queued messages per session.
- `drop`: overflow policy (`old`, `new`, or `summarize`). `summarize` keeps a short bullet list of the dropped messages and injects it as a synthetic followup prompt.

Defaults: `debounceMs: 1000`, `cap: 20`, `drop: summarize`.

## Per-session overrides

- `/queue <mode>` as a standalone command stores the mode for the current session.
- `/queue <mode>` embedded in a message applies **once** (no persistence).
- Options can be combined: `/queue collect debounce:2s cap:25 drop:summarize`
- `/queue default` or `/queue reset` clears the session override.

## Scope and guarantees

- Applies only to config-driven command replies; plain-text replies are unaffected.
- The default lane (`main`) is process-wide for inbound messages and main heartbeats; raise `agent.maxConcurrent` to allow multiple sessions in parallel.
- Additional lanes may exist (e.g. `cron`) so background jobs can run in parallel without blocking inbound replies.
- Per-session lanes guarantee that only one agent run touches a given session at a time.
- No external dependencies or background worker threads; pure TypeScript + promises.

## Troubleshooting

- If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
- `enqueueCommand` exposes a lightweight `getQueueSize()` helper if you need to surface queue depth in future diagnostics; a sketch of the overall queue shape follows.
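For orientation, here is a minimal sketch of the lane-aware chaining described under “How it works”, with a depth counter in the spirit of `getQueueSize()`. All names and signatures are illustrative (in particular the lane-keyed `getQueueSize`), and the global lane is shown as strictly serialized, i.e. `agent.maxConcurrent` = 1.

```ts
// Lane-aware FIFO via promise chaining, plus a queue-depth counter.
// Illustrative only: not the actual exports of src/process/command-queue.ts.
type Task<T> = () => Promise<T>;

const laneTails = new Map<string, Promise<unknown>>();
const laneDepths = new Map<string, number>();

/** FIFO per lane: run `task` after everything already queued on `lane`. */
function enqueue<T>(lane: string, task: Task<T>): Promise<T> {
  laneDepths.set(lane, (laneDepths.get(lane) ?? 0) + 1);
  const tail = laneTails.get(lane) ?? Promise.resolve();
  const run = tail.then(task);
  const settled = run
    .catch(() => undefined) // a rejected command must not wedge the lane
    .finally(() => laneDepths.set(lane, (laneDepths.get(lane) ?? 1) - 1));
  laneTails.set(lane, settled);
  return run; // callers observe success/failure here
}

/** Queue depth per lane (the lane-keyed signature is an assumption). */
function getQueueSize(lane: string): number {
  return laneDepths.get(lane) ?? 0;
}

// One active run per session, with a global cap: hold the session lane while
// waiting for a slot on the global lane. Strict serialization here matches
// agent.maxConcurrent = 1; the real queue generalizes this to a cap.
function enqueueSessionRun<T>(sessionKey: string, task: Task<T>): Promise<T> {
  return enqueue(`session:${sessionKey}`, () => enqueue("main", task));
}
```

Chaining on stored promise tails gives FIFO order per lane with no worker threads, which is what the “pure TypeScript + promises” guarantee above refers to.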