feat: unify gateway heartbeat

Peter Steinberger
2025-12-26 02:35:21 +01:00
parent 8f9d7405ed
commit 0d8e0ddc4f
19 changed files with 744 additions and 953 deletions


@@ -130,7 +130,8 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
timeoutSeconds: 600,
mediaMaxMb: 5,
heartbeat: {
every: "30m"
every: "30m",
target: "last"
},
maxConcurrent: 3,
bash: {
@@ -151,6 +152,9 @@ deprecation fallback.
- `every`: duration string (`ms`, `s`, `m`, `h`); default unit minutes. Omit or set
`0m` to disable.
- `model`: optional override model for heartbeat runs (`provider/model`).
- `target`: delivery channel (`last`, `whatsapp`, `telegram`, `none`). Default: `last`.
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram).
- `prompt`: override the default heartbeat body (`HEARTBEAT`).
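The `every` rule above (units `ms`, `s`, `m`, `h`; bare numbers read as minutes; omitted or zero means disabled) could be resolved roughly as follows. This is a minimal sketch, not the project's parser: `parseHeartbeatEvery` is a hypothetical name, and throwing on malformed input is an assumption.

```typescript
// Hypothetical helper for illustration; not taken from the Clawdis source.
type DurationUnit = "ms" | "s" | "m" | "h";

const UNIT_MS: Record<DurationUnit, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
};

/**
 * Parse a duration string such as "30m", "45s", "500ms", or "2h".
 * A bare number is read as minutes (the documented default unit).
 * Returns null when the heartbeat is disabled (omitted or zero).
 * Assumption: malformed input throws instead of silently disabling.
 */
function parseHeartbeatEvery(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const match = /^(\d+(?:\.\d+)?)\s*(ms|s|m|h)?$/.exec(raw.trim().toLowerCase());
  if (!match) throw new Error(`invalid duration: ${raw}`);
  const value = Number(match[1]);
  const unit = (match[2] ?? "m") as DurationUnit;
  const ms = value * UNIT_MS[unit];
  return ms > 0 ? ms : null;
}
```

Returning `null` for the disabled case keeps "no timer" distinct from "timer with a zero interval" at the call site.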
`agent.bash` configures background bash defaults:
- `backgroundMs`: time before auto-background (ms, default 20000)


@@ -14,7 +14,7 @@ Last updated: 2025-12-13
## Context
Clawdis already has:
- A **periodic reply heartbeat** that runs the agent with `HEARTBEAT` and suppresses `HEARTBEAT_OK` (`src/web/auto-reply.ts`).
- A **gateway heartbeat runner** that runs the agent with `HEARTBEAT` and suppresses `HEARTBEAT_OK` (`src/infra/heartbeat-runner.ts`).
- A lightweight, in-memory **system event queue** (`enqueueSystemEvent`) that is injected into the next **main session** turn (`drainSystemEvents` in `src/auto-reply/reply.ts`).
- A WebSocket **Gateway** daemon that is intended to be always-on (`docs/gateway.md`).
@@ -197,12 +197,12 @@ This yields:
We need a way for the Gateway (or the scheduler) to request an immediate heartbeat without duplicating heartbeat logic.
Design:
- `monitorWebProvider` owns the real `runReplyHeartbeat()` function (it already has all the local state needed).
- Add a small global hook module:
- `setReplyHeartbeatWakeHandler(fn | null)` installed by `monitorWebProvider`
- `requestReplyHeartbeatNow({ reason, coalesceMs? })`
- If the handler is absent (provider not connected), the request is stored as “pending”; the next time the handler is installed, it runs once.
- Coalesce rapid calls and respect the existing “skip when queue busy” behavior (prefer retrying soon vs dropping).
- `startHeartbeatRunner` owns the real heartbeat execution and installs a wake handler.
- Wake hook lives in `src/infra/heartbeat-wake.ts`:
- `setHeartbeatWakeHandler(fn | null)` installed by the heartbeat runner
- `requestHeartbeatNow({ reason, coalesceMs? })`
- If the handler is absent, the request is stored as “pending”; the next time the handler is installed, it runs once.
- Coalesce rapid calls and respect the “skip when queue busy” behavior (prefer retrying soon over dropping the request).
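The wake hook design above can be sketched as a small module. The function names `setHeartbeatWakeHandler` and `requestHeartbeatNow` come from the design notes; everything else is an assumption made for illustration: the bodies, the 250 ms default window, and the rule that a `coalesceMs` of zero or less runs synchronously. The "skip when queue busy" retry is not modeled here.

```typescript
// Sketch only: function names follow the design notes, bodies are assumptions.
type WakeRequest = { reason: string; coalesceMs?: number };
type WakeHandler = (reason: string) => void | Promise<void>;

let handler: WakeHandler | null = null;
let pendingReason: string | null = null;
let timer: ReturnType<typeof setTimeout> | null = null;

function flush(): void {
  timer = null;
  // Without a handler the request stays pending until one is installed.
  if (!handler || pendingReason === null) return;
  const reason = pendingReason;
  pendingReason = null;
  void handler(reason);
}

function setHeartbeatWakeHandler(fn: WakeHandler | null): void {
  handler = fn;
  // A request stored while no handler was installed runs once now.
  if (handler && pendingReason !== null && timer === null) flush();
}

function requestHeartbeatNow({ reason, coalesceMs = 250 }: WakeRequest): void {
  pendingReason = reason; // latest reason wins; rapid calls collapse into one run
  if (timer !== null) return; // a run is already scheduled
  if (coalesceMs <= 0) {
    flush(); // assumption: a non-positive window means "run immediately"
  } else {
    timer = setTimeout(flush, coalesceMs);
  }
}
```

Keeping a single pending slot rather than a queue means any number of requests made before the handler appears still produce exactly one run.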
## Run history log (JSONL)


@@ -3,49 +3,53 @@ summary: "Plan for heartbeat polling messages and notification rules"
read_when:
- Adjusting heartbeat cadence or messaging
---
# Heartbeat polling plan (2025-11-26)
# Heartbeat (Gateway)
Goal: add a simple heartbeat poll for the embedded agent that only notifies users when something matters, using the `HEARTBEAT_OK` sentinel. The heartbeat body we send is `HEARTBEAT` so the model can easily spot it.
Heartbeat runs periodic agent turns in the **main session** so the model can
surface anything that needs attention without spamming the user.
## Prompt contract
- Extend the agent system prompt to explain: “If this is a heartbeat poll and nothing needs attention, reply exactly `HEARTBEAT_OK` and nothing else. For any alert, do **not** include `HEARTBEAT_OK`; just return the alert text.” Heartbeat prompt body is `HEARTBEAT`.
- Keep existing WhatsApp length guidance; forbid burying the sentinel inside alerts.
- Heartbeat body defaults to `HEARTBEAT` (configurable via `agent.heartbeat.prompt`).
- If nothing needs attention, the model must reply **exactly** `HEARTBEAT_OK`.
- For alerts, do **not** include `HEARTBEAT_OK`; return only the alert text.
## Config & defaults
- New config key: `agent.heartbeat` with:
- `every`: duration string (`ms`, `s`, `m`, `h`; default unit minutes). `0m` disables.
- `model`: optional override model (`provider/model`) for heartbeat runs.
- Default: disabled unless `agent.heartbeat.every` is set.
- New optional idle override for heartbeats: `session.heartbeatIdleMinutes` (defaults to `idleMinutes`). Heartbeat skips do **not** update the session `updatedAt` so idle expiry still works.
## Config
## Poller behavior
- When gateway runs with command-mode auto-reply, start a timer with the resolved heartbeat interval.
- Each tick invokes the configured command with a short heartbeat body (e.g., “(heartbeat) summarize any important changes since last turn”) while reusing the active session args so Pi context stays warm.
- Heartbeats never create a new session implicitly: if there's no stored session for the target (fallback path), the heartbeat is skipped instead of starting a fresh Pi session.
- Abort timer on SIGINT/abort of the gateway.
```json5
{
agent: {
heartbeat: {
every: "30m", // duration string: ms|s|m|h (0m disables)
model: "anthropic/claude-opus-4-5",
target: "last", // last | whatsapp | telegram | none
to: "+15551234567", // optional override for whatsapp/telegram
prompt: "HEARTBEAT" // optional override
}
}
}
```
## Sentinel handling
- Trim output. If the trimmed text equals `HEARTBEAT_OK` (case-sensitive) -> skip outbound message.
- Otherwise, send the text/media as normal, stripping the sentinel if it somehow appears.
- Treat empty output as `HEARTBEAT_OK` to avoid spurious pings.
### Fields
- `every`: heartbeat interval (duration string; default unit minutes). Omit or set
to `0m` to disable.
- `model`: optional model override for heartbeat runs (`provider/model`).
- `target`: where heartbeat output is delivered.
- `last` (default): send to the last used external channel.
- `whatsapp` / `telegram`: force the channel (optionally set `to`).
- `none`: do not deliver externally; output stays in the session (WebChat-visible).
- `to`: optional recipient override (E.164 for WhatsApp, chat id for Telegram).
- `prompt`: override the default heartbeat body.
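One way to read the `target` and `to` fields together is as a function from config plus the last known route to an optional delivery destination. The sketch below is illustrative only: `resolveHeartbeatDelivery` and `Route` are hypothetical names. It also makes two assumptions the text does not confirm: `to` overrides the recipient when `target` is `last`, and a forced channel without `to` reuses the last recipient only when that route was on the same channel.

```typescript
// Hypothetical types and helper; not taken from the Clawdis source.
type Channel = "whatsapp" | "telegram";
type HeartbeatTarget = "last" | Channel | "none";

interface Route {
  channel: Channel;
  to: string; // E.164 number for WhatsApp, chat id for Telegram
}

function resolveHeartbeatDelivery(
  cfg: { target?: HeartbeatTarget; to?: string },
  lastRoute: Route | null,
): Route | null {
  const target = cfg.target ?? "last"; // documented default
  if (target === "none") return null; // output stays in the session
  if (target === "last") {
    if (!lastRoute) return null; // no last route: run, but send nothing
    // Assumption: `to` may override the recipient on the last channel.
    return { channel: lastRoute.channel, to: cfg.to ?? lastRoute.to };
  }
  // Forced channel. Assumption: without `to`, reuse the last recipient
  // only if the last route used the same channel.
  const to = cfg.to ?? (lastRoute?.channel === target ? lastRoute.to : undefined);
  return to ? { channel: target, to } : null;
}
```

A `null` result corresponds to the case where the heartbeat still runs but no outbound message is sent.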
## Logging requirements
- Normal mode: single info line per tick, e.g., `heartbeat: ok (skipped)` or `heartbeat: alert sent (32ms)`.
- `--verbose`: log start/end, command argv, duration, and whether it was skipped/sent/error; include session ID and connection/run IDs via `getChildLogger` for correlation.
- On command failure: warn-level one-liner in normal mode; verbose log includes stdout/stderr snippets.
## Behavior
- Runs in the main session (`session.mainKey`, or `global` when scope is global).
- Uses the main lane queue; if requests are in flight, the wake is retried.
- Empty output or `HEARTBEAT_OK` is treated as “ok” and does **not** keep the
session alive (`updatedAt` is restored).
- If `target` resolves to no external destination (no last route or `none`), the
heartbeat still runs but no outbound message is sent.
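The "ok" versus alert decision, and the `updatedAt` restoration that keeps an ok heartbeat from extending the session, might look like the sketch below. `classifyHeartbeatReply`, `applyHeartbeatOutcome`, and `SessionEntry` are hypothetical names, and stripping a stray sentinel from alert text follows the earlier plan's wording rather than any confirmed implementation.

```typescript
// Hypothetical helpers for illustration; not taken from the Clawdis source.
const HEARTBEAT_OK = "HEARTBEAT_OK";

type HeartbeatOutcome =
  | { kind: "ok" }
  | { kind: "alert"; text: string };

/** Trim the reply; empty output or the exact (case-sensitive) sentinel is "ok". */
function classifyHeartbeatReply(raw: string | undefined): HeartbeatOutcome {
  const trimmed = (raw ?? "").trim();
  if (trimmed === "" || trimmed === HEARTBEAT_OK) return { kind: "ok" };
  // Strip a sentinel that leaked into an alert before it is sent.
  const text = trimmed.split(HEARTBEAT_OK).join("").trim();
  return text === "" ? { kind: "ok" } : { kind: "alert", text };
}

interface SessionEntry {
  updatedAt: number;
}

/** Restore the pre-heartbeat timestamp on "ok" so idle expiry still applies. */
function applyHeartbeatOutcome(
  session: SessionEntry,
  previousUpdatedAt: number,
  outcome: HeartbeatOutcome,
): void {
  if (outcome.kind === "ok") session.updatedAt = previousUpdatedAt;
}
```

Callers would capture `updatedAt` before the agent turn and pass it back in afterwards, so only alerts count as session activity.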
## Failure/backoff
- If a heartbeat command errors, log it and retry on the next scheduled tick (no exponential backoff unless command repeatedly fails; keep it simple for now).
## Tests to add
- Unit: sentinel detection (`HEARTBEAT_OK`, empty output, mixed text), skip vs send decision, default interval resolver (30m, override, disable).
- Unit/integration: verbose logger emits start/end lines; normal logger emits a single line.
## Documentation
- Add a short README snippet under configuration showing `agent.heartbeat` and the sentinel rule.
- Expose CLI triggers:
- `clawdis heartbeat` (web provider, defaults to first `routing.allowFrom`; optional `--to` override)
- `--session-id <uuid>` forces resuming a specific session for that heartbeat
- `clawdis gateway --heartbeat-now` to run the gateway loop with an immediate heartbeat
- Gateway supports `--heartbeat-now` to fire once at startup.
- When multiple sessions are active or `routing.allowFrom` is only `"*"`, require `--to <E.164>` or `--all` for manual heartbeats to avoid ambiguous targets.
## Wake hook
- The gateway exposes a heartbeat wake hook so cron/jobs/webhooks can request an
immediate run (`requestHeartbeatNow`).
- `wake` endpoints should enqueue system events and optionally trigger a wake; the
heartbeat runner picks those up on the next tick or immediately.


@@ -86,10 +86,9 @@ Status: WhatsApp Web via Baileys only. Gateway owns the single session.
## Heartbeats
- **Gateway heartbeat** logs connection health (`web.heartbeatSeconds`, default 60s).
- **Reply heartbeat** asks agent on a timer (`agent.heartbeat.every`).
- Uses `HEARTBEAT` prompt + `HEARTBEAT_TOKEN` skip behavior.
- Skips if queue busy or last inbound was a group.
- Falls back to last direct recipient if needed.
- **Agent heartbeat** is global (`agent.heartbeat.*`) and runs in the main session.
- Uses `HEARTBEAT` prompt + `HEARTBEAT_OK` skip behavior.
- Delivery defaults to the last used channel (or configured target).
## Reconnect behavior
- Backoff policy: `web.reconnect`:
@@ -106,6 +105,8 @@ Status: WhatsApp Web via Baileys only. Gateway owns the single session.
- `agent.mediaMaxMb`
- `agent.heartbeat.every`
- `agent.heartbeat.model` (optional override)
- `agent.heartbeat.target`
- `agent.heartbeat.to`
- `session.*` (scope, idle, store, mainKey)
- `web.heartbeatSeconds`
- `web.reconnect.*`