16 KiB
summary, read_when
| summary | read_when | |||
|---|---|---|---|---|
| RFC: Cron jobs + wakeups for Clawd/Clawdbot (main vs isolated sessions) |
|
RFC: Cron jobs + wakeups for Clawd
Status: Draft
Last updated: 2025-12-13
Context
Clawdbot already has:
- A gateway heartbeat runner that runs the agent with
HEARTBEATand suppressesHEARTBEAT_OK(src/infra/heartbeat-runner.ts). - A lightweight, in-memory system event queue (
enqueueSystemEvent) that is injected into the next main session turn (drainSystemEventsinsrc/auto-reply/reply.ts). - A WebSocket Gateway daemon that is intended to be always-on (
docs/gateway.md).
This RFC adds a small “cron job system” so Clawd can schedule future work and reliably wake itself up:
- Delayed: run on the next normal heartbeat tick
- Immediate: run now (trigger a heartbeat immediately)
- Isolated jobs: optionally run in their own session that does not pollute the main session and can run concurrently (within configured limits).
Goals
- Provide a persistent job store and an in-process scheduler owned by the Gateway.
- Allow each job to target either:
sessionTarget: "main": inject asSystem:lines and rely on the main heartbeat (or trigger it immediately).sessionTarget: "isolated": run an agent turn in a dedicated session key (job session), optionally delivering a message and/or posting a summary back to main.
- Expose a stable control surface:
- Gateway methods (
cron.*,wake) for programmatic usage (mac app, CLI, agents). - CLI commands (
clawdbot cron ...) to add/remove/edit/list and to debugrun.
- Gateway methods (
- Produce clear, structured logs for job lifecycle and execution outcomes.
Non-goals (v1)
- Multi-host distributed scheduling.
- Exactly-once semantics across crashes (we aim for “at-least-once with idempotency hooks”).
- A full Unix-cron parser as the only schedule format (we can support it, but v1 should not require complex cron features to be useful).
Terminology
- Wake: a request to ensure the agent gets a turn soon (either right now or next heartbeat).
- Main session: the canonical session bucket (default key
"main") that receivesSystem:events. - Isolated session: a per-job session key (e.g.
cron:<jobId>) with its own session id / session file.
User stories
- “Remind me in 20 minutes” → add a one-shot job that triggers an immediate heartbeat at T+20m.
- “Every weekday at 7:30, wake me up and start music” → recurring job, isolated session, deliver to WhatsApp.
- “Every hour, check battery; only interrupt me if < 20%” → isolated job that decides whether to deliver; may also post a brief status to main.
- “Next heartbeat, please check calendar” → delayed wake targeting main session.
Job model
Storage schema (v1)
Each job is a JSON object with stable keys (unknown keys ignored for forward compatibility):
id: string(UUID)name: string(required)description?: string(optional)enabled: booleancreatedAtMs: numberupdatedAtMs: numberschedule(one of){"kind":"at","atMs":number}(one-shot){"kind":"every","everyMs":number,"anchorMs"?:number}(simple interval){"kind":"cron","expr":string,"tz"?:string}(optional; see “Schedule parsing”)
sessionTarget: "main" | "isolated"wakeMode: "next-heartbeat" | "now"- For
sessionTarget:"isolated",wakeMode:"now"means “run immediately when due”. - For
sessionTarget:"main",wakeModecontrols whether we trigger the heartbeat immediately or just enqueue and wait.
- For
payload(one of){"kind":"systemEvent","text":string}(enqueue asSystem:){"kind":"agentTurn","message":string,"deliver"?:boolean,"channel"?: "last"|"whatsapp"|"telegram"|"discord"|"signal"|"imessage","to"?:string,"timeoutSeconds"?:number}
isolation(optional; only meaningful for isolated jobs){"postToMainPrefix"?: string}
runtime(optional){"maxAttempts"?:number,"retryBackoffMs"?:number}(best-effort retries; defaults off)
state(runtime-maintained){"nextRunAtMs":number,"lastRunAtMs"?:number,"lastStatus"?: "ok"|"error"|"skipped","lastError"?:string,"lastDurationMs"?:number}
Key behavior
sessionTarget:"main"jobs always enqueuepayload.kind:"systemEvent"(directly or derived fromagentTurnresults; see below).sessionTarget:"isolated"jobs create/use a stable session key:cron:<jobId>.
Storage location
Cron persists everything under ~/.clawdbot/cron/:
- Job store:
~/.clawdbot/cron/jobs.json - Run history:
~/.clawdbot/cron/runs/<jobId>.jsonl
You can override the job store path via cron.store in config.
The scheduler should never require additional configuration for the base directory (Clawdbot already treats ~/.clawdbot as fixed).
Enabling
Cron execution is enabled by default inside the Gateway.
To disable it, set:
{
cron: {
enabled: false,
// optional:
store: "~/.clawdbot/cron/jobs.json",
maxConcurrentRuns: 1
}
}
You can also disable scheduling via the environment variable CLAWDBOT_SKIP_CRON=1.
Scheduler design
Ownership
The Gateway owns:
- the scheduler timer,
- job store reads/writes,
- job execution (enqueue system events and/or agent turns).
This keeps scheduling unified with the always-on process and prevents “two schedulers” when multiple CLIs run.
Timer strategy
- Maintain an in-memory heap/array of enabled jobs keyed by
state.nextRunAtMs. - Use a single
setTimeoutto wake at the earliest next run. - On wake:
- compute all due jobs (now >= nextRunAtMs),
- mark them “in flight” (in memory),
- persist updated
state(at least bumpnextRunAtMs/lastRunAtMs) before starting execution to minimize duplicate runs on crash, - execute jobs (with concurrency limits),
- persist final
lastStatus/lastError/lastDurationMs, - re-arm timer for the next earliest run.
Schedule parsing
V1 can ship with at + every without extra deps.
If we add "kind":"cron":
- Use a well-maintained parser (we use
croner) and support:- 5-field cron (
min hour dom mon dow) at minimum - optional
tz
- 5-field cron (
- Store
nextRunAtMscomputed by the parser; re-compute after each run.
Execution semantics
Main session jobs
Main session jobs do not run the agent directly by default.
When due:
enqueueSystemEvent(job.payload.text)(or a derived message)- If
wakeMode:"now", trigger an immediate heartbeat run (see “Heartbeat wake hook”). - Otherwise do nothing else (the next scheduled heartbeat will pick up the system event).
Why: This keeps the main session’s “proactive” behavior centralized in the heartbeat rules and avoids ad-hoc agent turns that might fight with inbound message processing.
Isolated session jobs
Isolated jobs run an agent turn in a dedicated session key, intended to be separate from main.
When due:
- Build a message body that includes schedule metadata, e.g.:
"[cron:<jobId>] <job.name>: <payload.message>"
- Execute via the same agent runner path as other command-mode runs, but pinned to:
sessionKey = cron:<jobId>sessionId = store[sessionKey].sessionId(create if missing)
- Optionally deliver output (
payload.deliver === true) to the configured channel/to. - Isolated jobs always enqueue a summary system event to the main session when they finish (derived from the last agent text output).
- Prefix defaults to
Cron, and can be customized viaisolation.postToMainPrefix.
- Prefix defaults to
- If
deliveris omitted/false, nothing is sent to external providers; you still get the main-session summary and can inspect the full isolated transcript incron:<jobId>.
“Run in parallel to main”
Clawdbot currently serializes command execution through a global in-process queue (src/process/command-queue.ts) to avoid collisions.
To support isolated cron jobs running “in parallel”, we should introduce lanes (keyed queues) plus a global concurrency cap:
- Lane
"main": inbound auto-replies + main heartbeat. - Lane
"cron"(orcron:<jobId>): isolated jobs. - Configurable
cron.maxConcurrentRuns(default 1 or 2).
This yields:
- isolated jobs can overlap with the main lane (up to cap),
- each lane still preserves ordering for its own work (optional),
- we retain safety knobs to prevent runaway resource contention.
Heartbeat wake hook (immediate vs next heartbeat)
We need a way for the Gateway (or the scheduler) to request an immediate heartbeat without duplicating heartbeat logic.
Design:
startHeartbeatRunnerowns the real heartbeat execution and installs a wake handler.- Wake hook lives in
src/infra/heartbeat-wake.ts:setHeartbeatWakeHandler(fn | null)installed by the heartbeat runnerrequestHeartbeatNow({ reason, coalesceMs? })
- If the handler is absent, the request is stored as “pending”; the next time the handler is installed, it runs once.
- Coalesce rapid calls and respect the “skip when queue busy” behavior (retry soon vs dropping).
Run history log (JSONL)
In addition to normal structured logs, the Gateway writes an append-only run history “ledger” (JSONL) whenever a job finishes. This is intended for quick debugging (“did the job run, when, and what happened?”).
Path rules:
- Run logs are stored per job next to the store:
.../runs/<jobId>.jsonl.
Retention:
- Best-effort pruning when the file grows beyond ~2MB; keep the newest ~2000 lines.
Each log line includes (at minimum) job id, status/error, timing, and a summary string (systemEvent text for main jobs, and the last agent text output for isolated jobs).
Compatibility policy (cron.add/cron.update)
To keep older clients working, the Gateway applies best-effort normalization for cron.add and cron.update:
- Accepts wrapped payloads under
dataorjoband unwraps them. - Infers
schedule.kindfromatMs,everyMs, orexprif missing. - Infers
payload.kindfromtext(systemEvent) ormessage(agentTurn) if missing. - Defaults
wakeModeto"next-heartbeat"when omitted. - Defaults
sessionTargetbased on payload kind (systemEvent→"main",agentTurn→"isolated").
Normalization is compat-only. New clients should send the full schema (including kind, sessionTarget, and wakeMode) to avoid ambiguity. Unknown fields are still rejected by schema validation.
Gateway API
New methods (names can be bikeshed; cron.* is suggested):
-
wake- params:
{ mode: "now" | "next-heartbeat", text: string } - effect:
enqueueSystemEvent(text), plus optional immediate heartbeat trigger
- params:
-
cron.list- params: optional
{ includeDisabled?: boolean } - returns:
{ jobs: CronJob[] }
- params: optional
-
cron.add- params: job payload without
id/state(server generates and returns created job)
- params: job payload without
-
cron.update- params:
{ id: string, patch: Partial<CronJobWritableFields> }
- params:
-
cron.remove- params:
{ id: string }
- params:
-
cron.run- params:
{ id: string, mode?: "due" | "force" }(debugging; does not change schedule unlessforcerequires it)
- params:
-
cron.runs- params:
{ id: string, limit?: number } - returns:
{ entries: CronRunLogEntry[] } - note:
idis required (runs are stored per-job).
- params:
The Gateway should broadcast a cron event for UI/debug:
- event:
cron- payload:
{ jobId, action: "added"|"updated"|"removed"|"started"|"finished", status?, error?, nextRunAtMs? }
- payload:
CLI surface
Add a cron command group (all commands should also support --json where sensible):
-
clawdbot cron list [--json] [--all] -
clawdbot cron add ...- schedule flags:
--at <iso8601|ms|relative>(one-shot)--every <duration>(e.g.10m,1h)--cron "<expr>" [--tz "<tz>"]
- target flags:
--session main|isolated--wake now|next-heartbeat
- payload flags (choose one):
--system-event "<text>"--message "<agent message>" [--deliver] [--channel last|whatsapp|telegram|discord|slack|signal|imessage] [--to <dest>]
- schedule flags:
-
clawdbot cron edit <id> ...(patch-by-flags, non-interactive) -
clawdbot cron rm <id> -
clawdbot cron enable <id>/clawdbot cron disable <id> -
clawdbot cron run <id> [--force](debug) -
clawdbot cron runs --id <id> [--limit <n>](run history) -
clawdbot cron status(scheduler enabled + next wake)
Additionally:
clawdbot wake --mode now|next-heartbeat --text "<text>"as a thin wrapper aroundwakefor agents to call.
Examples
Run once at a specific time
One-shot reminder that targets the main session and triggers a heartbeat immediately at the scheduled time:
clawdbot cron add \
--at "2025-12-14T07:00:00-08:00" \
--session main \
--wake now \
--system-event "Alarm: wake up (meeting in 30 minutes)."
Run daily (calendar-accurate)
Daily at 07:00 in a specific timezone (preferred over “every 24h” to avoid DST drift):
clawdbot cron add \
--cron "0 7 * * *" \
--tz "America/Los_Angeles" \
--session isolated \
--wake now \
--message "Daily check: scan calendar + inbox; deliver only if urgent." \
--deliver \
--channel last
Run weekly (every Wednesday)
Every Wednesday at 09:00:
clawdbot cron add \
--cron "0 9 * * 3" \
--tz "America/Los_Angeles" \
--session isolated \
--wake now \
--message "Weekly: summarize status and remind me of goals." \
--deliver \
--channel last
“Next heartbeat”
Enqueue a note for the main session but let the existing heartbeat cadence pick it up:
clawdbot wake --mode next-heartbeat --text "Next heartbeat: check battery + upcoming meetings."
Logging & observability
Logging requirements:
- Use
getChildLogger({ module: "cron", jobId, runId, name })for every run. - Log lifecycle:
- store load/save (debug; include job count)
- schedule recompute (debug; include nextRunAt)
- job start/end (info)
- job skipped (info; include reason)
- job error (warn; include error + stack where available)
- Emit a concise user-facing line to stdout when running in CLI mode (similar to heartbeat logs).
Suggested log events:
cron: scheduler started(jobCount, nextWakeAt)cron: job started(jobId, scheduleKind, sessionTarget, wakeMode)cron: job finished(status, durationMs, nextRunAtMs)- When
cron.enabledis false, the Gateway logscron: disabledand jobs will not run automatically (the CLI warns oncron add/cron edit). - Use
clawdbot cron statusto confirm the scheduler is enabled and see the next wake time.
Safety & security
- Respect existing allowlists/routing rules: delivery defaults should not send to arbitrary destinations unless explicitly configured.
- Provide a global “kill switch”:
cron.enabled: boolean(defaulttrue).gateway method set-heartbeatsalready exists; cron should have similar.
- Avoid persistence of sensitive payloads unless requested; job text may contain private content.
Testing plan (v1)
- Unit tests:
- schedule computation for
atandevery - job store read/write + migration behavior
- lane concurrency: main vs cron overlap is bounded
- “wake now” coalescing and pending behavior when provider not ready
- schedule computation for
- Integration tests:
- start Gateway with
CLAWDBOT_SKIP_PROVIDERS=1, add jobs, list/edit/remove - simulate due jobs and assert
enqueueSystemEventcalled + cron events broadcast
- start Gateway with
Rollout plan
- Add the
wakeprimitive + heartbeat wake hook (no persistent jobs yet). - Add
cron.*API and CLI wrappers withat+every. - Add optional cron expression parsing (
kind:"cron") if needed. - Add UI surfacing in WebChat/macOS app (optional).