---
role: sqlite-state-management
status: experimental
summary: |
  SQLite-based state management for OpenProse programs. This approach persists
  execution state to a SQLite database, enabling structured queries, atomic
  transactions, and flexible schema evolution.
requires: sqlite3 CLI tool in PATH
see-also:
  - ../prose.md: VM execution semantics
  - filesystem.md: File-based state (default, more prescriptive)
  - in-context.md: In-context state (for simple programs)
  - ../primitives/session.md: Session context and compaction guidelines
---

# SQLite State Management (Experimental)

This document describes how the OpenProse VM tracks execution state using a **SQLite database**. This is an experimental alternative to file-based state (`filesystem.md`) and in-context state (`in-context.md`).

## Prerequisites

**Requires:** The `sqlite3` command-line tool must be available in your PATH.

| Platform | Installation |
|----------|--------------|
| macOS | Pre-installed |
| Linux | `apt install sqlite3` / `dnf install sqlite` / etc. |
| Windows | `winget install SQLite.SQLite` or download from sqlite.org |

If `sqlite3` is not available, the VM will fall back to filesystem state and warn the user.

---

## Overview

SQLite state provides:

- **Atomic transactions**: State changes are ACID-compliant
- **Structured queries**: Find specific bindings, filter by status, aggregate results
- **Flexible schema**: Add columns and tables as needed
- **Single-file portability**: The entire run state is one `.db` file
- **Concurrent access**: SQLite handles file locking automatically (concurrent readers, one writer at a time)

**Key principle:** The database is a flexible workspace. The VM and subagents share it as a coordination mechanism, not a rigid contract.
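The fallback rule from Prerequisites (use filesystem state and warn the user when `sqlite3` is missing) can be sketched as follows. This is a minimal illustration, not the VM's actual logic: `choose_state_mode` and its `which` parameter are hypothetical names, and the `"filesystem"` mode string is an assumption made for the example.

```python
import shutil
import sys


def choose_state_mode(which=shutil.which):
    """Return 'sqlite' when the sqlite3 CLI is on PATH, else 'filesystem'.

    `which` is injectable so the check can be exercised without touching
    the real PATH (a hypothetical convenience, not part of OpenProse).
    """
    if which("sqlite3") is None:
        # Warn the user, then fall back to the default file-based state.
        print(
            "warning: sqlite3 not found in PATH; falling back to filesystem state",
            file=sys.stderr,
        )
        return "filesystem"  # assumed mode name
    return "sqlite"


if __name__ == "__main__":
    print(choose_state_mode())
```

Checking once at run start keeps the decision in one place, so a run never mixes state modes.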
---

## Database Location

The database lives within the standard run directory:

```
.prose/runs/{YYYYMMDD}-{HHMMSS}-{random}/
├── state.db          # SQLite database (this file)
├── program.prose     # Copy of running program
└── attachments/      # Large outputs that don't fit in DB (optional)
```

**Run ID format:** Same as filesystem state: `{YYYYMMDD}-{HHMMSS}-{random6}`

Example: `.prose/runs/20260116-143052-a7b3c9/state.db`

### Project-Scoped and User-Scoped Agents

Execution-scoped agents (the default) live in the per-run `state.db`. However, **project-scoped agents** (`persist: project`) and **user-scoped agents** (`persist: user`) must survive across runs.

For project-scoped agents, use a separate database:

```
.prose/
├── agents.db         # Project-scoped agent memory (survives runs)
└── runs/
    └── {id}/
        └── state.db  # Execution-scoped state (dies with run)
```

For user-scoped agents, use a database in the home directory:

```
~/.prose/
└── agents.db         # User-scoped agent memory (survives across projects)
```

The `agents` and `agent_segments` tables for project-scoped agents live in `.prose/agents.db`, and for user-scoped agents live in `~/.prose/agents.db`. The VM initializes these databases on first use and provides the correct path to subagents.

---

## Responsibility Separation

This section defines **who does what**. This is the contract between the VM and subagents.
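One VM-side duty implied by the database layout is choosing which database path to hand to a subagent. The sketch below shows one way to do that. The helper names `new_run_id` and `agent_db_path` are hypothetical, the random suffix is assumed to be hex (as the example ID suggests), and treating `None` or `"execution"` as the default scope is an assumption.

```python
import re
import secrets
from datetime import datetime
from pathlib import PurePosixPath


def new_run_id(now=None):
    """Build a run ID shaped like 20260116-143052-a7b3c9."""
    now = now or datetime.now()
    # Assumption: the 6-character suffix is random hex, as in the example ID.
    return f"{now:%Y%m%d}-{now:%H%M%S}-{secrets.token_hex(3)}"


def agent_db_path(persist, run_id, project_root=".", home="~"):
    """Pick the database that holds an agent's memory for a given scope."""
    if persist == "project":
        return PurePosixPath(project_root) / ".prose" / "agents.db"
    if persist == "user":
        return PurePosixPath(home) / ".prose" / "agents.db"
    if persist in (None, "execution"):  # assumed spelling of the default scope
        return PurePosixPath(project_root) / ".prose" / "runs" / run_id / "state.db"
    raise ValueError(f"unknown persist scope: {persist!r}")


if __name__ == "__main__":
    run_id = new_run_id()
    for scope in (None, "project", "user"):
        print(scope, "->", agent_db_path(scope, run_id))
```

The `~` is left unexpanded here on purpose, so the returned path matches the layout shown in the document. A real caller would expand it before opening the file.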
### VM Responsibilities

The VM (the orchestrating agent running the .prose program) is responsible for:

| Responsibility | Description |
|----------------|-------------|
| **Database creation** | Create `state.db` and initialize core tables at run start |
| **Program registration** | Store the program source and metadata |
| **Execution tracking** | Update position, status, and timing as statements execute |
| **Subagent spawning** | Spawn sessions via Task tool with database path and instructions |
| **Parallel coordination** | Track branch status, implement join strategies |
| **Loop management** | Track iteration counts, evaluate conditions |
| **Error aggregation** | Record failures, manage retry state |
| **Context preservation** | Maintain sufficient narration in the main conversation thread so execution can be understood and resumed |
| **Completion detection** | Mark the run as complete when finished |

**Critical:** The VM must preserve enough context in its own conversation to understand execution state without re-reading the entire database. The database is for coordination and persistence, not a replacement for working memory.

### Subagent Responsibilities

Subagents (sessions spawned by the VM) are responsible for:

| Responsibility | Description |
|----------------|-------------|
| **Writing own outputs** | Insert/update their binding in the `bindings` table |
| **Memory management** | For persistent agents: read and update their memory record |
| **Segment recording** | For persistent agents: append segment history |
| **Attachment handling** | Write large outputs to `attachments/` directory, store path in DB |
| **Atomic writes** | Use transactions when updating multiple related records |

**Critical:** Subagents write ONLY to `bindings`, `agents`, and `agent_segments` tables. The VM owns the `execution` table entirely. Completion signaling happens through the substrate (Task tool return), not database updates.
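The "atomic writes" responsibility can be illustrated with a persistent agent that updates its memory and appends a segment in a single transaction. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` module rather than the CLI, an in-memory database, and a hypothetical helper named `record_segment`. The table definitions follow the Core Schema section.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    name TEXT PRIMARY KEY, scope TEXT, memory TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS agent_segments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_name TEXT REFERENCES agents(name),
    segment_number INTEGER,
    timestamp TEXT DEFAULT (datetime('now')),
    prompt TEXT, summary TEXT,
    UNIQUE(agent_name, segment_number)
);
"""


def record_segment(conn, agent, memory, prompt, summary, segment_number=None):
    """Update an agent's memory and append a segment as one atomic unit."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "UPDATE agents SET memory = ?, updated_at = datetime('now') WHERE name = ?",
            (memory, agent),
        )
        if segment_number is None:
            # Next segment number for this agent (1 when no segments exist yet).
            (segment_number,) = conn.execute(
                "SELECT COALESCE(MAX(segment_number), 0) + 1 "
                "FROM agent_segments WHERE agent_name = ?",
                (agent,),
            ).fetchone()
        conn.execute(
            "INSERT INTO agent_segments (agent_name, segment_number, prompt, summary) "
            "VALUES (?, ?, ?, ?)",
            (agent, segment_number, prompt, summary),
        )
    return segment_number


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO agents (name, scope, memory) VALUES ('captain', 'execution', 'initial')"
)
conn.commit()
```

If the segment insert fails (for example, on a duplicate `segment_number`), the memory update is rolled back with it, so the two records never drift apart. With the `sqlite3` CLI, the same effect comes from wrapping both statements in `BEGIN; ... COMMIT;`.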
**Critical:** Subagents must write their outputs directly to the database. The VM does not write subagent outputs; it only reads them after the subagent completes.

**What subagents return to the VM:** A confirmation message with the binding location, not the full content:

**Root scope:**
```
Binding written: research
Location: .prose/runs/20260116-143052-a7b3c9/state.db (bindings table, name='research', execution_id=NULL)
Summary: AI safety research covering alignment, robustness, and interpretability with 15 citations.
```

**Inside block invocation:**
```
Binding written: result
Location: .prose/runs/20260116-143052-a7b3c9/state.db (bindings table, name='result', execution_id=43)
Execution ID: 43
Summary: Processed chunk into 3 sub-parts for recursive processing.
```

The VM tracks locations, not values. This keeps the VM's context lean and enables arbitrarily large intermediate values.

### Shared Concerns

| Concern | Who Handles |
|---------|-------------|
| Schema evolution | Either (use `CREATE TABLE IF NOT EXISTS`, `ALTER TABLE` as needed) |
| Custom tables | Either (prefix with `x_` for extensions) |
| Indexing | Either (add indexes for frequently-queried columns) |
| Cleanup | VM (at run end, optionally vacuum) |

---

## Core Schema

The VM initializes these tables. This is a **minimum viable schema**; extend freely.
```sql
-- Run metadata
CREATE TABLE IF NOT EXISTS run (
    id TEXT PRIMARY KEY,
    program_path TEXT,
    program_source TEXT,
    started_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now')),
    status TEXT DEFAULT 'running',  -- running, completed, failed, interrupted
    state_mode TEXT DEFAULT 'sqlite'
);

-- Execution position and history
CREATE TABLE IF NOT EXISTS execution (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    statement_index INTEGER,
    statement_text TEXT,
    status TEXT,  -- pending, executing, completed, failed, skipped
    started_at TEXT,
    completed_at TEXT,
    error_message TEXT,
    parent_id INTEGER REFERENCES execution(id),  -- for nested blocks
    metadata TEXT  -- JSON for construct-specific data (loop iteration, parallel branch, etc.)
);

-- All named values (input, output, let, const)
CREATE TABLE IF NOT EXISTS bindings (
    name TEXT,
    execution_id INTEGER,  -- NULL for root scope, non-null for block invocations
    kind TEXT,  -- input, output, let, const
    value TEXT,
    source_statement TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now')),
    attachment_path TEXT  -- if value is too large, store path to file
);

-- One binding per (name, scope). SQLite rejects expressions inside a
-- PRIMARY KEY constraint, so uniqueness is enforced with an expression index;
-- IFNULL maps the root scope (NULL) to -1.
CREATE UNIQUE INDEX IF NOT EXISTS idx_bindings_name_scope
    ON bindings (name, IFNULL(execution_id, -1));

-- Persistent agent memory
CREATE TABLE IF NOT EXISTS agents (
    name TEXT PRIMARY KEY,
    scope TEXT,  -- execution, project, user, custom
    memory TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- Agent invocation history
CREATE TABLE IF NOT EXISTS agent_segments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_name TEXT REFERENCES agents(name),
    segment_number INTEGER,
    timestamp TEXT DEFAULT (datetime('now')),
    prompt TEXT,
    summary TEXT,
    UNIQUE(agent_name, segment_number)
);

-- Import registry
CREATE TABLE IF NOT EXISTS imports (
    alias TEXT PRIMARY KEY,
    source_url TEXT,
    fetched_at TEXT,
    inputs_schema TEXT,  -- JSON
    outputs_schema TEXT  -- JSON
);
```

### Schema Conventions

- **Timestamps**: Use ISO 8601 format
(`datetime('now')`)
- **JSON fields**: Store structured data as JSON text in `metadata`, `*_schema` columns
- **Large values**: If a binding value exceeds ~100KB, write to `attachments/{name}.md` and store path
- **Extension tables**: Prefix with `x_` (e.g., `x_metrics`, `x_audit_log`)
- **Anonymous bindings**: Sessions without explicit capture (`session "..."` without `let x =`) use auto-generated names: `anon_001`, `anon_002`, etc.
- **Import bindings**: Prefix with import alias for scoping: `research.findings`, `research.sources`
- **Scoped bindings**: Use `execution_id` column: NULL for root scope, non-null for block invocations

### Scope Resolution Query

For recursive blocks, bindings are scoped to their execution frame. Resolve variables by walking up the call stack:

```sql
-- Find binding 'result' starting from execution_id 43
-- (NULLS LAST requires SQLite 3.30.0 or newer)
WITH RECURSIVE scope_chain AS (
    -- Start with current execution
    SELECT id, parent_id FROM execution WHERE id = 43
    UNION ALL
    -- Walk up to parent
    SELECT e.id, e.parent_id
    FROM execution e
    JOIN scope_chain s ON e.id = s.parent_id
)
SELECT b.* FROM bindings b
LEFT JOIN scope_chain s ON b.execution_id = s.id
WHERE b.name = 'result'
  AND (b.execution_id IN (SELECT id FROM scope_chain) OR b.execution_id IS NULL)
ORDER BY
    CASE WHEN b.execution_id IS NULL THEN 1 ELSE 0 END,  -- Prefer scoped over root
    s.id DESC NULLS LAST  -- Prefer deeper (more local) scope; relies on child frames having larger ids than their parents
LIMIT 1;
```

**Simpler version if you know the scope chain:**

```sql
-- Direct lookup: check current scope, then parent, then root
SELECT * FROM bindings
WHERE name = 'result'
  AND (execution_id = 43 OR execution_id = 42 OR execution_id IS NULL)
ORDER BY execution_id DESC NULLS LAST
LIMIT 1;
```

---

## Database Interaction

Both VM and subagents interact via the `sqlite3` CLI.

### From the VM

```bash
# Initialize database
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "CREATE TABLE IF NOT EXISTS..."

# Update execution position
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT INTO execution (statement_index, statement_text, status, started_at)
  VALUES (3, 'session \"Research AI safety\"', 'executing', datetime('now'))
"

# Read a binding
sqlite3 -json .prose/runs/20260116-143052-a7b3c9/state.db "
  SELECT value FROM bindings WHERE name = 'research'
"

# Check parallel branch status
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  SELECT statement_text, status FROM execution
  WHERE json_extract(metadata, '$.parallel_id') = 'p1'
"
```

### From Subagents

The VM provides the database path and instructions when spawning:

**Root scope (outside block invocations):**

```
Your output database is:
  .prose/runs/20260116-143052-a7b3c9/state.db

When complete, write your output:

sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT OR REPLACE INTO bindings (name, execution_id, kind, value, source_statement, updated_at)
  VALUES (
    'research',
    NULL,  -- root scope
    'let',
    'AI safety research covers alignment, robustness...',
    'let research = session: researcher',
    datetime('now')
  )
"
```

**Inside block invocation (include execution_id):**

```
Execution scope:
  execution_id: 43
  block: process
  depth: 3

Your output database is:
  .prose/runs/20260116-143052-a7b3c9/state.db

When complete, write your output:

sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT OR REPLACE INTO bindings (name, execution_id, kind, value, source_statement, updated_at)
  VALUES (
    'result',
    43,  -- scoped to this execution
    'let',
    'Processed chunk into 3 sub-parts...',
    'let result = session \"Process chunk\"',
    datetime('now')
  )
"
```

For persistent agents (execution-scoped):

```
Your memory is in the database:
  .prose/runs/20260116-143052-a7b3c9/state.db

Read your current state:
  sqlite3 -json .prose/runs/20260116-143052-a7b3c9/state.db "SELECT memory FROM agents WHERE name = 'captain'"

Update when done:
  sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "UPDATE agents SET memory = '...',
    updated_at = datetime('now') WHERE name = 'captain'"

Record this segment:
  sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "INSERT INTO agent_segments (agent_name, segment_number, prompt, summary) VALUES ('captain', 3, '...', '...')"
```

For project-scoped agents, use `.prose/agents.db`. For user-scoped agents, use `~/.prose/agents.db`.

---

## Context Preservation in Main Thread

**This is critical.** The database is for persistence and coordination, but the VM must still maintain conversational context.

### What the VM Must Narrate

Even with SQLite state, the VM should narrate key events in its conversation:

```
[Position] Statement 3: let research = session: researcher
   Spawning session, will write to state.db
   [Task tool call]
[Success] Session complete, binding written to DB
[Binding] research =
```

### Why Both?

| Purpose | Mechanism |
|---------|-----------|
| **Working memory** | Conversation narration (what the VM "remembers" without re-querying) |
| **Durable state** | SQLite database (survives context limits, enables resumption) |
| **Subagent coordination** | SQLite database (shared access point) |
| **Debugging/inspection** | SQLite database (queryable history) |

The narration is the VM's "mental model" of execution. The database is the "source of truth" for resumption and inspection.

---

## Parallel Execution

For parallel blocks, the VM uses the `metadata` JSON field to track branches.
**Only the VM writes to the `execution` table.**

```sql
-- VM marks parallel start
INSERT INTO execution (statement_index, statement_text, status, metadata)
VALUES (5, 'parallel:', 'executing', '{"parallel_id": "p1", "strategy": "all", "branches": ["a", "b", "c"]}');

-- VM creates execution record for each branch
-- (parent_id is the execution.id of the 'parallel:' row, assumed to be 5 here)
INSERT INTO execution (statement_index, statement_text, status, parent_id, metadata)
VALUES (6, 'a = session "Task A"', 'executing', 5, '{"parallel_id": "p1", "branch": "a"}');

-- Subagent writes its output to bindings table (see "From Subagents" section)
-- Task tool signals completion to VM via substrate

-- VM marks branch complete after Task returns
UPDATE execution SET status = 'completed', completed_at = datetime('now')
WHERE json_extract(metadata, '$.parallel_id') = 'p1'
  AND json_extract(metadata, '$.branch') = 'a';

-- VM checks if all branches complete
SELECT COUNT(*) as pending FROM execution
WHERE json_extract(metadata, '$.parallel_id') = 'p1'
  AND json_extract(metadata, '$.branch') IS NOT NULL  -- exclude the parent 'parallel:' row
  AND status != 'completed';
```

---

## Loop Tracking

```sql
-- Loop metadata tracks iteration state
INSERT INTO execution (statement_index, statement_text, status, metadata)
VALUES (10, 'loop until **analysis complete** (max: 5):', 'executing',
        '{"loop_id": "l1", "max_iterations": 5, "current_iteration": 0, "condition": "**analysis complete**"}');

-- Update iteration
UPDATE execution
SET metadata = json_set(metadata, '$.current_iteration', 2)
WHERE json_extract(metadata, '$.loop_id') = 'l1';
```

---

## Error Handling

```sql
-- Record failure
UPDATE execution
SET status = 'failed',
    error_message = 'Connection timeout after 30s',
    completed_at = datetime('now')
WHERE id = 15;

-- Track retry attempts in metadata
UPDATE execution
SET metadata = json_set(metadata, '$.retry_attempt', 2, '$.max_retries', 3)
WHERE id = 15;
```

---

## Large Outputs

When a binding value is too large for comfortable database storage (>100KB):

1. Write content to `attachments/{binding_name}.md`
2. Store the path in the `attachment_path` column
3. Leave `value` as a summary or null

```sql
INSERT INTO bindings (name, kind, value, attachment_path, source_statement)
VALUES (
    'full_report',
    'let',
    'Full analysis report (847KB) - see attachment',
    'attachments/full_report.md',
    'let full_report = session "Generate comprehensive report"'
);
```

---

## Resuming Execution

To resume an interrupted run:

```sql
-- Find current position
SELECT statement_index, statement_text, status
FROM execution
WHERE status = 'executing'
ORDER BY id DESC LIMIT 1;

-- Get all completed bindings
SELECT name, kind, value, attachment_path FROM bindings;

-- Get agent memory states
SELECT name, memory FROM agents;

-- Check parallel block status
SELECT json_extract(metadata, '$.branch') as branch, status
FROM execution
WHERE json_extract(metadata, '$.parallel_id') IS NOT NULL
  AND parent_id = (SELECT id FROM execution WHERE status = 'executing' AND statement_text LIKE 'parallel:%');
```

---

## Flexibility Encouragement

Unlike filesystem state, SQLite state is intentionally **less prescriptive**. The core schema is a starting point. You are encouraged to:

- **Add columns** to existing tables as needed
- **Create extension tables** (prefix with `x_`)
- **Store custom metrics** (timing, token counts, model info)
- **Build indexes** for your query patterns
- **Use JSON functions** for semi-structured data

Example extensions:

```sql
-- Custom metrics table
CREATE TABLE IF NOT EXISTS x_metrics (
    execution_id INTEGER REFERENCES execution(id),
    metric_name TEXT,
    metric_value REAL,
    recorded_at TEXT DEFAULT (datetime('now'))
);

-- Add custom column
ALTER TABLE bindings ADD COLUMN token_count INTEGER;

-- Create index for common query
CREATE INDEX IF NOT EXISTS idx_execution_status ON execution(status);
```

The database is your workspace. Use it.
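One practical wrinkle when extending the schema: SQLite has no `ADD COLUMN IF NOT EXISTS`, so re-running an `ALTER TABLE ... ADD COLUMN` on a resumed run fails with a "duplicate column name" error. A minimal sketch of an idempotent helper follows. It uses Python's built-in `sqlite3` module, and `add_column_if_missing` is a hypothetical name introduced for this example.

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` unless it already exists.

    Returns True when the column was added, False when it was present.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bindings (name TEXT, value TEXT)")

first = add_column_if_missing(conn, "bindings", "token_count", "INTEGER")
second = add_column_if_missing(conn, "bindings", "token_count", "INTEGER")
print(first, second)
```

Because the VM and any subagent may both extend the schema, a check like this lets either side run its setup step without knowing whether the other got there first.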
---

## Comparison with Other Modes

| Aspect | filesystem.md | in-context.md | sqlite.md |
|--------|---------------|---------------|-----------|
| **State location** | `.prose/runs/{id}/` files | Conversation history | `.prose/runs/{id}/state.db` |
| **Queryable** | Via file reads | No | Yes (SQL) |
| **Atomic updates** | No | N/A | Yes (transactions) |
| **Schema flexibility** | Rigid file structure | N/A | Flexible (add tables/columns) |
| **Resumption** | Read state.md | Re-read conversation | Query database |
| **Complexity ceiling** | High | Low (<30 statements) | High |
| **Dependency** | None | None | sqlite3 CLI |
| **Status** | Stable | Stable | **Experimental** |

---

## Summary

SQLite state management:

1. Uses a **single database file** per run
2. Provides **clear responsibility separation** between VM and subagents
3. Enables **structured queries** for state inspection
4. Supports **atomic transactions** for reliable updates
5. Allows **flexible schema evolution** as needed
6. Requires the **sqlite3 CLI** tool
7. Is **experimental**; expect changes

The core contract: the VM manages execution flow and spawns subagents; subagents write their own outputs directly to the database. Both maintain the principle that what happens is recorded, and what is recorded can be queried.
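The core contract can be walked through end to end in a few lines. This is a sketch under stated assumptions, not the VM's implementation: it uses Python's built-in `sqlite3` module with an in-memory database, a trimmed-down version of the `execution` and `bindings` tables, and hypothetical helper names (`vm_start`, `subagent_write`, `vm_complete`). Uniqueness per (name, scope) is enforced with a unique expression index, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE execution (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    statement_index INTEGER, statement_text TEXT, status TEXT,
    started_at TEXT, completed_at TEXT
);
CREATE TABLE bindings (
    name TEXT, execution_id INTEGER, kind TEXT, value TEXT,
    updated_at TEXT DEFAULT (datetime('now'))
);
-- Assumed uniqueness rule: one binding per (name, scope); NULL scope maps to -1.
CREATE UNIQUE INDEX idx_bindings_name_scope
    ON bindings (name, IFNULL(execution_id, -1));
""")


def vm_start(conn, index, text):
    """VM side: record that a statement began executing; returns its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO execution (statement_index, statement_text, status, started_at) "
            "VALUES (?, ?, 'executing', datetime('now'))",
            (index, text),
        )
    return cur.lastrowid


def subagent_write(conn, name, value, execution_id=None):
    """Subagent side: write (or overwrite) its own binding."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO bindings (name, execution_id, kind, value, updated_at) "
            "VALUES (?, ?, 'let', ?, datetime('now'))",
            (name, execution_id, value),
        )


def vm_complete(conn, exec_id):
    """VM side: mark the statement complete after the subagent returns."""
    with conn:
        conn.execute(
            "UPDATE execution SET status = 'completed', completed_at = datetime('now') "
            "WHERE id = ?",
            (exec_id,),
        )


exec_id = vm_start(conn, 3, "let research = session: researcher")
subagent_write(conn, "research", "draft findings")
subagent_write(conn, "research", "final findings")  # replaces the draft
vm_complete(conn, exec_id)
```

Only the VM helpers touch `execution`, and only the subagent helper touches `bindings`, which mirrors the ownership split described in Responsibility Separation.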