clawdbot/extensions/open-prose/skills/prose/state/sqlite.md

---
role: sqlite-state-management
status: experimental
summary: |
  SQLite-based state management for OpenProse programs. This approach persists
  execution state to a SQLite database, enabling structured queries, atomic
  transactions, and flexible schema evolution.
requires: sqlite3 CLI tool in PATH
see-also:
  - ../prose.md: VM execution semantics
  - filesystem.md: File-based state (default, more prescriptive)
  - in-context.md: In-context state (for simple programs)
  - ../primitives/session.md: Session context and compaction guidelines
---

# SQLite State Management (Experimental)

This document describes how the OpenProse VM tracks execution state using a **SQLite database**. This is an experimental alternative to file-based state (`filesystem.md`) and in-context state (`in-context.md`).

## Prerequisites

**Requires:** The `sqlite3` command-line tool must be available in your PATH.

| Platform | Installation |
|----------|--------------|
| macOS | Pre-installed |
| Linux | `apt install sqlite3` / `dnf install sqlite3` / etc. |
| Windows | `winget install SQLite.SQLite` or download from sqlite.org |

If `sqlite3` is not available, the VM will fall back to filesystem state and warn the user.

---

## Overview

SQLite state provides:

- **Atomic transactions**: State changes are ACID-compliant
- **Structured queries**: Find specific bindings, filter by status, aggregate results
- **Flexible schema**: Add columns and tables as needed
- **Single-file portability**: The entire run state is one `.db` file
- **Concurrent access**: SQLite handles locking automatically

**Key principle:** The database is a flexible workspace. The VM and subagents share it as a coordination mechanism, not a rigid contract.

---

## Database Location

The database lives within the standard run directory:

```
.prose/runs/{YYYYMMDD}-{HHMMSS}-{random}/
├── state.db          # SQLite database (this file)
├── program.prose     # Copy of running program
└── attachments/      # Large outputs that don't fit in DB (optional)
```

**Run ID format:** Same as filesystem state: `{YYYYMMDD}-{HHMMSS}-{random6}`

Example: `.prose/runs/20260116-143052-a7b3c9/state.db`

### Project-Scoped and User-Scoped Agents

Execution-scoped agents (the default) live in the per-run `state.db`. However, **project-scoped agents** (`persist: project`) and **user-scoped agents** (`persist: user`) must survive across runs.

For project-scoped agents, use a separate database:

```
.prose/
├── agents.db                 # Project-scoped agent memory (survives runs)
└── runs/
    └── {id}/
        └── state.db          # Execution-scoped state (dies with run)
```

For user-scoped agents, use a database in the home directory:

```
~/.prose/
└── agents.db                 # User-scoped agent memory (survives across projects)
```

The `agents` and `agent_segments` tables for project-scoped agents live in `.prose/agents.db`, and for user-scoped agents live in `~/.prose/agents.db`. The VM initializes these databases on first use and provides the correct path to subagents.

---

## Responsibility Separation

This section defines **who does what**. This is the contract between the VM and subagents.

### VM Responsibilities

The VM (the orchestrating agent running the .prose program) is responsible for:

| Responsibility | Description |
|----------------|-------------|
| **Database creation** | Create `state.db` and initialize core tables at run start |
| **Program registration** | Store the program source and metadata |
| **Execution tracking** | Update position, status, and timing as statements execute |
| **Subagent spawning** | Spawn sessions via Task tool with database path and instructions |
| **Parallel coordination** | Track branch status, implement join strategies |
| **Loop management** | Track iteration counts, evaluate conditions |
| **Error aggregation** | Record failures, manage retry state |
| **Context preservation** | Maintain sufficient narration in the main conversation thread so execution can be understood and resumed |
| **Completion detection** | Mark the run as complete when finished |

**Critical:** The VM must preserve enough context in its own conversation to understand execution state without re-reading the entire database. The database is for coordination and persistence, not a replacement for working memory.

### Subagent Responsibilities

Subagents (sessions spawned by the VM) are responsible for:

| Responsibility | Description |
|----------------|-------------|
| **Writing own outputs** | Insert/update their binding in the `bindings` table |
| **Memory management** | For persistent agents: read and update their memory record |
| **Segment recording** | For persistent agents: append segment history |
| **Attachment handling** | Write large outputs to `attachments/` directory, store path in DB |
| **Atomic writes** | Use transactions when updating multiple related records |

**Critical:** Subagents write ONLY to `bindings`, `agents`, and `agent_segments` tables. The VM owns the `execution` table entirely. Completion signaling happens through the substrate (Task tool return), not database updates.

**Critical:** Subagents must write their outputs directly to the database. The VM does not write subagent outputs—it only reads them after the subagent completes.

**What subagents return to the VM:** A confirmation message with the binding location—not the full content:

**Root scope:**
```
Binding written: research
Location: .prose/runs/20260116-143052-a7b3c9/state.db (bindings table, name='research', execution_id=NULL)
Summary: AI safety research covering alignment, robustness, and interpretability with 15 citations.
```

**Inside block invocation:**
```
Binding written: result
Location: .prose/runs/20260116-143052-a7b3c9/state.db (bindings table, name='result', execution_id=43)
Execution ID: 43
Summary: Processed chunk into 3 sub-parts for recursive processing.
```

The VM tracks locations, not values. This keeps the VM's context lean and enables arbitrarily large intermediate values.

### Shared Concerns

| Concern | Who Handles |
|---------|-------------|
| Schema evolution | Either (use `CREATE TABLE IF NOT EXISTS`, `ALTER TABLE` as needed) |
| Custom tables | Either (prefix with `x_` for extensions) |
| Indexing | Either (add indexes for frequently-queried columns) |
| Cleanup | VM (at run end, optionally vacuum) |

---

## Core Schema

The VM initializes these tables. This is a **minimum viable schema**—extend freely.

```sql
-- Run metadata
CREATE TABLE IF NOT EXISTS run (
    id TEXT PRIMARY KEY,
    program_path TEXT,
    program_source TEXT,
    started_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now')),
    status TEXT DEFAULT 'running',  -- running, completed, failed, interrupted
    state_mode TEXT DEFAULT 'sqlite'
);

-- Execution position and history
CREATE TABLE IF NOT EXISTS execution (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    statement_index INTEGER,
    statement_text TEXT,
    status TEXT,  -- pending, executing, completed, failed, skipped
    started_at TEXT,
    completed_at TEXT,
    error_message TEXT,
    parent_id INTEGER REFERENCES execution(id),  -- for nested blocks
    metadata TEXT  -- JSON for construct-specific data (loop iteration, parallel branch, etc.)
);

-- All named values (input, output, let, const)
CREATE TABLE IF NOT EXISTS bindings (
    name TEXT,
    execution_id INTEGER,  -- NULL for root scope, non-null for block invocations
    kind TEXT,  -- input, output, let, const
    value TEXT,
    source_statement TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now')),
    attachment_path TEXT,  -- if value is too large, store path to file
    PRIMARY KEY (name, IFNULL(execution_id, -1))  -- IFNULL handles NULL for root scope
);

-- Persistent agent memory
CREATE TABLE IF NOT EXISTS agents (
    name TEXT PRIMARY KEY,
    scope TEXT,  -- execution, project, user, custom
    memory TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- Agent invocation history
CREATE TABLE IF NOT EXISTS agent_segments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_name TEXT REFERENCES agents(name),
    segment_number INTEGER,
    timestamp TEXT DEFAULT (datetime('now')),
    prompt TEXT,
    summary TEXT,
    UNIQUE(agent_name, segment_number)
);

-- Import registry
CREATE TABLE IF NOT EXISTS imports (
    alias TEXT PRIMARY KEY,
    source_url TEXT,
    fetched_at TEXT,
    inputs_schema TEXT,  -- JSON
    outputs_schema TEXT  -- JSON
);
```

### Schema Conventions

- **Timestamps**: Use ISO 8601 format (`datetime('now')`)
- **JSON fields**: Store structured data as JSON text in `metadata`, `*_schema` columns
- **Large values**: If a binding value exceeds ~100KB, write to `attachments/{name}.md` and store path
- **Extension tables**: Prefix with `x_` (e.g., `x_metrics`, `x_audit_log`)
- **Anonymous bindings**: Sessions without explicit capture (`session "..."` without `let x =`) use auto-generated names: `anon_001`, `anon_002`, etc.
- **Import bindings**: Prefix with import alias for scoping: `research.findings`, `research.sources`
- **Scoped bindings**: Use `execution_id` column—NULL for root scope, non-null for block invocations

### Scope Resolution Query

For recursive blocks, bindings are scoped to their execution frame. Resolve variables by walking up the call stack:

```sql
-- Find binding 'result' starting from execution_id 43
WITH RECURSIVE scope_chain AS (
  -- Start with current execution
  SELECT id, parent_id FROM execution WHERE id = 43
  UNION ALL
  -- Walk up to parent
  SELECT e.id, e.parent_id
  FROM execution e
  JOIN scope_chain s ON e.id = s.parent_id
)
SELECT b.* FROM bindings b
LEFT JOIN scope_chain s ON b.execution_id = s.id
WHERE b.name = 'result'
  AND (b.execution_id IN (SELECT id FROM scope_chain) OR b.execution_id IS NULL)
ORDER BY
  CASE WHEN b.execution_id IS NULL THEN 1 ELSE 0 END,  -- Prefer scoped over root
  s.id DESC NULLS LAST  -- Prefer deeper (more local) scope
LIMIT 1;
```

**Simpler version if you know the scope chain:**

```sql
-- Direct lookup: check current scope, then parent, then root
SELECT * FROM bindings
WHERE name = 'result'
  AND (execution_id = 43 OR execution_id = 42 OR execution_id IS NULL)
ORDER BY execution_id DESC NULLS LAST
LIMIT 1;
```

---

## Database Interaction

Both VM and subagents interact via the `sqlite3` CLI.

### From the VM

```bash
# Initialize database
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "CREATE TABLE IF NOT EXISTS..."

# Update execution position
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT INTO execution (statement_index, statement_text, status, started_at)
  VALUES (3, 'session \"Research AI safety\"', 'executing', datetime('now'))
"

# Read a binding
sqlite3 -json .prose/runs/20260116-143052-a7b3c9/state.db "
  SELECT value FROM bindings WHERE name = 'research'
"

# Check parallel branch status
sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  SELECT statement_text, status FROM execution
  WHERE json_extract(metadata, '$.parallel_id') = 'p1'
"
```

### From Subagents

The VM provides the database path and instructions when spawning:

**Root scope (outside block invocations):**

```
Your output database is:
  .prose/runs/20260116-143052-a7b3c9/state.db

When complete, write your output:

sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT OR REPLACE INTO bindings (name, execution_id, kind, value, source_statement, updated_at)
  VALUES (
    'research',
    NULL,  -- root scope
    'let',
    'AI safety research covers alignment, robustness...',
    'let research = session: researcher',
    datetime('now')
  )
"
```

**Inside block invocation (include execution_id):**

```
Execution scope:
  execution_id: 43
  block: process
  depth: 3

Your output database is:
  .prose/runs/20260116-143052-a7b3c9/state.db

When complete, write your output:

sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "
  INSERT OR REPLACE INTO bindings (name, execution_id, kind, value, source_statement, updated_at)
  VALUES (
    'result',
    43,  -- scoped to this execution
    'let',
    'Processed chunk into 3 sub-parts...',
    'let result = session \"Process chunk\"',
    datetime('now')
  )
"
```

For persistent agents (execution-scoped):

```
Your memory is in the database:
  .prose/runs/20260116-143052-a7b3c9/state.db

Read your current state:
  sqlite3 -json .prose/runs/20260116-143052-a7b3c9/state.db "SELECT memory FROM agents WHERE name = 'captain'"

Update when done:
  sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "UPDATE agents SET memory = '...', updated_at = datetime('now') WHERE name = 'captain'"

Record this segment:
  sqlite3 .prose/runs/20260116-143052-a7b3c9/state.db "INSERT INTO agent_segments (agent_name, segment_number, prompt, summary) VALUES ('captain', 3, '...', '...')"
```

For project-scoped agents, use `.prose/agents.db`. For user-scoped agents, use `~/.prose/agents.db`.

---

## Context Preservation in Main Thread

**This is critical.** The database is for persistence and coordination, but the VM must still maintain conversational context.

### What the VM Must Narrate

Even with SQLite state, the VM should narrate key events in its conversation:

```
[Position] Statement 3: let research = session: researcher
   Spawning session, will write to state.db
   [Task tool call]
[Success] Session complete, binding written to DB
[Binding] research = <stored in state.db>
```

### Why Both?

| Purpose | Mechanism |
|---------|-----------|
| **Working memory** | Conversation narration (what the VM "remembers" without re-querying) |
| **Durable state** | SQLite database (survives context limits, enables resumption) |
| **Subagent coordination** | SQLite database (shared access point) |
| **Debugging/inspection** | SQLite database (queryable history) |

The narration is the VM's "mental model" of execution. The database is the "source of truth" for resumption and inspection.

---

## Parallel Execution

For parallel blocks, the VM uses the `metadata` JSON field to track branches. **Only the VM writes to the `execution` table.**

```sql
-- VM marks parallel start
INSERT INTO execution (statement_index, statement_text, status, metadata)
VALUES (5, 'parallel:', 'executing', '{"parallel_id": "p1", "strategy": "all", "branches": ["a", "b", "c"]}');

-- VM creates execution record for each branch
INSERT INTO execution (statement_index, statement_text, status, parent_id, metadata)
VALUES (6, 'a = session "Task A"', 'executing', 5, '{"parallel_id": "p1", "branch": "a"}');

-- Subagent writes its output to bindings table (see "From Subagents" section)
-- Task tool signals completion to VM via substrate

-- VM marks branch complete after Task returns
UPDATE execution SET status = 'completed', completed_at = datetime('now')
WHERE json_extract(metadata, '$.parallel_id') = 'p1' AND json_extract(metadata, '$.branch') = 'a';

-- VM checks if all branches complete
SELECT COUNT(*) as pending FROM execution
WHERE json_extract(metadata, '$.parallel_id') = 'p1' AND status != 'completed';
```

---

## Loop Tracking

```sql
-- Loop metadata tracks iteration state
INSERT INTO execution (statement_index, statement_text, status, metadata)
VALUES (10, 'loop until **analysis complete** (max: 5):', 'executing',
  '{"loop_id": "l1", "max_iterations": 5, "current_iteration": 0, "condition": "**analysis complete**"}');

-- Update iteration
UPDATE execution
SET metadata = json_set(metadata, '$.current_iteration', 2),
    updated_at = datetime('now')
WHERE json_extract(metadata, '$.loop_id') = 'l1';
```

---

## Error Handling

```sql
-- Record failure
UPDATE execution
SET status = 'failed',
    error_message = 'Connection timeout after 30s',
    completed_at = datetime('now')
WHERE id = 15;

-- Track retry attempts in metadata
UPDATE execution
SET metadata = json_set(metadata, '$.retry_attempt', 2, '$.max_retries', 3)
WHERE id = 15;
```

---

## Large Outputs

When a binding value is too large for comfortable database storage (>100KB):

1. Write content to `attachments/{binding_name}.md`
2. Store the path in the `attachment_path` column
3. Leave `value` as a summary or null

```sql
INSERT INTO bindings (name, kind, value, attachment_path, source_statement)
VALUES (
  'full_report',
  'let',
  'Full analysis report (847KB) - see attachment',
  'attachments/full_report.md',
  'let full_report = session "Generate comprehensive report"'
);
```

---

## Resuming Execution

To resume an interrupted run:

```sql
-- Find current position
SELECT statement_index, statement_text, status
FROM execution
WHERE status = 'executing'
ORDER BY id DESC LIMIT 1;

-- Get all completed bindings
SELECT name, kind, value, attachment_path FROM bindings;

-- Get agent memory states
SELECT name, memory FROM agents;

-- Check parallel block status
SELECT json_extract(metadata, '$.branch') as branch, status
FROM execution
WHERE json_extract(metadata, '$.parallel_id') IS NOT NULL
  AND parent_id = (SELECT id FROM execution WHERE status = 'executing' AND statement_text LIKE 'parallel:%');
```

---

## Flexibility Encouragement

Unlike filesystem state, SQLite state is intentionally **less prescriptive**. The core schema is a starting point. You are encouraged to:

- **Add columns** to existing tables as needed
- **Create extension tables** (prefix with `x_`)
- **Store custom metrics** (timing, token counts, model info)
- **Build indexes** for your query patterns
- **Use JSON functions** for semi-structured data

Example extensions:

```sql
-- Custom metrics table
CREATE TABLE x_metrics (
    execution_id INTEGER REFERENCES execution(id),
    metric_name TEXT,
    metric_value REAL,
    recorded_at TEXT DEFAULT (datetime('now'))
);

-- Add custom column
ALTER TABLE bindings ADD COLUMN token_count INTEGER;

-- Create index for common query
CREATE INDEX idx_execution_status ON execution(status);
```

The database is your workspace. Use it.

---

## Comparison with Other Modes

| Aspect | filesystem.md | in-context.md | sqlite.md |
|--------|---------------|---------------|-----------|
| **State location** | `.prose/runs/{id}/` files | Conversation history | `.prose/runs/{id}/state.db` |
| **Queryable** | Via file reads | No | Yes (SQL) |
| **Atomic updates** | No | N/A | Yes (transactions) |
| **Schema flexibility** | Rigid file structure | N/A | Flexible (add tables/columns) |
| **Resumption** | Read state.md | Re-read conversation | Query database |
| **Complexity ceiling** | High | Low (<30 statements) | High |
| **Dependency** | None | None | sqlite3 CLI |
| **Status** | Stable | Stable | **Experimental** |

---

## Summary

SQLite state management:

1. Uses a **single database file** per run
2. Provides **clear responsibility separation** between VM and subagents
3. Enables **structured queries** for state inspection
4. Supports **atomic transactions** for reliable updates
5. Allows **flexible schema evolution** as needed
6. Requires the **sqlite3 CLI** tool
7. Is **experimental**—expect changes

The core contract: the VM manages execution flow and spawns subagents; subagents write their own outputs directly to the database. Both maintain the principle that what happens is recorded, and what is recorded can be queried.