From 760a83d256d63b55905681cadac60bebca684b73 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Tue, 23 Dec 2025 13:36:43 +0100
Subject: [PATCH] docs: add offline memory system proposal

---
 docs/research/memory.md | 228 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 228 insertions(+)
 create mode 100644 docs/research/memory.md

diff --git a/docs/research/memory.md b/docs/research/memory.md
new file mode 100644
index 000000000..5d64aa2ac
--- /dev/null
+++ b/docs/research/memory.md
@@ -0,0 +1,228 @@
+---
+summary: "Proposal + research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
+read_when:
+  - Designing workspace memory (~/clawd) beyond daily Markdown logs
+  - Deciding: standalone CLI vs deep Clawdis integration
+  - Adding offline recall + reflection (retain/recall/reflect)
+---
+
+# Workspace Memory v2 (offline): proposal + research
+
+Target: a Clawd-style workspace (`inbound.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`).
+
+This doc proposes an **offline-first** memory architecture that keeps Markdown as the canonical, reviewable source of truth, but adds **structured recall** (search, entity summaries, confidence updates) via a derived index.
+
+## Why change?
+
+The current setup (one file per day) is excellent for:
+- “append-only” journaling
+- human editing
+- git-backed durability + auditability
+- low-friction capture (“just write it down”)
+
+It’s weak for:
+- high-recall retrieval (“what did we decide about X?”, “last time we tried Y?”)
+- entity-centric answers (“tell me about Alice / The Castle / warelay”) without rereading many files
+- opinion/preference stability (and evidence when it changes)
+- time-scoped queries (“what was true during Nov 2025?”) and conflict resolution
+
+## Design goals
+
+- **Offline**: works without network; can run on laptop/Castle; no cloud dependency.
+- **Explainable**: retrieved items should be attributable (file + location) and separable from inference.
+- **Low ceremony**: daily logging stays Markdown; no heavy schema work.
+- **Incremental**: v1 is useful with FTS only; semantic/vector search and graphs are optional upgrades.
+- **Agent-friendly**: makes “recall within token budgets” easy (return small bundles of facts).
+
+## North star model (Hindsight × Letta)
+
+Two pieces to blend:
+
+1) **Letta/MemGPT-style control loop**
+- keep a small “core” always in context (persona + key user facts)
+- everything else is out-of-context and retrieved via tools
+- memory writes are explicit tool calls (append/replace/insert), persisted, then re-injected next turn
+
+2) **Hindsight-style memory substrate**
+- separate what’s observed vs what’s believed vs what’s summarized
+- support retain/recall/reflect
+- confidence-bearing opinions that can evolve with evidence
+- entity-aware retrieval + temporal queries (even without full knowledge graphs)
+
+## Proposed architecture (Markdown source-of-truth + derived index)
+
+### Canonical store (git-friendly)
+
+Keep `~/clawd` as the canonical, human-readable memory.
+
+Suggested workspace layout:
+
+```
+~/clawd/
+  memory.md            # small: durable facts + preferences (core-ish)
+  memory/
+    YYYY-MM-DD.md      # daily log (append; narrative)
+  bank/                # “typed” memory pages (stable, reviewable)
+    world.md           # objective facts about the world
+    experience.md      # what the agent did (first-person)
+    opinions.md        # subjective prefs/judgments + confidence + evidence pointers
+    entities/
+      Peter.md
+      The-Castle.md
+      warelay.md
+      ...
+```
+
+Notes:
+- **Daily log stays daily log.** No need to turn it into JSON.
+- The `bank/` files are **curated**: produced by reflection jobs, and still editable by hand.
+- `memory.md` remains “small + core-ish”: the things you want Clawd to see every session.
+
+### Derived store (machine recall)
+
+Add a derived index under the workspace (not necessarily git-tracked):
+
+```
+~/clawd/.memory/index.sqlite
+```
+
+Back it with:
+- a SQLite schema for facts + entity links + opinion metadata
+- SQLite **FTS5** for lexical recall (fast, tiny, offline)
+- an optional embeddings table for semantic recall (still offline)
+
+The index is always **rebuildable from Markdown**.
+
+## Retain / Recall / Reflect (operational loop)
+
+### Retain: normalize daily logs into “facts”
+
+Hindsight’s key insight here: store **narrative, self-contained facts**, not tiny snippets.
+
+Practical rule for `memory/YYYY-MM-DD.md`:
+- at the end of the day (or during it), add a `## Retain` section with 2–5 bullets that are:
+  - narrative (cross-turn context preserved)
+  - self-contained (each makes sense on its own later)
+  - tagged with type + entity mentions
+
+Example:
+
+```
+## Retain
+- W @Peter: Currently in Marrakech (Nov 27–Dec 1, 2025) for Andy’s birthday.
+- B @warelay: I fixed the Baileys WS crash by wrapping connection.update handlers in try/catch (see memory/2025-11-27.md).
+- O(c=0.95) @Peter: Prefers concise replies (<1500 chars) on WhatsApp; long content goes into files.
+```
+
+Minimal parsing:
+- Type prefix: `W` (world), `B` (experience/biographical), `O` (opinion), `S` (observation/summary; usually generated)
+- Entities: `@Peter`, `@warelay`, etc. (slugs map to `bank/entities/*.md`)
+- Opinion confidence: `O(c=0.0..1.0)`, optional
+
+If you don’t want authors to think about it, the reflect job can infer these bullets from the rest of the log; still, an explicit `## Retain` section is the easiest “quality lever” (a parsing sketch follows).
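+
+To make the bullet grammar concrete, here is a minimal parsing sketch (TypeScript, matching the `src/commands/memory/*.ts` CLI glue proposed below). The `RetainedFact` shape and the regex are illustrative assumptions, not an existing Clawdis API:
+
+```ts
+// Sketch: parse one `## Retain` bullet into a structured fact.
+// Grammar: "- <W|B|O|S>[(c=0.0..1.0)] @Entity [@Entity ...]: <narrative content>"
+type FactKind = "world" | "experience" | "opinion" | "observation";
+
+interface RetainedFact {
+  kind: FactKind;
+  confidence?: number; // only for opinions, e.g. O(c=0.95)
+  entities: string[];  // slugs mapping to bank/entities/*.md
+  content: string;     // the narrative fact, verbatim
+  source: string;      // e.g. "memory/2025-11-27.md#L12"
+}
+
+const KINDS: Record<string, FactKind> = {
+  W: "world", B: "experience", O: "opinion", S: "observation",
+};
+
+const BULLET = /^- ([WBOS])(?:\(c=([0-9.]+)\))?\s+((?:@[\w-]+\s*)+):\s*(.+)$/;
+
+function parseRetainBullet(line: string, source: string): RetainedFact | null {
+  const m = BULLET.exec(line.trim());
+  if (!m) return null; // not a Retain bullet; leave it as plain narrative
+  const [, kind, conf, mentions, content] = m;
+  return {
+    kind: KINDS[kind],
+    confidence: conf ? Number(conf) : undefined,
+    entities: [...mentions.matchAll(/@([\w-]+)/g)].map((e) => e[1]),
+    content,
+    source,
+  };
+}
+```
+
+Anything the regex rejects simply stays in the daily log as narrative, so retention quality degrades gracefully instead of failing hard.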
+
+### Recall: queries over the derived index
+
+Recall should support:
+- **lexical**: “find exact terms / names / commands” (FTS5)
+- **entity**: “tell me about X” (entity pages + entity-linked facts)
+- **temporal**: “what happened around Nov 27” / “since last week”
+- **opinion**: “what does Peter prefer?” (with confidence + evidence)
+
+The return format should be agent-friendly and cite sources:
+- `kind` (`world|experience|opinion|observation`)
+- `timestamp` (source day, or extracted time range if present)
+- `entities` (`["Peter","warelay"]`)
+- `content` (the narrative fact)
+- `source` (`memory/2025-11-27.md#L12`, etc.)
+
+### Reflect: produce stable pages + update beliefs
+
+Reflection is a scheduled job (daily, or via heartbeat `ultrathink`) that:
+- updates `bank/entities/*.md` from recent facts (entity summaries)
+- updates `bank/opinions.md` confidence based on reinforcement/contradiction
+- optionally proposes edits to `memory.md` (“core-ish” durable facts)
+
+Opinion evolution (simple, explainable):
+- each opinion has:
+  - a statement
+  - a confidence `c ∈ [0,1]`
+  - a `last_updated` timestamp
+  - evidence links (supporting + contradicting fact IDs)
+- when new facts arrive:
+  - find candidate opinions by entity overlap + similarity (FTS first, embeddings later)
+  - update confidence by small deltas; big jumps require strong contradiction + repeated evidence
+
+## CLI integration: standalone vs deep integration
+
+Recommendation: **deep integration in Clawdis**, but keep a separable core library.
+
+### Why integrate into Clawdis?
+- Clawdis already knows:
+  - the workspace path (`inbound.workspace`)
+  - the session model + heartbeats
+  - logging + troubleshooting patterns
+- You want the agent itself to call the tools:
+  - `clawdis memory recall "…" --k 25 --since 30d`
+  - `clawdis memory reflect --since 7d`
+
+### Why still split out a library?
+- keeps memory logic testable without the gateway/runtime
+- allows reuse from other contexts (local scripts, a future desktop app, etc.)
+
+Shape:
+- `src/memory/*` (library-ish core; pure functions + a SQLite adapter)
+- `src/commands/memory/*.ts` (CLI glue)
+
+## “S-Collide” / SuCo: when to use it (research)
+
+If “S-Collide” refers to **SuCo (Subspace Collision)**: it’s an ANN retrieval approach that targets strong recall/latency tradeoffs by using learned/structured collisions in subspaces (paper: arXiv 2411.14754, 2024).
+
+Pragmatic take for `~/clawd`:
+- **don’t start** with SuCo.
+- start with SQLite FTS + (optionally) simple embeddings; that delivers most of the UX wins immediately.
+- consider SuCo/HNSW/ScaNN-class solutions only once:
+  - the corpus is big (tens or hundreds of thousands of chunks)
+  - brute-force embedding search becomes too slow
+  - recall quality is meaningfully bottlenecked by lexical search
+
+Offline-friendly alternatives (in increasing complexity):
+- SQLite FTS5 + metadata filters (zero ML)
+- embeddings + brute force (gets surprisingly far while the chunk count is low)
+- an HNSW index (common, robust; needs a library binding)
+- SuCo (research-grade; attractive if there’s a solid implementation you can embed)
+
+Open question:
+- what’s the **best** offline embedding model for “personal assistant memory” on your machines (MacBook + Castle)?
+  - if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain (see the sketches below).
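+
+Whichever model wins, the “embeddings + brute force” tier needs very little code. A minimal sketch, assuming a local Ollama server and its `/api/embeddings` endpoint; the model name is illustrative, and any offline embedder with the same shape works:
+
+```ts
+// Sketch: brute-force semantic recall over facts from the derived index.
+// Assumes Ollama on its default port; the model name is illustrative.
+async function embed(text: string): Promise<number[]> {
+  const res = await fetch("http://localhost:11434/api/embeddings", {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
+  });
+  const { embedding } = (await res.json()) as { embedding: number[] };
+  return embedding;
+}
+
+function cosine(a: number[], b: number[]): number {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  return dot / (Math.sqrt(na) * Math.sqrt(nb));
+}
+
+interface EmbeddedFact { id: number; content: string; embedding: number[] }
+
+// Exact top-k scan: fine until the fact count reaches tens of thousands.
+async function semanticRecall(query: string, facts: EmbeddedFact[], k = 25) {
+  const q = await embed(query);
+  return facts
+    .map((f) => ({ fact: f, score: cosine(q, f.embedding) }))
+    .sort((a, b) => b.score - a.score)
+    .slice(0, k);
+}
+```
+
+When this scan gets slow, that is the signal to revisit the HNSW/SuCo options above, not before.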
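+
+The zero-ML tier is also small enough to sketch outright. Table names follow Phase 1 of the plan below; the exact columns and the `better-sqlite3` driver are assumptions, not settled choices:
+
+```ts
+// Sketch: Phase 1 schema + one lexical recall query (SQLite FTS5).
+import Database from "better-sqlite3";
+
+const db = new Database(`${process.env.HOME}/clawd/.memory/index.sqlite`);
+db.exec(`
+  CREATE TABLE IF NOT EXISTS facts (
+    id      INTEGER PRIMARY KEY,
+    kind    TEXT NOT NULL,   -- world|experience|opinion|observation
+    day     TEXT NOT NULL,   -- source day, e.g. '2025-11-27'
+    content TEXT NOT NULL,   -- the narrative fact
+    source  TEXT NOT NULL    -- e.g. 'memory/2025-11-27.md#L12'
+  );
+  CREATE TABLE IF NOT EXISTS entities (
+    id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL
+  );
+  CREATE TABLE IF NOT EXISTS fact_entities (
+    fact_id   INTEGER REFERENCES facts(id),
+    entity_id INTEGER REFERENCES entities(id)
+  );
+  CREATE TABLE IF NOT EXISTS opinions (
+    id INTEGER PRIMARY KEY,
+    statement TEXT NOT NULL,
+    confidence REAL NOT NULL,  -- c in [0,1]
+    last_updated TEXT NOT NULL -- evidence links live in a join table later
+  );
+  CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
+    USING fts5(content, fact_id UNINDEXED);
+`);
+
+// Lexical recall with a metadata filter: newest matches for one entity.
+const recall = db.prepare(`
+  SELECT f.kind, f.day, f.content, f.source
+  FROM facts_fts
+  JOIN facts f ON f.id = facts_fts.fact_id
+  JOIN fact_entities fe ON fe.fact_id = f.id
+  JOIN entities e ON e.id = fe.entity_id
+  WHERE facts_fts MATCH ? AND e.slug = ?
+  ORDER BY f.day DESC LIMIT 25
+`);
+console.log(recall.all("baileys AND crash", "warelay"));
+```
+
+Because everything derives from Markdown, dropping `index.sqlite` and re-indexing is always a safe recovery path.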
+
+## Implementation plan (phased, shippable)
+
+### Phase 0: workspace conventions (no code)
+- add the `bank/` files + entity pages
+- add the `## Retain` convention to daily logs
+
+### Phase 1: `clawdis memory index|recall` (FTS-only)
+- parse Markdown (`memory/*.md`, `bank/*.md`) into chunks
+- write to SQLite: `facts`, `entities`, `fact_entities`, `opinions`
+- FTS5 table over `facts.content`
+- `recall` returns citations (path + line) + content trimmed to a budget
+
+### Phase 2: entity summaries + opinion tracking
+- `reflect` updates `bank/entities/*.md`
+- opinion confidence updates with evidence pointers (no embeddings required yet)
+
+### Phase 3: semantic recall (offline embeddings)
+- compute embeddings during indexing (incrementally)
+- retrieval = `hybrid(FTS, vector)` with simple score fusion
+
+### Phase 4: “graph-ish” traversal (still simple)
+- entity links enable multi-hop queries: “related to Peter via warelay”
+- optional: “topic” nodes and lightweight edges (not a full KG)
+
+## References
+
+- Letta / MemGPT concepts: “core memory blocks” + “archival memory”; tool-driven self-editing memory.
+- Hindsight Technical Report: “retain / recall / reflect”; four-network memory, narrative fact extraction, opinion confidence evolution.
+- SuCo: arXiv 2411.14754 (2024): “Subspace Collision” approximate nearest neighbor retrieval.