From 760a83d256d63b55905681cadac60bebca684b73 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Tue, 23 Dec 2025 13:36:43 +0100
Subject: [PATCH] docs: add offline memory system proposal

---
 docs/research/memory.md | 228 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 228 insertions(+)
 create mode 100644 docs/research/memory.md

diff --git a/docs/research/memory.md b/docs/research/memory.md
new file mode 100644
index 000000000..5d64aa2ac
--- /dev/null
+++ b/docs/research/memory.md
@@ -0,0 +1,228 @@
+---
+summary: "Proposal + research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
+read_when:
+  - Designing workspace memory (~/clawd) beyond daily Markdown logs
+  - Deciding: standalone CLI vs deep Clawdis integration
+  - Adding offline recall + reflection (retain/recall/reflect)
+---
+
+# Workspace Memory v2 (offline): proposal + research
+
+Target: a Clawd-style workspace (`inbound.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`).
+
+This doc proposes an **offline-first** memory architecture that keeps Markdown as the canonical, reviewable source of truth, but adds **structured recall** (search, entity summaries, confidence updates) via a derived index.
+
+## Why change?
+
+The current setup (one file per day) is excellent for:
+- “append-only” journaling
+- human editing
+- git-backed durability + auditability
+- low-friction capture (“just write it down”)
+
+It’s weak for:
+- high-recall retrieval (“what did we decide about X?”, “last time we tried Y?”)
+- entity-centric answers (“tell me about Alice / The Castle / warelay”) without rereading many files
+- opinion/preference stability (and evidence when it changes)
+- time-scoped queries (“what was true during Nov 2025?”) and conflict resolution
+
+## Design goals
+
+- **Offline**: works without network; can run on laptop/Castle; no cloud dependency.
+- **Explainable**: retrieved items should be attributable (file + location) and separable from inference.
+- **Low ceremony**: daily logging stays Markdown; no heavy schema work.
+- **Incremental**: v1 is useful with FTS only; semantic/vector search and graphs are optional upgrades.
+- **Agent-friendly**: makes “recall within token budgets” easy (return small bundles of facts).
+
+## North star model (Hindsight × Letta)
+
+Two pieces to blend:
+
+1) **Letta/MemGPT-style control loop**
+- keep a small “core” always in context (persona + key user facts)
+- everything else is out-of-context and retrieved via tools
+- memory writes are explicit tool calls (append/replace/insert), persisted, then re-injected next turn
+
+2) **Hindsight-style memory substrate**
+- separate what’s observed vs what’s believed vs what’s summarized
+- support retain/recall/reflect
+- confidence-bearing opinions that can evolve with evidence
+- entity-aware retrieval + temporal queries (even without full knowledge graphs)
+
+## Proposed architecture (Markdown source-of-truth + derived index)
+
+### Canonical store (git-friendly)
+
+Keep `~/clawd` as the canonical, human-readable memory.
+
+Suggested workspace layout:
+
+```
+~/clawd/
+  memory.md            # small: durable facts + preferences (core-ish)
+  memory/
+    YYYY-MM-DD.md      # daily log (append; narrative)
+  bank/                # “typed” memory pages (stable, reviewable)
+    world.md           # objective facts about the world
+    experience.md      # what the agent did (first-person)
+    opinions.md        # subjective prefs/judgments + confidence + evidence pointers
+    entities/
+      Peter.md
+      The-Castle.md
+      warelay.md
+      ...
+```
+
+Notes:
+- **Daily log stays daily log.** No need to turn it into JSON.
+- The `bank/` files are **curated**: produced by reflection jobs, and still editable by hand.
+- `memory.md` remains “small + core-ish”: the things you want Clawd to see every session.
+
+### Derived store (machine recall)
+
+Add a derived index under the workspace (not necessarily git-tracked):
+
+```
+~/clawd/.memory/index.sqlite
+```
+
+Back it with:
+- a SQLite schema for facts + entity links + opinion metadata
+- SQLite **FTS5** for lexical recall (fast, tiny, offline)
+- an optional embeddings table for semantic recall (still offline)
+
+The index is always **rebuildable from Markdown**.
+
+## Retain / Recall / Reflect (operational loop)
+
+### Retain: normalize daily logs into “facts”
+
+Hindsight’s key insight here: store **narrative, self-contained facts**, not tiny snippets.
+
+Practical rule for `memory/YYYY-MM-DD.md`:
+- at the end of the day (or during it), add a `## Retain` section with 2–5 bullets that are:
+  - narrative (cross-turn context preserved)
+  - self-contained (each makes sense on its own later)
+  - tagged with type + entity mentions
+
+Example:
+
+```
+## Retain
+- W @Peter: Currently in Marrakech (Nov 27–Dec 1, 2025) for Andy’s birthday.
+- B @warelay: I fixed the Baileys WS crash by wrapping connection.update handlers in try/catch (see memory/2025-11-27.md).
+- O(c=0.95) @Peter: Prefers concise replies (<1500 chars) on WhatsApp; long content goes into files.
+```
+
+Minimal parsing:
+- Type prefix: `W` (world), `B` (experience/biographical), `O` (opinion), `S` (observation/summary; usually generated)
+- Entities: `@Peter`, `@warelay`, etc. (slugs map to `bank/entities/*.md`)
+- Opinion confidence: `O(c=0.0..1.0)`, optional
+
+If you don’t want authors to think about it, the reflect job can infer these bullets from the rest of the log; still, an explicit `## Retain` section is the easiest “quality lever” (a parsing sketch follows).
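+
+To make the bullet grammar concrete, here is a minimal parsing sketch (TypeScript, matching the `src/commands/memory/*.ts` CLI glue proposed below). The `RetainedFact` shape and the regex are illustrative assumptions, not an existing Clawdis API:
+
+```ts
+// Sketch: parse one `## Retain` bullet into a structured fact.
+// Grammar: "- <W|B|O|S>[(c=0.0..1.0)] @Entity [@Entity ...]: <narrative content>"
+type FactKind = "world" | "experience" | "opinion" | "observation";
+
+interface RetainedFact {
+  kind: FactKind;
+  confidence?: number; // only for opinions, e.g. O(c=0.95)
+  entities: string[];  // slugs mapping to bank/entities/*.md
+  content: string;     // the narrative fact, verbatim
+  source: string;      // e.g. "memory/2025-11-27.md#L12"
+}
+
+const KINDS: Record<string, FactKind> = {
+  W: "world", B: "experience", O: "opinion", S: "observation",
+};
+
+const BULLET = /^- ([WBOS])(?:\(c=([0-9.]+)\))?\s+((?:@[\w-]+\s*)+):\s*(.+)$/;
+
+function parseRetainBullet(line: string, source: string): RetainedFact | null {
+  const m = BULLET.exec(line.trim());
+  if (!m) return null; // not a Retain bullet; leave it as plain narrative
+  const [, kind, conf, mentions, content] = m;
+  return {
+    kind: KINDS[kind],
+    confidence: conf ? Number(conf) : undefined,
+    entities: [...mentions.matchAll(/@([\w-]+)/g)].map((e) => e[1]),
+    content,
+    source,
+  };
+}
+```
+
+Anything the regex rejects simply stays in the daily log as narrative, so retention quality degrades gracefully instead of failing hard.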
+
+### Recall: queries over the derived index
+
+Recall should support:
+- **lexical**: “find exact terms / names / commands” (FTS5)
+- **entity**: “tell me about X” (entity pages + entity-linked facts)
+- **temporal**: “what happened around Nov 27” / “since last week”
+- **opinion**: “what does Peter prefer?” (with confidence + evidence)
+
+The return format should be agent-friendly and cite sources:
+- `kind` (`world|experience|opinion|observation`)
+- `timestamp` (source day, or extracted time range if present)
+- `entities` (`["Peter","warelay"]`)
+- `content` (the narrative fact)
+- `source` (`memory/2025-11-27.md#L12`, etc.)
+
+### Reflect: produce stable pages + update beliefs
+
+Reflection is a scheduled job (daily, or via heartbeat `ultrathink`) that:
+- updates `bank/entities/*.md` from recent facts (entity summaries)
+- updates `bank/opinions.md` confidence based on reinforcement/contradiction
+- optionally proposes edits to `memory.md` (“core-ish” durable facts)
+
+Opinion evolution (simple, explainable):
+- each opinion has:
+  - a statement
+  - a confidence `c ∈ [0,1]`
+  - a `last_updated` timestamp
+  - evidence links (supporting + contradicting fact IDs)
+- when new facts arrive:
+  - find candidate opinions by entity overlap + similarity (FTS first, embeddings later)
+  - update confidence by small deltas; big jumps require strong contradiction + repeated evidence
+
+## CLI integration: standalone vs deep integration
+
+Recommendation: **deep integration in Clawdis**, but keep a separable core library.
+
+### Why integrate into Clawdis?
+- Clawdis already knows:
+  - the workspace path (`inbound.workspace`)
+  - the session model + heartbeats
+  - logging + troubleshooting patterns
+- You want the agent itself to call the tools:
+  - `clawdis memory recall "…" --k 25 --since 30d`
+  - `clawdis memory reflect --since 7d`
+
+### Why still split out a library?
+- keeps memory logic testable without the gateway/runtime
+- allows reuse from other contexts (local scripts, a future desktop app, etc.)
+
+Shape:
+- `src/memory/*` (library-ish core; pure functions + a SQLite adapter)
+- `src/commands/memory/*.ts` (CLI glue)
+
+## “S-Collide” / SuCo: when to use it (research)
+
+If “S-Collide” refers to **SuCo (Subspace Collision)**: it’s an ANN retrieval approach that targets strong recall/latency tradeoffs by using learned/structured collisions in subspaces (paper: arXiv 2411.14754, 2024).
+
+Pragmatic take for `~/clawd`:
+- **don’t start** with SuCo.
+- start with SQLite FTS + (optionally) simple embeddings; that delivers most of the UX wins immediately.
+- consider SuCo/HNSW/ScaNN-class solutions only once:
+  - the corpus is big (tens or hundreds of thousands of chunks)
+  - brute-force embedding search becomes too slow
+  - recall quality is meaningfully bottlenecked by lexical search
+
+Offline-friendly alternatives (in increasing complexity):
+- SQLite FTS5 + metadata filters (zero ML)
+- embeddings + brute force (gets surprisingly far while the chunk count is low)
+- an HNSW index (common, robust; needs a library binding)
+- SuCo (research-grade; attractive if there’s a solid implementation you can embed)
+
+Open question:
+- what’s the **best** offline embedding model for “personal assistant memory” on your machines (MacBook + Castle)?
+  - if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain (see the sketches below).
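+
+Whichever model wins, the “embeddings + brute force” tier needs very little code. A minimal sketch, assuming a local Ollama server and its `/api/embeddings` endpoint; the model name is illustrative, and any offline embedder with the same shape works:
+
+```ts
+// Sketch: brute-force semantic recall over facts from the derived index.
+// Assumes Ollama on its default port; the model name is illustrative.
+async function embed(text: string): Promise<number[]> {
+  const res = await fetch("http://localhost:11434/api/embeddings", {
+    method: "POST",
+    headers: { "Content-Type": "application/json" },
+    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
+  });
+  const { embedding } = (await res.json()) as { embedding: number[] };
+  return embedding;
+}
+
+function cosine(a: number[], b: number[]): number {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  return dot / (Math.sqrt(na) * Math.sqrt(nb));
+}
+
+interface EmbeddedFact { id: number; content: string; embedding: number[] }
+
+// Exact top-k scan: fine until the fact count reaches tens of thousands.
+async function semanticRecall(query: string, facts: EmbeddedFact[], k = 25) {
+  const q = await embed(query);
+  return facts
+    .map((f) => ({ fact: f, score: cosine(q, f.embedding) }))
+    .sort((a, b) => b.score - a.score)
+    .slice(0, k);
+}
+```
+
+When this scan gets slow, that is the signal to revisit the HNSW/SuCo options above, not before.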
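+
+The zero-ML tier is also small enough to sketch outright. Table names follow Phase 1 of the plan below; the exact columns and the `better-sqlite3` driver are assumptions, not settled choices:
+
+```ts
+// Sketch: Phase 1 schema + one lexical recall query (SQLite FTS5).
+import Database from "better-sqlite3";
+
+const db = new Database(`${process.env.HOME}/clawd/.memory/index.sqlite`);
+db.exec(`
+  CREATE TABLE IF NOT EXISTS facts (
+    id      INTEGER PRIMARY KEY,
+    kind    TEXT NOT NULL,   -- world|experience|opinion|observation
+    day     TEXT NOT NULL,   -- source day, e.g. '2025-11-27'
+    content TEXT NOT NULL,   -- the narrative fact
+    source  TEXT NOT NULL    -- e.g. 'memory/2025-11-27.md#L12'
+  );
+  CREATE TABLE IF NOT EXISTS entities (
+    id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL
+  );
+  CREATE TABLE IF NOT EXISTS fact_entities (
+    fact_id   INTEGER REFERENCES facts(id),
+    entity_id INTEGER REFERENCES entities(id)
+  );
+  CREATE TABLE IF NOT EXISTS opinions (
+    id INTEGER PRIMARY KEY,
+    statement TEXT NOT NULL,
+    confidence REAL NOT NULL,  -- c in [0,1]
+    last_updated TEXT NOT NULL -- evidence links live in a join table later
+  );
+  CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
+    USING fts5(content, fact_id UNINDEXED);
+`);
+
+// Lexical recall with a metadata filter: newest matches for one entity.
+const recall = db.prepare(`
+  SELECT f.kind, f.day, f.content, f.source
+  FROM facts_fts
+  JOIN facts f ON f.id = facts_fts.fact_id
+  JOIN fact_entities fe ON fe.fact_id = f.id
+  JOIN entities e ON e.id = fe.entity_id
+  WHERE facts_fts MATCH ? AND e.slug = ?
+  ORDER BY f.day DESC LIMIT 25
+`);
+console.log(recall.all("baileys AND crash", "warelay"));
+```
+
+Because everything derives from Markdown, dropping `index.sqlite` and re-indexing is always a safe recovery path.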
+
+## Implementation plan (phased, shippable)
+
+### Phase 0: workspace conventions (no code)
+- add the `bank/` files + entity pages
+- add the `## Retain` convention to daily logs
+
+### Phase 1: `clawdis memory index|recall` (FTS-only)
+- parse Markdown (`memory/*.md`, `bank/*.md`) into chunks
+- write to SQLite: `facts`, `entities`, `fact_entities`, `opinions`
+- FTS5 table over `facts.content`
+- `recall` returns citations (path + line) + content trimmed to a budget
+
+### Phase 2: entity summaries + opinion tracking
+- `reflect` updates `bank/entities/*.md`
+- opinion confidence updates with evidence pointers (no embeddings required yet)
+
+### Phase 3: semantic recall (offline embeddings)
+- compute embeddings during indexing (incrementally)
+- retrieval = `hybrid(FTS, vector)` with simple score fusion
+
+### Phase 4: “graph-ish” traversal (still simple)
+- entity links enable multi-hop queries: “related to Peter via warelay”
+- optional: “topic” nodes and lightweight edges (not a full KG)
+
+## References
+
+- Letta / MemGPT concepts: “core memory blocks” + “archival memory”; tool-driven self-editing memory.
+- Hindsight Technical Report: “retain / recall / reflect”; four-network memory, narrative fact extraction, opinion confidence evolution.
+- SuCo: arXiv 2411.14754 (2024): “Subspace Collision” approximate nearest neighbor retrieval.