---
summary: "Research notes: offline memory system for Clawd workspaces (Markdown source-of-truth + derived index)"
read_when:
- Designing workspace memory (~/clawd) beyond daily Markdown logs
- Deciding: standalone CLI vs deep Clawdbot integration
- Adding offline recall + reflection (retain/recall/reflect)
---
# Workspace Memory v2 (offline): research notes
Target: Clawd-style workspace (`agents.defaults.workspace`, default `~/clawd`) where “memory” is stored as one Markdown file per day (`memory/YYYY-MM-DD.md`) plus a small set of stable files (e.g. `memory.md`, `SOUL.md`).
This doc proposes an **offline-first** memory architecture that keeps Markdown as the canonical, reviewable source of truth, but adds **structured recall** (search, entity summaries, confidence updates) via a derived index.
## Why change?
The current setup (one file per day) is excellent for:
- “append-only” journaling
- human editing
- git-backed durability + auditability
- low-friction capture (“just write it down”)
It’s weak for:
- high-recall retrieval (“what did we decide about X?”, “last time we tried Y?”)
- entity-centric answers (“tell me about Alice / The Castle / warelay”) without rereading many files
- opinion/preference stability (and evidence when it changes)
- time constraints (“what was true during Nov 2025?”) and conflict resolution
## Design goals
- **Offline**: works without network; can run on laptop/Castle; no cloud dependency.
- **Explainable**: retrieved items should be attributable (file + location) and separable from inference.
- **Low ceremony**: daily logging stays Markdown, no heavy schema work.
- **Incremental**: v1 is useful with FTS only; semantic/vector and graphs are optional upgrades.
- **Agent-friendly**: makes “recall within token budgets” easy (return small bundles of facts).
## North star model (Hindsight × Letta)
Two pieces to blend:
1) **Letta/MemGPT-style control loop**
   - keep a small “core” always in context (persona + key user facts)
   - everything else is out-of-context and retrieved via tools
   - memory writes are explicit tool calls (append/replace/insert), persisted, then re-injected next turn
2) **Hindsight-style memory substrate**
   - separate what’s observed vs what’s believed vs what’s summarized
   - support retain/recall/reflect
   - confidence-bearing opinions that can evolve with evidence
   - entity-aware retrieval + temporal queries (even without full knowledge graphs)
## Proposed architecture (Markdown source-of-truth + derived index)
### Canonical store (git-friendly)
Keep `~/clawd` as canonical human-readable memory.
Suggested workspace layout:
```
~/clawd/
  memory.md              # small: durable facts + preferences (core-ish)
  memory/
    YYYY-MM-DD.md        # daily log (append; narrative)
  bank/                  # “typed” memory pages (stable, reviewable)
    world.md             # objective facts about the world
    experience.md        # what the agent did (first-person)
    opinions.md          # subjective prefs/judgments + confidence + evidence pointers
    entities/
      Peter.md
      The-Castle.md
      warelay.md
      ...
```
Notes:
- **Daily log stays daily log**. No need to turn it into JSON.
- The `bank/` files are **curated**, produced by reflection jobs, and can still be edited by hand.
- `memory.md` remains “small + core-ish”: the things you want Clawd to see every session.
### Derived store (machine recall)
Add a derived index under the workspace (not necessarily git tracked):
```
~/clawd/.memory/index.sqlite
```
Back it with:
- SQLite schema for facts + entity links + opinion metadata
- SQLite **FTS5** for lexical recall (fast, tiny, offline)
- optional embeddings table for semantic recall (still offline)
The index is always **rebuildable from Markdown**.
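A minimal sketch of that schema, using Python’s stdlib `sqlite3` with an external-content FTS5 table (the table and column names are illustrative, not a committed design):
```python
import sqlite3
from pathlib import Path

def init_index(db_path: str) -> sqlite3.Connection:
    """Create or open the derived index; it is disposable and rebuildable from Markdown."""
    path = Path(db_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(str(path))
    con.executescript("""
        CREATE TABLE IF NOT EXISTS facts (
            id         INTEGER PRIMARY KEY,
            kind       TEXT NOT NULL,   -- world | experience | opinion | observation
            day        TEXT NOT NULL,   -- source day, e.g. '2025-11-27'
            content    TEXT NOT NULL,   -- the narrative, self-contained fact
            source     TEXT NOT NULL,   -- e.g. 'memory/2025-11-27.md#L12'
            confidence REAL             -- opinions only
        );
        CREATE TABLE IF NOT EXISTS fact_entities (
            fact_id INTEGER NOT NULL REFERENCES facts(id),
            entity  TEXT NOT NULL       -- slug mapping to bank/entities/<entity>.md
        );
        -- lexical recall: FTS5 over the fact text (external-content table)
        CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
            USING fts5(content, content='facts', content_rowid='id');
    """)
    return con

def add_fact(con, kind, day, content, source, entities=(), confidence=None) -> int:
    """Insert one retained fact plus its entity links and its FTS row."""
    cur = con.execute(
        "INSERT INTO facts (kind, day, content, source, confidence) VALUES (?,?,?,?,?)",
        (kind, day, content, source, confidence))
    fid = cur.lastrowid
    con.execute("INSERT INTO facts_fts (rowid, content) VALUES (?,?)", (fid, content))
    con.executemany("INSERT INTO fact_entities (fact_id, entity) VALUES (?,?)",
                    [(fid, e) for e in entities])
    con.commit()
    return fid
```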
## Retain / Recall / Reflect (operational loop)
### Retain: normalize daily logs into “facts”
Hindsight’s key insight that matters here: store **narrative, self-contained facts**, not tiny snippets.
Practical rule for `memory/YYYY-MM-DD.md`:
- at the end of the day (or during it), add a `## Retain` section with 2–5 bullets that are:
  - narrative (cross-turn context preserved)
  - self-contained (standalone makes sense later)
  - tagged with type + entity mentions
Example:
```
## Retain
- W @Peter: Currently in Marrakech (Nov 27–Dec 1, 2025) for Andy’s birthday.
- B @warelay: I fixed the Baileys WS crash by wrapping connection.update handlers in try/catch (see memory/2025-11-27.md).
- O(c=0.95) @Peter: Prefers concise replies (<1500 chars) on WhatsApp; long content goes into files.
```
Minimal parsing:
- Type prefix: `W` (world), `B` (experience/biographical), `O` (opinion), `S` (observation/summary; usually generated)
- Entities: `@Peter`, `@warelay`, etc (slugs map to `bank/entities/*.md`)
- Opinion confidence: `O(c=0.0..1.0)` optional
If you don’t want authors to think about it: the reflect job can infer these bullets from the rest of the log, but having an explicit `## Retain` section is the easiest “quality lever”.
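A minimal parser for those bullets could be a couple of regexes encoding the conventions above (hypothetical helper, not an existing Clawdbot module):
```python
import re

# Matches '- W @Peter: ...', '- B @warelay: ...', '- O(c=0.95) @Peter: ...'
BULLET = re.compile(r"^-\s+(?P<kind>[WBOS])(?:\(c=(?P<conf>\d*\.?\d+)\))?\s+(?P<rest>.+)$")
ENTITY = re.compile(r"@([A-Za-z0-9_-]+)")
KINDS = {"W": "world", "B": "experience", "O": "opinion", "S": "observation"}

def parse_retain(lines):
    """Yield (kind, confidence, entities, content) for each bullet under '## Retain'."""
    for line in lines:
        m = BULLET.match(line.strip())
        if not m:
            continue                      # skip anything that is not a retain bullet
        conf = float(m.group("conf")) if m.group("conf") else None
        rest = m.group("rest")
        yield KINDS[m.group("kind")], conf, ENTITY.findall(rest), rest
```
Each yielded tuple can then go into the derived index (e.g. via an `add_fact`-style helper), carrying the daily file + line location as the citation source.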
### Recall: queries over the derived index
Recall should support:
- **lexical**: “find exact terms / names / commands” (FTS5)
- **entity**: “tell me about X” (entity pages + entity-linked facts)
- **temporal**: “what happened around Nov 27” / “since last week”
- **opinion**: “what does Peter prefer?” (with confidence + evidence)
Return format should be agent-friendly and cite sources:
- `kind` (`world|experience|opinion|observation`)
- `timestamp` (source day, or extracted time range if present)
- `entities` (`["Peter","warelay"]`)
- `content` (the narrative fact)
- `source` (`memory/2025-11-27.md#L12` etc)
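Against the sketch schema earlier, lexical recall with temporal and kind filters stays one SQL statement; the dicts below mirror the fields just listed (illustrative only):
```python
def recall(con, query, k=25, since_day=None, kind=None):
    """FTS5 lexical recall with optional metadata filters; returns citable fact bundles."""
    sql = ("SELECT f.id, f.kind, f.day, f.content, f.source "
           "FROM facts_fts JOIN facts f ON f.id = facts_fts.rowid "
           "WHERE facts_fts MATCH ?")
    args = [query]
    if since_day:                     # temporal: e.g. '2025-11-20' for "since last week"
        sql += " AND f.day >= ?"
        args.append(since_day)
    if kind:                          # 'world' | 'experience' | 'opinion' | 'observation'
        sql += " AND f.kind = ?"
        args.append(kind)
    sql += " ORDER BY facts_fts.rank LIMIT ?"   # rank = BM25 in FTS5
    args.append(k)

    bundles = []
    for fid, fkind, day, content, source in con.execute(sql, args):
        entities = [e for (e,) in con.execute(
            "SELECT entity FROM fact_entities WHERE fact_id = ?", (fid,))]
        bundles.append({"kind": fkind, "timestamp": day, "entities": entities,
                        "content": content, "source": source})
    return bundles
```
Entity and opinion recall can reuse the same tables by filtering on `fact_entities.entity` and `kind = 'opinion'` respectively.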
### Reflect: produce stable pages + update beliefs
Reflection is a scheduled job (daily or heartbeat `ultrathink`) that:
- updates `bank/entities/*.md` from recent facts (entity summaries)
- updates `bank/opinions.md` confidence based on reinforcement/contradiction
- optionally proposes edits to `memory.md` (“core-ish” durable facts)
Opinion evolution (simple, explainable):
- each opinion has:
  - statement
  - confidence `c ∈ [0,1]`
  - last_updated
  - evidence links (supporting + contradicting fact IDs)
- when new facts arrive:
  - find candidate opinions by entity overlap + similarity (FTS first, embeddings later)
  - update confidence by small deltas; big jumps require strong contradiction + repeated evidence
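For example, a delta rule along these lines keeps each update explainable; all constants are made up and would need tuning:
```python
def update_confidence(c, supporting, contradicting):
    """Nudge an opinion's confidence from evidence counted since the last reflect pass.

    Ordinary evidence moves confidence in small steps; a larger drop happens only
    when contradiction is both repeated and clearly outweighs support.
    """
    delta = 0.05 * supporting - 0.05 * contradicting
    if contradicting >= 3 and contradicting > 2 * supporting:
        delta -= 0.20                 # strong + repeated contradiction: allow a bigger jump
    return max(0.0, min(1.0, c + delta))
```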
## CLI integration: standalone vs deep integration
Recommendation: **deep integration in Clawdbot**, but keep a separable core library.
### Why integrate into Clawdbot?
- Clawdbot already knows:
  - the workspace path (`agents.defaults.workspace`)
  - the session model + heartbeats
  - logging + troubleshooting patterns
- You want the agent itself to call the tools:
  - `clawdbot memory recall "…" --k 25 --since 30d`
  - `clawdbot memory reflect --since 7d`
### Why still split a library?
- keep memory logic testable without gateway/runtime
- reuse from other contexts (local scripts, future desktop app, etc.)
Shape:
The memory tooling is intended to be a small CLI + library layer, but this is exploratory only.
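As one exploratory reading of that shape, the library surface could be as small as the three verbs; names and signatures below are placeholders, not a settled API:
```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class WorkspaceMemory:
    """Hypothetical core library: owns the workspace path and its derived index."""
    workspace: Path               # e.g. Path.home() / "clawd" (agents.defaults.workspace)

    def retain(self, day: str) -> int:
        """Index the '## Retain' bullets from memory/<day>.md; return facts added."""
        raise NotImplementedError

    def recall(self, query: str, k: int = 25, since: str | None = None) -> list[dict]:
        """Return small, citable fact bundles in the format described above."""
        raise NotImplementedError

    def reflect(self, since: str) -> None:
        """Rewrite bank/ pages and opinion confidences from recent facts."""
        raise NotImplementedError
```
The `clawdbot memory` subcommands would then be thin wrappers over this object, so the memory logic stays testable without the gateway or runtime.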
## “S-Collide” / SuCo: when to use it (research)
If “S-Collide” refers to **SuCo (Subspace Collision)**: it’s an ANN retrieval approach that targets strong recall/latency tradeoffs by using learned/structured collisions in subspaces (paper: arXiv 2411.14754, 2024).
Pragmatic take for `~/clawd`:
- **don’t start** with SuCo.
- start with SQLite FTS + (optional) simple embeddings; you’ll get most of the UX wins immediately.
- consider SuCo/HNSW/ScaNN-class solutions only once:
  - the corpus is big (tens/hundreds of thousands of chunks)
  - brute-force embedding search becomes too slow
  - recall quality is meaningfully bottlenecked by lexical search
Offline-friendly alternatives (in increasing complexity):
- SQLite FTS5 + metadata filters (zero ML)
- Embeddings + brute force (works surprisingly far if chunk count is low; see the sketch after this list)
- HNSW index (common, robust; needs a library binding)
- SuCo (research-grade; attractive if there’s a solid implementation you can embed)
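For the “embeddings + brute force” rung, the retrieval step is just a cosine scan, which stays fast offline until the fact count reaches the tens of thousands; where the vectors come from (e.g. a local Ollama embedding model) is assumed, not shown:
```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_recall(query_vec, fact_vecs, k=25):
    """Brute-force semantic recall over (fact_id, embedding) pairs."""
    scored = sorted(((cosine(query_vec, v), fid) for fid, v in fact_vecs), reverse=True)
    return [fid for _, fid in scored[:k]]
```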
Open question:
- what’s the **best** offline embedding model for “personal assistant memory” on your machines (laptop + desktop)?
- if you already have Ollama: embed with a local model; otherwise ship a small embedding model in the toolchain.
## Smallest useful pilot
If you want a minimal, still-useful version:
- Add `bank/` entity pages and a `## Retain` section in daily logs.
- Use SQLite FTS for recall with citations (path + line numbers).
- Add embeddings only if recall quality or scale demands it.
## References
- Letta / MemGPT concepts: “core memory blocks” + “archival memory” + tool-driven self-editing memory.
- Hindsight Technical Report: “retain / recall / reflect”, four-network memory, narrative fact extraction, opinion confidence evolution.
- SuCo: arXiv 2411.14754 (2024): “Subspace Collision” approximate nearest neighbor retrieval.