---
summary: "How Clawdbot memory works (workspace files + automatic memory flush)"
read_when:
  - You want the memory file layout and workflow
  - You want to tune the automatic pre-compaction memory flush
---

# Memory

Clawdbot memory is **plain Markdown in the agent workspace**. The files are the
source of truth; the model only "remembers" what gets written to disk.

Memory search tools are provided by the active memory plugin (default:
`memory-core`). Disable memory plugins with `plugins.slots.memory = "none"`.

## Memory files (Markdown)

The default workspace layout uses two memory layers:

- `memory/YYYY-MM-DD.md`
  - Daily log (append-only).
  - Read today + yesterday at session start.
- `MEMORY.md` (optional)
  - Curated long-term memory.
  - **Only load in the main, private session** (never in group contexts).

These files live under the workspace (`agents.defaults.workspace`, default
`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.

## When to write memory

- Decisions, preferences, and durable facts go to `MEMORY.md`.
- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," write it down (do not keep it in RAM).
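
For illustration, here is what a hypothetical daily-log file might contain (the date and entries are invented; the format is free-form Markdown):

```markdown
<!-- memory/2026-01-12.md -->
- Decided: keep the gateway on the Mac Studio host (user confirmed).
- Preference: short answers in group chats → promoted to MEMORY.md.
- Running context: still tuning the file-watcher debounce interval.
```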

## Automatic memory flush (pre-compaction ping)

When a session is **close to auto-compaction**, Clawdbot triggers a **silent,
agentic turn** that reminds the model to write durable memory **before** the
context is compacted. The default prompts explicitly say the model *may reply*,
but usually `NO_REPLY` is the correct response, so the user never sees this turn.

This is controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
```

Details:

- **Soft threshold**: the flush triggers when the session token estimate crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens` (worked example below).
- **Silent** by default: the prompts include `NO_REPLY` so nothing is delivered.
- **Two prompts**: a user prompt plus a system-prompt append carry the reminder.
- **One flush per compaction cycle** (tracked in `sessions.json`).
- **Workspace must be writable**: if the session runs sandboxed with
  `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
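
As a worked example of the soft threshold, with the values above and an assumed 200k-token context window (a minimal sketch; the actual window size depends on the model):

```ts
// Hypothetical numbers: 200k-token context window + the config shown above.
const contextWindow = 200_000;      // model-dependent (assumed for this example)
const reserveTokensFloor = 20_000;  // compaction.reserveTokensFloor
const softThresholdTokens = 4_000;  // compaction.memoryFlush.softThresholdTokens

// The flush fires once the session token estimate crosses this point:
const flushAt = contextWindow - reserveTokensFloor - softThresholdTokens;
console.log(flushAt); // 176000 — the flush turn runs here, before compaction
```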

For the full compaction lifecycle, see
[Session management + compaction](/reference/session-management-compaction).

## Vector memory search

Clawdbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
semantic queries can find related notes even when wording differs.

Defaults:

- Enabled by default.
- Watches memory files for changes (debounced).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, Clawdbot auto-selects (sketched after this list):
  1. `local` if `memorySearch.local.modelPath` is configured and the file exists.
  2. `openai` if an OpenAI key can be resolved.
  3. `gemini` if a Gemini key can be resolved.
  4. Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
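
A rough TypeScript sketch of that selection order (illustrative only; the real resolver also checks auth profiles and `models.providers.*.apiKey`, not just environment variables):

```ts
import { existsSync } from "node:fs";

type Provider = "local" | "openai" | "gemini";

// Stand-ins for Clawdbot's fuller key resolution (auth profiles, config, env).
const hasOpenAiKey = () => Boolean(process.env.OPENAI_API_KEY);
const hasGeminiKey = () => Boolean(process.env.GEMINI_API_KEY);

function pickProvider(cfg: { provider?: Provider; local?: { modelPath?: string } }): Provider | "disabled" {
  if (cfg.provider) return cfg.provider; // an explicit provider always wins
  if (cfg.local?.modelPath && existsSync(cfg.local.modelPath)) return "local"; // 1.
  if (hasOpenAiKey()) return "openai"; // 2.
  if (hasGeminiKey()) return "gemini"; // 3.
  return "disabled"; // 4. stays off until configured
}
```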

Remote embeddings **require** an API key for the embedding provider. Clawdbot
resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
variables. Codex OAuth only covers chat/completions and does **not** satisfy
embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
`models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`).

### Gemini embeddings (native)

Set the provider to `gemini` to use the Gemini embeddings API directly:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "gemini",
      model: "gemini-embedding-001",
      remote: {
        apiKey: "YOUR_GEMINI_API_KEY"
      }
    }
  }
}
```

Notes:

- `remote.baseUrl` is optional (defaults to the Gemini API base URL).
- `remote.headers` lets you add extra headers if needed.
- Default model: `gemini-embedding-001`.

If you want to use a **custom OpenAI-compatible endpoint** (OpenRouter, vLLM, or a proxy),
you can use the `remote` configuration with the OpenAI provider:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
```

If you don't want to set an API key, use `memorySearch.provider = "local"` or set
`memorySearch.fallback = "none"`.

Fallbacks:

- `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
- The fallback provider is only used when the primary embedding provider fails.

Batch indexing (OpenAI + Gemini):

- Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when `memorySearch.provider = "openai"` or `"gemini"` and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.

Why OpenAI batch is fast + cheap:

- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
  - https://platform.openai.com/docs/api-reference/batch
  - https://platform.openai.com/pricing

Config example:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
```

Tools:

- `memory_search` — returns snippets with file + line ranges.
- `memory_get` — reads memory file content by path.

Local mode:

- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback (example after this list).
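
A minimal local-only setup (a sketch; the `hf:` URI is the documented default local model):

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      local: {
        // any GGUF path or hf: URI works; this is the documented default
        modelPath: "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
      },
      fallback: "none" // never fall back to a remote embedding provider
    }
  }
}
```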

### How the memory tools work

- `memory_search` semantically searches Markdown chunks (~400-token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings (result shape sketched after this list). No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.
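
As a sketch, a search hit carries roughly the following fields (a hypothetical TypeScript shape inferred from the description above; the actual tool schema may use different names):

```ts
// Illustrative only: field names are assumptions, not the real tool output.
interface MemorySearchHit {
  snippet: string;   // chunk text, capped at ~700 chars
  path: string;      // workspace-relative file, e.g. "memory/2026-01-12.md"
  startLine: number; // line range of the chunk within the file
  endLine: number;
  score: number;     // vector or hybrid relevance score
  provider: string;  // embedding provider used ("local", "openai", "gemini")
  model: string;     // embedding model used
  usedRemoteFallback: boolean; // true if local embeddings fell back to remote
}
```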

### What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, which supports an `{agentId}` token; see the example after this list).
- Freshness: a watcher on `MEMORY.md` + `memory/` marks the index dirty (1.5s debounce). Sync runs on session start, on the first search while dirty, and optionally on an interval.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Clawdbot automatically resets and reindexes the entire store.
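
For example, to relocate the index store (the path below is illustrative; `{agentId}` expands to the agent's id):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        // "{agentId}" is replaced per agent at runtime
        path: "/data/clawdbot-memory/{agentId}.sqlite"
      }
    }
  }
}
```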

### Hybrid search (BM25 + vector)

When enabled, Clawdbot combines:

- **Vector similarity** (semantic match, wording can differ)
- **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)

If full-text search is unavailable on your platform, Clawdbot falls back to vector-only search.

#### Why hybrid?

Vector search is great at “this means the same thing”:

- “Mac Studio gateway host” vs “the machine running the gateway”
- “debounce file updates” vs “avoid indexing on every write”

But it can be weak at exact, high-signal tokens:

- IDs (`a828e60`, `b3b9895a…`)
- code symbols (`memorySearch.query.hybrid`)
- error strings (“sqlite-vec unavailable”)

BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
good results for both “natural language” queries and “needle in a haystack” queries.

#### How we merge results (the current design)

Implementation sketch:

1) Retrieve a candidate pool from both sides:
   - **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
   - **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).

2) Convert BM25 rank into a 0..1-ish score:
   - `textScore = 1 / (1 + max(0, bm25Rank))`

3) Union candidates by chunk id and compute a weighted score:
   - `finalScore = vectorWeight * vectorScore + textWeight * textScore`
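
A compact TypeScript sketch of steps 1–3 (illustrative, not the actual Clawdbot source; assumes the two candidate lists are already unioned by chunk id):

```ts
// One candidate chunk after unioning vector and BM25 results by id.
// A side that didn't return the chunk leaves its field undefined.
type Candidate = { id: string; vectorScore?: number; bm25Rank?: number };

function mergeHybrid(
  candidates: Candidate[],
  vectorWeight = 0.7,
  textWeight = 0.3,
): { id: string; finalScore: number }[] {
  // Config resolution normalizes the weights to sum to 1.0 (see Notes below).
  const sum = vectorWeight + textWeight;
  const vw = vectorWeight / sum;
  const tw = textWeight / sum;

  return candidates
    .map((c) => {
      const vectorScore = c.vectorScore ?? 0; // zero-vector/missing ⇒ BM25 carries it
      // BM25 rank (lower is better) → 0..1-ish score.
      const textScore = c.bm25Rank === undefined ? 0 : 1 / (1 + Math.max(0, c.bm25Rank));
      return { id: c.id, finalScore: vw * vectorScore + tw * textScore };
    })
    .sort((a, b) => b.finalScore - a.finalScore);
}
```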

Notes:

- `vectorWeight` + `textWeight` are normalized to sum to 1.0 during config resolution, so the weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can’t be created, we keep vector-only search (no hard failure).

This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes.
If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
(min/max or z-score) before mixing.

Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      query: {
        hybrid: {
          enabled: true,
          vectorWeight: 0.7,
          textWeight: 0.3,
          candidateMultiplier: 4
        }
      }
    }
  }
}
```

### Embedding cache

Clawdbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.

Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}
```

### Session memory search (experimental)

You can optionally index **session transcripts** and surface them via `memory_search`.
This is gated behind an experimental flag.

```json5
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
```

Notes:

- Session indexing is **opt-in** (off by default).
- Session updates are debounced and indexed lazily on the next `memory_search` (or manual `clawdbot memory index`).
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent’s session logs are indexed).
- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.

### SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, Clawdbot stores embeddings in a
SQLite virtual table (`vec0`) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
```

Notes:

- `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, Clawdbot logs the
  error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).

### Local embedding auto-download

- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing, it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.

### Custom OpenAI-compatible endpoint example

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
```

Notes:

- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.