---
summary: How Clawdbot memory works (workspace files + automatic memory flush)
read_when:
  - You want the memory file layout and workflow
  - You want to tune the automatic pre-compaction memory flush
---

# Memory

Clawdbot memory is plain Markdown in the agent workspace. The files are the source of truth; the model only "remembers" what gets written to disk.

## Memory files (Markdown)

The default workspace layout uses two memory layers:

- `memory/YYYY-MM-DD.md`
  - Daily log (append-only).
  - Today's and yesterday's files are read at session start.
- `MEMORY.md` (optional)
  - Curated long-term memory.
  - Loaded only in the main, private session (never in group contexts).

These files live under the workspace (`agents.defaults.workspace`, default `~/clawd`). See Agent workspace for the full layout.
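
Concretely, with the default workspace the memory layout looks like this (the dates are illustrative):

```
~/clawd/
├── MEMORY.md           # curated long-term memory (optional)
└── memory/
    ├── 2026-01-17.md   # yesterday's daily log
    └── 2026-01-18.md   # today's daily log
```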

## When to write memory

- Decisions, preferences, and durable facts go to `MEMORY.md`.
- Day-to-day notes and running context go to `memory/YYYY-MM-DD.md`.
- If someone says "remember this," write it down (do not keep it in RAM).
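
For example, a daily log entry might look like the sketch below; the headings and wording are illustrative, not a required format:

```markdown
<!-- memory/2026-01-18.md (illustrative) -->
## Notes
- Drafting the Q1 release announcement; current draft lives in the workspace.

## Decisions
- User prefers terse bullet summaries over long paragraphs → promoted to MEMORY.md.
```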

## Automatic memory flush (pre-compaction ping)

When a session is close to auto-compaction, Clawdbot triggers a silent, agentic turn that reminds the model to write durable memory before the context is compacted. The default prompts explicitly say the model may reply, but usually `NO_REPLY` is the correct response, so the user never sees this turn.

This is controlled by `agents.defaults.compaction.memoryFlush`:

```json5
{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 20000,
        memoryFlush: {
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      }
    }
  }
}
```

Details:

- Soft threshold: the flush triggers when the session token estimate crosses `contextWindow - reserveTokensFloor - softThresholdTokens` (see the worked example below).
- Silent by default: prompts include `NO_REPLY` so nothing is delivered.
- Two prompts: the reminder is delivered as a user prompt plus a system-prompt append.
- One flush per compaction cycle (tracked in `sessions.json`).
- Workspace must be writable: if the session runs sandboxed with `workspaceAccess: "ro"` or `"none"`, the flush is skipped.
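
For concreteness, here is the threshold arithmetic with the values from the config example above and a hypothetical 200k-token context window:

```json5
{
  // contextWindow is model-dependent; 200000 is a hypothetical value.
  contextWindow: 200000,
  reserveTokensFloor: 20000,   // from the example above
  softThresholdTokens: 4000,   // from the example above
  // The flush fires once the session token estimate crosses:
  //   200000 - 20000 - 4000 = 176000 tokens
}
```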

For the full compaction lifecycle, see Session management + compaction.

## Memory search

Clawdbot can build a small vector index over `MEMORY.md` and `memory/*.md` so semantic queries can find related notes even when the wording differs.

Defaults:

- Enabled by default.
- Watches memory files for changes (debounced).
- Uses remote embeddings (OpenAI) unless configured for local.
- Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.

Remote embeddings require an API key for the embedding provider. By default this is OpenAI (`OPENAI_API_KEY` or `models.providers.openai.apiKey`). Codex OAuth only covers chat/completions and does not satisfy embeddings for memory search. When using a custom OpenAI-compatible endpoint, set `memorySearch.remote.apiKey` (and optionally `memorySearch.remote.headers`).

If you want to use a custom OpenAI-compatible endpoint (such as Gemini, OpenRouter, or a proxy), you can use the `remote` configuration:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://generativelanguage.googleapis.com/v1beta/openai/",
        apiKey: "YOUR_GEMINI_API_KEY",
        headers: { "X-Custom-Header": "value" }
      }
    }
  }
}
```

If you don't want to set an API key, use `memorySearch.provider = "local"` or set `memorySearch.fallback = "none"`.

Batch indexing (OpenAI only):

- Enabled by default for OpenAI embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
- Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
- Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
- Batch mode currently applies only when `memorySearch.provider = "openai"` and uses your OpenAI API key.

Why OpenAI batch is fast + cheap:

- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details.

Config example:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      fallback: "openai",
      remote: {
        batch: { enabled: true, concurrency: 2 }
      },
      sync: { watch: true }
    }
  }
}
```

Tools:

- `memory_search` — returns snippets with file + line ranges.
- `memory_get` — reads memory file content by path.

Local mode:

- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (a GGUF file or `hf:` URI), as in the sketch below.
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
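
A minimal local-mode config might look like this; the `modelPath` shown is the default local embedding model listed under "Local embedding auto-download" below, so substitute your own GGUF path or `hf:` URI as needed:

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      // Optional: never fall back to remote embeddings.
      fallback: "none",
      local: {
        modelPath: "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
      }
    }
  }
}
```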

## How the memory tools work

- `memory_search` semantically searches Markdown chunks (~400-token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings (a hypothetical payload is sketched after this list). No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves to true for the agent.
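
The exact payload shape isn't documented here; based on the fields listed above, a single `memory_search` hit might look roughly like this (all field names are illustrative assumptions, not the actual schema):

```json5
{
  snippet: "- Prefers terse bullet summaries...",  // capped at ~700 chars
  file: "memory/2026-01-18.md",
  lines: { start: 12, end: 18 },
  score: 0.82,
  provider: "openai",
  model: "text-embedding-3-small",
  fallback: false  // true when local → remote fallback occurred
}
```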

## What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite`, configurable via `agents.defaults.memorySearch.store.path` (supports an `{agentId}` token; see the example below).
- Freshness: a watcher on `MEMORY.md` + `memory/` marks the index dirty (1.5s debounce). Sync runs on session start, on first search when dirty, and optionally on an interval.
- Reindex triggers: the index stores the embedding provider/model, endpoint fingerprint, and chunking params. If any of those change, Clawdbot automatically resets and reindexes the entire store.
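
For example, to relocate the index while keeping one database per agent (the path here is illustrative):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        // "{agentId}" is replaced with the agent's id at runtime.
        path: "/data/clawdbot/memory/{agentId}.sqlite"
      }
    }
  }
}
```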

## Embedding cache

Clawdbot can cache chunk embeddings in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.

Config:

```json5
agents: {
  defaults: {
    memorySearch: {
      cache: {
        enabled: true,
        maxEntries: 50000
      }
    }
  }
}
```

## Session memory search (experimental)

You can optionally index session transcripts and surface them via `memory_search`. This is gated behind an experimental flag.

```json5
agents: {
  defaults: {
    memorySearch: {
      experimental: { sessionMemory: true },
      sources: ["memory", "sessions"]
    }
  }
}
```

Notes:

- Session indexing is opt-in (off by default).
- Session updates are debounced and indexed lazily on the next `memory_search` (or a manual `clawdbot memory index`).
- Results still include snippets only; `memory_get` remains limited to memory files.
- Session indexing is isolated per agent (only that agent's session logs are indexed).
- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process or user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.

## SQLite vector acceleration (sqlite-vec)

When the sqlite-vec extension is available, Clawdbot stores embeddings in a SQLite virtual table (`vec0`) and performs vector-distance queries in the database. This keeps search fast without loading every embedding into JS.

Configuration (optional):

```json5
agents: {
  defaults: {
    memorySearch: {
      store: {
        vector: {
          enabled: true,
          extensionPath: "/path/to/sqlite-vec"
        }
      }
    }
  }
}
```

Notes:

- `enabled` defaults to true; when disabled, search falls back to in-process cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, Clawdbot logs the error and continues with the JS fallback (no vector table).
- `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds or non-standard install locations).

## Local embedding auto-download

- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, node-llama-cpp resolves `modelPath`; if the GGUF is missing, it auto-downloads to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then run `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.

## Custom OpenAI-compatible endpoint example

```json5
agents: {
  defaults: {
    memorySearch: {
      provider: "openai",
      model: "text-embedding-3-small",
      remote: {
        baseUrl: "https://api.example.com/v1/",
        apiKey: "YOUR_REMOTE_API_KEY",
        headers: {
          "X-Organization": "org-id",
          "X-Project": "project-id"
        }
      }
    }
  }
}
```

Notes:

- `remote.*` takes precedence over `models.providers.openai.*`.
- `remote.headers` merge with the OpenAI headers; `remote` wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.