docs: detail memory tools and local models
@@ -102,3 +102,22 @@ Local mode:
- Set `agents.defaults.memorySearch.provider = "local"`.
- Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
- Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.

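
Put together, a minimal fully-local setup looks like this (a json5 sketch; the model path shown is the default local embedding model, substitute your own GGUF or `hf:` URI):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        local: {
          modelPath: "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf"
        },
        fallback: "none"  // stay fully local; never fall back to remote embeddings
      }
    }
  }
}
```
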
### How the memory tools work
- `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
- Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.
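
The chunking scheme above (~400-token windows with 80-token overlap) can be sketched as follows; `chunkTokens` is a hypothetical helper for illustration, not the actual indexer code:

```typescript
function chunkTokens(tokens: number[], size = 400, overlap = 80): number[][] {
  // Slide a window of `size` tokens over the document; consecutive
  // windows share `overlap` tokens so no snippet is cut mid-context.
  const step = size - overlap;
  const chunks: number[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is embedded and stored with its file path and line range, which is how `memory_search` can return precise snippets instead of whole files.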
### What gets indexed (and when)
- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
- Index storage: per-agent SQLite at `~/.clawdbot/state/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync runs on session start, on first search when dirty, and optionally on an interval. Reindex triggers when embedding model/provider or chunk sizes change.
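
For example, to relocate the per-agent index while keeping one file per agent (the path is illustrative; `{agentId}` is expanded as noted above):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        store: { path: "/data/clawdbot/memory/{agentId}.sqlite" }
      }
    }
  }
}
```
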
### Local embedding auto-download
- Default local embedding model: `hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf` (~0.6 GB).
- When `memorySearch.provider = "local"`, `node-llama-cpp` resolves `modelPath`; if the GGUF is missing it **auto-downloads** to the cache (or `local.modelCacheDir` if set), then loads it. Downloads resume on retry.
- Native build requirement: run `pnpm approve-builds`, pick `node-llama-cpp`, then `pnpm rebuild node-llama-cpp`.
- Fallback: if local setup fails and `memorySearch.fallback = "openai"`, we automatically switch to remote embeddings (`openai/text-embedding-3-small` unless overridden) and record the reason.
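
A local-first embedding setup that keeps the remote safety net might look like this (a sketch; the cache directory is illustrative):

```json5
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local",
        local: {
          modelPath: "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf",
          modelCacheDir: "/data/models"  // optional; omit to use the standard cache
        },
        fallback: "openai"  // switch to remote embeddings if local setup fails
      }
    }
  }
}
```
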
@@ -55,6 +55,64 @@ Best current local stack. Load MiniMax M2.1 in LM Studio, enable the local serve

- Adjust `contextWindow`/`maxTokens` if your LM Studio build differs.
- For WhatsApp, stick to Responses API so only final text is sent.

## Model catalog (remote + local)
| Tier | Model ID | Context | Who downloads | Notes |
| --- | --- | --- | --- | --- |
| Hosted | `anthropic/claude-opus-4-5` | 200k | Provider | Latest Claude; keep as high-quality fallback. |
| Hosted | `anthropic/claude-sonnet-4-5` | 200k | Provider | Cheaper Claude; good default. |
| Hosted | `openai/gpt-4.1` | 128k | Provider | Latest GPT-4; strong tools + reasoning. |
| Hosted | `openai/gpt-4.1-mini` | 128k | Provider | Fast/cheap GPT-4 family; good fallback. |
| Local | `lmstudio/minimax-m2.1-gs32` | ~196k (build-dependent) | You (LM Studio UI) | Recommended local heavy model; keep loaded. |
| Local | Custom `vllm` / `litellm` model | server-defined | You (server) | Any OpenAI-compatible endpoint; align context settings. |

Keep hosted models configured even when running local; use `models.mode: "merge"` so fallbacks stay available.
### Hybrid config: hosted primary, local fallback
```json5
{
  agents: {
    defaults: {
      model: {
        primary: "anthropic/claude-sonnet-4-5",
        fallbacks: ["lmstudio/minimax-m2.1-gs32", "openai/gpt-4.1-mini"]
      },
      models: {
        "anthropic/claude-sonnet-4-5": { alias: "Sonnet" },
        "lmstudio/minimax-m2.1-gs32": { alias: "MiniMax Local" },
        "openai/gpt-4.1-mini": { alias: "GPT-4.1 mini" }
      }
    }
  },
  models: {
    mode: "merge",
    providers: {
      lmstudio: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio",
        api: "openai-responses",
        models: [
          {
            id: "minimax-m2.1-gs32",
            name: "MiniMax M2.1 GS32",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 196608,
            maxTokens: 8192
          }
        ]
      }
    }
  }
}
```
### Local-first with hosted safety net
Swap the primary and fallback order; keep the same providers block and `models.mode: "merge"` so you can fall back to Sonnet or GPT-4.1 when the local box is down.
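
Concretely, only the `model` block changes relative to the hybrid example (a sketch):

```json5
model: {
  primary: "lmstudio/minimax-m2.1-gs32",
  fallbacks: ["anthropic/claude-sonnet-4-5", "openai/gpt-4.1-mini"]
}
```
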

## Other OpenAI-compatible local proxies

vLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style `/v1` endpoint. Replace the provider block above with your endpoint and model ID:
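
For instance, a vLLM gateway could be wired in like this (a sketch mirroring the LM Studio block above; the base URL, API key, model ID, and context sizes are placeholders for your server's actual values):

```json5
providers: {
  vllm: {
    baseUrl: "http://127.0.0.1:8000/v1",  // your server's OpenAI-style endpoint
    apiKey: "sk-local",                   // whatever your gateway expects
    api: "openai-responses",
    models: [
      {
        id: "your-model-id",
        name: "Your vLLM Model",
        reasoning: false,
        input: ["text"],
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
        contextWindow: 131072,  // align with the server's configured context
        maxTokens: 8192
      }
    ]
  }
}
```
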