docs: expand memory hybrid search explainer

2026-01-18 03:09:39 +00:00
parent 072a13f3b2
commit b5c023044b
1 changed files with 38 additions and 0 deletions
--- a/docs/concepts/memory.md
+++ b/docs/concepts/memory.md
@@ -172,6 +172,44 @@ When enabled, Clawdbot combines:

 If full-text search is unavailable on your platform, Clawdbot falls back to vector-only search.

+#### Why hybrid?
+
+Vector search is great at “this means the same thing”:
+- “Mac Studio gateway host” vs “the machine running the gateway”
+- “debounce file updates” vs “avoid indexing on every write”
+
+But it can be weak at exact, high-signal tokens:
+- IDs (`a828e60`, `b3b9895a…`)
+- code symbols (`memorySearch.query.hybrid`)
+- error strings (“sqlite-vec unavailable”)
+
+BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases.
+Hybrid search is the pragmatic middle ground: **use both retrieval signals** so you get
+good results for both “natural language” queries and “needle in a haystack” queries.
+
+#### How we merge results (the current design)
+
+Implementation sketch:
+
+1) Retrieve a candidate pool from both sides:
+- **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
+- **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).
+
+2) Convert the BM25 rank into a score in (0, 1], so that a lower rank gives a higher score:
+- `textScore = 1 / (1 + max(0, bm25Rank))`
+
+3) Union candidates by chunk id and compute a weighted score:
+- `finalScore = vectorWeight * vectorScore + textWeight * textScore`
+
+Notes:
+- `vectorWeight` and `textWeight` are normalized to sum to 1.0 during config resolution, so the weights behave as percentages.
+- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
+- If FTS5 can’t be created, we keep vector-only search (no hard failure).
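+
+The steps above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the actual implementation: the names `normalizeWeights`, `bm25RankToScore`, and `mergeHybrid` and the hit shapes are hypothetical, and it assumes that a chunk found by only one side contributes 0 for the other side.
+
+```ts
+// Hypothetical shapes for illustration only.
+interface Weights {
+  vectorWeight: number;
+  textWeight: number;
+}
+
+// Scale the two weights so they sum to 1.0.
+function normalizeWeights(w: Weights): Weights {
+  const sum = w.vectorWeight + w.textWeight;
+  // Assumed fallback when both weights are zero or negative.
+  if (sum <= 0) return { vectorWeight: 0.5, textWeight: 0.5 };
+  return { vectorWeight: w.vectorWeight / sum, textWeight: w.textWeight / sum };
+}
+
+// Step 2: map a BM25 rank to (0, 1]. Ranks at or below 0 clamp to a score of 1.0.
+function bm25RankToScore(rank: number): number {
+  return 1 / (1 + Math.max(0, rank));
+}
+
+// Step 3: union candidates by chunk id, compute the weighted score, sort, and trim.
+function mergeHybrid(
+  vectorHits: { id: string; score: number }[],
+  textHits: { id: string; rank: number }[],
+  weights: Weights,
+  maxResults: number,
+): { id: string; finalScore: number }[] {
+  const w = normalizeWeights(weights);
+  const byId = new Map<string, { vectorScore: number; textScore: number }>();
+  for (const hit of vectorHits) {
+    byId.set(hit.id, { vectorScore: hit.score, textScore: 0 });
+  }
+  for (const hit of textHits) {
+    // Assumption: a chunk missing from the vector side gets vectorScore 0.
+    const entry = byId.get(hit.id) || { vectorScore: 0, textScore: 0 };
+    entry.textScore = bm25RankToScore(hit.rank);
+    byId.set(hit.id, entry);
+  }
+  return Array.from(byId.entries())
+    .map(([id, s]) => ({
+      id,
+      finalScore: w.vectorWeight * s.vectorScore + w.textWeight * s.textScore,
+    }))
+    .sort((a, b) => b.finalScore - a.finalScore)
+    .slice(0, maxResults);
+}
+```
+
+With weights of 7 and 3 (normalized to 0.7 and 0.3), a chunk that both sides return can outrank a chunk with a higher vector score that only the vector side found.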
+
+This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes.
+If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization
+(min/max or z-score) before mixing.
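+
+For comparison, a minimal sketch of Reciprocal Rank Fusion, which is not what the current design does. RRF ignores raw scores and sums `1 / (k + position)` over each ranked list. The function name and the `k = 60` default (a commonly used constant) are illustrative assumptions.
+
+```ts
+// Hypothetical helper: fuse several ranked lists of chunk ids with RRF.
+// Each list is ordered best-first; positions are counted from 1.
+function reciprocalRankFusion(
+  rankedLists: string[][],
+  k: number = 60,
+): { id: string; score: number }[] {
+  const scores = new Map<string, number>();
+  for (const list of rankedLists) {
+    list.forEach((id, index) => {
+      const previous = scores.get(id) || 0;
+      scores.set(id, previous + 1 / (k + index + 1));
+    });
+  }
+  // Highest fused score first.
+  return Array.from(scores.entries())
+    .map(([id, score]) => ({ id, score }))
+    .sort((a, b) => b.score - a.score);
+}
+```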
+
 Config:

 ```json5