docs: harden local model guidance

2026-01-12 17:10:56 +00:00
parent 05ac67c520
commit 1b2c1545a0
2 changed files with 4 additions and 4 deletions
--- a/docs/gateway/local-models.md
+++ b/docs/gateway/local-models.md
@@ -7,9 +7,9 @@ read_when:
 ---
 # Local models

-Local is doable, but Clawdbot expects large context + strong defenses against prompt injection. Small cards truncate context and leak safety. Aim high: **≥2 maxed-out Mac Studios or equivalent GPU rig (~$30k+)**. A single **24 GB** GPU works only for lighter prompts with higher latency.
+Local is doable, but Clawdbot expects large context + strong defenses against prompt injection. Small cards truncate context and leak safety. Aim high: **≥2 maxed-out Mac Studios or equivalent GPU rig (~$30k+)**. A single **24 GB** GPU works only for lighter prompts with higher latency. Use the **largest / full-size model variant you can run**; aggressively quantized or “small” checkpoints raise prompt-injection risk (see [Security](/gateway/security)).

-## Recommended: LM Studio + MiniMax M2.1 (Responses API)
+## Recommended: LM Studio + MiniMax M2.1 (Responses API, full-size)

 Best current local stack. Load MiniMax M2.1 in LM Studio, enable the local server (default `http://127.0.0.1:1234`), and use Responses API to keep reasoning separate from final text.

@@ -50,7 +50,7 @@ Best current local stack. Load MiniMax M2.1 in LM Studio, enable the local serve

 **Setup checklist**
 - Install LM Studio: https://lmstudio.ai
- In LM Studio, download MiniMax M2.1, start the server, confirm `http://127.0.0.1:1234/v1/models` lists it.
+- In LM Studio, download the **largest MiniMax M2.1 build available** (avoid “small”/heavily quantized variants), start the server, confirm `http://127.0.0.1:1234/v1/models` lists it.
 - Keep the model loaded; cold-load adds startup latency.
 - Adjust `contextWindow`/`maxTokens` if your LM Studio build differs.
 - For WhatsApp, stick to Responses API so only final text is sent.