Summary: Run Clawdbot on local LLMs (LM Studio, vLLM, LiteLLM, custom OpenAI endpoints)

Read when:
  • You want to serve models from your own GPU box
  • You are wiring LM Studio or an OpenAI-compatible proxy
  • You need the safest local model guidance

Local models

Local is doable, but Clawdbot expects a large context window and strong defenses against prompt injection. Small cards truncate context and degrade those defenses. Aim high: two or more maxed-out Mac Studios or an equivalent GPU rig (~$30k+). A single 24 GB GPU works only for lighter prompts, with higher latency.

Best current local stack: load MiniMax M2.1 in LM Studio, enable the local server (default http://127.0.0.1:1234), and use the Responses API to keep reasoning separate from the final text.

{
  agents: {
    defaults: {
      model: { primary: "lmstudio/minimax-m2.1-gs32" },
      models: {
        "anthropic/claude-opus-4-5": { alias: "Opus" },
        "lmstudio/minimax-m2.1-gs32": { alias: "Minimax" }
      }
    }
  },
  models: {
    mode: "merge",
    providers: {
      lmstudio: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio",
        api: "openai-responses",
        models: [
          {
            id: "minimax-m2.1-gs32",
            name: "MiniMax M2.1 GS32",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 196608,
            maxTokens: 8192
          }
        ]
      }
    }
  }
}

Setup checklist

  • Install LM Studio: https://lmstudio.ai
  • In LM Studio, download MiniMax M2.1, start the server, and confirm http://127.0.0.1:1234/v1/models lists it (see the curl check after this list).
  • Keep the model loaded; cold-load adds startup latency.
  • Adjust contextWindow/maxTokens if your LM Studio build differs.
  • For WhatsApp, stick to the Responses API so only the final text is sent.
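
To run that check from the gateway host, curl the server directly. The /v1/models route is part of the standard OpenAI-compatible surface; the /v1/responses smoke test below assumes your LM Studio build exposes the Responses route (which the openai-responses setting above relies on), and the model id and prompt are just the values from this page's example config:

# List served models; the id from your provider config should appear here
curl http://127.0.0.1:1234/v1/models

# Optional end-to-end smoke test via the Responses route
curl http://127.0.0.1:1234/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer lmstudio" \
  -d '{"model": "minimax-m2.1-gs32", "input": "Reply with the word ready."}'

If the first call hangs or returns an empty list, the server is not running or the model is not loaded; fix that before touching the gateway config.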

Other OpenAI-compatible local proxies

vLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style /v1 endpoint. Replace the provider block above with your endpoint and model ID:

{
  models: {
    mode: "merge",
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:8000/v1",
        apiKey: "sk-local",
        api: "openai-responses",
        models: [
          {
            id: "my-local-model",
            name: "Local Model",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 120000,
            maxTokens: 8192
          }
        ]
      }
    }
  }
}

Keep models.mode: "merge" so hosted models stay available as fallbacks.
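
If you are standing up vLLM for this, a launch along these lines pairs with the provider block above. Treat it as a sketch: flags vary by vLLM version, the Hugging Face model name is only an example, and --api-key and --served-model-name should match the apiKey and model id in your config.

vllm serve Qwen/Qwen2.5-7B-Instruct \
  --served-model-name my-local-model \
  --port 8000 \
  --api-key sk-local \
  --max-model-len 120000

LiteLLM and other proxies follow the same pattern: expose an OpenAI-compatible listener on the baseUrl you configure and keep the served model id in sync with the id in the models list.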

Troubleshooting

  • Gateway can reach the proxy? curl http://127.0.0.1:1234/v1/models.
  • LM Studio model unloaded? Reload; cold start is a common “hanging” cause.
  • Context errors? Lower contextWindow or raise your server limit.
  • Safety: local models skip provider-side filters; keep agents narrow and compaction on to limit prompt injection blast radius.