feat: warn on weak model tiers

Peter Steinberger
2026-01-16 09:33:59 +00:00
parent 384028e12e
commit d0c986c4f0
4 changed files with 93 additions and 0 deletions


@@ -161,6 +161,15 @@ Even with strong system prompts, **prompt injection is not solved**. What helps
- Run sensitive tool execution in a sandbox; keep secrets out of the agent's reachable filesystem.
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic's Opus 4.5 because it's quite good at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
### Model strength (security note)
Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
Recommendations:
- **Use the latest generation, best-tier model** for any bot that can run tools or touch files/networks.
- **Avoid weaker tiers** (for example, Sonnet or Haiku) for tool-enabled agents or untrusted inboxes.
- If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists); see the sketch below.
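
As an illustration of that last point, here is a minimal TypeScript sketch of a deny-by-default tool policy. Everything in it (`ToolPolicy`, `weakTierPolicy`, `isToolCallAllowed`, the example paths) is hypothetical and not part of any real agent framework; it only shows the shape of a strict allowlist combined with a read-only filesystem guard.

```ts
// Hypothetical "reduced blast radius" policy for a weaker model tier.
// All names here are illustrative; adapt them to your agent framework.

type ToolName = "read_file" | "write_file" | "exec_shell" | "http_fetch";

interface ToolPolicy {
  allowedTools: ReadonlySet<ToolName>; // strict allowlist: deny by default
  readOnlyFs: boolean;                 // block all filesystem writes
  allowedPaths: readonly string[];     // minimal filesystem access
}

// Example policy for a small/cheap model: read-only, no shell, no network.
const weakTierPolicy: ToolPolicy = {
  allowedTools: new Set<ToolName>(["read_file"]),
  readOnlyFs: true,
  allowedPaths: ["/srv/bot/public"],
};

function isToolCallAllowed(
  policy: ToolPolicy,
  tool: ToolName,
  path?: string,
): boolean {
  if (!policy.allowedTools.has(tool)) return false; // not on the allowlist
  if (policy.readOnlyFs && tool === "write_file") return false;
  if (
    path &&
    !policy.allowedPaths.some((p) => path === p || path.startsWith(p + "/"))
  ) {
    return false; // outside the sandboxed tree
  }
  return true;
}

// A prompt-injected write attempt is denied before it ever executes:
console.log(isToolCallAllowed(weakTierPolicy, "write_file", "/etc/passwd")); // false
console.log(isToolCallAllowed(weakTierPolicy, "read_file", "/srv/bot/public/faq.md")); // true
```

The key property is that the policy fails closed: a tool call is rejected unless it is explicitly allowlisted and stays inside the permitted tree.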
## Reasoning & verbose output in groups
`/reasoning` and `/verbose` can expose internal reasoning or tool output that