docs: clarify prompt injection guidance
This commit is contained in:
@@ -178,6 +178,20 @@ Even with strong system prompts, **prompt injection is not solved**. What helps
|
|||||||
- Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
|
- Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
|
||||||
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.5 because it’s quite good at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
|
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.5 because it’s quite good at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
|
||||||
|
|
||||||
|
### Prompt injection does not require public DMs
|
||||||
|
|
||||||
|
Even if **only you** can message the bot, prompt injection can still happen via
|
||||||
|
any **untrusted content** the bot reads (web search/fetch results, browser pages,
|
||||||
|
emails, docs, attachments, pasted logs/code). In other words: the sender is not
|
||||||
|
the only threat surface; the **content itself** can carry adversarial instructions.
|
||||||
|
|
||||||
|
When tools are enabled, the typical risk is exfiltrating context or triggering
|
||||||
|
tool calls. Reduce the blast radius by:
|
||||||
|
- Using a read-only or tool-disabled **reader agent** to summarize untrusted content,
|
||||||
|
then pass the summary to your main agent.
|
||||||
|
- Keeping `web_search` / `web_fetch` / `browser` off for tool-enabled agents unless needed.
|
||||||
|
- Enabling sandboxing and strict tool allowlists for any agent that touches untrusted input.
|
||||||
|
|
||||||
### Model strength (security note)
|
### Model strength (security note)
|
||||||
|
|
||||||
Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
|
Prompt injection resistance is **not** uniform across model tiers. Smaller/cheaper models are generally more susceptible to tool misuse and instruction hijacking, especially under adversarial prompts.
|
||||||
@@ -187,6 +201,7 @@ Recommendations:
|
|||||||
- **Avoid weaker tiers** (for example, Sonnet or Haiku) for tool-enabled agents or untrusted inboxes.
|
- **Avoid weaker tiers** (for example, Sonnet or Haiku) for tool-enabled agents or untrusted inboxes.
|
||||||
- If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
|
- If you must use a smaller model, **reduce blast radius** (read-only tools, strong sandboxing, minimal filesystem access, strict allowlists).
|
||||||
- When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
|
- When running small models, **enable sandboxing for all sessions** and **disable web_search/web_fetch/browser** unless inputs are tightly controlled.
|
||||||
|
- For chat-only personal assistants with trusted input and no tools, smaller models are usually fine.
|
||||||
|
|
||||||
## Reasoning & verbose output in groups
|
## Reasoning & verbose output in groups
|
||||||
|
|
||||||
|
|||||||
@@ -117,6 +117,8 @@ Quick answers plus deeper troubleshooting for real-world setups (local dev, VPS,
|
|||||||
- [My skill generated an image/PDF, but nothing was sent](#my-skill-generated-an-imagepdf-but-nothing-was-sent)
|
- [My skill generated an image/PDF, but nothing was sent](#my-skill-generated-an-imagepdf-but-nothing-was-sent)
|
||||||
- [Security and access control](#security-and-access-control)
|
- [Security and access control](#security-and-access-control)
|
||||||
- [Is it safe to expose Clawdbot to inbound DMs?](#is-it-safe-to-expose-clawdbot-to-inbound-dms)
|
- [Is it safe to expose Clawdbot to inbound DMs?](#is-it-safe-to-expose-clawdbot-to-inbound-dms)
|
||||||
|
- [Is prompt injection only a concern for public bots?](#is-prompt-injection-only-a-concern-for-public-bots)
|
||||||
|
- [Can I use cheaper models for personal assistant tasks?](#can-i-use-cheaper-models-for-personal-assistant-tasks)
|
||||||
- [I ran `/start` in Telegram but didn’t get a pairing code](#i-ran-start-in-telegram-but-didnt-get-a-pairing-code)
|
- [I ran `/start` in Telegram but didn’t get a pairing code](#i-ran-start-in-telegram-but-didnt-get-a-pairing-code)
|
||||||
- [WhatsApp: will it message my contacts? How does pairing work?](#whatsapp-will-it-message-my-contacts-how-does-pairing-work)
|
- [WhatsApp: will it message my contacts? How does pairing work?](#whatsapp-will-it-message-my-contacts-how-does-pairing-work)
|
||||||
- [Chat commands, aborting tasks, and “it won’t stop”](#chat-commands-aborting-tasks-and-it-wont-stop)
|
- [Chat commands, aborting tasks, and “it won’t stop”](#chat-commands-aborting-tasks-and-it-wont-stop)
|
||||||
@@ -1539,6 +1541,28 @@ Treat inbound DMs as untrusted input. Defaults are designed to reduce risk:
|
|||||||
|
|
||||||
Run `clawdbot doctor` to surface risky DM policies.
|
Run `clawdbot doctor` to surface risky DM policies.
|
||||||
|
|
||||||
|
### Is prompt injection only a concern for public bots?
|
||||||
|
|
||||||
|
No. Prompt injection is about **untrusted content**, not just who can DM the bot.
|
||||||
|
If your assistant reads external content (web search/fetch, browser pages, emails,
|
||||||
|
docs, attachments, pasted logs), that content can include instructions that try
|
||||||
|
to hijack the model. This can happen even if **you are the only sender**.
|
||||||
|
|
||||||
|
The biggest risk is when tools are enabled: the model can be tricked into
|
||||||
|
exfiltrating context or calling tools on your behalf. Reduce the blast radius by:
|
||||||
|
- using a read-only or tool-disabled "reader" agent to summarize untrusted content
|
||||||
|
- keeping `web_search` / `web_fetch` / `browser` off for tool-enabled agents
|
||||||
|
- sandboxing and strict tool allowlists
|
||||||
|
|
||||||
|
Details: [Security](/gateway/security).
|
||||||
|
|
||||||
|
### Can I use cheaper models for personal assistant tasks?
|
||||||
|
|
||||||
|
Yes, **if** the agent is chat-only and the input is trusted. Smaller tiers are
|
||||||
|
more susceptible to instruction hijacking, so avoid them for tool-enabled agents
|
||||||
|
or when reading untrusted content. If you must use a smaller model, lock down
|
||||||
|
tools and run inside a sandbox. See [Security](/gateway/security).
|
||||||
|
|
||||||
### I ran `/start` in Telegram but didn’t get a pairing code
|
### I ran `/start` in Telegram but didn’t get a pairing code
|
||||||
|
|
||||||
Pairing codes are sent **only** when an unknown sender messages the bot and
|
Pairing codes are sent **only** when an unknown sender messages the bot and
|
||||||
|
|||||||
Reference in New Issue
Block a user