feat: add image model config + tool

2026-01-04 19:35:00 +01:00
parent 0716a624a8
commit 78998dba9e
20 changed files with 856 additions and 144 deletions
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -431,6 +431,8 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
 (omit to show the full catalog).
 `modelAliases` adds short names for `/model` (alias -> provider/model).
 `modelFallbacks` lists ordered fallback models to try when the default fails.
+`imageModel` selects an image-capable model for the `image` tool.
+`imageModelFallbacks` lists ordered fallback image models for the `image` tool.

 ```json5
 {
@@ -448,6 +450,10 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
      "openrouter/deepseek/deepseek-r1:free",
      "openrouter/meta-llama/llama-3.3-70b-instruct:free"
    ],
+    imageModel: "openrouter/qwen/qwen-2.5-vl-72b-instruct:free",
+    imageModelFallbacks: [
+      "openrouter/google/gemini-2.0-flash-vision:free"
+    ],
    thinkingDefault: "low",
    verboseDefault: "off",
    elevatedDefault: "on",
--- a/docs/models.md
+++ b/docs/models.md
@@ -19,16 +19,22 @@ that prefers tool-call + image-capable models and maintains ordered fallbacks.
  - show default model + aliases + fallbacks + allowlist
 - `clawdbot models set <modelOrAlias>`
  - writes `agent.model` in config
+- `clawdbot models set-image <modelOrAlias>`
+  - writes `agent.imageModel` in config
 - `clawdbot models aliases list|add|remove`
  - writes `agent.modelAliases`
 - `clawdbot models fallbacks list|add|remove|clear`
  - writes `agent.modelFallbacks`
+- `clawdbot models image-fallbacks list|add|remove|clear`
+  - writes `agent.imageModelFallbacks`
 - `clawdbot models scan`
  - OpenRouter :free scan; probe tool-call + image; interactive selection

 ## Config changes

 - Add `agent.modelFallbacks: string[]` (ordered list of provider/model IDs).
+- Add `agent.imageModel?: string` (optional image-capable model for image tool).
+- Add `agent.imageModelFallbacks?: string[]` (ordered list for image tool).
 - Keep existing:
  - `agent.model` (default)
  - `agent.allowedModels` (list filter)
@@ -49,8 +55,8 @@ Probes (direct pi-ai complete)
  - Prompt includes 1x1 PNG; success if no "unsupported image" error.

 Scoring/selection
- Prefer models passing tool + image.
- Fallback to tool-only if no tool+image pass.
+- Prefer models passing tool + image for text/tool fallbacks.
+- Prefer image-only models for image tool fallback (even if tool probe fails).
 - Rank by: image ok, then lower tool latency, then larger context, then params.

 Interactive selection (TTY)
@@ -61,7 +67,9 @@ Interactive selection (TTY)

 Output
 - Writes `agent.modelFallbacks` ordered.
+- Writes `agent.imageModelFallbacks` ordered (image-capable models).
 - Optional `--set-default` to set `agent.model`.
+- Optional `--set-image` to set `agent.imageModel`.

 ## Runtime fallback

--- a/docs/tools.md
+++ b/docs/tools.md
@@ -101,6 +101,19 @@ Notes:
 - Videos return `FILE:<path>` (mp4).
 - Location returns a JSON payload (lat/lon/accuracy/timestamp).

+### `image`
+Analyze an image with the configured image model.
+
+Core parameters:
+- `image` (required; path or URL)
+- `prompt` (optional; defaults to "Describe the image.")
+- `model` (optional override)
+- `maxBytesMb` (optional size cap)
+
+Notes:
+- Only available when `agent.imageModel` or `agent.imageModelFallbacks` is set.
+- Uses the image model directly (independent of the main chat model).
+
 ### `cron`
 Manage Gateway cron jobs and wakeups.