test(gateway): add OpenResponses parity E2E tests

- Add schema validation tests for input_image, input_file, client tools
- Add buildAgentPrompt tests for turn-based tool flow
2026-01-19 12:43:17 +01:00
parent a5afe7bc2b
commit 4f02c74dca
7 changed files with 769 additions and 0 deletions
--- a/.agent/.DS_Store
+++ b/.agent/.DS_Store
--- a/.agent/workflows/update_clawdbot.md
+++ b/.agent/workflows/update_clawdbot.md
@@ -0,0 +1,366 @@
+---
+description: Update Clawdbot from upstream when branch has diverged (ahead/behind)
+---
+
+# Clawdbot Upstream Sync Workflow
+
+Use this workflow when your fork has diverged from upstream (e.g., "18 commits ahead, 29 commits behind").
+
+## Quick Reference
+
+```bash
+# Check divergence status
+git fetch upstream && git rev-list --left-right --count main...upstream/main
+
+# Full sync (rebase preferred)
+git fetch upstream && git rebase upstream/main && pnpm install && pnpm build && ./scripts/restart-mac.sh
+
+# Check for Swift 6.2 issues after sync
+grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift"
+```
+
+---
+
+## Step 1: Assess Divergence
+
+```bash
+git fetch upstream
+git log --oneline --left-right main...upstream/main | head -20
+```
+
+This shows:
+- `<` = your local commits (ahead)
+- `>` = upstream commits you're missing (behind)
+
+**Decision point:**
+- Few local commits, many upstream → **Rebase** (cleaner history)
+- Many local commits or shared branch → **Merge** (preserves history)
+
+---
+
+## Step 2A: Rebase Strategy (Preferred)
+
+Replays your commits on top of upstream. Results in linear history.
+
+```bash
+# Ensure working tree is clean
+git status
+
+# Rebase onto upstream
+git rebase upstream/main
+```
+
+### Handling Rebase Conflicts
+
+```bash
+# When conflicts occur:
+# 1. Fix conflicts in the listed files
+# 2. Stage resolved files
+git add <resolved-files>
+
+# 3. Continue rebase
+git rebase --continue
+
+# If a commit is no longer needed (already in upstream):
+git rebase --skip
+
+# To abort and return to original state:
+git rebase --abort
+```
+
+### Common Conflict Patterns
+
+| File | Resolution |
+|------|------------|
+| `package.json` | Take upstream deps, keep local scripts if needed |
+| `pnpm-lock.yaml` | Accept upstream, regenerate with `pnpm install` |
+| `*.patch` files | Usually take upstream version |
+| Source files | Merge logic carefully, prefer upstream structure |
+
+---
+
+## Step 2B: Merge Strategy (Alternative)
+
+Preserves all history with a merge commit.
+
+```bash
+git merge upstream/main --no-edit
+```
+
+Resolve conflicts the same way as in a rebase, then:
+```bash
+git add <resolved-files>
+git commit
+```
+
+---
+
+## Step 3: Rebuild Everything
+
+After sync completes:
+
+```bash
+# Install dependencies (regenerates lock if needed)
+pnpm install
+
+# Build TypeScript
+pnpm build
+
+# Build UI assets
+pnpm ui:build
+
+# Run diagnostics
+pnpm clawdbot doctor
+```
+
+---
+
+## Step 4: Rebuild macOS App
+
+```bash
+# Full rebuild, sign, and launch
+./scripts/restart-mac.sh
+
+# Or just package without restart
+pnpm mac:package
+```
+
+### Install to /Applications
+
+```bash
+# Kill running app
+pkill -x "Clawdbot" || true
+
+# Move old version
+mv /Applications/Clawdbot.app /tmp/Clawdbot-backup.app
+
+# Install new build
+cp -R dist/Clawdbot.app /Applications/
+
+# Launch
+open /Applications/Clawdbot.app
+```
+
+---
+
+## Step 4A: Verify macOS App & Agent
+
+After rebuilding the macOS app, always verify it works correctly:
+
+```bash
+# Check gateway health
+pnpm clawdbot health
+
+# Verify no zombie processes
+ps aux | grep -E "(clawdbot|gateway)" | grep -v grep
+
+# Test agent functionality by sending a verification message
+pnpm clawdbot agent --message "Verification: macOS app rebuild successful - agent is responding." --session-id YOUR_TELEGRAM_SESSION_ID
+
+# Confirm the message was received on Telegram
+# (Check your Telegram chat with the bot)
+```
+
+**Important:** Always wait for the Telegram verification message before proceeding. If the agent doesn't respond, troubleshoot the gateway or model configuration before pushing.
+
+---
+
+## Step 5: Handle Swift/macOS Build Issues (Common After Upstream Sync)
+
+Upstream updates may introduce Swift 6.2 / macOS 26 SDK incompatibilities. Use analyze-mode for systematic debugging:
+
+### Analyze-Mode Investigation
+```bash
+# Gather context with parallel agents
+morph-mcp_warpgrep_codebase_search search_string="Find deprecated FileManager.default and Thread.isMainThread usages in Swift files" repo_path="/Volumes/Main SSD/Developer/clawdis"
+morph-mcp_warpgrep_codebase_search search_string="Locate Peekaboo submodule and macOS app Swift files with concurrency issues" repo_path="/Volumes/Main SSD/Developer/clawdis"
+```
+
+### Common Swift 6.2 Fixes
+
+**FileManager.default Deprecation:**
+```bash
+# Search for deprecated usage
+grep -r "FileManager\.default" src/ apps/ --include="*.swift"
+
+# Replace with proper initialization
+# OLD: FileManager.default
+# NEW: FileManager()
+```
+
+**Thread.isMainThread Deprecation:**
+```bash
+# Search for deprecated usage
+grep -r "Thread\.isMainThread" src/ apps/ --include="*.swift"
+
+# Replace with modern concurrency check
+# OLD: Thread.isMainThread
+# NEW: await MainActor.run { ... } or DispatchQueue.main.sync { ... } (never call sync while already on the main thread; it deadlocks)
+```
+
+### Peekaboo Submodule Fixes
+```bash
+# Check Peekaboo for concurrency issues
+cd src/canvas-host/a2ui
+grep -r "Thread\.isMainThread\|FileManager\.default" . --include="*.swift"
+
+# Fix and rebuild submodule
+cd "/Volumes/Main SSD/Developer/clawdis"
+pnpm canvas:a2ui:bundle
+```
+
+### macOS App Concurrency Fixes
+```bash
+# Check macOS app for issues
+grep -r "Thread\.isMainThread\|FileManager\.default" apps/macos/ --include="*.swift"
+
+# Clean and rebuild after fixes
+rm -rf apps/macos/.build apps/macos/.swiftpm
+./scripts/restart-mac.sh
+```
+
+### Model Configuration Updates
+If upstream introduced new model configurations:
+```bash
+# Check for OpenRouter API key requirements
+grep -r "openrouter\|OPENROUTER" src/ --include="*.ts" --include="*.js"
+
+# Update clawdbot.json with fallback chains
+# Add model fallback configurations as needed
+```
+
+---
+
+## Step 6: Verify & Push
+
+```bash
+# Verify everything works
+pnpm clawdbot health
+pnpm test
+
+# Push (force required after rebase)
+git push origin main --force-with-lease
+
+# Or regular push after merge
+git push origin main
+```
+
+---
+
+## Troubleshooting
+
+### Build Fails After Sync
+
+```bash
+# Clean and rebuild
+rm -rf node_modules dist
+pnpm install
+pnpm build
+```
+
+### Type Errors (Bun/Node Incompatibility)
+
+Common issue: `fetch.preconnect` type mismatch. Fix by using `FetchLike` type instead of `typeof fetch`.
+
+### macOS App Crashes on Launch
+
+Usually resource bundle mismatch. Full rebuild required:
+```bash
+rm -rf apps/macos/.build apps/macos/.swiftpm
+./scripts/restart-mac.sh
+```
+
+### Patch Failures
+
+```bash
+# Check patch status
+pnpm install 2>&1 | grep -i patch
+
+# If patches fail, they may need updating for new dep versions
+# Check patches/ directory against package.json patchedDependencies
+```
+
+### Swift 6.2 / macOS 26 SDK Build Failures
+
+**Symptoms:** Build fails with deprecation warnings about `FileManager.default` or `Thread.isMainThread`
+
+**Search-Mode Investigation:**
+```bash
+# Exhaustive search for deprecated APIs
+morph-mcp_warpgrep_codebase_search search_string="Find all Swift files using deprecated FileManager.default or Thread.isMainThread" repo_path="/Volumes/Main SSD/Developer/clawdis"
+```
+
+**Quick Fix Commands:**
+```bash
+# Find all affected files
+find . -name "*.swift" -exec grep -l "FileManager\.default\|Thread\.isMainThread" {} \;
+
+# Replace FileManager.default with FileManager()
+find . -name "*.swift" -exec sed -i '' 's/FileManager\.default/FileManager()/g' {} \;
+
+# Thread.isMainThread requires manual review of each usage
+grep -rn "Thread\.isMainThread" --include="*.swift" .
+```
+
+**Rebuild After Fixes:**
+```bash
+# Clean all build artifacts
+rm -rf apps/macos/.build apps/macos/.swiftpm
+rm -rf src/canvas-host/a2ui/.build
+
+# Rebuild Peekaboo bundle
+pnpm canvas:a2ui:bundle
+
+# Full macOS rebuild
+./scripts/restart-mac.sh
+```
+
+---
+
+## Automation Script
+
+Save as `scripts/sync-upstream.sh`:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+echo "==> Fetching upstream..."
+git fetch upstream
+
+echo "==> Current divergence:"
+git rev-list --left-right --count main...upstream/main
+
+echo "==> Rebasing onto upstream/main..."
+git rebase upstream/main
+
+echo "==> Installing dependencies..."
+pnpm install
+
+echo "==> Building..."
+pnpm build
+pnpm ui:build
+
+echo "==> Running doctor..."
+pnpm clawdbot doctor
+
+echo "==> Rebuilding macOS app..."
+./scripts/restart-mac.sh
+
+echo "==> Verifying gateway health..."
+pnpm clawdbot health
+
+echo "==> Checking for Swift 6.2 compatibility issues..."
+if grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift" --quiet; then
+    echo "⚠️  Found potential Swift 6.2 deprecated API usage"
+    echo "   Run manual fixes or use analyze-mode investigation"
+else
+    echo "✅ No obvious Swift deprecation issues found"
+fi
+
+echo "==> Testing agent functionality..."
+# Note: Update YOUR_TELEGRAM_SESSION_ID with actual session ID
+pnpm clawdbot agent --message "Verification: Upstream sync and macOS rebuild completed successfully." --session-id YOUR_TELEGRAM_SESSION_ID || echo "Warning: Agent test failed - check Telegram for verification message"
+
+echo "==> Done! Check Telegram for the verification message, then run 'git push origin main --force-with-lease' when ready."
+```
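Review note: the Step 1 decision point in the workflow above can be sketched as a small pure function. This is a minimal TypeScript sketch and not part of this commit. `parseDivergence`, `suggestStrategy`, and the `maxLocalForRebase` threshold are hypothetical names and values chosen for illustration, since the workflow only says "few" versus "many" local commits. The sketch assumes only that `git rev-list --left-right --count main...upstream/main` prints the ahead and behind counts separated by whitespace.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the repository.
interface Divergence {
  ahead: number; // commits only on main (the "<" side)
  behind: number; // commits only on upstream/main (the ">" side)
}

type Strategy = "nothing-to-sync" | "rebase" | "merge";

// Parses the "<ahead><whitespace><behind>" output of
// `git rev-list --left-right --count main...upstream/main`.
function parseDivergence(output: string): Divergence {
  const match = /^(\d+)\s+(\d+)$/.exec(output.trim());
  if (!match) {
    throw new Error(`unexpected rev-list output: ${JSON.stringify(output)}`);
  }
  return { ahead: Number(match[1]), behind: Number(match[2]) };
}

// Mirrors the decision point: prefer rebase for a private branch with few
// local commits, merge for a shared branch or many local commits.
// The default threshold of 20 is an assumption, not a value from the workflow.
function suggestStrategy(
  divergence: Divergence,
  options: { sharedBranch?: boolean; maxLocalForRebase?: number } = {},
): Strategy {
  const { sharedBranch = false, maxLocalForRebase = 20 } = options;
  if (divergence.behind === 0) return "nothing-to-sync";
  if (sharedBranch || divergence.ahead > maxLocalForRebase) return "merge";
  return "rebase";
}
```

With the example from the workflow ("18 commits ahead, 29 commits behind"), `suggestStrategy(parseDivergence("18\t29"))` yields `"rebase"`, and passing `{ sharedBranch: true }` switches the suggestion to `"merge"`.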
--- a/.serena/.gitignore
+++ b/.serena/.gitignore
@@ -0,0 +1 @@
+/cache
--- a/.serena/cache/typescript/document_symbols.pkl
+++ b/.serena/cache/typescript/document_symbols.pkl
--- a/.serena/cache/typescript/raw_document_symbols.pkl
+++ b/.serena/cache/typescript/raw_document_symbols.pkl
--- a/.serena/project.yml
+++ b/.serena/project.yml
@@ -0,0 +1,87 @@
+# list of languages for which language servers are started; choose from:
+#   al               bash             clojure          cpp              csharp           csharp_omnisharp
+#   dart             elixir           elm              erlang           fortran          fsharp
+#   go               groovy           haskell          java             julia            kotlin
+#   lua              markdown         nix              pascal           perl             php
+#   powershell       python           python_jedi      r                rego             ruby
+#   ruby_solargraph  rust             scala            swift            terraform        toml
+#   typescript       typescript_vts   yaml             zig
+# Note:
+#   - For C, use cpp
+#   - For JavaScript, use typescript
+#   - For Free Pascal / Lazarus, use pascal
+# Special requirements:
+#   - csharp: Requires the presence of a .sln file in the project folder.
+#   - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
+# When using multiple languages, the first language server that supports a given file will be used for that file.
+# The first language is the default language and the respective language server will be used as a fallback.
+# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
+languages:
+- typescript
+
+# the encoding used by text files in the project
+# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
+encoding: "utf-8"
+
+# whether to use the project's gitignore file to ignore files
+# Added on 2025-04-07
+ignore_all_files_in_gitignore: true
+
+# list of additional paths to ignore
+# same syntax as gitignore, so you can use * and **
+# Was previously called `ignored_dirs`, please update your config if you are using that.
+# Added (renamed) on 2025-04-07
+ignored_paths: []
+
+# whether the project is in read-only mode
+# If set to true, all editing tools will be disabled and attempts to use them will result in an error
+# Added on 2025-04-18
+read_only: false
+
+# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
+# Below is the complete list of tools for convenience.
+# To make sure you have the latest list of tools, and to view their descriptions, 
+# execute `uv run scripts/print_tool_overview.py`.
+#
+#  * `activate_project`: Activates a project by name.
+#  * `check_onboarding_performed`: Checks whether project onboarding was already performed.
+#  * `create_text_file`: Creates/overwrites a file in the project directory.
+#  * `delete_lines`: Deletes a range of lines within a file.
+#  * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
+#  * `execute_shell_command`: Executes a shell command.
+#  * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
+#  * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
+#  * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
+#  * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
+#  * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
+#  * `initial_instructions`: Gets the initial instructions for the current project.
+#     Should only be used in settings where the system prompt cannot be set,
+#     e.g. in clients you have no control over, like Claude Desktop.
+#  * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
+#  * `insert_at_line`: Inserts content at a given line in a file.
+#  * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
+#  * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
+#  * `list_memories`: Lists memories in Serena's project-specific memory store.
+#  * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
+#  * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
+#  * `read_file`: Reads a file within the project directory.
+#  * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
+#  * `remove_project`: Removes a project from the Serena configuration.
+#  * `replace_lines`: Replaces a range of lines within a file with new content.
+#  * `replace_symbol_body`: Replaces the full definition of a symbol.
+#  * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
+#  * `search_for_pattern`: Performs a search for a pattern in the project.
+#  * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
+#  * `switch_modes`: Activates modes by providing a list of their names
+#  * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
+#  * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
+#  * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
+#  * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
+excluded_tools: []
+
+# initial prompt for the project. It will always be given to the LLM upon activating the project
+# (contrary to the memories, which are loaded on demand).
+initial_prompt: ""
+
+project_name: "clawdbot"
+included_optional_tools: []
--- a/src/gateway/openresponses-parity.e2e.test.ts
+++ b/src/gateway/openresponses-parity.e2e.test.ts
@@ -0,0 +1,315 @@
+/**
+ * OpenResponses Feature Parity E2E Tests
+ *
+ * Tests for input_image, input_file, and client-side tools (Hosted Tools)
+ * support in the OpenResponses `/v1/responses` endpoint.
+ */
+
+import { describe, it, expect } from "vitest";
+
+describe("OpenResponses Feature Parity", () => {
+  describe("Schema Validation", () => {
+    it("should validate input_image with url source", async () => {
+      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
+
+      const validImage = {
+        type: "input_image" as const,
+        source: {
+          type: "url" as const,
+          url: "https://example.com/image.png",
+        },
+      };
+
+      const result = InputImageContentPartSchema.safeParse(validImage);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate input_image with base64 source", async () => {
+      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
+
+      const validImage = {
+        type: "input_image" as const,
+        source: {
+          type: "base64" as const,
+          media_type: "image/png" as const,
+          data: "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
+        },
+      };
+
+      const result = InputImageContentPartSchema.safeParse(validImage);
+      expect(result.success).toBe(true);
+    });
+
+    it("should reject input_image with invalid mime type", async () => {
+      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
+
+      const invalidImage = {
+        type: "input_image" as const,
+        source: {
+          type: "base64" as const,
+          media_type: "application/json" as const, // Not an image
+          data: "SGVsbG8gV29ybGQh",
+        },
+      };
+
+      const result = InputImageContentPartSchema.safeParse(invalidImage);
+      expect(result.success).toBe(false);
+    });
+
+    it("should validate input_file with url source", async () => {
+      const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
+
+      const validFile = {
+        type: "input_file" as const,
+        source: {
+          type: "url" as const,
+          url: "https://example.com/document.txt",
+        },
+      };
+
+      const result = InputFileContentPartSchema.safeParse(validFile);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate input_file with base64 source", async () => {
+      const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
+
+      const validFile = {
+        type: "input_file" as const,
+        source: {
+          type: "base64" as const,
+          media_type: "text/plain" as const,
+          data: "SGVsbG8gV29ybGQh",
+          filename: "hello.txt",
+        },
+      };
+
+      const result = InputFileContentPartSchema.safeParse(validFile);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate tool definition", async () => {
+      const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
+
+      const validTool = {
+        type: "function" as const,
+        function: {
+          name: "get_weather",
+          description: "Get the current weather",
+          parameters: {
+            type: "object",
+            properties: {
+              location: { type: "string" },
+            },
+            required: ["location"],
+          },
+        },
+      };
+
+      const result = ToolDefinitionSchema.safeParse(validTool);
+      expect(result.success).toBe(true);
+    });
+
+    it("should reject tool definition without name", async () => {
+      const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
+
+      const invalidTool = {
+        type: "function" as const,
+        function: {
+          name: "", // Empty name
+          description: "Get the current weather",
+        },
+      };
+
+      const result = ToolDefinitionSchema.safeParse(invalidTool);
+      expect(result.success).toBe(false);
+    });
+  });
+
+  describe("CreateResponseBody Schema", () => {
+    it("should validate request with input_image", async () => {
+      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
+
+      const validRequest = {
+        model: "claude-sonnet-4-20250514",
+        input: [
+          {
+            type: "message" as const,
+            role: "user" as const,
+            content: [
+              {
+                type: "input_image" as const,
+                source: {
+                  type: "url" as const,
+                  url: "https://example.com/photo.jpg",
+                },
+              },
+              {
+                type: "input_text" as const,
+                text: "What's in this image?",
+              },
+            ],
+          },
+        ],
+      };
+
+      const result = CreateResponseBodySchema.safeParse(validRequest);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate request with client tools", async () => {
+      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
+
+      const validRequest = {
+        model: "claude-sonnet-4-20250514",
+        input: [
+          {
+            type: "message" as const,
+            role: "user" as const,
+            content: "What's the weather?",
+          },
+        ],
+        tools: [
+          {
+            type: "function" as const,
+            function: {
+              name: "get_weather",
+              description: "Get weather for a location",
+              parameters: {
+                type: "object",
+                properties: {
+                  location: { type: "string" },
+                },
+                required: ["location"],
+              },
+            },
+          },
+        ],
+      };
+
+      const result = CreateResponseBodySchema.safeParse(validRequest);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate request with function_call_output for turn-based tools", async () => {
+      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
+
+      const validRequest = {
+        model: "claude-sonnet-4-20250514",
+        input: [
+          {
+            type: "function_call_output" as const,
+            call_id: "call_123",
+            output: '{"temperature": "72°F", "condition": "sunny"}',
+          },
+        ],
+      };
+
+      const result = CreateResponseBodySchema.safeParse(validRequest);
+      expect(result.success).toBe(true);
+    });
+
+    it("should validate complete turn-based tool flow", async () => {
+      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
+
+      const turn1Request = {
+        model: "claude-sonnet-4-20250514",
+        input: [
+          {
+            type: "message" as const,
+            role: "user" as const,
+            content: "What's the weather in San Francisco?",
+          },
+        ],
+        tools: [
+          {
+            type: "function" as const,
+            function: {
+              name: "get_weather",
+              description: "Get weather for a location",
+            },
+          },
+        ],
+      };
+
+      const turn1Result = CreateResponseBodySchema.safeParse(turn1Request);
+      expect(turn1Result.success).toBe(true);
+
+      // Turn 2: Client provides tool output
+      const turn2Request = {
+        model: "claude-sonnet-4-20250514",
+        input: [
+          {
+            type: "function_call_output" as const,
+            call_id: "call_123",
+            output: '{"temperature": "72°F", "condition": "sunny"}',
+          },
+        ],
+      };
+
+      const turn2Result = CreateResponseBodySchema.safeParse(turn2Request);
+      expect(turn2Result.success).toBe(true);
+    });
+  });
+
+  describe("Response Resource Schema", () => {
+    it("should validate response with function_call output", async () => {
+      const { OutputItemSchema } = await import("./open-responses.schema.js");
+
+      const functionCallOutput = {
+        type: "function_call" as const,
+        id: "msg_123",
+        call_id: "call_456",
+        name: "get_weather",
+        arguments: '{"location": "San Francisco"}',
+      };
+
+      const result = OutputItemSchema.safeParse(functionCallOutput);
+      expect(result.success).toBe(true);
+    });
+  });
+
+  describe("buildAgentPrompt", () => {
+    it("should convert function_call_output to tool entry", async () => {
+      const { buildAgentPrompt } = await import("./openresponses-http.js");
+
+      const result = buildAgentPrompt([
+        {
+          type: "function_call_output" as const,
+          call_id: "call_123",
+          output: '{"temperature": "72°F"}',
+        },
+      ]);
+
+      // When there's only a tool output (no history), returns just the body
+      expect(result.message).toBe('{"temperature": "72°F"}');
+    });
+
+    it("should handle mixed message and function_call_output items", async () => {
+      const { buildAgentPrompt } = await import("./openresponses-http.js");
+
+      const result = buildAgentPrompt([
+        {
+          type: "message" as const,
+          role: "user" as const,
+          content: "What's the weather?",
+        },
+        {
+          type: "function_call_output" as const,
+          call_id: "call_123",
+          output: '{"temperature": "72°F"}',
+        },
+        {
+          type: "message" as const,
+          role: "user" as const,
+          content: "Thanks!",
+        },
+      ]);
+
+      // Should include both user messages and tool output
+      expect(result.message).toContain("weather");
+      expect(result.message).toContain("72°F");
+      expect(result.message).toContain("Thanks");
+    });
+  });
+});
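Review note: the two `buildAgentPrompt` tests pin down observable behaviour, but the implementation in `./openresponses-http.js` is not part of this diff. The following is a minimal TypeScript sketch of one function that would satisfy both assertions. It is not the actual implementation: `sketchBuildAgentPrompt`, the `SketchItem` type, and the `sender: body` line format are assumptions made for illustration, and the sketch handles only string `content`, not the content-part arrays used in the schema tests.

```typescript
// Hypothetical stand-in for buildAgentPrompt; the real function may differ.
type SketchItem =
  | { type: "message"; role: string; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface SketchEntry {
  sender: string;
  body: string;
}

function sketchBuildAgentPrompt(input: SketchItem[]): { message: string } {
  // Normalise every item into a sender/body pair; tool outputs become
  // entries attributed to the originating call id.
  const entries: SketchEntry[] = input.map((item) =>
    item.type === "message"
      ? { sender: item.role, body: item.content }
      : { sender: `tool:${item.call_id}`, body: item.output },
  );

  // With a single entry there is no history to label, so return the bare body
  // (this matches the first test's expectation).
  const [only] = entries;
  if (entries.length === 1 && only) {
    return { message: only.body };
  }

  // Otherwise join all entries in order so earlier turns stay visible.
  return {
    message: entries.map((entry) => `${entry.sender}: ${entry.body}`).join("\n"),
  };
}
```

The single-entry shortcut is the design choice the first test encodes: a lone `function_call_output` is passed through unchanged, while mixed input keeps user messages and tool output together in one prompt.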