test(gateway): add OpenResponses parity E2E tests

- Add schema validation tests for input_image, input_file, client tools
- Add buildAgentPrompt tests for turn-based tool flow
2026-01-19 12:43:17 +01:00
parent a5afe7bc2b
commit 4f02c74dca
7 changed files with 769 additions and 0 deletions
--- a/.agent/.DS_Store
+++ b/.agent/.DS_Store
--- a/.agent/workflows/update_clawdbot.md
+++ b/.agent/workflows/update_clawdbot.md
@@ -0,0 +1,366 @@
 ---
 description: Update Clawdbot from upstream when branch has diverged (ahead/behind)
 ---
 # Clawdbot Upstream Sync Workflow
 Use this workflow when your fork has diverged from upstream (e.g., "18 commits ahead, 29 commits behind").
 ## Quick Reference
 ```bash
 # Check divergence status
 git fetch upstream && git rev-list --left-right --count main...upstream/main
 # Full sync (rebase preferred)
 git fetch upstream && git rebase upstream/main && pnpm install && pnpm build && ./scripts/restart-mac.sh
 # Check for Swift 6.2 issues after sync
 grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift"
 ```
 ---
 ## Step 1: Assess Divergence
 ```bash
 git fetch upstream
 git log --oneline --left-right main...upstream/main | head -20
 ```
 This shows:
 - `<` = your local commits (ahead)
 - `>` = upstream commits you're missing (behind)
 **Decision point:**
 - Few local commits, many upstream → **Rebase** (cleaner history)
 - Many local commits or shared branch → **Merge** (preserves history)
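 The decision point above can be turned into a rough helper. This is a minimal sketch, not part of the repository: the function name `suggest_strategy` and the default threshold of 20 local commits are assumptions chosen for illustration, and the helper cannot tell whether the branch is shared with others.
 ```bash
 # Hypothetical helper: suggest a sync strategy from ahead/behind counts.
 # The default threshold (20 local commits) is an illustrative assumption.
 suggest_strategy() {
   local ahead="$1" behind="$2" max_local="${3:-20}"
   if [ "$behind" -eq 0 ]; then
     echo "up-to-date"
   elif [ "$ahead" -le "$max_local" ]; then
     echo "rebase"
   else
     echo "merge"
   fi
 }
 # Usage: pass the two numbers printed by git rev-list (left = ahead, right = behind).
 # read -r ahead behind < <(git rev-list --left-right --count main...upstream/main)
 # suggest_strategy "$ahead" "$behind"
 ```
 Treat the output as a hint only; a shared branch should still be merged regardless of the counts.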
 ---
 ## Step 2A: Rebase Strategy (Preferred)
 Replays your commits on top of upstream. Results in linear history.
 ```bash
 # Ensure working tree is clean
 git status
 # Rebase onto upstream
 git rebase upstream/main
 ```
 ### Handling Rebase Conflicts
 ```bash
 # When conflicts occur:
 # 1. Fix conflicts in the listed files
 # 2. Stage resolved files
 git add <resolved-files>
 # 3. Continue rebase
 git rebase --continue
 # If a commit is no longer needed (already in upstream):
 git rebase --skip
 # To abort and return to original state:
 git rebase --abort
 ```
 ### Common Conflict Patterns
 | File | Resolution |
 |------|------------|
 | `package.json` | Take upstream deps, keep local scripts if needed |
 | `pnpm-lock.yaml` | Accept upstream, regenerate with `pnpm install` |
 | `*.patch` files | Usually take upstream version |
 | Source files | Merge logic carefully, prefer upstream structure |
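 When applying the table, note that the meaning of `--ours` and `--theirs` is swapped during a rebase relative to a merge, because a rebase replays your commits on top of upstream. The sketch below maps the operation to the flag that selects the upstream side; the helper name `upstream_side` is hypothetical.
 ```bash
 # Hypothetical helper: print the checkout flag that selects the upstream version.
 # During a rebase, "ours" is the branch being rebased onto (upstream);
 # during a merge, "theirs" is the branch being merged in (upstream).
 upstream_side() {
   case "$1" in
     rebase) echo "--ours" ;;
     merge)  echo "--theirs" ;;
     *)      echo "usage: upstream_side rebase|merge" >&2; return 1 ;;
   esac
 }
 # Example: accept upstream's lockfile mid-rebase, then regenerate it.
 # git checkout "$(upstream_side rebase)" -- pnpm-lock.yaml
 # pnpm install && git add pnpm-lock.yaml
 ```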
 ---
 ## Step 2B: Merge Strategy (Alternative)
 Preserves all history with a merge commit.
 ```bash
 git merge upstream/main --no-edit
 ```
 Resolve conflicts the same way as for a rebase, then:
 ```bash
 git add <resolved-files>
 git commit
 ```
 ---
 ## Step 3: Rebuild Everything
 After sync completes:
 ```bash
 # Install dependencies (regenerates lock if needed)
 pnpm install
 # Build TypeScript
 pnpm build
 # Build UI assets
 pnpm ui:build
 # Run diagnostics
 pnpm clawdbot doctor
 ```
 ---
 ## Step 4: Rebuild macOS App
 ```bash
 # Full rebuild, sign, and launch
 ./scripts/restart-mac.sh
 # Or just package without restart
 pnpm mac:package
 ```
 ### Install to /Applications
 ```bash
 # Kill running app
 pkill -x "Clawdbot" || true
 # Move old version
 mv /Applications/Clawdbot.app /tmp/Clawdbot-backup.app
 # Install new build
 cp -R dist/Clawdbot.app /Applications/
 # Launch
 open /Applications/Clawdbot.app
 ```
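 The `mv` step above always uses the same backup name. If `/tmp/Clawdbot-backup.app` is left over from an earlier run, `mv` moves the app inside that directory instead of replacing it. A minimal sketch of a timestamped backup follows; the helper name `backup_app` and the naming scheme are assumptions, not part of the repository scripts.
 ```bash
 # Hypothetical helper: move an app bundle to a uniquely named backup.
 # Prints the backup path; does nothing if the app is not installed.
 backup_app() {
   local app="$1" backup_dir="${2:-/tmp}"
   [ -d "$app" ] || return 0
   local name stamp dest
   name="$(basename "$app" .app)"
   stamp="$(date +%Y%m%d-%H%M%S)"
   dest="$backup_dir/$name-backup-$stamp.app"
   mv "$app" "$dest"
   echo "$dest"
 }
 # Usage:
 # backup_app /Applications/Clawdbot.app
 ```
 The timestamp has one-second resolution, so two backups taken within the same second would still collide.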
 ---
 ## Step 4A: Verify macOS App & Agent
 After rebuilding the macOS app, always verify it works correctly:
 ```bash
 # Check gateway health
 pnpm clawdbot health
 # Verify no zombie processes
 ps aux | grep -E "(clawdbot|gateway)" | grep -v grep
 # Test agent functionality by sending a verification message
 pnpm clawdbot agent --message "Verification: macOS app rebuild successful - agent is responding." --session-id YOUR_TELEGRAM_SESSION_ID
 # Confirm the message was received on Telegram
 # (Check your Telegram chat with the bot)
 ```
 **Important:** Always wait for the Telegram verification message before proceeding. If the agent doesn't respond, troubleshoot the gateway or model configuration before pushing.
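 If the gateway needs a few seconds to come up after the rebuild, a small retry loop avoids a false failure. This is a sketch under the assumption that `pnpm clawdbot health` exits non-zero when the gateway is unhealthy, which this workflow does not state; the `retry` helper is hypothetical.
 ```bash
 # Hypothetical helper: run a command up to N times, sleeping between attempts.
 # Usage: retry <attempts> <delay-seconds> <command> [args...]
 retry() {
   local attempts="$1" delay="$2" i=1
   shift 2
   while :; do
     "$@" && return 0
     [ "$i" -ge "$attempts" ] && return 1
     i=$((i + 1))
     sleep "$delay"
   done
 }
 # Example (assumes a non-zero exit status on failure):
 # retry 5 3 pnpm clawdbot health
 ```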
 ---
 ## Step 5: Handle Swift/macOS Build Issues (Common After Upstream Sync)
 Upstream updates may introduce Swift 6.2 / macOS 26 SDK incompatibilities. Use analyze-mode for systematic debugging:
 ### Analyze-Mode Investigation
 ```bash
 # Gather context with parallel agents
 morph-mcp_warpgrep_codebase_search search_string="Find deprecated FileManager.default and Thread.isMainThread usages in Swift files" repo_path="/Volumes/Main SSD/Developer/clawdis"
 morph-mcp_warpgrep_codebase_search search_string="Locate Peekaboo submodule and macOS app Swift files with concurrency issues" repo_path="/Volumes/Main SSD/Developer/clawdis"
 ```
 ### Common Swift 6.2 Fixes
 **FileManager.default Deprecation:**
 ```bash
 # Search for deprecated usage
 grep -r "FileManager\.default" src/ apps/ --include="*.swift"
 # Replace with proper initialization
 # OLD: FileManager.default
 # NEW: FileManager()
 ```
 **Thread.isMainThread Deprecation:**
 ```bash
 # Search for deprecated usage
 grep -r "Thread\.isMainThread" src/ apps/ --include="*.swift"
 # Replace with modern concurrency check
 # OLD: Thread.isMainThread
 # NEW: await MainActor.run { ... } or DispatchQueue.main.sync { ... }
 #      (never call DispatchQueue.main.sync from the main thread; it deadlocks)
 ```
 ### Peekaboo Submodule Fixes
 ```bash
 # Check Peekaboo for concurrency issues
 cd src/canvas-host/a2ui
 grep -r "Thread\.isMainThread\|FileManager\.default" . --include="*.swift"
 # Fix and rebuild submodule
 cd "/Volumes/Main SSD/Developer/clawdis"
 pnpm canvas:a2ui:bundle
 ```
 ### macOS App Concurrency Fixes
 ```bash
 # Check macOS app for issues
 grep -r "Thread\.isMainThread\|FileManager\.default" apps/macos/ --include="*.swift"
 # Clean and rebuild after fixes
 rm -rf apps/macos/.build apps/macos/.swiftpm
 ./scripts/restart-mac.sh
 ```
 ### Model Configuration Updates
 If upstream introduced new model configurations:
 ```bash
 # Check for OpenRouter API key requirements
 grep -r "openrouter\|OPENROUTER" src/ --include="*.ts" --include="*.js"
 # Update clawdbot.json with fallback chains
 # Add model fallback configurations as needed
 ```
 ---
 ## Step 6: Verify & Push
 ```bash
 # Verify everything works
 pnpm clawdbot health
 pnpm test
 # Push (force required after rebase)
 git push origin main --force-with-lease
 # Or regular push after merge
 git push origin main
 ```
 ---
 ## Troubleshooting
 ### Build Fails After Sync
 ```bash
 # Clean and rebuild
 rm -rf node_modules dist
 pnpm install
 pnpm build
 ```
 ### Type Errors (Bun/Node Incompatibility)
 Common issue: `fetch.preconnect` type mismatch. Fix by using `FetchLike` type instead of `typeof fetch`.
 ### macOS App Crashes on Launch
 Usually caused by a resource bundle mismatch; a full rebuild is required:
 ```bash
 rm -rf apps/macos/.build apps/macos/.swiftpm
 ./scripts/restart-mac.sh
 ```
 ### Patch Failures
 ```bash
 # Check patch status
 pnpm install 2>&1 | grep -i patch
 # If patches fail, they may need updating for new dep versions
 # Check patches/ directory against package.json patchedDependencies
 ```
 ### Swift 6.2 / macOS 26 SDK Build Failures
 **Symptoms:** Build fails with diagnostics that mention `FileManager.default` or `Thread.isMainThread`
 **Search-Mode Investigation:**
 ```bash
 # Exhaustive search for deprecated APIs
 morph-mcp_warpgrep_codebase_search search_string="Find all Swift files using deprecated FileManager.default or Thread.isMainThread" repo_path="/Volumes/Main SSD/Developer/clawdis"
 ```
 **Quick Fix Commands:**
 ```bash
 # Find all affected files
 find . -name "*.swift" -exec grep -l "FileManager\.default\|Thread\.isMainThread" {} \;
 # Replace FileManager.default with FileManager() (BSD/macOS sed syntax; review with git diff before committing)
 find . -name "*.swift" -exec sed -i '' 's/FileManager\.default/FileManager()/g' {} \;
 # For Thread.isMainThread, need manual review of each usage
 grep -rn "Thread\.isMainThread" --include="*.swift" .
 ```
 **Rebuild After Fixes:**
 ```bash
 # Clean all build artifacts
 rm -rf apps/macos/.build apps/macos/.swiftpm
 rm -rf src/canvas-host/a2ui/.build
 # Rebuild Peekaboo bundle
 pnpm canvas:a2ui:bundle
 # Full macOS rebuild
 ./scripts/restart-mac.sh
 ```
 ---
 ## Automation Script
 Save as `scripts/sync-upstream.sh`:
 ```bash
 #!/usr/bin/env bash
 set -euo pipefail
 echo "==> Fetching upstream..."
 git fetch upstream
 echo "==> Current divergence:"
 git rev-list --left-right --count main...upstream/main
 echo "==> Rebasing onto upstream/main..."
 git rebase upstream/main
 echo "==> Installing dependencies..."
 pnpm install
 echo "==> Building..."
 pnpm build
 pnpm ui:build
 echo "==> Running doctor..."
 pnpm clawdbot doctor
 echo "==> Rebuilding macOS app..."
 ./scripts/restart-mac.sh
 echo "==> Verifying gateway health..."
 pnpm clawdbot health
 echo "==> Checking for Swift 6.2 compatibility issues..."
 if grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift" --quiet; then
    echo "⚠️  Found potential Swift 6.2 deprecated API usage"
    echo "   Run manual fixes or use analyze-mode investigation"
 else
    echo "✅ No obvious Swift deprecation issues found"
 fi
 echo "==> Testing agent functionality..."
 # Note: Replace YOUR_TELEGRAM_SESSION_ID with your actual session ID
 pnpm clawdbot agent --message "Verification: Upstream sync and macOS rebuild completed successfully." --session-id YOUR_TELEGRAM_SESSION_ID || echo "Warning: Agent test failed - check Telegram for verification message"
 echo "==> Done! Check Telegram for verification message, then run 'git push --force-with-lease' when ready."
 ```
--- a/.serena/.gitignore
+++ b/.serena/.gitignore
@@ -0,0 +1 @@
 /cache
--- a/.serena/cache/typescript/document_symbols.pkl
+++ b/.serena/cache/typescript/document_symbols.pkl
--- a/.serena/cache/typescript/raw_document_symbols.pkl
+++ b/.serena/cache/typescript/raw_document_symbols.pkl
--- a/.serena/project.yml
+++ b/.serena/project.yml
@@ -0,0 +1,87 @@
 # list of languages for which language servers are started; choose from:
 #   al               bash             clojure          cpp              csharp           csharp_omnisharp
 #   dart             elixir           elm              erlang           fortran          fsharp
 #   go               groovy           haskell          java             julia            kotlin
 #   lua              markdown         nix              pascal           perl             php
 #   powershell       python           python_jedi      r                rego             ruby
 #   ruby_solargraph  rust             scala            swift            terraform        toml
 #   typescript       typescript_vts   yaml             zig
 # Note:
 #   - For C, use cpp
 #   - For JavaScript, use typescript
 #   - For Free Pascal / Lazarus, use pascal
 # Special requirements:
 #   - csharp: Requires the presence of a .sln file in the project folder.
 #   - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
 # When using multiple languages, the first language server that supports a given file will be used for that file.
 # The first language is the default language and the respective language server will be used as a fallback.
 # Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
 languages:
 - typescript
 # the encoding used by text files in the project
 # For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
 encoding: "utf-8"
 # whether to use the project's gitignore file to ignore files
 # Added on 2025-04-07
 ignore_all_files_in_gitignore: true
 # list of additional paths to ignore
 # same syntax as gitignore, so you can use * and **
 # Was previously called `ignored_dirs`, please update your config if you are using that.
 # Added (renamed) on 2025-04-07
 ignored_paths: []
 # whether the project is in read-only mode
 # If set to true, all editing tools will be disabled and attempts to use them will result in an error
 # Added on 2025-04-18
 read_only: false
 # list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
 # Below is the complete list of tools for convenience.
 # To make sure you have the latest list of tools, and to view their descriptions, 
 # execute `uv run scripts/print_tool_overview.py`.
 #
 #  * `activate_project`: Activates a project by name.
 #  * `check_onboarding_performed`: Checks whether project onboarding was already performed.
 #  * `create_text_file`: Creates/overwrites a file in the project directory.
 #  * `delete_lines`: Deletes a range of lines within a file.
 #  * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
 #  * `execute_shell_command`: Executes a shell command.
 #  * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
 #  * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
 #  * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
 #  * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
 #  * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
 #  * `initial_instructions`: Gets the initial instructions for the current project.
 #     Should only be used in settings where the system prompt cannot be set,
 #     e.g. in clients you have no control over, like Claude Desktop.
 #  * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
 #  * `insert_at_line`: Inserts content at a given line in a file.
 #  * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
 #  * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
 #  * `list_memories`: Lists memories in Serena's project-specific memory store.
 #  * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
 #  * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
 #  * `read_file`: Reads a file within the project directory.
 #  * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
 #  * `remove_project`: Removes a project from the Serena configuration.
 #  * `replace_lines`: Replaces a range of lines within a file with new content.
 #  * `replace_symbol_body`: Replaces the full definition of a symbol.
 #  * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
 #  * `search_for_pattern`: Performs a search for a pattern in the project.
 #  * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
 #  * `switch_modes`: Activates modes by providing a list of their names
 #  * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
 #  * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
 #  * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
 #  * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
 excluded_tools: []
 # initial prompt for the project. It will always be given to the LLM upon activating the project
 # (contrary to the memories, which are loaded on demand).
 initial_prompt: ""
 project_name: "clawdbot"
 included_optional_tools: []
--- a/src/gateway/openresponses-parity.e2e.test.ts
+++ b/src/gateway/openresponses-parity.e2e.test.ts
@@ -0,0 +1,315 @@
 /**
 * OpenResponses Feature Parity E2E Tests
 *
 * Tests for input_image, input_file, and client-side tools (Hosted Tools)
 * support in the OpenResponses `/v1/responses` endpoint.
 */
 import { describe, it, expect } from "vitest";
 describe("OpenResponses Feature Parity", () => {
  describe("Schema Validation", () => {
    it("should validate input_image with url source", async () => {
      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
      const validImage = {
        type: "input_image" as const,
        source: {
          type: "url" as const,
          url: "https://example.com/image.png",
        },
      };
      const result = InputImageContentPartSchema.safeParse(validImage);
      expect(result.success).toBe(true);
    });
    it("should validate input_image with base64 source", async () => {
      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
      const validImage = {
        type: "input_image" as const,
        source: {
          type: "base64" as const,
          media_type: "image/png" as const,
          data: "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
        },
      };
      const result = InputImageContentPartSchema.safeParse(validImage);
      expect(result.success).toBe(true);
    });
    it("should reject input_image with invalid mime type", async () => {
      const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
      const invalidImage = {
        type: "input_image" as const,
        source: {
          type: "base64" as const,
          media_type: "application/json" as const, // Not an image
          data: "SGVsbG8gV29ybGQh",
        },
      };
      const result = InputImageContentPartSchema.safeParse(invalidImage);
      expect(result.success).toBe(false);
    });
    it("should validate input_file with url source", async () => {
      const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
      const validFile = {
        type: "input_file" as const,
        source: {
          type: "url" as const,
          url: "https://example.com/document.txt",
        },
      };
      const result = InputFileContentPartSchema.safeParse(validFile);
      expect(result.success).toBe(true);
    });
    it("should validate input_file with base64 source", async () => {
      const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
      const validFile = {
        type: "input_file" as const,
        source: {
          type: "base64" as const,
          media_type: "text/plain" as const,
          data: "SGVsbG8gV29ybGQh",
          filename: "hello.txt",
        },
      };
      const result = InputFileContentPartSchema.safeParse(validFile);
      expect(result.success).toBe(true);
    });
    it("should validate tool definition", async () => {
      const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
      const validTool = {
        type: "function" as const,
        function: {
          name: "get_weather",
          description: "Get the current weather",
          parameters: {
            type: "object",
            properties: {
              location: { type: "string" },
            },
            required: ["location"],
          },
        },
      };
      const result = ToolDefinitionSchema.safeParse(validTool);
      expect(result.success).toBe(true);
    });
    it("should reject tool definition without name", async () => {
      const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
      const invalidTool = {
        type: "function" as const,
        function: {
          name: "", // Empty name
          description: "Get the current weather",
        },
      };
      const result = ToolDefinitionSchema.safeParse(invalidTool);
      expect(result.success).toBe(false);
    });
  });
  describe("CreateResponseBody Schema", () => {
    it("should validate request with input_image", async () => {
      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
      const validRequest = {
        model: "claude-sonnet-4-20250514",
        input: [
          {
            type: "message" as const,
            role: "user" as const,
            content: [
              {
                type: "input_image" as const,
                source: {
                  type: "url" as const,
                  url: "https://example.com/photo.jpg",
                },
              },
              {
                type: "input_text" as const,
                text: "What's in this image?",
              },
            ],
          },
        ],
      };
      const result = CreateResponseBodySchema.safeParse(validRequest);
      expect(result.success).toBe(true);
    });
    it("should validate request with client tools", async () => {
      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
      const validRequest = {
        model: "claude-sonnet-4-20250514",
        input: [
          {
            type: "message" as const,
            role: "user" as const,
            content: "What's the weather?",
          },
        ],
        tools: [
          {
            type: "function" as const,
            function: {
              name: "get_weather",
              description: "Get weather for a location",
              parameters: {
                type: "object",
                properties: {
                  location: { type: "string" },
                },
                required: ["location"],
              },
            },
          },
        ],
      };
      const result = CreateResponseBodySchema.safeParse(validRequest);
      expect(result.success).toBe(true);
    });
    it("should validate request with function_call_output for turn-based tools", async () => {
      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
      const validRequest = {
        model: "claude-sonnet-4-20250514",
        input: [
          {
            type: "function_call_output" as const,
            call_id: "call_123",
            output: '{"temperature": "72°F", "condition": "sunny"}',
          },
        ],
      };
      const result = CreateResponseBodySchema.safeParse(validRequest);
      expect(result.success).toBe(true);
    });
    it("should validate complete turn-based tool flow", async () => {
      const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
       // Turn 1: Client sends the user message and declares the available tool
       const turn1Request = {
        model: "claude-sonnet-4-20250514",
        input: [
          {
            type: "message" as const,
            role: "user" as const,
            content: "What's the weather in San Francisco?",
          },
        ],
        tools: [
          {
            type: "function" as const,
            function: {
              name: "get_weather",
              description: "Get weather for a location",
            },
          },
        ],
      };
      const turn1Result = CreateResponseBodySchema.safeParse(turn1Request);
      expect(turn1Result.success).toBe(true);
      // Turn 2: Client provides tool output
      const turn2Request = {
        model: "claude-sonnet-4-20250514",
        input: [
          {
            type: "function_call_output" as const,
            call_id: "call_123",
            output: '{"temperature": "72°F", "condition": "sunny"}',
          },
        ],
      };
      const turn2Result = CreateResponseBodySchema.safeParse(turn2Request);
      expect(turn2Result.success).toBe(true);
    });
  });
  describe("Response Resource Schema", () => {
    it("should validate response with function_call output", async () => {
      const { OutputItemSchema } = await import("./open-responses.schema.js");
      const functionCallOutput = {
        type: "function_call" as const,
        id: "msg_123",
        call_id: "call_456",
        name: "get_weather",
        arguments: '{"location": "San Francisco"}',
      };
      const result = OutputItemSchema.safeParse(functionCallOutput);
      expect(result.success).toBe(true);
    });
  });
  describe("buildAgentPrompt", () => {
    it("should convert function_call_output to tool entry", async () => {
      const { buildAgentPrompt } = await import("./openresponses-http.js");
      const result = buildAgentPrompt([
        {
          type: "function_call_output" as const,
          call_id: "call_123",
          output: '{"temperature": "72°F"}',
        },
      ]);
      // When there's only a tool output (no history), returns just the body
      expect(result.message).toBe('{"temperature": "72°F"}');
    });
    it("should handle mixed message and function_call_output items", async () => {
      const { buildAgentPrompt } = await import("./openresponses-http.js");
      const result = buildAgentPrompt([
        {
          type: "message" as const,
          role: "user" as const,
          content: "What's the weather?",
        },
        {
          type: "function_call_output" as const,
          call_id: "call_123",
          output: '{"temperature": "72°F"}',
        },
        {
          type: "message" as const,
          role: "user" as const,
          content: "Thanks!",
        },
      ]);
      // Should include both user messages and tool output
      expect(result.message).toContain("weather");
      expect(result.message).toContain("72°F");
      expect(result.message).toContain("Thanks");
    });
  });
 });