test(gateway): add OpenResponses parity E2E tests

- Add schema validation tests for input_image, input_file, client tools
- Add buildAgentPrompt tests for turn-based tool flow
Ryan Lisse
2026-01-19 12:43:17 +01:00
committed by Peter Steinberger
parent a5afe7bc2b
commit 4f02c74dca
7 changed files with 769 additions and 0 deletions

BIN
.agent/.DS_Store vendored Normal file

Binary file not shown.


@@ -0,0 +1,366 @@
---
description: Update Clawdbot from upstream when branch has diverged (ahead/behind)
---
# Clawdbot Upstream Sync Workflow
Use this workflow when your fork has diverged from upstream (e.g., "18 commits ahead, 29 commits behind").
## Quick Reference
```bash
# Check divergence status
git fetch upstream && git rev-list --left-right --count main...upstream/main
# Full sync (rebase preferred)
git fetch upstream && git rebase upstream/main && pnpm install && pnpm build && ./scripts/restart-mac.sh
# Check for Swift 6.2 issues after sync
grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift"
```
---
## Step 1: Assess Divergence
```bash
git fetch upstream
git log --oneline --left-right main...upstream/main | head -20
```
This shows:
- `<` = your local commits (ahead)
- `>` = upstream commits you're missing (behind)
**Decision point:**
- Few local commits, many upstream → **Rebase** (cleaner history)
- Many local commits or shared branch → **Merge** (preserves history)
---
## Step 2A: Rebase Strategy (Preferred)
Replays your commits on top of upstream. Results in linear history.
```bash
# Ensure working tree is clean
git status
# Rebase onto upstream
git rebase upstream/main
```
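The "clean working tree" check can be made explicit rather than read off `git status` by eye. A minimal sketch, assuming a hypothetical helper named `tree_is_clean` that receives the text printed by `git status --porcelain` (one line per changed file, empty when nothing is modified):

```bash
# Hypothetical helper: succeed only when the porcelain status text is empty.
tree_is_clean() {
  [ -z "$1" ]
}

# Usage inside the repository (not executed here):
#   status="$(git status --porcelain --untracked-files=no)"
#   if tree_is_clean "$status"; then
#     git rebase upstream/main
#   else
#     echo "Commit or stash your changes before rebasing" >&2
#   fi
```

Untracked files are excluded in the usage example because they do not stop a rebase from starting; drop `--untracked-files=no` for a stricter check.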
### Handling Rebase Conflicts
```bash
# When conflicts occur:
# 1. Fix conflicts in the listed files
# 2. Stage resolved files
git add <resolved-files>
# 3. Continue rebase
git rebase --continue
# If a commit is no longer needed (already in upstream):
git rebase --skip
# To abort and return to original state:
git rebase --abort
```
### Common Conflict Patterns
| File | Resolution |
|------|------------|
| `package.json` | Take upstream deps, keep local scripts if needed |
| `pnpm-lock.yaml` | Accept upstream, regenerate with `pnpm install` |
| `*.patch` files | Usually take upstream version |
| Source files | Merge logic carefully, prefer upstream structure |
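One detail that is easy to get wrong when applying the table: during a rebase, Git's `--ours` and `--theirs` are swapped relative to a merge, because the rebase replays your commits on top of upstream. The sketch below encodes that mapping; `upstream_side` is a hypothetical helper name, not an existing script in this repository.

```bash
# Hypothetical helper: print the `git checkout` flag that selects the
# UPSTREAM version of a conflicted file for the given operation.
upstream_side() {
  case "$1" in
    rebase) echo "--ours" ;;    # rebase: "ours" is the upstream base being built on
    merge)  echo "--theirs" ;;  # merge: "theirs" is the branch being merged in
    *) echo "usage: upstream_side rebase|merge" >&2; return 1 ;;
  esac
}

# Example: accept upstream's lockfile mid-rebase, then regenerate it
# (not executed here):
#   git checkout "$(upstream_side rebase)" -- pnpm-lock.yaml
#   pnpm install
#   git add pnpm-lock.yaml
```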
---
## Step 2B: Merge Strategy (Alternative)
Preserves all history with a merge commit.
```bash
git merge upstream/main --no-edit
```
Resolve conflicts the same way as for a rebase, then:
```bash
git add <resolved-files>
git commit
```
---
## Step 3: Rebuild Everything
After sync completes:
```bash
# Install dependencies (regenerates lock if needed)
pnpm install
# Build TypeScript
pnpm build
# Build UI assets
pnpm ui:build
# Run diagnostics
pnpm clawdbot doctor
```
---
## Step 4: Rebuild macOS App
```bash
# Full rebuild, sign, and launch
./scripts/restart-mac.sh
# Or just package without restart
pnpm mac:package
```
### Install to /Applications
```bash
# Kill running app
pkill -x "Clawdbot" || true
# Move old version (clear any previous backup first so mv does not nest into it)
rm -rf /tmp/Clawdbot-backup.app
mv /Applications/Clawdbot.app /tmp/Clawdbot-backup.app
# Install new build
cp -R dist/Clawdbot.app /Applications/
# Launch
open /Applications/Clawdbot.app
```
---
## Step 4A: Verify macOS App & Agent
After rebuilding the macOS app, always verify it works correctly:
```bash
# Check gateway health
pnpm clawdbot health
# Verify no zombie processes
ps aux | grep -E "(clawdbot|gateway)" | grep -v grep
# Test agent functionality by sending a verification message
pnpm clawdbot agent --message "Verification: macOS app rebuild successful - agent is responding." --session-id YOUR_TELEGRAM_SESSION_ID
# Confirm the message was received on Telegram
# (Check your Telegram chat with the bot)
```
**Important:** Always wait for the Telegram verification message before proceeding. If the agent doesn't respond, troubleshoot the gateway or model configuration before pushing.
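Because the gateway may need a few seconds to come up after a rebuild, a single health check can fail spuriously. A minimal retry sketch follows; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are assumptions rather than documented values.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between failed attempts. Returns 0 on the first success,
# 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (not executed here):
#   retry 5 3 pnpm clawdbot health || echo "Gateway still unhealthy" >&2
```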
---
## Step 5: Handle Swift/macOS Build Issues (Common After Upstream Sync)
Upstream updates may introduce Swift 6.2 / macOS 26 SDK incompatibilities. Use analyze-mode for systematic debugging:
### Analyze-Mode Investigation
```bash
# Gather context with parallel agents
morph-mcp_warpgrep_codebase_search search_string="Find deprecated FileManager.default and Thread.isMainThread usages in Swift files" repo_path="/Volumes/Main SSD/Developer/clawdis"
morph-mcp_warpgrep_codebase_search search_string="Locate Peekaboo submodule and macOS app Swift files with concurrency issues" repo_path="/Volumes/Main SSD/Developer/clawdis"
```
### Common Swift 6.2 Fixes
**FileManager.default Deprecation:**
```bash
# Search for deprecated usage
grep -r "FileManager\.default" src/ apps/ --include="*.swift"
# Replace with proper initialization
# OLD: FileManager.default
# NEW: FileManager()
```
**Thread.isMainThread Deprecation:**
```bash
# Search for deprecated usage
grep -r "Thread\.isMainThread" src/ apps/ --include="*.swift"
# Replace with modern concurrency check
# OLD: Thread.isMainThread
# NEW: await MainActor.run { ... } or DispatchQueue.main.sync { ... }
# (never call DispatchQueue.main.sync from the main thread; it deadlocks)
```
### Peekaboo Submodule Fixes
```bash
# Check Peekaboo for concurrency issues
cd src/canvas-host/a2ui
grep -r "Thread\.isMainThread\|FileManager\.default" . --include="*.swift"
# Fix and rebuild submodule
cd "/Volumes/Main SSD/Developer/clawdis"
pnpm canvas:a2ui:bundle
```
### macOS App Concurrency Fixes
```bash
# Check macOS app for issues
grep -r "Thread\.isMainThread\|FileManager\.default" apps/macos/ --include="*.swift"
# Clean and rebuild after fixes (run from the repo root so the script path resolves)
rm -rf apps/macos/.build apps/macos/.swiftpm
./scripts/restart-mac.sh
```
### Model Configuration Updates
If upstream introduced new model configurations:
```bash
# Check for OpenRouter API key requirements
grep -r "openrouter\|OPENROUTER" src/ --include="*.ts" --include="*.js"
# Update clawdbot.json with fallback chains
# Add model fallback configurations as needed
```
---
## Step 6: Verify & Push
```bash
# Verify everything works
pnpm clawdbot health
pnpm test
# Push (force required after rebase)
git push origin main --force-with-lease
# Or regular push after merge
git push origin main
```
---
## Troubleshooting
### Build Fails After Sync
```bash
# Clean and rebuild
rm -rf node_modules dist
pnpm install
pnpm build
```
### Type Errors (Bun/Node Incompatibility)
Common issue: `fetch.preconnect` type mismatch. Fix by using `FetchLike` type instead of `typeof fetch`.
### macOS App Crashes on Launch
Usually caused by a resource bundle mismatch. A full rebuild is required:
```bash
# Run from the repo root so ./scripts/restart-mac.sh resolves
rm -rf apps/macos/.build apps/macos/.swiftpm
./scripts/restart-mac.sh
```
### Patch Failures
```bash
# Check patch status
pnpm install 2>&1 | grep -i patch
# If patches fail, they may need updating for new dep versions
# Check patches/ directory against package.json patchedDependencies
```
### Swift 6.2 / macOS 26 SDK Build Failures
**Symptoms:** Build fails with deprecation warnings about `FileManager.default` or `Thread.isMainThread`
**Search-Mode Investigation:**
```bash
# Exhaustive search for deprecated APIs
morph-mcp_warpgrep_codebase_search search_string="Find all Swift files using deprecated FileManager.default or Thread.isMainThread" repo_path="/Volumes/Main SSD/Developer/clawdis"
```
**Quick Fix Commands:**
```bash
# Find all affected files
find . -name "*.swift" -exec grep -l "FileManager\.default\|Thread\.isMainThread" {} \;
# Replace FileManager.default with FileManager()
find . -name "*.swift" -exec sed -i '' 's/FileManager\.default/FileManager()/g' {} \;
# For Thread.isMainThread, need manual review of each usage
grep -rn "Thread\.isMainThread" --include="*.swift" .
```
**Rebuild After Fixes:**
```bash
# Clean all build artifacts
rm -rf apps/macos/.build apps/macos/.swiftpm
rm -rf src/canvas-host/a2ui/.build
# Rebuild Peekaboo bundle
pnpm canvas:a2ui:bundle
# Full macOS rebuild
./scripts/restart-mac.sh
```
---
## Automation Script
Save as `scripts/sync-upstream.sh`:
```bash
#!/usr/bin/env bash
set -euo pipefail
echo "==> Fetching upstream..."
git fetch upstream
echo "==> Current divergence:"
git rev-list --left-right --count main...upstream/main
echo "==> Rebasing onto upstream/main..."
git rebase upstream/main
echo "==> Installing dependencies..."
pnpm install
echo "==> Building..."
pnpm build
pnpm ui:build
echo "==> Running doctor..."
pnpm clawdbot doctor
echo "==> Rebuilding macOS app..."
./scripts/restart-mac.sh
echo "==> Verifying gateway health..."
pnpm clawdbot health
echo "==> Checking for Swift 6.2 compatibility issues..."
if grep -r "FileManager\.default\|Thread\.isMainThread" src/ apps/ --include="*.swift" --quiet; then
echo "⚠️ Found potential Swift 6.2 deprecated API usage"
echo " Run manual fixes or use analyze-mode investigation"
else
echo "✅ No obvious Swift deprecation issues found"
fi
echo "==> Testing agent functionality..."
# Note: Update YOUR_TELEGRAM_SESSION_ID with actual session ID
pnpm clawdbot agent --message "Verification: Upstream sync and macOS rebuild completed successfully." --session-id YOUR_TELEGRAM_SESSION_ID || echo "Warning: Agent test failed - check Telegram for verification message"
echo "==> Done! Check Telegram for verification message, then run 'git push --force-with-lease' when ready."
```

1
.serena/.gitignore vendored Normal file

@@ -0,0 +1 @@
/cache

Binary file not shown.

Binary file not shown.

87
.serena/project.yml Normal file

@@ -0,0 +1,87 @@
# list of languages for which language servers are started; choose from:
# al bash clojure cpp csharp csharp_omnisharp
# dart elixir elm erlang fortran fsharp
# go groovy haskell java julia kotlin
# lua markdown nix pascal perl php
# powershell python python_jedi r rego ruby
# ruby_solargraph rust scala swift terraform toml
# typescript typescript_vts yaml zig
# Note:
# - For C, use cpp
# - For JavaScript, use typescript
# - For Free Pascal / Lazarus, use pascal
# Special requirements:
# - csharp: Requires the presence of a .sln file in the project folder.
# - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
# When using multiple languages, the first language server that supports a given file will be used for that file.
# The first language is the default language and the respective language server will be used as a fallback.
# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
languages:
- typescript
# the encoding used by text files in the project
# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
encoding: "utf-8"
# whether to use the project's gitignore file to ignore files
# Added on 2025-04-07
ignore_all_files_in_gitignore: true
# list of additional paths to ignore
# same syntax as gitignore, so you can use * and **
# Was previously called `ignored_dirs`, please update your config if you are using that.
# Added (renamed) on 2025-04-07
ignored_paths: []
# whether the project is in read-only mode
# If set to true, all editing tools will be disabled and attempts to use them will result in an error
# Added on 2025-04-18
read_only: false
# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
# Below is the complete list of tools for convenience.
# To make sure you have the latest list of tools, and to view their descriptions,
# execute `uv run scripts/print_tool_overview.py`.
#
# * `activate_project`: Activates a project by name.
# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
# * `create_text_file`: Creates/overwrites a file in the project directory.
# * `delete_lines`: Deletes a range of lines within a file.
# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
# * `execute_shell_command`: Executes a shell command.
# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
# * `initial_instructions`: Gets the initial instructions for the current project.
# Should only be used in settings where the system prompt cannot be set,
# e.g. in clients you have no control over, like Claude Desktop.
# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
# * `insert_at_line`: Inserts content at a given line in a file.
# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
# * `list_memories`: Lists memories in Serena's project-specific memory store.
# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
# * `read_file`: Reads a file within the project directory.
# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
# * `remove_project`: Removes a project from the Serena configuration.
# * `replace_lines`: Replaces a range of lines within a file with new content.
# * `replace_symbol_body`: Replaces the full definition of a symbol.
# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
# * `search_for_pattern`: Performs a search for a pattern in the project.
# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
# * `switch_modes`: Activates modes by providing a list of their names
# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
excluded_tools: []
# initial prompt for the project. It will always be given to the LLM upon activating the project
# (contrary to the memories, which are loaded on demand).
initial_prompt: ""
project_name: "clawdbot"
included_optional_tools: []


@@ -0,0 +1,315 @@
/**
* OpenResponses Feature Parity E2E Tests
*
* Tests for input_image, input_file, and client-side tools (Hosted Tools)
* support in the OpenResponses `/v1/responses` endpoint.
*/
import { describe, it, expect } from "vitest";
describe("OpenResponses Feature Parity", () => {
describe("Schema Validation", () => {
it("should validate input_image with url source", async () => {
const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
const validImage = {
type: "input_image" as const,
source: {
type: "url" as const,
url: "https://example.com/image.png",
},
};
const result = InputImageContentPartSchema.safeParse(validImage);
expect(result.success).toBe(true);
});
it("should validate input_image with base64 source", async () => {
const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
const validImage = {
type: "input_image" as const,
source: {
type: "base64" as const,
media_type: "image/png" as const,
data: "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
},
};
const result = InputImageContentPartSchema.safeParse(validImage);
expect(result.success).toBe(true);
});
it("should reject input_image with invalid mime type", async () => {
const { InputImageContentPartSchema } = await import("./open-responses.schema.js");
const invalidImage = {
type: "input_image" as const,
source: {
type: "base64" as const,
media_type: "application/json" as const, // Not an image
data: "SGVsbG8gV29ybGQh",
},
};
const result = InputImageContentPartSchema.safeParse(invalidImage);
expect(result.success).toBe(false);
});
it("should validate input_file with url source", async () => {
const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
const validFile = {
type: "input_file" as const,
source: {
type: "url" as const,
url: "https://example.com/document.txt",
},
};
const result = InputFileContentPartSchema.safeParse(validFile);
expect(result.success).toBe(true);
});
it("should validate input_file with base64 source", async () => {
const { InputFileContentPartSchema } = await import("./open-responses.schema.js");
const validFile = {
type: "input_file" as const,
source: {
type: "base64" as const,
media_type: "text/plain" as const,
data: "SGVsbG8gV29ybGQh",
filename: "hello.txt",
},
};
const result = InputFileContentPartSchema.safeParse(validFile);
expect(result.success).toBe(true);
});
it("should validate tool definition", async () => {
const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
const validTool = {
type: "function" as const,
function: {
name: "get_weather",
description: "Get the current weather",
parameters: {
type: "object",
properties: {
location: { type: "string" },
},
required: ["location"],
},
},
};
const result = ToolDefinitionSchema.safeParse(validTool);
expect(result.success).toBe(true);
});
it("should reject tool definition with empty name", async () => {
const { ToolDefinitionSchema } = await import("./open-responses.schema.js");
const invalidTool = {
type: "function" as const,
function: {
name: "", // Empty name
description: "Get the current weather",
},
};
const result = ToolDefinitionSchema.safeParse(invalidTool);
expect(result.success).toBe(false);
});
});
describe("CreateResponseBody Schema", () => {
it("should validate request with input_image", async () => {
const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
const validRequest = {
model: "claude-sonnet-4-20250514",
input: [
{
type: "message" as const,
role: "user" as const,
content: [
{
type: "input_image" as const,
source: {
type: "url" as const,
url: "https://example.com/photo.jpg",
},
},
{
type: "input_text" as const,
text: "What's in this image?",
},
],
},
],
};
const result = CreateResponseBodySchema.safeParse(validRequest);
expect(result.success).toBe(true);
});
it("should validate request with client tools", async () => {
const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
const validRequest = {
model: "claude-sonnet-4-20250514",
input: [
{
type: "message" as const,
role: "user" as const,
content: "What's the weather?",
},
],
tools: [
{
type: "function" as const,
function: {
name: "get_weather",
description: "Get weather for a location",
parameters: {
type: "object",
properties: {
location: { type: "string" },
},
required: ["location"],
},
},
},
],
};
const result = CreateResponseBodySchema.safeParse(validRequest);
expect(result.success).toBe(true);
});
it("should validate request with function_call_output for turn-based tools", async () => {
const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
const validRequest = {
model: "claude-sonnet-4-20250514",
input: [
{
type: "function_call_output" as const,
call_id: "call_123",
output: '{"temperature": "72°F", "condition": "sunny"}',
},
],
};
const result = CreateResponseBodySchema.safeParse(validRequest);
expect(result.success).toBe(true);
});
it("should validate complete turn-based tool flow", async () => {
const { CreateResponseBodySchema } = await import("./open-responses.schema.js");
// Turn 1: Client sends the user message along with the tool definitions
const turn1Request = {
model: "claude-sonnet-4-20250514",
input: [
{
type: "message" as const,
role: "user" as const,
content: "What's the weather in San Francisco?",
},
],
tools: [
{
type: "function" as const,
function: {
name: "get_weather",
description: "Get weather for a location",
},
},
],
};
const turn1Result = CreateResponseBodySchema.safeParse(turn1Request);
expect(turn1Result.success).toBe(true);
// Turn 2: Client provides tool output
const turn2Request = {
model: "claude-sonnet-4-20250514",
input: [
{
type: "function_call_output" as const,
call_id: "call_123",
output: '{"temperature": "72°F", "condition": "sunny"}',
},
],
};
const turn2Result = CreateResponseBodySchema.safeParse(turn2Request);
expect(turn2Result.success).toBe(true);
});
});
describe("Response Resource Schema", () => {
it("should validate response with function_call output", async () => {
const { OutputItemSchema } = await import("./open-responses.schema.js");
const functionCallOutput = {
type: "function_call" as const,
id: "msg_123",
call_id: "call_456",
name: "get_weather",
arguments: '{"location": "San Francisco"}',
};
const result = OutputItemSchema.safeParse(functionCallOutput);
expect(result.success).toBe(true);
});
});
describe("buildAgentPrompt", () => {
it("should convert function_call_output to tool entry", async () => {
const { buildAgentPrompt } = await import("./openresponses-http.js");
const result = buildAgentPrompt([
{
type: "function_call_output" as const,
call_id: "call_123",
output: '{"temperature": "72°F"}',
},
]);
// When there's only a tool output (no history), returns just the body
expect(result.message).toBe('{"temperature": "72°F"}');
});
it("should handle mixed message and function_call_output items", async () => {
const { buildAgentPrompt } = await import("./openresponses-http.js");
const result = buildAgentPrompt([
{
type: "message" as const,
role: "user" as const,
content: "What's the weather?",
},
{
type: "function_call_output" as const,
call_id: "call_123",
output: '{"temperature": "72°F"}',
},
{
type: "message" as const,
role: "user" as const,
content: "Thanks!",
},
]);
// Should include both user messages and tool output
expect(result.message).toContain("weather");
expect(result.message).toContain("72°F");
expect(result.message).toContain("Thanks");
});
});
});