feat(gateway): implement OpenResponses /v1/responses endpoint phase 2
- Add input_image and input_file support with SSRF protection
- Add client-side tools (Hosted Tools) support
- Add turn-based tool flow with function_call_output handling
- Export buildAgentPrompt for testing
Committed by Peter Steinberger
Parent: f4b03599f0
Commit: a5afe7bc2b
@@ -8,6 +8,7 @@ last_updated: "2026-01-19"
 # OpenResponses Gateway Integration Plan
 
 ## Context
+
 Clawdbot Gateway currently exposes a minimal OpenAI-compatible Chat Completions endpoint at
 `/v1/chat/completions` (see [OpenAI Chat Completions](/gateway/openai-http-api)).
 
@@ -16,19 +17,23 @@ for agentic workflows and uses item-based inputs plus semantic streaming events.
 spec defines `/v1/responses`, not `/v1/chat/completions`.
 
 ## Goals
+
 - Add a `/v1/responses` endpoint that adheres to OpenResponses semantics.
 - Keep Chat Completions as a compatibility layer that is easy to disable and eventually remove.
 - Standardize validation and parsing with isolated, reusable schemas.
 
 ## Non-goals
+
 - Full OpenResponses feature parity in the first pass (images, files, hosted tools).
 - Replacing internal agent execution logic or tool orchestration.
 - Changing the existing `/v1/chat/completions` behavior during the first phase.
 
 ## Research Summary
+
 Sources: OpenResponses OpenAPI, OpenResponses specification site, and the Hugging Face blog post.
 
 Key points extracted:
+
 - `POST /v1/responses` accepts `CreateResponseBody` fields like `model`, `input` (string or
   `ItemParam[]`), `instructions`, `tools`, `tool_choice`, `stream`, `max_output_tokens`, and
   `max_tool_calls`.
@@ -52,6 +57,7 @@ Key points extracted:
 - HF examples include `OpenResponses-Version: latest` in requests (optional header).
 
 ## Proposed Architecture
+
 - Add `src/gateway/open-responses.schema.ts` containing Zod schemas only (no gateway imports).
 - Add `src/gateway/openresponses-http.ts` (or `open-responses-http.ts`) for `/v1/responses`.
 - Keep `src/gateway/openai-http.ts` intact as a legacy compatibility adapter.
@@ -61,6 +67,7 @@ Key points extracted:
 - Emit a startup warning when Chat Completions is enabled to signal legacy status.
 
 ## Deprecation Path for Chat Completions
+
 - Maintain strict module boundaries: no shared schema types between responses and chat completions.
 - Make Chat Completions opt-in by config so it can be disabled without code changes.
 - Update docs to label Chat Completions as legacy once `/v1/responses` is stable.
@@ -68,6 +75,7 @@ Key points extracted:
 removal path.
 
 ## Phase 1 Support Subset
+
 - Accept `input` as string or `ItemParam[]` with message roles and `function_call_output`.
 - Extract system and developer messages into `extraSystemPrompt`.
 - Use the most recent `user` or `function_call_output` as the current message for agent runs.
@@ -76,6 +84,7 @@ Key points extracted:
 - Return `usage` with zeroed values until token accounting is wired.
 
 ## Validation Strategy (No SDK)
+
 - Implement Zod schemas for the supported subset of:
   - `CreateResponseBody`
   - `ItemParam` + message content part unions
@@ -84,6 +93,7 @@ Key points extracted:
 - Keep schemas in a single, isolated module to avoid drift and allow future codegen.
 
 ## Streaming Implementation (Phase 1)
+
 - SSE lines with both `event:` and `data:`.
 - Required sequence (minimum viable):
   - `response.created`
@@ -96,6 +106,7 @@ Key points extracted:
 - `[DONE]`
 
 ## Tests and Verification Plan
+
 - Add e2e coverage for `/v1/responses`:
   - Auth required
   - Non-stream response shape
@@ -106,5 +117,6 @@ Key points extracted:
 `[DONE]`.
 
 ## Doc Updates (Follow-up)
+
 - Add a new docs page for `/v1/responses` usage and examples.
 - Update `/gateway/openai-http-api` with a legacy note and pointer to `/v1/responses`.

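The turn-based tool flow named in the commit message can be pictured as two request bodies: a first turn that declares a client tool, and a follow-up turn that returns the tool result as a `function_call_output` item. The sketch below is illustrative only. The `call_id` and `output` field names are an assumption about the item shape, and the model name, tool name, and call ID are made up.

```typescript
// Simplified, hypothetical request types; the real CreateResponseBody has more fields.
type FunctionTool = {
  type: "function";
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
};

type InputItem =
  | { type: "message"; role: "system" | "developer" | "user" | "assistant"; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

type CreateResponseRequest = {
  model: string;
  input: string | InputItem[];
  tools?: FunctionTool[];
  stream?: boolean;
};

// A client-side tool: the gateway never executes it, the caller does.
const weatherTool: FunctionTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: { type: "object", properties: { city: { type: "string" } } },
  },
};

// Turn 1: ask a question and declare the tool.
const firstTurn: CreateResponseRequest = {
  model: "clawdbot",
  input: [{ type: "message", role: "user", content: "What is the weather in Vienna?" }],
  tools: [weatherTool],
};

// Turn 2: send the locally computed result back, keyed by the call ID from turn 1.
function buildFollowUp(
  previous: CreateResponseRequest,
  callId: string,
  result: unknown,
): CreateResponseRequest {
  return {
    ...previous,
    input: [{ type: "function_call_output", call_id: callId, output: JSON.stringify(result) }],
  };
}

const secondTurn = buildFollowUp(firstTurn, "call_123", { tempC: 21 });
```

Serializing the result with `JSON.stringify` keeps `output` a plain string, which matches how tool arguments travel as a string in the other direction.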
@@ -482,6 +482,17 @@ export async function runEmbeddedPiAgent(
           agentMeta,
           aborted,
           systemPromptReport: attempt.systemPromptReport,
+          // Handle client tool calls (OpenResponses hosted tools)
+          stopReason: attempt.clientToolCall ? "tool_calls" : undefined,
+          pendingToolCalls: attempt.clientToolCall
+            ? [
+                {
+                  id: `call_${Date.now()}`,
+                  name: attempt.clientToolCall.name,
+                  arguments: JSON.stringify(attempt.clientToolCall.params),
+                },
+              ]
+            : undefined,
         },
         didSendViaMessagingTool: attempt.didSendViaMessagingTool,
         messagingToolSentTexts: attempt.messagingToolSentTexts,

@@ -64,6 +64,7 @@ import { prewarmSessionFile, trackSessionManagerAccess } from "../session-manage
 import { prepareSessionManagerForRun } from "../session-manager-init.js";
 import { buildEmbeddedSystemPrompt, createSystemPromptOverride } from "../system-prompt.js";
 import { splitSdkTools } from "../tool-split.js";
+import { toClientToolDefinitions } from "../../pi-tool-definition-adapter.js";
 import { buildSystemPromptParams } from "../../system-prompt-params.js";
 import { describeUnknownError, mapThinkingLevel } from "../utils.js";
 import { resolveSandboxRuntimeStatus } from "../../sandbox/runtime-status.js";
@@ -314,6 +315,16 @@ export async function runEmbeddedAttempt(
         sandboxEnabled: !!sandbox?.enabled,
       });
 
+      // Add client tools (OpenResponses hosted tools) to customTools
+      let clientToolCallDetected: { name: string; params: Record<string, unknown> } | null = null;
+      const clientToolDefs = params.clientTools
+        ? toClientToolDefinitions(params.clientTools, (toolName, toolParams) => {
+            clientToolCallDetected = { name: toolName, params: toolParams };
+          })
+        : [];
+
+      const allCustomTools = [...customTools, ...clientToolDefs];
+
       ({ session } = await createAgentSession({
         cwd: resolvedWorkspace,
         agentDir,
@@ -323,7 +334,7 @@ export async function runEmbeddedAttempt(
         thinkingLevel: mapThinkingLevel(params.thinkLevel),
         systemPrompt,
         tools: builtInTools,
-        customTools,
+        customTools: allCustomTools,
         sessionManager,
         settingsManager,
         skills: [],
@@ -681,6 +692,8 @@ export async function runEmbeddedAttempt(
         cloudCodeAssistFormatError: Boolean(
           lastAssistant?.errorMessage && isCloudCodeAssistFormatError(lastAssistant.errorMessage),
         ),
+        // Client tool call detected (OpenResponses hosted tools)
+        clientToolCall: clientToolCallDetected ?? undefined,
       };
     } finally {
       // Always tear down the session (and release the lock) before we leave this attempt.

@@ -6,6 +6,16 @@ import type { ExecElevatedDefaults, ExecToolDefaults } from "../../bash-tools.js
 import type { BlockReplyChunking, ToolResultFormat } from "../../pi-embedded-subscribe.js";
 import type { SkillSnapshot } from "../../skills.js";
 
+// Simplified tool definition for client-provided tools (OpenResponses hosted tools)
+export type ClientToolDefinition = {
+  type: "function";
+  function: {
+    name: string;
+    description?: string;
+    parameters?: Record<string, unknown>;
+  };
+};
+
 export type RunEmbeddedPiAgentParams = {
   sessionId: string;
   sessionKey?: string;
@@ -27,6 +37,8 @@ export type RunEmbeddedPiAgentParams = {
   skillsSnapshot?: SkillSnapshot;
   prompt: string;
   images?: ImageContent[];
+  /** Optional client-provided tools (OpenResponses hosted tools). */
+  clientTools?: ClientToolDefinition[];
   provider?: string;
   model?: string;
   authProfileId?: string;

@@ -9,6 +9,7 @@ import type { MessagingToolSend } from "../../pi-embedded-messaging.js";
 import type { BlockReplyChunking, ToolResultFormat } from "../../pi-embedded-subscribe.js";
 import type { SkillSnapshot } from "../../skills.js";
 import type { SessionSystemPromptReport } from "../../../config/sessions/types.js";
+import type { ClientToolDefinition } from "./params.js";
 
 type AuthStorage = ReturnType<typeof discoverAuthStorage>;
 type ModelRegistry = ReturnType<typeof discoverModels>;
@@ -30,6 +31,8 @@ export type EmbeddedRunAttemptParams = {
   skillsSnapshot?: SkillSnapshot;
   prompt: string;
   images?: ImageContent[];
+  /** Optional client-provided tools (OpenResponses hosted tools). */
+  clientTools?: ClientToolDefinition[];
   provider: string;
   modelId: string;
   model: Model<Api>;
@@ -79,4 +82,6 @@ export type EmbeddedRunAttemptResult = {
   messagingToolSentTexts: string[];
   messagingToolSentTargets: MessagingToolSend[];
   cloudCodeAssistFormatError: boolean;
+  /** Client tool call detected (OpenResponses hosted tools). */
+  clientToolCall?: { name: string; params: Record<string, unknown> };
 };

@@ -23,6 +23,14 @@ export type EmbeddedPiRunMeta = {
     kind: "context_overflow" | "compaction_failure" | "role_ordering";
     message: string;
   };
+  /** Stop reason for the agent run (e.g., "completed", "tool_calls"). */
+  stopReason?: string;
+  /** Pending tool calls when stopReason is "tool_calls". */
+  pendingToolCalls?: Array<{
+    id: string;
+    name: string;
+    arguments: string;
+  }>;
 };
 
 export type EmbeddedPiRunResult = {

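One way a gateway handler might consume the new `stopReason` and `pendingToolCalls` fields is to turn each pending call into a `function_call` output item for the client. The following is a minimal sketch under assumptions: the `FunctionCallItem` shape (including `call_id` and `status`) is my reading of the OpenResponses item format, and `toFunctionCallItems` and the `fc_` ID prefix are hypothetical, not something this diff defines.

```typescript
// Mirrors the fields added to EmbeddedPiRunMeta in the hunk above.
type PendingToolCall = { id: string; name: string; arguments: string };
type RunMetaSketch = { stopReason?: string; pendingToolCalls?: PendingToolCall[] };

// Assumed shape of an OpenResponses `function_call` output item.
type FunctionCallItem = {
  type: "function_call";
  id: string;
  call_id: string;
  name: string;
  arguments: string;
  status: "completed";
};

// Hypothetical helper: only emit items when the run actually stopped for tool calls.
function toFunctionCallItems(meta: RunMetaSketch): FunctionCallItem[] {
  if (meta.stopReason !== "tool_calls" || !meta.pendingToolCalls) return [];
  return meta.pendingToolCalls.map(
    (call): FunctionCallItem => ({
      type: "function_call",
      id: `fc_${call.id}`, // item ID, distinct from the call ID the client echoes back
      call_id: call.id,
      name: call.name,
      arguments: call.arguments, // already a JSON string
      status: "completed",
    }),
  );
}

const exampleItems = toFunctionCallItems({
  stopReason: "tool_calls",
  pendingToolCalls: [{ id: "call_1", name: "get_weather", arguments: '{"city":"Vienna"}' }],
});
```

Checking `stopReason` before reading `pendingToolCalls` keeps ordinary completed runs on the existing text-output path.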
@@ -4,6 +4,7 @@ import type {
   AgentToolUpdateCallback,
 } from "@mariozechner/pi-agent-core";
 import type { ToolDefinition } from "@mariozechner/pi-coding-agent";
+import type { ClientToolDefinition } from "./pi-embedded-runner/run/params.js";
 import { logDebug, logError } from "../logger.js";
 import { normalizeToolName } from "./tool-policy.js";
 import { jsonResult } from "./tools/common.js";
@@ -65,3 +66,38 @@ export function toToolDefinitions(tools: AnyAgentTool[]): ToolDefinition[] {
     } satisfies ToolDefinition;
   });
 }
+
+// Convert client tools (OpenResponses hosted tools) to ToolDefinition format
+// These tools are intercepted to return a "pending" result instead of executing
+export function toClientToolDefinitions(
+  tools: ClientToolDefinition[],
+  onClientToolCall?: (toolName: string, params: Record<string, unknown>) => void,
+): ToolDefinition[] {
+  return tools.map((tool) => {
+    const func = tool.function;
+    return {
+      name: func.name,
+      label: func.name,
+      description: func.description ?? "",
+      parameters: func.parameters as any,
+      execute: async (
+        toolCallId,
+        params,
+        _onUpdate: AgentToolUpdateCallback<unknown> | undefined,
+        _ctx,
+        _signal,
+      ): Promise<AgentToolResult<unknown>> => {
+        // Notify handler that a client tool was called
+        if (onClientToolCall) {
+          onClientToolCall(func.name, params as Record<string, unknown>);
+        }
+        // Return a pending result - the client will execute this tool
+        return jsonResult({
+          status: "pending",
+          tool: func.name,
+          message: "Tool execution delegated to client",
+        });
+      },
+    } satisfies ToolDefinition;
+  });
+}

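The interception pattern in `toClientToolDefinitions` can be tried in isolation with a dependency-free sketch. The types below (`ClientToolSketch`, `InterceptedTool`, `PendingResult`) are simplified stand-ins I introduce for illustration; they are not the `ToolDefinition` or `AgentToolResult` types from the pi packages, and `toInterceptedTools` is a hypothetical name.

```typescript
// Stand-in for ClientToolDefinition from the diff.
type ClientToolSketch = {
  type: "function";
  function: { name: string; description?: string; parameters?: Record<string, unknown> };
};

type PendingResult = { status: "pending"; tool: string; message: string };

// Stand-in for the agent-facing tool shape; the real ToolDefinition has more fields.
type InterceptedTool = {
  name: string;
  description: string;
  execute: (toolCallId: string, params: Record<string, unknown>) => Promise<PendingResult>;
};

function toInterceptedTools(
  tools: ClientToolSketch[],
  onCall?: (name: string, params: Record<string, unknown>) => void,
): InterceptedTool[] {
  return tools.map(
    (tool): InterceptedTool => ({
      name: tool.function.name,
      description: tool.function.description ?? "",
      execute: async (_toolCallId, params) => {
        // Record the call for the caller instead of running anything locally.
        onCall?.(tool.function.name, params);
        return {
          status: "pending",
          tool: tool.function.name,
          message: "Tool execution delegated to client",
        };
      },
    }),
  );
}

// Usage: the callback fires as soon as the agent invokes the tool.
const seenCalls: Array<{ name: string; params: Record<string, unknown> }> = [];
const [lookupTool] = toInterceptedTools(
  [{ type: "function", function: { name: "lookup_order" } }],
  (name, params) => {
    seenCalls.push({ name, params });
  },
);
const pendingPromise = lookupTool.execute("call_1", { orderId: "A-17" });
```

Collecting calls in an array rather than a single variable would also cover the case where the model requests several client tools in one turn, which the single `clientToolCallDetected` slot in the diff overwrites.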
@@ -415,6 +415,7 @@ export async function agentCommand(
           skillsSnapshot,
           prompt: body,
           images: opts.images,
+          clientTools: opts.clientTools,
           provider: providerOverride,
           model: modelOverride,
           authProfileId,

@@ -1,4 +1,5 @@
 import type { ChannelOutboundTargetMode } from "../../channels/plugins/types.js";
+import type { ClientToolDefinition } from "../../agents/pi-embedded-runner/run/params.js";
 
 /** Image content block for Claude API multimodal messages. */
 export type ImageContent = {
@@ -20,6 +21,8 @@ export type AgentCommandOpts = {
   message: string;
   /** Optional image attachments for multimodal messages. */
   images?: ImageContent[];
+  /** Optional client-provided tools (OpenResponses hosted tools). */
+  clientTools?: ClientToolDefinition[];
   /** Agent id override (must exist in config). */
   agentId?: string;
   to?: string;

@@ -90,7 +90,7 @@ export class GatewayClient {
     };
     if (url.startsWith("wss://") && this.opts.tlsFingerprint) {
       wsOptions.rejectUnauthorized = false;
-      wsOptions.checkServerIdentity = (_host: string, cert: CertMeta) => {
+      wsOptions.checkServerIdentity = ((_host: string, cert: CertMeta) => {
         const fingerprintValue =
           typeof cert === "object" && cert && "fingerprint256" in cert
             ? ((cert as { fingerprint256?: string }).fingerprint256 ?? "")
@@ -99,9 +99,17 @@ export class GatewayClient {
           typeof fingerprintValue === "string" ? fingerprintValue : "",
         );
         const expected = normalizeFingerprint(this.opts.tlsFingerprint ?? "");
-        if (!expected || !fingerprint) return false;
-        return fingerprint === expected;
-      };
+        if (!expected) {
+          return new Error("gateway tls fingerprint missing");
+        }
+        if (!fingerprint) {
+          return new Error("gateway tls fingerprint unavailable");
+        }
+        if (fingerprint !== expected) {
+          return new Error("gateway tls fingerprint mismatch");
+        }
+        return undefined;
+      }) as any;
     }
     this.ws = new WebSocket(url, wsOptions);
 

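Node's `tls.checkServerIdentity` contract is to return an `Error` on failure and `undefined` on success, which is what the new callback does; the earlier boolean return does not fit that contract. The pinning decision can be sketched as a pure function that is easy to unit-test. `normalizeFingerprintSketch` below is a hypothetical stand-in: the real `normalizeFingerprint` is not shown in this diff, so the normalization rules here are an assumption.

```typescript
// Hypothetical normalizer: drop an optional "sha256:" prefix, strip separators, lowercase.
function normalizeFingerprintSketch(value: string): string {
  return value
    .trim()
    .replace(/^sha-?256\s*:?\s*/i, "")
    .replace(/[^a-fA-F0-9]/g, "")
    .toLowerCase();
}

// Returns an Error describing the failure, or undefined when the pin matches,
// mirroring the return convention of tls.checkServerIdentity.
function checkPinnedFingerprint(expectedRaw: string, actualRaw: string): Error | undefined {
  const expected = normalizeFingerprintSketch(expectedRaw);
  const actual = normalizeFingerprintSketch(actualRaw);
  if (!expected) return new Error("gateway tls fingerprint missing");
  if (!actual) return new Error("gateway tls fingerprint unavailable");
  if (actual !== expected) return new Error("gateway tls fingerprint mismatch");
  return undefined;
}

const pinOk = checkPinnedFingerprint("sha256:AA:BB:CC", "aa:bb:cc");
const pinBad = checkPinnedFingerprint("AA:BB:CC", "AA:BB:CD");
```

Keeping the three failure cases distinct, as the diff does, makes connection errors easier to diagnose than a single boolean.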
@@ -27,18 +27,46 @@ export const OutputTextContentPartSchema = z
   })
   .strict();
 
-// For Phase 1, we reject image/file content with helpful errors
+// OpenResponses Image Content: Supports URL or base64 sources
+export const InputImageSourceSchema = z.discriminatedUnion("type", [
+  z.object({
+    type: z.literal("url"),
+    url: z.string().url(),
+  }),
+  z.object({
+    type: z.literal("base64"),
+    media_type: z.enum(["image/jpeg", "image/png", "image/gif", "image/webp"]),
+    data: z.string().min(1), // base64-encoded
+  }),
+]);
+
 export const InputImageContentPartSchema = z
   .object({
     type: z.literal("input_image"),
+    source: InputImageSourceSchema,
   })
-  .passthrough();
+  .strict();
 
+// OpenResponses File Content: Supports URL or base64 sources
+export const InputFileSourceSchema = z.discriminatedUnion("type", [
+  z.object({
+    type: z.literal("url"),
+    url: z.string().url(),
+  }),
+  z.object({
+    type: z.literal("base64"),
+    media_type: z.string().min(1), // MIME type
+    data: z.string().min(1), // base64-encoded
+    filename: z.string().optional(),
+  }),
+]);
+
 export const InputFileContentPartSchema = z
   .object({
     type: z.literal("input_file"),
+    source: InputFileSourceSchema,
   })
-  .passthrough();
+  .strict();
 
 export const ContentPartSchema = z.discriminatedUnion("type", [
   InputTextContentPartSchema,
@@ -117,13 +145,14 @@ export const FunctionToolDefinitionSchema = z
   .object({
     type: z.literal("function"),
     function: z.object({
-      name: z.string(),
+      name: z.string().min(1, "Tool name cannot be empty"),
       description: z.string().optional(),
       parameters: z.record(z.string(), z.unknown()).optional(),
     }),
   })
   .strict();
 
+// OpenResponses tool definitions match internal ToolDefinition structure
 export const ToolDefinitionSchema = FunctionToolDefinitionSchema;
 
 export type ToolDefinition = z.infer<typeof ToolDefinitionSchema>;

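To make the accepted `input_image` shapes concrete without pulling in Zod, here is a hand-written approximation of `InputImageContentPartSchema` as a TypeScript type guard. It is a rough sketch, not the schema itself: it does not validate URL syntax and does not reject unknown keys the way `.strict()` does, and `isInputImagePart` is a name I made up for illustration.

```typescript
const IMAGE_MEDIA_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp"] as const;

type InputImageSource =
  | { type: "url"; url: string }
  | { type: "base64"; media_type: (typeof IMAGE_MEDIA_TYPES)[number]; data: string };

type InputImagePart = { type: "input_image"; source: InputImageSource };

// Hypothetical guard approximating the discriminated union on `source.type`.
function isInputImagePart(value: unknown): value is InputImagePart {
  if (typeof value !== "object" || value === null) return false;
  const part = value as { type?: unknown; source?: unknown };
  if (part.type !== "input_image") return false;
  if (typeof part.source !== "object" || part.source === null) return false;
  const source = part.source as {
    type?: unknown;
    url?: unknown;
    media_type?: unknown;
    data?: unknown;
  };
  if (source.type === "url") {
    return typeof source.url === "string" && source.url.length > 0;
  }
  if (source.type === "base64") {
    return (
      typeof source.media_type === "string" &&
      (IMAGE_MEDIA_TYPES as readonly string[]).includes(source.media_type) &&
      typeof source.data === "string" &&
      source.data.length > 0
    );
  }
  return false;
}

// Two payloads the Phase 2 schema is meant to accept, and one it is meant to reject.
const urlImage = { type: "input_image", source: { type: "url", url: "https://example.com/cat.png" } };
const inlineImage = {
  type: "input_image",
  source: { type: "base64", media_type: "image/png", data: "iVBORw0KGgo=" },
};
const badImage = { type: "input_image", source: { type: "base64", media_type: "image/bmp", data: "AA==" } };
```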
@@ -27,6 +27,8 @@ import {
   type StreamingEvent,
   type Usage,
 } from "./open-responses.schema.js";
+import type { ClientToolDefinition } from "../agents/pi-embedded-runner/run/params.js";
+import type { ImageContent } from "../commands/agent/types.js";
 
 type OpenResponsesHttpOptions = {
   auth: ResolvedGatewayAuth;
@@ -74,16 +76,157 @@ function extractTextContent(content: string | ContentPart[]): string {
|
|||||||
.join("\n");
|
.join("\n");
|
||||||
}
|
}
|
||||||
|
|
||||||
function hasUnsupportedContent(content: string | ContentPart[]): string | null {
|
const PRIVATE_IP_PATTERNS = [
|
||||||
if (typeof content === "string") return null;
|
/^127\./, // Loopback
|
||||||
for (const part of content) {
|
/^192\.168\./, // Private network
|
||||||
if (part.type === "input_image") return "input_image content is not supported in Phase 1";
|
/^10\./, // Private network
|
||||||
if (part.type === "input_file") return "input_file content is not supported in Phase 1";
|
/^172\.(1[6-9]|2[0-9]|3[0-1])\./, // Private network
|
||||||
}
|
/^::1$/, // IPv6 loopback
|
||||||
return null;
|
/^fe80:/, // IPv6 link-local
|
||||||
|
/^fec0:/, // IPv6 site-local
|
||||||
|
];
|
||||||
|
|
||||||
|
function isPrivateIp(hostname: string): boolean {
|
||||||
|
return PRIVATE_IP_PATTERNS.some((pattern) => pattern.test(hostname));
|
||||||
}
|
}
|
||||||
|
|
||||||
function buildAgentPrompt(input: string | ItemParam[]): {
|
// Fetch with SSRF protection, timeout, and size limits
|
||||||
|
async function fetchWithGuard(
|
||||||
|
url: string,
|
||||||
|
maxBytes: number,
|
||||||
|
timeoutMs: number = 10000,
|
||||||
|
): Promise<{ data: string; mimeType: string }> {
|
||||||
|
const parsedUrl = new URL(url);
|
||||||
|
|
||||||
|
// Only allow HTTP/HTTPS
|
||||||
|
if (!["http:", "https:"].includes(parsedUrl.protocol)) {
|
||||||
|
throw new Error(`Invalid URL protocol: ${parsedUrl.protocol}. Only HTTP/HTTPS allowed.`);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Block private IPs (SSRF protection)
|
||||||
|
if (isPrivateIp(parsedUrl.hostname)) {
|
||||||
|
throw new Error(`Private IP addresses are not allowed: ${parsedUrl.hostname}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const controller = new AbortController();
|
||||||
|
const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
|
||||||
|
|
||||||
|
try {
|
||||||
|
const response = await fetch(url, {
|
||||||
|
signal: controller.signal,
|
||||||
|
headers: { "User-Agent": "Clawdbot-Gateway/1.0" },
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!response.ok) {
|
||||||
|
throw new Error(`Failed to fetch: ${response.status} ${response.statusText}`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const contentLength = response.headers.get("content-length");
|
||||||
|
if (contentLength) {
|
||||||
|
const size = parseInt(contentLength, 10);
|
||||||
|
if (size > maxBytes) {
|
||||||
|
throw new Error(`Content too large: ${size} bytes (limit: ${maxBytes} bytes)`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const buffer = await response.arrayBuffer();
|
||||||
|
if (buffer.byteLength > maxBytes) {
|
||||||
|
throw new Error(`Content too large: ${buffer.byteLength} bytes (limit: ${maxBytes} bytes)`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const mimeType = response.headers.get("content-type") || "application/octet-stream";
|
||||||
|
|
||||||
|
return {
|
||||||
|
data: Buffer.from(buffer).toString("base64"),
|
||||||
|
mimeType,
|
||||||
|
};
|
||||||
|
} finally {
|
||||||
|
clearTimeout(timeoutId);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const ALLOWED_IMAGE_MIMES = new Set(["image/jpeg", "image/png", "image/gif", "image/webp"]);
|
||||||
|
const MAX_IMAGE_BYTES = 10 * 1024 * 1024; // 10MB
|
||||||
|
const MAX_FILE_BYTES = 5 * 1024 * 1024; // 5MB
|
||||||
|
const ALLOWED_FILE_MIMES = new Set([
|
||||||
|
"text/plain",
|
||||||
|
"text/markdown",
|
||||||
|
"text/html",
|
||||||
|
"text/csv",
|
||||||
|
"application/pdf",
|
||||||
|
"application/json",
|
||||||
|
]);
|
||||||
|
|
||||||
|
async function extractImageContent(part: ContentPart): Promise<ImageContent | null> {
|
||||||
|
if (part.type !== "input_image") return null;
|
||||||
|
|
||||||
|
const source = part.source as { type: string; url?: string; data?: string; media_type?: string };
|
||||||
|
|
||||||
|
if (source.type === "base64") {
|
||||||
|
if (!source.data) {
|
||||||
|
throw new Error("input_image base64 source missing 'data' field");
|
||||||
|
}
|
||||||
|
const mimeType = source.media_type || "image/png";
|
||||||
|
if (!ALLOWED_IMAGE_MIMES.has(mimeType)) {
|
||||||
|
throw new Error(`Unsupported image MIME type: ${mimeType}`);
|
||||||
|
}
|
||||||
|
return { type: "image", data: source.data, mimeType };
|
||||||
|
}
|
||||||
|
|
||||||
|
if (source.type === "url" && source.url) {
|
||||||
|
const result = await fetchWithGuard(source.url, MAX_IMAGE_BYTES);
|
||||||
|
if (!ALLOWED_IMAGE_MIMES.has(result.mimeType)) {
|
||||||
|
throw new Error(`Unsupported image MIME type from URL: ${result.mimeType}`);
|
||||||
|
}
|
||||||
|
return { type: "image", data: result.data, mimeType: result.mimeType };
|
||||||
|
}
|
||||||
|
|
||||||
|
throw new Error("input_image must have 'source.url' or 'source.data'");
|
||||||
|
}
|
||||||
|
|
||||||
|
async function extractFileContent(part: ContentPart): Promise<string | null> {
|
||||||
|
if (part.type !== "input_file") return null;
|
||||||
|
|
||||||
|
const source = part.source as {
|
||||||
|
type: string;
|
||||||
|
url?: string;
|
||||||
|
data?: string;
|
||||||
|
media_type?: string;
|
||||||
|
filename?: string;
|
||||||
|
};
|
||||||
|
const filename = source.filename || "file";
|
||||||
|
|
||||||
|
let content: string;
|
||||||
|
|
||||||
|
if (source.type === "base64") {
|
||||||
|
if (!source.data) {
|
||||||
|
throw new Error("input_file base64 source missing 'data' field");
|
||||||
|
}
|
||||||
|
const buffer = Buffer.from(source.data, "base64");
|
||||||
|
if (buffer.byteLength > MAX_FILE_BYTES) {
|
||||||
|
throw new Error(
|
||||||
|
`File too large: ${buffer.byteLength} bytes (limit: ${MAX_FILE_BYTES} bytes)`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
content = buffer.toString("utf-8");
|
||||||
|
} else if (source.type === "url" && source.url) {
|
||||||
|
const result = await fetchWithGuard(source.url, MAX_FILE_BYTES);
|
||||||
|
if (!ALLOWED_FILE_MIMES.has(result.mimeType)) {
|
||||||
|
throw new Error(`Unsupported file MIME type: ${result.mimeType}`);
|
||||||
|
}
|
||||||
|
content = Buffer.from(result.data, "base64").toString("utf-8");
|
||||||
|
} else {
|
||||||
|
throw new Error("input_file must have 'source.url' or 'source.data'");
|
||||||
|
}
|
||||||
|
|
||||||
|
return `<file name="${filename}">\n${content}\n</file>`;
|
||||||
|
}
|
||||||
|
|
||||||
|
function extractClientTools(body: CreateResponseBody): ClientToolDefinition[] {
|
||||||
|
return (body.tools ?? []) as ClientToolDefinition[];
|
||||||
|
}
|
||||||
|
|
||||||
|
export function buildAgentPrompt(input: string | ItemParam[]): {
|
||||||
message: string;
|
message: string;
|
||||||
extraSystemPrompt?: string;
|
extraSystemPrompt?: string;
|
||||||
} {
|
} {
|
||||||
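The base64 branch of `extractFileContent` above checks the decoded size before converting to text. A minimal standalone sketch of that step follows. `decodeBase64File` and `EXAMPLE_MAX_FILE_BYTES` are hypothetical names introduced only for illustration; the real `MAX_FILE_BYTES` value is defined elsewhere in this commit and is not shown in this hunk.

```typescript
// Hypothetical limit for illustration only; not the value used by the gateway.
const EXAMPLE_MAX_FILE_BYTES = 1024;

// Decode a base64 payload, enforce a limit on the decoded byte length,
// and wrap the text in the same <file> envelope that extractFileContent returns.
function decodeBase64File(
  data: string,
  filename = "file",
  maxBytes = EXAMPLE_MAX_FILE_BYTES,
): string {
  const buffer = Buffer.from(data, "base64");
  if (buffer.byteLength > maxBytes) {
    throw new Error(`File too large: ${buffer.byteLength} bytes (limit: ${maxBytes} bytes)`);
  }
  return `<file name="${filename}">\n${buffer.toString("utf-8")}\n</file>`;
}
```

Note that the limit applies to the decoded bytes, not to the length of the base64 string, which is roughly a third larger.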
@@ -293,33 +436,44 @@ export async function handleOpenResponsesHttpRequest(
|
|||||||
const model = payload.model;
|
const model = payload.model;
|
||||||
const user = payload.user;
|
const user = payload.user;
|
||||||
|
|
||||||
// Check for unsupported content types (Phase 1)
|
// Extract images, files, and tools from input (Phase 2)
|
||||||
|
let images: ImageContent[] = [];
|
||||||
|
let fileContents: string[] = [];
|
||||||
if (Array.isArray(payload.input)) {
|
if (Array.isArray(payload.input)) {
|
||||||
for (const item of payload.input) {
|
for (const item of payload.input) {
|
||||||
if (item.type === "message" && typeof item.content !== "string") {
|
if (item.type === "message" && typeof item.content !== "string") {
|
||||||
const unsupported = hasUnsupportedContent(item.content);
|
for (const part of item.content) {
|
||||||
if (unsupported) {
|
const image = await extractImageContent(part);
|
||||||
sendJson(res, 400, {
|
if (image) {
|
||||||
error: { message: unsupported, type: "invalid_request_error" },
|
images.push(image);
|
||||||
});
|
continue;
|
||||||
return true;
|
}
|
||||||
|
const file = await extractFileContent(part);
|
||||||
|
if (file) {
|
||||||
|
fileContents.push(file);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
const clientTools = extractClientTools(payload);
|
||||||
const agentId = resolveAgentIdForRequest({ req, model });
|
const agentId = resolveAgentIdForRequest({ req, model });
|
||||||
const sessionKey = resolveSessionKey({ req, agentId, user });
|
const sessionKey = resolveSessionKey({ req, agentId, user });
|
||||||
|
|
||||||
// Build prompt from input
|
// Build prompt from input
|
||||||
const prompt = buildAgentPrompt(payload.input);
|
const prompt = buildAgentPrompt(payload.input);
|
||||||
|
|
||||||
|
// Append file contents to the message
|
||||||
|
const fullMessage =
|
||||||
|
fileContents.length > 0 ? `${prompt.message}\n\n${fileContents.join("\n\n")}` : prompt.message;
|
||||||
|
|
||||||
// Handle instructions as extra system prompt
|
// Handle instructions as extra system prompt
|
||||||
const extraSystemPrompt = [payload.instructions, prompt.extraSystemPrompt]
|
const extraSystemPrompt = [payload.instructions, prompt.extraSystemPrompt]
|
||||||
.filter(Boolean)
|
.filter(Boolean)
|
||||||
.join("\n\n");
|
.join("\n\n");
|
||||||
|
|
||||||
if (!prompt.message) {
|
if (!fullMessage) {
|
||||||
sendJson(res, 400, {
|
sendJson(res, 400, {
|
||||||
error: {
|
error: {
|
||||||
message: "Missing user message in `input`.",
|
message: "Missing user message in `input`.",
|
||||||
@@ -337,7 +491,9 @@ export async function handleOpenResponsesHttpRequest(
|
|||||||
try {
|
try {
|
||||||
const result = await agentCommand(
|
const result = await agentCommand(
|
||||||
{
|
{
|
||||||
message: prompt.message,
|
message: fullMessage,
|
||||||
|
images: images.length > 0 ? images : undefined,
|
||||||
|
clientTools: clientTools.length > 0 ? clientTools : undefined,
|
||||||
extraSystemPrompt: extraSystemPrompt || undefined,
|
extraSystemPrompt: extraSystemPrompt || undefined,
|
||||||
sessionKey,
|
sessionKey,
|
||||||
runId: responseId,
|
runId: responseId,
|
||||||
@@ -350,6 +506,36 @@ export async function handleOpenResponsesHttpRequest(
|
|||||||
);
|
);
|
||||||
|
|
||||||
const payloads = (result as { payloads?: Array<{ text?: string }> } | null)?.payloads;
|
const payloads = (result as { payloads?: Array<{ text?: string }> } | null)?.payloads;
|
||||||
|
const meta = (result as { meta?: unknown } | null)?.meta;
|
||||||
|
const stopReason =
|
||||||
|
meta && typeof meta === "object" ? (meta as { stopReason?: string }).stopReason : undefined;
|
||||||
|
const pendingToolCalls =
|
||||||
|
meta && typeof meta === "object"
|
||||||
|
? (meta as { pendingToolCalls?: Array<{ id: string; name: string; arguments: string }> })
|
||||||
|
.pendingToolCalls
|
||||||
|
: undefined;
|
||||||
|
|
||||||
|
// If agent called a client tool, return function_call instead of text
|
||||||
|
if (stopReason === "tool_calls" && pendingToolCalls && pendingToolCalls.length > 0) {
|
||||||
|
const functionCall = pendingToolCalls[0];
|
||||||
|
const response = createResponseResource({
|
||||||
|
id: responseId,
|
||||||
|
model,
|
||||||
|
status: "incomplete",
|
||||||
|
output: [
|
||||||
|
{
|
||||||
|
type: "function_call",
|
||||||
|
id: functionCall.id,
|
||||||
|
call_id: functionCall.id,
|
||||||
|
name: functionCall.name,
|
||||||
|
arguments: functionCall.arguments,
|
||||||
|
},
|
||||||
|
],
|
||||||
|
});
|
||||||
|
sendJson(res, 200, response);
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
const content =
|
const content =
|
||||||
Array.isArray(payloads) && payloads.length > 0
|
Array.isArray(payloads) && payloads.length > 0
|
||||||
? payloads
|
? payloads
|
||||||
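When the handler above returns a `function_call` item, the client is expected to run the tool itself and send the result back in a follow-up request. The sketch below shows one way a client might build those follow-up items. It assumes the `function_call_output` item shape (`call_id` plus a string `output`) described in the commit message and the Responses item model; `answerFunctionCalls` and the `handlers` map are hypothetical names, not part of the gateway.

```typescript
type OutputItem = { type: string; [key: string]: unknown };
type FunctionCallOutput = { type: "function_call_output"; call_id: string; output: string };
type ToolHandler = (args: unknown) => unknown;

// For every function_call item in a response's output, run the matching local
// handler and produce a function_call_output item keyed by the same call_id.
// Failures (unknown tool, bad JSON arguments, handler errors) are reported
// back to the model as an { error } payload instead of being thrown.
function answerFunctionCalls(
  output: OutputItem[],
  handlers: Map<string, ToolHandler>,
): FunctionCallOutput[] {
  const replies: FunctionCallOutput[] = [];
  for (const item of output) {
    if (item.type !== "function_call") continue;
    const name = String(item.name);
    let result: unknown;
    try {
      const handler = handlers.get(name);
      if (!handler) throw new Error(`Unknown tool: ${name}`);
      result = handler(JSON.parse(String(item.arguments)));
    } catch (err) {
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    replies.push({
      type: "function_call_output",
      call_id: String(item.call_id),
      output: JSON.stringify(result),
    });
  }
  return replies;
}
```

The returned items would be sent as part of `input` on the next `/v1/responses` request so the agent can continue the turn.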
@@ -511,7 +697,9 @@ export async function handleOpenResponsesHttpRequest(
|
|||||||
try {
|
try {
|
||||||
const result = await agentCommand(
|
const result = await agentCommand(
|
||||||
{
|
{
|
||||||
message: prompt.message,
|
message: fullMessage,
|
||||||
|
images: images.length > 0 ? images : undefined,
|
||||||
|
clientTools: clientTools.length > 0 ? clientTools : undefined,
|
||||||
extraSystemPrompt: extraSystemPrompt || undefined,
|
extraSystemPrompt: extraSystemPrompt || undefined,
|
||||||
sessionKey,
|
sessionKey,
|
||||||
runId: responseId,
|
runId: responseId,
|
||||||
@@ -527,7 +715,90 @@ export async function handleOpenResponsesHttpRequest(
|
|||||||
|
|
||||||
// Fallback: if no streaming deltas were received, send the full response
|
// Fallback: if no streaming deltas were received, send the full response
|
||||||
if (!sawAssistantDelta) {
|
if (!sawAssistantDelta) {
|
||||||
const payloads = (result as { payloads?: Array<{ text?: string }> } | null)?.payloads;
|
const resultAny = result as { payloads?: Array<{ text?: string }>; meta?: unknown } | null;
|
||||||
|
const payloads = resultAny?.payloads;
|
||||||
|
const meta = resultAny?.meta;
|
||||||
|
const stopReason =
|
||||||
|
meta && typeof meta === "object"
|
||||||
|
? (meta as { stopReason?: string }).stopReason
|
||||||
|
: undefined;
|
||||||
|
const pendingToolCalls =
|
||||||
|
meta && typeof meta === "object"
|
||||||
|
? (
|
||||||
|
meta as {
|
||||||
|
pendingToolCalls?: Array<{ id: string; name: string; arguments: string }>;
|
||||||
|
}
|
||||||
|
).pendingToolCalls
|
||||||
|
: undefined;
|
||||||
|
|
||||||
|
// If agent called a client tool, emit function_call instead of text
|
||||||
|
if (stopReason === "tool_calls" && pendingToolCalls && pendingToolCalls.length > 0) {
|
||||||
|
const functionCall = pendingToolCalls[0];
|
||||||
|
// Complete the text content part
|
||||||
|
writeSseEvent(res, {
|
||||||
|
type: "response.output_text.done",
|
||||||
|
item_id: outputItemId,
|
||||||
|
output_index: 0,
|
||||||
|
content_index: 0,
|
||||||
|
text: "",
|
||||||
|
});
|
||||||
|
writeSseEvent(res, {
|
||||||
|
type: "response.content_part.done",
|
||||||
|
item_id: outputItemId,
|
||||||
|
output_index: 0,
|
||||||
|
content_index: 0,
|
||||||
|
part: { type: "output_text", text: "" },
|
||||||
|
});
|
||||||
|
|
||||||
|
// Complete the message item
|
||||||
|
const completedItem = createAssistantOutputItem({
|
||||||
|
id: outputItemId,
|
||||||
|
text: "",
|
||||||
|
status: "completed",
|
||||||
|
});
|
||||||
|
writeSseEvent(res, {
|
||||||
|
type: "response.output_item.done",
|
||||||
|
output_index: 0,
|
||||||
|
item: completedItem,
|
||||||
|
});
|
||||||
|
|
||||||
|
// Send function_call item
|
||||||
|
const functionCallItemId = `call_${randomUUID()}`;
|
||||||
|
const functionCallItem = {
|
||||||
|
type: "function_call" as const,
|
||||||
|
id: functionCallItemId,
|
||||||
|
call_id: functionCall.id,
|
||||||
|
name: functionCall.name,
|
||||||
|
arguments: functionCall.arguments,
|
||||||
|
};
|
||||||
|
writeSseEvent(res, {
|
||||||
|
type: "response.output_item.added",
|
||||||
|
output_index: 1,
|
||||||
|
item: functionCallItem,
|
||||||
|
});
|
||||||
|
writeSseEvent(res, {
|
||||||
|
type: "response.output_item.done",
|
||||||
|
output_index: 1,
|
||||||
|
item: { ...functionCallItem, status: "completed" as const },
|
||||||
|
});
|
||||||
|
|
||||||
|
const incompleteResponse = createResponseResource({
|
||||||
|
id: responseId,
|
||||||
|
model,
|
||||||
|
status: "incomplete",
|
||||||
|
output: [completedItem, functionCallItem],
|
||||||
|
});
|
||||||
|
writeSseEvent(res, { type: "response.completed", response: incompleteResponse });
|
||||||
|
writeDone(res);
|
||||||
|
res.end();
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
const content =
|
const content =
|
||||||
Array.isArray(payloads) && payloads.length > 0
|
Array.isArray(payloads) && payloads.length > 0
|
||||||
? payloads
|
? payloads
|
||||||
|
|||||||
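Both the non-streaming and the streaming paths in this diff repeat the same casts to read `stopReason` and `pendingToolCalls` from the untyped `meta` field. One possible way to centralize that logic is a small narrowing helper, sketched below. `readPendingToolCalls` and `PendingToolCall` are hypothetical names; the field names and the `"tool_calls"` stop reason are taken from the diff.

```typescript
type PendingToolCall = { id: string; name: string; arguments: string };

// Narrow an untyped agent result to its pending client tool calls.
// Returns an empty array when the result is null, has no meta object,
// did not stop for tool calls, or carries malformed entries.
function readPendingToolCalls(result: unknown): PendingToolCall[] {
  if (!result || typeof result !== "object") return [];
  const meta = (result as { meta?: unknown }).meta;
  if (!meta || typeof meta !== "object") return [];
  const { stopReason, pendingToolCalls } = meta as {
    stopReason?: unknown;
    pendingToolCalls?: unknown;
  };
  if (stopReason !== "tool_calls" || !Array.isArray(pendingToolCalls)) return [];
  return pendingToolCalls.filter(
    (call): call is PendingToolCall =>
      !!call &&
      typeof call === "object" &&
      typeof (call as PendingToolCall).id === "string" &&
      typeof (call as PendingToolCall).name === "string" &&
      typeof (call as PendingToolCall).arguments === "string",
  );
}
```

With a helper like this, each call site could reduce to a length check on the returned array before building the `function_call` output item.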