| summary | owner | status | last_updated |
|---|---|---|---|
| Plan: Add OpenResponses /v1/responses endpoint and deprecate chat completions cleanly | clawdbot | draft | 2026-01-19 |
# OpenResponses Gateway Integration Plan
## Context

Clawdbot Gateway currently exposes a minimal OpenAI-compatible Chat Completions endpoint at
`/v1/chat/completions` (see OpenAI Chat Completions).

Open Responses is an open inference standard based on the OpenAI Responses API. It is designed
for agentic workflows and uses item-based inputs plus semantic streaming events. The OpenResponses
spec defines `/v1/responses`, not `/v1/chat/completions`.
## Goals

- Add a `/v1/responses` endpoint that adheres to OpenResponses semantics.
- Keep Chat Completions as a compatibility layer that is easy to disable and eventually remove.
- Standardize validation and parsing with isolated, reusable schemas.
## Non-goals

- Full OpenResponses feature parity in the first pass (images, files, hosted tools).
- Replacing internal agent execution logic or tool orchestration.
- Changing the existing `/v1/chat/completions` behavior during the first phase.
## Research Summary

Sources: the OpenResponses OpenAPI spec, the OpenResponses specification site, and the Hugging Face blog post.

Key points extracted:

- `POST /v1/responses` accepts `CreateResponseBody` fields like `model`, `input` (string or `ItemParam[]`), `instructions`, `tools`, `tool_choice`, `stream`, `max_output_tokens`, and `max_tool_calls` (see the example exchange after this list).
- `ItemParam` is a discriminated union of:
  - `message` items with roles `system`, `developer`, `user`, `assistant`
  - `function_call` and `function_call_output`
  - `reasoning`
  - `item_reference`
- Successful responses return a `ResponseResource` with `object: "response"`, `status`, and `output` items.
- Streaming uses semantic events such as:
  - `response.created`, `response.in_progress`, `response.completed`, `response.failed`
  - `response.output_item.added`, `response.output_item.done`
  - `response.content_part.added`, `response.content_part.done`
  - `response.output_text.delta`, `response.output_text.done`
- The spec requires:
  - `Content-Type: text/event-stream`
  - `event:` must match the JSON `type` field
  - the terminal event must be the literal `[DONE]`
- Reasoning items may expose `content`, `encrypted_content`, and `summary`.
- HF examples include `OpenResponses-Version: latest` in requests (optional header).
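
For orientation, here is a minimal non-stream exchange assembled from the points above, as TypeScript object literals. The ids, model name, and text are placeholders; input `content` is shown in its plain-string form, and the `usage` field names are assumed from the OpenAI Responses usage shape rather than taken from these notes.

```ts
// Illustrative CreateResponseBody subset; model id and text are placeholders.
const exampleRequest = {
  model: "<model-id>",
  instructions: "You are a helpful assistant.",
  input: [{ type: "message", role: "user", content: "Hello" }],
  stream: false,
};

// Illustrative ResponseResource shape per the notes above; the assistant
// message carries output_text content parts, and usage is zeroed in phase 1.
const exampleResponse = {
  id: "resp_abc123", // placeholder id
  object: "response",
  status: "completed",
  output: [
    {
      type: "message",
      role: "assistant",
      content: [{ type: "output_text", text: "Hi there!" }],
    },
  ],
  usage: { input_tokens: 0, output_tokens: 0, total_tokens: 0 }, // names assumed
};
```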
## Proposed Architecture

- Add `src/gateway/open-responses.schema.ts` containing Zod schemas only (no gateway imports).
- Add `src/gateway/openresponses-http.ts` (or `open-responses-http.ts`) for `/v1/responses`.
- Keep `src/gateway/openai-http.ts` intact as a legacy compatibility adapter.
- Add config `gateway.http.endpoints.responses.enabled` (default `false`); a config sketch follows this list.
- Keep `gateway.http.endpoints.chatCompletions.enabled` independent; allow both endpoints to be toggled separately.
- Emit a startup warning when Chat Completions is enabled to signal legacy status.
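
A sketch of how the two toggles could sit next to each other in config. Only the two key paths come from the plan above; the surrounding types and the Chat Completions default are assumptions.

```ts
// Hypothetical config types; only the two endpoint toggles are from the plan.
interface GatewayHttpEndpointsConfig {
  responses: { enabled: boolean }; // /v1/responses
  chatCompletions: { enabled: boolean }; // legacy /v1/chat/completions
}

const endpointDefaults: GatewayHttpEndpointsConfig = {
  responses: { enabled: false }, // default false, per the plan
  chatCompletions: { enabled: true }, // assumed default; logs a legacy warning at startup
};
```

Keeping the two flags independent means either endpoint can be turned off in config without touching the other, which is what makes the later deprecation a no-code change.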
## Deprecation Path for Chat Completions

- Maintain strict module boundaries: no shared schema types between Responses and Chat Completions.
- Make Chat Completions opt-in by config so it can be disabled without code changes.
- Update docs to label Chat Completions as legacy once `/v1/responses` is stable.
- Optional future step: map Chat Completions requests to the Responses handler for a simpler removal path (sketched below).
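
If that mapping step is taken, the adapter can stay small. A sketch with hypothetical names (`chatToResponsesBody` and the simplified message shape are illustrative, not the gateway's actual API):

```ts
// Hypothetical adapter: translate a legacy Chat Completions body into a
// Responses body, then delegate to the /v1/responses handler.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function chatToResponsesBody(body: { model: string; messages: ChatMessage[]; stream?: boolean }) {
  return {
    model: body.model,
    stream: body.stream,
    input: body.messages.map((m) => ({
      type: "message" as const,
      role: m.role,
      content: m.content,
    })),
  };
}
```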
## Phase 1 Support Subset

- Accept `input` as a string or `ItemParam[]` with message roles and `function_call_output`.
- Extract system and developer messages into `extraSystemPrompt`.
- Use the most recent `user` or `function_call_output` item as the current message for agent runs (see the sketch after this list).
- Reject unsupported content parts (image/file) with `invalid_request_error`.
- Return a single assistant message with `output_text` content.
- Return `usage` with zeroed values until token accounting is wired.
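
A sketch of that normalization, assuming simplified phase 1 item shapes and a hypothetical `normalizeInput` helper:

```ts
// Simplified phase 1 item shapes; the real ItemParam union has more
// variants and content part arrays.
type Item =
  | { type: "message"; role: "system" | "developer" | "user" | "assistant"; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

// Hypothetical helper illustrating the phase 1 rules above.
function normalizeInput(input: string | Item[]) {
  if (typeof input === "string") {
    return { extraSystemPrompt: undefined, current: input };
  }

  // Fold system and developer messages into extraSystemPrompt.
  const systemParts: string[] = [];
  for (const item of input) {
    if (item.type === "message" && (item.role === "system" || item.role === "developer")) {
      systemParts.push(item.content);
    }
  }

  // The most recent user message or function_call_output drives the agent run.
  const current = [...input]
    .reverse()
    .find((i) => i.type === "function_call_output" || (i.type === "message" && i.role === "user"));

  return { extraSystemPrompt: systemParts.join("\n") || undefined, current };
}
```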
## Validation Strategy (No SDK)

- Implement Zod schemas for the supported subset of:
  - `CreateResponseBody`
  - `ItemParam` + message content part unions
  - `ResponseResource`
  - streaming event shapes used by the gateway
- Keep schemas in a single, isolated module to avoid drift and allow future codegen (see the sketch after this list).
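
A sketch of the isolated schema module. Zod is the stated choice; the field set is the phase 1 subset, and message content is reduced to a plain string here rather than the full content part unions.

```ts
import { z } from "zod";

// Phase 1 subset only; the real module would model the content part unions
// (and reject image/file parts with invalid_request_error).
const MessageItem = z.object({
  type: z.literal("message"),
  role: z.enum(["system", "developer", "user", "assistant"]),
  content: z.string(),
});

const FunctionCallOutputItem = z.object({
  type: z.literal("function_call_output"),
  call_id: z.string(),
  output: z.string(),
});

export const ItemParam = z.discriminatedUnion("type", [MessageItem, FunctionCallOutputItem]);

export const CreateResponseBody = z.object({
  model: z.string(),
  input: z.union([z.string(), z.array(ItemParam)]),
  instructions: z.string().optional(),
  stream: z.boolean().optional(),
  max_output_tokens: z.number().int().positive().optional(),
});
```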
## Streaming Implementation (Phase 1)

- SSE lines with both `event:` and `data:`.
- Required sequence (minimum viable; see the writer sketch after this list):
  - `response.created`
  - `response.output_item.added`
  - `response.content_part.added`
  - `response.output_text.delta` (repeat as needed)
  - `response.output_text.done`
  - `response.content_part.done`
  - `response.completed`
  - `[DONE]`
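
A sketch of a writer for that sequence against a Node `ServerResponse`. Payloads are reduced to their `type` (plus `delta`) for illustration; the real events carry the response/item/part objects and indexes.

```ts
import type { ServerResponse } from "node:http";

// Write one semantic SSE event. Per the spec, the `event:` line must
// match the JSON `type` field in the `data:` payload.
function writeEvent(res: ServerResponse, payload: { type: string; [key: string]: unknown }) {
  res.write(`event: ${payload.type}\n`);
  res.write(`data: ${JSON.stringify(payload)}\n\n`);
}

function streamText(res: ServerResponse, deltas: string[]) {
  res.setHeader("Content-Type", "text/event-stream");
  writeEvent(res, { type: "response.created" });
  writeEvent(res, { type: "response.output_item.added" });
  writeEvent(res, { type: "response.content_part.added" });
  for (const delta of deltas) {
    writeEvent(res, { type: "response.output_text.delta", delta });
  }
  writeEvent(res, { type: "response.output_text.done" });
  writeEvent(res, { type: "response.content_part.done" });
  writeEvent(res, { type: "response.completed" });
  res.write("data: [DONE]\n\n"); // terminal event is the literal [DONE]
  res.end();
}
```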
## Tests and Verification Plan

- Add e2e coverage for `/v1/responses`:
  - auth required
  - non-stream response shape
  - stream event ordering and `[DONE]`
  - session routing with headers and `user`
- Keep `src/gateway/openai-http.e2e.test.ts` unchanged.
- Manual: curl `/v1/responses` with `stream: true` and verify event ordering and the terminal `[DONE]` (a scripted variant follows).
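
The manual check can also be scripted. A sketch with `fetch` (URL, token, and model id are placeholders) that prints the raw SSE lines so the ordering and terminal `[DONE]` can be verified by eye:

```ts
// Placeholder URL, token, and model id; prints raw event:/data: lines.
async function manualCheck() {
  const res = await fetch("http://localhost:3000/v1/responses", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <gateway-token>",
    },
    body: JSON.stringify({ model: "<model-id>", input: "ping", stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
  }
}

manualCheck().catch(console.error);
```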
## Doc Updates (Follow-up)

- Add a new docs page for `/v1/responses` usage and examples.
- Update `/gateway/openai-http-api` with a legacy note and a pointer to `/v1/responses`.