diff --git a/docs/testing.md b/docs/testing.md
index fdebd61c7..2a52b72d8 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -133,7 +133,7 @@ Live tests are split into two layers so we can isolate failures:
 - Optional tool-calling stress:
   - `CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1` enables an extra “bash writes file → read reads it back → echo nonce” check.
   - This is specifically meant to catch tool-calling compatibility issues across providers (formatting, history replay, tool_result pairing, etc.).
-- Optional image send smoke:
+  - Optional image send smoke:
   - `CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1` sends a real image attachment through the gateway agent pipeline (multimodal message) and asserts the model can read back a per-run code from the image.
   - Flow (high level):
     - Test generates a tiny PNG with “CAT” + random code (`src/gateway/live-image-probe.ts`)
@@ -142,6 +142,26 @@ Live tests are split into two layers so we can isolate failures:
   - Embedded agent forwards a multimodal user message to the model
   - Assertion: reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)
 
+## Live: Anthropic setup-token smoke
+
+- Test: `src/agents/anthropic.setup-token.live.test.ts`
+- Goal: verify that a Claude CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.
+- Enable:
+  - `CLAWDBOT_LIVE_TEST=1` or `LIVE=1`
+  - `CLAWDBOT_LIVE_SETUP_TOKEN=1`
+- Token sources (pick one):
+  - Profile: `CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test`
+  - Raw token: `CLAWDBOT_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...`
+- Model override (optional):
+  - `CLAWDBOT_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-5`
+
+Setup example:
+
+```bash
+clawdbot models auth paste-token --provider anthropic --profile-id anthropic:setup-token-test
+CLAWDBOT_LIVE_TEST=1 CLAWDBOT_LIVE_SETUP_TOKEN=1 CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test pnpm test:live src/agents/anthropic.setup-token.live.test.ts
+```
+
 ### Recommended live recipes
 
 Narrow, explicit allowlists are fastest and least flaky:
@@ -153,22 +173,41 @@ Narrow, explicit allowlists are fastest and least flaky:
 - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
 - Tool calling across several providers (bash + read probe):
-  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-flash-latest,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-flash-preview,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
 - Google focus (Gemini API key + Antigravity):
-  - Gemini (API key): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google/gemini-flash-latest" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+  - Gemini (API key): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
   - Antigravity (OAuth): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
 
+Notes:
+- `google/...` uses the Gemini API (API key).
+- `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
+- `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).
+
 ## Live: model matrix (what we cover)
 
 There is no fixed “CI model list” (live is opt-in), but these are the **recommended** models to cover regularly on a dev machine with keys.
 
+### Modern smoke set (tool calling + image)
+
+This is the “common models” run we expect to keep working:
+- OpenAI (non-Codex): `openai/gpt-5.2` (optional: `openai/gpt-5.1`)
+- OpenAI Codex: `openai-codex/gpt-5.2` (optional: `openai-codex/gpt-5.2-codex`)
+- Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)
+- Google (Gemini API): `google/gemini-3-pro-preview` and `google/gemini-3-flash-preview`
+- Google (Antigravity): `google-antigravity/claude-opus-4-5-thinking` and `google-antigravity/gemini-3-flash`
+- Z.AI (GLM): `zai/glm-4.7`
+- MiniMax: `minimax/minimax-m2.1`
+
+Run gateway smoke with tools + image:
+`LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,openai-codex/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+
 ### Baseline: tool calling (Read + optional Bash)
 
 Pick at least one per provider family:
 
 - OpenAI: `openai/gpt-5.2` (or `openai/gpt-5-mini`)
 - Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)
-- Google: `google/gemini-flash-latest` (or `google/gemini-2.5-pro`)
+- Google: `google/gemini-3-flash-preview` (or `google/gemini-3-pro-preview`)
 - Z.AI (GLM): `zai/glm-4.7`
 - MiniMax: `minimax/minimax-m2.1`
diff --git a/src/agents/anthropic.setup-token.live.test.ts b/src/agents/anthropic.setup-token.live.test.ts
new file mode 100644
index 000000000..6c3a99f51
--- /dev/null
+++ b/src/agents/anthropic.setup-token.live.test.ts
@@ -0,0 +1,239 @@
+import { randomUUID } from "node:crypto";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+
+import { type Api, completeSimple, type Model } from "@mariozechner/pi-ai";
+import {
+  discoverAuthStorage,
+  discoverModels,
+} from "@mariozechner/pi-coding-agent";
+import { describe, expect, it } from "vitest";
+import {
+  ANTHROPIC_SETUP_TOKEN_PREFIX,
+  validateAnthropicSetupToken,
+} from "../commands/auth-token.js";
+import { loadConfig } from "../config/config.js";
+import { resolveClawdbotAgentDir } from "./agent-paths.js";
+import {
+  type AuthProfileCredential,
+  ensureAuthProfileStore,
+  saveAuthProfileStore,
+} from "./auth-profiles.js";
+import { getApiKeyForModel } from "./model-auth.js";
+import { normalizeProviderId, parseModelRef } from "./model-selection.js";
+import { ensureClawdbotModelsJson } from "./models-config.js";
+
+const LIVE = process.env.LIVE === "1" || process.env.CLAWDBOT_LIVE_TEST === "1";
+const SETUP_TOKEN_RAW = process.env.CLAWDBOT_LIVE_SETUP_TOKEN?.trim() ?? "";
+const SETUP_TOKEN_VALUE =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_VALUE?.trim() ?? "";
+const SETUP_TOKEN_PROFILE =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE?.trim() ?? "";
+const SETUP_TOKEN_MODEL =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_MODEL?.trim() ?? "";
+
+const ENABLED =
+  LIVE && Boolean(SETUP_TOKEN_RAW || SETUP_TOKEN_VALUE || SETUP_TOKEN_PROFILE);
+const describeLive = ENABLED ? describe : describe.skip;
+
+type TokenSource = {
+  agentDir: string;
+  profileId: string;
+  cleanup?: () => Promise<void>;
+};
+
+function isSetupToken(value: string): boolean {
+  return value.startsWith(ANTHROPIC_SETUP_TOKEN_PREFIX);
+}
+
+function listSetupTokenProfiles(store: {
+  profiles: Record<string, AuthProfileCredential>;
+}): string[] {
+  return Object.entries(store.profiles)
+    .filter(([, cred]) => {
+      if (cred.type !== "token") return false;
+      if (normalizeProviderId(cred.provider) !== "anthropic") return false;
+      return isSetupToken(cred.token);
+    })
+    .map(([id]) => id);
+}
+
+function pickSetupTokenProfile(candidates: string[]): string {
+  const preferred = [
+    "anthropic:setup-token-test",
+    "anthropic:setup-token",
+    "anthropic:default",
+  ];
+  for (const id of preferred) {
+    if (candidates.includes(id)) return id;
+  }
+  return candidates[0] ?? "";
+}
+
+async function resolveTokenSource(): Promise<TokenSource> {
+  const explicitToken =
+    (SETUP_TOKEN_RAW && isSetupToken(SETUP_TOKEN_RAW) ? SETUP_TOKEN_RAW : "") ||
+    SETUP_TOKEN_VALUE;
+
+  if (explicitToken) {
+    const error = validateAnthropicSetupToken(explicitToken);
+    if (error) {
+      throw new Error(`Invalid setup-token: ${error}`);
+    }
+    const tempDir = await fs.mkdtemp(
+      path.join(os.tmpdir(), "clawdbot-setup-token-"),
+    );
+    const profileId = `anthropic:setup-token-live-${randomUUID()}`;
+    const store = ensureAuthProfileStore(tempDir, {
+      allowKeychainPrompt: false,
+    });
+    store.profiles[profileId] = {
+      type: "token",
+      provider: "anthropic",
+      token: explicitToken,
+    };
+    saveAuthProfileStore(store, tempDir);
+    return {
+      agentDir: tempDir,
+      profileId,
+      cleanup: async () => {
+        await fs.rm(tempDir, { recursive: true, force: true });
+      },
+    };
+  }
+
+  const agentDir = resolveClawdbotAgentDir();
+  const store = ensureAuthProfileStore(agentDir, {
+    allowKeychainPrompt: false,
+  });
+
+  const candidates = listSetupTokenProfiles(store);
+  if (SETUP_TOKEN_PROFILE) {
+    if (!candidates.includes(SETUP_TOKEN_PROFILE)) {
+      const available =
+        candidates.length > 0 ? candidates.join(", ") : "(none)";
+      throw new Error(
+        `Setup-token profile "${SETUP_TOKEN_PROFILE}" not found. Available: ${available}.`,
+      );
+    }
+    return { agentDir, profileId: SETUP_TOKEN_PROFILE };
+  }
+
+  if (
+    SETUP_TOKEN_RAW &&
+    SETUP_TOKEN_RAW !== "1" &&
+    SETUP_TOKEN_RAW !== "auto"
+  ) {
+    throw new Error(
+      "CLAWDBOT_LIVE_SETUP_TOKEN did not look like a setup-token. Use CLAWDBOT_LIVE_SETUP_TOKEN_VALUE for raw tokens.",
+    );
+  }
+
+  if (candidates.length === 0) {
+    throw new Error(
+      "No Anthropic setup-token profiles found. Set CLAWDBOT_LIVE_SETUP_TOKEN_VALUE or CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE.",
+    );
+  }
+  return { agentDir, profileId: pickSetupTokenProfile(candidates) };
+}
+
+function pickModel(models: Array<Model<Api>>, raw?: string): Model<Api> | null {
+  const normalized = raw?.trim() ?? "";
+  if (normalized) {
+    const parsed = parseModelRef(normalized, "anthropic");
+    if (!parsed) return null;
+    return (
+      models.find(
+        (model) =>
+          normalizeProviderId(model.provider) === parsed.provider &&
+          model.id === parsed.model,
+      ) ?? null
+    );
+  }
+
+  const preferred = [
+    "claude-opus-4-5",
+    "claude-sonnet-4-5",
+    "claude-sonnet-4-0",
+    "claude-haiku-3-5",
+  ];
+  for (const id of preferred) {
+    const match = models.find((model) => model.id === id);
+    if (match) return match;
+  }
+  return models[0] ?? null;
+}
+
+describeLive("live anthropic setup-token", () => {
+  it(
+    "completes using a setup-token profile",
+    async () => {
+      const tokenSource = await resolveTokenSource();
+      try {
+        const cfg = loadConfig();
+        await ensureClawdbotModelsJson(cfg, tokenSource.agentDir);
+
+        const authStorage = discoverAuthStorage(tokenSource.agentDir);
+        const modelRegistry = discoverModels(authStorage, tokenSource.agentDir);
+        const all = Array.isArray(modelRegistry)
+          ? modelRegistry
+          : modelRegistry.getAll();
+        const candidates = all.filter(
+          (model) => normalizeProviderId(model.provider) === "anthropic",
+        ) as Array<Model<Api>>;
+        expect(candidates.length).toBeGreaterThan(0);
+
+        const model = pickModel(candidates, SETUP_TOKEN_MODEL);
+        if (!model) {
+          throw new Error(
+            SETUP_TOKEN_MODEL
+              ? `Model not found: ${SETUP_TOKEN_MODEL}`
+              : "No Anthropic models available.",
+          );
+        }
+
+        const apiKeyInfo = await getApiKeyForModel({
+          model,
+          cfg,
+          profileId: tokenSource.profileId,
+          agentDir: tokenSource.agentDir,
+        });
+        const tokenError = validateAnthropicSetupToken(apiKeyInfo.apiKey);
+        if (tokenError) {
+          throw new Error(
+            `Resolved profile is not a setup-token: ${tokenError}`,
+          );
+        }
+
+        const res = await completeSimple(
+          model,
+          {
+            messages: [
+              {
+                role: "user",
+                content: "Reply with the word ok.",
+                timestamp: Date.now(),
+              },
+            ],
+          },
+          {
+            apiKey: apiKeyInfo.apiKey,
+            maxTokens: 64,
+            temperature: 0,
+          },
+        );
+        const text = res.content
+          .filter((block) => block.type === "text")
+          .map((block) => block.text.trim())
+          .join(" ");
+        expect(text.toLowerCase()).toContain("ok");
+      } finally {
+        if (tokenSource.cleanup) {
+          await tokenSource.cleanup();
+        }
+      }
+    },
+    5 * 60 * 1000,
+  );
+});