clawdbot/docs/testing.md
2026-01-10 19:53:34 +00:00

---
summary: "Testing kit: unit/e2e/live suites, Docker runners, and what each test covers"
read_when:
- Running tests locally or in CI
- Adding regressions for model/provider bugs
- Debugging gateway + agent behavior
---
# Testing
Clawdbot has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.
This doc is a “how we test” guide:
- What each suite covers (and what it deliberately does *not* cover)
- Which commands to run for common workflows (local, pre-push, debugging)
- How live tests discover credentials and select models/providers
- How to add regressions for real-world model/provider issues
## Quick start
Most days:
- Full gate (expected before push): `pnpm lint && pnpm build && pnpm test`
When you touch tests or want extra confidence:
- Coverage gate: `pnpm test:coverage`
- E2E suite: `pnpm test:e2e`
When debugging real providers/models (requires real creds; skipped by default):
- Live suite (models only): `CLAWDBOT_LIVE_TEST=1 pnpm test:live`
- Live suite (models + providers): `LIVE=1 pnpm test:live`
Tip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.
## Test suites (what runs where)
Think of the suites as “increasing realism” (and increasing flakiness/cost):
### Unit / integration (default)
- Command: `pnpm test`
- Config: `vitest.config.ts`
- Files: `src/**/*.test.ts`
- Scope:
  - Pure unit tests
  - In-process integration tests (gateway auth, routing, tooling, parsing, config)
  - Deterministic regressions for known bugs
- Expectations:
  - Runs in CI
  - No real keys required
  - Should be fast and stable
### E2E (gateway smoke)
- Command: `pnpm test:e2e`
- Config: `vitest.e2e.config.ts`
- Files: `src/**/*.e2e.test.ts`
- Scope:
  - Multi-instance gateway end-to-end behavior
  - WebSocket/HTTP surfaces, node pairing, and heavier networking
- Expectations:
  - Runs in CI (when enabled in the pipeline)
  - No real keys required
  - More moving parts than unit tests (can be slower)
### Live (real providers + real models)
- Command: `pnpm test:live`
- Config: `vitest.live.config.ts`
- Files: `src/**/*.live.test.ts`
- Default: **skipped** unless `CLAWDBOT_LIVE_TEST=1` or `LIVE=1`
- Scope:
  - “Does this provider/model actually work *today* with real creds?”
  - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
- Expectations:
  - Not CI-stable by design (real networks, real provider policies, quotas, outages)
  - Costs money / uses rate limits
  - Prefer running narrowed subsets instead of “everything”
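The three `Files` globs overlap: names like `foo.e2e.test.ts` and `foo.live.test.ts` also match `src/**/*.test.ts`. Presumably `vitest.config.ts` excludes the e2e and live patterns (check the config to confirm). A minimal sketch of that most-specific-suffix-wins rule, using a hypothetical `suiteFor` helper that is not part of the repo:

```typescript
// Hypothetical helper: map a test file path to the Vitest suite that owns it.
// Assumes the most specific suffix wins, i.e. the unit config excludes
// *.e2e.test.ts and *.live.test.ts (verify against the real configs).
type Suite = "unit" | "e2e" | "live";

const SUFFIXES: ReadonlyArray<readonly [string, Suite]> = [
  // Longer, more specific suffixes are checked before the generic ".test.ts".
  [".e2e.test.ts", "e2e"],
  [".live.test.ts", "live"],
  [".test.ts", "unit"],
];

function suiteFor(path: string): Suite | undefined {
  // All three documented globs are rooted at src/.
  if (!path.startsWith("src/")) return undefined;
  for (const [suffix, suite] of SUFFIXES) {
    if (path.endsWith(suffix)) return suite;
  }
  return undefined; // not a test file
}
```

Under this rule, `suiteFor("src/gateway/gateway.tool-calling.mock-openai.test.ts")` returns `"unit"`, consistent with that mock test being listed as CI-safe below.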
## Which suite should I run?
Use this decision guide:
- Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)
- Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`
- Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`
## Live: model smoke (profile keys)
Live tests are split into two layers so we can isolate failures:
- “Direct model” tells us the provider/model can answer at all with the given key.
- “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).
### Layer 1: Direct model completion (no gateway)
- Test: `src/agents/models.profiles.live.test.ts`
- Goal:
  - Enumerate discovered models
  - Use `getApiKeyForModel` to select models you have creds for
  - Run a small completion per model (and targeted regressions where needed)
- How to enable:
  - `CLAWDBOT_LIVE_TEST=1` or `LIVE=1`
  - `CLAWDBOT_LIVE_ALL_MODELS=1` (required for this test to run)
- How to select models:
  - `CLAWDBOT_LIVE_MODELS=all` to run everything with keys
  - or `CLAWDBOT_LIVE_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,..."` (comma-separated allowlist)
- Where keys come from:
  - By default: profile store and env fallbacks
  - Set `CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce **profile store** only
- Why this exists:
  - Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”
  - Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)
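The Layer 1 enable and selection rules combine roughly as sketched below. This is an illustration under stated assumptions, not the code in `src/agents/models.profiles.live.test.ts`: `liveEnabled` and `selectDirectModels` are hypothetical names, and treating an unset `CLAWDBOT_LIVE_MODELS` like `all` is a guess this doc does not confirm.

```typescript
// Hypothetical sketch of Layer 1 gating and model selection; not the code in
// src/agents/models.profiles.live.test.ts.
type Env = Record<string, string | undefined>;

function liveEnabled(env: Env): boolean {
  // Live suites are skipped unless one of the opt-in flags is "1".
  return env.CLAWDBOT_LIVE_TEST === "1" || env.LIVE === "1";
}

function selectDirectModels(env: Env, modelsWithKeys: string[]): string[] {
  // The global live flag and the per-test flag are both required.
  if (!liveEnabled(env) || env.CLAWDBOT_LIVE_ALL_MODELS !== "1") return [];
  const raw = (env.CLAWDBOT_LIVE_MODELS ?? "").trim();
  // Assumption: an unset or empty allowlist behaves like "all".
  if (raw === "" || raw === "all") return modelsWithKeys;
  const allow = new Set(
    raw
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
  );
  // Run only models that are allowlisted AND have credentials.
  return modelsWithKeys.filter((m) => allow.has(m));
}
```

In this sketch an allowlisted model without credentials is silently dropped rather than reported; the real test may handle that case differently.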
### Layer 2: Gateway + dev agent smoke (what “@clawdbot” actually does)
- Test: `src/gateway/gateway-models.profiles.live.test.ts`
- Goal:
  - Spin up an in-process gateway
  - Create/patch an `agent:dev:*` session (model override per run)
  - Iterate models-with-keys and assert:
    - “meaningful” response (no tools)
    - a real tool invocation works (read probe)
    - optional extra tool probes (bash+read probe)
    - OpenAI regression paths (tool-call-only → follow-up) keep working
- How to enable:
  - `CLAWDBOT_LIVE_TEST=1` or `LIVE=1`
  - `CLAWDBOT_LIVE_GATEWAY=1` (required for this test to run)
- How to select models:
  - `CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1` to scan all discovered models with keys
  - or set `CLAWDBOT_LIVE_GATEWAY_MODELS="provider/model,provider/model,..."` to narrow quickly
- Optional tool-calling stress:
  - `CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1` enables an extra “bash writes file → read reads it back → echo nonce” check.
  - This is specifically meant to catch tool-calling compatibility issues across providers (formatting, history replay, tool_result pairing, etc.).
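The nonce makes the tool probe hard to pass by accident: the final answer can only contain it if the model really received the file contents through the tool-call/tool-result round trip. The offline sketch below shows just that check, with plain `node:fs` calls standing in for the bash and read tools and a callback standing in for the model. `runToolProbe` is hypothetical and does not drive a gateway.

```typescript
import { randomBytes } from "node:crypto";
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical, offline stand-in for the bash+read probe: the real test drives
// a model through the gateway, while here plain fs calls play the two tools.
function runToolProbe(reply: (fileContents: string) => string): boolean {
  // A fresh random nonce per run, so a canned or guessed answer cannot match.
  const nonce = randomBytes(8).toString("hex");
  const dir = mkdtempSync(join(tmpdir(), "tool-probe-"));
  try {
    const file = join(dir, "probe.txt");
    writeFileSync(file, `nonce=${nonce}\n`); // step 1: "bash" writes the file
    const contents = readFileSync(file, "utf8"); // step 2: "read" reads it back
    return reply(contents).includes(nonce); // step 3: the answer must echo the nonce
  } finally {
    rmSync(dir, { recursive: true, force: true }); // clean up the temp dir
  }
}
```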
### Recommended live recipes
Narrow, explicit allowlists are fastest and least flaky:
- Single model, direct (no gateway):
  - `CLAWDBOT_LIVE_TEST=1 CLAWDBOT_LIVE_ALL_MODELS=1 CLAWDBOT_LIVE_MODELS="openai/gpt-5.2" pnpm test:live src/agents/models.profiles.live.test.ts`
- Single model, gateway smoke:
  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
- Tool calling across several providers (bash + read probe):
  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-flash-latest,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
## Credentials (never commit)
Live tests discover credentials the same way the CLI does. Practical implications:
- If the CLI works, live tests should find the same keys.
- If a live test says “no creds”, debug the same way you’d debug `clawdbot models list` / model selection.

Where credentials live:
- Profile store: `~/.clawdbot/credentials/` (preferred; what “profile keys” means in the tests)
- Config: `~/.clawdbot/clawdbot.json` (or `CLAWDBOT_CONFIG_PATH`)
If you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).
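The lookup order (profile store first, env as fallback, fallback disabled by `CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS=1`) can be sketched as a small function. Everything here is illustrative: `resolveKey`, the in-memory `Map` standing in for `~/.clawdbot/credentials/`, and the provider-to-env-var naming are assumptions, and the real discovery (including `~/.clawdbot/clawdbot.json`, which this sketch ignores) may work differently.

```typescript
// Hypothetical sketch of credential lookup order; names and shapes are
// illustrative and do not mirror Clawdbot's real credential code.
type Env = Record<string, string | undefined>;
type KeySource = { key: string; source: "profile" | "env" };

function resolveKey(
  provider: string,
  profileStore: Map<string, string>, // stand-in for ~/.clawdbot/credentials/
  envVarFor: (provider: string) => string, // assumed naming, e.g. OPENAI_API_KEY
  env: Env,
): KeySource | undefined {
  // 1. The profile store is preferred whenever it has a key.
  const fromProfile = profileStore.get(provider);
  if (fromProfile) return { key: fromProfile, source: "profile" };
  // 2. Strict mode: never fall back to environment variables.
  if (env.CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS === "1") return undefined;
  // 3. Otherwise fall back to the environment (e.g. keys exported in ~/.profile).
  const fromEnv = env[envVarFor(provider)];
  return fromEnv ? { key: fromEnv, source: "env" } : undefined;
}
```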
## Docker runners (optional “works in Linux” checks)
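These run inside the repo Docker image. The two live runners execute `pnpm test:live`, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted); the two `scripts/e2e/` runners cover onboarding and gateway networking: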
- Direct models: `pnpm test:docker:live-models` (script: `scripts/test-live-models-docker.sh`)
- Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)
- Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)
- Gateway networking (two containers, WS auth + health): `pnpm test:docker:gateway-network` (script: `scripts/e2e/gateway-network-docker.sh`)
Useful env vars:
- `CLAWDBOT_CONFIG_DIR=...` (default: `~/.clawdbot`) mounted to `/home/node/.clawdbot`
- `CLAWDBOT_WORKSPACE_DIR=...` (default: `~/clawd`) mounted to `/home/node/clawd`
- `CLAWDBOT_PROFILE_FILE=...` (default: `~/.profile`) mounted to `/home/node/.profile` and sourced before running tests
- `CLAWDBOT_LIVE_GATEWAY_MODELS=...` / `CLAWDBOT_LIVE_MODELS=...` to narrow the run
- `CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not env)
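The mount defaults above translate into `docker run -v host:container` arguments roughly as follows. `dockerMounts` is a hypothetical helper that only mirrors the documented defaults; the actual behavior is whatever the `scripts/test-live-*-docker.sh` runners do.

```typescript
import { homedir } from "node:os";
import { posix } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical helper mirroring the documented defaults only; the real
// scripts/test-live-*-docker.sh runners may differ in detail.
function dockerMounts(env: Env, home: string = homedir()): string[] {
  // Expand a leading "~/" the way a shell would.
  const expand = (p: string): string =>
    p.startsWith("~/") ? posix.join(home, p.slice(2)) : p;
  // [override from env, documented default, path inside the container]
  const mounts: Array<[string | undefined, string, string]> = [
    [env.CLAWDBOT_CONFIG_DIR, "~/.clawdbot", "/home/node/.clawdbot"],
    [env.CLAWDBOT_WORKSPACE_DIR, "~/clawd", "/home/node/clawd"],
    [env.CLAWDBOT_PROFILE_FILE, "~/.profile", "/home/node/.profile"],
  ];
  // Each entry becomes a `-v host:container` pair for `docker run`.
  return mounts.flatMap(([override, fallback, target]) => [
    "-v",
    `${expand(override ?? fallback)}:${target}`,
  ]);
}
```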
## Docs sanity
Run docs checks after doc edits: `pnpm docs:list`.
## Offline regression (CI-safe)
These are “real pipeline” regressions without real providers:
- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.tool-calling.mock-openai.test.ts`
- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.wizard.e2e.test.ts`
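The idea behind the mock-OpenAI regression is that a scripted model replaces the real provider, so the tool-calling loop runs deterministically and needs no keys. A heavily simplified sketch; `scriptedModel`, `runAgent`, and the message shapes are hypothetical and far smaller than the real gateway and agent loop:

```typescript
// Hypothetical, simplified illustration of a scripted "model" driving a
// tool-calling loop; message shapes are not Clawdbot's or OpenAI's.
type ToolCall = { id: string; name: string; args: string };
type Msg =
  | { role: "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type Model = (history: Msg[]) => Msg;

// Turn 1: request a tool. Turn 2: answer using the tool result in history.
const scriptedModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last.role === "tool") {
    return { role: "assistant", content: `done: ${last.content}` };
  }
  return {
    role: "assistant",
    content: "",
    toolCalls: [{ id: "call_1", name: "read", args: "probe.txt" }],
  };
};

function runAgent(
  model: Model,
  tools: Record<string, (args: string) => string>,
  prompt: string,
): Msg[] {
  const history: Msg[] = [{ role: "user", content: prompt }];
  // Cap the turns so a model that never stops calling tools fails fast.
  for (let turn = 0; turn < 5; turn++) {
    const reply = model(history);
    history.push(reply);
    const calls = reply.role === "assistant" ? reply.toolCalls ?? [] : [];
    if (calls.length === 0) return history; // a plain answer ends the run
    for (const call of calls) {
      const tool = tools[call.name];
      if (!tool) throw new Error(`unknown tool: ${call.name}`);
      // Each tool result carries the id of the call it answers.
      history.push({ role: "tool", toolCallId: call.id, content: tool(call.args) });
    }
  }
  throw new Error("agent loop did not terminate");
}
```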
## Adding regressions (guidance)
When you fix a provider/model issue discovered in live:
- Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
- If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
- Prefer targeting the smallest layer that catches the bug:
  - provider request conversion/replay bug → direct models test
  - gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test
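One way to capture a request-shape transformation in a CI-safe test is to assert invariants on the message list a provider adapter is about to send. The sketch below checks the tool_result pairing class of bug mentioned above: every tool result must answer an earlier, still-open tool call, and no call may be left without a result. `WireMsg` and `toolPairingProblems` are hypothetical, and some providers impose stricter rules (for example, that the result immediately follows its call), which this loose check does not enforce.

```typescript
// Hypothetical invariant check on an outgoing message list; the shape is
// illustrative, not Clawdbot's or any provider's wire format.
type WireMsg =
  | { role: "assistant"; toolCallIds: string[] }
  | { role: "tool"; toolCallId: string }
  | { role: "user" | "system" };

function toolPairingProblems(messages: WireMsg[]): string[] {
  const problems: string[] = [];
  const open = new Set<string>(); // calls issued but not yet answered
  messages.forEach((m, i) => {
    if (m.role === "assistant") {
      for (const id of m.toolCallIds) open.add(id);
    } else if (m.role === "tool") {
      // Set.delete returns false if the id was never issued or already answered.
      if (!open.delete(m.toolCallId)) {
        problems.push(`message ${i}: result for unknown or already-answered call ${m.toolCallId}`);
      }
    }
  });
  // Anything still open at the end is a call with no result.
  open.forEach((id) => problems.push(`call ${id} has no result`));
  return problems;
}
```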