diff --git a/AGENTS.md b/AGENTS.md
index aac23c78c..84ad34145 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -34,6 +34,8 @@
 - Framework: Vitest with V8 coverage thresholds (70% lines/branches/functions/statements).
 - Naming: match source names with `*.test.ts`; e2e in `*.e2e.test.ts`.
 - Run `pnpm test` (or `pnpm test:coverage`) before pushing when you touch logic.
+- Live tests: `LIVE=1 pnpm test:live` (real keys; skipped by default). Docker runners: `scripts/test-live-models-docker.sh`, `scripts/test-live-gateway-models-docker.sh`.
+- Full kit + what’s covered: `docs/testing.md`.
 - Pure test additions/fixes generally do **not** need a changelog entry unless they alter user-facing behavior or the user asks for one.
 - Mobile: before using a simulator, check for connected real devices (iOS + Android) and prefer them when available.
diff --git a/docs/docs.json b/docs/docs.json
index f04f1c3c8..a684807cb 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -718,6 +718,7 @@
 {
   "group": "Reference & Templates",
   "pages": [
+    "testing",
     "scripts",
     "reference/rpc",
     "reference/device-models",
diff --git a/docs/reference/test.md b/docs/reference/test.md
index a66e43aa2..c8ad33c19 100644
--- a/docs/reference/test.md
+++ b/docs/reference/test.md
@@ -5,6 +5,8 @@ read_when:
 ---
 # Tests
 
+- Full testing kit (suites, live, Docker): [Testing](/testing)
+
 - `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don’t collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
 - `pnpm test:coverage`: Runs Vitest with V8 coverage. Global thresholds are 70% lines/branches/functions/statements. Coverage excludes integration-heavy entrypoints (CLI wiring, gateway/telegram bridges, webchat static server) to keep the target focused on unit-testable logic.
 - `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing).
diff --git a/docs/testing.md b/docs/testing.md
new file mode 100644
index 000000000..d9c9b258d
--- /dev/null
+++ b/docs/testing.md
@@ -0,0 +1,91 @@
+---
+summary: "Testing kit: unit/e2e/live suites, Docker runners, and what each test covers"
+read_when:
+  - Running tests locally or in CI
+  - Adding regressions for model/provider bugs
+  - Debugging gateway + agent behavior
+---
+
+# Testing
+
+Clawdbot has three Vitest suites (unit, e2e, live) plus a couple Docker helpers for “run with my real keys” smoke checks.
+
+## Quick start
+
+- Full gate (what we expect before push): `pnpm lint && pnpm build && pnpm test`
+- Coverage gate: `pnpm test:coverage`
+- E2E suite: `pnpm test:e2e`
+- Live suite (opt-in): `LIVE=1 pnpm test:live`
+
+## Test suites (what runs where)
+
+### Unit / integration (default)
+
+- Command: `pnpm test`
+- Config: `vitest.config.ts`
+- Files: `src/**/*.test.ts`
+- Scope: pure unit tests + in-process integration tests (gateway server auth, routing, tooling, parsing, config).
+
+### E2E (gateway smoke)
+
+- Command: `pnpm test:e2e`
+- Config: `vitest.e2e.config.ts`
+- Files: `src/**/*.e2e.test.ts`
+- Scope: multi-instance gateway end-to-end behavior (WebSocket/HTTP/node pairing), heavier networking surface.
+
+### Live (real providers + real models)
+
+- Command: `pnpm test:live`
+- Config: `vitest.live.config.ts`
+- Files: `src/**/*.live.test.ts`
+- Default: **skipped** unless `LIVE=1` (or `CLAWDBOT_LIVE_TEST=1`)
+- Scope: “does this provider/model actually work today with real creds”.
+
+## Live: model smoke (profile keys)
+
+Two layers:
+
+1. Direct model completion (no gateway):
+   - Test: `src/agents/models.profiles.live.test.ts`
+   - Goal: enumerate discovered models, use `getApiKeyForModel` to pick ones you have creds for, then run a small completion.
+   - Selection:
+     - `CLAWDBOT_LIVE_ALL_MODELS=1` (required to run the suite)
+     - `CLAWDBOT_LIVE_MODELS=all` or comma allowlist (`openai/gpt-5.2,anthropic/claude-opus-4-5,...`)
+     - `CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not ad-hoc env).
+   - Regression hook: OpenAI Responses tool-only → follow-up path (the `reasoning` replay class) is covered here.
+
+2. Gateway + dev agent smoke (what “@clawdbot” actually does):
+   - Test: `src/gateway/gateway-models.profiles.live.test.ts`
+   - Goal: spin up an in-process gateway, create/patch an `agent:dev:*` session, iterate models-with-keys, and assert “meaningful” responses.
+   - Selection:
+     - `CLAWDBOT_LIVE_GATEWAY=1`
+     - `CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1` (scan all discovered models with available keys)
+     - `CLAWDBOT_LIVE_GATEWAY_MODELS=all` or comma allowlist
+   - Extra regression: for OpenAI Responses/Codex Responses models, force a tool-call-only turn followed by a user question (the exact failure mode that produced `400 … reasoning … required following item`).
+
+## Credentials (never commit)
+
+Live tests discover credentials the same way the CLI does:
+
+- Profile store: `~/.clawdbot/credentials/` (preferred; what “profile keys” means in the tests)
+- Config: `~/.clawdbot/clawdbot.json` (or `CLAWDBOT_CONFIG_PATH`)
+
+If you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).
+
+## Docker runners (optional “works in Linux” checks)
+
+These run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace:
+
+- Direct models: `scripts/test-live-models-docker.sh`
+- Gateway + dev agent: `scripts/test-live-gateway-models-docker.sh`
+
+Useful env vars:
+
+- `CLAWDBOT_CONFIG_DIR=...` (default: `~/.clawdbot`) mounted to `/home/node/.clawdbot`
+- `CLAWDBOT_WORKSPACE_DIR=...` (default: `~/clawd`) mounted to `/home/node/clawd`
+- `CLAWDBOT_LIVE_GATEWAY_MODELS=...` / `CLAWDBOT_LIVE_MODELS=...` to narrow the run
+
+## Docs sanity
+
+Run docs checks after doc edits: `pnpm docs:list`.
+
diff --git a/scripts/test-live-gateway-models-docker.sh b/scripts/test-live-gateway-models-docker.sh
index 39422cf76..b30883eec 100755
--- a/scripts/test-live-gateway-models-docker.sh
+++ b/scripts/test-live-gateway-models-docker.sh
@@ -5,6 +5,12 @@ ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 IMAGE_NAME="${CLAWDBOT_IMAGE:-clawdbot:local}"
 CONFIG_DIR="${CLAWDBOT_CONFIG_DIR:-$HOME/.clawdbot}"
 WORKSPACE_DIR="${CLAWDBOT_WORKSPACE_DIR:-$HOME/clawd}"
+PROFILE_FILE="${CLAWDBOT_PROFILE_FILE:-$HOME/.profile}"
+
+PROFILE_MOUNT=()
+if [[ -f "$PROFILE_FILE" ]]; then
+  PROFILE_MOUNT=(-v "$PROFILE_FILE":/home/node/.profile:ro)
+fi
 
 echo "==> Build image: $IMAGE_NAME"
 docker build -t "$IMAGE_NAME" -f "$ROOT_DIR/Dockerfile" "$ROOT_DIR"
@@ -19,6 +25,6 @@ docker run --rm -t \
   -e CLAWDBOT_LIVE_GATEWAY_MODELS="${CLAWDBOT_LIVE_GATEWAY_MODELS:-all}" \
   -v "$CONFIG_DIR":/home/node/.clawdbot \
   -v "$WORKSPACE_DIR":/home/node/clawd \
+  "${PROFILE_MOUNT[@]}" \
   "$IMAGE_NAME" \
   -lc "cd /app && pnpm test:live"
-
diff --git a/scripts/test-live-models-docker.sh b/scripts/test-live-models-docker.sh
index 4f03b5ab9..7fdd48372 100755
--- a/scripts/test-live-models-docker.sh
+++ b/scripts/test-live-models-docker.sh
@@ -5,6 +5,12 @@ ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 IMAGE_NAME="${CLAWDBOT_IMAGE:-clawdbot:local}"
 CONFIG_DIR="${CLAWDBOT_CONFIG_DIR:-$HOME/.clawdbot}"
 WORKSPACE_DIR="${CLAWDBOT_WORKSPACE_DIR:-$HOME/clawd}"
+PROFILE_FILE="${CLAWDBOT_PROFILE_FILE:-$HOME/.profile}"
+
+PROFILE_MOUNT=()
+if [[ -f "$PROFILE_FILE" ]]; then
+  PROFILE_MOUNT=(-v "$PROFILE_FILE":/home/node/.profile:ro)
+fi
 
 echo "==> Build image: $IMAGE_NAME"
 docker build -t "$IMAGE_NAME" -f "$ROOT_DIR/Dockerfile" "$ROOT_DIR"
@@ -18,6 +24,6 @@ docker run --rm -t \
   -e CLAWDBOT_LIVE_REQUIRE_PROFILE_KEYS=1 \
   -v "$CONFIG_DIR":/home/node/.clawdbot \
   -v "$WORKSPACE_DIR":/home/node/clawd \
+  "${PROFILE_MOUNT[@]}" \
   "$IMAGE_NAME" \
   -lc "cd /app && pnpm test:live"
-
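
Review note on the optional `~/.profile` mount: both scripts build `PROFILE_MOUNT` as an array that stays empty when the profile file is missing, then expand it with `"${PROFILE_MOUNT[@]}"`. The hunks do not show whether the scripts enable `set -u`. If they do, bash releases older than 4.4 (for example the 3.2 that ships with macOS) report an empty array expansion as an unbound variable. The sketch below shows one way to guard the expansion under those assumptions. `profile_mount_args` is a hypothetical helper written for illustration and is not part of either script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the scripts): print the docker arguments for an
# optional read-only profile mount, one argument per line. Prints nothing when
# the profile file does not exist.
profile_mount_args() {
  local profile_file="$1"
  local -a mount=()
  local arg

  if [[ -f "$profile_file" ]]; then
    mount=(-v "$profile_file":/home/node/.profile:ro)
  fi

  # ${mount[@]+...} expands to zero words when the array is empty, so the
  # expansion stays safe under `set -u` on older bash releases as well.
  for arg in ${mount[@]+"${mount[@]}"}; do
    printf '%s\n' "$arg"
  done
}
```

In the runners, the same guard would apply to the `docker run` line as `${PROFILE_MOUNT[@]+"${PROFILE_MOUNT[@]}"}`. On bash 4.4 or newer the plain `"${PROFILE_MOUNT[@]}"` form used in the diff behaves the same way.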