mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-18 19:57:27 +00:00
chore: Run pnpm format:fix.
@@ -11,7 +11,8 @@ read_when:
OpenClaw has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.

This doc is a “how we test” guide:

- What each suite covers (and what it deliberately does _not_ cover)
- Which commands to run for common workflows (local, pre-push, debugging)
- How live tests discover credentials and select models/providers
- How to add regressions for real-world model/provider issues

@@ -19,13 +20,16 @@ This doc is a “how we test” guide:
## Quick start

Most days:

- Full gate (expected before push): `pnpm lint && pnpm build && pnpm test`

When you touch tests or want extra confidence:

- Coverage gate: `pnpm test:coverage`
- E2E suite: `pnpm test:e2e`

When debugging real providers/models (requires real creds):

- Live suite (models + gateway tool/image probes): `pnpm test:live`

Tip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.

@@ -68,7 +72,7 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- Files: `src/**/*.live.test.ts`
- Default: **enabled** by `pnpm test:live` (sets `OPENCLAW_LIVE_TEST=1`)
- Scope:
  - “Does this provider/model actually work _today_ with real creds?”
  - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
- Expectations:
  - Not CI-stable by design (real networks, real provider policies, quotas, outages)

@@ -80,6 +84,7 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
## Which suite should I run?

Use this decision table:

- Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)
- Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`
- Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`

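The decision table maps the kind of change you made to the commands you should run. The sketch below encodes it as data so the rules are explicit. `ChangeKind`, `Change`, and `commandsFor` are hypothetical names, not repo code. Reading “add `pnpm test:e2e`” as “in addition to `pnpm test`” is this sketch's interpretation.

```typescript
// Hypothetical encoding of the "Which suite should I run?" table (not repo code).
type ChangeKind = "logic" | "gateway-protocol" | "provider-debugging";

interface Change {
  kinds: ChangeKind[];
  large?: boolean; // "if you changed a lot"
}

function commandsFor(change: Change): string[] {
  const kinds = new Set(change.kinds);
  const cmds: string[] = [];
  // Logic/test edits, plus gateway work (assumed to still need the unit run).
  if (kinds.has("logic") || kinds.has("gateway-protocol")) {
    cmds.push("pnpm test");
    if (change.large) cmds.push("pnpm test:coverage");
  }
  // Gateway networking / WS protocol / pairing: add the e2e suite.
  if (kinds.has("gateway-protocol")) cmds.push("pnpm test:e2e");
  // Provider-specific failures: the live suite, which the caller still narrows.
  if (kinds.has("provider-debugging")) cmds.push("pnpm test:live");
  return cmds;
}
```

The sketch only returns the base live command. Narrowing it to one test file plus an allowlist env var is still up to you.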
@@ -87,6 +92,7 @@ Use this decision table:
## Live: model smoke (profile keys)

Live tests are split into two layers so we can isolate failures:

- “Direct model” tells us the provider/model can answer at all with the given key.
- “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).

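The two layers are useful because their results combine into a diagnosis. The sketch below captures that reasoning in a hypothetical `triage` helper (the live tests do not export anything like it). If the direct layer fails, suspect the provider, model id, or key. If the direct layer passes but the gateway layer fails, suspect the pipeline.

```typescript
type LayerResult = "pass" | "fail";

// Hypothetical helper: combine the two live layers into a first guess at the cause.
function triage(direct: LayerResult, gateway: LayerResult): string {
  if (direct === "fail") {
    // The model cannot answer at all with this key; gateway results add no signal.
    return "provider/model/key";
  }
  if (gateway === "fail") {
    // The model answers directly, so the break is in sessions, history, tools, sandbox policy, etc.
    return "gateway+agent pipeline";
  }
  return "ok";
}
```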
@@ -223,6 +229,7 @@ Narrow, explicit allowlists are fastest and least flaky:
- Antigravity (OAuth): `OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

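The allowlist value is a comma-separated list of `provider/model` refs. The provider prefix decides which transport is used, for example `google/...` versus `google-antigravity/...`. Below is a minimal sketch of splitting such a value. It assumes each entry splits on its first `/` only. `parseModelAllowlist` is a hypothetical helper, not the parser the live tests actually use.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Hypothetical parser for a value like OPENCLAW_LIVE_GATEWAY_MODELS.
// Only the first "/" separates provider from model, so a model id that
// itself contains "/" stays intact.
function parseModelAllowlist(raw: string | undefined): ModelRef[] {
  if (!raw) return [];
  const refs: ModelRef[] = [];
  for (const entry of raw.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue; // tolerate trailing or doubled commas
    const slash = trimmed.indexOf("/");
    if (slash <= 0 || slash === trimmed.length - 1) {
      throw new Error(`Expected "provider/model", got "${trimmed}"`);
    }
    refs.push({ provider: trimmed.slice(0, slash), model: trimmed.slice(slash + 1) });
  }
  return refs;
}
```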

Notes:

- `google/...` uses the Gemini API (API key).
- `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
- `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).

@@ -237,6 +244,7 @@ There is no fixed “CI model list” (live is opt-in), but these are the **reco
### Modern smoke set (tool calling + image)

This is the “common models” run we expect to keep working:

- OpenAI (non-Codex): `openai/gpt-5.2` (optional: `openai/gpt-5.1`)
- OpenAI Codex: `openai-codex/gpt-5.2` (optional: `openai-codex/gpt-5.2-codex`)
- Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)

@@ -251,6 +259,7 @@ Run gateway smoke with tools + image:
### Baseline: tool calling (Read + optional Exec)

Pick at least one per provider family:

- OpenAI: `openai/gpt-5.2` (or `openai/gpt-5-mini`)
- Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)
- Google: `google/gemini-3-flash-preview` (or `google/gemini-3-pro-preview`)

@@ -258,6 +267,7 @@ Pick at least one per provider family:
- MiniMax: `minimax/minimax-m2.1`

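You can check “at least one per provider family” mechanically before starting a slow live run. The sketch below assumes a hypothetical `missingFamilies` helper and a family-to-prefix mapping that you supply yourself. Which prefixes count as one family (for example `openai` and `openai-codex`) is your call, not something the test suite defines.

```typescript
// Hypothetical pre-flight check: which required provider families have no model
// in the allowlist? `families` maps a family label to the provider prefixes that count for it.
function missingFamilies(allowlist: string[], families: Record<string, string[]>): string[] {
  // Provider = text before the first "/" of each "provider/model" ref.
  const providers = new Set(
    allowlist.map((ref) => ref.trim().split("/", 1)[0]).filter((p) => p !== ""),
  );
  return Object.entries(families)
    .filter(([, prefixes]) => !prefixes.some((prefix) => providers.has(prefix)))
    .map(([family]) => family);
}
```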

Optional additional coverage (nice to have):

- xAI: `xai/grok-4` (or latest available)
- Mistral: `mistral/`… (pick one “tools” capable model you have enabled)
- Cerebras: `cerebras/`… (if you have access)

@@ -270,10 +280,12 @@ Include at least one image-capable model in `OPENCLAW_LIVE_GATEWAY_MODELS` (Clau
### Aggregators / alternate gateways

If you have keys enabled, we also support testing via:

- OpenRouter: `openrouter/...` (hundreds of models; use `openclaw models scan` to find tool+image capable candidates)
- OpenCode Zen: `opencode/...` (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)

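OpenCode Zen accepts either `OPENCODE_API_KEY` or `OPENCODE_ZEN_API_KEY`, so a live run only needs one of them to be set. Below is a minimal sketch of that lookup. `resolveOpencodeKey` is hypothetical, and checking `OPENCODE_API_KEY` first is an assumption rather than documented precedence.

```typescript
// Hypothetical lookup: first non-blank value among the two env vars the doc lists.
// The order (OPENCODE_API_KEY before OPENCODE_ZEN_API_KEY) is an assumption.
function resolveOpencodeKey(env: Record<string, string | undefined>): string | undefined {
  for (const name of ["OPENCODE_API_KEY", "OPENCODE_ZEN_API_KEY"]) {
    const value = env[name]?.trim();
    if (value) return value; // skip unset and whitespace-only values
  }
  return undefined;
}
```

In a Vitest file, the result could gate a block with `describe.skipIf(!resolveOpencodeKey(process.env))`. The test is then skipped rather than failing when no key is present.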

More providers you can include in the live matrix (if you have creds/config):

- Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`
- Via `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)

@@ -282,6 +294,7 @@ Tip: don’t try to hardcode “all models” in docs. The authoritative list is
## Credentials (never commit)

Live tests discover credentials the same way the CLI does. Practical implications:

- If the CLI works, live tests should find the same keys.
- If a live test says “no creds”, debug the same way you’d debug `openclaw models list` / model selection.

@@ -320,21 +333,25 @@ Run docs checks after doc edits: `pnpm docs:list`.
## Offline regression (CI-safe)

These are “real pipeline” regressions without real providers:

- Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.tool-calling.mock-openai.test.ts`
- Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.wizard.e2e.test.ts`

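In the mock-OpenAI regression, the provider is faked while the gateway and agent loop are real. The fake therefore has to return responses that the agent treats as tool requests. The sketch below builds one such canned reply. It assumes the mock speaks the OpenAI Chat Completions format; the real test may use another shape or its own helpers. `mockToolCallResponse`, `extractToolCalls`, and the `read` tool name are hypothetical.

```typescript
// Minimal subset of the OpenAI Chat Completions response shape for a tool call.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
}

interface ChatCompletion {
  id: string;
  object: "chat.completion";
  created: number;
  model: string;
  choices: Array<{
    index: number;
    message: { role: "assistant"; content: string | null; tool_calls?: ToolCall[] };
    finish_reason: "stop" | "tool_calls";
  }>;
}

// Hypothetical: a deterministic reply asking the agent to call exactly one tool.
function mockToolCallResponse(name: string, args: Record<string, unknown>): ChatCompletion {
  return {
    id: "chatcmpl-mock-1",
    object: "chat.completion",
    created: 0,
    model: "mock-model",
    choices: [
      {
        index: 0,
        message: {
          role: "assistant",
          content: null,
          tool_calls: [{ id: "call_1", type: "function", function: { name, arguments: JSON.stringify(args) } }],
        },
        finish_reason: "tool_calls",
      },
    ],
  };
}

// Hypothetical: decode tool calls the way a client would, parsing the argument strings.
function extractToolCalls(resp: ChatCompletion): Array<{ name: string; args: unknown }> {
  return (resp.choices[0]?.message.tool_calls ?? []).map((call) => ({
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as unknown,
  }));
}
```

In this format, `arguments` is a JSON-encoded string rather than an object. A regression that round-trips the exact wire shape can therefore catch serialization bugs that a hand-built object would hide.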

## Agent reliability evals (skills)

We already have a few CI-safe tests that behave like “agent reliability evals”:

- Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.tool-calling.mock-openai.test.ts`).
- End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.wizard.e2e.test.ts`).

What’s still missing for skills (see [Skills](/tools/skills)):

- **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?
- **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args?
- **Workflow contracts:** multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.

Future evals should stay deterministic first:

- A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.
- A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).
- Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.

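A deterministic scenario check can be a pure function over the tool calls recorded from a mock-provider run. The sketch below covers two of the gaps listed above: tool order, and reading `SKILL.md` before use. `RecordedCall`, `checkScenario`, the `read` tool with a `path` argument, and the `isSkillUse` predicate are all assumptions for illustration.

```typescript
// Hypothetical record of one tool call made during a mock-provider agent run.
interface RecordedCall {
  tool: string;
  args: Record<string, string>;
}

// Hypothetical scenario check; returns violations (empty list = pass).
function checkScenario(
  calls: RecordedCall[],
  expectedOrder: string[],
  skillDoc: string,
  isSkillUse: (call: RecordedCall) => boolean,
): string[] {
  const problems: string[] = [];
  // Tool order: expectedOrder must appear as a subsequence of the recorded tools.
  let next = 0;
  for (const call of calls) {
    if (next < expectedOrder.length && call.tool === expectedOrder[next]) next++;
  }
  if (next < expectedOrder.length) {
    problems.push(`missing or out-of-order tool: ${expectedOrder[next]}`);
  }
  // Compliance: the skill doc must be read before the first call that uses the skill.
  const readIdx = calls.findIndex((c) => c.tool === "read" && c.args.path === skillDoc);
  const firstUse = calls.findIndex(isSkillUse);
  if (firstUse !== -1 && (readIdx === -1 || readIdx > firstUse)) {
    problems.push(`${skillDoc} was not read before first skill use`);
  }
  return problems;
}
```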
@@ -342,6 +359,7 @@ Future evals should stay deterministic first:
## Adding regressions (guidance)

When you fix a provider/model issue discovered in live:

- Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)
- If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars
- Prefer targeting the smallest layer that catches the bug: