The UsageAccumulator sums cacheRead/cacheWrite across all API calls
within a single turn. With Anthropic prompt caching, each call reports
cacheRead ≈ current_context_size, so after N tool-call round-trips the
accumulated total becomes N × actual_context, which gets clamped to
contextWindow (200k) by deriveSessionTotalTokens().
Fix: track the most recent API call's cache fields separately and use
them in toNormalizedUsage() for context-size reporting. This makes
/status Context display accurate while preserving accumulated output
token counts.
Fixes#13698Fixes#13782
Co-authored-by: akari-musubi <259925157+akari-musubi@users.noreply.github.com>
* fix: suggest /reset in context overflow error message
When the context window overflows, the error message now suggests
using /reset to clear session history, giving users an actionable
recovery path instead of a dead-end error.
Closes#12940
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: suggest /reset in context overflow error message (#12973) (thanks @RamiNoodle733)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rami Abdelrazzaq <RamiNoodle733@users.noreply.github.com>
* initial commit
* feat: implement deriveSessionTotalTokens function and update usage tests
* Added deriveSessionTotalTokens function to calculate total tokens based on usage and context tokens.
* Updated usage tests to include cases for derived session total tokens.
* Refactored session usage calculations in multiple files to utilize the new function for improved accuracy.
* fix: restore overflow truncation fallback + changelog/test hardening (#11551) (thanks @tyler6204)
* fix: gracefully handle oversized tool results causing context overflow
When a subagent reads a very large file or gets a huge tool result (e.g.,
gh pr diff on a massive PR), it can exceed the model's context window in
a single prompt. Auto-compaction can't help because there's no older
history to compact — just one giant tool result.
This adds two layers of defense:
1. Pre-emptive: Hard cap on tool result size (400K chars ≈ 100K tokens)
applied in the session tool result guard before persistence. This
prevents extremely large tool results from being stored in full,
regardless of model context window size.
2. Recovery: When context overflow is detected and compaction fails,
scan session messages for oversized tool results relative to the
model's actual context window (30% max share). If found, truncate
them in the session via branching (creating a new branch with
truncated content) and retry the prompt.
The truncation preserves the beginning of the content (most useful for
understanding what was read) and appends a notice explaining the
truncation and suggesting offset/limit parameters for targeted reads.
Includes comprehensive tests for:
- Text truncation with newline-boundary awareness
- Context-window-proportional size calculation
- In-memory message truncation
- Oversized detection heuristics
- Guard-level size capping during persistence
* fix: prep fixes for tool result truncation PR (#11579) (thanks @tyler6204)
* fix(errors): return clear billing error message instead of cryptic raw error (#8136)
When an LLM API provider returns a credit/billing-related error (HTTP 402,
insufficient credits, low balance, etc.), OpenClaw now shows a clear,
actionable message instead of passing through the raw/cryptic error text:
⚠️ API provider returned a billing error — your API key has run out of
credits or has an insufficient balance. Check your provider's billing
dashboard and top up or switch to a different API key.
Changes:
- formatAssistantErrorText: detect billing errors via isBillingErrorMessage()
and return a user-friendly message (placed before the generic HTTP/JSON
error fallthrough)
- sanitizeUserFacingText: same billing detection for the sanitization path
- pi-embedded-runner/run.ts: add billingFailure detection in the profile
exhaustion fallback, so the FailoverError message is billing-specific
- Added 3 new tests for credit balance, HTTP 402, and insufficient credits
* fix: extract billing error message to shared constant
Previously, overflowCompactionAttempted was a boolean flag set once, preventing
recovery when a single compaction wasn't enough. Change to a counter allowing up
to 3 attempts before giving up. Also add diagnostic logging on overflow events to
help debug early-overflow issues.
Fixes sessions that hit context overflow during long agentic turns with many tool
calls, where one compaction round isn't sufficient to bring context below limits.
- Change MAX_IMAGE_BYTES from 6MB to 5MB to match Anthropic API limit
- Add isImageSizeError() to detect image size errors from API
- Handle image size errors with user-friendly message instead of retry
- Prevent failover for image size errors (not retriable)
Fixes#2271
* fix: detect Anthropic 'Request size exceeds model context window' as context overflow
Anthropic now returns 'Request size exceeds model context window' instead of
the previously detected 'prompt is too long' format. This new error message
was not recognized by isContextOverflowError(), causing auto-compaction to
NOT trigger. Users would see the raw error twice without any recovery attempt.
Changes:
- Add 'exceeds model context window' and 'request size exceeds' to
isContextOverflowError() detection patterns
- Add tests that fail without the fix, verifying both the raw error
string and the JSON-wrapped format from Anthropic's API
- Add test for formatAssistantErrorText to ensure the friendly
'Context overflow' message is shown instead of the raw error
Note: The upstream pi-ai package (@mariozechner/pi-ai) also needs a fix
in its OVERFLOW_PATTERNS regex: /exceeds the context window/i should be
changed to /exceeds.*context window/i to match both 'the' and 'model'
variants for triggering auto-compaction retry.
* fix(tests): remove unused imports and helper from test files
Remove WorkspaceBootstrapFile references and _makeFile helper that were
incorrectly copied from another test file. These caused type errors and
were unrelated to the context overflow detection tests.
* fix: trigger auto-compaction on context overflow promptError
When the LLM rejects a request with a context overflow error that surfaces
as a promptError (thrown exception rather than streamed error), the existing
auto-compaction in pi-coding-agent never triggers. This happens because the
error bypasses the agent's message_end → agent_end → _checkCompaction path.
This fix adds a fallback compaction attempt directly in the run loop:
- Detects context overflow in promptError (excluding compaction_failure)
- Calls compactEmbeddedPiSessionDirect (bypassing lane queues since already in-lane)
- Retries the prompt after successful compaction
- Limits to one compaction attempt per run to prevent infinite loops
Fixes: context overflow errors shown to user without auto-compaction attempt
* style: format compact.ts and run.ts with oxfmt
* fix: tighten context overflow match (#1627) (thanks @rodrigouroz)
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Auth profiles in cooldown (due to rate limiting) were being attempted,
causing unnecessary retries and delays. This fix ensures:
1. Initial profile selection skips profiles in cooldown
2. Profile rotation (after failures) skips cooldown profiles
3. Clear error message when all profiles are unavailable
Tests added:
- Skips profiles in cooldown during initial selection
- Skips profiles in cooldown when rotating after failure
Fixes#1316
- Add input_image and input_file support with SSRF protection
- Add client-side tools (Hosted Tools) support
- Add turn-based tool flow with function_call_output handling
- Export buildAgentPrompt for testing
When a prompt submission fails with quota or rate limit errors, throw
FailoverError instead of the raw promptError. This enables the model
fallback system to try alternative models.
Previously, rate limit errors during the prompt phase (before streaming)
were thrown directly, bypassing fallback. Only response-phase errors
triggered model fallback.
Now checks if fallback models are configured and the error is failover-
eligible. If so, wraps in FailoverError to trigger the fallback chain.