Commit Graph

203 Commits

Author SHA1 Message Date
rodbland2021
d3b2135f86 fix(agents): wait for agent idle before flushing pending tool results (#13746)
* fix(agents): wait for agent idle before flushing pending tool results

When pi-agent-core's auto-retry mechanism handles overloaded/rate-limit
errors, it resolves waitForRetry() on assistant message receipt — before
tool execution completes in the retried agent loop. This causes the
attempt's finally block to call flushPendingToolResults() while tools
are still executing, inserting synthetic 'missing tool result' errors
and causing silent agent failures.

The fix adds a waitForIdle() call before the flush to ensure the agent's
retry loop (including tool execution) has fully completed.

Evidence from real session: tool call and synthetic error were only 53ms
apart — the tool never had a chance to execute before being flushed.

Root cause is in pi-agent-core's _resolveRetry() firing on message_end
instead of agent_end, but this workaround in OpenClaw prevents the
symptom without requiring an upstream fix.

Fixes #8643
Fixes #13351
Refs #6682, #12595

* test: add tests for tool result flush race condition

Validates that:
- Real tool results are not replaced by synthetic errors when they arrive in time
- Flush correctly inserts synthetic errors for genuinely orphaned tool calls
- Flush is a no-op after real tool results have already been received

Refs #8643, #13748

* fix(agents): add waitForIdle to all flushPendingToolResults call sites

The original fix only covered the main run finally block, but there are
two additional call sites that can trigger flushPendingToolResults while
tools are still executing:

1. The catch block in attempt.ts (session setup error handler)
2. The finally block in compact.ts (compaction teardown)

Both now await agent.waitForIdle() with a 30s timeout before flushing,
matching the pattern already applied to the main finally block.

Production testing on VPS with debug logging confirmed these additional
paths can fire during sub-agent runs, producing spurious synthetic
'missing tool result' errors.

* fix(agents): centralize idle-wait flush and clear timeout handle

---------

Co-authored-by: Renue Development <dev@renuebyscience.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-13 20:35:43 +01:00
Peter Steinberger
1eccfa8934 perf(test): trim duplicate e2e suites and harden signal hooks 2026-02-13 16:46:43 +00:00
Peter Steinberger
31c6a12cfa fix(agents): restore missing runtime helpers and sandbox types 2026-02-13 15:42:05 +00:00
davidbors-snyk
29d7839582 fix: execute sandboxed file ops inside containers (#4026)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 795ec6aa2f
Co-authored-by: davidbors-snyk <240482518+davidbors-snyk@users.noreply.github.com>
Co-authored-by: steipete <58493+steipete@users.noreply.github.com>
Reviewed-by: @steipete
2026-02-13 16:29:10 +01:00
loiie45e
2e04630105 openai-codex: add gpt-5.3-codex-spark forward-compat model (#15174)
Merged via maintainer flow after rebase + local gates.

Prepared head SHA: 6cac87cbf9

Co-authored-by: loiie45e <15420100+loiie45e@users.noreply.github.com>
Co-authored-by: mbelinky <2406260+mbelinky@users.noreply.github.com>
2026-02-13 15:21:07 +00:00
Peter Steinberger
9131b22a28 test: migrate suites to e2e coverage layout 2026-02-13 14:28:22 +00:00
Lucky
e3cb2564d7 Agents: allow gpt-5.3-codex-spark in fallback and thinking (#14990)
* Agents: allow gpt-5.3-codex-spark in fallback and thinking

* Fix: model picker issue for openai-codex/gpt-5.3-codex-spark

Fixed an issue in the model picker.
2026-02-13 11:39:22 +00:00
Peter Steinberger
85409e401b fix: preserve inter-session input provenance (thanks @anbecker) 2026-02-13 02:02:01 +01:00
Patrick Barletta
d34138dfee fix: dispatch before_tool_call and after_tool_call hooks from both tool execution paths (openclaw#15012) thanks @Patrick-Barletta
Verified:
- pnpm check

Co-authored-by: Patrick-Barletta <67929313+Patrick-Barletta@users.noreply.github.com>
2026-02-12 18:48:11 -06:00
Vladimir Peshekhonov
957b883082 fix(agents): stabilize overflow compaction retries and session context accounting (openclaw#14102) thanks @vpesh
Verified:
- CI checks for commit 86a7ecb45e
- Rebase conflict resolution for compatibility with latest main

Co-authored-by: vpesh <9496634+vpesh@users.noreply.github.com>
2026-02-12 17:53:13 -06:00
Peter Steinberger
da55d70fb0 fix(security): harden untrusted web tool transcripts 2026-02-13 00:46:56 +01:00
Kyle Tse
a10f228a5b fix: update totalTokens after compaction using last-call usage (#15018)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 9214291bf7
Co-authored-by: shtse8 <8020099+shtse8@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-12 18:02:30 -05:00
fagemx
bdd0c12329 fix(providers): include provider name in billing error messages (#14697)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 774e0b6605
Co-authored-by: fagemx <117356295+fagemx@users.noreply.github.com>
Co-authored-by: shakkernerd <165377636+shakkernerd@users.noreply.github.com>
Reviewed-by: @shakkernerd
2026-02-12 18:23:27 +00:00
Peter Steinberger
5e7842a41d feat(zai): auto-detect endpoint + default glm-5 (#14786)
* feat(zai): auto-detect endpoint + default glm-5

* test: fix Z.AI default endpoint expectation (#14786)

* test: bump embedded runner beforeAll timeout

* chore: update changelog for Z.AI GLM-5 autodetect (#14786)

* chore: resolve changelog merge conflict with main (#14786)

* chore: append changelog note for #14786 without merge conflict

* chore: sync changelog with main to resolve merge conflict
2026-02-12 19:16:04 +01:00
Akari
455bc1ebba fix: use last API call's cache tokens for context-size display (#13698) (#13805)
The UsageAccumulator sums cacheRead/cacheWrite across all API calls
within a single turn. With Anthropic prompt caching, each call reports
cacheRead ≈ current_context_size, so after N tool-call round-trips the
accumulated total becomes N × actual_context, which gets clamped to
contextWindow (200k) by deriveSessionTotalTokens().

Fix: track the most recent API call's cache fields separately and use
them in toNormalizedUsage() for context-size reporting. This makes
/status Context display accurate while preserving accumulated output
token counts.

Fixes #13698
Fixes #13782

Co-authored-by: akari-musubi <259925157+akari-musubi@users.noreply.github.com>
2026-02-12 08:01:36 -06:00
jg-noncelogic
6f74786384 fix(antigravity): opus 4.6 forward-compat model + thinking signature sanitization bypass (#14218)
Two fixes for Google Antigravity (Cloud Code Assist) reliability:

1. Forward-compat model fallback: pi-ai's model registry doesn't include
   claude-opus-4-6-thinking. Add resolveAntigravityOpus46ForwardCompatModel()
   that clones the opus-4-5 template so the correct api ("google-gemini-cli")
   and baseUrl are preserved. Fixes #13765.

2. Fix thinking.signature rejection: The API returns Claude thinking blocks
   without signatures, then rejects them on replay. The existing sanitizer
   strips unsigned blocks, but the orphaned-user-message path in attempt.ts
   bypassed it by reading directly from disk. Now applies
   sanitizeAntigravityThinkingBlocks at that code path.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 08:01:28 -06:00
taw0002
dcb921944a fix: prevent double compaction caused by cache-ttl entry bypassing guard (#13514)
Move appendCacheTtlTimestamp() to after prompt + compaction retry
completes instead of before. The previous placement inserted a custom
entry (openclaw.cache-ttl) between compaction and the next prompt,
which broke pi-coding-agent's prepareCompaction() guard — the guard
only checks if the last entry is type 'compaction', and the cache-ttl
custom entry made it type 'custom', allowing an immediate second
compaction at very low token counts (e.g. 5,545 tokens) that nuked
all preserved context.

Fixes #9282
Relates to #12170
2026-02-12 07:55:32 -06:00
0xRain
43818e1583 fix(agents): re-run tool_use pairing repair after history truncation (#13926)
Co-authored-by: 0xRaini <0xRaini@users.noreply.github.com>
2026-02-12 08:42:05 +09:00
quotentiroler
cc87c0ed7c Update contributing, deduplicate more functions 2026-02-09 19:21:33 -08:00
quotentiroler
53910f3643 Deduplicate more 2026-02-09 18:56:58 -08:00
Rami Abdelrazzaq
c2b2d535fb fix: suggest /clear in context overflow error message (#12973)
* fix: suggest /reset in context overflow error message

When the context window overflows, the error message now suggests
using /reset to clear session history, giving users an actionable
recovery path instead of a dead-end error.

Closes #12940

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: suggest /reset in context overflow error message (#12973) (thanks @RamiNoodle733)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rami Abdelrazzaq <RamiNoodle733@users.noreply.github.com>
2026-02-09 20:44:37 -06:00
max
223eee0a20 refactor: unify peer kind to ChatType, rename dm to direct (#11881)
* fix: use .js extension for ESM imports of RoutePeerKind

The imports incorrectly used .ts extension which doesn't resolve
with moduleResolution: NodeNext. Changed to .js and added 'type'
import modifier.

* fix tsconfig

* refactor: unify peer kind to ChatType, rename dm to direct

- Replace RoutePeerKind with ChatType throughout codebase
- Change 'dm' literal values to 'direct' in routing/session keys
- Keep backward compat: normalizeChatType accepts 'dm' -> 'direct'
- Add ChatType export to plugin-sdk, deprecate RoutePeerKind
- Update session key parsing to accept both 'dm' and 'direct' markers
- Update all channel monitors and extensions to use ChatType

BREAKING CHANGE: Session keys now use 'direct' instead of 'dm'.
Existing 'dm' keys still work via backward compat layer.

* fix tests

* test: update session key expectations for dmdirect migration

- Fix test expectations to expect :direct: in generated output
- Add explicit backward compat test for normalizeChatType('dm')
- Keep input test data with :dm: keys to verify backward compat

* fix: accept legacy 'dm' in session key parsing for backward compat

getDmHistoryLimitFromSessionKey now accepts both :dm: and :direct:
to ensure old session keys continue to work correctly.

* test: add explicit backward compat tests for dmdirect migration

- session-key.test.ts: verify both :dm: and :direct: keys are valid
- getDmHistoryLimitFromSessionKey: verify both formats work

* feat: backward compat for resetByType.dm config key

* test: skip unix-path Nix tests on Windows
2026-02-09 09:20:52 +09:00
Tyler Yust
191da1feb5 fix: context overflow compaction and subagent announce improvements (#11664) (thanks @tyler6204)
* initial commit

* feat: implement deriveSessionTotalTokens function and update usage tests

* Added deriveSessionTotalTokens function to calculate total tokens based on usage and context tokens.
* Updated usage tests to include cases for derived session total tokens.
* Refactored session usage calculations in multiple files to utilize the new function for improved accuracy.

* fix: restore overflow truncation fallback + changelog/test hardening (#11551) (thanks @tyler6204)
2026-02-07 20:02:32 -08:00
Tyler Yust
0deb8b0da1 fix: recover from context overflow caused by oversized tool results (#11579)
* fix: gracefully handle oversized tool results causing context overflow

When a subagent reads a very large file or gets a huge tool result (e.g.,
gh pr diff on a massive PR), it can exceed the model's context window in
a single prompt. Auto-compaction can't help because there's no older
history to compact — just one giant tool result.

This adds two layers of defense:

1. Pre-emptive: Hard cap on tool result size (400K chars ≈ 100K tokens)
   applied in the session tool result guard before persistence. This
   prevents extremely large tool results from being stored in full,
   regardless of model context window size.

2. Recovery: When context overflow is detected and compaction fails,
   scan session messages for oversized tool results relative to the
   model's actual context window (30% max share). If found, truncate
   them in the session via branching (creating a new branch with
   truncated content) and retry the prompt.

The truncation preserves the beginning of the content (most useful for
understanding what was read) and appends a notice explaining the
truncation and suggesting offset/limit parameters for targeted reads.

Includes comprehensive tests for:
- Text truncation with newline-boundary awareness
- Context-window-proportional size calculation
- In-memory message truncation
- Oversized detection heuristics
- Guard-level size capping during persistence

* fix: prep fixes for tool result truncation PR (#11579) (thanks @tyler6204)
2026-02-07 17:40:51 -08:00
Tak Hoffman
f0722498a4 Agents: include runtime shell (#1835)
* Agents: include runtime shell

* Agents: fix compact runtime build

* chore: fix CLAUDE.md formatting, security regex for secret

---------

Co-authored-by: Tak hoffman <takayukihoffman@gmail.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
2026-02-07 09:32:31 -08:00
Peter Steinberger
0daf416908 fix(agents): add Opus 4.6 forward-compat fallback 2026-02-06 16:56:21 -08:00
Yida-Dev
4216449405 fix: guard resolveUserPath against undefined input (#10176)
* fix: guard resolveUserPath against undefined input

When subagent spawner omits workspaceDir, resolveUserPath receives
undefined and crashes on .trim().  Add a falsy guard that falls back
to process.cwd(), matching the behavior callers already expect.

Closes #10089

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: harden runner workspace fallback (#10176) (thanks @Yida-Dev)

* fix: harden workspace fallback scoping (#10176) (thanks @Yida-Dev)

* refactor: centralize workspace fallback classification and redaction (#10176) (thanks @Yida-Dev)

* test: remove explicit any from utils mock (#10176) (thanks @Yida-Dev)

* security: reject malformed agent session keys for workspace resolution (#10176) (thanks @Yida-Dev)

---------

Co-authored-by: Yida-Dev <reyifeijun@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com>
2026-02-06 13:16:58 -05:00
Tyler Yust
370bbcd89b Model: add strict gpt-5.3-codex fallback for OpenAI Codex (fixes #9989) (#9995)
* Model: allow forward-compatible OpenAI Codex GPT-5 IDs

* Model: scope Codex fallback to gpt-5.3-codex

* fix: reorder codex fallback before providerCfg, add ordering test, changelog (#9989) (thanks @w1kke)

---------

Co-authored-by: Robin <4robinlehmann@gmail.com>
2026-02-05 16:23:18 -08:00
Glucksberg
d4c560853c fix(errors): show clear billing error instead of cryptic API response (#8391)
* fix(errors): return clear billing error message instead of cryptic raw error (#8136)

When an LLM API provider returns a credit/billing-related error (HTTP 402,
insufficient credits, low balance, etc.), OpenClaw now shows a clear,
actionable message instead of passing through the raw/cryptic error text:

  ⚠️ API provider returned a billing error — your API key has run out of
  credits or has an insufficient balance. Check your provider's billing
  dashboard and top up or switch to a different API key.

Changes:
- formatAssistantErrorText: detect billing errors via isBillingErrorMessage()
  and return a user-friendly message (placed before the generic HTTP/JSON
  error fallthrough)
- sanitizeUserFacingText: same billing detection for the sanitization path
- pi-embedded-runner/run.ts: add billingFailure detection in the profile
  exhaustion fallback, so the FailoverError message is billing-specific
- Added 3 new tests for credit balance, HTTP 402, and insufficient credits

* fix: extract billing error message to shared constant
2026-02-05 13:58:43 -08:00
Glucksberg
4e1a7cd60c fix: allow multiple compaction retries on context overflow (#8928)
Previously, overflowCompactionAttempted was a boolean flag set once, preventing
recovery when a single compaction wasn't enough. Change to a counter allowing up
to 3 attempts before giving up. Also add diagnostic logging on overflow events to
help debug early-overflow issues.

Fixes sessions that hit context overflow during long agentic turns with many tool
calls, where one compaction round isn't sufficient to bring context below limits.
2026-02-05 13:58:37 -08:00
Gustavo Madeira Santana
392bbddf29 Security: owner-only tools + command auth hardening (#9202)
* Security: gate whatsapp_login by sender auth

* Security: treat undefined senderAuthorized as unauthorized (opt-in)

* fix: gate whatsapp_login to owner senders (#8768) (thanks @victormier)

* fix: add explicit owner allowlist for tools (#8768) (thanks @victormier)

* fix: normalize escaped newlines in send actions (#8768) (thanks @victormier)

---------

Co-authored-by: Victor Mier <victormier@gmail.com>
2026-02-04 19:49:36 -05:00
Tyler Yust
64df61f697 feat(cron): enhance delivery handling and testing for isolated jobs
- Introduced new properties for explicit message targeting and message tool disabling in the EmbeddedRunAttemptParams type.
- Updated cron job tests to validate best-effort delivery behavior and handling of delivery failures.
- Added logic to clear delivery settings when switching session targets in cron jobs.
- Improved the resolution of delivery failures and best-effort logic in the isolated agent's run function.

This update enhances the flexibility and reliability of delivery mechanisms in isolated cron jobs, ensuring better handling of message delivery scenarios.
2026-02-04 01:03:59 -08:00
Tyler Yust
3f82daefd8 feat(cron): enhance delivery modes and job configuration
- Updated isolated cron jobs to support new delivery modes: `announce` and `none`, improving output management.
- Refactored job configuration to remove legacy fields and streamline delivery settings.
- Enhanced the `CronJobEditor` UI to reflect changes in delivery options, including a new segmented control for delivery mode selection.
- Updated documentation to clarify the new delivery configurations and their implications for job execution.
- Improved tests to validate the new delivery behavior and ensure backward compatibility with legacy settings.

This update provides users with greater flexibility in managing how isolated jobs deliver their outputs, enhancing overall usability and clarity in job configurations.
2026-02-04 01:03:59 -08:00
Vignesh Natarajan
9bef525944 chore: apply formatter 2026-02-02 23:45:05 -08:00
Vignesh Natarajan
5d3af3bc62 feat (memory): Implement new (opt-in) QMD memory backend 2026-02-02 23:45:05 -08:00
Justin
0da6de6624 Agent: repair malformed tool calls and session files 2026-02-02 23:56:27 +00:00
Peter Steinberger
b8174decf3 fix: resolve system prompt overrides 2026-02-02 02:10:13 -08:00
Peter Steinberger
d03eca8450 fix: harden plugin and hook install paths 2026-02-02 02:07:47 -08:00
Tyler Yust
8d2f98fb01 Fix subagent announce failover race (always emit lifecycle end + treat timeout=0 as no-timeout) (#6621)
* Fix subagent announce race and timeout handling

Bug 1: Subagent announce fires before model failover retries finish
- Problem: CLI provider emitted lifecycle error on each attempt, causing
  subagent registry to prematurely call beginSubagentCleanup() and announce
  with incorrect status before failover retries completed
- Fix: Removed lifecycle error emission from CLI provider's attempt-level
  .catch() in agent-runner-execution.ts. Errors still propagate to
  runWithModelFallback for retry, but no intermediate lifecycle events
  are emitted. Only the final outcome (after all retries) emits lifecycle
  events.

Bug 2: Hard 600s per-prompt timeout ignores runTimeoutSeconds=0
- Problem: When runTimeoutSeconds=0 (meaning 'no timeout'), the code
  returned the default 600s timeout instead of respecting the 0 setting
- Fix: Modified resolveAgentTimeoutMs() to treat 0 as 'no timeout' and
  return a very large timeout value (30 days) instead of the default.
  This avoids setTimeout issues with Infinity while effectively providing
  unlimited time for long-running tasks.

* fix: emit lifecycle:error for CLI failures (#6621) (thanks @tyler6204)

* chore: satisfy format/lint gates (#6621) (thanks @tyler6204)

* fix: restore build after upstream type changes (#6621) (thanks @tyler6204)

* test: fix createSystemPromptOverride tests to match new return type (#6621) (thanks @tyler6204)
2026-02-02 02:06:14 -08:00
Peter Steinberger
34dd7324d9 fix: restore lint/build gates 2026-02-02 01:25:40 -08:00
Tyler Yust
9ef24fd400 fix: flush block streaming on paragraph boundaries for chunkMode=newline (#7014)
* feat: Implement paragraph boundary flushing in block streaming

- Added `flushOnParagraph` option to `BlockReplyChunking` for immediate flushing on paragraph breaks.
- Updated `EmbeddedBlockChunker` to handle paragraph boundaries during chunking.
- Enhanced `createBlockReplyCoalescer` to support flushing on enqueue.
- Added tests to verify behavior of flushing with and without `flushOnEnqueue` set.
- Updated relevant types and interfaces to include `flushOnParagraph` and `flushOnEnqueue` options.

* fix: Improve streaming behavior and enhance block chunking logic

- Resolved issue with stuck typing indicator after streamed BlueBubbles replies.
- Refactored `EmbeddedBlockChunker` to streamline fence-split handling and ensure maxChars fallback for newline chunking.
- Added tests to validate new chunking behavior, including handling of paragraph breaks and fence scenarios.
- Updated changelog to reflect these changes.

* test: Add test for clamping long paragraphs in EmbeddedBlockChunker

- Introduced a new test case to verify that long paragraphs are correctly clamped to maxChars when flushOnParagraph is enabled.
- Updated logic in EmbeddedBlockChunker to handle cases where the next paragraph break exceeds maxChars, ensuring proper chunking behavior.

* refactor: streamline logging and improve error handling in message processing

- Removed verbose logging statements from the `processMessage` function to reduce clutter.
- Enhanced error handling by using `runtime.error` for typing restart failures.
- Updated the `applySystemPromptOverrideToSession` function to accept a string directly instead of a function, simplifying the prompt application process.
- Adjusted the `runEmbeddedAttempt` function to directly use the system prompt override without invoking it as a function.
2026-02-02 01:22:41 -08:00
David Iach
4e4ed2ea17 fix(security): cap Slack media downloads and validate Slack file URLs (#6639)
* Security: cap Slack media downloads and validate Slack file URLs

* Security: relax web media fetch cap for compression

* Fixes: sync pi-coding-agent options

* Fixes: align system prompt override type

* Slack: clarify fetchImpl assumptions

* fix: respect raw media fetch cap (#6639) (thanks @davidiach)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-02 00:48:07 -08:00
Mario Zechner
cf1d3f7a7c fix: update pi packages to 0.51.0, remove bogus type augmentation
- Update @mariozechner/pi-agent-core, pi-ai, pi-coding-agent, pi-tui to 0.51.0
- Delete src/types/pi-coding-agent.d.ts (declared additionalExtensionPaths which SDK never supported)
- Fix ToolDefinition.execute signature (parameter order changed in 0.51.0)
- Remove dead additionalExtensionPaths from createAgentSession calls
2026-02-02 01:52:33 +01:00
Peter Steinberger
e58291e070 fix: align embedded runner with pi-coding-agent API 2026-02-01 15:51:46 -08:00
Peter Steinberger
3367b2aa27 fix: align embedded runner with session API changes 2026-02-01 15:06:55 -08:00
Peter Steinberger
8eb11bd304 fix: wire before_tool_call hook into tool execution (#6570) (thanks @ryancnelson) (#6660) 2026-02-01 14:52:11 -08:00
Peter Steinberger
bcde2fca5a fix: align embedded agent session setup 2026-02-01 22:23:16 +00:00
Peter Steinberger
083ec9325e fix: cover OpenRouter attribution headers 2026-02-01 19:30:33 +00:00
Alex Atallah
74039fc0f1 Add openrouter attribution headers 2026-02-01 19:24:55 +00:00
Evan
5d3c898a94 fix: update compaction safeguard to respect context window tokens 2026-02-01 19:52:56 +05:30