* fix(agents): bypass pendingDescendantRuns guard for cron announce delivery
Standalone cron job completions were blocked from direct channel delivery
when the cron run had spawned subagents that were still registered as
pending. The pendingDescendantRuns guard exists for live orchestration
coordination and should not apply to fire-and-forget cron announce sends.
Thread the announceType through the delivery chain and skip both the
child-descendant and requester-descendant pending-run guards when the
announce originates from a cron job.
Closes#34966
* fix: ensure outbound session entry for cron announce with named agents (#32432)
Named agents may not have a session entry for their delivery target,
causing the announce flow to silently fail (delivered=false, no error).
Two fixes:
1. Call ensureOutboundSessionEntry when resolving the cron announce
session key so downstream delivery can find channel metadata.
2. Fall back to direct outbound delivery when announce delivery fails
to ensure cron output reaches the target channel.
Closes#32432
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: guard announce direct-delivery fallback against suppression leaks (#32432)
The `!delivered` fallback condition was too broad — it caught intentional
suppressions (active subagents, interim messages, SILENT_REPLY_TOKEN) in
addition to actual announce delivery failures. Add an
`announceDeliveryWasAttempted` flag so the direct-delivery fallback only
fires when `runSubagentAnnounceFlow` was actually called and failed.
Also remove the redundant `if (route)` guard in
`resolveCronAnnounceSessionKey` since `resolved` being truthy guarantees
`route` is non-null.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cron): harden announce synthesis follow-ups
---------
Co-authored-by: scoootscooob <zhentongfan@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(feishu): comprehensive reply mechanism fix — outbound replyToId forwarding + topic-aware reply targeting
- Forward replyToId from ChannelOutboundContext through sendText/sendMedia
to sendMessageFeishu/sendMarkdownCardFeishu/sendMediaFeishu, enabling
reply-to-message via the message tool.
- Fix group reply targeting: use ctx.messageId (triggering message) in
normal groups to prevent silent topic thread creation (#32980). Preserve
ctx.rootId targeting for topic-mode groups (group_topic/group_topic_sender)
and groups with explicit replyInThread config.
- Add regression tests for both fixes.
Fixes#32980Fixes#32958
Related #19784
* fix: normalize Feishu delivery.to before comparing with messaging tool targets
- Add normalizeDeliveryTarget helper to strip user:/chat: prefixes for Feishu
- Apply normalization in matchesMessagingToolDeliveryTarget before comparison
- This ensures cron duplicate suppression works when session uses prefixed targets
(user:ou_xxx) but messaging tool extract uses normalized bare IDs (ou_xxx)
Fixes review comment on PR #32755
(cherry picked from commit fc20106f16)
* fix(feishu): catch thrown SDK errors for withdrawn reply targets
The Feishu Lark SDK can throw exceptions (SDK errors with .code or
AxiosErrors with .response.data.code) for withdrawn/deleted reply
targets, in addition to returning error codes in the response object.
Wrap reply calls in sendMessageFeishu and sendCardFeishu with
try-catch to handle thrown withdrawn/not-found errors (230011,
231003) and fall back to client.im.message.create, matching the
existing response-level fallback behavior.
Also extract sendFallbackDirect helper to deduplicate the
direct-send fallback block across both functions.
Closes#33496
(cherry picked from commit ad0901aec1)
* feishu: forward outbound reply target context
(cherry picked from commit c129a691fc)
* feishu extension: tighten reply target fallback semantics
(cherry picked from commit f85ec610f2)
* fix(feishu): align synthesized fallback typing and changelog attribution
* test(feishu): cover group_topic_sender reply targeting
---------
Co-authored-by: Xu Zimo <xuzimojimmy@163.com>
Co-authored-by: Munem Hashmi <munem.hashmi@gmail.com>
Co-authored-by: bmendonca3 <bmendonca3@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* daemon(systemd): fall back to machine user scope when user bus is missing
* test(systemd): cover machine scope fallback for user-bus errors
* test(systemd): reset execFile mock state across cases
* test(systemd): make machine-user fallback assertion portable
* fix(daemon): keep root sudo path on direct user scope
* test(systemd): cover sudo root user-scope behavior
* ci: use resolvable bun version in setup-node-env
* daemon(systemd): target sudo caller user scope
* test(systemd): cover sudo user scope commands
* infra(ports): fall back to ss when lsof missing
* test(ports): verify ss fallback listener detection
* cli(gateway): use probe fallback for restart health
* test(gateway): cover restart-health probe fallback
createOllamaStreamFn() only accepted baseUrl, ignoring custom headers
configured in models.providers.<provider>.headers. This caused 403
errors when Ollama endpoints are behind reverse proxies that require
auth headers (e.g. X-OLLAMA-KEY via HAProxy).
Add optional defaultHeaders parameter to createOllamaStreamFn() and
merge them into every fetch request. Provider headers from config are
now passed through at the call site in the embedded runner.
Fixes#24285
* feat(slack): add typingReaction config for DM typing indicator fallback
Adds a reaction-based typing indicator for Slack DMs that works without
assistant mode. When `channels.slack.typingReaction` is set (e.g.
"hourglass_flowing_sand"), the emoji is added to the user's message when
processing starts and removed when the reply is sent.
Addresses #19809
* test(slack): add typingReaction to createSlackMonitorContext test callers
* test(slack): add typingReaction to test context callers
* test(slack): add typingReaction to context fixture
* docs(changelog): credit Slack typingReaction feature
* test(slack): align existing-thread history expectation
---------
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
- Export pickFirstExistingAgentId and use it to validate topic agentId
- Properly update mainSessionKey when overriding route agent
- Fix docs example showing incorrect session key for topic 3
Fixes issue where non-existent agentId would create orphaned sessions.
Fixes issue where DM topic replies would route to wrong agent.
This feature allows different topics within a Telegram forum supergroup to route
to different agents, each with isolated workspace, memory, and sessions.
Key changes:
- Add agentId field to TelegramTopicConfig type for per-topic routing
- Add zod validation for agentId in topic config schema
- Implement routing logic to re-derive session key with topic's agent
- Add debug logging for topic agent overrides
- Add unit tests for routing behavior (forum topics + DM topics)
- Add config validation tests
- Document feature in docs/channels/telegram.md
This builds on the approach from PR #31513 by @Sid-Qin with additional fixes
for security (preserved account fail-closed guard) and test coverage.
Closes#31473
* fix(gateway): correct launchctl command sequence for gateway restart (closes#20030)
* fix(restart): expand HOME and escape label in launchctl plist path
* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop
When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.
Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.
Fixes#33103
Related: #28134
* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup
* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test
* fix(gateway): hot-reload agents.defaults.models allowlist changes
The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array). Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.
Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.
Fixes#33600
Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>
* test(restart): 100% branch coverage — audit round 2
Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
but cleanStaleGatewayProcessesSync returned before calling sleepSync —
replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
edge case' were outside their intended describe blocks; moved to correct
scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
- status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
- mid-loop non-openclaw cmd in &&-chain (L67)
- consecutive p-lines without c-line between them (L67)
- invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
- unknown lsof output line (else-if false branch L69)
Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)
* test(restart): fix stale-pid test typing for tsgo
* fix(gateway): address lifecycle review findings
* test(update): make restart-helper path assertions windows-safe
---------
Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
Synthesize runtime state transition fixes for compaction tool-use integrity and long-running handler backpressure.
Sources: #33630, #33583
Co-authored-by: Kevin Shenghui <shenghuikevin@gmail.com>
Co-authored-by: Theo Tarr <theodore@tarr.com>