* feat: add LiteLLM provider types, env var, credentials, and auth choice
Add litellm-api-key auth choice, LITELLM_API_KEY env var mapping,
setLitellmApiKey() credential storage, and LITELLM_DEFAULT_MODEL_REF.
* feat: add LiteLLM onboarding handler and provider config
Add applyLitellmProviderConfig which properly registers
models.providers.litellm with baseUrl, api type, and model definitions.
This fixes the critical bug from PR #6488 where the provider entry was
never created, causing model resolution to fail at runtime.
* docs: add LiteLLM provider documentation
Add setup guide covering onboarding, manual config, virtual keys,
model routing, and usage tracking. Link from provider index.
* docs: add LiteLLM to sidebar navigation in docs.json
Add providers/litellm to both English and Chinese provider page lists
so the docs page appears in the sidebar navigation.
* test: add LiteLLM non-interactive onboarding test
Wire up litellmApiKey flag inference and auth-choice handler for the
non-interactive onboarding path, and add an integration test covering
profile, model default, and credential storage.
* fix: register --litellm-api-key CLI flag and add preferred provider mapping
Wire up the missing Commander CLI option, action handler mapping, and
help text for --litellm-api-key. Add litellm-api-key to the preferred
provider map for consistency with other providers.
* fix: remove zh-CN sidebar entry for litellm (no localized page yet)
* style: format buildLitellmModelDefinition return type
* fix(onboarding): harden LiteLLM provider setup (#12823)
* refactor(onboarding): keep auth-choice provider dispatcher under size limit
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* fix(browser): prevent permanent timeout after stuck evaluate
Thread AbortSignal from client-fetch through dispatcher to Playwright
operations. When a timeout fires, force-disconnect the Playwright CDP
connection to unblock the serialized command queue, allowing the next
call to reconnect transparently.
Key changes:
- client-fetch.ts: proper AbortController with signal propagation
- pw-session.ts: new forceDisconnectPlaywrightForTarget()
- pw-tools-core.interactions.ts: accept signal, align inner timeout
to outer-500ms, inject in-browser Promise.race for async evaluates
- routes/dispatcher.ts + types.ts: propagate signal through dispatch
- server.ts + bridge-server.ts: Express middleware creates AbortSignal
from request lifecycle
- client-actions-core.ts: add timeoutMs to evaluate type
Fixes#10994
* fix(browser): v2 - force-disconnect via Connection.close() instead of browser.close()
When page.evaluate() is stuck on a hung CDP transport, browser.close() also
hangs because it tries to send a close command through the same stuck pipe.
v2 fix: forceDisconnectPlaywrightForTarget now directly calls Playwright's
internal Connection.close() which locally rejects all pending callbacks and
emits 'disconnected' without touching the network. This instantly unblocks
all stuck Playwright operations.
closePlaywrightBrowserConnection (clean shutdown) now also has a 3s timeout
fallback that drops to forceDropConnection if browser.close() hangs.
Fixes permanent browser timeout after stuck evaluate.
* fix(browser): v3 - fire-and-forget browser.close() instead of Connection.close()
v2's forceDropConnection called browser._connection.close() which corrupts
the entire Playwright instance because Connection is shared across all
objects (BrowserType, Browser, Page, etc.). This prevented reconnection
with cascading 'connectOverCDP: Force-disconnected' errors.
v3 fix: forceDisconnectPlaywrightForTarget now:
1. Nulls cached connection immediately
2. Fire-and-forgets browser.close() (doesn't await — it may hang)
3. Next connectBrowser() creates a fresh connectOverCDP WebSocket
Each connectOverCDP creates an independent WebSocket to the CDP endpoint,
so the new connection is unaffected by the old one's pending close.
The old browser.close() eventually resolves when the in-browser evaluate
timeout fires, or the old connection gets GC'd.
* fix(browser): v4 - clear connecting state and remove stale disconnect listeners
The reconnect was failing because:
1. forceDisconnectPlaywrightForTarget nulled cached but not connecting,
so subsequent calls could await a stale promise
2. The old browser's 'disconnected' event handler raced with new
connections, nulling the fresh cached reference
Fix: null both cached and connecting, and removeAllListeners on the
old browser before fire-and-forget close.
* fix(browser): v5 - use raw CDP Runtime.terminateExecution to kill stuck evaluate
When forceDisconnectPlaywrightForTarget fires, open a raw WebSocket
to the stuck page's CDP endpoint and send Runtime.terminateExecution.
This kills running JS without navigating away or crashing the page.
Also clear connecting state and remove stale disconnect listeners.
* fix(browser): abort cancels stuck evaluate
* Browser: always cleanup evaluate abort listener
* Chore: remove Playwright debug scripts
* Docs: add CDP evaluate refactor plan
* Browser: refactor Playwright force-disconnect
* Browser: abort stops evaluate promptly
* Node host: extract withTimeout helper
* Browser: remove disconnected listener safely
* Changelog: note act:evaluate hang fix
---------
Co-authored-by: Bob <bob@dutifulbob.com>
Discussion: https://github.com/openclaw/openclaw/discussions/13528
## Checklist
- [x] **Mark as AI-assisted in the PR title or description** - Implemented by 🤖, reviewed by 👨💻
- [x] **Note the degree of testing** - fully tested and I use it myself
- [x] **Include prompts or session logs if possible (super helpful!)** - I can try doing a "resume" on a few sessions, but don't think it'll provide value. Lmk if this is a blocker.
- [x] **Confirm you understand what the code does** - It's simple :)
## Summary of changes
- **ClawDock** - Shell helpers replace verbose `docker-compose` commands with simple `clawdock-*` shortcuts
- **Zero-config setup** - First run auto-detects the OpenClaw project directory from common paths and saves the config for future use
- **No extra dependencies** - Just bash
- **Built-in auth & device pairing helpers** - `clawdock-fix-token`, `clawdock-dashboard`, etc to handle gateay setup, streamline web UI, etc...
- **Updated Docker docs** - Installation docs now include the optional ClawDock helper setup for users who want simplified container management
## Example Usage
```bash
$ clawdock-help
🦞 ClawDock - Docker Helpers for OpenClaw
⚡ Basic Operations
clawdock-start Start the gateway
clawdock-stop Stop the gateway
clawdock-restart Restart the gateway
clawdock-status Check container status
clawdock-logs View live logs (follows)
🐚 Container Access
clawdock-shell Shell into container (openclaw alias ready)
clawdock-cli Run CLI commands (e.g., clawdock-cli status)
clawdock-exec <cmd> Execute command in gateway container
🌐 Web UI & Devices
clawdock-dashboard Open web UI in browser (auto-guides you)
clawdock-devices List device pairings (auto-guides you)
clawdock-approve <id> Approve device pairing (with examples)
⚙️ Setup & Configuration
clawdock-fix-token Configure gateway token (run once)
🔧 Maintenance
clawdock-rebuild Rebuild Docker image
clawdock-clean ⚠️ Remove containers & volumes (nuclear)
🛠️ Utilities
clawdock-health Run health check
clawdock-token Show gateway auth token
clawdock-cd Jump to openclaw project directory
clawdock-config Open config directory (~/.openclaw)
clawdock-workspace Open workspace directory
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚀 First Time Setup
1. clawdock-start # Start the gateway
2. clawdock-fix-token # Configure token
3. clawdock-dashboard # Open web UI
4. clawdock-devices # If pairing needed
5. clawdock-approve <id> # Approve pairing
💬 WhatsApp Setup
clawdock-shell
> openclaw channels login --channel whatsapp
> openclaw status
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 All commands guide you through next steps!
📚 Docs: https://docs.openclaw.ai
```\n\nCo-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com>
* fix: prune stale session entries, cap entry count, and rotate sessions.json
The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.
Three mitigations, all running inside saveSessionStoreUnlocked() on every write:
1. Prune stale entries: remove entries with updatedAt older than 30 days
(configurable via session.maintenance.pruneDays in openclaw.json)
2. Cap entry count: keep only the 500 most recently updated entries
(configurable via session.maintenance.maxEntries). Entries without updatedAt
are evicted first.
3. File rotation: if the existing sessions.json exceeds 10MB before a write,
rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
backups (configurable via session.maintenance.rotateBytes).
All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.
27 new tests covering pruning, capping, rotation, and integration scenarios.
* feat: auto-prune expired cron run sessions (#12289)
Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.
New config option:
cron.sessionRetention: string | false (default: '24h')
The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.
Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely
Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps
Closes#12289
* fix: sweep cron session stores per agent
* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)
* fix: add warn-only session maintenance mode
* fix: warn-only maintenance defaults to active session
* fix: deliver maintenance warnings to active session
* docs: add session maintenance examples
* fix: accept duration and size maintenance thresholds
* refactor: share cron run session key check
* fix: format issues and replace defaultRuntime.warn with console.warn
---------
Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
* fix(tools): correct Grok response parsing for xAI Responses API
The xAI Responses API returns content in output[0].content[0].text,
not in output_text field. Updated GrokSearchResponse type and
runGrokSearch to extract content from the correct path.
Fixes the 'No response' issue when using Grok web search.
* fix(tools): harden Grok web_search parsing (#13049) (thanks @ereid7)
---------
Co-authored-by: erai <erai@erais-Mac-mini.local>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix: cap Discord gateway reconnect attempts to prevent infinite loop
The Discord GatewayPlugin was configured with maxAttempts: Infinity,
which causes an unbounded reconnection loop when the Discord gateway
enters a persistent failure state (e.g. code 1005 with stalled HELLO).
In production, this manifested as 2,483+ reconnection attempts in a
single log file, starving the Node.js event loop and preventing cron,
heartbeat, and other subsystems from functioning.
Cap maxAttempts at 50, which provides ~25 minutes of retry time
(with 30s HELLO timeout between attempts) before cleanly exiting
via the existing "Max reconnect attempts" error handler.
Closes#11836
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Changelog: note Discord gateway reconnect cap (#12230) (thanks @Yida-Dev)
---------
Co-authored-by: Yida-Dev <reyifeijun@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shadow <shadow@clawd.bot>
* Config: reload dotenv before env substitution on runtime loads
* Test: isolate config env var regression from host state env
* fix: keep dotenv vars resolvable on runtime config reloads (#12748) (thanks @rodrigouroz)
---------
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(web_search): Fix invalid model name sent to Perplexity
* chore: Only apply fix to direct Perplexity calls
* fix(web_search): normalize direct Perplexity model IDs
* fix: add changelog note for perplexity model normalization (#12795) (thanks @cdorsey)
* fix: align tests and fetch type for gate stability (#12795) (thanks @cdorsey)
* chore: keep #12795 scoped to web_search changes
---------
Co-authored-by: Sebastian <19554889+sebslight@users.noreply.github.com>
* Memory/QMD: symlink default model cache into custom XDG_CACHE_HOME
QmdMemoryManager overrides XDG_CACHE_HOME to isolate the qmd index
per-agent, but this also moves where qmd looks for its ML models
(~2.1GB). Since models are installed at the default location
(~/.cache/qmd/models/), every qmd invocation would attempt to
re-download them from HuggingFace and time out.
Fix: on initialization, symlink ~/.cache/qmd/models/ into the custom
XDG_CACHE_HOME path so the index stays isolated per-agent while the
shared models are reused. The symlink is only created when the default
models directory exists and the target path does not already exist.
Includes tests for the three key scenarios: symlink creation, existing
directory preservation, and graceful skip when no default models exist.
* Memory/QMD: skip model symlink warning on ENOENT
* test: stabilize warning-filter visibility assertion (#12114) (thanks @tyler6204)
* fix: add changelog entry for QMD cache reuse (#12114) (thanks @tyler6204)
* fix: handle plain context-overflow strings in compaction detection (#12114) (thanks @tyler6204)
* fix(cron): recover flat params when LLM omits job wrapper (#11310)
Non-frontier models (e.g. Grok) flatten job properties to the top level
alongside `action` instead of nesting them inside the `job` parameter.
The opaque schema (`Type.Object({}, { additionalProperties: true })`)
gives these models no structural hint, so they put name, schedule,
payload, etc. as siblings of action.
Add a flat-params recovery step in the cron add handler: when
`params.job` is missing or an empty object, scan for recognised job
property names on params and construct a synthetic job object before
passing to `normalizeCronJobCreate`. Recovery requires at least one
meaningful signal field (schedule, payload, message, or text) to avoid
false positives.
Added tests:
- Flat params with no job wrapper → recovered
- Empty job object + flat params → recovered
- Message shorthand at top level → inferred as agentTurn
- No meaningful fields → still throws 'job required'
- Non-empty job takes precedence over flat params
* fix(cron): floor nowMs to second boundary before croner lookback
Cron expressions operate at second granularity. When nowMs falls
mid-second (e.g. 12:00:00.500) and the pattern targets that exact
second (like '0 0 12 * * *'), a 1ms lookback still lands inside the
matching second. Croner interprets this as 'already past' and skips
to the next occurrence (e.g. the following day).
Fix: floor nowMs to the start of the current second before applying
the 1ms lookback. This ensures the reference always falls in the
*previous* second, so croner correctly identifies the current match.
Also compare the result against the floored nowSecondMs (not raw nowMs)
so that a match at the start of the current second is not rejected by
the >= guard when nowMs has sub-second offset.
Adds regression tests for 6-field cron patterns with specific seconds.
* fix: add changelog entries for cron fixes (#12124) (thanks @tyler6204)
* test: stabilize warning filter emit assertion (#12124) (thanks @tyler6204)
* feat: Implement Telegram video note support with tests and docs
* fixing lint
* feat: add doctor-state-integrity command, Telegram messaging, and PowerShell Docker setup scripts.
* Update src/telegram/send.video-note.test.ts
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: Set video note follow-up text to undefined for empty input and adjust caption test expectation.
* test: add assertion for `sendMessage` with reply markup and HTML parse mode in `send.video-note` test.
* docs: add changelog entry for Telegram video notes
---------
Co-authored-by: Evgenii Utkin <thewulf7@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: CLAWDINATOR Bot <clawdinator[bot]@users.noreply.github.com>
* fix(skills): ignore Python venvs and caches in skills watcher
Add .venv, venv, __pycache__, .mypy_cache, .pytest_cache, build, and
.cache to the default ignored patterns for the skills watcher.
This prevents file descriptor exhaustion when a skill contains a Python
virtual environment with tens of thousands of files, which was causing
EBADF spawn errors on macOS.
Fixes#1056
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: add changelog entry for skills watcher ignores
* docs: fill changelog PR number
---------
Co-authored-by: Kyle Howells <freerunnering@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: CLAWDINATOR Bot <clawdinator[bot]@users.noreply.github.com>