Session/Cron maintenance hardening and cleanup UX (#24753)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 7533b85156
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Co-authored-by: shakkernerd <165377636+shakkernerd@users.noreply.github.com>
Reviewed-by: @shakkernerd
2026-04-19 05:27:26 +00:00 · 2026-02-23 17:39:48 -05:00
parent 29b19455e3
commit eff3c5c707
49 changed files with 3180 additions and 235 deletions
--- a/docs/automation/cron-jobs.md
+++ b/docs/automation/cron-jobs.md
@@ -349,7 +349,8 @@ Notes:
 ## Storage & history

 - Job store: `~/.openclaw/cron/jobs.json` (Gateway-managed JSON).
- Run history: `~/.openclaw/cron/runs/<jobId>.jsonl` (JSONL, auto-pruned).
+- Run history: `~/.openclaw/cron/runs/<jobId>.jsonl` (JSONL, auto-pruned by size and line count).
+- Isolated cron run sessions in `sessions.json` are pruned by `cron.sessionRetention` (default `24h`; set `false` to disable).
 - Override store path: `cron.store` in config.

 ## Configuration
@@ -362,10 +363,21 @@ Notes:
    maxConcurrentRuns: 1, // default 1
    webhook: "https://example.invalid/legacy", // deprecated fallback for stored notify:true jobs
    webhookToken: "replace-with-dedicated-webhook-token", // optional bearer token for webhook mode
+    sessionRetention: "24h", // duration string or false
+    runLog: {
+      maxBytes: "2mb", // default 2_000_000 bytes
+      keepLines: 2000, // default 2000
+    },
  },
 }
 ```

+Run-log pruning behavior:
+
+- `cron.runLog.maxBytes`: max run-log file size before pruning.
+- `cron.runLog.keepLines`: when pruning, keep only the newest N lines.
+- Both apply to `cron/runs/<jobId>.jsonl` files.
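The run-log rule above has two parts: a size check, then keeping the newest N lines. The TypeScript below is a minimal sketch of that rule applied to an in-memory string. It assumes size is measured in UTF-8 bytes and that each JSONL record is exactly one line. `pruneRunLog` is a hypothetical name, not an OpenClaw function.

```typescript
// Hypothetical sketch of cron run-log pruning (not OpenClaw's implementation).
// Assumes one JSONL record per line and size measured in UTF-8 bytes.
function pruneRunLog(text: string, maxBytes: number, keepLines: number): string {
  const sizeBytes = new TextEncoder().encode(text).length;
  if (sizeBytes <= maxBytes) {
    return text; // under the limit: leave the log untouched
  }
  // Drop the empty string produced by the trailing newline.
  const lines = text.split("\n").filter((line) => line.length > 0);
  // Runs are appended, so the newest records are at the end of the file.
  // Guard keepLines <= 0, because slice(-0) would return every line.
  const kept = keepLines > 0 ? lines.slice(-keepLines) : [];
  return kept.length > 0 ? kept.join("\n") + "\n" : "";
}

// Usage: a 30-byte log with a 10-byte limit keeps only the newest 2 records.
const runLog = ['{"run":1}', '{"run":2}', '{"run":3}'].join("\n") + "\n";
console.log(pruneRunLog(runLog, 10, 2)); // prints the {"run":2} and {"run":3} lines
```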
+
 Webhook behavior:

 - Preferred: set `delivery.mode: "webhook"` with `delivery.to: "https://..."` per job.
@@ -380,6 +392,85 @@ Disable cron entirely:
 - `cron.enabled: false` (config)
 - `OPENCLAW_SKIP_CRON=1` (env)

+## Maintenance
+
+Cron has two built-in maintenance paths: isolated run-session retention and run-log pruning.
+
+### Defaults
+
+- `cron.sessionRetention`: `24h` (set `false` to disable run-session pruning)
+- `cron.runLog.maxBytes`: `2_000_000` bytes
+- `cron.runLog.keepLines`: `2000`
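These defaults mix duration strings (`24h`, `7d`) with byte sizes (`2mb` in the examples, `2_000_000` bytes as the documented default). The docs here do not spell out the accepted grammar, so the helpers below are a hypothetical sketch. They assume `s`/`m`/`h`/`d` duration suffixes and decimal size units (`kb`/`mb`/`gb` as powers of 1000). That assumption is at least consistent with the `"2mb"` / `2_000_000` pairing in the config comments.

```typescript
// Hypothetical parsers. Assumed grammar: a number followed by s/m/h/d for durations,
// and b/kb/mb/gb with decimal multipliers for sizes (so "2mb" = 2_000_000 bytes).
const DURATION_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };
const SIZE_BYTES: Record<string, number> = { b: 1, kb: 1_000, mb: 1_000_000, gb: 1_000_000_000 };

// Returns milliseconds, or null when retention is disabled with `false`.
function parseRetention(value: string | false): number | null {
  if (value === false) return null;
  const match = /^(\d+(?:\.\d+)?)\s*([smhd])$/i.exec(value.trim());
  if (!match) throw new Error(`unrecognized duration: ${value}`);
  return Math.round(Number(match[1]) * DURATION_MS[match[2].toLowerCase()]);
}

// Accepts a plain byte count (such as 2_000_000) or a size string such as "2mb".
function parseBytes(value: string | number): number {
  if (typeof value === "number") return value;
  const match = /^(\d+(?:\.\d+)?)\s*(b|kb|mb|gb)$/i.exec(value.trim());
  if (!match) throw new Error(`unrecognized size: ${value}`);
  return Math.round(Number(match[1]) * SIZE_BYTES[match[2].toLowerCase()]);
}

console.log(parseRetention("24h"), parseBytes("2mb"), parseRetention(false)); // 86400000 2000000 null
```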
+
+### How it works
+
+- Isolated runs create session entries (`...:cron:<jobId>:run:<uuid>`) and transcript files.
+- The reaper removes expired run-session entries older than `cron.sessionRetention`.
+- When a pruned run session's transcript is no longer referenced by the session store, OpenClaw archives the transcript file and purges old deleted archives using the same retention window.
+- After each run is appended to `cron/runs/<jobId>.jsonl`, the file is size-checked:
+  - if its size exceeds `runLog.maxBytes`, it is trimmed to the newest `runLog.keepLines` lines.
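The reaper step can be pictured as a filter over the session store. The sketch below makes three assumptions. Each entry has a hypothetical `updatedAt` timestamp. Neither `<jobId>` nor `<uuid>` contains `:`. The example keys use a made-up `demo` prefix in place of the elided part of the key. `expiredCronRunKeys` is not an OpenClaw API.

```typescript
// Hypothetical session-entry shape; the real sessions.json schema may differ.
interface SessionEntry {
  updatedAt: number; // epoch ms of last activity (assumed field)
}

// Matches keys ending in ":cron:<jobId>:run:<uuid>", assuming neither part contains ":".
const CRON_RUN_KEY = /:cron:[^:]+:run:[^:]+$/;

// Returns the keys the reaper would remove. A null retention models
// `cron.sessionRetention: false`, which disables pruning.
function expiredCronRunKeys(
  store: Record<string, SessionEntry>,
  retentionMs: number | null,
  now: number,
): string[] {
  if (retentionMs === null) return [];
  return Object.entries(store)
    .filter(([key, entry]) => CRON_RUN_KEY.test(key) && now - entry.updatedAt > retentionMs)
    .map(([key]) => key);
}

// Usage with a 24h window: only the 25-hour-old cron run entry is expired,
// and the non-cron session is never considered by this reaper.
const HOUR_MS = 3_600_000;
const nowMs = 1_000 * HOUR_MS;
const demoStore: Record<string, SessionEntry> = {
  "demo:cron:nightly:run:r1": { updatedAt: nowMs - 25 * HOUR_MS },
  "demo:cron:nightly:run:r2": { updatedAt: nowMs - 1 * HOUR_MS },
  "demo:main": { updatedAt: 0 },
};
console.log(expiredCronRunKeys(demoStore, 24 * HOUR_MS, nowMs)); // logs only the r1 key
```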
+
+### Performance caveat for high-volume schedulers
+
+High-frequency cron setups can generate large run-session and run-log footprints. Maintenance is built in, but loose limits can still cause avoidable I/O and cleanup work.
+
+What to watch:
+
+- long `cron.sessionRetention` windows with many isolated runs
+- high `cron.runLog.keepLines` combined with large `runLog.maxBytes`
+- many noisy recurring jobs writing to the same `cron/runs/<jobId>.jsonl`
+
+What to do:
+
+- keep `cron.sessionRetention` as short as your debugging/audit needs allow
+- keep run logs bounded with moderate `runLog.maxBytes` and `runLog.keepLines`
+- move noisy background jobs to isolated mode with delivery rules that avoid unnecessary chatter
+- review growth periodically with `openclaw cron runs` and adjust retention before logs become large
+
+### Customization examples
+
+Keep run sessions for a week and allow bigger run logs:
+
+```json5
+{
+  cron: {
+    sessionRetention: "7d",
+    runLog: {
+      maxBytes: "10mb",
+      keepLines: 5000,
+    },
+  },
+}
+```
+
+Disable isolated run-session pruning but keep run-log pruning:
+
+```json5
+{
+  cron: {
+    sessionRetention: false,
+    runLog: {
+      maxBytes: "5mb",
+      keepLines: 3000,
+    },
+  },
+}
+```
+
+Tune for high-volume cron usage (example):
+
+```json5
+{
+  cron: {
+    sessionRetention: "12h",
+    runLog: {
+      maxBytes: "3mb",
+      keepLines: 1500,
+    },
+  },
+}
+```
+
 ## CLI quickstart

 One-shot reminder (UTC ISO, auto-delete after success):
--- a/docs/cli/cron.md
+++ b/docs/cli/cron.md
@@ -23,6 +23,11 @@ Note: one-shot (`--at`) jobs delete after success by default. Use `--keep-after-

 Note: recurring jobs now use exponential retry backoff after consecutive errors (30s → 1m → 5m → 15m → 60m), then return to the normal schedule after the next successful run.
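The backoff ladder in this note can be read as a lookup table indexed by the number of consecutive errors. This is a hypothetical sketch, not OpenClaw's scheduler. It assumes the delay stays at 60m once the ladder is exhausted, and it ignores how the delay combines with the job's own schedule.

```typescript
// Delay ladder from the note above: 30s → 1m → 5m → 15m → 60m (in ms).
const BACKOFF_MS = [30_000, 60_000, 300_000, 900_000, 3_600_000];

// Hypothetical helper: delay before the next attempt after `consecutiveErrors`
// failures in a row. Assumes the delay stays at 60m once the ladder is exhausted.
function retryBackoffMs(consecutiveErrors: number): number | null {
  if (consecutiveErrors <= 0) return null; // no errors: follow the normal schedule
  const index = Math.min(consecutiveErrors, BACKOFF_MS.length) - 1;
  return BACKOFF_MS[index];
}

// 1..6 consecutive errors map to 30s, 1m, 5m, 15m, 60m, 60m.
console.log([1, 2, 3, 4, 5, 6].map((n) => retryBackoffMs(n)));
```

A successful run resets the counter to zero. In this sketch, zero returns `null`, which stands for "back on the normal schedule".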

+Note: retention/pruning is controlled in config:
+
+- `cron.sessionRetention` (default `24h`) prunes completed isolated run sessions.
+- `cron.runLog.maxBytes` + `cron.runLog.keepLines` prune `~/.openclaw/cron/runs/<jobId>.jsonl`.
+
 ## Common edits

 Update delivery settings without changing the message:
--- a/docs/cli/doctor.md
+++ b/docs/cli/doctor.md
@@ -27,6 +27,7 @@ Notes:

 - Interactive prompts (like keychain/OAuth fixes) only run when stdin is a TTY and `--non-interactive` is **not** set. Headless runs (cron, Telegram, no terminal) will skip prompts.
 - `--fix` (alias for `--repair`) writes a backup to `~/.openclaw/openclaw.json.bak` and drops unknown config keys, listing each removal.
+- State integrity checks now detect orphan transcript files in the sessions directory and can archive them as `.deleted.<timestamp>` to reclaim space safely.
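The orphan check described above amounts to finding transcript files that no entry in `sessions.json` points at. The sketch below only plans the renames. It assumes the `.deleted.<timestamp>` suffix is appended to the original file name and that the caller supplies the timestamp string. The `.jsonl` file names, the timestamp values, and `planOrphanArchives` are all hypothetical.

```typescript
// Hypothetical sketch: plan archive renames for orphan transcript files.
function planOrphanArchives(
  transcriptFiles: string[],
  referenced: Set<string>, // file names referenced by sessions.json entries
  timestamp: string, // assumed to be supplied by the caller
): Array<{ from: string; to: string }> {
  return transcriptFiles
    // Skip files that are already archives.
    .filter((file) => !/\.(deleted|reset)\./.test(file))
    // An orphan is a transcript that no session entry references.
    .filter((file) => !referenced.has(file))
    .map((file) => ({ from: file, to: `${file}.deleted.${timestamp}` }));
}

// Usage: only b.jsonl is unreferenced and not yet archived.
console.log(
  planOrphanArchives(
    ["a.jsonl", "b.jsonl", "c.jsonl.deleted.1700000000000"],
    new Set(["a.jsonl"]),
    "1700000001000",
  ),
);
```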

 ## macOS: `launchctl` env overrides

--- a/docs/cli/sessions.md
+++ b/docs/cli/sessions.md
@@ -11,6 +11,94 @@ List stored conversation sessions.

 ```bash
 openclaw sessions
+openclaw sessions --agent work
+openclaw sessions --all-agents
 openclaw sessions --active 120
 openclaw sessions --json
 ```
+
+Scope selection:
+
+- default: configured default agent store
+- `--agent <id>`: one configured agent store
+- `--all-agents`: aggregate all configured agent stores
+- `--store <path>`: explicit store path (cannot be combined with `--agent` or `--all-agents`)
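The scope rules above form a small decision table. The TypeScript below is a hypothetical model of it; `resolveScope` is not part of the OpenClaw CLI. The docs state only one constraint: `--store` cannot be combined with `--agent` or `--all-agents`. Rejecting `--agent` together with `--all-agents` is an added assumption.

```typescript
// Hypothetical model of the session-store scope flags (not the real CLI code).
type Scope =
  | { kind: "default" }
  | { kind: "agent"; agentId: string }
  | { kind: "allAgents" }
  | { kind: "store"; path: string };

interface ScopeFlags {
  agent?: string; // --agent <id>
  allAgents?: boolean; // --all-agents
  store?: string; // --store <path>
}

function resolveScope(flags: ScopeFlags): Scope {
  if (flags.store !== undefined) {
    // Documented rule: --store cannot be combined with --agent or --all-agents.
    if (flags.agent !== undefined || flags.allAgents) {
      throw new Error("--store cannot be combined with --agent or --all-agents");
    }
    return { kind: "store", path: flags.store };
  }
  // Assumption: the docs do not cover --agent plus --all-agents; reject it here.
  if (flags.agent !== undefined && flags.allAgents) {
    throw new Error("--agent cannot be combined with --all-agents");
  }
  if (flags.allAgents) return { kind: "allAgents" };
  if (flags.agent !== undefined) return { kind: "agent", agentId: flags.agent };
  return { kind: "default" };
}

console.log(resolveScope({ agent: "work" })); // { kind: 'agent', agentId: 'work' }
```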
+
+JSON examples:
+
+`openclaw sessions --all-agents --json`:
+
+```json
+{
+  "path": null,
+  "stores": [
+    { "agentId": "main", "path": "/home/user/.openclaw/agents/main/sessions/sessions.json" },
+    { "agentId": "work", "path": "/home/user/.openclaw/agents/work/sessions/sessions.json" }
+  ],
+  "allAgents": true,
+  "count": 2,
+  "activeMinutes": null,
+  "sessions": [
+    { "agentId": "main", "key": "agent:main:main", "model": "gpt-5" },
+    { "agentId": "work", "key": "agent:work:main", "model": "claude-opus-4-5" }
+  ]
+}
+```
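Scripts that consume this output can type it from the example. The interface below is inferred only from the fields shown above, so treat it as an assumption. `model` is marked optional in case some entries lack it. `sessionsPerAgent` is a hypothetical helper that counts sessions per store from the JSON text printed by `openclaw sessions --all-agents --json`.

```typescript
// Shape inferred from the example output; treat it as an assumption.
interface SessionsListOutput {
  path: string | null;
  stores: { agentId: string; path: string }[];
  allAgents: boolean;
  count: number;
  activeMinutes: number | null;
  sessions: { agentId: string; key: string; model?: string }[];
}

// Hypothetical helper: count sessions per agent store, including empty stores.
function sessionsPerAgent(rawJson: string): Map<string, number> {
  const output = JSON.parse(rawJson) as SessionsListOutput;
  const counts = new Map<string, number>();
  for (const store of output.stores) counts.set(store.agentId, 0);
  for (const session of output.sessions) {
    counts.set(session.agentId, (counts.get(session.agentId) ?? 0) + 1);
  }
  return counts;
}

// Usage with the example output above.
const exampleJson = JSON.stringify({
  path: null,
  stores: [
    { agentId: "main", path: "/home/user/.openclaw/agents/main/sessions/sessions.json" },
    { agentId: "work", path: "/home/user/.openclaw/agents/work/sessions/sessions.json" },
  ],
  allAgents: true,
  count: 2,
  activeMinutes: null,
  sessions: [
    { agentId: "main", key: "agent:main:main", model: "gpt-5" },
    { agentId: "work", key: "agent:work:main", model: "claude-opus-4-5" },
  ],
});
console.log(sessionsPerAgent(exampleJson)); // Map(2) { 'main' => 1, 'work' => 1 }
```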
+
+## Cleanup maintenance
+
+Run maintenance now (instead of waiting for the next write cycle):
+
+```bash
+openclaw sessions cleanup --dry-run
+openclaw sessions cleanup --agent work --dry-run
+openclaw sessions cleanup --all-agents --dry-run
+openclaw sessions cleanup --enforce
+openclaw sessions cleanup --enforce --active-key "agent:main:telegram:dm:123"
+openclaw sessions cleanup --json
+```
+
+`openclaw sessions cleanup` uses the `session.maintenance` settings from config.
+
+Scope note: `openclaw sessions cleanup` maintains session stores and transcripts only. It does not prune cron run logs (`cron/runs/<jobId>.jsonl`), which are managed by `cron.runLog.maxBytes` and `cron.runLog.keepLines` in [Cron configuration](/automation/cron-jobs#configuration) and explained in [Cron maintenance](/automation/cron-jobs#maintenance).
+
+Options:
+
+- `--dry-run`: preview how many entries would be pruned/capped without writing.
+  - In text mode, dry-run prints a per-session action table (`Action`, `Key`, `Age`, `Model`, `Flags`) so you can see what would be kept vs removed.
+- `--enforce`: apply maintenance even when `session.maintenance.mode` is `warn`.
+- `--active-key <key>`: protect a specific active key from disk-budget eviction.
+- `--agent <id>`: run cleanup for one configured agent store.
+- `--all-agents`: run cleanup for all configured agent stores.
+- `--store <path>`: run against a specific `sessions.json` file.
+- `--json`: print a JSON summary. With `--all-agents`, output includes one summary per store.
+
+`openclaw sessions cleanup --all-agents --dry-run --json`:
+
+```json
+{
+  "allAgents": true,
+  "mode": "warn",
+  "dryRun": true,
+  "stores": [
+    {
+      "agentId": "main",
+      "storePath": "/home/user/.openclaw/agents/main/sessions/sessions.json",
+      "beforeCount": 120,
+      "afterCount": 80,
+      "pruned": 40,
+      "capped": 0
+    },
+    {
+      "agentId": "work",
+      "storePath": "/home/user/.openclaw/agents/work/sessions/sessions.json",
+      "beforeCount": 18,
+      "afterCount": 18,
+      "pruned": 0,
+      "capped": 0
+    }
+  ]
+}
+```
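With `--all-agents --json`, the per-store summaries can be rolled up, for example to check a dry run before switching to `enforce`. The field names below come from the example output. `totalCleanupImpact` is a hypothetical helper. Counting `pruned + capped` as "removed" is an interpretation of those fields.

```typescript
// Field names taken from the example `--all-agents --dry-run --json` output.
interface CleanupStoreSummary {
  agentId: string;
  storePath: string;
  beforeCount: number;
  afterCount: number;
  pruned: number;
  capped: number;
}

interface CleanupOutput {
  allAgents: boolean;
  mode: string;
  dryRun: boolean;
  stores: CleanupStoreSummary[];
}

// Hypothetical helper: roll per-store counts up into one total.
function totalCleanupImpact(output: CleanupOutput) {
  return output.stores.reduce(
    (acc, s) => ({
      before: acc.before + s.beforeCount,
      after: acc.after + s.afterCount,
      removed: acc.removed + s.pruned + s.capped, // interpretation: pruned + capped
    }),
    { before: 0, after: 0, removed: 0 },
  );
}

const exampleCleanup: CleanupOutput = {
  allAgents: true,
  mode: "warn",
  dryRun: true,
  stores: [
    { agentId: "main", storePath: "/home/user/.openclaw/agents/main/sessions/sessions.json", beforeCount: 120, afterCount: 80, pruned: 40, capped: 0 },
    { agentId: "work", storePath: "/home/user/.openclaw/agents/work/sessions/sessions.json", beforeCount: 18, afterCount: 18, pruned: 0, capped: 0 },
  ],
};
console.log(totalCleanupImpact(exampleCleanup)); // { before: 138, after: 98, removed: 40 }
```

A wrapper script could, for example, hold off on enforcing when `removed` exceeds a threshold you choose.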
+
+Related:
+
+- Session config: [Configuration reference](/gateway/configuration-reference#session)
--- a/docs/concepts/session.md
+++ b/docs/concepts/session.md
@@ -71,6 +71,109 @@ All session state is **owned by the gateway** (the “master” OpenClaw). UI cl
 - Session entries include `origin` metadata (label + routing hints) so UIs can explain where a session came from.
 - OpenClaw does **not** read legacy Pi/Tau session folders.

+## Maintenance
+
+OpenClaw applies session-store maintenance to keep `sessions.json` and transcript artifacts bounded over time.
+
+### Defaults
+
+- `session.maintenance.mode`: `warn`
+- `session.maintenance.pruneAfter`: `30d`
+- `session.maintenance.maxEntries`: `500`
+- `session.maintenance.rotateBytes`: `10mb`
+- `session.maintenance.resetArchiveRetention`: defaults to `pruneAfter` (`30d`)
+- `session.maintenance.maxDiskBytes`: unset (disabled)
+- `session.maintenance.highWaterBytes`: defaults to `80%` of `maxDiskBytes` when budgeting is enabled
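Two fallback rules in this list can be expressed as a small resolver: reset-archive retention follows `pruneAfter`, and `highWaterBytes` follows `maxDiskBytes`. This is a hypothetical sketch. It takes durations and sizes already converted to milliseconds and bytes, and it assumes `10mb` means 10,000,000 bytes. `resolveMaintenance` is not an OpenClaw function.

```typescript
// Hypothetical config shape with durations/sizes already parsed to ms/bytes.
interface MaintenanceConfig {
  mode?: "warn" | "enforce";
  pruneAfterMs?: number;
  maxEntries?: number;
  rotateBytes?: number;
  resetArchiveRetentionMs?: number | false;
  maxDiskBytes?: number;
  highWaterBytes?: number;
}

const DAY_MS = 86_400_000;

function resolveMaintenance(cfg: MaintenanceConfig) {
  const pruneAfterMs = cfg.pruneAfterMs ?? 30 * DAY_MS;
  return {
    mode: cfg.mode ?? "warn",
    pruneAfterMs,
    maxEntries: cfg.maxEntries ?? 500,
    rotateBytes: cfg.rotateBytes ?? 10_000_000, // "10mb", decimal units assumed
    // Defaults to pruneAfter; `false` (not nullish) survives and disables cleanup.
    resetArchiveRetentionMs: cfg.resetArchiveRetentionMs ?? pruneAfterMs,
    // No disk budget unless maxDiskBytes is set.
    maxDiskBytes: cfg.maxDiskBytes ?? null,
    highWaterBytes:
      cfg.maxDiskBytes === undefined
        ? null
        : cfg.highWaterBytes ?? Math.floor(cfg.maxDiskBytes * 0.8),
  };
}

console.log(resolveMaintenance({ maxDiskBytes: 1_000_000_000 }).highWaterBytes); // 800000000
```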
+
+### How it works
+
+Maintenance runs during session-store writes, and you can trigger it on demand with `openclaw sessions cleanup`.
+
+- `mode: "warn"`: reports what would be evicted but does not mutate entries/transcripts.
+- `mode: "enforce"`: applies cleanup in this order:
+  1. prune stale entries older than `pruneAfter`
+  2. cap entry count to `maxEntries` (oldest first)
+  3. archive transcript files for removed entries that are no longer referenced
+  4. purge old `*.deleted.<timestamp>` and `*.reset.<timestamp>` archives by retention policy
+  5. rotate `sessions.json` when it exceeds `rotateBytes`
+  6. if `maxDiskBytes` is set, enforce disk budget toward `highWaterBytes` (oldest artifacts first, then oldest sessions)
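Steps 1 and 2 of the enforce order can be sketched as a pure planning function: prune by age, then cap what remains, oldest first. The entry shape and its `updatedAt` field are assumptions, and `planMaintenance` is a hypothetical name. In `warn` mode such a plan would only be reported. In `enforce` mode the kept entries would be written back.

```typescript
// Hypothetical entry shape; updatedAt is an assumed epoch-ms field.
interface StoreEntry {
  key: string;
  updatedAt: number;
}

interface MaintenancePlan {
  pruned: string[]; // step 1: older than pruneAfter
  capped: string[]; // step 2: over maxEntries, oldest first
  kept: StoreEntry[];
}

function planMaintenance(
  entries: StoreEntry[],
  pruneAfterMs: number,
  maxEntries: number,
  now: number,
): MaintenancePlan {
  const isStale = (e: StoreEntry) => now - e.updatedAt > pruneAfterMs;
  // Step 1: prune stale entries.
  const pruned = entries.filter(isStale).map((e) => e.key);
  const fresh = entries.filter((e) => !isStale(e));
  // Step 2: sort newest first and cap, so the oldest survivors are dropped.
  const newestFirst = [...fresh].sort((a, b) => b.updatedAt - a.updatedAt);
  return {
    pruned,
    capped: newestFirst.slice(maxEntries).map((e) => e.key),
    kept: newestFirst.slice(0, maxEntries),
  };
}

// Usage: "a" is stale; of the three fresh entries only the newest two fit.
const maintenancePlan = planMaintenance(
  [
    { key: "a", updatedAt: 10 },
    { key: "b", updatedAt: 60 },
    { key: "c", updatedAt: 70 },
    { key: "d", updatedAt: 80 },
  ],
  50, // pruneAfterMs
  2, // maxEntries
  100, // now
);
console.log(maintenancePlan.pruned, maintenancePlan.capped); // [ 'a' ] [ 'b' ]
```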
+
+### Performance caveat for large stores
+
+Large session stores are common in high-volume setups. Maintenance runs on the write path, so very large stores can increase write latency.
+
+What increases cost most:
+
+- very high `session.maintenance.maxEntries` values
+- long `pruneAfter` windows that keep stale entries around
+- many transcript/archive artifacts in `~/.openclaw/agents/<agentId>/sessions/`
+- enabling disk budgets (`maxDiskBytes`) without reasonable pruning/cap limits
+
+What to do:
+
+- use `mode: "enforce"` in production so growth is bounded automatically
+- set both time and count limits (`pruneAfter` + `maxEntries`), not just one
+- set `maxDiskBytes` + `highWaterBytes` for hard upper bounds in large deployments
+- keep `highWaterBytes` meaningfully below `maxDiskBytes` (default is 80%)
+- run `openclaw sessions cleanup --dry-run --json` after config changes to verify projected impact before enforcing
+- when running cleanup manually, pass `--active-key <key>` to protect a session that is in active use from disk-budget eviction
+
+### Customization examples
+
+Use a conservative enforce policy:
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      pruneAfter: "45d",
+      maxEntries: 800,
+      rotateBytes: "20mb",
+      resetArchiveRetention: "14d",
+    },
+  },
+}
+```
+
+Enable a hard disk budget for the sessions directory:
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      maxDiskBytes: "1gb",
+      highWaterBytes: "800mb",
+    },
+  },
+}
+```
+
+Tune for larger installs (example):
+
+```json5
+{
+  session: {
+    maintenance: {
+      mode: "enforce",
+      pruneAfter: "14d",
+      maxEntries: 2000,
+      rotateBytes: "25mb",
+      maxDiskBytes: "2gb",
+      highWaterBytes: "1.6gb",
+    },
+  },
+}
+```
+
+Preview or force maintenance from CLI:
+
+```bash
+openclaw sessions cleanup --dry-run
+openclaw sessions cleanup --enforce
+```
+
 ## Session pruning

 OpenClaw trims **old tool results** from the in-memory context right before LLM calls by default.
--- a/docs/gateway/configuration-examples.md
+++ b/docs/gateway/configuration-examples.md
@@ -169,6 +169,9 @@ Save to `~/.openclaw/openclaw.json` and you can DM the bot from that number.
      pruneAfter: "30d",
      maxEntries: 500,
      rotateBytes: "10mb",
+      resetArchiveRetention: "30d", // duration or false
+      maxDiskBytes: "500mb", // optional
+      highWaterBytes: "400mb", // optional (defaults to 80% of maxDiskBytes)
    },
    typingIntervalSeconds: 5,
    sendPolicy: {
@@ -355,6 +358,10 @@ Save to `~/.openclaw/openclaw.json` and you can DM the bot from that number.
    store: "~/.openclaw/cron/cron.json",
    maxConcurrentRuns: 2,
    sessionRetention: "24h",
+    runLog: {
+      maxBytes: "2mb",
+      keepLines: 2000,
+    },
  },

  // Webhooks
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -1246,6 +1246,9 @@ See [Multi-Agent Sandbox & Tools](/tools/multi-agent-sandbox-tools) for preceden
      pruneAfter: "30d",
      maxEntries: 500,
      rotateBytes: "10mb",
+      resetArchiveRetention: "30d", // duration or false
+      maxDiskBytes: "500mb", // optional hard budget
+      highWaterBytes: "400mb", // optional cleanup target
    },
    threadBindings: {
      enabled: true,
@@ -1273,7 +1276,14 @@ See [Multi-Agent Sandbox & Tools](/tools/multi-agent-sandbox-tools) for preceden
 - **`resetByType`**: per-type overrides (`direct`, `group`, `thread`). Legacy `dm` accepted as alias for `direct`.
 - **`mainKey`**: legacy field. Runtime now always uses `"main"` for the main direct-chat bucket.
 - **`sendPolicy`**: match by `channel`, `chatType` (`direct|group|channel`, with legacy `dm` alias), `keyPrefix`, or `rawKeyPrefix`. First deny wins.
- **`maintenance`**: `warn` warns the active session on eviction; `enforce` applies pruning and rotation.
+- **`maintenance`**: session-store cleanup + retention controls.
+  - `mode`: `warn` emits warnings only; `enforce` applies cleanup.
+  - `pruneAfter`: age cutoff for stale entries (default `30d`).
+  - `maxEntries`: maximum number of entries in `sessions.json` (default `500`).
+  - `rotateBytes`: rotate `sessions.json` when it exceeds this size (default `10mb`).
+  - `resetArchiveRetention`: retention for `*.reset.<timestamp>` transcript archives. Defaults to `pruneAfter`; set `false` to disable.
+  - `maxDiskBytes`: optional sessions-directory disk budget. In `warn` mode it logs warnings; in `enforce` mode it removes the oldest artifacts first, then the oldest sessions, until usage drops to `highWaterBytes`.
+  - `highWaterBytes`: optional target after budget cleanup. Defaults to `80%` of `maxDiskBytes`.
 - **`threadBindings`**: global defaults for thread-bound session features.
  - `enabled`: master default switch (providers can override; Discord uses `channels.discord.threadBindings.enabled`)
  - `ttlHours`: default auto-unfocus TTL in hours (`0` disables; providers can override)
@@ -2459,11 +2469,17 @@ Current builds no longer include the TCP bridge. Nodes connect over the Gateway
    webhook: "https://example.invalid/legacy", // deprecated fallback for stored notify:true jobs
    webhookToken: "replace-with-dedicated-token", // optional bearer token for outbound webhook auth
    sessionRetention: "24h", // duration string or false
+    runLog: {
+      maxBytes: "2mb", // default 2_000_000 bytes
+      keepLines: 2000, // default 2000
+    },
  },
 }
 ```

- `sessionRetention`: how long to keep completed cron sessions before pruning. Default: `24h`.
+- `sessionRetention`: how long to keep completed isolated cron run sessions before pruning from `sessions.json`. Also controls cleanup of archived deleted cron transcripts. Default: `24h`; set `false` to disable.
+- `runLog.maxBytes`: max size per run log file (`cron/runs/<jobId>.jsonl`) before pruning. Default: `2_000_000` bytes.
+- `runLog.keepLines`: newest lines retained when run-log pruning is triggered. Default: `2000`.
 - `webhookToken`: bearer token used for cron webhook POST delivery (`delivery.mode = "webhook"`). If omitted, no auth header is sent.
 - `webhook`: deprecated legacy fallback webhook URL (http/https) used only for stored jobs that still have `notify: true`.

--- a/docs/gateway/configuration.md
+++ b/docs/gateway/configuration.md
@@ -251,11 +251,17 @@ When validation fails:
        enabled: true,
        maxConcurrentRuns: 2,
        sessionRetention: "24h",
+        runLog: {
+          maxBytes: "2mb",
+          keepLines: 2000,
+        },
      },
    }
    ```

-    See [Cron jobs](/automation/cron-jobs) for the feature overview and CLI examples.
+    - `sessionRetention`: prune completed isolated run sessions from `sessions.json` (default `24h`; set `false` to disable).
+    - `runLog`: prune `cron/runs/<jobId>.jsonl` by size and retained lines.
+    - See [Cron jobs](/automation/cron-jobs) for feature overview and CLI examples.

  </Accordion>

--- a/docs/reference/session-management-compaction.md
+++ b/docs/reference/session-management-compaction.md
@@ -65,6 +65,44 @@ OpenClaw resolves these via `src/config/sessions.ts`.

 ---

+## Store maintenance and disk controls
+
+Session persistence has automatic maintenance controls (`session.maintenance`) for `sessions.json` and transcript artifacts:
+
+- `mode`: `warn` (default) or `enforce`
+- `pruneAfter`: stale-entry age cutoff (default `30d`)
+- `maxEntries`: cap entries in `sessions.json` (default `500`)
+- `rotateBytes`: rotate `sessions.json` when oversized (default `10mb`)
+- `resetArchiveRetention`: retention for `*.reset.<timestamp>` transcript archives (default: same as `pruneAfter`; `false` disables cleanup)
+- `maxDiskBytes`: optional sessions-directory budget
+- `highWaterBytes`: optional target after cleanup (default `80%` of `maxDiskBytes`)
+
+Enforcement order for disk budget cleanup (`mode: "enforce"`):
+
+1. Remove oldest archived or orphan transcript artifacts first.
+2. If still above the target, evict oldest session entries and their transcript files.
+3. Keep going until usage is at or below `highWaterBytes`.
+
+In `mode: "warn"`, OpenClaw reports potential evictions but does not mutate the store/files.
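The disk-budget order above can be sketched as a planning loop. This is a hypothetical illustration, not OpenClaw's code. It assumes cleanup starts only once usage exceeds `maxDiskBytes`. It uses made-up `Artifact` and `SessionFootprint` shapes. It models `--active-key` by skipping that one session during eviction.

```typescript
// Hypothetical shapes; the real sessions-directory layout may differ.
interface Artifact {
  id: string; // an archived or orphan transcript file
  bytes: number;
  mtime: number; // epoch ms
}

interface SessionFootprint {
  key: string;
  bytes: number; // entry plus its transcript files
  updatedAt: number; // epoch ms
}

interface DiskBudgetPlan {
  removeArtifacts: string[];
  evictSessions: string[];
  finalBytes: number;
}

function planDiskBudget(
  usedBytes: number,
  maxDiskBytes: number,
  highWaterBytes: number,
  artifacts: Artifact[],
  sessions: SessionFootprint[],
  activeKey?: string, // models --active-key: never evicted here
): DiskBudgetPlan {
  const plan: DiskBudgetPlan = { removeArtifacts: [], evictSessions: [], finalBytes: usedBytes };
  // Assumption: cleanup only starts once usage exceeds the hard budget.
  if (usedBytes <= maxDiskBytes) return plan;
  // Step 1: oldest archived/orphan artifacts first.
  for (const a of [...artifacts].sort((x, y) => x.mtime - y.mtime)) {
    if (plan.finalBytes <= highWaterBytes) break;
    plan.removeArtifacts.push(a.id);
    plan.finalBytes -= a.bytes;
  }
  // Step 2: then oldest sessions, skipping the protected active key.
  for (const s of [...sessions].sort((x, y) => x.updatedAt - y.updatedAt)) {
    if (plan.finalBytes <= highWaterBytes) break;
    if (s.key === activeKey) continue;
    plan.evictSessions.push(s.key);
    plan.finalBytes -= s.bytes;
  }
  return plan;
}

const demoArtifacts: Artifact[] = [
  { id: "new.deleted", bytes: 50, mtime: 5 },
  { id: "old.deleted", bytes: 100, mtime: 1 },
];
const demoSessions: SessionFootprint[] = [
  { key: "s2", bytes: 100, updatedAt: 2 },
  { key: "s1", bytes: 100, updatedAt: 1 },
];
// 1000 bytes used, 900 budget, 700 target: both artifacts go, s1 is protected,
// s2 is evicted, and usage ends at 750 (still above target, nothing left to evict).
const budgetPlan = planDiskBudget(1000, 900, 700, demoArtifacts, demoSessions, "s1");
console.log(budgetPlan);
```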
+
+Run maintenance on demand:
+
+```bash
+openclaw sessions cleanup --dry-run
+openclaw sessions cleanup --enforce
+```
+
+---
+
+## Cron sessions and run logs
+
+Isolated cron runs also create session entries/transcripts, and they have dedicated retention controls:
+
+- `cron.sessionRetention` (default `24h`) prunes old isolated cron run sessions from the session store (`false` disables).
+- `cron.runLog.maxBytes` + `cron.runLog.keepLines` prune `~/.openclaw/cron/runs/<jobId>.jsonl` files (defaults: `2_000_000` bytes and `2000` lines).
+
+---
+
 ## Session keys (`sessionKey`)

 A `sessionKey` identifies _which conversation bucket_ you’re in (routing + isolation).