mirror of
https://github.com/openclaw/openclaw.git
synced 2026-05-11 22:53:44 +00:00
fix: release stale session locks and add watchdog for hung API calls (#18060)
When a model API call hangs indefinitely (e.g. Anthropic quota exceeded
mid-call), the gateway acquires a session .jsonl.lock but the promise
never resolves, so the try/finally block never reaches release(). Since
the owning PID is the gateway itself, stale detection cannot help —
isPidAlive() always returns true.
This commit adds four layers of defense:
1. **In-process lock watchdog** (session-write-lock.ts)
- Track acquiredAt timestamp on each held lock
- 60-second interval timer checks all held locks
- Auto-releases any lock held longer than maxHoldMs (default 5 min)
- Catches the hung-API-call case that try/finally cannot
2. **Gateway startup cleanup** (server-startup.ts)
- On boot, scan all agent session directories for *.jsonl.lock files
- Remove locks with dead PIDs or older than staleMs (30 min)
- Log each cleaned lock for diagnostics
3. **openclaw doctor stale lock detection** (doctor-session-locks.ts)
- New health check scans for .jsonl.lock files
- Reports PID status and age of each lock found
- In --fix mode, removes stale locks automatically
4. **Transcript error entry on API failure** (attempt.ts)
- When promptError is set, write an error marker to the session
transcript before releasing the lock
- Preserves conversation history even on model API failures
Closes #18060
This commit is contained in:
committed by
Peter Steinberger
parent
7d8d8c338b
commit
e91a5b0216
@@ -44,6 +44,7 @@ import {
|
||||
import { createDoctorPrompter, type DoctorOptions } from "./doctor-prompter.js";
|
||||
import { maybeRepairSandboxImages, noteSandboxScopeWarnings } from "./doctor-sandbox.js";
|
||||
import { noteSecurityWarnings } from "./doctor-security.js";
|
||||
import { noteSessionLockHealth } from "./doctor-session-locks.js";
|
||||
import { noteStateIntegrity, noteWorkspaceBackupTip } from "./doctor-state-integrity.js";
|
||||
import {
|
||||
detectLegacyStateMigrations,
|
||||
@@ -188,6 +189,7 @@ export async function doctorCommand(
|
||||
}
|
||||
|
||||
await noteStateIntegrity(cfg, prompter, configResult.path ?? CONFIG_PATH);
|
||||
await noteSessionLockHealth({ shouldRepair: prompter.shouldRepair });
|
||||
|
||||
cfg = await maybeRepairSandboxImages(cfg, runtime, prompter);
|
||||
noteSandboxScopeWarnings(cfg);
|
||||
|
||||
Reference in New Issue
Block a user