Commit Graph

8 Commits

Author SHA1 Message Date
Peter Steinberger
07527e22ce refactor(auth-profiles): centralize active-window logic + strengthen regression coverage 2026-02-22 13:23:19 +01:00
Peter Steinberger
7c3c406a35 fix: keep auth-profile cooldown windows immutable in-window (#23536) (thanks @arosstale) 2026-02-22 13:14:02 +01:00
artale
dc69610d51 fix(auth-profiles): never shorten cooldown deadline on retry
When the backoff saturates at 60 min and retries fire every 30 min
(e.g. cron jobs), each failed request was resetting cooldownUntil to
now+60m.  Because now+60m < existing deadline, the window kept getting
renewed and the profile never recovered without manually clearing
usageStats in auth-profiles.json.

Fix: only write a new cooldownUntil (or disabledUntil for billing) when
the new deadline is strictly later than the existing one.  This lets the
original window expire naturally while still allowing genuine backoff
extension when error counts climb further.

Fixes #23516

[AI-assisted]
2026-02-22 13:14:02 +01:00
Peter Steinberger
b257ba9e30 test(auth-profiles): dedupe cleared-state assertions 2026-02-22 07:44:57 +00:00
Nabbil Khan
f91034aa6b fix(auth): clear all usage stats fields in clearAuthProfileCooldown (openclaw#19211) thanks @nabbilkhan
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: nabbilkhan <203121263+nabbilkhan@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-19 22:21:37 -06:00
Peter Steinberger
28d49b8d44 refactor(auth-profiles): reuse cooldown timestamp resolver 2026-02-18 17:13:47 +00:00
nabbilkhan
250896cf6e fix: correct contradictory test name (Greptile review)
The test verifies that cooldownUntil IS cleared when it equals exactly
`now` (>= comparison), but the test name said "does not clear". Fixed
the name to match the actual assertion behavior.
2026-02-16 12:53:45 -06:00
nabbilkhan
03cadc4b7a fix(auth): auto-expire stale auth profile cooldowns and reset error count
When an auth profile hits a rate limit, `errorCount` is incremented and
`cooldownUntil` is set with exponential backoff. After the cooldown
expires, the time-based check correctly returns false — but `errorCount`
persists. The next transient failure immediately escalates to a much
longer cooldown because the backoff formula uses the stale count:

  60s × 5^(errorCount-1), max 1h

This creates a positive feedback loop where profiles appear permanently
stuck after rate limits, requiring manual JSON editing to recover.

Add `clearExpiredCooldowns()` which sweeps all profiles on every call to
`resolveAuthProfileOrder()` and clears expired `cooldownUntil` /
`disabledUntil` values along with resetting `errorCount` and
`failureCounts` — giving the profile a fair retry window (circuit-breaker
half-open → closed transition).

Key design decisions:
- `cooldownUntil` and `disabledUntil` handled independently (a profile
  can have both; only the expired one is cleared)
- `errorCount` reset only when ALL unusable windows have expired
- `lastFailureAt` preserved for the existing failureWindowMs decay logic
- In-memory mutation; disk persistence happens lazily on the next store
  write, matching the existing save pattern

Fixes #3604
Related: #13623, #15851, #11972, #8434
2026-02-16 12:53:45 -06:00