fix(auth): auto-expire stale auth profile cooldowns and reset error count

When an auth profile hits a rate limit, `errorCount` is incremented and
`cooldownUntil` is set with exponential backoff. After the cooldown
expires, the time-based cooldown check correctly returns false, but
`errorCount` persists. The next transient failure immediately escalates
to a much longer cooldown because the backoff formula uses the stale
count:

  60s × 5^(errorCount-1), max 1h
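
To make the escalation concrete, here is a minimal sketch of the formula
above. `cooldownMs` is a hypothetical helper written for illustration and
is not the project's `calculateAuthProfileCooldownMs`; clamping the
exponent at zero for `errorCount` below 1 is also an assumption.

```typescript
// Hypothetical helper mirroring "60s × 5^(errorCount-1), max 1h".
// Not the project's actual implementation.
function cooldownMs(errorCount: number): number {
  const baseMs = 60_000; // 60s base cooldown
  const maxMs = 60 * 60 * 1000; // 1h cap
  // Assumption: counts below 1 are treated like the first failure.
  const exponent = Math.max(0, errorCount - 1);
  return Math.min(maxMs, baseMs * 5 ** exponent);
}

// Print the cooldown in seconds for the first four consecutive failures.
for (const count of [1, 2, 3, 4]) {
  console.log(count, cooldownMs(count) / 1000);
}
```

Under these assumptions, counts 1 through 4 give 60s, 5min, 25min, and
the 1h cap, so a stale count of 3 turns a single new transient failure
into an hour-long cooldown.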

This creates a positive feedback loop where profiles appear permanently
stuck after rate limits, requiring manual JSON editing to recover.

Add `clearExpiredCooldowns()` which sweeps all profiles on every call to
`resolveAuthProfileOrder()` and clears expired `cooldownUntil` /
`disabledUntil` values along with resetting `errorCount` and
`failureCounts` — giving the profile a fair retry window (circuit-breaker
half-open → closed transition).

Key design decisions:
- `cooldownUntil` and `disabledUntil` handled independently (a profile
  can have both; only the expired one is cleared)
- `errorCount` reset only when ALL unusable windows have expired
- `lastFailureAt` preserved for the existing failureWindowMs decay logic
- In-memory mutation; disk persistence happens lazily on the next store
  write, matching the existing save pattern
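
The decisions above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in this commit: the
`ProfileStatsSketch` shape, the `clearExpiredCooldownsSketch` name, the
`now` parameter, the boolean return value, and treating "expired" as
`until <= now` are all hypothetical. Only the field names come from the
message.

```typescript
// Hypothetical per-profile stats shape; field names follow the commit
// message, everything else is assumed for illustration.
interface ProfileStatsSketch {
  cooldownUntil?: number;
  disabledUntil?: number;
  errorCount?: number;
  failureCounts?: Record<string, number>;
  lastFailureAt?: number;
}

// Sweeps all profiles in memory; returns true if anything was changed.
function clearExpiredCooldownsSketch(
  profiles: Record<string, ProfileStatsSketch>,
  now: number = Date.now(),
): boolean {
  let mutated = false;
  for (const stats of Object.values(profiles)) {
    let cleared = false;
    // Each window is checked independently; only the expired one is cleared.
    if (stats.cooldownUntil !== undefined && stats.cooldownUntil <= now) {
      delete stats.cooldownUntil;
      cleared = true;
    }
    if (stats.disabledUntil !== undefined && stats.disabledUntil <= now) {
      delete stats.disabledUntil;
      cleared = true;
    }
    if (!cleared) continue;
    mutated = true;
    // Reset counters only when no unusable window remains.
    if (stats.cooldownUntil === undefined && stats.disabledUntil === undefined) {
      stats.errorCount = 0;
      delete stats.failureCounts;
    }
    // lastFailureAt is deliberately left untouched.
  }
  return mutated;
}
```

In this sketch nothing is written to disk; a caller would rely on the
next regular store write to persist the change.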

Fixes #3604
Related: #13623, #15851, #11972, #8434
nabbilkhan
2026-02-16 07:27:27 +00:00
committed by Shadow
parent d3707147c0
commit 03cadc4b7a
6 changed files with 507 additions and 1 deletions


@@ -33,6 +33,7 @@ export type {
 export {
   calculateAuthProfileCooldownMs,
   clearAuthProfileCooldown,
+  clearExpiredCooldowns,
   getSoonestCooldownExpiry,
   isProfileInCooldown,
   markAuthProfileCooldown,