fix(auth): billing backoff + cooldown UX

This commit is contained in:
Peter Steinberger
2026-01-09 21:57:52 +01:00
parent 42a0089b3b
commit c27b1441f7
16 changed files with 497 additions and 43 deletions

View File

@@ -46,7 +46,7 @@ When a provider has multiple profiles, Clawdbot chooses an order like this:
If no explicit order is configured, Clawdbot uses a roundrobin order:
- **Primary key:** profile type (**OAuth before API keys**).
- **Secondary key:** `usageStats.lastUsed` (oldest first, within each type).
- **Cooldown profiles** are moved to the end, ordered by soonest cooldown expiry.
- **Cooldown/disabled profiles** are moved to the end, ordered by soonest expiry.
### Why OAuth can “look lost”
@@ -79,6 +79,27 @@ State is stored in `auth-profiles.json` under `usageStats`:
}
```
## Billing disables
Billing/credit failures (for example “insufficient credits” / “credit balance too low”) are treated as failoverworthy, but theyre usually not transient. Instead of a short cooldown, Clawdbot marks the profile as **disabled** (with a longer backoff) and rotates to the next profile/provider.
State is stored in `auth-profiles.json`:
```json
{
"usageStats": {
"provider:profile": {
"disabledUntil": 1736178000000,
"disabledReason": "billing"
}
}
}
```
Defaults:
- Billing backoff starts at **5 hours**, doubles per billing failure, and caps at **24 hours**.
- Backoff counters reset if the profile hasnt failed for **24 hours** (configurable).
## Model fallback
If all profiles for a provider fail, Clawdbot moves to the next model in
@@ -92,6 +113,8 @@ When a run starts with a model override (hooks or CLI), fallbacks still end at
See [`docs/configuration.md`](/gateway/configuration) for:
- `auth.profiles` / `auth.order`
- `auth.cooldowns.billingBackoffHours` / `auth.cooldowns.billingBackoffHoursByProvider`
- `auth.cooldowns.billingMaxHours` / `auth.cooldowns.failureWindowHours`
- `agents.defaults.model.primary` / `agents.defaults.model.fallbacks`
- `agents.defaults.imageModel` routing

View File

@@ -61,7 +61,7 @@ cat ~/.clawdbot/clawdbot.json
- Legacy on-disk state migration (sessions/agent dir/WhatsApp auth).
- State integrity and permissions checks (sessions, transcripts, state dir).
- Config file permission checks (chmod 600) when running locally.
- Model auth health: checks OAuth expiry and can refresh expiring tokens.
- Model auth health: checks OAuth expiry, can refresh expiring tokens, and reports auth-profile cooldown/disabled states.
- Legacy workspace dir detection (`~/clawdis`, `~/clawdbot`).
- Sandbox image repair when sandboxing is enabled.
- Legacy service migration and extra gateway detection.
@@ -153,6 +153,10 @@ profile is stale, it suggests `claude setup-token` on the gateway host.
Refresh prompts only appear when running interactively (TTY); `--non-interactive`
skips refresh attempts.
Doctor also reports auth profiles that are temporarily unusable due to:
- short cooldowns (rate limits/timeouts/auth failures)
- longer disables (billing/credit failures)
### 6) Sandbox image repair
When sandboxing is enabled, doctor checks Docker images and offers to build or
switch to legacy names if the current image is missing.

View File

@@ -422,6 +422,8 @@ Clawdbot uses providerprefixed IDs like:
Yes. Config supports optional metadata for profiles and an ordering per provider (`auth.order.<provider>`). This does **not** store secrets; it maps IDs to provider/mode and sets rotation order.
Clawdbot may temporarily skip a profile if its in a short **cooldown** (rate limits/timeouts/auth failures) or a longer **disabled** state (billing/insufficient credits). To inspect this, run `clawdbot models status --json` and check `auth.unusableProfiles`. Tuning: `auth.cooldowns.billingBackoffHours*`.
You can also set a **per-agent** order override (stored in that agents `auth-profiles.json`) via the CLI:
```bash