feat(queue): resilience schema + backoff helper + config (ORCH-1)
Adds jobs.transient_attempts and jobs.available_at columns via an idempotent _ensure_column migration. claim_next_job now honours available_at, and mark_job_transient requeues a job with backoff against a separate transient budget. New config: preflight_cache_ttl, backoff_base_seconds, backoff_max_seconds, transient_max_attempts, breaker_threshold, breaker_pause_seconds.
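The queue-side behaviour described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it assumes a SQLite-backed `jobs` table, and the `status` values, the doubling backoff formula, the exact budget comparison, and the function signatures are all assumptions made for the example. Only the column, function, and setting names come from the commit message.

```python
import sqlite3

# Values mirror the new settings' defaults; the constants themselves are illustrative.
BACKOFF_BASE_SECONDS = 10
BACKOFF_MAX_SECONDS = 600
TRANSIENT_MAX_ATTEMPTS = 5


def backoff_delay(transient_attempts: int) -> int:
    """Assumed formula: base * 2**(n - 1) for the n-th transient failure, capped."""
    exponent = max(transient_attempts - 1, 0)
    return min(BACKOFF_BASE_SECONDS * 2 ** exponent, BACKOFF_MAX_SECONDS)


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> None:
    """Add `column` only if it is missing, so re-running the migration is a no-op."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")


def claim_next_job(conn: sqlite3.Connection, now: float):
    """Claim the oldest queued job whose available_at has passed (assumed ordering)."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'queued' AND available_at <= ? "
        "ORDER BY id LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row[0]


def mark_job_transient(conn: sqlite3.Connection, job_id: int, now: float) -> str:
    """Requeue with backoff, or fail the job once the transient budget is spent."""
    (attempts,) = conn.execute(
        "SELECT transient_attempts FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    attempts += 1
    if attempts > TRANSIENT_MAX_ATTEMPTS:
        conn.execute(
            "UPDATE jobs SET status = 'failed', transient_attempts = ? WHERE id = ?",
            (attempts, job_id),
        )
        return "failed"
    conn.execute(
        "UPDATE jobs SET status = 'queued', transient_attempts = ?, available_at = ? "
        "WHERE id = ?",
        (attempts, now + backoff_delay(attempts), job_id),
    )
    return "queued"
```

Under these assumptions, a job that fails transiently at time 100 gets available_at = 110 and is skipped by claim_next_job until then. Its code-fault `attempts` counter is not touched.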
@@ -36,6 +36,23 @@ class Settings(BaseSettings):
     max_concurrency: int = 1
     queue_poll_interval: float = 2.0
 
+    # ORCH-1b (resilience): preflight + 429/rate-limit + backoff + circuit breaker.
+    #   preflight_cache_ttl    -> cache the cheap CLI/network preflight result (seconds);
+    #                             the worker does NOT re-run `claude --version` more often
+    #                             than this (env ORCH_PREFLIGHT_CACHE_TTL).
+    #   backoff_base_seconds   -> base for exponential transient backoff.
+    #   backoff_max_seconds    -> ceiling for the transient backoff.
+    #   transient_max_attempts -> retry budget for transient (429/overload/network)
+    #                             failures, separate from code-fault `attempts`.
+    #   breaker_threshold      -> consecutive transient failures that OPEN the breaker.
+    #   breaker_pause_seconds  -> how long the breaker stays open before half-open.
+    preflight_cache_ttl: int = 45
+    backoff_base_seconds: int = 10
+    backoff_max_seconds: int = 600
+    transient_max_attempts: int = 5
+    breaker_threshold: int = 3
+    breaker_pause_seconds: int = 300
+
     # Telegram notifications
     telegram_bot_token: str = ""
 
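For orientation, here is one way a worker might consume breaker_threshold, breaker_pause_seconds, and preflight_cache_ttl. The commit adds only the settings, so the `CircuitBreaker` and `TtlCache` classes below are hypothetical names, and the state handling is an assumed reading of the comments above (closed, open, then half-open once the pause elapses).

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive transient failures, then pauses."""

    def __init__(self, threshold: int = 3, pause_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.pause_seconds = pause_seconds
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.pause_seconds:
            return "half-open"  # pause elapsed: a trial job may go through
        return "open"

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        # A failure while already tripped (e.g. a failed half-open trial),
        # or reaching the threshold, (re)starts the pause.
        if self.opened_at is not None or self.consecutive_failures >= self.threshold:
            self.opened_at = self.clock()

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure streak.
        self.consecutive_failures = 0
        self.opened_at = None


class TtlCache:
    """Caches a single value (such as a preflight result) for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 45,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = compute()  # re-run the check only after expiry
            self._expires_at = now + self.ttl_seconds
        return self._value
```

Both classes take an injectable `clock` so the timing logic can be tested without sleeping. A real worker would presumably build them from the `Settings` fields rather than from the literal defaults used here.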