fix(stage-engine): address ORCH-114 review — env/docs canon + in-region rollback CAS
Resolves the REQUEST_CHANGES findings on ORCH-114 (durable transition-ownership lease + expected-stage CAS):

P1 — documentation = golden source:
- .env.example: add ORCH_TRANSITION_LEASE_ENABLED / ORCH_TRANSITION_LEASE_REPOS (canon of 100% start keys, ORCH-101), next to the other gate kill-switches.
- CLAUDE.md: add the ORCH-114 passport section (mechanism, invariant, flags, ADR links) so a future agent editing advance_stage/reaper/webhooks finds the ownership invariant in the first mandatory-read doc (ORCH-078 traceability index).

P2 — should-fix:
- docs/overview/ (system showcase, ORCH-011): add transition_lease to tech-data-model.md (helper tables), tech-observability.md (/queue blocks) and tech-architecture.md (components).
- ADR-001 D4 alignment: the four side-effectful-edge rollback handlers (_handle_merge_gate_rollback / _handle_security_gate / _handle_coverage_gate / _handle_image_freshness) now write `development` through the expected-stage CAS via a shared _rollback_stage_cas helper (defence against the rollback↔done contradiction, BR-6) instead of a bare unconditional update_task_stage. Under the held lease the sole owner always wins; a lost race aborts WITHOUT side effects. Kill-switch off / out-of-scope repo -> degenerates to the prior write -> 1:1.
- Test isolation: make tests/test_webhooks.py order-independent by pinning the proj-1 registry per-test (mirrors test_webhook_dedup.proj_registry); it had only passed by relying on import order. Drop the needless module-level ORCH_DB_PATH setdefault in test_orch114 (fresh_db already isolates db_path).

New regression tests (TC-11): in-region rollback writes route through CAS; rollback CAS wins when at expected stage; rollback CAS-lost does NOT clobber `done`; kill-switch-off rollback degenerates to the unconditional write.

ruff clean (src/stage_engine.py, src/transition_lease.py); full suite 2052 passed.

Refs: ORCH-114
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
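For readers unfamiliar with the expected-stage CAS described in the message, here is a minimal sketch of the idea, not the actual helper. It assumes a SQLite table named `tasks` with `id` and `stage` columns; the function name `rollback_stage_cas`, its parameters, and the `merge_gate` stage are hypothetical stand-ins for illustration only.

```python
import sqlite3


def rollback_stage_cas(conn, task_id, expected_stage,
                       new_stage="development", lease_enabled=True):
    """Hypothetical sketch: write new_stage only if the task is still at
    expected_stage. Returns True when the write landed."""
    if not lease_enabled:
        # Kill-switch off: fall back to the plain unconditional write.
        conn.execute("UPDATE tasks SET stage = ? WHERE id = ?",
                     (new_stage, task_id))
        conn.commit()
        return True
    # Compare-and-swap: the WHERE clause carries the expected stage, so a
    # task that another actor already moved (e.g. to `done`) is left alone.
    cur = conn.execute(
        "UPDATE tasks SET stage = ? WHERE id = ? AND stage = ?",
        (new_stage, task_id, expected_stage),
    )
    conn.commit()
    return cur.rowcount == 1


def current_stage(conn, task_id):
    """Read back the stored stage for a task."""
    row = conn.execute("SELECT stage FROM tasks WHERE id = ?",
                       (task_id,)).fetchone()
    return row[0]


def make_demo_db():
    """In-memory database with one task still in flight and one finished."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, stage TEXT NOT NULL)")
    conn.execute("INSERT INTO tasks VALUES ('t1', 'merge_gate')")
    conn.execute("INSERT INTO tasks VALUES ('t2', 'done')")
    conn.commit()
    return conn
```

A caller would run the rollback's side effects only when the function returns True; on False it aborts, which is how a lost race avoids clobbering `done`.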
28
.env.example
@@ -434,6 +434,34 @@ ORCH_REAPER_MAX_RUNNING_S=5400
ORCH_REAPER_FINALIZE_GRACE_S=300
ORCH_LEASE_RECLAIM_ENABLED=true

# ORCH-114 (adr-0045): durable transition-ownership lease + expected-stage CAS for
# side-effectful stage transitions. Generalises the process-local ORCH-113 finalizer-
# liveness into a DURABLE, cross-path owner-exclusion (additive table `transition_lease`)
# so a concurrent OR post-restart re-entry into a side-effectful transition (reaper /
# reconciler / webhook / startup-requeue) is deferred or a no-op instead of re-applying
# an irreversible effect (merge_pr / coverage-ratchet / image-rebuild / prod-deploy
# initiation / contradictory rollback<->done). Two layers, both gated by the SINGLE
# kill-switch below: (1) a durable lease on ENTRY to the side-effectful region (a second
# actor that sees a live owner does not start the heavy sub-gates at all); (2) an
# expected-stage CAS on the stage WRITE (a lost race -> abort with NO side effect), which
# also closes the paths that write the stage in bypass of advance_stage. Owner liveness =
# owner_pid + owner_boot_id (NOT a heartbeat), so restart recovery is free (new process ->
# new boot_id -> all prior leases instantly stale -> reclaimed). The lease has NO own TTL:
# its hard age ceiling IS the reaper Tier-3 backstop (ORCH_REAPER_MAX_RUNNING_S), so the
# cross-cutting budget invariant ORCH-065/109/110/113 is untouched. STAGE_TRANSITIONS /
# QG_CHECKS / check_* / machine-verdict keys / existing table schemas — byte-for-byte.
# TRANSITION_LEASE_ENABLED -> SINGLE kill-switch. false -> the lease is neither written
# nor read AND the CAS degenerates to the prior unconditional
# update_task_stage -> behaviour byte-for-byte as before
# ORCH-114 (reaper -> ORCH-113 in-memory fallback,
# reconciler/webhook skip-guard inert). Default true.
# TRANSITION_LEASE_REPOS -> CSV scope. Empty -> applies ONLY to the self-hosting repo
# (orchestrator), where the irreversible side-effectful edges
# live; non-empty -> only the listed repos. Mirrors
# ORCH_COVERAGE_GATE_REPOS -> enduro untouched at the default.
ORCH_TRANSITION_LEASE_ENABLED=true
ORCH_TRANSITION_LEASE_REPOS=

# ORCH-063: disk-watchdog — background heartbeat that measures HOST-FS fill via the
# mounted bind-paths (/repos, /app/data) with shutil.disk_usage (NOT the container
# overlay /) and Telegram-alerts the operator at >= threshold. On 07.06.2026 the
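
The two rules documented in the comment block above (owner liveness by pid + boot id, and the CSV repo scope) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `Lease` class and both function names are hypothetical, and it assumes a single orchestrator process, so a lease is treated as live only when both recorded values match the current process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    """Hypothetical in-memory view of one `transition_lease` row."""
    owner_pid: int
    owner_boot_id: str


def lease_is_live(lease, current_pid, current_boot_id):
    """A lease is live only if it was taken by this process incarnation.
    After a restart the boot id changes, so every older lease reads as stale
    and can be reclaimed without any heartbeat."""
    return (lease.owner_pid == current_pid
            and lease.owner_boot_id == current_boot_id)


def lease_applies(repo, enabled, repos_csv, self_repo="orchestrator"):
    """Scope rule from the comment: kill-switch off -> never applies;
    empty CSV -> only the self-hosting repo; otherwise only listed repos."""
    if not enabled:
        return False
    listed = [name.strip() for name in repos_csv.split(",") if name.strip()]
    if not listed:
        return repo == self_repo
    return repo in listed
```

With the defaults shown in the diff (`ORCH_TRANSITION_LEASE_ENABLED=true`, empty `ORCH_TRANSITION_LEASE_REPOS`), `lease_applies("enduro", True, "")` is False, which matches the "enduro untouched at the default" note.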