The self-hosting orchestrator looped on deploy-staging -> development because
scripts/staging_check.py exited 1 on ANY failed check, so two infra-only checks
(C9a sandbox branch / C9b analyst-job — caused by SANDBOX bot accounts not being
members of the sandbox Plane project, NOT a pipeline regress) forced
staging_status: FAILED -> rollback -> loop, burning developer retries and tokens.
Direction (б) per ADR-001: classify staging checks as REAL (all pipeline checks,
fail-closed) vs SANDBOX_INFRA (narrow allowlist {C9a, C9b}, waivable). New leaf
module src/staging_verdict.py (stdlib-only, never-raise): classify_check +
compute_staging_verdict fold per-check results into a tolerant-but-fail-closed
verdict — any REAL failure -> FAILED/exit1 (safety net holds under any flag);
only C9a/C9b failed & tolerant -> SUCCESS/exit0 with waived list; only infra &
strict -> FAILED/exit1; any internal error -> FAILED/exit1 (never a false green).
staging_check.py now auto-classifies each check (public 3-tuple _items shape kept
as an ORCH-048 b6 regression guard), exposes categorized_items(), prints
INFRA-WAIVED/VERDICT lines, and exits via the verdict; new --strict flag forces
legacy strictness per-run. Kill-switch ORCH_STAGING_INFRA_TOLERANCE_ENABLED
(default true) restores legacy strict mode globally. launcher gains
action_stage_no_changes_note so "no changes to commit" on action stages is logged
as expected, not treated as under-delivery.
Contracts unchanged: STAGE_TRANSITIONS, QG_CHECKS registry, staging_status:/
deploy_status: frontmatter, hook exit-code (0/1/2), check_staging_status; no DB
migration. Docs: README, STAGING_CHECK.md, deployer.md, .env.example, CHANGELOG.
Refs: ORCH-061
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add the optional, backward-compatible SOURCE_IMAGE branch to
orchestrator-deploy-hook.sh: when set, retag the staging-validated image
onto TARGET_IMAGE (docker tag) before `up -d --no-build` instead of
rebuilding — guarantees prod runs the exact artefact that passed staging
(AC-7 / TC-14). Unset -> prior behaviour; exit-code contract (0/1/2) and
health-loop untouched.
Update golden-source docs (AC-13): rewrite deployer.md `deploy` stage from
"paper SUCCESS" to the executable self-deploy (Phase A/B/C, no self-restart
from inside the container) and add the ORCH-036 CHANGELOG entry.
Refs: ORCH-036
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
B6 false-FAILed because it built the project registry from the
launcher process-env via a host-path hack (sys.path.insert +
importlib.reload), not from the running staging instance. Run from the
host, ORCH_PROJECTS_JSON is unset -> default ET+ORCH registry -> false
FAIL -> spurious deploy-staging -> development rollback.
Variant (v) per ADR-001: remove the host-path hack; canonically run the
suite INSIDE orchestrator-staging via docker exec so src.projects
resolves from /app (PYTHONPATH) with .env.staging. Verdict logic
extracted into pure _evaluate_b6(known) -> (passed, detail) +
_known_project_ids_from_registry() / _run_b6() with deterministic FAIL on
source unavailability. deployer.md and STAGING_CHECK.md updated to the
docker exec command. src/projects.py, .env* and checks A/B4/B5/C
untouched.
Refs: ORCH-048
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>