fix(deploy): resilient-pull hygiene for dirty shared deploy-base (ORCH-112)

Self-deploy git pull blocked on a dirty shared main checkout (manual/abandoned WIP from a failed/cancelled task). Incident ORCH-111: "Your local changes to src/config.py would be overwritten by merge" wedged the prod deploy and required manual intervention (a group risk on self-hosting).

The deploy hook (--deploy) now converges the deploy-base to a clean, current origin/main BEFORE the pull (git fetch + reset --hard origin/main + a SCOPED `git clean -fd`, NEVER -x), strictly preserving:
- the rollback/log artefacts (.deploy-prev-image-* / deploy-hook.log, via -e);
- gitignored .env/data/*.db/build (no -x);
- sibling/.git state (out of clean scope).

Gated by the CHECKOUT_HYGIENE env injected by self_deploy.build_deploy_command only when the new pure never-raise leaf src/checkout_hygiene.py says applies(repo) (kill-switch + self-hosting scope). Convergence after failed/cancelled is this same deploy-time self-heal: cancel_task is NOT extended and no background janitor is introduced.

Observability: the hook writes a `hygiene` sentinel; the Phase-C finalizer reads it and sends a best-effort Telegram alert.

Additive, under kill-switch (ORCH_CHECKOUT_HYGIENE_ENABLED, default true; off -> bare `git pull origin main` 1:1 as before ORCH-112), never-raise, self-hosting scope. STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema / the hook exit-code contract (0/1/2, ORCH-036) are byte-for-byte untouched.

Coverage: tests/test_deploy_checkout_hygiene.py (TC-01..TC-10; real-hook shell simulation in a temp git repo, no network/prod/ssh, + unit). TC-01 is the mandatory ORCH-111 regression (RED before the fix, GREEN after).

Docs golden source updated in the same PR (CLAUDE.md, CHANGELOG.md, .env.example; INFRA.md / architecture/README.md / adr-0044 written at the architecture stage).

Refs: ORCH-112
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
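The convergence sequence described in the commit message can be sketched as a pure command builder. This is only an illustration, not the actual hook (which is a shell script not shown in this diff): the function name `hygiene_commands` and the `PRESERVED` tuple are hypothetical, and only the git subcommands, the `-fd` / `-e` flags, and the preserved artefact names are taken from the message.

```python
from typing import List, Sequence

# Artefacts the commit message says must survive the clean (passed via -e).
# The tuple name is hypothetical; the patterns come from the commit message.
PRESERVED = (".deploy-prev-image-*", "deploy-hook.log")


def hygiene_commands(
    remote: str = "origin",
    branch: str = "main",
    preserved: Sequence[str] = PRESERVED,
) -> List[List[str]]:
    """Return the git invocations, in order, as argv lists (hypothetical helper)."""
    # Scoped clean: -f (force) and -d (untracked directories), never -x,
    # so gitignored files such as .env or data/*.db are left alone.
    clean = ["git", "clean", "-fd"]
    for pattern in preserved:
        clean += ["-e", pattern]  # exclude rollback/log artefacts from the clean

    return [
        ["git", "fetch", remote],                          # refresh origin/main
        ["git", "reset", "--hard", f"{remote}/{branch}"],  # drop tracked WIP
        clean,                                             # drop untracked WIP
        ["git", "pull", remote, branch],                   # the original pull
    ]


if __name__ == "__main__":
    for argv in hygiene_commands():
        print(" ".join(argv))
```

Returning argv lists instead of running them keeps the sketch side-effect free, which loosely mirrors the "pure leaf" style the message describes; a caller would still have to execute them inside the deploy-base directory.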
@@ -340,6 +340,15 @@ ORCH_DEPLOY_PROD_TARGET_IMAGE=orchestrator-orchestrator
ORCH_DEPLOY_PROD_COMPOSE_PROFILE=
ORCH_DEPLOY_PROD_PREV_IMAGE_FILE=.deploy-prev-image-prod
# ORCH-112: deploy-base checkout-hygiene (resilient-pull). The self-deploy hook
# converges a DIRTY shared deploy-base to a clean, current origin/main BEFORE the
# `git pull` (git fetch + reset --hard + a SCOPED `git clean -fd`, NEVER `-x`), so
# manual/abandoned WIP left by a failed/cancelled task never blocks the deploy
# (incident ORCH-111). False -> bare `git pull origin main` 1:1 as before ORCH-112.
# Empty REPOS -> only the self-hosting repo (orchestrator).
ORCH_CHECKOUT_HYGIENE_ENABLED=true
ORCH_CHECKOUT_HYGIENE_REPOS=
# ORCH-058: staging-image provenance before the BUILD-ONCE prod retag (INV-FRESH).
# Guarantees the staging image promoted to prod is the EXACT artefact rebuilt from the
# validated commit — two layers, self-hosting only:
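The two `ORCH_CHECKOUT_HYGIENE_*` settings added in this hunk might be read by a gate along the following lines. This is a minimal sketch, not the contents of src/checkout_hygiene.py, which this diff does not show. Assumptions: the real `applies(repo)` takes only the repo (the `env` parameter is added here for testability), `ORCH_CHECKOUT_HYGIENE_REPOS` is a comma-separated list, the set of "false" spellings is as listed, and an internal error falls back to `False` (that is, to the bare `git pull`).

```python
from typing import Mapping

SELF_HOSTING_REPO = "orchestrator"  # per the .env.example comment
_FALSE_VALUES = {"0", "false", "no", "off"}  # assumed spellings of "disabled"


def applies(repo: str, env: Mapping[str, str]) -> bool:
    """Decide whether checkout hygiene should run for `repo`. Never raises."""
    try:
        # Kill-switch: defaults to enabled when the variable is unset.
        enabled = env.get("ORCH_CHECKOUT_HYGIENE_ENABLED", "true").strip().lower()
        if enabled in _FALSE_VALUES:
            return False

        # Scope: an empty list means only the self-hosting repo.
        raw = env.get("ORCH_CHECKOUT_HYGIENE_REPOS", "")
        repos = {name.strip() for name in raw.split(",") if name.strip()}
        if not repos:
            repos = {SELF_HOSTING_REPO}
        return repo in repos
    except Exception:
        # Assumed never-raise behaviour: any failure disables hygiene.
        return False
```

In real use the mapping would typically be `os.environ`; with the defaults from this hunk (`ENABLED=true`, empty `REPOS`) the sketch enables hygiene for `orchestrator` only.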