ORCH-112: resilient-pull hygiene for dirty shared deploy-base (fix incident ORCH-111) #136

Merged
admin merged 8 commits from feature/ORCH-112-bug-failed-cancelled-task-arti into main 2026-06-15 15:33:21 +03:00

8 Commits

Author SHA1 Message Date
deploy-finalizer
285f5f05dc deploy(ORCH-036): finalize SUCCESS for ORCH-112
All checks were successful
CI / test (push) Successful in 3m9s
CI / test (pull_request) Successful in 3m11s
2026-06-15 15:33:15 +03:00
344ab72f37 tester(ET): auto-commit from tester run_id=706
All checks were successful
CI / test (push) Successful in 3m59s
CI / test (pull_request) Successful in 3m9s
2026-06-15 15:15:56 +03:00
7f673a45f7 reviewer(ET): auto-commit from reviewer run_id=705 2026-06-15 15:15:56 +03:00
a1f3b7588a fix(deploy): resilient-pull hygiene for dirty shared deploy-base (ORCH-112)
Self-deploy git pull blocked on a dirty shared main checkout (manual/abandoned
WIP from a failed/cancelled task) — incident ORCH-111: "Your local changes to
src/config.py would be overwritten by merge" wedged the prod deploy and required
manual intervention (a group risk on self-hosting).

The deploy hook (--deploy) now converges the deploy-base to a clean, current
origin/main BEFORE the pull (git fetch + reset --hard origin/main + a SCOPED
`git clean -fd`, NEVER -x), strictly preserving the rollback/log artefacts
(.deploy-prev-image-* / deploy-hook.log via -e), gitignored .env/data/*.db/build
(no -x), and sibling/.git state (out of clean scope). Gated by CHECKOUT_HYGIENE
env injected by self_deploy.build_deploy_command only when the new pure never-raise
leaf src/checkout_hygiene.py says applies(repo) (kill-switch + self-hosting scope).
Convergence after failed/cancelled is this same deploy-time self-heal — cancel_task
is NOT extended and no background janitor is introduced. Observability: the hook
writes a `hygiene` sentinel, the Phase-C finalizer reads it and sends a best-effort
Telegram alert.

Additive, under kill-switch (ORCH_CHECKOUT_HYGIENE_ENABLED, default true; off ->
bare `git pull origin main` 1:1 before ORCH-112), never-raise, self-hosting scope.
STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema / the
hook exit-code contract (0/1/2, ORCH-036) are byte-for-byte untouched.

Coverage: tests/test_deploy_checkout_hygiene.py (TC-01..TC-10; real-hook shell
simulation in a temp git repo, no network/prod/ssh, + unit). TC-01 is the
mandatory ORCH-111 regression (RED before the fix, GREEN after). Docs golden
source updated in the same PR (CLAUDE.md, CHANGELOG.md, .env.example; INFRA.md /
architecture/README.md / adr-0044 written at the architecture stage).

Refs: ORCH-112

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:15:56 +03:00
31b4f3fd1d architect(ET): auto-commit from architect run_id=703 2026-06-15 15:15:56 +03:00
96b653d11c architect(ET): auto-commit from architect run_id=702 2026-06-15 15:15:56 +03:00
860de5b0a5 analyst(ET): auto-commit from analyst run_id=701 2026-06-15 15:15:56 +03:00
c086921aa1 docs: init ORCH-112 business request 2026-06-15 15:15:56 +03:00