fix(deploy): terminal-window-aware guard so done tasks hold Done in Plane (ORCH-094)

A task with DB stage=done and 0 active jobs flapped in Plane between `Awaiting
Deploy` and `Monitoring after Deploy` instead of holding `Done` (verified live
on ORCH-061, task 47). The three deploy-phase setters were terminal-blind, so
any stale, duplicate, or unknown caller under the bot token could re-stamp an
intermediate status over the terminal Done indefinitely.

- New leaf src/deploy_status_guard.py (pure, never-raise, config-gated): decide()
  -> ALLOW | CONVERGE_DONE | SUPPRESS on the entry of set_issue_awaiting_deploy /
  set_issue_deploying / set_issue_monitoring. A deploy-phase status is legitimate
  iff the task is non-terminal OR (done AND post-deploy window active); otherwise
  done converges to Done idempotently, cancelled is suppressed (FR-2, D1/D2).
- D3: move post_deploy.arm_monitor ABOVE the terminal-sync block in advance_stage
  so window_active is True when the legitimate first Monitoring is set (the task
  is already DB-done by then); a re-drive after the window closes converges to Done.
- D4: run_post_deploy_monitor no-ops without a status PATCH / re-queue when the
  task became cancelled mid-window (zombie-tick guard, FR-3).
- D5: additive `reason` kwarg on the three setters + one structured log line per
  verdict (work_item/caller/target/db_stage/window_active/verdict); new read-only
  db.get_task_by_work_item_id; post_deploy.window_active helper.
- Flags: deploy_status_guard_enabled (kill-switch; False -> pre-ORCH-094
  behaviour 1:1) and deploy_status_guard_repos (CSV; empty = self-hosting repo
  only). STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB
  schema untouched (reads existing tasks.stage).
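
For reviewers, the decision rule and the repo scoping described above can be
sketched roughly as follows. This is a minimal illustration, not the contents
of src/deploy_status_guard.py: the `Verdict` enum, the `in_scope` helper, the
`enabled` / `scoped` parameters and the stage string "cancelled" are all
assumptions made for the example. Only the three verdict names, the "done"
stage, the "orchestrator" repo name and the token pattern come from this
commit. The fallback to the self-hosting repo when every CSV token is rejected
is also an assumption.

```python
import re
from enum import Enum
from typing import Optional

# Token pattern quoted in the config comment; everything else here is illustrative.
_REPO_TOKEN = re.compile(r"[A-Za-z0-9._-]+")


class Verdict(Enum):
    ALLOW = "ALLOW"
    CONVERGE_DONE = "CONVERGE_DONE"
    SUPPRESS = "SUPPRESS"


def in_scope(repo: str, repos_csv: str, self_hosting_repo: str = "orchestrator") -> bool:
    """Empty CSV -> only the self-hosting repo; non-empty -> only the listed repos."""
    tokens = (t.strip() for t in repos_csv.split(","))
    allowed = {t for t in tokens if _REPO_TOKEN.fullmatch(t)}
    if not allowed:
        # Assumption: a CSV with no valid tokens behaves like an empty one.
        return repo == self_hosting_repo
    return repo in allowed


def decide(
    db_stage: Optional[str],
    window_active: bool,
    *,
    enabled: bool = True,
    scoped: bool = True,
) -> Verdict:
    """Return the verdict for one deploy-phase status write (pure, no I/O)."""
    if not enabled or not scoped:
        # Kill-switch off or repo out of scope: setters stay terminal-blind.
        return Verdict.ALLOW
    if db_stage == "done":
        # Done task: legitimate only while the post-deploy window is active.
        return Verdict.ALLOW if window_active else Verdict.CONVERGE_DONE
    if db_stage == "cancelled":
        # Hypothetical stage name for the cancelled terminal state.
        return Verdict.SUPPRESS
    # Non-terminal (or unknown) stage: the write goes through unchanged.
    return Verdict.ALLOW
```

In this sketch a setter would call `decide(...)` first, PATCH the requested
status on ALLOW, PATCH `Done` on CONVERGE_DONE, and return without a PATCH on
SUPPRESS. The sketch contains no operation that can raise; any never-raise
wrapper in the real leaf is not modelled here.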

Tests: TC-01..TC-12 across 5 new test modules + config flags; updated the
reason-kwarg assertions in test_deploy_terminal_sync / test_deploy_approve.
Full regression green (1413). Docs: CHANGELOG, CLAUDE.md,
docs/architecture/README.md (status -> implemented), .env.example.

Refs: ORCH-094

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 23:31:30 +03:00
committed by orchestrator-deployer
parent db4dd275e4
commit a46dcbcab3
18 changed files with 1088 additions and 25 deletions

@@ -649,6 +649,32 @@ class Settings(BaseSettings):
stop_status_enabled: bool = True
stop_status_repos: str = ""
# ORCH-094: terminal-window-aware guard for deploy-phase Plane status setters.
# A task with DB stage='done' (and 0 active jobs) was flapping in Plane between
# `Awaiting Deploy` and `Monitoring after Deploy` instead of holding `Done`,
# because the three deploy-phase setters (set_issue_awaiting_deploy /
# set_issue_deploying / set_issue_monitoring) are terminal-blind: any stale /
# duplicate / unknown caller under the bot token re-stamps an intermediate
# deploy status over the terminal Done. ORCH-094 puts a single low choke-point
# guard on the entry of those three setters (leaf src/deploy_status_guard.py):
# for a task whose DB stage is terminal it converges to Done idempotently
# (CONVERGE_DONE), EXCEPT the legitimate post-deploy `Monitoring` while the
# window is still active (ARMED & not DONE). Additive, never-raise; reads the
# existing tasks.stage (no migration); STAGE_TRANSITIONS / QG_CHECKS /
# machine-verdict keys are NOT touched. See
# docs/work-items/ORCH-094/06-adr/ADR-001-terminal-window-aware-deploy-status-guard.md
# and the cross-cutting docs/architecture/adr/adr-0028-…md.
# deploy_status_guard_enabled -> kill-switch (env ORCH_DEPLOY_STATUS_GUARD_ENABLED).
# False -> the setters are terminal-blind, behaviour
# strictly 1:1 as before ORCH-094 (zero regression).
# deploy_status_guard_repos -> CSV scope (env ORCH_DEPLOY_STATUS_GUARD_REPOS).
# Empty -> applies ONLY to the self-hosting repo
# (orchestrator), where deploy-phase statuses are set
# at all; non-empty -> only the listed repos. Tokens
# are sanitised (^[A-Za-z0-9._-]+$) by the guard leaf.
deploy_status_guard_enabled: bool = True
deploy_status_guard_repos: str = ""
# ORCH-073 (ADR-001 Р-4): main-integrity regression guard. After the merge-verify
# under-gate confirms the deployed SHA is an ancestor of origin/main (FR-1), a
# secondary deterministic (no-LLM) guard checks that a declarative set of markers