fix(deploy): terminal-window-aware guard so done tasks hold Done in Plane (ORCH-094)

A DB stage=done task with 0 active jobs flapped in Plane between `Awaiting Deploy` and `Monitoring after Deploy` instead of holding `Done` (verified live on ORCH-061, task 47): the three deploy-phase setters were terminal-blind, so any stale/duplicate/unknown caller under the bot token re-stamped an intermediate status over the terminal Done, forever.

- New leaf src/deploy_status_guard.py (pure, never-raise, config-gated): decide() -> ALLOW | CONVERGE_DONE | SUPPRESS on the entry of set_issue_awaiting_deploy / set_issue_deploying / set_issue_monitoring. A deploy-phase status is legitimate iff the task is non-terminal OR (done AND post-deploy window active); otherwise done converges to Done idempotently, cancelled is suppressed (FR-2, D1/D2).
- D3: move post_deploy.arm_monitor ABOVE the terminal-sync block in advance_stage so window_active is True when the legitimate first Monitoring is set (the task is already DB-done by then); a re-drive after the window closes converges to Done.
- D4: run_post_deploy_monitor no-ops without a status PATCH / re-queue when the task became cancelled mid-window (zombie-tick guard, FR-3).
- D5: additive `reason` kwarg on the three setters + one structured log line per verdict (work_item/caller/target/db_stage/window_active/verdict); new read-only db.get_task_by_work_item_id; post_deploy.window_active helper.
- Flags deploy_status_guard_enabled (kill-switch -> 1:1) / deploy_status_guard_repos (CSV; empty = self-hosting only).

STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema untouched (reads existing tasks.stage).

Tests: TC-01..TC-12 across 5 new test modules + config flags; updated the reason-kwarg assertions in test_deploy_terminal_sync / test_deploy_approve. Full regression green (1413).
Docs: CHANGELOG, CLAUDE.md, docs/architecture/README.md (status -> implemented), .env.example.

Refs: ORCH-094
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 23:31:30 +03:00
parent db4dd275e4
commit a46dcbcab3
18 changed files with 1088 additions and 25 deletions
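
The decision rule stated in the commit message (a deploy-phase status is legitimate iff the task is non-terminal, or done with an open post-deploy window) can be sketched roughly as below. This is an illustrative sketch only, not the contents of src/deploy_status_guard.py: the `decide` signature, the `Verdict` enum, the stage strings `"done"` / `"cancelled"`, and the `guard_enabled` parameter are assumptions made for the example.

```python
from enum import Enum


class Verdict(Enum):
    # Names mirror the three verdicts listed in the commit message.
    ALLOW = "ALLOW"
    CONVERGE_DONE = "CONVERGE_DONE"
    SUPPRESS = "SUPPRESS"


def decide(db_stage, window_active, guard_enabled=True):
    """Hypothetical signature: map (DB stage, window state) to a verdict.

    Pure function: no I/O, no exceptions raised for any input.
    """
    if not guard_enabled:
        # Kill-switch off: behave as before the guard existed.
        return Verdict.ALLOW
    if db_stage == "cancelled":
        # A cancelled task never receives a deploy-phase status.
        return Verdict.SUPPRESS
    if db_stage == "done":
        # Done + open window: the Monitoring status is still legitimate.
        # Done + closed window: converge back to the terminal Done.
        return Verdict.ALLOW if window_active else Verdict.CONVERGE_DONE
    # Assumption: any other stage (including None) counts as non-terminal.
    return Verdict.ALLOW
```

Under these assumptions, a stale caller hitting a done task after its window has closed gets `CONVERGE_DONE`, so repeated calls keep producing the same terminal status instead of flapping.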
--- a/src/post_deploy.py
+++ b/src/post_deploy.py
@@ -316,6 +316,28 @@ def has_marker(repo: str, work_item_id: str | None, name: str) -> bool:
        return False


+def window_active(repo: str, work_item_id: str | None) -> bool:
+    """ORCH-094: True iff a post-deploy observation window is currently OPEN.
+
+    A window is open iff it has been armed (``ARMED`` sentinel) and has NOT yet
+    finished (no ``DONE`` sentinel). The terminal-window-aware deploy-status guard
+    (``deploy_status_guard.decide``) uses this to keep the legitimate post-deploy
+    ``Monitoring after Deploy`` status for a task that is already DB-``done`` while
+    its window is live, and to converge to ``Done`` once the window has closed.
+
+    Restart-safe (the sentinels live on disk) and never-raise -> False on error
+    (a doubt resolves to "window closed", i.e. converge to Done — the safe-for-
+    indication default that matches the bug we are fixing).
+    """
+    try:
+        return has_marker(repo, work_item_id, ARMED) and not has_marker(
+            repo, work_item_id, DONE
+        )
+    except Exception as e:  # noqa: BLE001 - never-raise
+        logger.warning("window_active error for %s/%s: %s", repo, work_item_id, e)
+        return False
+
+
 def write_marker(repo: str, work_item_id: str | None, name: str, content: str = "") -> bool:
    """Create/overwrite a sentinel (best-effort). Returns True on success."""
    try: