fix(staging): tolerate sandbox-infra-only FAILs (C9a/C9b) in deploy-staging verdict

The self-hosting orchestrator looped on deploy-staging -> development because scripts/staging_check.py exited 1 on ANY failed check. Two infra-only checks (C9a sandbox branch, C9b analyst-job) fail because the SANDBOX bot accounts are not members of the sandbox Plane project, NOT because of a pipeline regression. They still forced staging_status: FAILED -> rollback -> loop, burning developer retries and tokens.

Direction (б) per ADR-001: classify staging checks as REAL (all pipeline checks, fail-closed) vs SANDBOX_INFRA (narrow allowlist {C9a, C9b}, waivable).

New leaf module src/staging_verdict.py (stdlib-only, never-raise): classify_check + compute_staging_verdict fold per-check results into a tolerant-but-fail-closed verdict:

- any REAL failure -> FAILED/exit1 (safety net holds under any flag)
- only C9a/C9b failed & tolerant -> SUCCESS/exit0 with waived list
- only infra failed & strict -> FAILED/exit1
- any internal error -> FAILED/exit1 (never a false green)

staging_check.py now auto-classifies each check (public 3-tuple _items shape kept as an ORCH-048 b6 regression guard), exposes categorized_items(), prints INFRA-WAIVED/VERDICT lines, and exits via the verdict. The new --strict flag forces legacy strictness per run. The kill-switch ORCH_STAGING_INFRA_TOLERANCE_ENABLED (default true) restores legacy strict mode globally when disabled.

launcher gains action_stage_no_changes_note so that "no changes to commit" on action stages is logged as expected, not treated as under-delivery.

Contracts unchanged: STAGE_TRANSITIONS, QG_CHECKS registry, staging_status: / deploy_status: frontmatter, hook exit codes (0/1/2), check_staging_status; no DB migration.

Docs: README, STAGING_CHECK.md, deployer.md, .env.example, CHANGELOG.

Refs: ORCH-061
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
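
The verdict rules described in the message can be sketched roughly as follows. This is a minimal illustration, not the contents of src/staging_verdict.py, which this diff excerpt does not show. The names classify_check and compute_staging_verdict come from the message; the Verdict dataclass, the (check_id, passed) input shape, and the tolerant parameter are assumptions made for the example.

```python
from dataclasses import dataclass, field

REAL = "REAL"
SANDBOX_INFRA = "SANDBOX_INFRA"
# Narrow allowlist named in the commit message; everything else is REAL.
_INFRA_ALLOWLIST = frozenset({"C9a", "C9b"})


def classify_check(check_id):
    """Unknown or new check ids default to REAL, so they fail closed."""
    return SANDBOX_INFRA if check_id in _INFRA_ALLOWLIST else REAL


@dataclass
class Verdict:  # hypothetical result shape, for illustration only
    status: str
    exit_code: int
    waived: list = field(default_factory=list)


def compute_staging_verdict(results, tolerant=True):
    """Fold (check_id, passed) pairs into a verdict. Never raises.

    `tolerant` stands in for the combined effect of the kill-switch and
    the --strict flag (an assumption about how they are wired together).
    """
    try:
        failed = [cid for cid, passed in results if not passed]
        real_failed = [cid for cid in failed if classify_check(cid) == REAL]
        if real_failed:
            # Any REAL failure fails the run regardless of tolerance.
            return Verdict("FAILED", 1)
        if failed and not tolerant:
            # Only infra checks failed, but strict mode is in force.
            return Verdict("FAILED", 1)
        # Either everything passed or only allowlisted checks failed.
        return Verdict("SUCCESS", 0, waived=failed)
    except Exception:
        # Internal errors must never produce a false green.
        return Verdict("FAILED", 1)
```

Evaluating REAL failures before consulting the tolerance flag is what keeps the safety net independent of configuration: no flag value can turn a real pipeline failure into a pass.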
2026-06-07 12:39:00 +00:00
parent 1d1208c136
commit 9070489968
15 changed files with 831 additions and 7 deletions
--- a/src/agents/launcher.py
+++ b/src/agents/launcher.py
@@ -20,6 +20,33 @@ logger = logging.getLogger("orchestrator.launcher")
 # never passed through to the CLI.
 VALID_EFFORTS = frozenset({"low", "medium", "high", "xhigh", "max"})

+# ORCH-061: action stages whose success is an ACTION (restart/retag), not a src
+# edit — so "no changes to commit" is EXPECTED there, not under-delivery (FR-3).
+_ACTION_STAGES = frozenset({"deploy-staging", "deploy"})
+
+
+def action_stage_no_changes_note(stage, repo) -> str | None:
+    """ORCH-061 (FR-3 / FR-7): observability for an empty diff on an action stage.
+
+    The ``deploy-staging`` / ``deploy`` stages are actions (restart / retag), not
+    code edits, so the post-run "no changes to commit" is the NORMAL case there —
+    advancement is decided by the agent exit-code + the staging/deploy gate verdict,
+    NEVER by the presence of a commit (FR-3 / AC-4). This is a PURE decision used
+    only to emit an explicit log line distinguishing an expected action-stage no-op
+    from a code-stage no-op; it has no effect on stage advancement.
+
+    Returns an explicit note string when the empty diff is expected (an action
+    stage of a self-deploy repo), else ``None``. Never raises.
+    """
+    try:
+        if stage in _ACTION_STAGES:
+            from ..self_deploy import self_deploy_applies
+            if self_deploy_applies(repo):
+                return f"{stage}: no code changes (expected on action stage)"
+        return None
+    except Exception:  # noqa: BLE001 - observability only, never raise
+        return None
+

 def _resolve_agent_attr(agent, project_id, project_map_attr, env_attr_prefix,
                        default_attr):
@@ -582,6 +609,22 @@ class AgentLauncher:
                    logger.warning(f"Agent run_id={run_id}: commit failed: {commit_result.stderr}")
            else:
                logger.info(f"Agent run_id={run_id}: no changes to commit")
+                # ORCH-061: on a self-deploy action stage (deploy-staging/deploy)
+                # an empty diff is EXPECTED (action, not a src edit). Emit an
+                # explicit observability line so an operator can tell this apart
+                # from a code-stage no-op. Does NOT affect advancement (decided by
+                # exit-code + gate verdict, never by a commit existing).
+                try:
+                    _t = get_task_by_repo_branch(repo, branch)
+                    _stage = _t["stage"] if _t else None
+                    _note = action_stage_no_changes_note(_stage, repo)
+                    if _note:
+                        logger.info(f"Agent run_id={run_id}: {_note}")
+                except Exception as _e:
+                    logger.debug(
+                        f"Agent run_id={run_id}: action-stage no-changes note "
+                        f"skipped: {_e}"
+                    )
        except Exception as e:
            logger.error(f"Agent run_id={run_id}: post-run git failed: {e}")