fix(staging): tolerate sandbox-infra-only FAILs (C9a/C9b) in deploy-staging verdict

The self-hosting orchestrator looped on deploy-staging -> development because
scripts/staging_check.py exited 1 on ANY failed check. Two infra-only checks
(C9a sandbox branch, C9b analyst-job) fail because the SANDBOX bot accounts are
not members of the sandbox Plane project, NOT because of a pipeline regression,
yet they forced staging_status: FAILED -> rollback -> loop, burning developer
retries and tokens.

Direction (б) per ADR-001: classify staging checks as REAL (all pipeline checks,
fail-closed) vs SANDBOX_INFRA (narrow allowlist {C9a, C9b}, waivable). New leaf
module src/staging_verdict.py (stdlib-only, never raises): classify_check and
compute_staging_verdict fold per-check results into a tolerant-but-fail-closed
verdict:
- any REAL failure -> FAILED / exit 1 (safety net holds under any flag);
- only C9a/C9b failed, tolerant mode -> SUCCESS / exit 0 with a waived list;
- only C9a/C9b failed, strict mode -> FAILED / exit 1;
- any internal error -> FAILED / exit 1 (never a false green).
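
The verdict rules above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/staging_verdict.py: the `Verdict` dataclass, the `(check_id, passed)` input shape, and the `tolerant` parameter name are assumptions made for the example; only the names classify_check, compute_staging_verdict, and the {C9a, C9b} allowlist come from the commit message.

```python
from dataclasses import dataclass, field

# Allowlist taken from the commit message; everything else is treated as REAL.
SANDBOX_INFRA_CHECKS = frozenset({"C9a", "C9b"})


def classify_check(check_id):
    """Return "SANDBOX_INFRA" for allowlisted checks, "REAL" otherwise."""
    return "SANDBOX_INFRA" if check_id in SANDBOX_INFRA_CHECKS else "REAL"


@dataclass
class Verdict:  # hypothetical result type, for illustration only
    status: str
    exit_code: int
    waived: list = field(default_factory=list)


def compute_staging_verdict(results, tolerant=True):
    """Fold (check_id, passed) pairs into a fail-closed verdict.

    The input shape and the `tolerant` flag are assumptions of this sketch.
    """
    try:
        failed = [cid for cid, passed in results if not passed]
        real_failed = [c for c in failed if classify_check(c) == "REAL"]
        infra_failed = [c for c in failed if classify_check(c) == "SANDBOX_INFRA"]
        if real_failed:
            # A REAL failure always fails, regardless of tolerance.
            return Verdict("FAILED", 1)
        if infra_failed and not tolerant:
            # Strict mode: infra-only failures still fail.
            return Verdict("FAILED", 1)
        # Nothing failed, or only waivable infra checks failed in tolerant mode.
        return Verdict("SUCCESS", 0, waived=infra_failed)
    except Exception:
        # Any internal error must never produce a false green.
        return Verdict("FAILED", 1)
```

In this sketch, `compute_staging_verdict([("C1", True), ("C9a", False)])` yields SUCCESS with `waived == ["C9a"]`, while the same input with `tolerant=False` yields FAILED.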

staging_check.py now auto-classifies each check (the public 3-tuple _items shape
is kept as an ORCH-048 b6 regression guard), exposes categorized_items(), prints
INFRA-WAIVED/VERDICT lines, and exits via the verdict. The new --strict flag
forces legacy strictness for a single run. The kill-switch
ORCH_STAGING_INFRA_TOLERANCE_ENABLED (default true) restores legacy strict mode
globally when disabled. The launcher gains action_stage_no_changes_note so that
"no changes to commit" on action stages is logged as expected, not treated as
under-delivery.
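
One way the per-run --strict flag and the global kill-switch might combine is sketched below. The function name `tolerance_enabled`, the accepted truthy spellings, and the use of argparse are assumptions for illustration; the commit message only states that the flag exists, that the variable defaults to true, and that either one can force strict mode.

```python
import argparse
import os

ENV_FLAG = "ORCH_STAGING_INFRA_TOLERANCE_ENABLED"  # name from the commit message


def tolerance_enabled(argv, environ=None):
    """Return True only if tolerance is on globally AND not overridden per run.

    Hypothetical helper: the parsing details here are assumptions.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--strict", action="store_true")
    # parse_known_args ignores any other options the real script may define.
    args, _unknown = parser.parse_known_args(argv)

    # Default is "true" when the variable is unset (per the commit message).
    raw = environ.get(ENV_FLAG, "true").strip().lower()
    env_on = raw in {"1", "true", "yes", "on"}  # assumed truthy spellings

    return env_on and not args.strict
```

Under these assumptions, strictness wins whenever either switch asks for it: `tolerance_enabled(["--strict"], {})` is False, and so is `tolerance_enabled([], {ENV_FLAG: "false"})`.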

Contracts unchanged: STAGE_TRANSITIONS, QG_CHECKS registry, staging_status:/
deploy_status: frontmatter, hook exit-code (0/1/2), check_staging_status; no DB
migration. Docs: README, STAGING_CHECK.md, deployer.md, .env.example, CHANGELOG.

Refs: ORCH-061
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -20,6 +20,33 @@ logger = logging.getLogger("orchestrator.launcher")
 # never passed through to the CLI.
 VALID_EFFORTS = frozenset({"low", "medium", "high", "xhigh", "max"})
 
+# ORCH-061: action stages whose success is an ACTION (restart/retag), not a src
+# edit — so "no changes to commit" is EXPECTED there, not under-delivery (FR-3).
+_ACTION_STAGES = frozenset({"deploy-staging", "deploy"})
+
+
+def action_stage_no_changes_note(stage, repo) -> str | None:
+    """ORCH-061 (FR-3 / FR-7): observability for an empty diff on an action stage.
+
+    The ``deploy-staging`` / ``deploy`` stages are actions (restart / retag), not
+    code edits, so the post-run "no changes to commit" is the NORMAL case there —
+    advancement is decided by the agent exit-code + the staging/deploy gate verdict,
+    NEVER by the presence of a commit (FR-3 / AC-4). This is a PURE decision used
+    only to emit an explicit log line distinguishing an expected action-stage no-op
+    from a code-stage no-op; it has no effect on stage advancement.
+
+    Returns an explicit note string when the empty diff is expected (an action
+    stage of a self-deploy repo), else ``None``. Never raises.
+    """
+    try:
+        if stage in _ACTION_STAGES:
+            from ..self_deploy import self_deploy_applies
+            if self_deploy_applies(repo):
+                return f"{stage}: no code changes (expected on action stage)"
+        return None
+    except Exception:  # noqa: BLE001 - observability only, never raise
+        return None
+
+
 
 def _resolve_agent_attr(agent, project_id, project_map_attr, env_attr_prefix,
                         default_attr):
@@ -582,6 +609,22 @@ class AgentLauncher:
                         logger.warning(f"Agent run_id={run_id}: commit failed: {commit_result.stderr}")
                 else:
                     logger.info(f"Agent run_id={run_id}: no changes to commit")
+                    # ORCH-061: on a self-deploy action stage (deploy-staging/deploy)
+                    # an empty diff is EXPECTED (action, not a src edit). Emit an
+                    # explicit observability line so an operator can tell this apart
+                    # from a code-stage no-op. Does NOT affect advancement (decided by
+                    # exit-code + gate verdict, never by a commit existing).
+                    try:
+                        _t = get_task_by_repo_branch(repo, branch)
+                        _stage = _t["stage"] if _t else None
+                        _note = action_stage_no_changes_note(_stage, repo)
+                        if _note:
+                            logger.info(f"Agent run_id={run_id}: {_note}")
+                    except Exception as _e:
+                        logger.debug(
+                            f"Agent run_id={run_id}: action-stage no-changes note "
+                            f"skipped: {_e}"
+                        )
             except Exception as e:
                 logger.error(f"Agent run_id={run_id}: post-run git failed: {e}")
 