fix(ORCH-058): parametrize staging_check in --build-staging + explicit staging target

Round-3 review follow-up on c53d625 (P1/P2):

- P1: --build-staging now runs staging_check via parametrized
  STAGING_CONTAINER / STAGING_CHECK_PATH / STAGING_CHECK_MODE (default
  orchestrator-staging / bind-mount path / stub) instead of hardcoding
  $TARGET_SERVICE + the script path. docker exec runs INSIDE the staging
  container (ORCH-048 canonical: B6 registry isolation), after health,
  before exit 0. Fail-closed: any non-zero -> exit 1. STAGING only (8501).
- P2a: rebuild_staging_image now passes the STAGING target EXPLICITLY
  (TARGET_SERVICE/TARGET_PORT/COMPOSE_PROFILE/STAGING_CONTAINER) so the
  self-rebuild can never drift onto prod 8500 if hook defaults change (AC-9).
- P2b: TC-09 caller<->hook contract tests assert the ssh command carries
  GIT_SHA + BUILD_CONTEXT + the staging target and never the prod 8500 one;
  no-ssh-host fails closed.
- P3: consolidated the three duplicate README footers into one.
- Docs (golden source): DEPLOY_HOOK.md step 4 + env rows, README footer,
  CHANGELOG, Dockerfile ARG GIT_SHA="" comment, .env.example freshness block.

Validates exactly the artefact later BUILD-ONCE retagged to prod (AC-4,
ADR-001 step 3). 632 tests pass, ruff clean, bash -n OK.

Refs: ORCH-058
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
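
The P2a/P2b contract described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's code: `build_staging_env` and `assert_staging_only` are hypothetical helper names, the staging constants are copied from the diff in this commit, and the SHA, path, and image name in the usage lines are invented sample values.

```python
import shlex

# Values copied from the diff in this commit; the helpers below are hypothetical.
STAGING_SERVICE = "orchestrator-staging"
STAGING_PORT = 8501
STAGING_COMPOSE_PROFILE = "staging"
PROD_PORT = 8500


def build_staging_env(sha: str, build_context: str, target_image: str) -> str:
    """Return shell-quoted KEY=value assignments naming the staging target explicitly."""
    pairs = {
        "GIT_SHA": sha,
        "BUILD_CONTEXT": build_context,
        "TARGET_IMAGE": target_image,
        "TARGET_SERVICE": STAGING_SERVICE,
        "TARGET_PORT": str(STAGING_PORT),
        "COMPOSE_PROFILE": STAGING_COMPOSE_PROFILE,
        "STAGING_CONTAINER": STAGING_SERVICE,
    }
    return " ".join(f"{key}={shlex.quote(value)}" for key, value in pairs.items())


def assert_staging_only(env: str) -> None:
    """Raise ValueError unless the assignments carry the staging target and never prod."""
    parsed = dict(token.split("=", 1) for token in shlex.split(env))
    for key in ("GIT_SHA", "BUILD_CONTEXT", "TARGET_SERVICE", "TARGET_PORT"):
        if not parsed.get(key):
            raise ValueError(f"missing {key}")
    if parsed["TARGET_PORT"] != str(STAGING_PORT):
        raise ValueError("TARGET_PORT is not the staging port")
    if str(PROD_PORT) in parsed.values():
        raise ValueError("prod port leaked into the staging command")


# Sample values are made up for the example.
env = build_staging_env("abc123", "/srv/work tree/repo", "example/staging:latest")
assert_staging_only(env)
```

Parsing the assignments back with `shlex.split` checks the command the way a shell would read it, so a path containing spaces cannot hide or split a key.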
@@ -14,9 +14,10 @@ self-hosting:
 * **A — liveness:** :func:`check_staging_image_fresh` is a QG sub-check on the
 ``deploy-staging -> deploy`` edge (composed by ``stage_engine`` AFTER the
 merge-gate, BEFORE Phase A). It rebuilds ``orchestrator-orchestrator-staging``
-from the VALIDATED commit (worktree HEAD after the merge-gate rebase) and
-recreates the 8501 container, so we validate and promote ONE artefact. FAIL ->
-rollback to ``development`` (mirrors the merge-gate).
+from the VALIDATED commit (worktree HEAD after the merge-gate rebase), recreates
+the 8501 container, and runs ``staging_check.py --mode stub`` against that fresh
+8501 (ADR-001 step 3), so we validate exactly the ONE artefact later retagged to
+prod (AC-4). FAIL -> rollback to ``development`` (mirrors the merge-gate).
 * **B — safety:** :func:`expected_revision` feeds the validated SHA to
 ``self_deploy.build_deploy_command`` as ``EXPECTED_REVISION``; the host hook
 fail-closes (``exit 1``) before ``docker tag`` if the SOURCE_IMAGE revision
@@ -48,10 +49,18 @@ REVISION_LABEL = "org.opencontainers.image.revision"
 # Bounded timeouts so a hung git/docker/ssh never wedges the monitor-thread.
 _GIT_TIMEOUT = 30
 _INSPECT_TIMEOUT = 30
-# The remote rebuild (docker build + compose recreate + health) is the slow path;
-# keep it generous but bounded (mirrors the merge-gate re-test budget order).
+# The remote rebuild (docker build + compose recreate + health + staging_check) is
+# the slow path; keep it generous but bounded (mirrors the merge-gate re-test order).
 _REBUILD_TIMEOUT = 1200
+
+# Explicit STAGING target for the --build-staging rebuild (Strategy A). These mirror
+# the hook's staging-safe defaults but are passed EXPLICITLY so a future change to the
+# hook defaults can never silently retarget the self-rebuild at prod (8500) — the whole
+# path builds/recreates STAGING ONLY (AC-9, review P2). Never the prod 8500 target.
+_STAGING_SERVICE = "orchestrator-staging"
+_STAGING_PORT = 8501
+_STAGING_COMPOSE_PROFILE = "staging"


 # ---------------------------------------------------------------------------
 # Conditionality (mirrors self_deploy_applies / _merge_gate_applies)
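
The bounded-timeout idea in the comments above can be sketched with the standard library. This is a minimal illustration, assuming the real module shells out through something like `subprocess.run`; `run_bounded` is a hypothetical name, and the messages it returns are invented for the example.

```python
import subprocess
import sys


def run_bounded(cmd: list[str], timeout: float) -> tuple[bool, str]:
    """Run cmd with a hard timeout and report (ok, detail) without ever raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging.
        return False, f"timed out after {timeout}s: {cmd[0]}"
    except OSError as exc:
        # For example, the binary is missing on this host.
        return False, f"could not start {cmd[0]}: {exc}"
    detail = proc.stdout.strip() or proc.stderr.strip()
    return proc.returncode == 0, detail


if __name__ == "__main__":
    # A child that sleeps longer than its budget is reported as a failure, not a hang.
    print(run_bounded([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5))
```

Returning a `(bool, str)` pair mirrors the `tuple[bool, str]` shape that `rebuild_staging_image` uses in this diff, so a timeout becomes an ordinary failed check.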
@@ -234,9 +243,12 @@ def rebuild_staging_image(repo: str, branch: str, sha: str) -> tuple[bool, str]:
 The hook (``orchestrator-deploy-hook.sh --build-staging``) runs, on the host:
 ``docker build --build-arg GIT_SHA=<sha> -t <staging-image> <host-worktree>``
 -> ``docker compose --profile staging up -d --no-build orchestrator-staging``
--> health-check 8501. Same exit-code contract (0 = ok). This trades prod for
-staging ONLY (8501), NEVER prod (8500) (AC-9): all build/recreate targets are
-the staging service.
+-> health-check 8501
+-> ``staging_check.py --mode stub`` against the FRESH 8501 (ADR-001 step 3,
+AC-4: validate exactly the artefact later retagged to prod).
+Same exit-code contract (0 = ok). This trades prod for staging ONLY (8501),
+NEVER prod (8500) (AC-9): all build/recreate/validate targets are the staging
+service — passed EXPLICITLY below, not left to hook defaults (review P2).

 Synchronous ssh is fine here (unlike Phase B): recreating staging does not kill
 the prod worker running this code. Bounded by ``_REBUILD_TIMEOUT``.
@@ -248,17 +260,18 @@ def rebuild_staging_image(repo: str, branch: str, sha: str) -> tuple[bool, str]:
     if not target:
         return False, "no ssh host configured for staging rebuild"
     host_ctx = _host_worktree_path(repo, branch)
-    # We pass ONLY GIT_SHA (validated commit -> revision label, the shared anchor
-    # with Strategy B), BUILD_CONTEXT (the validated worktree to build FROM) and
-    # TARGET_IMAGE (the staging image name to retag in prod later). COMPOSE_PROFILE
-    # / TARGET_SERVICE / TARGET_PORT are deliberately omitted so the hook keeps its
-    # built-in STAGING defaults (profile=staging, orchestrator-staging, 8501): this
-    # rebuild/recreate must touch STAGING ONLY (8501), NEVER prod (8500) (AC-9), and
-    # the prod defaults are never reachable on this path.
+    # Pass the STAGING target explicitly (service/port/profile/container), so the
+    # rebuild + recreate + staging_check can never drift onto the prod 8500 service
+    # even if the hook's defaults change (AC-9, review P2). STAGING_CONTAINER is the
+    # container staging_check is docker-exec'd inside (step 3b).
     env_assignments = (
         f"GIT_SHA={shlex.quote(sha)} "
         f"BUILD_CONTEXT={shlex.quote(host_ctx)} "
-        f"TARGET_IMAGE={shlex.quote(settings.deploy_prod_source_image)}"
+        f"TARGET_IMAGE={shlex.quote(settings.deploy_prod_source_image)} "
+        f"TARGET_SERVICE={shlex.quote(_STAGING_SERVICE)} "
+        f"TARGET_PORT={shlex.quote(str(_STAGING_PORT))} "
+        f"COMPOSE_PROFILE={shlex.quote(_STAGING_COMPOSE_PROFILE)} "
+        f"STAGING_CONTAINER={shlex.quote(_STAGING_SERVICE)}"
     )
     inner = (
         f"cd {shlex.quote(settings.deploy_host_repo_path)} && "
@@ -290,9 +303,10 @@ def check_staging_image_fresh(repo: str, work_item_id: str, branch: str) -> tupl
 a repo the feature is not real for -> ``(True, "image-freshness N/A for <repo>")``.
 2. Anchor: ``sha = validated_revision(repo, branch)``. Empty -> fail-closed
 ``(False, ...)`` (AC-3): we never rebuild/promote without a known commit.
-3. Rebuild the staging image from that commit + recreate 8501 (host hook).
-Healthy -> ``(True, ...)``: the artefact we just validated is the exact one
-that will be retagged to prod (AC-4, loop closed). FAIL -> ``(False, ...)``
+3. Rebuild the staging image from that commit, recreate 8501, and run
+``staging_check.py --mode stub`` against the fresh 8501 (host hook). PASS ->
+``(True, ...)``: the artefact we just validated (build + e2e) is the exact
+one that will be retagged to prod (AC-4, loop closed). FAIL -> ``(False, ...)``
 -> the engine rolls back to ``development`` (AC-2).

 Never-raise (AC-8): any internal error -> ``(False, "<reason>")``; an exception
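
The three-step flow in the docstring above (N/A short-circuit, fail-closed anchor, rebuild plus check, never raising) can be sketched as below. This is a hedged illustration only: `check_fresh_sketch` and its callable parameters are hypothetical stand-ins for the real helpers, and every message except the "image-freshness N/A" one is invented.

```python
from typing import Callable, Tuple


def check_fresh_sketch(
    applies: bool,
    validated_revision: Callable[[], str],
    rebuild: Callable[[str], Tuple[bool, str]],
    repo: str = "<repo>",
) -> Tuple[bool, str]:
    """Return (ok, reason); any internal error is turned into (False, reason)."""
    try:
        # Step 1: the feature does not apply to this repo, so the check passes.
        if not applies:
            return True, f"image-freshness N/A for {repo}"
        # Step 2: no known commit means fail-closed; never rebuild or promote blind.
        sha = validated_revision()
        if not sha:
            return False, "no validated revision; refusing to rebuild"
        # Step 3: rebuild, recreate, and run the staging check via the host hook.
        ok, detail = rebuild(sha)
        return ok, detail
    except Exception as exc:  # never-raise: the caller only ever sees a tuple
        return False, f"image-freshness check errored: {exc}"
```

A `False` result is what the diff describes as triggering the rollback to ``development``; the sketch leaves that decision to the caller.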