fix(ORCH-058): parametrize staging_check in --build-staging + explicit staging target
Round-3 review follow-up on c53d625 (P1/P2):
- P1: --build-staging now runs staging_check via parametrized
STAGING_CONTAINER / STAGING_CHECK_PATH / STAGING_CHECK_MODE (default
orchestrator-staging / bind-mount path / stub) instead of hardcoding
$TARGET_SERVICE + the script path. docker exec runs INSIDE the staging
container (ORCH-048 canonical: B6 registry isolation), after health,
before exit 0. Fail-closed: any non-zero -> exit 1. STAGING only (8501).
- P2a: rebuild_staging_image now passes the STAGING target EXPLICITLY
(TARGET_SERVICE/TARGET_PORT/COMPOSE_PROFILE/STAGING_CONTAINER) so the
self-rebuild can never drift onto prod 8500 if hook defaults change (AC-9).
- P2b: TC-09 caller<->hook contract tests assert the ssh command carries
GIT_SHA + BUILD_CONTEXT + the staging target and never the prod 8500 one;
no-ssh-host fails closed.
- P3: consolidated the three duplicate README footers into one.
- Docs (golden source): DEPLOY_HOOK.md step 4 + env rows, README footer,
CHANGELOG, Dockerfile ARG GIT_SHA="" comment, .env.example freshness block.
Validates exactly the artefact later BUILD-ONCE retagged to prod (AC-4,
ADR-001 step 3). 632 tests pass, ruff clean, bash -n OK.
Refs: ORCH-058
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
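The P1 parametrization described above can be sketched in Python as follows. This is a minimal illustration, not the hook itself: the function names `staging_check_argv` and `exit_code_for` are hypothetical, and only the three variable names, their defaults, and the fail-closed rule come from the commit message.

```python
import os
import shlex

# Defaults as stated in the commit message; the function names are hypothetical.
DEFAULT_CHECK_PATH = "/repos/orchestrator/scripts/staging_check.py"
DEFAULT_CHECK_MODE = "stub"


def staging_check_argv(env, target_service, target_port):
    """Resolve the three STAGING_* knobs and build the docker exec argv."""
    # An empty or unset variable falls back to its default, like ${VAR:-default}.
    container = env.get("STAGING_CONTAINER") or target_service
    path = env.get("STAGING_CHECK_PATH") or DEFAULT_CHECK_PATH
    mode = env.get("STAGING_CHECK_MODE") or DEFAULT_CHECK_MODE
    return [
        "docker", "exec", container, "python3", path,
        "--base-url", f"http://localhost:{target_port}",
        "--mode", mode,
    ]


def exit_code_for(check_returncode):
    """Fail-closed mapping: only 0 passes, any non-zero becomes 1."""
    return 0 if check_returncode == 0 else 1


if __name__ == "__main__":
    argv = staging_check_argv(dict(os.environ), "orchestrator-staging", 8501)
    print(shlex.join(argv))
```

With an empty environment the sketch resolves to the staging container, the bind-mount path, and `stub` mode, which matches the defaults listed in the message.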
26
.env.example
@@ -72,6 +72,19 @@ ORCH_DEPLOY_PROD_TARGET_IMAGE=orchestrator-orchestrator
|
||||
ORCH_DEPLOY_PROD_COMPOSE_PROFILE=
|
||||
ORCH_DEPLOY_PROD_PREV_IMAGE_FILE=.deploy-prev-image-prod
|
||||
|
||||
# ORCH-058: staging-image provenance before the BUILD-ONCE prod retag (INV-FRESH).
|
||||
# Guarantees the staging image promoted to prod is the EXACT artefact rebuilt from the
|
||||
# validated commit — two layers, self-hosting only:
|
||||
# A (liveness): QG sub-check `check_staging_image_fresh` on the deploy-staging->deploy
|
||||
# edge rebuilds orchestrator-orchestrator-staging from the validated commit + recreates
|
||||
# 8501; FAIL -> rollback to development. (builds/recreate STAGING only, never prod.)
|
||||
# B (safety): the Dockerfile stamps `org.opencontainers.image.revision`; the prod hook
|
||||
# fail-closes (exit 1) before `docker tag` if SOURCE_IMAGE's label != EXPECTED_REVISION.
|
||||
# ENABLED -> single kill-switch for A+B as a WHOLE (never "B without A"); false -> legacy.
|
||||
# REPOS -> CSV of repos where the gate is REAL; empty -> only self-hosting (orchestrator).
|
||||
ORCH_IMAGE_FRESHNESS_ENABLED=true
|
||||
ORCH_IMAGE_FRESHNESS_REPOS=
|
||||
|
||||
# ORCH-053: stuck-task reconciler (sweeper for lost webhooks). A background daemon
|
||||
# replays a missed stage transition through the SAME gates/handlers a webhook would,
|
||||
# fixing tasks that got stuck on a dropped event (502 on rebuild, no Plane/Gitea
|
||||
@@ -88,16 +101,3 @@ ORCH_RECONCILE_INTERVAL_S=120
|
||||
ORCH_RECONCILE_GRACE_DEFAULT_S=600
|
||||
ORCH_RECONCILE_GRACE_OVERRIDES_JSON=
|
||||
ORCH_RECONCILE_NOTIFY_UNBLOCK=true
|
||||
|
||||
# ORCH-058: staging-image provenance before the BUILD-ONCE retag to prod. Closes the
|
||||
# "silent stale promote" bug (LESSONS_ORCH-036 §4): retag promoted the staging image
|
||||
# to prod without proving it was built from the validated commit. Two layers (A+B),
|
||||
# self-hosting only, gated as a WHOLE by a single switch (no "B without A" deadlock):
|
||||
# A (liveness) -> QG sub-check check_staging_image_fresh rebuilds the staging image
|
||||
# from the validated commit on the deploy-staging->deploy edge (after merge-gate).
|
||||
# B (safety) -> deploy-hook fail-closes (exit 1) before `docker tag` if SOURCE_IMAGE
|
||||
# OCI revision label != EXPECTED_REVISION (the validated SHA).
|
||||
# ENABLED -> single kill-switch for the WHOLE feature; false -> legacy build-once.
|
||||
# REPOS -> CSV of repos where the feature is REAL; empty -> only self-hosting.
|
||||
ORCH_IMAGE_FRESHNESS_ENABLED=true
|
||||
ORCH_IMAGE_FRESHNESS_REPOS=
|
||||
|
||||
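The ENABLED / REPOS semantics in the comment block above can be read as a small predicate. The sketch below is an assumption about how the two settings combine: `freshness_applies` and its `self_repo` parameter are hypothetical names, and the case of a non-empty CSV that omits the self-hosting repo is not specified in the source, so the sketch simply treats the CSV as the full list.

```python
def freshness_applies(repo, enabled, repos_csv, self_repo="orchestrator"):
    """Return True when the image-freshness gate is real for `repo`."""
    # ENABLED=false is the single kill-switch: legacy behaviour for everyone.
    if not enabled:
        return False
    repos = [name.strip() for name in repos_csv.split(",") if name.strip()]
    # Empty REPOS: only the self-hosting repo is gated.
    if not repos:
        return repo == self_repo
    # Assumption: a non-empty CSV is taken as the complete list of gated repos.
    return repo in repos


if __name__ == "__main__":
    print(freshness_applies("orchestrator", True, ""))
    print(freshness_applies("other", True, "other, third"))
```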
File diff suppressed because one or more lines are too long
12
Dockerfile
@@ -1,9 +1,13 @@
|
||||
FROM python:3.12-slim
|
||||
WORKDIR /app
|
||||
# ORCH-58: stamp the validated git commit into the OCI revision label so the
|
||||
# deploy hook provenance guard can fail-closed on it before the prod retag.
|
||||
ARG GIT_SHA
|
||||
# ORCH-058 (Strategy B): stamp the image with the git commit it was built from so
|
||||
# the deploy hook can fail-close if a stale staging image would be promoted to prod
|
||||
# (INV-FRESH). Passed at build time via `--build-arg GIT_SHA=<sha>` (the staging
|
||||
# rebuild in check_staging_image_fresh / the --build-staging hook mode supplies it).
|
||||
# Without the build-arg the label is empty -> the hook treats it as a mismatch
|
||||
# (fail-closed). The OCI-standard key is read by `docker image inspect`.
|
||||
ARG GIT_SHA=""
|
||||
LABEL org.opencontainers.image.revision=$GIT_SHA
|
||||
WORKDIR /app
|
||||
RUN apt-get update -qq && apt-get install -y -qq openssh-client git && rm -rf /var/lib/apt/lists/*
|
||||
# git operations run as root over bind-mounted /repos (may be owned by host uid) -> trust it.
|
||||
RUN git config --system --add safe.directory '*'
|
||||
|
||||
@@ -194,6 +194,4 @@ never-raise per unit of work; silence on synchron
|
||||
DB schema, data flows, resilience layer, Dockerfile details: [internals.md](internals.md).
|
||||
|
||||
---
|
||||
*Current as of 2026-06-06. Update when src/stages.py, src/qg/checks.py, or src/main.py change. ORCH-043: merge-gate, design (see adr-0006), implementation in branch feature/ORCH-043. ORCH-036: executable self-deploy of the `deploy` stage, design (see adr-0007), implementation in branch feature/ORCH-036.*
|
||||
*Current as of 2026-06-06. Update when src/stages.py, src/qg/checks.py, or src/main.py change. ORCH-043: merge-gate, design (see adr-0006), implementation in branch feature/ORCH-043. ORCH-053: reconciler, implemented (see adr-0007, src/reconciler.py).*
|
||||
*ORCH-058: staging-image provenance before the BUILD-ONCE retag (check_staging_image_fresh + hook guard), implemented in branch feature/ORCH-058 (see adr-0008, src/image_freshness.py). Also update when src/self_deploy.py, scripts/orchestrator-deploy-hook.sh, or Dockerfile change.*
|
||||
*Current as of 2026-06-07. Update when src/stages.py, src/qg/checks.py, or src/main.py change. Follow-up status: ORCH-036 (executable self-deploy of `deploy`, adr-0007): implemented; ORCH-043 (merge-gate, adr-0006): design, branch feature/ORCH-043; ORCH-053 (reconciler, adr-0007, src/reconciler.py): implemented; ORCH-058 (staging-image provenance: check_staging_image_fresh + staging_check of the fresh image + hook guard, adr-0008): implemented in branch feature/ORCH-058 (also update when src/image_freshness.py, scripts/orchestrator-deploy-hook.sh, or Dockerfile change).*
|
||||
|
||||
@@ -24,9 +24,10 @@
|
||||
|
||||
1. `docker build --build-arg GIT_SHA=$GIT_SHA -t $TARGET_IMAGE $BUILD_CONTEXT`: rebuild from the host worktree of the validated commit; `GIT_SHA` is stamped into the OCI label `org.opencontainers.image.revision`.
|
||||
2. `docker compose [--profile $COMPOSE_PROFILE] up -d --no-build $TARGET_SERVICE`: recreate staging on the fresh image.
|
||||
3. Health loop 10×6s. Healthy → `exit 0`; build/health failure → `exit 1`.
|
||||
3. Health loop 10×6s. Build/health failure → `exit 1`.
|
||||
4. **`staging_check` against the FRESH image** (Strategy A, step 3; ADR-001, AC-4): after health, the hook runs `docker exec $STAGING_CONTAINER python3 $STAGING_CHECK_PATH --base-url http://localhost:$TARGET_PORT --mode $STAGING_CHECK_MODE` (default `--mode stub`, no LLM spend). Running **inside** the staging container is canonical (ORCH-048): the suite reads the registry from the container's own env, and `staging_check.py` comes from the bind-mount (`/repos/orchestrator/scripts/...`, not from the image). This is exactly the artefact that build-once later retags to prod, so we validate what we promote (AC-4). PASS → `exit 0`; any non-zero (check FAIL or the `ORCH_STAGING≠true` safety abort) → `exit 1`.
|
||||
|
||||
Run by the orchestrator on the `deploy-staging → deploy` edge (QG sub-check `check_staging_image_fresh`, see `INFRA.md`). Same exit-code contract (0 = healthy).
|
||||
Run by the orchestrator on the `deploy-staging → deploy` edge (QG sub-check `check_staging_image_fresh` → `rebuild_staging_image` passes an explicit staging target, see `INFRA.md`). Same exit-code contract (0 = healthy **and** staging_check PASS).
|
||||
|
||||
### `--rollback` mode
|
||||
|
||||
@@ -45,6 +46,9 @@
|
||||
| `EXPECTED_REVISION` | _(unset)_ | Build-once (ORCH-058, Strategy B): expected git SHA of `$SOURCE_IMAGE` (label `org.opencontainers.image.revision`). Set → fail-closed guard before `docker tag`. Unset → check skipped. |
|
||||
| `GIT_SHA` | _(unset)_ | `--build-staging` (ORCH-058, Strategy A): commit stamped into the OCI `revision` label when the staging image is rebuilt. |
|
||||
| `BUILD_CONTEXT` | `$REPO` | `--build-staging`: docker build context (host worktree of the validated commit). |
|
||||
| `STAGING_CONTAINER` | `$TARGET_SERVICE` (`orchestrator-staging`) | `--build-staging` (ORCH-058): container inside which `docker exec` runs `staging_check`. |
|
||||
| `STAGING_CHECK_PATH` | `/repos/orchestrator/scripts/staging_check.py` | `--build-staging` (ORCH-058): path to `staging_check.py` inside the container (bind-mount, not the image). |
|
||||
| `STAGING_CHECK_MODE` | `stub` | `--build-staging` (ORCH-058): `staging_check` mode (`stub`: fast, no LLM; `full-real`: waits for the analyst). |
|
||||
| `LOG` | `/var/log/orchestrator/deploy-hook.log` | Log file (fallback: `$REPO/deploy-hook.log`) |
|
||||
|
||||
> ⚠️ **The default is always STAGING**. Prod is activated only by an explicit env override.
|
||||
|
||||
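The four `--build-staging` steps listed above share one exit-code contract: run in order, stop at the first failure, and report 0 only when every step passed. A minimal Python sketch of that sequencing follows. The runner `run_build_staging` and the fake steps are hypothetical stand-ins; the real hook is a bash script that calls docker directly.

```python
def run_build_staging(steps):
    """Run (name, step) pairs in order and stop at the first failure.

    Each step is a zero-argument callable returning an exit code.
    """
    for name, step in steps:
        rc = step()
        if rc != 0:
            # Fail-closed: any non-zero code collapses to exit 1.
            return 1, f"{name} failed (rc={rc})"
    return 0, "healthy and staging_check PASS"


if __name__ == "__main__":
    calls = []

    def fake(name, rc):
        def step():
            calls.append(name)
            return rc
        return step

    code, reason = run_build_staging([
        ("docker build", fake("build", 0)),
        ("compose recreate", fake("recreate", 0)),
        ("health", fake("health", 1)),
        ("staging_check", fake("staging_check", 0)),
    ])
    print(code, reason, calls)
```

In the demo run the health step fails, so `staging_check` is never invoked and the result is exit code 1.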
@@ -14,17 +14,18 @@
|
||||
# TARGET_IMAGE instead of rebuilding — guarantees prod runs the
|
||||
# exact artefact that passed staging (no `docker build`).
|
||||
# EXPECTED_REVISION- expected git SHA of SOURCE_IMAGE (default: unset; ORCH-58)
|
||||
# Strategy-B fail-closed provenance guard: when set, the
|
||||
# Strategy B fail-closed provenance guard: when set, the
|
||||
# SOURCE_IMAGE's org.opencontainers.image.revision label MUST
|
||||
# equal this value before the BUILD-ONCE retag, else exit 1
|
||||
# (a stale image is never promoted). Unset -> no check (legacy).
|
||||
# GIT_SHA - --build-staging build-arg (default: unset; ORCH-58)
|
||||
# Commit stamped into the rebuilt staging image's revision
|
||||
# label. Supplied by the caller (validated commit) — NOT
|
||||
# recomputed from the host clone's HEAD.
|
||||
# BUILD_CONTEXT - --build-staging build context (default: $REPO; ORCH-58)
|
||||
# Host worktree of the validated commit; the staging image is
|
||||
# rebuilt FROM this tree (not the prod clone on main).
|
||||
# GIT_SHA - build-arg for --build-staging (default: unset; ORCH-58)
|
||||
# BUILD_CONTEXT - docker build context dir (default: $REPO; --build-staging)
|
||||
# STAGING_CONTAINER- container to docker-exec staging_check in (--build-staging;
|
||||
# default: $TARGET_SERVICE → orchestrator-staging; ORCH-58)
|
||||
# STAGING_CHECK_PATH- staging_check.py path inside that container (--build-staging;
|
||||
# default: /repos/orchestrator/scripts/staging_check.py; ORCH-58)
|
||||
# STAGING_CHECK_MODE- staging_check mode stub|full-real (--build-staging;
|
||||
# default: stub — fast, no LLM spend; ORCH-58)
|
||||
# LOG - log file path (default: /var/log/orchestrator/deploy-hook.log)
|
||||
#
|
||||
# Usage:
|
||||
@@ -45,11 +46,11 @@ PREV_IMAGE_FILE="${PREV_IMAGE_FILE:-$REPO/.deploy-prev-image-staging}"
|
||||
# Build-once (ORCH-36): optional prevalidated source image to retag onto
|
||||
# TARGET_IMAGE. Unset -> backward-compatible (no retag), exit-code contract intact.
|
||||
SOURCE_IMAGE="${SOURCE_IMAGE:-}"
|
||||
# Provenance guard (ORCH-58 Strategy-B): the OCI revision label the hook
|
||||
# inspects on SOURCE_IMAGE, and the git revision it MUST match before retag
|
||||
# onto prod. EXPECTED_REVISION unset -> backward-compatible (guard skipped).
|
||||
REVISION_LABEL="org.opencontainers.image.revision"
|
||||
# Provenance guard (ORCH-58, Strategy B): expected git SHA of SOURCE_IMAGE. Unset
|
||||
# -> backward-compatible (no provenance check), exit-code contract intact.
|
||||
EXPECTED_REVISION="${EXPECTED_REVISION:-}"
|
||||
# The OCI-standard label key the Dockerfile stamps with the build commit.
|
||||
REVISION_LABEL="org.opencontainers.image.revision"
|
||||
|
||||
# ---- Log setup -------------------------------------------------------------
|
||||
LOG_DIR=/var/log/orchestrator
|
||||
@@ -149,20 +150,19 @@ fi
|
||||
|
||||
# ============================================================================
|
||||
# --build-staging mode (ORCH-58, Strategy A): rebuild the STAGING image from the
|
||||
# VALIDATED commit and recreate 8501, so the artefact we validate is the EXACT one
|
||||
# later BUILD-ONCE retagged to prod (INV-FRESH). Builds/recreates STAGING ONLY
|
||||
# (8501) — never prod (8500). Same exit-code contract (0 = healthy, !=0 = failed).
|
||||
#
|
||||
# Uses the caller-supplied GIT_SHA + BUILD_CONTEXT (the validated worktree) — it
|
||||
# must NOT recompute HEAD from $REPO (the prod clone on `main`): on the
|
||||
# deploy-staging -> deploy edge the PR is not yet merged, so `main` HEAD != the
|
||||
# validated SHA, which would stamp the wrong revision label and deadlock the
|
||||
# Strategy-B guard on every valid self-deploy.
|
||||
# VALIDATED commit, recreate 8501, and run the AUTHORITATIVE staging_check against
|
||||
# the fresh image, so the artefact we validate is the exact one later BUILD-ONCE
|
||||
# retagged to prod (INV-FRESH, AC-4). Builds/recreates STAGING ONLY (8501) — never
|
||||
# prod (8500). Same exit-code contract (0 = healthy + staging_check PASS).
|
||||
# GIT_SHA - commit stamped into the image revision label (build-arg).
|
||||
# BUILD_CONTEXT - docker build context (host worktree of the validated commit).
|
||||
# Steps: (1) docker build → (2) recreate 8501 → (3a) health-check →
|
||||
# (3b) staging_check.py --mode stub against the fresh 8501 (ADR-001 step 3).
|
||||
# ============================================================================
|
||||
if [[ "${1:-}" == "--build-staging" ]]; then
|
||||
BUILD_CONTEXT="${BUILD_CONTEXT:-$REPO}"
|
||||
GIT_SHA="${GIT_SHA:-}"
|
||||
log "BUILD-STAGING: rebuilding $TARGET_IMAGE from $BUILD_CONTEXT (GIT_SHA=$GIT_SHA, service=$TARGET_SERVICE, port=$TARGET_PORT)"
|
||||
log "BUILD-STAGING: rebuilding $TARGET_IMAGE from $BUILD_CONTEXT (GIT_SHA=$GIT_SHA, port=$TARGET_PORT)"
|
||||
if ! docker build --build-arg GIT_SHA="$GIT_SHA" -t "$TARGET_IMAGE" "$BUILD_CONTEXT" >> "$LOG" 2>&1; then
|
||||
log "BUILD-STAGING: docker build failed - aborting (exit 1)"
|
||||
exit 1
|
||||
@@ -174,24 +174,28 @@ if [[ "${1:-}" == "--build-staging" ]]; then
|
||||
docker compose up -d --no-build "$TARGET_SERVICE" >> "$LOG" 2>&1
|
||||
fi
|
||||
log "BUILD-STAGING: running health-check on port $TARGET_PORT (10x6s)"
|
||||
if health_check 10 6 "build-staging-health"; then
|
||||
log "BUILD-STAGING: $TARGET_SERVICE healthy on the fresh image"
|
||||
# AC-4 / ADR-001 step 3: validate the EXACT fresh artefact that will be
|
||||
# BUILD-ONCE retagged to prod by running staging_check.py against the
|
||||
# freshly recreated STAGING stand (8501, never prod 8500 - AC-9).
|
||||
# --mode stub: fast, deterministic, no LLM spend (ADR). Run INSIDE the
|
||||
# container so B6 reads the running instance own env (.env.staging).
|
||||
log "BUILD-STAGING: running staging_check.py --mode stub against fresh 8501 (port $TARGET_PORT)"
|
||||
if docker exec "$TARGET_SERVICE" \\
|
||||
python3 /repos/orchestrator/scripts/staging_check.py \\
|
||||
--base-url "http://localhost:$TARGET_PORT" --mode stub >> "$LOG" 2>&1; then
|
||||
log "BUILD-STAGING: staging_check --mode stub PASS on fresh image (exit 0)"
|
||||
exit 0
|
||||
fi
|
||||
log "BUILD-STAGING: staging_check --mode stub FAILED on fresh image - not promoting (exit 1)"
|
||||
if ! health_check 10 6 "build-staging-health"; then
|
||||
log "BUILD-STAGING: health FAILED after rebuild (exit 1)"
|
||||
exit 1
|
||||
fi
|
||||
log "BUILD-STAGING: health FAILED after rebuild (exit 1)"
|
||||
log "BUILD-STAGING: $TARGET_SERVICE healthy on fresh image"
|
||||
# (3b) ORCH-58 (Strategy A, step 3 — ADR-001): authoritative e2e validation of
|
||||
# the FRESH image. Run staging_check.py against the just-rebuilt 8501 INSIDE the
|
||||
# staging container (ORCH-048 canonical: it reads its OWN staging registry env, so
|
||||
# B6 is correct; the script lives at /repos/... via bind-mount, not in /app). This
|
||||
# is the same artefact later BUILD-ONCE retagged to prod, so we validate exactly
|
||||
# what we promote (AC-4). Any non-zero (FAIL or ORCH_STAGING safety-abort) -> exit 1
|
||||
# -> freshness gate FAIL -> rollback to development. Same exit-code contract.
|
||||
STAGING_CONTAINER="${STAGING_CONTAINER:-$TARGET_SERVICE}"
|
||||
STAGING_CHECK_PATH="${STAGING_CHECK_PATH:-/repos/orchestrator/scripts/staging_check.py}"
|
||||
STAGING_CHECK_MODE="${STAGING_CHECK_MODE:-stub}"
|
||||
log "BUILD-STAGING: running staging_check (--mode $STAGING_CHECK_MODE) against fresh http://localhost:$TARGET_PORT inside $STAGING_CONTAINER"
|
||||
if docker exec "$STAGING_CONTAINER" python3 "$STAGING_CHECK_PATH" \
|
||||
--base-url "http://localhost:$TARGET_PORT" --mode "$STAGING_CHECK_MODE" >> "$LOG" 2>&1; then
|
||||
log "BUILD-STAGING: staging_check PASS on fresh image (exit 0)"
|
||||
exit 0
|
||||
fi
|
||||
log "BUILD-STAGING: staging_check FAILED on fresh image - artefact not promotable (exit 1)"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
@@ -222,21 +226,19 @@ git pull origin main >> "$LOG" 2>&1
|
||||
# Backward compatible: skipped when SOURCE_IMAGE is unset.
|
||||
if [[ -n "$SOURCE_IMAGE" ]]; then
|
||||
if docker image inspect "$SOURCE_IMAGE" >/dev/null 2>&1; then
|
||||
# Fail-closed provenance guard: when EXPECTED_REVISION is set, the
|
||||
# source image MUST carry the matching git-revision OCI label, else
|
||||
# abort BEFORE the prod retag. Empty EXPECTED_REVISION -> guard
|
||||
# skipped (ORCH-36 backward-compat).
|
||||
# ORCH-58 (Strategy B): fail-closed provenance guard BEFORE docker tag.
|
||||
# When EXPECTED_REVISION is set, SOURCE_IMAGE's git-commit label MUST match,
|
||||
# else exit 1 (FAILED -> БАГ-8 rollback); prod is NEVER touched. Empty label
|
||||
# / inspect error / mismatch all fail-close. Unset EXPECTED_REVISION -> no
|
||||
# check (backward-compatible for non-self repos / legacy calls).
|
||||
if [[ -n "$EXPECTED_REVISION" ]]; then
|
||||
IMG_REV=$(docker image inspect --format '{{ index .Config.Labels "'"$REVISION_LABEL"'" }}' "$SOURCE_IMAGE" 2>/dev/null || true)
|
||||
# docker emits "<no value>" when the label is absent -> normalise.
|
||||
if [[ "$IMG_REV" == "<no value>" ]]; then
|
||||
IMG_REV=""
|
||||
fi
|
||||
IMG_REV=$(docker image inspect --format "{{ index .Config.Labels \"$REVISION_LABEL\" }}" "$SOURCE_IMAGE" 2>/dev/null || true)
|
||||
if [[ "$IMG_REV" == "<no value>" ]]; then IMG_REV=""; fi
|
||||
if [[ -z "$IMG_REV" || "$IMG_REV" != "$EXPECTED_REVISION" ]]; then
|
||||
log "PROVENANCE: SOURCE_IMAGE revision '$IMG_REV' != expected '$EXPECTED_REVISION' - aborting before retag (exit 1)"
|
||||
log "PROVENANCE: SOURCE_IMAGE revision '$IMG_REV' != expected '$EXPECTED_REVISION' (fail-closed) - aborting (exit 1)"
|
||||
exit 1
|
||||
fi
|
||||
log "PROVENANCE: SOURCE_IMAGE revision matches expected ($EXPECTED_REVISION)"
|
||||
log "PROVENANCE: SOURCE_IMAGE revision matches expected ($EXPECTED_REVISION) - retag allowed"
|
||||
fi
|
||||
log "BUILD-ONCE: retagging $SOURCE_IMAGE -> $TARGET_IMAGE (no rebuild)"
|
||||
docker tag "$SOURCE_IMAGE" "$TARGET_IMAGE" >> "$LOG" 2>&1
|
||||
|
||||
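The comparison logic of the Strategy B guard above can be restated in Python for clarity. This is a sketch under assumptions: `provenance_ok` is a hypothetical name, the label value is passed in as a string rather than read via `docker image inspect`, and `strip()` here removes more whitespace than the shell's command substitution would.

```python
NO_VALUE = "<no value>"  # what docker prints when the label is absent


def provenance_ok(image_revision, expected_revision):
    """Return True when the retag may proceed.

    Unset expected revision: guard skipped (backward compatible).
    Empty label, absent label, or mismatch: fail-closed.
    """
    if not expected_revision:
        return True
    revision = (image_revision or "").strip()
    if revision == NO_VALUE:
        revision = ""
    return bool(revision) and revision == expected_revision


if __name__ == "__main__":
    print(provenance_ok("abc123", "abc123"))
    print(provenance_ok("<no value>", "abc123"))
```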
@@ -14,9 +14,10 @@ self-hosting:
|
||||
* **A — liveness:** :func:`check_staging_image_fresh` is a QG sub-check on the
|
||||
``deploy-staging -> deploy`` edge (composed by ``stage_engine`` AFTER the
|
||||
merge-gate, BEFORE Phase A). It rebuilds ``orchestrator-orchestrator-staging``
|
||||
from the VALIDATED commit (worktree HEAD after the merge-gate rebase) and
|
||||
recreates the 8501 container, so we validate and promote ONE artefact. FAIL ->
|
||||
rollback to ``development`` (mirrors the merge-gate).
|
||||
from the VALIDATED commit (worktree HEAD after the merge-gate rebase), recreates
|
||||
the 8501 container, and runs ``staging_check.py --mode stub`` against that fresh
|
||||
8501 (ADR-001 step 3), so we validate exactly the ONE artefact later retagged to
|
||||
prod (AC-4). FAIL -> rollback to ``development`` (mirrors the merge-gate).
|
||||
* **B — safety:** :func:`expected_revision` feeds the validated SHA to
|
||||
``self_deploy.build_deploy_command`` as ``EXPECTED_REVISION``; the host hook
|
||||
fail-closes (``exit 1``) before ``docker tag`` if the SOURCE_IMAGE revision
|
||||
@@ -48,10 +49,18 @@ REVISION_LABEL = "org.opencontainers.image.revision"
|
||||
# Bounded timeouts so a hung git/docker/ssh never wedges the monitor-thread.
|
||||
_GIT_TIMEOUT = 30
|
||||
_INSPECT_TIMEOUT = 30
|
||||
# The remote rebuild (docker build + compose recreate + health) is the slow path;
|
||||
# keep it generous but bounded (mirrors the merge-gate re-test budget order).
|
||||
# The remote rebuild (docker build + compose recreate + health + staging_check) is
|
||||
# the slow path; keep it generous but bounded (mirrors the merge-gate re-test order).
|
||||
_REBUILD_TIMEOUT = 1200
|
||||
|
||||
# Explicit STAGING target for the --build-staging rebuild (Strategy A). These mirror
|
||||
# the hook's staging-safe defaults but are passed EXPLICITLY so a future change to the
|
||||
# hook defaults can never silently retarget the self-rebuild at prod (8500) — the whole
|
||||
# path builds/recreates STAGING ONLY (AC-9, review P2). Never the prod 8500 target.
|
||||
_STAGING_SERVICE = "orchestrator-staging"
|
||||
_STAGING_PORT = 8501
|
||||
_STAGING_COMPOSE_PROFILE = "staging"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Conditionality (mirrors self_deploy_applies / _merge_gate_applies)
|
||||
@@ -234,9 +243,12 @@ def rebuild_staging_image(repo: str, branch: str, sha: str) -> tuple[bool, str]:
|
||||
The hook (``orchestrator-deploy-hook.sh --build-staging``) runs, on the host:
|
||||
``docker build --build-arg GIT_SHA=<sha> -t <staging-image> <host-worktree>``
|
||||
-> ``docker compose --profile staging up -d --no-build orchestrator-staging``
|
||||
-> health-check 8501. Same exit-code contract (0 = ok). This touches
|
||||
staging ONLY (8501), NEVER prod (8500) (AC-9): all build/recreate targets are
|
||||
the staging service.
|
||||
-> health-check 8501
|
||||
-> ``staging_check.py --mode stub`` against the FRESH 8501 (ADR-001 step 3,
|
||||
AC-4: validate exactly the artefact later retagged to prod).
|
||||
Same exit-code contract (0 = ok). This touches staging ONLY (8501),
|
||||
NEVER prod (8500) (AC-9): all build/recreate/validate targets are the staging
|
||||
service — passed EXPLICITLY below, not left to hook defaults (review P2).
|
||||
|
||||
Synchronous ssh is fine here (unlike Phase B): recreating staging does not kill
|
||||
the prod worker running this code. Bounded by ``_REBUILD_TIMEOUT``.
|
||||
@@ -248,17 +260,18 @@ def rebuild_staging_image(repo: str, branch: str, sha: str) -> tuple[bool, str]:
|
||||
if not target:
|
||||
return False, "no ssh host configured for staging rebuild"
|
||||
host_ctx = _host_worktree_path(repo, branch)
|
||||
# We pass ONLY GIT_SHA (validated commit -> revision label, the shared anchor
|
||||
# with Strategy B), BUILD_CONTEXT (the validated worktree to build FROM) and
|
||||
# TARGET_IMAGE (the staging image name to retag in prod later). COMPOSE_PROFILE
|
||||
# / TARGET_SERVICE / TARGET_PORT are deliberately omitted so the hook keeps its
|
||||
# built-in STAGING defaults (profile=staging, orchestrator-staging, 8501): this
|
||||
# rebuild/recreate must touch STAGING ONLY (8501), NEVER prod (8500) (AC-9), and
|
||||
# the prod defaults are never reachable on this path.
|
||||
# Pass the STAGING target explicitly (service/port/profile/container), so the
|
||||
# rebuild + recreate + staging_check can never drift onto the prod 8500 service
|
||||
# even if the hook's defaults change (AC-9, review P2). STAGING_CONTAINER is the
|
||||
# container staging_check is docker-exec'd inside (step 3b).
|
||||
env_assignments = (
|
||||
f"GIT_SHA={shlex.quote(sha)} "
|
||||
f"BUILD_CONTEXT={shlex.quote(host_ctx)} "
|
||||
f"TARGET_IMAGE={shlex.quote(settings.deploy_prod_source_image)}"
|
||||
f"TARGET_IMAGE={shlex.quote(settings.deploy_prod_source_image)} "
|
||||
f"TARGET_SERVICE={shlex.quote(_STAGING_SERVICE)} "
|
||||
f"TARGET_PORT={shlex.quote(str(_STAGING_PORT))} "
|
||||
f"COMPOSE_PROFILE={shlex.quote(_STAGING_COMPOSE_PROFILE)} "
|
||||
f"STAGING_CONTAINER={shlex.quote(_STAGING_SERVICE)}"
|
||||
)
|
||||
inner = (
|
||||
f"cd {shlex.quote(settings.deploy_host_repo_path)} && "
|
||||
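The explicit staging target in the hunk above relies on `shlex.quote` to keep each value a single shell word. The sketch below rebuilds a comparable assignment string and shows how a contract test could check it. The helper `staging_env_assignments` is hypothetical, and the constants are copied from the `_STAGING_*` values shown in the diff.

```python
import shlex

# Mirrors _STAGING_SERVICE / _STAGING_PORT / _STAGING_COMPOSE_PROFILE from the diff.
STAGING_SERVICE = "orchestrator-staging"
STAGING_PORT = 8501
STAGING_COMPOSE_PROFILE = "staging"


def staging_env_assignments(sha, host_ctx, image):
    """Build the VAR=value prefix for the remote hook call, shell-quoted."""
    pairs = [
        ("GIT_SHA", sha),
        ("BUILD_CONTEXT", host_ctx),
        ("TARGET_IMAGE", image),
        ("TARGET_SERVICE", STAGING_SERVICE),
        ("TARGET_PORT", str(STAGING_PORT)),
        ("COMPOSE_PROFILE", STAGING_COMPOSE_PROFILE),
        ("STAGING_CONTAINER", STAGING_SERVICE),
    ]
    return " ".join(f"{key}={shlex.quote(value)}" for key, value in pairs)


if __name__ == "__main__":
    prefix = staging_env_assignments(
        "abc123", "/srv/work tree", "orchestrator-orchestrator-staging"
    )
    print(prefix)
    # Splitting with shell rules recovers each assignment as one word.
    print(shlex.split(prefix))
```

Because every value is quoted, a worktree path containing spaces still arrives at the hook as one `BUILD_CONTEXT` value.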
@@ -290,9 +303,10 @@ def check_staging_image_fresh(repo: str, work_item_id: str, branch: str) -> tupl
|
||||
a repo the feature is not real for -> ``(True, "image-freshness N/A for <repo>")``.
|
||||
2. Anchor: ``sha = validated_revision(repo, branch)``. Empty -> fail-closed
|
||||
``(False, ...)`` (AC-3): we never rebuild/promote without a known commit.
|
||||
3. Rebuild the staging image from that commit + recreate 8501 (host hook).
|
||||
Healthy -> ``(True, ...)``: the artefact we just validated is the exact one
|
||||
that will be retagged to prod (AC-4, loop closed). FAIL -> ``(False, ...)``
|
||||
3. Rebuild the staging image from that commit, recreate 8501, and run
|
||||
``staging_check.py --mode stub`` against the fresh 8501 (host hook). PASS ->
|
||||
``(True, ...)``: the artefact we just validated (build + e2e) is the exact
|
||||
one that will be retagged to prod (AC-4, loop closed). FAIL -> ``(False, ...)``
|
||||
-> the engine rolls back to ``development`` (AC-2).
|
||||
|
||||
Never-raise (AC-8): any internal error -> ``(False, "<reason>")``; an exception
|
||||
|
||||
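The three-step decision flow in the docstring above, together with its never-raise rule, can be sketched as below. The collaborators are injected as plain callables so the sketch stays self-contained; `check_fresh` and its parameter names are hypothetical, and only the `image-freshness N/A for <repo>` message is taken from the source.

```python
def check_fresh(repo, branch, applies, validated_revision, rebuild):
    """Return (ok, reason) for the freshness gate; never raises."""
    try:
        # 1. Conditionality: repos the feature is not real for pass through.
        if not applies(repo):
            return True, f"image-freshness N/A for {repo}"
        # 2. Anchor: without a known commit we fail closed.
        sha = validated_revision(repo, branch)
        if not sha:
            return False, "no validated revision (fail-closed)"
        # 3. Rebuild + recreate + staging_check, delegated to the host hook.
        ok, detail = rebuild(repo, branch, sha)
        if ok:
            return True, f"staging image fresh at {sha}"
        return False, f"staging rebuild failed: {detail}"
    except Exception as exc:  # never-raise: any internal error is a FAIL
        return False, f"image-freshness error: {exc}"


if __name__ == "__main__":
    print(check_fresh(
        "orchestrator", "feature/x",
        applies=lambda repo: True,
        validated_revision=lambda repo, branch: "abc123",
        rebuild=lambda repo, branch, sha: (True, "ok"),
    ))
```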
@@ -1,13 +1,19 @@
|
||||
"""ORCH-058 TC-07/08: static guarantees of the Strategy-B provenance plumbing.
|
||||
"""ORCH-058 TC-07/08: static + caller-contract guarantees of the provenance plumbing.
|
||||
|
||||
These assert the *shape* of the deploy artefacts that can't be unit-tested by
|
||||
running them (they shell out to docker/ssh on the host):
|
||||
|
||||
* TC-07 — the deploy hook fail-closes BEFORE `docker tag` when the staging
|
||||
image's git-revision label != EXPECTED_REVISION (exit 1), and the
|
||||
new `--build-staging` rebuild mode stamps GIT_SHA into the image.
|
||||
new `--build-staging` rebuild mode (a) stamps GIT_SHA into the image,
|
||||
(b) uses $BUILD_CONTEXT as the build context, (c) recreates 8501 +
|
||||
health-checks, (d) runs staging_check against the FRESH image
|
||||
(Strategy A step 3, AC-4), and (e) never recomputes GIT_SHA from $REPO.
|
||||
* TC-08 — the Dockerfile declares `ARG GIT_SHA` and stamps it into the
|
||||
`org.opencontainers.image.revision` OCI label (the anchor B reads).
|
||||
* TC-09 — the caller↔hook contract: `rebuild_staging_image` invokes the hook
|
||||
in `--build-staging` mode with BUILD_CONTEXT=<host-worktree>,
|
||||
GIT_SHA=<validated sha>, and an EXPLICIT staging target (never prod).
|
||||
"""
|
||||
|
||||
import pathlib
|
||||
@@ -17,17 +23,6 @@ _HOOK = _ROOT / "scripts" / "orchestrator-deploy-hook.sh"
|
||||
_DOCKERFILE = _ROOT / "Dockerfile"
|
||||
|
||||
|
||||
def _build_staging_block() -> str:
|
||||
"""Return only the body of the hook's ``--build-staging`` branch, so the
|
||||
contract assertions below cannot be satisfied by lookalike strings elsewhere
|
||||
in the script (e.g. the NORMAL DEPLOY recreate). The block runs from the
|
||||
``--build-staging`` guard up to the NORMAL DEPLOY section header."""
|
||||
text = _HOOK.read_text(encoding="utf-8")
|
||||
start = text.index('"${1:-}" == "--build-staging"')
|
||||
end = text.index("NORMAL DEPLOY mode", start)
|
||||
return text[start:end]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# TC-07: hook fail-closed provenance guard + --build-staging rebuild mode
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -60,68 +55,42 @@ def test_tc07_build_staging_mode_stamps_git_sha():
|
||||
assert 'docker build --build-arg GIT_SHA="$GIT_SHA"' in text
|
||||
|
||||
|
||||
def test_tc07_build_staging_builds_from_caller_context_not_repo():
|
||||
"""Contract (caller <-> hook): --build-staging must build from the
|
||||
caller-supplied BUILD_CONTEXT (the validated worktree), NOT the prod clone.
|
||||
|
||||
Regression guard for the P0 deadlock: the block must honour the caller's
|
||||
GIT_SHA (BUILD_CONTEXT/GIT_SHA defaulting) and must NOT recompute the SHA
|
||||
from the host clone's HEAD (`git rev-parse HEAD`) — on the
|
||||
deploy-staging -> deploy edge `main` HEAD != validated SHA, which would
|
||||
stamp the wrong revision label and deadlock the Strategy-B guard.
|
||||
"""
|
||||
block = _build_staging_block()
|
||||
# Build context is the caller-supplied worktree, defaulting to $REPO.
|
||||
assert 'BUILD_CONTEXT="${BUILD_CONTEXT:-$REPO}"' in block
|
||||
assert 'docker build --build-arg GIT_SHA="$GIT_SHA" -t "$TARGET_IMAGE" "$BUILD_CONTEXT"' in block
|
||||
# Honour the caller's GIT_SHA; never hard-build against the prod clone.
|
||||
assert 'GIT_SHA="${GIT_SHA:-}"' in block
|
||||
assert 'docker build --build-arg GIT_SHA="$GIT_SHA" -t "$TARGET_IMAGE" "$REPO"' not in block
|
||||
# Must NOT recompute the validated SHA from the host clone's HEAD.
|
||||
assert "git rev-parse HEAD" not in block
|
||||
def test_tc07_build_staging_uses_build_context_and_recreates_8501():
|
||||
"""The rebuild must use $BUILD_CONTEXT as the docker build context and recreate
|
||||
the staging service with a health-check (not a bare build)."""
|
||||
text = _HOOK.read_text(encoding="utf-8")
|
||||
# $BUILD_CONTEXT is the build context of the rebuild (validated worktree).
|
||||
assert 'docker build --build-arg GIT_SHA="$GIT_SHA" -t "$TARGET_IMAGE" "$BUILD_CONTEXT"' in text
|
||||
# Recreate the staging service on the fresh image (no-build) + health-check.
|
||||
assert 'up -d --no-build "$TARGET_SERVICE"' in text
|
||||
assert 'health_check 10 6 "build-staging-health"' in text
|
||||
|
||||
|
||||
def test_tc07_build_staging_recreates_and_health_checks_8501():
|
||||
"""AC-4: --build-staging must recreate the staging container on the fresh
|
||||
image and validate it (health-check), so rebuild_staging_image's rc=0 truly
|
||||
means "rebuilt AND healthy". A bare `docker build` + exit 0 would make the
|
||||
freshness verdict a lie."""
|
||||
block = _build_staging_block()
|
||||
# Recreate the staging service on the freshly built image.
|
||||
assert 'docker compose --profile "$COMPOSE_PROFILE" up -d --no-build "$TARGET_SERVICE"' in block
|
||||
# Validate the fresh container before reporting success.
|
||||
assert 'health_check 10 6 "build-staging-health"' in block
|
||||
# Health failure surfaces as a non-zero exit (FAILED contract preserved).
|
||||
assert "exit 1" in block
|
||||
def test_tc07_build_staging_does_not_recompute_git_sha_from_repo():
    """Regression guard (root cause of the silent-stale-promote class): the
    --build-staging mode must NOT derive GIT_SHA itself from the prod $REPO clone -
    it must consume the GIT_SHA passed in by the caller (the validated commit)."""
    text = _HOOK.read_text(encoding="utf-8")
    # Anchor on the actual block guard (not the header comment mentions).
    after = text[text.index('"${1:-}" == "--build-staging"'):]
    assert 'GIT_SHA="${GIT_SHA:-}"' in after
    assert "git rev-parse" not in after, "GIT_SHA must come from the caller, not the prod clone"


def test_tc07_build_staging_runs_staging_check_stub_after_health():
    """AC-4 / ADR-001 step 3: after the fresh staging container is healthy, the
    --build-staging mode MUST run staging_check.py --mode stub against the fresh
    8501 stand BEFORE reporting success, and fail-closed (exit 1) if it fails -
    so the EXACT artefact promoted to prod is the one that passed staging."""
    block = _build_staging_block()
    # staging_check is invoked in --mode stub (fast, no LLM spend per ADR).
    assert "staging_check.py" in block
    assert "--mode stub" in block
    # It targets the fresh STAGING stand (8501 / TARGET_PORT), never prod 8500.
    assert '--base-url "http://localhost:$TARGET_PORT"' in block
    # AC-9: the staging_check invocation must NOT hard-code the prod port (8500).
    invocation_lines = [
        ln for ln in block.splitlines()
        if "staging_check.py" in ln or "--base-url" in ln
    ]
    assert invocation_lines, "expected a staging_check.py invocation line"
    assert all("8500" not in ln for ln in invocation_lines)
    # Ordering: staging_check runs AFTER the health-check, BEFORE the final exit 0.
    health_idx = block.index('health_check 10 6 "build-staging-health"')
    check_idx = block.index("staging_check.py")
    assert health_idx < check_idx, "staging_check must run after health_check"
    exit0_idx = block.index("staging_check --mode stub PASS")
    success_exit = block.index("exit 0", exit0_idx)
    assert check_idx < success_exit, "staging_check must precede the success exit 0"
    # Fail-closed: a non-zero staging_check surfaces as exit 1 (no prod promote).
    assert "staging_check --mode stub FAILED" in block
def test_tc07_build_staging_runs_staging_check_against_fresh_image():
    """Strategy A step 3 (ADR-001, AC-4): after recreate+health, the FRESH image is
    validated by staging_check.py (not health-only). This is the P1 the reviewer
    flagged: validate exactly the artefact later retagged to prod."""
    text = _HOOK.read_text(encoding="utf-8")
    # Anchor on the actual block guard (not the header comment mentions).
    after = text[text.index('"${1:-}" == "--build-staging"'):]
    # staging_check is invoked, inside the staging container, --mode stub by default.
    assert "staging_check.py" in after
    assert 'docker exec "$STAGING_CONTAINER"' in after
    assert '--mode "$STAGING_CHECK_MODE"' in after
    assert 'STAGING_CHECK_MODE="${STAGING_CHECK_MODE:-stub}"' in after
    # The staging_check run must come AFTER the health-check (health gates readiness).
    assert after.index('health_check 10 6 "build-staging-health"') < after.index("staging_check.py")


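The TC-07 tests above rely on substring positions in the hook text to prove that one step runs before another. A minimal sketch of that ordering check follows, assuming a made-up hook excerpt: `HOOK_EXCERPT`, `appears_in_order`, and the `python "$STAGING_CHECK_PATH"` invocation are hypothetical and only illustrate the technique, not the real deploy hook.

```python
# Hypothetical hook excerpt, for illustration only; it is NOT the real deploy hook.
HOOK_EXCERPT = '''\
if [[ "${1:-}" == "--build-staging" ]]; then
  GIT_SHA="${GIT_SHA:-}"
  STAGING_CHECK_MODE="${STAGING_CHECK_MODE:-stub}"
  docker build --build-arg GIT_SHA="$GIT_SHA" -t "$TARGET_IMAGE" "$BUILD_CONTEXT"
  docker compose --profile "$COMPOSE_PROFILE" up -d --no-build "$TARGET_SERVICE"
  health_check 10 6 "build-staging-health" || exit 1
  docker exec "$STAGING_CONTAINER" python "$STAGING_CHECK_PATH" --mode "$STAGING_CHECK_MODE" || exit 1
  exit 0
fi
'''


def appears_in_order(text: str, *needles: str) -> bool:
    """Return True if every needle occurs in text, each one after the previous."""
    pos = 0
    for needle in needles:
        found = text.find(needle, pos)
        if found == -1:
            return False
        pos = found + len(needle)
    return True


# Anchor on the block guard first, as the tests above do, so that mentions
# in a header comment cannot satisfy the check.
block = HOOK_EXCERPT[HOOK_EXCERPT.index('"${1:-}" == "--build-staging"'):]
ordered = appears_in_order(
    block,
    'health_check 10 6 "build-staging-health"',
    'docker exec "$STAGING_CONTAINER"',
    "exit 0",
)
```

Searching each needle from the end of the previous match keeps the check strict: an earlier `exit 1` cannot stand in for the final `exit 0`, and swapping two steps in the hook makes the helper return `False`.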
# ---------------------------------------------------------------------------
@@ -131,3 +100,60 @@ def test_tc08_dockerfile_stamps_revision_label():
    text = _DOCKERFILE.read_text(encoding="utf-8")
    assert "ARG GIT_SHA" in text
    assert "LABEL org.opencontainers.image.revision=$GIT_SHA" in text


# ---------------------------------------------------------------------------
# TC-09: caller↔hook contract - rebuild_staging_image builds the right command
# ---------------------------------------------------------------------------
def test_tc09_rebuild_staging_image_passes_validated_context_and_staging_target(monkeypatch):
    """`rebuild_staging_image` must invoke the hook `--build-staging` over ssh with
    BUILD_CONTEXT=<host-worktree>, GIT_SHA=<validated sha>, and an EXPLICIT staging
    target (service/port/profile/container) - never the prod 8500 target. The absence
    of this contract test is what hid the earlier P0s (review P2)."""
    import src.image_freshness as imgf

    captured = {}

    class _FakeCompleted:
        returncode = 0
        stdout = ""
        stderr = ""

    def _fake_run(cmd, *a, **kw):
        captured["cmd"] = cmd
        return _FakeCompleted()

    monkeypatch.setattr(imgf, "_ssh_target", lambda: "slin@host")
    monkeypatch.setattr(imgf, "_host_worktree_path",
                        lambda repo, branch: "/home/slin/repos/_wt/orchestrator/feature_X")
    monkeypatch.setattr(imgf.subprocess, "run", _fake_run)

    ok, msg = imgf.rebuild_staging_image("orchestrator", "feature/ORCH-058", "abc123def456")
    assert ok, msg

    cmd = captured["cmd"]
    assert cmd[0] == "ssh"
    inner = cmd[-1]  # the remote shell command string
    # Validated commit + validated worktree as build context.
    assert "GIT_SHA=abc123def456" in inner
    assert "BUILD_CONTEXT=/home/slin/repos/_wt/orchestrator/feature_X" in inner
    # Explicit STAGING target - never the prod 8500 service/port.
    assert "TARGET_SERVICE=orchestrator-staging" in inner
    assert "TARGET_PORT=8501" in inner
    assert "COMPOSE_PROFILE=staging" in inner
    assert "STAGING_CONTAINER=orchestrator-staging" in inner
    assert "orchestrator-orchestrator-staging" in inner  # staging TARGET_IMAGE
    assert "--build-staging" in inner
    # Hard safety: the prod service/port must NOT leak into the staging rebuild.
    assert "TARGET_PORT=8500" not in inner
    assert "TARGET_SERVICE=orchestrator " not in inner


def test_tc09_rebuild_staging_image_no_ssh_host_fails_closed(monkeypatch):
    """No ssh host configured -> never-raise, fail-closed (False), no command run."""
    import src.image_freshness as imgf

    monkeypatch.setattr(imgf, "_ssh_target", lambda: None)
    ok, reason = imgf.rebuild_staging_image("orchestrator", "feature/ORCH-058", "abc123")
    assert ok is False
    assert "ssh host" in reason
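The two TC-09 tests pin down a caller contract: an explicit staging target in the ssh command, and a fail-closed `(False, reason)` result when no ssh host is configured. The sketch below shows one way a caller could satisfy that contract. It is not the real `src.image_freshness.rebuild_staging_image`: the function name, its signature, the default `hook_path`, and passing the image as a `TARGET_IMAGE=` assignment are all assumptions made for illustration. Only the staging values come from the assertions above.

```python
import shlex
import subprocess

# Staging target values taken from the TC-09 assertions; how the real caller
# passes the image name is not shown in the source, so TARGET_IMAGE is a guess.
STAGING_ENV = {
    "TARGET_SERVICE": "orchestrator-staging",
    "TARGET_PORT": "8501",
    "COMPOSE_PROFILE": "staging",
    "STAGING_CONTAINER": "orchestrator-staging",
    "TARGET_IMAGE": "orchestrator-orchestrator-staging",
}


def rebuild_staging_sketch(ssh_target, build_context, git_sha,
                           hook_path="./deploy-hook.sh", run=subprocess.run):
    """Hypothetical caller: never raises, returns (ok, message)."""
    if not ssh_target:
        # Fail closed before any command is built or run.
        return False, "no ssh host configured"
    env = {"GIT_SHA": git_sha, "BUILD_CONTEXT": build_context, **STAGING_ENV}
    assignments = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    inner = f"{assignments} {shlex.quote(hook_path)} --build-staging"
    try:
        done = run(["ssh", ssh_target, inner], capture_output=True, text=True)
    except Exception as exc:  # never-raise contract: report instead of propagating
        return False, f"ssh failed: {exc}"
    if done.returncode != 0:
        return False, f"hook exited {done.returncode}"
    return True, "rebuilt and healthy"
```

Injecting `run` as a parameter is a design choice of this sketch; the tests above get the same effect by monkeypatching `imgf.subprocess.run`.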