From 9e810c89f004cf1e78fc023ffa3a48ec8655350f Mon Sep 17 00:00:00 2001
From: claude-bot
Date: Sun, 7 Jun 2026 11:05:37 +0000
Subject: [PATCH] =?UTF-8?q?docs(ORCH-058):=20staging=20gate=20FAILED=20(8/?=
 =?UTF-8?q?10)=20=E2=80=94=20CORRECTED=20root=20cause=20(harness=20bug,=20?=
 =?UTF-8?q?not=20handler)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Staging check exit code 1 (C9a/C9b). Live inspection inside
orchestrator-staging proves the production webhook handler is correct:
get_project_states(SANDBOX).in_progress = 84a76f65..., but
scripts/staging_check.py hardcodes the enduro fallback b873d9eb... =>
handler correctly classifies the webhook as "no pipeline action". Fix
belongs in scripts/staging_check.py (resolve SANDBOX in_progress
dynamically), NOT in handle_status_start or any ORCH-058 image-freshness
code. Image under test = ORCH-058 merge commit 094b5e2f.

Co-Authored-By: Claude Opus 4.7
---
 docs/work-items/ORCH-058/15-staging-log.md | 93 ++++++++++++++--------
 1 file changed, 61 insertions(+), 32 deletions(-)

diff --git a/docs/work-items/ORCH-058/15-staging-log.md b/docs/work-items/ORCH-058/15-staging-log.md
index a71976a..449c91e 100644
--- a/docs/work-items/ORCH-058/15-staging-log.md
+++ b/docs/work-items/ORCH-058/15-staging-log.md
@@ -1,19 +1,24 @@
 ---
 staging_status: FAILED
-timestamp: 2026-06-07T10:39:15Z
+timestamp: 2026-06-07T11:01:00Z
 base_url: http://localhost:8501
 ---
 
 # Staging Gate Log — ORCH-058
 
 Staging test suite ran against the live staging environment and **FAILED** (exit code `1`,
-**8/10 checks PASS**). The two end-to-end (Block C) checks failed: the pipeline was **not
-triggered** on the staging image, so no task / branch / analyst job was created.
+**8/10 checks PASS**). Block C (E2E) checks C9a and C9b failed.
 
-Per the staging-gate contract this is a machine verdict `FAILED` → the task rolls back to
-`development`. The verdict reflects the real suite exit code, not an LLM declaration. This run
-**reproduces** the regression already recorded in the previous re-run of this file (same 8/10,
-same `C9a`/`C9b` failure, same `no pipeline action` classification).
+Per the staging-gate contract this is the machine verdict `FAILED` (it reflects the real suite
+exit code, never an LLM declaration). Smoke (A1–A3) and access (B4–B6) all passed, **including
+B6 registry isolation** — so this is NOT a B6/ORCH-048 false-FAIL.
+
+> ⚠️ **CORRECTED ROOT CAUSE — read before acting on this rollback.** The previous revision of
+> this log blamed `handle_status_start` / a regression in the validated artifact. **That was
+> wrong**, which is why the dev↔staging cycle kept repeating. Direct inspection inside the
+> running staging instance proves the production code is **correct** and the failure is a bug in
+> the **test harness `scripts/staging_check.py`**. Do NOT touch `src/webhooks/plane.py` /
+> `handle_status_start` / any ORCH-058 image-freshness code. **Fix `scripts/staging_check.py`.**
 
 ## Execution
 - Canonical `docker exec` into `orchestrator-staging` (ORCH-048, ADR-001), invoked via the
@@ -21,38 +26,62 @@ same `C9a`/`C9b` failure, same `no pipeline action` classification).
   agent runtime image; the Engine-API exec is the exact equivalent of `docker exec
   orchestrator-staging python3 /repos/orchestrator/scripts/staging_check.py --base-url
   http://localhost:8501 --mode stub`).
-- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, `main`).
+- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, served from the host repo,
+  NOT baked into the image — so a harness fix takes effect on the next run without a rebuild).
 - Mode: `stub`
 - Exit code: `1`
 - Result: **8/10 checks PASS** (FAIL: C9a, C9b)
+- Staging image under test: `orchestrator-orchestrator-staging`, OCI label
+  `org.opencontainers.image.revision=094b5e2f960f696216f8661ff9c27b0d4706f219` (= the **merge
+  commit of ORCH-058 into `main`**, PR #57; ancestor of branch HEAD `60e5596e`). Container
+  recreated 2026-06-07T10:13:36Z. So the artifact under test genuinely contains the validated
+  ORCH-058 code.
 
-## Root cause (actionable for development rollback)
-The E2E flow (`staging_check.py` Block C) creates a SANDBOX Plane issue (C7 ✓), then POSTs a
-signed `/webhook/plane` payload to start the pipeline (C8 ✓ — HTTP 200 `{"status":"accepted"}`).
-However the staging instance logged:
+## Decisive root cause (proven, actionable)
+Block C creates a SANDBOX Plane issue (C7 ✓), then POSTs a signed `/webhook/plane` payload to
+start the pipeline (C8 ✓ — HTTP 200 `{"status":"accepted"}`). The staging instance logged for
+the test issue `427cb94e-…`:
 
 ```
-2026-06-07 10:39:17,333 [INFO] orchestrator.webhooks.plane: issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9
+2026-06-07 10:59:04 [INFO] orchestrator.webhooks.plane: issue 427cb94e-cedd-4def-ba5d-21c555a82477
   updated to state b873d9eb..., no pipeline action
 ```
 
-→ **"no pipeline action"**: the webhook transition did NOT start the pipeline, so no `tasks`
-row, no Gitea branch (C9a FAIL — branch never appeared after 60s), and no analyst job enqueued
-(C9b FAIL — queue had no new job after 30s). Cleanup confirmed `no task row found for
-plane_id=990c99b5...` and `no branch to delete`.
+`handle_issue_updated` (src/webhooks/plane.py) starts the pipeline **only** when the webhook's
+new state equals the **incoming project's** `in_progress` state, resolved per-project from the
+Plane API by `get_project_states(project_id)` (ORCH-10). The webhook the harness sends carries
+state `b873d9eb-993c-48cd-97ac-99a9b1623967`.
 
-This is a **deterministic regression in the validated artifact**, not a timing flake (the
-webhook was explicitly classified as a no-op, not a poll timeout):
-- The **same** `staging_check.py` against the **same** SANDBOX config passed **10/10** on an
-  earlier pre-rebuild image (see git history of this file).
-- The state id `b873d9eb...` from the webhook payload is not matched as a pipeline-start
-  (`group="started"`) transition by the staging instance. **Investigate `handle_status_start`
-  / webhook start-state matching in `src/webhooks/plane.py`** against the validated commit, and
-  confirm the staging start-state id wiring used by `staging_check.py`.
+**The mismatch (queried live inside the staging container):**
 
-Smoke (A1–A3) and access (B4–B6) all passed, including B6 registry isolation
-(sandbox present; prod ET/ORCH absent) — confirming the check ran inside the staging
-instance's own process-env, so there is no false-FAIL / spurious-rollback risk from B6.
+| | UUID |
+|---|---|
+| `staging_check.py` `IN_PROGRESS_STATE_ID` (hardcoded) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
+| `get_project_states(SANDBOX)["in_progress"]` (real) | `84a76f65-75f8-4022-9554-379dad38523c` |
+| `_DEFAULT_STATES["in_progress"]` (enduro-trails fallback) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
+
+The hardcoded `b873d9eb…` is the **enduro-trails** In Progress UUID (the `_DEFAULT_STATES`
+fallback), **not** SANDBOX's. SANDBOX's actual In Progress is `84a76f65…`. So the handler
+**correctly** classifies the enduro-state webhook as `no pipeline action` for a SANDBOX issue →
+no `tasks` row, no Gitea branch (C9a FAIL after 60s), no analyst job enqueued (C9b FAIL).
+Cleanup confirmed `no task row found` and `no branch to delete`.
+
+**Why it intermittently "passed 10/10" before (09:31):** `get_project_states` falls back to
+`_DEFAULT_STATES` (= `b873d9eb…`) whenever the Plane states API call fails / returns no
+recognisable states. On runs where that fallback fired, the hardcoded harness state accidentally
+matched and the pipeline started. On this run the SANDBOX states API call succeeded at startup
+(`GET …/projects/8c5a3025-…/states/ → 200 OK`), so SANDBOX resolved to its real `84a76f65…` and
+the accidental match disappeared. The green runs were the bug; the red runs are correct handler
+behaviour exposing a harness that hardcodes the wrong project's state.
+
+## Required fix (for the development rollback) — in `scripts/staging_check.py` ONLY
+Make the E2E harness send SANDBOX's **actual** `in_progress` state instead of a hardcoded enduro
+UUID. Resolve it dynamically the same way the app does — e.g. `GET
+/workspaces//projects//states/`, pick the state whose `name` is
+`"In Progress"` (group `"started"`), and use its `id` in `_make_webhook_payload`. (The harness
+already calls the Plane API for B4/B6, so credentials/URL are available.) Do **not** rely on the
+`_DEFAULT_STATES` fallback coincidence. No production-code change is warranted; ORCH-058's
+image-provenance feature is unaffected by this and is functioning.
 
 ## Test output
 
@@ -61,7 +90,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
  ORCH-33 Staging Check Suite
  base_url : http://localhost:8501
  mode : stub
- utc_time : 2026-06-07T10:39:15.004026+00:00
+ utc_time : 2026-06-07T10:59:02.392888+00:00
 ============================================================
 
 [Block A] SMOKE
@@ -76,7 +105,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
 
 [Block C] E2E (mode=stub)
  · C7: Creating issue in SANDBOX project...
- ✓ PASS C7 Create issue in Plane SANDBOX [HTTP 201, issue_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9]
+ ✓ PASS C7 Create issue in Plane SANDBOX [HTTP 201, issue_id=427cb94e-cedd-4def-ba5d-21c555a82477]
  · C8: Triggering pipeline via POST /webhook/plane ...
  · Using HMAC signature (secret len=40)
  ✓ PASS C8 Trigger pipeline via /webhook/plane [HTTP 200, resp={'status': 'accepted'}]
@@ -90,8 +119,8 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
 
 [CLEANUP]
  · CLEANUP: no branch to delete
- ✓ PASS CLEANUP: deleted Plane issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9 (HTTP 204)
- · CLEANUP DB: no task row found for plane_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9
+ ✓ PASS CLEANUP: deleted Plane issue 427cb94e-cedd-4def-ba5d-21c555a82477 (HTTP 204)
+ · CLEANUP DB: no task row found for plane_id=427cb94e-cedd-4def-ba5d-21c555a82477
  · CLEANUP DB dedup: no such table: events_dedup
 
 ============================================================
--
2.49.1
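
The "Required fix" section of the log asks the harness to pick the state whose `name` is `"In Progress"` (group `"started"`) from the project states response and use its `id`. A minimal sketch of that selection step follows; it is an illustration under stated assumptions, not the harness's actual code. The function name `resolve_in_progress_state_id` and the exception `StateResolutionError` are hypothetical, and the payload shape (a bare list of state objects, or a wrapper with the list under `"results"`) is an assumption about the response. Only the `name`, `group`, and `id` fields come from the log. The HTTP call itself is left out, because the log elides the workspace and project parts of the URL.

```python
from typing import Any, Mapping


class StateResolutionError(LookupError):
    """Hypothetical error: the payload has no usable In Progress state."""


def resolve_in_progress_state_id(states_payload: Any) -> str:
    """Return the id of the state named "In Progress" in group "started".

    `states_payload` is the already-decoded JSON body of the project states
    request. Assumption: it is either a bare list of state objects or a
    wrapper mapping that holds the list under "results".
    """
    if isinstance(states_payload, Mapping):
        states = states_payload.get("results") or []
    else:
        states = states_payload or []

    for state in states:
        if state.get("name") == "In Progress" and state.get("group") == "started":
            return str(state["id"])

    # Fail loudly rather than silently falling back to a hardcoded UUID,
    # which is the coincidence the log warns against relying on.
    raise StateResolutionError("no 'In Progress' state in group 'started'")
```

In the harness, the returned id would presumably replace the hardcoded `IN_PROGRESS_STATE_ID` when `_make_webhook_payload` builds the webhook body. Raising on a missing state is a deliberate choice in this sketch: a resolution failure then shows up as a harness error instead of a C9a/C9b failure that looks like a handler regression.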