docs(ORCH-058): staging gate FAILED (8/10) — CORRECTED root cause (harness bug, not handler)

Staging check exit code 1 (C9a/C9b). Live inspection inside orchestrator-staging proves the production webhook handler is correct: get_project_states(SANDBOX).in_progress = 84a76f65..., but scripts/staging_check.py hardcodes the enduro fallback b873d9eb..., so the handler correctly classifies the webhook as "no pipeline action". Fix belongs in scripts/staging_check.py (resolve SANDBOX in_progress dynamically), NOT in handle_status_start or any ORCH-058 image-freshness code. Image under test = ORCH-058 merge commit 094b5e2f.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 11:05:37 +00:00
parent 60e5596e94
commit 9e810c89f0
1 changed files with 61 additions and 32 deletions
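
The fix named in the commit message (resolve SANDBOX's `in_progress` state dynamically in `scripts/staging_check.py`) could look roughly like the Python sketch below. It is a minimal illustration, not the harness's actual code: the helper names `pick_in_progress_state_id` and `fetch_project_states`, the `X-API-Key` header, the `api_base` parameter, and the assumed response shape (a bare list or a paginated `{"results": [...]}` object) are all assumptions.

```python
import json
import urllib.request


def pick_in_progress_state_id(states):
    """Return the id of the state named "In Progress" in group "started".

    `states` is a list of dicts shaped like Plane state objects
    (assumed keys: "id", "name", "group").
    """
    for state in states:
        if state.get("name") == "In Progress" and state.get("group") == "started":
            return state["id"]
    # Fail loudly rather than silently reusing another project's UUID.
    raise LookupError('no "In Progress" state (group "started") in project states')


def fetch_project_states(api_base, slug, project_id, api_key):
    """Fetch the state list for one project (hypothetical helper).

    Assumptions: `api_base` is the Plane API root the harness already uses,
    and authentication is done with an `X-API-Key` header.
    """
    url = f"{api_base}/workspaces/{slug}/projects/{project_id}/states/"
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    # Assumption: the body is either a bare list or {"results": [...]}.
    return body["results"] if isinstance(body, dict) else body
```

The id returned by `pick_in_progress_state_id(fetch_project_states(...))` would then take the place of the hardcoded `IN_PROGRESS_STATE_ID` when the harness builds its webhook payload. Raising on a missing state, instead of falling back to a default UUID, keeps the harness from accidentally matching the `_DEFAULT_STATES` fallback again.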
--- a/docs/work-items/ORCH-058/15-staging-log.md
+++ b/docs/work-items/ORCH-058/15-staging-log.md
@@ -1,19 +1,24 @@
 ---
 staging_status: FAILED
-timestamp: 2026-06-07T10:39:15Z
+timestamp: 2026-06-07T11:01:00Z
 base_url: http://localhost:8501
 ---

 # Staging Gate Log — ORCH-058

 Staging test suite ran against the live staging environment and **FAILED** (exit code `1`,
-**8/10 checks PASS**). The two end-to-end (Block C) checks failed: the pipeline was **not
-triggered** on the staging image, so no task / branch / analyst job was created.
+**8/10 checks PASS**). Block C (E2E) checks C9a and C9b failed.

-Per the staging-gate contract this is a machine verdict `FAILED` → the task rolls back to
-`development`. The verdict reflects the real suite exit code, not an LLM declaration. This run
-**reproduces** the regression already recorded in the previous re-run of this file (same 8/10,
-same `C9a`/`C9b` failure, same `no pipeline action` classification).
+Per the staging-gate contract this is the machine verdict `FAILED` (it reflects the real suite
+exit code, never an LLM declaration). Smoke (A1–A3) and access (B4–B6) all passed, **including
+B6 registry isolation** — so this is NOT a B6/ORCH-048 false-FAIL.
+
+> ⚠️ **CORRECTED ROOT CAUSE — read before acting on this rollback.** The previous revision of
+> this log blamed `handle_status_start` / a regression in the validated artifact. **That was
+> wrong**, which is why the dev↔staging cycle kept repeating. Direct inspection inside the
+> running staging instance proves the production code is **correct** and the failure is a bug in
+> the **test harness `scripts/staging_check.py`**. Do NOT touch `src/webhooks/plane.py` /
+> `handle_status_start` / any ORCH-058 image-freshness code. **Fix `scripts/staging_check.py`.**

 ## Execution
 - Canonical `docker exec` into `orchestrator-staging` (ORCH-048, ADR-001), invoked via the
@@ -21,38 +26,62 @@ same `C9a`/`C9b` failure, same `no pipeline action` classification).
  agent runtime image; the Engine-API exec is the exact equivalent of
  `docker exec orchestrator-staging python3 /repos/orchestrator/scripts/staging_check.py
  --base-url http://localhost:8501 --mode stub`).
-- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, `main`).
+- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, served from the host repo,
+  NOT baked into the image — so a harness fix takes effect on the next run without a rebuild).
 - Mode: `stub`
 - Exit code: `1`
 - Result: **8/10 checks PASS** (FAIL: C9a, C9b)
+- Staging image under test: `orchestrator-orchestrator-staging`, OCI label
+  `org.opencontainers.image.revision=094b5e2f960f696216f8661ff9c27b0d4706f219` (= the **merge
+  commit of ORCH-058 into `main`**, PR #57; ancestor of branch HEAD `60e5596e`). Container
+  recreated 2026-06-07T10:13:36Z. So the artifact under test genuinely contains the validated
+  ORCH-058 code.

-## Root cause (actionable for development rollback)
-The E2E flow (`staging_check.py` Block C) creates a SANDBOX Plane issue (C7 ✓), then POSTs a
-signed `/webhook/plane` payload to start the pipeline (C8 ✓ — HTTP 200 `{"status":"accepted"}`).
-However the staging instance logged:
+## Decisive root cause (proven, actionable)
+Block C creates a SANDBOX Plane issue (C7 ✓), then POSTs a signed `/webhook/plane` payload to
+start the pipeline (C8 ✓: HTTP 200 `{"status":"accepted"}`). For the test issue `427cb94e-…`,
+the staging instance logged:

 ```
-2026-06-07 10:39:17,333 [INFO] orchestrator.webhooks.plane: issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9
+2026-06-07 10:59:04 [INFO] orchestrator.webhooks.plane: issue 427cb94e-cedd-4def-ba5d-21c555a82477
                            updated to state b873d9eb..., no pipeline action
 ```

-→ **"no pipeline action"**: the webhook transition did NOT start the pipeline, so no `tasks`
-row, no Gitea branch (C9a FAIL — branch never appeared after 60s), and no analyst job enqueued
-(C9b FAIL — queue had no new job after 30s). Cleanup confirmed `no task row found for
-plane_id=990c99b5...` and `no branch to delete`.
+`handle_issue_updated` (src/webhooks/plane.py) starts the pipeline **only** when the webhook's
+new state equals the **incoming project's** `in_progress` state, resolved per-project from the
+Plane API by `get_project_states(project_id)` (ORCH-10). The webhook the harness sends carries
+state `b873d9eb-993c-48cd-97ac-99a9b1623967`.

-This is a **deterministic regression in the validated artifact**, not a timing flake (the
-webhook was explicitly classified as a no-op, not a poll timeout):
-- The **same** `staging_check.py` against the **same** SANDBOX config passed **10/10** on an
-  earlier pre-rebuild image (see git history of this file).
-- The state id `b873d9eb...` from the webhook payload is not matched as a pipeline-start
-  (`group="started"`) transition by the staging instance. **Investigate `handle_status_start`
-  / webhook start-state matching in `src/webhooks/plane.py`** against the validated commit, and
-  confirm the staging start-state id wiring used by `staging_check.py`.
+**The mismatch (queried live inside the staging container):**

-Smoke (A1–A3) and access (B4–B6) all passed, including B6 registry isolation
-(sandbox present; prod ET/ORCH absent) — confirming the check ran inside the staging
-instance's own process-env, so there is no false-FAIL / spurious-rollback risk from B6.
+| | UUID |
+|---|---|
+| `staging_check.py` `IN_PROGRESS_STATE_ID` (hardcoded) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
+| `get_project_states(SANDBOX)["in_progress"]` (real) | `84a76f65-75f8-4022-9554-379dad38523c` |
+| `_DEFAULT_STATES["in_progress"]` (enduro-trails fallback) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
+
+The hardcoded `b873d9eb…` is the **enduro-trails** In Progress UUID (the `_DEFAULT_STATES`
+fallback), **not** SANDBOX's. SANDBOX's actual In Progress is `84a76f65…`. So the handler
+**correctly** classifies the enduro-state webhook as `no pipeline action` for a SANDBOX issue →
+no `tasks` row, no Gitea branch (C9a FAIL after 60s), no analyst job enqueued (C9b FAIL).
+Cleanup confirmed `no task row found` and `no branch to delete`.
+
+**Why it intermittently "passed 10/10" before (09:31):** `get_project_states` falls back to
+`_DEFAULT_STATES` (= `b873d9eb…`) whenever the Plane states API call fails / returns no
+recognisable states. On runs where that fallback fired, the hardcoded harness state accidentally
+matched and the pipeline started. On this run the SANDBOX states API call succeeded at startup
+(`GET …/projects/8c5a3025-…/states/ → 200 OK`), so SANDBOX resolved to its real `84a76f65…` and
+the accidental match disappeared. The green runs were the bug; the red runs are correct handler
+behaviour exposing a harness that hardcodes the wrong project's state.
+
+## Required fix (for the development rollback) — in `scripts/staging_check.py` ONLY
+Make the E2E harness send SANDBOX's **actual** `in_progress` state instead of a hardcoded enduro
+UUID. Resolve it dynamically the same way the app does — e.g. `GET
+/workspaces/<slug>/projects/<SANDBOX_PROJECT_ID>/states/`, pick the state whose `name` is
+`"In Progress"` (group `"started"`), and use its `id` in `_make_webhook_payload`. (The harness
+already calls the Plane API for B4/B6, so credentials/URL are available.) Do **not** rely on the
+`_DEFAULT_STATES` fallback coincidence. No production-code change is warranted; ORCH-058's
+image-provenance feature is unaffected by this and is functioning.

 ## Test output

@@ -61,7 +90,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
  ORCH-33 Staging Check Suite
  base_url : http://localhost:8501
  mode     : stub
-  utc_time : 2026-06-07T10:39:15.004026+00:00
+  utc_time : 2026-06-07T10:59:02.392888+00:00
 ============================================================

 [Block A] SMOKE
@@ -76,7 +105,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f

 [Block C] E2E  (mode=stub)
  ·      C7: Creating issue in SANDBOX project...
-  ✓ PASS  C7 Create issue in Plane SANDBOX  [HTTP 201, issue_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9]
+  ✓ PASS  C7 Create issue in Plane SANDBOX  [HTTP 201, issue_id=427cb94e-cedd-4def-ba5d-21c555a82477]
  ·      C8: Triggering pipeline via POST /webhook/plane ...
  ·        Using HMAC signature (secret len=40)
  ✓ PASS  C8 Trigger pipeline via /webhook/plane  [HTTP 200, resp={'status': 'accepted'}]
@@ -90,8 +119,8 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f

 [CLEANUP]
  ·      CLEANUP: no branch to delete
-  ✓ PASS  CLEANUP: deleted Plane issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9 (HTTP 204)
-  ·      CLEANUP DB: no task row found for plane_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9
+  ✓ PASS  CLEANUP: deleted Plane issue 427cb94e-cedd-4def-ba5d-21c555a82477 (HTTP 204)
+  ·      CLEANUP DB: no task row found for plane_id=427cb94e-cedd-4def-ba5d-21c555a82477
  ·      CLEANUP DB dedup: no such table: events_dedup

 ============================================================