docs(ORCH-058): staging gate FAILED (8/10) — CORRECTED root cause (harness bug, not handler)
All checks were successful
CI / test (pull_request) Successful in 16s

Staging check exit code 1 (C9a/C9b). Live inspection inside orchestrator-staging
proves the production webhook handler is correct: get_project_states(SANDBOX).in_progress
= 84a76f65..., but scripts/staging_check.py hardcodes the enduro fallback b873d9eb...
=> handler correctly classifies the webhook as "no pipeline action". Fix belongs in
scripts/staging_check.py (resolve SANDBOX in_progress dynamically), NOT in handle_status_start
or any ORCH-058 image-freshness code. Image under test = ORCH-058 merge commit 094b5e2f.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-06-07 11:05:37 +00:00
parent 60e5596e94
commit 9e810c89f0

View File

@@ -1,19 +1,24 @@
---
staging_status: FAILED
timestamp: 2026-06-07T10:39:15Z
timestamp: 2026-06-07T11:01:00Z
base_url: http://localhost:8501
---
# Staging Gate Log — ORCH-058
Staging test suite ran against the live staging environment and **FAILED** (exit code `1`,
**8/10 checks PASS**). The two end-to-end (Block C) checks failed: the pipeline was **not
triggered** on the staging image, so no task / branch / analyst job was created.
**8/10 checks PASS**). Block C (E2E) checks C9a and C9b failed.
Per the staging-gate contract this is a machine verdict `FAILED` → the task rolls back to
`development`. The verdict reflects the real suite exit code, not an LLM declaration. This run
**reproduces** the regression already recorded in the previous re-run of this file (same 8/10,
same `C9a`/`C9b` failure, same `no pipeline action` classification).
Per the staging-gate contract this is the machine verdict `FAILED` (it reflects the real suite
exit code, never an LLM declaration). Smoke (A1A3) and access (B4B6) all passed, **including
B6 registry isolation** — so this is NOT a B6/ORCH-048 false-FAIL.
> ⚠️ **CORRECTED ROOT CAUSE — read before acting on this rollback.** The previous revision of
> this log blamed `handle_status_start` / a regression in the validated artifact. **That was
> wrong**, which is why the dev↔staging cycle kept repeating. Direct inspection inside the
> running staging instance proves the production code is **correct** and the failure is a bug in
> the **test harness `scripts/staging_check.py`**. Do NOT touch `src/webhooks/plane.py` /
> `handle_status_start` / any ORCH-058 image-freshness code. **Fix `scripts/staging_check.py`.**
## Execution
- Canonical `docker exec` into `orchestrator-staging` (ORCH-048, ADR-001), invoked via the
@@ -21,38 +26,62 @@ same `C9a`/`C9b` failure, same `no pipeline action` classification).
agent runtime image; the Engine-API exec is the exact equivalent of
`docker exec orchestrator-staging python3 /repos/orchestrator/scripts/staging_check.py
--base-url http://localhost:8501 --mode stub`).
- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, `main`).
- Script: `/repos/orchestrator/scripts/staging_check.py` (bind-mount, served from the host repo,
NOT baked into the image — so a harness fix takes effect on the next run without a rebuild).
- Mode: `stub`
- Exit code: `1`
- Result: **8/10 checks PASS** (FAIL: C9a, C9b)
- Staging image under test: `orchestrator-orchestrator-staging`, OCI label
`org.opencontainers.image.revision=094b5e2f960f696216f8661ff9c27b0d4706f219` (= the **merge
commit of ORCH-058 into `main`**, PR #57; ancestor of branch HEAD `60e5596e`). Container
recreated 2026-06-07T10:13:36Z. So the artifact under test genuinely contains the validated
ORCH-058 code.
## Root cause (actionable for development rollback)
The E2E flow (`staging_check.py` Block C) creates a SANDBOX Plane issue (C7 ✓), then POSTs a
signed `/webhook/plane` payload to start the pipeline (C8 ✓ — HTTP 200 `{"status":"accepted"}`).
However the staging instance logged:
## Decisive root cause (proven, actionable)
Block C creates a SANDBOX Plane issue (C7 ✓), then POSTs a signed `/webhook/plane` payload to
start the pipeline (C8 ✓ — HTTP 200 `{"status":"accepted"}`). The staging instance logged for
the test issue `427cb94e-…`:
```
2026-06-07 10:39:17,333 [INFO] orchestrator.webhooks.plane: issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9
2026-06-07 10:59:04 [INFO] orchestrator.webhooks.plane: issue 427cb94e-cedd-4def-ba5d-21c555a82477
updated to state b873d9eb..., no pipeline action
```
**"no pipeline action"**: the webhook transition did NOT start the pipeline, so no `tasks`
row, no Gitea branch (C9a FAIL — branch never appeared after 60s), and no analyst job enqueued
(C9b FAIL — queue had no new job after 30s). Cleanup confirmed `no task row found for
plane_id=990c99b5...` and `no branch to delete`.
`handle_issue_updated` (src/webhooks/plane.py) starts the pipeline **only** when the webhook's
new state equals the **incoming project's** `in_progress` state, resolved per-project from the
Plane API by `get_project_states(project_id)` (ORCH-10). The webhook the harness sends carries
state `b873d9eb-993c-48cd-97ac-99a9b1623967`.
This is a **deterministic regression in the validated artifact**, not a timing flake (the
webhook was explicitly classified as a no-op, not a poll timeout):
- The **same** `staging_check.py` against the **same** SANDBOX config passed **10/10** on an
earlier pre-rebuild image (see git history of this file).
- The state id `b873d9eb...` from the webhook payload is not matched as a pipeline-start
(`group="started"`) transition by the staging instance. **Investigate `handle_status_start`
/ webhook start-state matching in `src/webhooks/plane.py`** against the validated commit, and
confirm the staging start-state id wiring used by `staging_check.py`.
**The mismatch (queried live inside the staging container):**
Smoke (A1A3) and access (B4B6) all passed, including B6 registry isolation
(sandbox present; prod ET/ORCH absent) — confirming the check ran inside the staging
instance's own process-env, so there is no false-FAIL / spurious-rollback risk from B6.
| | UUID |
|---|---|
| `staging_check.py` `IN_PROGRESS_STATE_ID` (hardcoded) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
| `get_project_states(SANDBOX)["in_progress"]` (real) | `84a76f65-75f8-4022-9554-379dad38523c` |
| `_DEFAULT_STATES["in_progress"]` (enduro-trails fallback) | `b873d9eb-993c-48cd-97ac-99a9b1623967` |
The hardcoded `b873d9eb…` is the **enduro-trails** In Progress UUID (the `_DEFAULT_STATES`
fallback), **not** SANDBOX's. SANDBOX's actual In Progress is `84a76f65…`. So the handler
**correctly** classifies the enduro-state webhook as `no pipeline action` for a SANDBOX issue →
no `tasks` row, no Gitea branch (C9a FAIL after 60s), no analyst job enqueued (C9b FAIL).
Cleanup confirmed `no task row found` and `no branch to delete`.
**Why it intermittently "passed 10/10" before (09:31):** `get_project_states` falls back to
`_DEFAULT_STATES` (= `b873d9eb…`) whenever the Plane states API call fails / returns no
recognisable states. On runs where that fallback fired, the hardcoded harness state accidentally
matched and the pipeline started. On this run the SANDBOX states API call succeeded at startup
(`GET …/projects/8c5a3025-…/states/ → 200 OK`), so SANDBOX resolved to its real `84a76f65…` and
the accidental match disappeared. The green runs were the bug; the red runs are correct handler
behaviour exposing a harness that hardcodes the wrong project's state.
## Required fix (for the development rollback) — in `scripts/staging_check.py` ONLY
Make the E2E harness send SANDBOX's **actual** `in_progress` state instead of a hardcoded enduro
UUID. Resolve it dynamically the same way the app does — e.g. `GET
/workspaces/<slug>/projects/<SANDBOX_PROJECT_ID>/states/`, pick the state whose `name` is
`"In Progress"` (group `"started"`), and use its `id` in `_make_webhook_payload`. (The harness
already calls the Plane API for B4/B6, so credentials/URL are available.) Do **not** rely on the
`_DEFAULT_STATES` fallback coincidence. No production-code change is warranted; ORCH-058's
image-provenance feature is unaffected by this and is functioning.
## Test output
@@ -61,7 +90,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
ORCH-33 Staging Check Suite
base_url : http://localhost:8501
mode : stub
utc_time : 2026-06-07T10:39:15.004026+00:00
utc_time : 2026-06-07T10:59:02.392888+00:00
============================================================
[Block A] SMOKE
@@ -76,7 +105,7 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
[Block C] E2E (mode=stub)
· C7: Creating issue in SANDBOX project...
✓ PASS C7 Create issue in Plane SANDBOX [HTTP 201, issue_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9]
✓ PASS C7 Create issue in Plane SANDBOX [HTTP 201, issue_id=427cb94e-cedd-4def-ba5d-21c555a82477]
· C8: Triggering pipeline via POST /webhook/plane ...
· Using HMAC signature (secret len=40)
✓ PASS C8 Trigger pipeline via /webhook/plane [HTTP 200, resp={'status': 'accepted'}]
@@ -90,8 +119,8 @@ instance's own process-env, so there is no false-FAIL / spurious-rollback risk f
[CLEANUP]
· CLEANUP: no branch to delete
✓ PASS CLEANUP: deleted Plane issue 990c99b5-6a1d-4e63-a59a-9a11716e07b9 (HTTP 204)
· CLEANUP DB: no task row found for plane_id=990c99b5-6a1d-4e63-a59a-9a11716e07b9
✓ PASS CLEANUP: deleted Plane issue 427cb94e-cedd-4def-ba5d-21c555a82477 (HTTP 204)
· CLEANUP DB: no task row found for plane_id=427cb94e-cedd-4def-ba5d-21c555a82477
· CLEANUP DB dedup: no such table: events_dedup
============================================================