chore(ORCH-111): retrigger merge-gate re-test (2nd host CPU-starvation flake)
The deploy-edge merge-gate re-test bounced ORCH-111 back to development again
with `3 failed, 1916 passed, 14 errors in 444.79s` — a resource-exhaustion
signature, NOT a code defect. This is the SECOND occurrence of the identical
flake on this branch (cf. 4311720).
Evidence the branch is sound:
- Watchdog-only change (watchdog/** + docker-compose.yml + docs). It touches no
src/, no STAGE_TRANSITIONS/QG_CHECKS/check_*, and none of the failing test
files (tests/test_stage_engine.py, tests/test_orch109_timeout_model.py).
- The failures/errors are OUTSIDE this branch's scope:
test_stage_engine.py::TestStagingInfraTolerance tc02/tc13/tc14 and
test_orch109_timeout_model.py::TestContractsUnchanged::test_tc12. They pass in
isolation (4 passed/5.9s) and were ERRORS (subprocess timeouts), not assertion
failures — a systemic host failure, not logic.
- No pytest-randomly/-xdist installed -> deterministic order; merge-gate re-test
and a local run execute the same order on the same code.
- The failed run took 444.79s vs a clean local full run of 204.72s (2x slower):
the orphaned-pytest CPU-starvation incident ORCH-111 itself alerts on. By
design ORCH-111 only observes; it does not reap (ADR BR-3).
Full `pytest tests/` is green locally: 1933 passed, 0 failed, 0 errors in
204.72s (well under the 600s merge_retest budget), and the local run was FASTER
than the prior retrigger's (267s) -> host load is currently low. Empty commit to
re-run CI + the pipeline now.
NOTE (operator): until the orphaned host pytest processes are cleaned up, the
merge-gate re-test can keep flaking. ORCH-111 detects them (proc_blocking,
default-off) but does not reap them (BR-3) -> manual host cleanup is the durable
fix; a follow-up work item for reap/remediation is recommended.
Refs: ORCH-111
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in: