feat(testing): deterministic test-runner replacing LLM tester on the testing stage (ORCH-116) #142

Merged
admin merged 16 commits from feature/ORCH-116-orch-replace-llm-tester-with-d into main 2026-06-16 10:27:25 +03:00
Owner

ORCH-116: deterministic test runner instead of the LLM tester on testing

Second implemented slice of the determinization-roadmap (ORCH-118 A5, needs-hybrid-fallback):
on the testing stage for the self-hosting orchestrator, the LLM tester agent is replaced
by deterministic code (src/test_runner.py), intercepted in launch_job before _spawn
(following the deploy-finalizer/post-deploy-monitor/staging-runner precedent).

What the runner does

  • Runs the regression python -m pytest <test_runner_target> in the branch worktree via proc_group
    (tree-kill, timeout test_runner_timeout_s=900) plus an optional read-only smoke check (/health,
    /status, /queue + the serial_gate block; a transient error gets a bounded retry, a non-200
    response or a missing block is an immediate FAIL).
  • Maps the exit code through the single shared contract self_deploy.map_exit_code_to_status into
    result: tokens (0→PASS; any other code or None→FAIL; a smoke failure is ANDed into FAIL).
  • Writes 13-test-report.md (author_agent: test-runner, model_used: n/a, 52c schema; the 52c
    status: field is aligned with the verdict, D6.1) plus a best-effort push to the feature branch;
    calls the existing advance_stage(current_stage="testing", finished_agent="tester"), with no
    new edges or outcomes.
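
The run-and-map steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/test_runner.py: `map_exit_code_to_status` here is a hypothetical stand-in for self_deploy.map_exit_code_to_status (its real signature is not shown in this PR), `run_suite` and `combine_verdict` are invented names, and plain `subprocess.run` replaces proc_group, so it does not tree-kill child processes. Returning None on a timeout is also an assumption.

```python
import subprocess
import sys
from typing import Optional, Sequence


def map_exit_code_to_status(exit_code: Optional[int]) -> str:
    """Hypothetical stand-in: 0 -> PASS, any other code or None -> FAIL."""
    return "PASS" if exit_code == 0 else "FAIL"


def combine_verdict(exit_code: Optional[int], smoke_ok: bool = True) -> str:
    """AND the suite status with the smoke result: a smoke failure forces FAIL."""
    status = map_exit_code_to_status(exit_code)
    return status if smoke_ok else "FAIL"


def run_suite(cmd: Sequence[str], timeout_s: float = 900.0) -> Optional[int]:
    """Run the command and return its exit code, or None if it timed out.

    Assumption: a timeout is reported as None. subprocess.run kills only the
    direct child on timeout, unlike the tree-kill described for proc_group.
    """
    try:
        return subprocess.run(list(cmd), timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A trivial command stands in for `python -m pytest <test_runner_target>`.
    code = run_suite([sys.executable, "-c", "raise SystemExit(0)"])
    print("result:", combine_verdict(code, smoke_ok=True))
```

Keeping the mapping in one small function mirrors the "single shared contract" idea: every caller derives the result: token the same way, and the smoke result can only turn a PASS into a FAIL, never the reverse.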

Invariant (NFR-1)

This replaces the producer of the artifact, not the gate: the 13-test-report.md contract, the
check_tests_passed/_parse_tests_verdict gate, STAGE_TRANSITIONS, the machine-verdict result:,
and the DB schema are untouched byte for byte. The change is additive, under a kill-switch,
never-raise, fail-closed, and scoped to self-hosting.

Two-level outcome (anti-ORCH-110)

The suite ran → verdict→advance (FAIL → the same rollback testing → development + developer-retry).
The suite did NOT run (tool-error) → bounded DEFER (re-queue the tester job + a restart-safe marker);
once test_runner_infra_max_retries is exhausted → fail-closed FAIL + alert. Never a silent advance
or a false green; transient infrastructure failures do not burn a developer-retry.
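
The two-level decision can be illustrated with a small pure function. This is a sketch under stated assumptions: the `Outcome` enum, the `decide` function, and its parameters are hypothetical names, and the exact boundary (attempts strictly below test_runner_infra_max_retries means DEFER) is a guess rather than something this PR states.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ADVANCE_PASS = "result: PASS, advance"
    ADVANCE_FAIL = "result: FAIL, roll back testing -> development"
    DEFER = "re-queue the tester job"
    FAIL_CLOSED = "result: FAIL plus alert"


def decide(suite_executed: bool, exit_code: Optional[int],
           infra_attempts: int, infra_max_retries: int) -> Outcome:
    """Separate 'the suite gave a verdict' from 'the suite could not run'."""
    if suite_executed:
        # Level 1: the suite ran, so its exit code is a real verdict.
        return Outcome.ADVANCE_PASS if exit_code == 0 else Outcome.ADVANCE_FAIL
    # Level 2: tool error. Retry a bounded number of times, then fail closed.
    if infra_attempts < infra_max_retries:
        return Outcome.DEFER
    return Outcome.FAIL_CLOSED
```

Note that no branch returns a PASS without an executed suite, which is the "never a false green" property, and that a tool error never maps to ADVANCE_FAIL while retries remain, so a developer-retry is not spent on it.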

Hybrid (BR-8/NFR-7)

The LLM stays strictly off the control path: the runner is the sole producer of result:; the future
failure triage neither issues nor overrides the verdict and does not add an edge to
STAGE_TRANSITIONS (Phase 1 is not implemented).

Rollback

ORCH_TEST_RUNNER_ENABLED=false → the testing stage goes back to the LLM tester via _spawn, byte for byte.
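
A kill-switch of this kind might be read as in the sketch below. Only the variable name ORCH_TEST_RUNNER_ENABLED and the value false come from this PR; the helper names, the default when the variable is unset, and the extra accepted spellings ("0", "no", "off") are assumptions made for illustration. The self-hosting scope check is left out for brevity.

```python
import os
from typing import Mapping, Optional

_FALSY = {"0", "false", "no", "off"}


def is_test_runner_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the kill-switch. Assumption: unset means enabled."""
    env = os.environ if env is None else env
    raw = env.get("ORCH_TEST_RUNNER_ENABLED", "true")
    return raw.strip().lower() not in _FALSY


def select_producer(stage: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick who produces the test report for a stage (hypothetical helper)."""
    if stage == "testing" and is_test_runner_enabled(env):
        return "test-runner"
    # Switch off, or any other stage: the unchanged LLM path via _spawn.
    return "llm-agent"
```

Passing the environment as a mapping keeps the check easy to exercise in tests, for example `select_producer("testing", {"ORCH_TEST_RUNNER_ENABLED": "false"})`.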

Documentation (ORCH-118 NFR-6, atomic with the code)

llm-call-sites.md (A5 implemented), llm-determinization-roadmap.md (rank 2 implemented, the
"exactly one first_slice" invariant holds), llm-usage-policy.md, README/internals/overview, tester.md,
CLAUDE.md, CHANGELOG.md.

Tests

tests/test_orch116_test_runner.py (TC-01…TC-14) plus green LLM anti-drift tests (TC-15).
Full regression: 2137 passed.

ADR: docs/work-items/ORCH-116/06-adr/ADR-001-deterministic-test-runner.md,
cross-cutting docs/architecture/adr/adr-0049-deterministic-test-runner.md.

Refs: ORCH-116

🤖 Generated with Claude Code

admin added 11 commits 2026-06-16 09:57:06 +03:00
Second realised slice of the determinization-roadmap (ORCH-118 A5,
needs-hybrid-fallback): on the `testing` stage for the self-hosting
`orchestrator` repo the LLM `tester` agent is replaced by a deterministic
test-runner (src/test_runner.py), intercepted in launch_job BEFORE _spawn
(deploy-finalizer / post-deploy-monitor / staging-runner precedent).

It runs the regression `python -m pytest <target>` in the task worktree via
proc_group (tree-kill) + an optional read-only smoke (/health, /status, /queue
+ serial_gate), maps the exit-code -> result: PASS|FAIL via the existing
self_deploy.map_exit_code_to_status contract, writes 13-test-report.md and
initiates the EXISTING check_tests_passed gate exactly as a finished LLM-tester.

Invariant (NFR-1): only the *producer* changes — the artifact contract
(13-test-report.md / result:), the gate check_tests_passed / _parse_tests_verdict,
STAGE_TRANSITIONS and the DB schema are byte-for-byte UNCHANGED. Additive, under
a kill-switch (test_runner_enabled), never-raise, fail-closed, self-hosting scope,
two-level outcome (tool-error DEFER, anti ORCH-110), hybrid (LLM strictly
off-control-path). 52c-`status:` is aligned with the verdict (D6.1) so the
three-field _parse_tests_verdict never false-negatives a PASS.

Docs (ORCH-118 NFR-6, atomic with code): llm-call-sites.md (A5 implemented),
llm-determinization-roadmap.md (rank 2 implemented), llm-usage-policy.md,
README/internals/overview, tester.md, CLAUDE.md, CHANGELOG.md. Coverage:
tests/test_orch116_test_runner.py (TC-01..TC-14); LLM anti-drift tests green.
Full suite: 2137 passed.

Refs: ORCH-116
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fix(testing): reconcile ORCH-116 with merged ORCH-123 (ADR renumber, CHANGELOG, env parity)
All checks were successful
CI / test (push) Successful in 1m12s
CI / test (pull_request) Successful in 1m12s
74fccf3a09
Recovery from the merge-gate rebase-conflict bounce. The feature branch was
rebased onto origin/main (which had merged ORCH-123). The single conflicting
hunk — docs/architecture/README.md — was resolved during the rebase: kept
ORCH-123's host-side staging-runner line AND the ORCH-116 test-runner bullet.

This follow-up commit reconciles the remainder:

- Renumber the global cross-cutting ADR adr-0049 -> adr-0050. ORCH-123 took adr-0049
  (adr-0049-host-side-docker-execution-boundary.md) on main while ORCH-116 was
  in flight, so ORCH-116 yields to the merged task and moves to the next free
  number. Mechanical cross-reference reconciliation only (git mv + title + every
  test-runner reference across README/internals/CLAUDE/CHANGELOG/config.py +
  06-adr/ADR-001 + 12-review). Main's adr-0049 host-side references are left
  byte-for-byte untouched. No design/verdict content was altered.
- Restore the ORCH-116 CHANGELOG entry that the CHANGELOG auto-merge silently
  dropped (both ORCH-123 and ORCH-116 inserted at the same [Unreleased] anchor;
  git kept only ORCH-123).
- Add the missing ORCH_TEST_RUNNER_* keys to .env.example (parity with the
  ORCH_STAGING_RUNNER_* block; ORCH-101 canon of start keys).

Refs: ORCH-116

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
admin force-pushed feature/ORCH-116-orch-replace-llm-tester-with-d from d1d0fd4418 to 74fccf3a09 2026-06-16 09:57:06 +03:00 Compare
admin added 1 commit 2026-06-16 09:59:31 +03:00
developer(ET): auto-commit from developer run_id=756
All checks were successful
CI / test (push) Successful in 1m14s
CI / test (pull_request) Successful in 1m11s
c470576202
admin added 1 commit 2026-06-16 10:11:35 +03:00
reviewer(ET): auto-commit from reviewer run_id=757
All checks were successful
CI / test (push) Successful in 1m16s
CI / test (pull_request) Successful in 1m23s
e12b03b235
admin added 1 commit 2026-06-16 10:20:01 +03:00
tester(ET): auto-commit from tester run_id=758
All checks were successful
CI / test (push) Successful in 1m16s
CI / test (pull_request) Successful in 1m16s
3270647d86
admin added 1 commit 2026-06-16 10:21:37 +03:00
staging(ORCH-115): staging gate SUCCESS for ORCH-116
All checks were successful
CI / test (push) Successful in 1m20s
CI / test (pull_request) Successful in 1m16s
b212afbbd0
admin added 1 commit 2026-06-16 10:27:24 +03:00
deploy(ORCH-036): finalize SUCCESS for ORCH-116
All checks were successful
CI / test (push) Successful in 1m15s
CI / test (pull_request) Successful in 1m13s
274fbd77fc
admin merged commit 39fe1a5081 into main 2026-06-16 10:27:25 +03:00
Reference: admin/orchestrator#142