feat(testing): deterministic test-runner replacing LLM tester on the testing stage (ORCH-116)

Second realised slice of the determinization-roadmap (ORCH-118 A5, needs-hybrid-fallback): on the `testing` stage for the self-hosting `orchestrator` repo the LLM `tester` agent is replaced by a deterministic test-runner (src/test_runner.py), intercepted in launch_job BEFORE _spawn (deploy-finalizer / post-deploy-monitor / staging-runner precedent).

It runs the regression `python -m pytest <target>` in the task worktree via proc_group (tree-kill) + an optional read-only smoke (/health, /status, /queue + serial_gate), maps the exit-code -> result: PASS|FAIL via the existing self_deploy.map_exit_code_to_status contract, writes 13-test-report.md and initiates the EXISTING check_tests_passed gate exactly as a finished LLM-tester.

Invariant (NFR-1): only the *producer* changes: the artifact contract (13-test-report.md / result:), the gate check_tests_passed / _parse_tests_verdict, STAGE_TRANSITIONS and the DB schema are byte-for-byte UNCHANGED.

Additive, under a kill-switch (test_runner_enabled), never-raise, fail-closed, self-hosting scope, two-level outcome (tool-error DEFER, anti ORCH-110), hybrid (LLM strictly off-control-path). 52c-`status:` is aligned with the verdict (D6.1) so the three-field _parse_tests_verdict never false-negatives a PASS.

Docs (ORCH-118 NFR-6, atomic with code): llm-call-sites.md (A5 implemented), llm-determinization-roadmap.md (rank 2 implemented), llm-usage-policy.md, README/internals/overview, tester.md, CLAUDE.md, CHANGELOG.md.

Coverage: tests/test_orch116_test_runner.py (TC-01..TC-14); LLM anti-drift tests green. Full suite: 2137 passed.

Refs: ORCH-116
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:59:51 +03:00
parent 04f9b9cbce
commit 80aa2409f7
17 changed files with 1506 additions and 8 deletions
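
The exit-code-to-verdict step described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in `src/test_runner.py`: the names `Outcome`, `map_pytest_exit_code` and `run_regression` are hypothetical, and treating every exit code other than 0 and 1 as a tool error (DEFER) is an assumption. The real mapping goes through `self_deploy.map_exit_code_to_status`, whose signature is not shown in this commit page. The sketch also uses a plain `subprocess.run` with a timeout instead of the `proc_group` tree-kill the commit mentions.

```python
import subprocess
import sys
from enum import Enum


class Outcome(str, Enum):
    """Hypothetical two-level outcome: a verdict (PASS/FAIL) or a tool error (DEFER)."""
    PASS = "PASS"
    FAIL = "FAIL"
    DEFER = "DEFER"  # tool error: no verdict is issued


def map_pytest_exit_code(code: int) -> Outcome:
    """Map a pytest exit code to an outcome (assumed mapping, see lead-in).

    pytest documents: 0 = all tests passed, 1 = some tests failed,
    2 = interrupted, 3 = internal error, 4 = usage error, 5 = no tests collected.
    """
    if code == 0:
        return Outcome.PASS
    if code == 1:
        return Outcome.FAIL
    # Anything else says nothing about the code under test, so defer.
    return Outcome.DEFER


def run_regression(target: str, cwd: str, timeout_s: float = 600.0) -> Outcome:
    """Run `python -m pytest <target>` in `cwd` and map the exit code.

    Launch failures and timeouts are reported as DEFER rather than raised.
    Note: if pytest is not installed, `python -m pytest` exits with 1,
    which this sketch reports as FAIL (still fail-closed).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-m", "pytest", target],
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return Outcome.DEFER
    return map_pytest_exit_code(proc.returncode)
```

Keeping the mapping in a pure function separate from the subprocess call makes the verdict logic testable without running a real test suite.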
--- a/docs/overview/tech-agents.md
+++ b/docs/overview/tech-agents.md
@@ -54,6 +54,14 @@ Machine-verdict keys are read by gates **only from Y
 when the flag is off / for non-self repos, and continues to run the production `deploy` stage. For details, see the
 [pipeline](tech-pipeline.md) and the [LLM call-site map](../architecture/llm-call-sites.md).

+Special case (ORCH-116): on the `testing` stage for the self-hosting `orchestrator`, the LLM `tester` is replaced by a
+**deterministic test runner** (`src/test_runner.py`): its PASS/FAIL core is derivable (the `pytest` exit code
+in the worktree + read-only smoke), and the `result:` verdict is produced by deterministic code. This is a
+hybrid (`needs-hybrid-fallback`): the LLM `tester` prompt remains the fallback when the flag is off / for
+repos without a test contract, and the future off-control-path failure triage neither issues nor overrides
+`result:`. For details, see the [pipeline](tech-pipeline.md) and the
+[LLM call-site map](../architecture/llm-call-sites.md).
+
 ## The human as the seventh role

 The human does not write pipeline artifacts but makes two decisions that are not delegated
--- a/docs/overview/tech-pipeline.md
+++ b/docs/overview/tech-pipeline.md
@@ -44,6 +44,17 @@ created → analysis → architecture → development → review → testing →
 > the LLM `deployer` runs on the stage again, byte for byte. This is the first implemented slice
 > of the determinization-roadmap (see `docs/architecture/llm-determinization-roadmap.md`).

+> **Deterministic test runner (ORCH-116).** On the `testing` stage for the self-hosting `orchestrator`,
+> the work is done by **deterministic code** (`src/test_runner.py`) rather than the LLM `tester` agent: it
+> is intercepted in `launch_job` before the agent starts (the same pattern as the staging runner), runs
+> the `pytest` regression in the branch worktree + read-only smoke, maps the exit code to `result:` and initiates **the
+> same** `check_tests_passed` gate. This replaces the *producer* of the artifact, not the gate: the
+> `13-test-report.md` contract, the name/semantics of `check_tests_passed`/`_parse_tests_verdict`, and `STAGE_TRANSITIONS`
+> are unchanged. It sits under the `test_runner_enabled` kill-switch (scope `test_runner_repos`, empty →
+> self-hosting only; repos without a test contract → LLM-tester); when it is switched off, the LLM `tester` runs again,
+> byte for byte. This is the second implemented slice of the determinization-roadmap (hybrid: the LLM fallback remains for
+> off-control-path triage, not for issuing `result:`).
+
 ## Sub-gates of the deploy edge: inserts, not stages

 On the `deploy-staging → deploy` transition, four sub-gates run in the normative order
--- a/docs/overview/tech-quality-security.md
+++ b/docs/overview/tech-quality-security.md
@@ -53,7 +53,10 @@ control-path and its verdict is derivable from exit codes
 anti-drift tests. **First slice implemented (ORCH-115):** on `deploy-staging` for the self-hosting
 `orchestrator`, the LLM `deployer` is replaced by the deterministic `src/staging_runner.py` (the verdict
 `staging_status:` = a mapping of the staging suite's exit code); the LLM branch remains the fallback, and the
-`check_staging_status` gate is untouched. Replacing the second candidate (`tester`) is a follow-up for that role.
+`check_staging_status` gate is untouched. **Second slice implemented (ORCH-116):** on `testing` for the self-hosting
+`orchestrator`, the LLM `tester` is replaced by the deterministic `src/test_runner.py` (the `result:` verdict = the `pytest`
+exit code + read-only smoke); this is a hybrid (`needs-hybrid-fallback`): the LLM branch remains the fallback /
+future off-control-path triage, and the `check_tests_passed`/`_parse_tests_verdict` gate is untouched.

 - LLM call map: [`../architecture/llm-call-sites.md`](../architecture/llm-call-sites.md)
 - Normative policy: [`../architecture/llm-usage-policy.md`](../architecture/llm-usage-policy.md)
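
The kill-switch and scope rule described in the `tech-pipeline.md` hunk above can be read as a small decision function. A hedged sketch: `test_runner_enabled` and `test_runner_repos` are the setting names from the diff, while `choose_tester`, the `has_test_contract` parameter and the string return values are hypothetical. It also assumes that a non-empty `test_runner_repos` replaces the self-hosting default rather than extending it, which the diff does not state.

```python
from typing import Iterable

SELF_HOSTING_REPO = "orchestrator"  # repo name as it appears in the diff


def choose_tester(
    repo: str,
    *,
    test_runner_enabled: bool,
    test_runner_repos: Iterable[str] = (),
    has_test_contract: bool = True,
) -> str:
    """Return "test-runner" or "llm-tester" for the `testing` stage (sketch)."""
    # Kill-switch off: the LLM tester runs exactly as before.
    if not test_runner_enabled:
        return "llm-tester"
    # Empty scope means self-hosting only (assumed: non-empty scope replaces it).
    scope = set(test_runner_repos) or {SELF_HOSTING_REPO}
    # Out-of-scope repos and repos without a test contract fall back to the LLM.
    if repo not in scope or not has_test_contract:
        return "llm-tester"
    return "test-runner"
```

For example, `choose_tester("orchestrator", test_runner_enabled=True)` selects the deterministic runner, while any repo with the flag off selects the LLM tester.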