fix(launcher): raise developer/reviewer timeout budgets + stamp model at launch

Two additive, isolated launch-subsystem fixes from incident ORCH-104, without touching STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict / DB schema.

D1, launch-time model stamp: write the resolved model into agent_runs.model in the SAME UPDATE as the effort stamp (ORCH-087), so the model is present from launch, survives a timeout-kill (exit_code=-9), and is visible in-flight in /metrics & /queue. record_usage stays an enrichment (model=COALESCE(?, model) preserves the launch stamp when the usage JSON model is None). never-raise (isolated try/except).

D3/D4, dedicated per-role budgets: agent_timeout_developer_s=3600 / agent_timeout_reviewer_s=3000 with a deterministic _resolve_timeout ladder (overrides_json[agent] > dedicated role key > agent_timeout_seconds=1800; other roles byte-for-byte). Malformed override JSON is ignored (resolution falls through to the next rung), and a non-positive or non-integer dedicated value falls back to the global default; both log a WARNING (never-break). reaper_max_running_s raised 3600 -> 5400 in lockstep to keep the ORCH-065 invariant (5400 > 3600 + 20 = 3620).

FR-4 (kill / in-flight visibility) and FR-5 (anti-salvage) are structural in the existing code; pinned here by regression tests (tests/test_orch109_timeout_model.py, TC-01..TC-12).

Docs: .env.example, config passport, CHANGELOG, CLAUDE.md (README/internals authored by architect in this branch).

Refs: ORCH-109

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
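
The priority ladder described in the message can be sketched in isolation. This is a minimal, hedged illustration rather than the code in launcher.py: `SketchSettings`, `ROLE_KEYS` and `resolve_timeout` are hypothetical stand-ins for the real `Settings` object and `AgentLauncher._resolve_timeout`, and the shape of the override parsing is assumed.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("timeout_sketch")


@dataclass
class SketchSettings:
    """Hypothetical stand-in for the real Settings object (defaults from the commit)."""
    agent_timeout_seconds: int = 1800
    agent_timeout_overrides_json: str = ""
    agent_timeout_developer_s: int = 3600
    agent_timeout_reviewer_s: int = 3000


# Roles with a dedicated budget; every other role keeps the global default.
ROLE_KEYS = {
    "developer": "agent_timeout_developer_s",
    "reviewer": "agent_timeout_reviewer_s",
}


def resolve_timeout(cfg: SketchSettings, agent: Optional[str] = None) -> int:
    default = cfg.agent_timeout_seconds

    # (1) operator JSON override: highest priority, for any role.
    raw = (cfg.agent_timeout_overrides_json or "").strip()
    if agent and raw:
        try:
            overrides = json.loads(raw)
            if isinstance(overrides, dict) and agent in overrides:
                return int(overrides[agent])
        except (ValueError, TypeError) as exc:
            # Malformed JSON is ignored; resolution falls through to rung (2).
            logger.warning("invalid overrides JSON, ignoring: %s", exc)

    # (2) dedicated per-role budget.
    key = ROLE_KEYS.get(agent)
    if key is not None:
        try:
            value = int(getattr(cfg, key))
            if value > 0:
                return value
            logger.warning("non-positive %s=%r, using default", key, value)
        except (ValueError, TypeError) as exc:
            logger.warning("invalid %s, using default: %s", key, exc)

    # (3) global default.
    return default


print(resolve_timeout(SketchSettings(), "developer"))  # 3600
print(resolve_timeout(SketchSettings(), "analyst"))    # 1800
```

Note the asymmetry the regression tests pin down: a malformed override JSON does not send developer/reviewer to the global default, because the dedicated key is still consulted; only a bad dedicated value does.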
2026-06-14 01:32:01 +03:00
parent b025e1bdf4
commit 6bd7f9ba84
8 changed files with 683 additions and 20 deletions
--- a/.env.example
+++ b/.env.example
@@ -107,6 +107,30 @@ ORCH_AGENT_EFFORT_DEPLOYER=medium
 # (G4 NOT enabled, ADR-001 ORCH-74: determinism — all agents stay on opus-4-8). A
 # non-empty value is validated by the SAME predicate as the model; a typo is dropped.
 ORCH_AGENT_FALLBACK_MODEL=
+
+# ── Agent timeout / wall-clock budgets (ORCH-7, raised per-role ORCH-109) ─────
+# The in-process watchdog kills a run that exceeds its wall-clock budget
+# (SIGTERM -> grace -> SIGKILL, exit_code=-9). _resolve_timeout ladder (highest
+# first): OVERRIDES_JSON[agent] > dedicated role key > SECONDS (global default).
+#   SECONDS                -> global default budget for every role WITHOUT a raised
+#                             key (analyst/architect/tester/deployer).
+#   KILL_GRACE_SECONDS     -> pause between SIGTERM and SIGKILL so claude can flush
+#                             artifacts before the hard kill.
+#   OVERRIDES_JSON         -> optional per-agent override object, e.g.
+#                             {"reviewer":3600,"architect":2700}; wins for ANY role.
+#                             Malformed JSON -> ignored + WARNING (never-break).
+# ORCH-109: the two HEAVY roles get raised dedicated budgets (defaults = prod, so an
+# empty .env reproduces prod — ORCH-101 canon). A non-positive value falls back to
+# SECONDS + WARNING.
+#   DEVELOPER_S            -> developer budget (xhigh, coding/agentic bottleneck), 60m.
+#   REVIEWER_S             -> reviewer budget (large diff + high reasoning), 50m.
+# CROSS-INVARIANT (ORCH-065): ORCH_REAPER_MAX_RUNNING_S MUST stay > max(budget)+grace;
+# it is raised to 5400 in lockstep below (5400 > 3600 + 20 = 3620).
+ORCH_AGENT_TIMEOUT_SECONDS=1800
+ORCH_AGENT_KILL_GRACE_SECONDS=20
+ORCH_AGENT_TIMEOUT_OVERRIDES_JSON=
+ORCH_AGENT_TIMEOUT_DEVELOPER_S=3600
+ORCH_AGENT_TIMEOUT_REVIEWER_S=3000
 # ORCH-042/ORCH-067: live-tracker mode. bump (DEFAULT since ORCH-067) -> on every
 # update the old card is deleted and a fresh one is sent silently to the BOTTOM of
 # the chat (deleteMessage + sendMessage + repoint), so the current status is always
@@ -365,6 +389,8 @@ ORCH_PLANE_STATES_TTL_S=300
 #   REAPER_INTERVAL_S       -> background scan period (seconds).
 #   REAPER_DEAD_TICKS       -> consecutive dead-pid ticks before reaping (Tier-1, >=2).
 #   REAPER_MAX_RUNNING_S    -> Tier-3 backstop ceiling; must exceed max agent_timeout+grace.
+#                              ORCH-109: raised 3600 -> 5400 in lockstep with the developer
+#                              budget (5400 > 3600 + 20 = 3620).
 #   REAPER_FINALIZE_GRACE_S -> Tier-2 grace: how long agent_runs.exit_code must have been
 #                              recorded before a still-'running' job is reaped; MUST exceed
 #                              the max finalization window (git push + PR + Plane comments).
@@ -374,7 +400,7 @@ ORCH_PLANE_STATES_TTL_S=300
 ORCH_REAPER_ENABLED=true
 ORCH_REAPER_INTERVAL_S=60
 ORCH_REAPER_DEAD_TICKS=2
-ORCH_REAPER_MAX_RUNNING_S=3600
+ORCH_REAPER_MAX_RUNNING_S=5400
 ORCH_REAPER_FINALIZE_GRACE_S=300
 ORCH_LEASE_RECLAIM_ENABLED=true
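
The ORCH-065 cross-invariant stated in the comments above is plain arithmetic, so it can be checked mechanically. The following is a small sketch under assumptions: the helper names are hypothetical and not part of the repository, and the budget list is typed in by hand from the values in this commit.

```python
from typing import Iterable


def reaper_headroom_s(budgets_s: Iterable[int], grace_s: int,
                      reaper_max_running_s: int) -> int:
    """Seconds between the slowest legitimate run (budget + kill grace)
    and the Tier-3 reaper ceiling. Must be positive."""
    return reaper_max_running_s - (max(budgets_s) + grace_s)


def reaper_invariant_holds(budgets_s: Iterable[int], grace_s: int,
                           reaper_max_running_s: int) -> bool:
    return reaper_headroom_s(budgets_s, grace_s, reaper_max_running_s) > 0


# Global default, developer and reviewer budgets from this commit.
budgets = [1800, 3600, 3000]

print(reaper_invariant_holds(budgets, 20, 5400))  # True: 5400 > 3620
print(reaper_headroom_s(budgets, 20, 5400))       # 1780
print(reaper_invariant_holds(budgets, 20, 3600))  # False: the old ceiling is too low
```

An operator who raises a budget through ORCH_AGENT_TIMEOUT_OVERRIDES_JSON would need to add that value to the list, since the override wins for any role.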

--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,11 @@
 Format: [Keep a Changelog](https://keepachangelog.com/). One entry per meaningful PR/task.

 ## [Unreleased]
- **Presentation: slides on Lite installation and usage via Plane** (ORCH-105, `docs`): the slide source `docs/overview/presentation.md` is extended with three slides following the ORCH-011 canon (16 → 19, continuous numbering preserved): one slide on **Lite installation via scripts** (two platform containers, the orchestrator plus the watchdog, on the customer's infrastructure; deployment without code changes, config only; helpers `gen_secrets.py`/`onboard_project.py` + `docker compose up -d`; the `LITE_SETUP.md` runbook with a check for every step; the one-step bootstrap is the adjacent Bundled mode, not Lite) and two operator-guide slides on **"how to use the orchestrator via Plane"** (start via the "To Analyse" status; Plane statuses are indication, not control; both human gates "Approved"/"Confirm Deploy"; the auto-labels `autoApprove`/`autoDeploy`/`Bug` waive only human decisions, no technical check is skipped; cancel via "STOP"; observation: board statuses + the live Telegram card + comments with links to the branch/PR). Facts verified against golden sources (`docs/deployment/LITE_SETUP.md`, `docs/overview/tech-pipeline.md`, `tech-integrations.md`, `CLAUDE.md`). **Docs+tests only:** `src/**`/`STAGE_TRANSITIONS`/`QG_CHECKS`/`check_*`/DB schema are byte-for-byte unchanged; no new QG is introduced; `python-pptx` is not added to the prod image; the built `.pptx` is not committed to git. Anti-drift: a new function `test_presentation_covers_lite_and_plane_usage_bits` in `tests/test_system_docs.py` (existing checks are not relaxed). ADR: `docs/work-items/ORCH-105/06-adr/ADR-001-presentation-lite-and-plane-usage-slides.md` (the showcase canon is unchanged: `adr-0039-system-overview-docs-canon.md`).
+- **Developer/reviewer timeout budgets + launch-time model stamp in telemetry** (ORCH-109, `fix`): two additive, isolated changes to the agent launch subsystem (incident ORCH-104, runs 658/659/660), **without** touching `STAGE_TRANSITIONS`/`QG_CHECKS`/`check_*`/machine-verdict/DB schema. ADR: `docs/work-items/ORCH-109/06-adr/ADR-001-agent-timeout-budgets-and-launch-model-stamp.md`, cross-cutting `docs/architecture/adr/adr-0040-agent-timeout-budgets-and-launch-model-stamp.md`.
+  - **Launch-time model stamp (D1, FR-1):** the model resolved by `resolve_agent_model(...)` is written to `agent_runs.model` **at launch** by a combined `UPDATE agent_runs SET model=?, effort=? WHERE id=?`, alongside the effort stamp (ORCH-087) in `launcher._spawn`. Previously the model was written only post hoc from the final usage JSON (`record_usage`, `model=COALESCE(?, model)`), and a run killed on timeout never emits that JSON, so the model stayed `NULL` exactly when it was needed for incident analysis. Now the model is present from launch, **survives a timeout-kill (`exit_code=-9`)**, and is visible in-flight in `GET /metrics`/`GET /queue` (`get_running_agents` already returns `model`) and in the Telegram card line. An empty resolve (CLI default without `--model`) → `NULL` (symmetric with `effort or None`). The post-hoc `record_usage` remains an **enrichment** (COALESCE keeps the launch stamp when `model=None`). never-raise: a stamp failure is isolated by `try/except` + WARNING, and the launch continues.
+  - **Raised per-role wall-clock budgets (D3/D4, FR-3):** dedicated typed keys `agent_timeout_developer_s=3600` (60m) / `agent_timeout_reviewer_s=3000` (50m) (env `ORCH_AGENT_TIMEOUT_DEVELOPER_S`/`_REVIEWER_S`). `_resolve_timeout(agent)` gains a deterministic ladder: `agent_timeout_overrides_json[agent]` (operator escape-hatch, highest priority, BC) → the dedicated role key → `agent_timeout_seconds=1800` (other roles are byte-for-byte unchanged). Malformed JSON is ignored (resolution falls through to the next rung), and a non-positive/non-numeric dedicated key falls back to the global default; both log a WARNING (never-break). The defaults equal the production values (ORCH-101 canon): an empty `.env` reproduces the raised budgets. The **ORCH-065 reaper cross-invariant** is preserved by raising `reaper_max_running_s` in lockstep, 3600 → **5400** (`5400 > max(timeout)3600 + grace20 = 3620`).
+  - **FR-4/NFR-6 (visibility on kill / in-flight) and FR-5 (anti-salvage) are structurally already satisfied** by the existing code (advancement is gated by `if exit_code == 0`; a timeout-kill goes to `_finalize_job` retry/fail, not advance); ORCH-109 pins them with **regression tests** and introduces no new branches. Coverage: the new `tests/test_orch109_timeout_model.py` (TC-01…TC-12, deterministic, no network/CLI). Updated: `tests/test_config.py` (reaper default 5400) and `tests/test_launcher.py` (`_resolve_timeout` ladder). Documentation: `.env.example` (agent-timeout + reaper block), the `config.py` passport, `docs/architecture/README.md`/`internals.md` (per-role budgets).
+- **Presentation: slides on Lite installation and usage via Plane** (ORCH-105, `docs`): the slide source `docs/overview/presentation.md` is extended with three slides following the ORCH-011 canon (16 → 19, continuous numbering preserved): one slide on **Lite installation via scripts** (two platform containers, the orchestrator plus the watchdog, on the customer's infrastructure; deployment without code changes, config only; helpers `gen_secrets.py`/`onboard_project.py` + `docker compose up -d`; the `LITE_SETUP.md` runbook with a check for every step; the one-step bootstrap is the adjacent Bundled mode, not Lite) and two operator-guide slides on **"how to use the orchestrator via Plane"** (start via the "To Analyse" status; Plane statuses are indication, not control; both human gates "Approved"/"Confirm Deploy"; the auto-labels `autoApprove`/`autoDeploy`/`Bug` waive only human decisions, no technical check is skipped; cancel via "STOP"; observation: board statuses + the live Telegram card + comments with links to the branch/PR). Facts verified against golden sources (`docs/deployment/LITE_SETUP.md`, `docs/overview/tech-pipeline.md`, `tech-integrations.md`, `CLAUDE.md`). **Docs+tests only:** `src/**`/`STAGE_TRANSITIONS`/`QG_CHECKS`/`check_*`/DB schema are byte-for-byte unchanged; no new QG is introduced; `python-pptx` is not added to the prod image; the built `.pptx` is not committed to git. Anti-drift: a new function `test_presentation_covers_lite_and_plane_usage_bits` in `tests/test_system_docs.py` (existing checks are not relaxed). ADR: `docs/work-items/ORCH-105/06-adr/ADR-001-presentation-lite-and-plane-usage-slides.md` (the showcase canon is unchanged: `adr-0039-system-overview-docs-canon.md`).
 - **System showcase `docs/overview/`: business + tech, routes for three audiences, presentation** (ORCH-011, `docs`): a single entry point to the platform documentation, the new docs section `docs/overview/` (flat directory, 10 files, ADR-001 D1): the index `README.md` (routes "I am a customer / I am a manager / I am a developer" + the maintenance rule "changed functionality → update the showcase in the same PR"), the business part `business.md` (problem → solution → what it actually does → value → 6 scenarios; no jargon, figures only with attribution), 7 tech blocks `tech-*.md` (architecture with a flow diagram, pipeline/gates, agents, object model, integrations, quality/security, observability; link-first: details are linked to golden sources, and the only permitted duplication is machine-verifiable). **Docs+tests+dev script** (ORCH-102/103 pattern): `src/**`/`docker-compose.yml`/`Dockerfile`/`requirements*`/`STAGE_TRANSITIONS`/`QG_CHECKS`/machine-verdict/DB schema have zero changes. ADR: `docs/work-items/ORCH-011/06-adr/ADR-001-system-overview-canon.md`, cross-cutting `adr-0039-system-overview-docs-canon.md`.
   - **Presentation (D4/D5):** the slide source `docs/overview/presentation.md` (16 slides in the machine-parseable structure "## Слайд N: …" + the build procedure "команда + Проверка:", i.e. command + check) + the dev script `scripts/build_presentation.py` (python-pptx, dark design, editable text with exact Cyrillic; a pure-stdlib parser `parse_slides` + lazy pptx import). Run only outside the runtime; `python-pptx` is NOT in the prod image (machine guard); the built `.pptx` is not committed to git, since `build/` is in `.gitignore`.
   - **Anti-drift (D6):** a new structural `tests/test_system_docs.py` (no network/LLM/subprocess, `test_lite_setup_doc.py` pattern): the 10 showcase files; routes/maintenance rule; derive-checks against code: stages by importing `src.stages.STAGE_TRANSITIONS` (incl. `deploy-staging`/`cancelled`, chain order), exit gates and sub-gates by `QG_CHECKS` registry names in the normative order security → merge → coverage → image-freshness (+ the "not stages" marker), 6 agents by globbing prompts, the effort table by config class defaults (ORCH-41/81); validity of relative links + mandatory golden-source links; a full-text FORBIDDEN scan (imported from `test_no_host_hardcodes.py`) + a secret heuristic + a ban on out-of-repository paths; slides via the canonical parser; `pptx` absent from `requirements*`/`Dockerfile`; README/CLAUDE/CHANGELOG pointers.
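
The launch-stamp and COALESCE behaviour described in the ORCH-109 entry can be reproduced against an in-memory SQLite database. This is a hedged sketch only: the four-column `agent_runs` table is a hypothetical minimal schema, the helper names are invented for illustration, and the model/effort values are examples; only the two SQL statements are taken from the entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real agent_runs table has more columns.
conn.execute(
    "CREATE TABLE agent_runs ("
    "id INTEGER PRIMARY KEY, model TEXT, effort TEXT, exit_code INTEGER)"
)


def new_run(conn):
    return conn.execute("INSERT INTO agent_runs DEFAULT VALUES").lastrowid


def stamp_launch(conn, run_id, model, effort):
    # Combined launch-time stamp; an empty resolve is stored as NULL.
    conn.execute(
        "UPDATE agent_runs SET model=?, effort=? WHERE id=?",
        (model or None, effort or None, run_id),
    )
    conn.commit()


def record_usage_model(conn, run_id, usage_model):
    # Post-hoc enrichment: a NULL usage model keeps the launch stamp.
    conn.execute(
        "UPDATE agent_runs SET model=COALESCE(?, model) WHERE id=?",
        (usage_model, run_id),
    )
    conn.commit()


def model_of(conn, run_id):
    return conn.execute(
        "SELECT model FROM agent_runs WHERE id=?", (run_id,)
    ).fetchone()[0]


run_id = new_run(conn)
stamp_launch(conn, run_id, "opus-4-8", "xhigh")

# Simulate a timeout-kill: no usage JSON is ever recorded for this run.
conn.execute("UPDATE agent_runs SET exit_code=-9 WHERE id=?", (run_id,))
conn.commit()
print(model_of(conn, run_id))  # opus-4-8

# A usage record without a model leaves the launch stamp untouched.
record_usage_model(conn, run_id, None)
print(model_of(conn, run_id))  # opus-4-8
```

With this statement a non-NULL usage model still overwrites the stamp, which is what makes `record_usage` an enrichment rather than a no-op.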
--- a/CLAUDE.md
+++ b/CLAUDE.md
--- a/src/agents/launcher.py
+++ b/src/agents/launcher.py
@@ -563,14 +563,26 @@ class AgentLauncher:
        # so this is the only reliable source for the tracker's "· model · effort"
        # line. Empty resolve (no --effort flag) -> NULL so the suffix is omitted.
        # Reuses the still-open conn; never blocks the launch.
+        #
+        # ORCH-109 (D1): stamp the resolved MODEL in the SAME UPDATE. Previously the
+        # model was only written post-hoc from the final usage-JSON (usage.record_usage,
+        # model=COALESCE(?, model)); a timeout-killed run never emits that JSON, so the
+        # model stayed NULL exactly when an incident needs it. Resolving it here is
+        # deterministic (resolve_agent_model above), so the value is present from launch,
+        # survives a timeout-kill (-9), and is visible in-flight in /metrics & /queue.
+        # The post-hoc record_usage stays an ENRICHMENT (COALESCE keeps the launch stamp
+        # when the JSON model is None/missing). Empty resolve (model == "", CLI default
+        # with no --model) -> NULL, symmetric with `effort or None`, so the tracker's
+        # model suffix is correctly omitted. never-raise: failure is isolated + WARNING;
+        # the launch continues (model_flag is built from the local `model`, not the DB).
        try:
            conn.execute(
-                "UPDATE agent_runs SET effort=? WHERE id=?",
-                (effort or None, run_id),
+                "UPDATE agent_runs SET model=?, effort=? WHERE id=?",
+                (model or None, effort or None, run_id),
            )
            conn.commit()
        except Exception as e:
-            logger.warning(f"effort stamp failed for run_id={run_id}: {e}")
+            logger.warning(f"model/effort stamp failed for run_id={run_id}: {e}")
        model_flag = f"--model {model} " if model else ""
        effort_flag = f"--effort {effort} " if effort else ""
        # ORCH-074 (G2): agent_fallback_model is read directly here, bypassing
@@ -658,16 +670,34 @@ class AgentLauncher:
        notify_agent_started(run_id, agent, task_id)
        return run_id

+    # ORCH-109 (D3): dedicated raised-budget keys for the two HEAVY roles. Maps the
+    # role to its Settings attribute; resolved BELOW the operator JSON escape-hatch
+    # and ABOVE the global default. A role absent here keeps the global default.
+    _TIMEOUT_ROLE_KEYS = {
+        "developer": "agent_timeout_developer_s",
+        "reviewer": "agent_timeout_reviewer_s",
+    }
+
    @staticmethod
    def _resolve_timeout(agent: str = None) -> int:
-        """ORCH-7 (M-2): resolve the wall-clock timeout for an agent.
+        """ORCH-7 (M-2) + ORCH-109 (D3): resolve the wall-clock timeout for an agent.

-        Per-agent override from settings.agent_timeout_overrides_json (a JSON object
-        like {"reviewer": 3600}) wins; otherwise the global default
-        settings.agent_timeout_seconds is used. A malformed override JSON is ignored
-        (falls back to the default) and only logged, so a bad env never bricks runs.
+        Deterministic priority ladder (highest first):
+          1. settings.agent_timeout_overrides_json[agent] -- operator escape-hatch,
+             wins for ANY role (full BC). A malformed JSON is ignored + logged.
+          2. dedicated per-role key (ORCH-109): developer -> agent_timeout_developer_s
+             (3600), reviewer -> agent_timeout_reviewer_s (3000). A non-positive /
+             non-int value is ignored + logged (never-break) and falls through to (3).
+          3. settings.agent_timeout_seconds -- the global default (1800) for every
+             other role (analyst/architect/tester/deployer), byte-for-byte as before.
+
+        Never raises: any bad config degrades to the global default so a bad env
+        never bricks runs. Cross-invariant (ORCH-065): max(resolved) + grace must
+        stay < reaper_max_running_s (raised to 5400 in lockstep; see config.py).
        """
        default = settings.agent_timeout_seconds
+
+        # (1) operator JSON override -- highest priority, unchanged semantics.
        raw = (settings.agent_timeout_overrides_json or "").strip()
        if agent and raw:
            try:
@@ -676,6 +706,22 @@ class AgentLauncher:
                    return int(overrides[agent])
            except (ValueError, TypeError) as e:
                logger.warning(f"Invalid agent_timeout_overrides_json, using default: {e}")
+
+        # (2) dedicated per-role raised budget (ORCH-109 D3/D4).
+        key = AgentLauncher._TIMEOUT_ROLE_KEYS.get(agent)
+        if key is not None:
+            try:
+                value = int(getattr(settings, key))
+                if value > 0:
+                    return value
+                logger.warning(
+                    f"Non-positive {key}={value!r}; falling back to "
+                    f"agent_timeout_seconds={default}"
+                )
+            except (ValueError, TypeError) as e:
+                logger.warning(f"Invalid {key} for agent '{agent}', using default: {e}")
+
+        # (3) global default.
        return default

    def _watchdog(self, pid: int, run_id: int, timeout: int = None,
--- a/src/config.py
+++ b/src/config.py
@@ -120,10 +120,28 @@ class Settings(BaseSettings):
    #                            (env ORCH_AGENT_KILL_GRACE_SECONDS).
    # agent_timeout_overrides_json -> optional per-agent override JSON object,
    #                            e.g. {"reviewer": 3600, "architect": 2700}
-    #                            (env ORCH_AGENT_TIMEOUT_OVERRIDES_JSON).
+    #                            (env ORCH_AGENT_TIMEOUT_OVERRIDES_JSON). HIGHEST
+    #                            priority escape-hatch in _resolve_timeout (wins for
+    #                            any role).
+    # ORCH-109 (D3/D4): raised wall-clock budgets for the two HEAVY roles.
+    #   agent_timeout_developer_s -> developer is the bottleneck (effort xhigh,
+    #                            coding/agentic); 3600s/60m (env
+    #                            ORCH_AGENT_TIMEOUT_DEVELOPER_S).
+    #   agent_timeout_reviewer_s  -> reviewer reads a large diff + writes the review
+    #                            (high reasoning); 3000s/50m (env
+    #                            ORCH_AGENT_TIMEOUT_REVIEWER_S).
+    # _resolve_timeout ladder: overrides_json[agent] > dedicated role key >
+    # agent_timeout_seconds (other roles stay at 1800, byte-for-byte). A malformed
+    # JSON / non-positive dedicated value falls back to agent_timeout_seconds +
+    # WARNING (never-break). The defaults ARE the prod budget (ORCH-101 canon: empty
+    # .env reproduces prod). CROSS-INVARIANT (ORCH-065): reaper_max_running_s MUST
+    # stay > max(resolved timeout) + agent_kill_grace_seconds; raised in lockstep to
+    # 5400 below (5400 > 3600 + 20 = 3620).
    agent_timeout_seconds: int = 1800
    agent_kill_grace_seconds: int = 20
    agent_timeout_overrides_json: str = ""
+    agent_timeout_developer_s: int = 3600
+    agent_timeout_reviewer_s: int = 3000

    # ORCH-41: per-agent LLM model. Empty -> agent_model_default. Resolution order:
    # project-override (projects_json agent_models) > ORCH_AGENT_MODEL_<AGENT> >
@@ -480,6 +498,9 @@ class Settings(BaseSettings):
    #   reaper_max_running_s  -> Tier-3 backstop ceiling: a job 'running' longer than
    #                           this is reaped even when liveness is unknowable. MUST be
    #                           > max agent_timeout + grace so a legit agent is safe.
+    #                           ORCH-109 (D4): raised 3600 -> 5400 in lockstep with the
+    #                           developer budget (5400 > 3600 + 20 = 3620; headroom 1780s
+    #                           also covers the monitor finalization window).
    #   reaper_finalize_grace_s -> Tier-2 anti-false-positive: a LIVE monitor writes
    #                           agent_runs.exit_code FIRST, THEN does git commit/push +
    #                           PR + Plane usage comments (seconds..minutes) and only
@@ -494,7 +515,7 @@ class Settings(BaseSettings):
    reaper_enabled: bool = True
    reaper_interval_s: int = 60
    reaper_dead_ticks: int = 2
-    reaper_max_running_s: int = 3600
+    reaper_max_running_s: int = 5400
    reaper_finalize_grace_s: int = 300
    lease_reclaim_enabled: bool = True

--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -195,7 +195,9 @@ def test_reaper_settings_defaults(monkeypatch):
    assert s.reaper_enabled is True
    assert s.reaper_interval_s == 60
    assert s.reaper_dead_ticks == 2
-    assert s.reaper_max_running_s == 3600
+    # ORCH-109 (D4): raised 3600 -> 5400 in lockstep with the developer budget so the
+    # Tier-3 backstop stays > max(agent_timeout)+grace (5400 > 3600 + 20 = 3620).
+    assert s.reaper_max_running_s == 5400
    assert s.lease_reclaim_enabled is True


--- a/tests/test_launcher.py
+++ b/tests/test_launcher.py
@@ -160,12 +160,19 @@ class TestDeadCodeRemoved:
 # ORCH-7 (M-2): configurable timeout + per-agent override
 # ---------------------------------------------------------------------------
 class TestResolveTimeout:
-    """M-2: _resolve_timeout honours a per-agent JSON override, else the default."""
+    """M-2: _resolve_timeout honours a per-agent JSON override, else the default.
+
+    ORCH-109 (D3): the ladder grew a middle level — dedicated raised budgets for
+    developer/reviewer between the JSON escape-hatch and the global default. These
+    tests are updated to assert the global default via a NON-raised role; the new
+    dedicated-key ladder is covered in full by tests/test_orch109_timeout_model.py.
+    """

    def test_default_when_no_override(self, monkeypatch):
        monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
        monkeypatch.setattr(settings, "agent_timeout_overrides_json", "")
-        assert AgentLauncher._resolve_timeout("developer") == 1800
+        # ORCH-109: a role WITHOUT a dedicated key keeps the global default.
+        assert AgentLauncher._resolve_timeout("analyst") == 1800
        assert AgentLauncher._resolve_timeout(None) == 1800

    def test_override_for_specific_agent(self, monkeypatch):
@@ -173,16 +180,19 @@ class TestResolveTimeout:
        monkeypatch.setattr(
            settings, "agent_timeout_overrides_json", '{"reviewer": 3600, "architect": 2700}'
        )
+        # JSON override stays the HIGHEST priority for ANY role (full BC).
        assert AgentLauncher._resolve_timeout("reviewer") == 3600
        assert AgentLauncher._resolve_timeout("architect") == 2700
-        # an agent not in the override map falls back to the default
-        assert AgentLauncher._resolve_timeout("developer") == 1800
+        # ORCH-109: a role not in the override map AND without a dedicated key
+        # (tester) falls back to the global default.
+        assert AgentLauncher._resolve_timeout("tester") == 1800

    def test_malformed_override_falls_back_to_default(self, monkeypatch):
        monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
        monkeypatch.setattr(settings, "agent_timeout_overrides_json", "{not-json")
-        # must not raise, must return the default
-        assert AgentLauncher._resolve_timeout("reviewer") == 1800
+        # must not raise; a role without a dedicated key returns the global default
+        # (ORCH-109: developer/reviewer would return their dedicated budget instead).
+        assert AgentLauncher._resolve_timeout("architect") == 1800


 class TestWatchdogGracefulKill:
--- a/tests/test_orch109_timeout_model.py
+++ b/tests/test_orch109_timeout_model.py
@@ -0,0 +1,554 @@
+"""ORCH-109: timeout budgets + launch-time model telemetry for developer/reviewer.
+
+Covers FR-1..FR-6 / AC-1..AC-10 through TC-01..TC-12 (04-test-plan.yaml). Fully
+deterministic: an isolated temp SQLite DB + synthetic agent_runs / jobs rows; no
+network, no Claude CLI subprocess. Settings are monkeypatched / overridden.
+
+Two production changes under test (ADR-001):
+  * D1 — launcher._spawn stamps the resolved model into agent_runs.model in the
+    SAME UPDATE as the effort stamp, so the model is present from launch and
+    survives a timeout-kill / is visible in-flight.
+  * D3/D4 — launcher._resolve_timeout grows a dedicated per-role budget level
+    (developer 3600 / reviewer 3000) between the JSON escape-hatch and the global
+    default; reaper_max_running_s raised 3600 -> 5400 in lockstep (ORCH-065).
+FR-2 (COALESCE preserve), FR-4/NFR-6 (kill / in-flight visibility) and FR-5
+(anti-salvage) are STRUCTURAL guarantees already present in the code — pinned here
+as regression tests, not new branches.
+"""
+import os
+import sqlite3
+import tempfile
+
+os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
+os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
+
+_test_db = os.path.join(tempfile.gettempdir(), "test_orch109_timeout_model.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ.setdefault("ORCH_REPOS_DIR", tempfile.gettempdir())
+
+import pytest  # noqa: E402
+
+import src.db as db_module  # noqa: E402
+from src.db import init_db, get_db, get_running_agents  # noqa: E402
+from src.config import settings, Settings  # noqa: E402
+from src.agents.launcher import AgentLauncher, resolve_agent_model  # noqa: E402
+from src import usage as U  # noqa: E402
+from src import notifications as N  # noqa: E402
+
+
+@pytest.fixture(autouse=True)
+def setup_db(monkeypatch):
+    # get_db() reads settings.db_path live; pin it to our isolated DB.
+    monkeypatch.setattr(db_module.settings, "db_path", _test_db, raising=False)
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+    init_db()
+    # render-only tests: never consult the live Plane overlay (no network).
+    monkeypatch.setattr(N._get_settings(), "tracker_live_status", False, raising=False)
+    yield
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+
+
+# --------------------------------------------------------------------------- #
+# TC-01..TC-03 — _resolve_timeout dedicated-budget ladder (FR-3, AC-3 / AC-4)
+# --------------------------------------------------------------------------- #
+class TestResolveTimeoutLadder:
+    """The priority ladder: overrides_json > dedicated role key > global default."""
+
+    def _pin(self, monkeypatch, *, dev=3600, rev=3000, default=1800, overrides=""):
+        monkeypatch.setattr(settings, "agent_timeout_seconds", default)
+        monkeypatch.setattr(settings, "agent_timeout_overrides_json", overrides)
+        monkeypatch.setattr(settings, "agent_timeout_developer_s", dev)
+        monkeypatch.setattr(settings, "agent_timeout_reviewer_s", rev)
+
+    def test_tc01_developer_reviewer_raised(self, monkeypatch):
+        """TC-01/AC-3: developer/reviewer resolve to their raised dedicated budget."""
+        self._pin(monkeypatch)
+        assert AgentLauncher._resolve_timeout("developer") == 3600
+        assert AgentLauncher._resolve_timeout("reviewer") == 3000
+
+    def test_tc01_dedicated_keys_are_configurable(self, monkeypatch):
+        """TC-01/AC-3: the budgets are config-driven, not hardcoded."""
+        self._pin(monkeypatch, dev=4200, rev=2400)
+        assert AgentLauncher._resolve_timeout("developer") == 4200
+        assert AgentLauncher._resolve_timeout("reviewer") == 2400
+
+    def test_tc02_other_roles_use_global_default(self, monkeypatch):
+        """TC-02/AC-3: roles without a dedicated key keep the global default (1800)."""
+        self._pin(monkeypatch)
+        for role in ("analyst", "architect", "tester", "deployer"):
+            assert AgentLauncher._resolve_timeout(role) == 1800
+        # unknown role / None also fall through to the global default.
+        assert AgentLauncher._resolve_timeout("unknown-role") == 1800
+        assert AgentLauncher._resolve_timeout(None) == 1800
+
+    def test_tc01_overrides_json_wins_over_dedicated(self, monkeypatch):
+        """AC-3: the operator JSON escape-hatch stays HIGHEST priority for ANY role."""
+        self._pin(monkeypatch, overrides='{"developer": 1234, "reviewer": 999}')
+        assert AgentLauncher._resolve_timeout("developer") == 1234
+        assert AgentLauncher._resolve_timeout("reviewer") == 999
+
+    def test_tc03_malformed_overrides_json_never_raises(self, monkeypatch):
+        """TC-03/AC-4: malformed JSON is ignored; resolution still succeeds (never-break)."""
+        self._pin(monkeypatch, overrides="{not-json")
+        # malformed JSON ignored -> developer still resolves via its dedicated key.
+        assert AgentLauncher._resolve_timeout("developer") == 3600
+        # a role without a dedicated key falls through to the global default.
+        assert AgentLauncher._resolve_timeout("analyst") == 1800
+
+    @pytest.mark.parametrize("bad", [0, -5, "abc"])
+    def test_tc03_non_positive_dedicated_falls_back(self, monkeypatch, bad):
+        """TC-03/AC-4: an absurd/non-positive/non-int dedicated value -> global default."""
+        self._pin(monkeypatch, dev=bad)
+        # must NOT raise; falls back to agent_timeout_seconds + WARNING.
+        assert AgentLauncher._resolve_timeout("developer") == 1800
+
+
+# --------------------------------------------------------------------------- #
+# TC-04 / TC-05 — launch-time model stamp in _spawn (FR-1, AC-1 + NFR-2)
+# --------------------------------------------------------------------------- #
+class TestLaunchModelStamp:
+    """_spawn writes the resolved model to agent_runs.model at launch (next to effort)."""
+
+    def _seed_task(self, repo="orchestrator", branch="feature/ORCH-109-x", wid="ORCH-109"):
+        conn = get_db()
+        cur = conn.execute(
+            "INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, title) "
+            "VALUES (?,?,?,?,?,?)",
+            ("p1", wid, repo, branch, "development", "t"),
+        )
+        tid = cur.lastrowid
+        conn.commit()
+        conn.close()
+        return tid
+
+    def _fake_spawn_env(self, tmp_path, monkeypatch, repo="orchestrator"):
+        """Fake every OS/process side-effect so _spawn touches only the DB."""
+        import src.agents.launcher as L
+        (tmp_path / repo).mkdir()
+        monkeypatch.setattr(L.settings, "repos_dir", str(tmp_path), raising=False)
+        monkeypatch.setattr(L.settings, "runs_dir", str(tmp_path / "runs"), raising=False)
+        monkeypatch.setattr(L, "ensure_worktree", lambda r, b: str(tmp_path / repo))
+        monkeypatch.setattr("src.projects.get_project_by_repo", lambda r: None)
+
+        class _Proc:
+            pid = 4242
+
+        monkeypatch.setattr(L.subprocess, "Popen", lambda *a, **k: _Proc())
+
+        class _T:
+            def __init__(self, *a, **k):
+                pass
+
+            def start(self):
+                pass
+
+        monkeypatch.setattr(L.threading, "Thread", _T)
+        monkeypatch.setattr(L, "notify_agent_started", lambda *a, **k: None)
+        return L
+
+    def test_tc04_spawn_stamps_model_and_effort(self, tmp_path, monkeypatch):
+        """TC-04/AC-1: after _spawn the run row carries the resolved model AND effort."""
+        L = self._fake_spawn_env(tmp_path, monkeypatch)
+        # Deterministic resolve: developer -> claude-opus-4-8 (default) / xhigh (floor).
+        monkeypatch.setattr(L.settings, "agent_model_developer", "", raising=False)
+        monkeypatch.setattr(L.settings, "agent_model_default", "claude-opus-4-8", raising=False)
+        monkeypatch.setattr(L.settings, "agent_effort_developer", "", raising=False)
+        monkeypatch.setattr(L.settings, "agent_effort_default", "", raising=False)
+
+        tid = self._seed_task()
+        run_id = L.AgentLauncher()._spawn(
+            "developer", "orchestrator", task_content=None, task_id=tid
+        )
+
+        conn = get_db()
+        row = conn.execute(
+            "SELECT model, effort FROM agent_runs WHERE id=?", (run_id,)
+        ).fetchone()
+        conn.close()
+        assert row["model"] == "claude-opus-4-8"
+        assert row["effort"] == "xhigh"
+        # The stamp matches the resolver — single source of truth.
+        assert row["model"] == resolve_agent_model("developer", None)
+
+    def test_tc05_stamp_failure_is_isolated(self, tmp_path, monkeypatch):
+        """TC-05/NFR-2: a failing model/effort stamp does NOT propagate out of _spawn."""
+        L = self._fake_spawn_env(tmp_path, monkeypatch)
+        real_get_db = db_module.get_db
+
+        class _RaisingConn:
+            """Delegates to a real conn but raises on the launch stamp UPDATE only."""
+
+            def __init__(self, real):
+                self._real = real
+
+            def execute(self, sql, *a, **k):
+                if "SET model=?, effort=?" in sql:
+                    raise sqlite3.OperationalError("simulated stamp failure")
+                return self._real.execute(sql, *a, **k)
+
+            def commit(self):
+                return self._real.commit()
+
+            def close(self):
+                return self._real.close()
+
+            def __getattr__(self, name):
+                return getattr(self._real, name)
+
+        monkeypatch.setattr(L, "get_db", lambda: _RaisingConn(real_get_db()))
+
+        tid = self._seed_task()
+        # Must NOT raise even though the stamp UPDATE blows up.
+        run_id = L.AgentLauncher()._spawn(
+            "developer", "orchestrator", task_content=None, task_id=tid
+        )
+        assert run_id is not None
+
+        # The run row exists; model stayed NULL (stamp failed) — launch unharmed.
+        conn = real_get_db()
+        row = conn.execute(
+            "SELECT id, model FROM agent_runs WHERE id=?", (run_id,)
+        ).fetchone()
+        conn.close()
+        assert row is not None
+        assert row["model"] is None
+
+
+# --------------------------------------------------------------------------- #
+# TC-06 / TC-07 — post-hoc enrich preserves / refines the launch stamp (FR-2)
+# --------------------------------------------------------------------------- #
+class TestRecordUsagePreservesStamp:
+    """record_usage (model=COALESCE(?, model)) never blanks a launch-stamped model."""
+
+    def _run_with_model(self, model="claude-opus-4-8", agent="developer"):
+        conn = get_db()
+        cur = conn.execute(
+            "INSERT INTO agent_runs (task_id, agent, model) VALUES (?,?,?)",
+            (1, agent, model),
+        )
+        rid = cur.lastrowid
+        conn.commit()
+        conn.close()
+        return rid
+
+    def _model_of(self, rid):
+        conn = get_db()
+        row = conn.execute("SELECT model FROM agent_runs WHERE id=?", (rid,)).fetchone()
+        conn.close()
+        return row["model"]
+
+    def test_tc06_record_usage_none_preserves_model(self):
+        """TC-06/AC-2: usage=None (no final JSON, e.g. timeout) keeps the launch stamp."""
+        rid = self._run_with_model()
+        U.record_usage(rid, None)  # must not raise
+        assert self._model_of(rid) == "claude-opus-4-8"
+
+    def test_tc06_record_usage_model_none_preserves_model(self):
+        """TC-06/AC-2: a usage JSON with model=None keeps the launch stamp (COALESCE)."""
+        rid = self._run_with_model()
+        U.record_usage(rid, {"input_tokens": 10, "output_tokens": 5, "model": None})
+        assert self._model_of(rid) == "claude-opus-4-8"
+
+    def test_tc07_record_usage_nonempty_model_enriches_blank(self):
+        """TC-07/AC-2: a non-empty model in the JSON sets a blank (CLI-default) stamp."""
+        rid = self._run_with_model(model=None)
+        U.record_usage(
+            rid, {"input_tokens": 1, "output_tokens": 1, "model": "claude-opus-4-8"}
+        )
+        assert self._model_of(rid) == "claude-opus-4-8"
+
+    def test_tc07_record_usage_refines_existing_model(self):
+        """TC-07/AC-2: a fuller provider-prefixed id refines a bare launch stamp."""
+        rid = self._run_with_model(model="claude-opus-4-8")
+        U.record_usage(
+            rid,
+            {"input_tokens": 1, "output_tokens": 1, "model": "tokenator/claude-opus-4-8"},
+        )
+        assert self._model_of(rid) == "tokenator/claude-opus-4-8"
+
+
+# --------------------------------------------------------------------------- #
+# TC-08 — reaper cross-invariant (NFR-4, AC-5)
+# --------------------------------------------------------------------------- #
+class TestReaperInvariant:
+    """reaper_max_running_s MUST stay > max(resolved timeout) + grace (ORCH-065)."""
+
+    def test_tc08_shipped_defaults_satisfy_invariant(self, monkeypatch):
+        """TC-08/AC-5: the canonical shipped defaults hold the invariant."""
+        for name in (
+            "ORCH_AGENT_TIMEOUT_SECONDS",
+            "ORCH_AGENT_KILL_GRACE_SECONDS",
+            "ORCH_AGENT_TIMEOUT_OVERRIDES_JSON",
+            "ORCH_AGENT_TIMEOUT_DEVELOPER_S",
+            "ORCH_AGENT_TIMEOUT_REVIEWER_S",
+            "ORCH_REAPER_MAX_RUNNING_S",
+        ):
+            monkeypatch.delenv(name, raising=False)
+        s = Settings()
+        max_budget = max(
+            s.agent_timeout_seconds,
+            s.agent_timeout_developer_s,
+            s.agent_timeout_reviewer_s,
+        )
+        assert s.reaper_max_running_s > max_budget + s.agent_kill_grace_seconds
+        # Concrete shipped numbers (ADR-001 D4): 5400 > 3600 + 20 = 3620.
+        assert (max_budget, s.agent_kill_grace_seconds, s.reaper_max_running_s) == (
+            3600,
+            20,
+            5400,
+        )
+
+    def test_tc08_resolved_max_is_developer(self, monkeypatch):
+        """TC-08/AC-5: the max resolved per-role budget is the developer budget."""
+        monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
+        monkeypatch.setattr(settings, "agent_timeout_overrides_json", "")
+        monkeypatch.setattr(settings, "agent_timeout_developer_s", 3600)
+        monkeypatch.setattr(settings, "agent_timeout_reviewer_s", 3000)
+        monkeypatch.setattr(settings, "agent_kill_grace_seconds", 20)
+        monkeypatch.setattr(settings, "reaper_max_running_s", 5400)
+        roles = ["analyst", "architect", "developer", "reviewer", "tester", "deployer"]
+        max_timeout = max(AgentLauncher._resolve_timeout(r) for r in roles)
+        assert max_timeout == 3600
+        assert settings.reaper_max_running_s > max_timeout + settings.agent_kill_grace_seconds
+
+
+# --------------------------------------------------------------------------- #
+# TC-09 — tracker stage line shows model+effort on a timeout-killed run (FR-4)
+# --------------------------------------------------------------------------- #
+class TestTrackerTimeoutVisibility:
+    """A -9 run still renders '· <model> · <effort>' because both are launch-stamped.
+
+    The stage line takes its model/effort from the LAST run of the agent
+    (stage_runs[-1] in _stage_line). When that last run is a timeout-kill (-9), its
+    launch-stamped values are exactly what the operator sees — the whole point of
+    stamping at launch. Without the stamp the -9 row would carry model=NULL and the
+    line would drop the model suffix (the AC-6 FAIL condition).
+    """
+
+    def _mk_task(self, stage="done", wid="ORCH-109"):
+        conn = get_db()
+        cur = conn.execute(
+            "INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, title) "
+            "VALUES (?,?,?,?,?,?)",
+            ("p1", wid, "orchestrator", "feature/ORCH-109-x", stage, "t"),
+        )
+        tid = cur.lastrowid
+        conn.commit()
+        conn.close()
+        return tid
+
+    def _add_run(self, tid, *, exit_code, model, effort, started, finished):
+        conn = get_db()
+        conn.execute(
+            "INSERT INTO agent_runs (task_id, agent, started_at, finished_at, "
+            "exit_code, input_tokens, output_tokens, cost_usd, model, effort) "
+            "VALUES (?,?,?,?,?,?,?,?,?,?)",
+            (tid, "developer", started, finished, exit_code, 10, 5, 0.0, model, effort),
+        )
+        conn.commit()
+        conn.close()
+
+    def test_tc09_killed_run_renders_model_effort(self):
+        """TC-09/AC-6: the -9 (last) developer run's launch-stamped model+effort show."""
+        tid = self._mk_task(stage="done")
+        # run 1: succeeded (opens the ✅ stage line) — DIFFERENT model so we can prove
+        # the displayed value comes from the killed run, not this one.
+        self._add_run(
+            tid,
+            exit_code=0,
+            model="claude-sonnet-4-6",
+            effort="high",
+            started="2026-06-14 09:00:00",
+            finished="2026-06-14 09:20:00",
+        )
+        # run 2: timeout-killed (-9), the LAST run -> _stage_line reads its row.
+        self._add_run(
+            tid,
+            exit_code=-9,
+            model="tokenator/claude-opus-4-8",
+            effort="xhigh",
+            started="2026-06-14 09:25:00",
+            finished="2026-06-14 09:55:00",
+        )
+
+        text = N.render_task_tracker(tid)
+        # "Разработка" is Russian for "Development": the tracker's stage label.
+        line = [ln for ln in text.splitlines() if ln.startswith("✅ Разработка")][0]
+        # model NOT null: the killed run's launch-stamped opus-4-8 · xhigh is shown.
+        assert line.rstrip().endswith("opus-4-8 · xhigh")
+        assert "sonnet" not in line  # the displayed value is the -9 run's, not run 1's
+
+    def test_tc09_unstamped_killed_run_drops_model_suffix(self):
+        """AC-6 FAIL-guard: a -9 run with model=NULL would omit the suffix (negative)."""
+        tid = self._mk_task(stage="done")
+        self._add_run(
+            tid,
+            exit_code=0,
+            model="tokenator/claude-opus-4-8",
+            effort="xhigh",
+            started="2026-06-14 09:00:00",
+            finished="2026-06-14 09:20:00",
+        )
+        # killed run WITHOUT a launch stamp (the pre-ORCH-109 bug): model+effort NULL.
+        self._add_run(
+            tid,
+            exit_code=-9,
+            model=None,
+            effort=None,
+            started="2026-06-14 09:25:00",
+            finished="2026-06-14 09:55:00",
+        )
+        text = N.render_task_tracker(tid)
+        # "Разработка" is Russian for "Development": the tracker's stage label.
+        line = [ln for ln in text.splitlines() if ln.startswith("✅ Разработка")][0]
+        # No launch stamp -> the model/effort suffix is dropped (cost shown without model).
+        assert "opus-4-8" not in line
+        assert "xhigh" not in line
+
+
+# --------------------------------------------------------------------------- #
+# TC-10 — in-flight model visibility via get_running_agents (NFR-6)
+# --------------------------------------------------------------------------- #
+class TestInflightModelVisibility:
+    """get_running_agents exposes the launch-stamped model for a RUNNING job."""
+
+    def test_tc10_running_job_exposes_model(self):
+        """TC-10/AC-7: /metrics & /queue see the model before the run finishes."""
+        conn = get_db()
+        cur = conn.execute(
+            "INSERT INTO agent_runs (task_id, agent, model, effort) VALUES (?,?,?,?)",
+            (1, "developer", "claude-opus-4-8", "xhigh"),
+        )
+        rid = cur.lastrowid
+        conn.execute(
+            "INSERT INTO jobs (agent, repo, status, run_id, started_at) "
+            "VALUES (?,?,?,?,datetime('now'))",
+            ("developer", "orchestrator", "running", rid),
+        )
+        conn.commit()
+        conn.close()
+
+        rows = get_running_agents()
+        assert len(rows) == 1
+        assert rows[0]["model"] == "claude-opus-4-8"  # non-null in-flight
+        assert rows[0]["effort"] == "xhigh"
+
+
+# --------------------------------------------------------------------------- #
+# TC-11 — anti-salvage: a timeout-killed run does NOT advance the stage (FR-5)
+# --------------------------------------------------------------------------- #
+class TestAntiSalvage:
+    """Advancement is gated by `if exit_code == 0`; a -9 run is routed to retry/fail."""
+
+    class _Proc:
+        def __init__(self, code):
+            self._code = code
+
+        def wait(self):
+            return self._code
+
+    def _seed_run(self, agent="developer"):
+        conn = get_db()
+        cur = conn.execute(
+            "INSERT INTO agent_runs (task_id, agent) VALUES (?,?)", (1, agent)
+        )
+        rid = cur.lastrowid
+        conn.commit()
+        conn.close()
+        return rid
+
+    def _drive(self, monkeypatch, exit_code, agent="developer", job_id=7):
+        import src.agents.launcher as L
+
+        calls = {"advance": [], "finalize": []}
+        monkeypatch.setattr(
+            L.AgentLauncher,
+            "_try_advance_stage",
+            lambda self, *a, **k: calls["advance"].append(a),
+        )
+        monkeypatch.setattr(
+            L.AgentLauncher,
+            "_finalize_job",
+            lambda self, *a, **k: calls["finalize"].append(a),
+        )
+        monkeypatch.setattr(
+            L.AgentLauncher, "_post_usage_comments", lambda self, *a, **k: None
+        )
+        monkeypatch.setattr(L, "notify_agent_finished", lambda *a, **k: None)
+        monkeypatch.setattr(L, "get_worktree_path", lambda r, b: "/nonexistent/path")
+
+        # git status returns "no changes" so the commit/push branch is skipped.
+        class _R:
+            stdout = ""
+            stderr = ""
+            returncode = 0
+
+        monkeypatch.setattr(L.subprocess, "run", lambda *a, **k: _R())
+
+        rid = self._seed_run(agent)
+        L.AgentLauncher()._monitor_agent(
+            self._Proc(exit_code),
+            rid,
+            agent,
+            "orchestrator",
+            "feature/ORCH-109-x",
+            output_path=None,
+            log_fh=None,
+            job_id=job_id,
+        )
+        return calls
+
+    def test_tc11_killed_developer_run_does_not_advance(self, monkeypatch):
+        """TC-11/AC-8: a developer run killed (-9) does not auto-advance the stage."""
+        calls = self._drive(monkeypatch, exit_code=-9, agent="developer")
+        assert calls["advance"] == []          # NO auto-advance on -9
+        assert len(calls["finalize"]) == 1     # routed to retry/fail finalizer instead
+
+    def test_tc11_killed_reviewer_run_does_not_advance(self, monkeypatch):
+        """TC-11/AC-8: same guard for the reviewer role (review -> testing)."""
+        calls = self._drive(monkeypatch, exit_code=-9, agent="reviewer")
+        assert calls["advance"] == []
+
+    def test_tc11_clean_exit_advances(self, monkeypatch):
+        """Positive control: a clean exit (0) DOES reach _try_advance_stage."""
+        calls = self._drive(monkeypatch, exit_code=0, agent="developer")
+        assert len(calls["advance"]) == 1
+
+
+# --------------------------------------------------------------------------- #
+# TC-12 — contracts & schema untouched (NFR-1 / NFR-3, AC-9)
+# --------------------------------------------------------------------------- #
+class TestContractsUnchanged:
+    """ORCH-109 lives entirely outside the stage-machine / QG / schema layers."""
+
+    def test_tc12_stage_transitions_unchanged(self):
+        """AC-9: no new stage / sink introduced (the key set is unchanged)."""
+        from src.stages import STAGE_TRANSITIONS
+
+        assert set(STAGE_TRANSITIONS) == {
+            "created",
+            "analysis",
+            "architecture",
+            "development",
+            "review",
+            "testing",
+            "deploy-staging",
+            "deploy",
+            "done",
+            "cancelled",
+        }
+
+    def test_tc12_agent_runs_model_effort_columns_preexist(self):
+        """AC-9: model/effort are PRE-EXISTING columns; ORCH-109 adds no migration."""
+        conn = get_db()
+        cols = [r[1] for r in conn.execute("PRAGMA table_info(agent_runs)").fetchall()]
+        conn.close()
+        assert "model" in cols
+        assert "effort" in cols
+
+    def test_tc12_qg_checks_registry_present(self):
+        """AC-9: core QG checks stay registered (timeout/telemetry is not a gate)."""
+        from src.qg.checks import QG_CHECKS
+
+        assert "check_ci_green" in QG_CHECKS
+        assert "check_reviewer_verdict" in QG_CHECKS