Close the missing invariant "by merge-verify time the branch has an open
code-PR". The pipeline created a PR only on the developer path with a fresh
worktree commit (launcher._ensure_pr), so a branch (e.g. after a manual main
restore) could reach the deploy->done merge-verify under-gate PR-less ->
merge_pr returned "no open PR" -> a FALSE HOLD (ORCH-074 incident).
- merge_gate.ensure_open_pr(repo, branch) -> (status, detail): idempotent
leaf-actor (never-raise). GET open PRs filtered head==branch AND base==main
(identical to merge_pr/ORCH-073 FR-3 — auto docs-PR is not a code-PR) ->
existed; else POST -> created; 409/422 race -> re-GET -> existed (no dup);
any other error -> failed.
- stage_engine._handle_merge_verify: врезка after validated_revision and
BEFORE merge_pr. created|existed -> proceed; failed -> honest HOLD via new
_hold_pr_create_failed (note "pr-create-failed-hold", text distinguishable
from the not-merged HOLD; task stays on deploy, NO rollback).
- launcher._ensure_pr delegated to ensure_open_pr (single PR-creation path,
shared head==branch & base==main filter); the developer-only trigger is
unchanged.
- ORCH-073 protection untouched & authoritative: merge is confirmed ONLY by
verify_merged_to_main (SHA-in-main) + check_main_regression. Real un-merged
code still HOLDs.
- Kill-switch ORCH_MERGE_VERIFY_AUTOCREATE_PR_ENABLED (default true); scope =
merge_verify_applies (self-hosting / merge_verify_repos); non-self -> no-op;
false -> ORCH-074 behaviour 1:1. No DB migration; main never push/force-push.
- Append ORCH-082 marker to MAIN_REGRESSION_MARKERS (append-only convention).
- conftest defaults the autocreate flag OFF (mirrors merge_verify_enabled) so
unrelated deploy->done tests stay 1:1 (no network).
Tests: tests/test_orch082_ensure_pr.py (TC-01..05),
tests/test_orch082_merge_verify_autocreate.py (TC-06..12). Docs: README
merge-verify block (ORCH-082), CHANGELOG, .env.example.
Refs: ORCH-082
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root-cause fix for main erosion (phantom merge): code of ORCH-067/069 reached
`done` while absent from origin/main (only their auto docs-PRs landed).
- FR-1: verify_merged_to_main confirms merge ONLY by `git merge-base
--is-ancestor <validated_sha> origin/main`; the OR-branch pr_already_merged is
removed (a merged PR no longer confirms). Empty SHA / git error -> False.
- FR-2: pr_already_merged demoted to merge_pr idempotency-guard; counts a PR only
when merged & head.ref==<branch> & base.ref=="main" (explicit in-loop filter).
- FR-3: merge_pr selects the open code-PR by head==<branch> AND base==main.
- FR-5: new deterministic check_main_regression in _handle_merge_verify (after
confirmed SHA-in-main, before done) verifies MAIN_REGRESSION_MARKERS still in
origin/main; deterministic count==0 -> alert "main regressed" + HOLD (NOT done,
no rollback); git error of the grep -> fail-open. Kill-switch
ORCH_REGRESSION_GUARD_ENABLED; non-self -> no-op.
- FR-4: root .gitattributes `CHANGELOG.md merge=union` so Unreleased edits
auto-merge on rebase without conflict (branch not rolled back).
Invariants unchanged (STAGE_TRANSITIONS, QG_CHECKS, deploy-status, merge-gate,
image-freshness, DB schema, external HTTP API); non-self repos no-op (INV-5);
never-raise (INV-1); merge only via Gitea PR-API (INV-2).
Docs: CHANGELOG, .env.example (README/ADR updated by architect). Tests:
tests/test_orch073_*.py (TC-01..18); existing merge-gate tests updated for the
new code-PR filter.
Refs: ORCH-073
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Конвейер продвигается только входящими webhook; потерянное событие (502 на
ребилде, отсутствие ретраев у Plane/Gitea, неразрезолвленный sha→branch)
оставляет задачу молча застрявшей (класс инцидента ORCH-044). Новый фоновый
daemon-поток src/reconciler.py (паттерн queue_worker) доигрывает пропущенный
переход через те же штатные гейты/обработчики, что и webhook:
- F-1 gate-side: для задач stage≠done, без активного job и age(updated_at) ≥
grace_for_stage(stage) — read-only пред-оценка канонического QG; зелёный →
stage_engine.advance_stage(..., finished_agent=None); красный → тишина (спам
нотификаций структурно невозможен). analysis F-1 не трогает (человеческий гейт).
- F-2 plane-side: опрос Plane API per-project (plane_sync.list_issues_by_state,
курсорная пагинация, never-raise) → реплей In Progress/Approved/Rejected через
существующие handle_status_start/handle_verdict (async из sync-потока, asyncio.run).
- F-3: усиление sha→branch в handle_ci_status — БД-fallback по единственной
development-задаче repo (неоднозначность → не резолвим), debug→info.
- Анти-дубль на создании (db.create_task_atomic под process-wide Lock): гонка
reconcile↔webhook не плодит второй task/branch/worktree/analyst-job (AC-4).
- F-4 observability: лог-строка разблокировки + Telegram + блок reconcile в /queue.
Старт/стоп в main.lifespan (после worker.start() / перед worker.stop()),
restart-safe, never-raise на единицу работы. Kill-switches ORCH_RECONCILE_ENABLED
/ ORCH_RECONCILE_PLANE_ENABLED + grace-настройки. Схема БД и реестры
STAGE_TRANSITIONS/QG_CHECKS не менялись.
Тесты: test_reconciler.py, test_reconciler_plane.py, test_gitea_sha_resolve.py,
test_config.py (33 новых, 563 всего зелёные). Документация обновлена (golden source):
architecture/README.md, INFRA.md, README.md, CHANGELOG.md, adr-0007 → accepted.
Refs: ORCH-053
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds autouse fixture _reset_webhook_secrets to tests/conftest.py that
resets the process-wide Pydantic settings singleton before every test:
1. gitea_webhook_secret / plane_webhook_secret → "" (HMAC disabled by
default). Tests that deliberately test the 401 path
(test_webhook_dedup.py:268,278) override this with their own monkeypatch
which runs after autouse fixtures and wins for that test only.
2. db_path → os.environ["ORCH_DB_PATH"] (last written value after all test
modules are imported). Without this, test_webhook_dedup.py (imported
first alphabetically) seeds settings.db_path = dedup.db, while
test_webhooks.py setup_db tries to remove test_orchestrator.db — leaving
the DB dirty between tests that share a branch name and causing
get_task_by_repo_branch() to return a stale row with the wrong stage.
Per-test monkeypatches in test_webhook_dedup.setup_db still override it.
Root cause: both leaks come from the same singleton settings being read once
at import, before any per-test isolation runs. The autouse fixture is the
correct per-test reset point for process-wide singletons.
Result: pytest tests/ → 294 passed, 0 failed (was 10 failed/284 passed).
A pytest run on prod was sending REAL Telegram messages to Slava: some tests
(e.g. test_webhook_dedup advancing a stage) reach notify_stage_change ->
send_telegram, which read the live .env token/chat_id and actually POSTed.
Add an autouse fixture stubbing send_telegram to a no-op for every test. Patch
the SOURCE src.notifications.send_telegram (covers all notify_* helpers and the
many modules that do a local from .notifications import send_telegram inside
functions) AND src.stage_engine.send_telegram (module-level binding, would not be
intercepted by the source patch alone). webhooks/plane, launcher, queue_worker are
patched defensively with raising=False.
Verified: full suite run with FAKE telegram creds + an un-swallowable httpx.post
trip-wire (BaseException, so send_telegram except Exception can not hide it) shows
ZERO calls to api.telegram.org. Without the fixture the trip-wire fires, proving
the guard is real.