fix(preflight): check the binary the launcher actually spawns (ORCH-1)

Container ORCH_CLAUDE_BIN pointed at a non-existent /usr/bin/claude while the launcher spawns the hardcoded /opt/claude-code/bin/claude.exe. Preflight now follows AgentLauncher.CLAUDE_BIN (the genuinely executed path), so it no longer falsely blocks every job in production.
docs(resilience): document preflight/429/backoff/breaker + env vars (ORCH-1)
2026-06-03 00:13:44 +03:00 · 2026-06-03 00:12:17 +03:00
39 changed files with 6283 additions and 145 deletions
--- a/.env
+++ b/.env
@@ -1,10 +0,0 @@
-ORCH_PLANE_API_URL=http://plane-app-api-1:8000
-ORCH_PLANE_API_TOKEN=
-ORCH_PLANE_WORKSPACE_SLUG=
-ORCH_PLANE_WEBHOOK_SECRET=
-ORCH_GITEA_URL=http://localhost:3000
-ORCH_GITEA_TOKEN=c81227b0dee2217f9ab3d28c3642a4578a1b9772
-ORCH_GITEA_WEBHOOK_SECRET=
-ORCH_CLAUDE_BIN=/usr/bin/claude
-ORCH_REPOS_DIR=/home/slin/repos
-ORCH_DB_PATH=/app/data/orchestrator.db
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,7 @@
+.env
+.venv/
+__pycache__/
+*.pyc
+data/
+*.db
+.pytest_cache/
--- a/8
+++ b/8
@@ -1,7 +1,11 @@
 FROM python:3.12-slim
 WORKDIR /app
+RUN apt-get update -qq && apt-get install -y -qq openssh-client git && rm -rf /var/lib/apt/lists/*
+# git operations run as root over bind-mounted /repos (may be owned by host uid) -> trust it.
+RUN git config --system --add safe.directory '*'
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
-COPY src/ src/
-RUN mkdir -p /app/data/runs
+COPY src/ ./src/
+COPY data/ ./data/
+ENV PYTHONPATH=/app
 CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8500"]
--- a/README.md
+++ b/README.md
@@ -1,70 +1,220 @@
 # Multi-Agent Orchestrator

-FastAPI service for orchestrating a multi-agent development pipeline.
+FastAPI service for orchestrating a multi-agent development pipeline. It receives webhooks from Plane and Gitea, manages the task lifecycle through Quality Gates, and launches Claude CLI agents at each stage.

-## What it does
+## Architecture

- Receives webhooks from **Plane** (task management) and **Gitea** (git events)
- Checks Quality Gates before transitions between stages
- Launches **Claude CLI** agents (analyst, architect, developer, reviewer, tester)
- Keeps an event log in SQLite
+```
+Plane (task mgmt) ──webhook──┐
+                              ├──► Orchestrator (FastAPI) ──► Quality Gates ──► Agent Launcher
+Gitea (git events) ─webhook──┘         │                                            │
+                                        ▼                                            ▼
+                                   SQLite DB                                   Claude CLI
+                                (events, tasks,                            (analyst, architect,
+                                 agent_runs)                              developer, reviewer, tester)
+```
+
+## Pipeline stages
+
+```
+created → analysis → architecture → development → review → testing → deploy → done
+                                         ↑                     │
+                                         └─── REQUEST_CHANGES ─┘  (max 3 retries)
+```
+
+| Stage | Agent | Quality Gate (exit) | Transition trigger |
+|--------|-------|---------------------|------------------|
+| created | — | — | Plane webhook (work_item.created) |
+| analysis | analyst | BRD/TRZ/AC/TestPlan files | Push docs/ |
+| architecture | architect | ADR or infra-requirements | Push docs/ |
+| development | developer | check_tests_local (the orchestrator runs `make test` itself) | Auto-advance after developer |
+| review | reviewer | check_reviewer_verdict (`verdict:` in the frontmatter of 12-review.md) | Auto-advance after reviewer |
+| testing | tester | Test report containing PASS | Auto-advance after tester |
+| deploy | deployer | — | SSH deploy-hook |
+| done | — | — | — |

 ## API Endpoints

 | Method | Path | Description |
 |--------|------|----------|
 | GET | `/health` | Health check |
-| GET | `/status` | Active tasks |
+| GET | `/status` | Active tasks (stage != done) |
+| GET | `/queue` | Job queue (ORCH-1): counts by status + max_concurrency + the last 10 jobs |
 | POST | `/webhook/plane` | Plane webhook receiver |
 | POST | `/webhook/gitea` | Gitea webhook receiver |

-## Setup
+## Project structure

-```bash
-cp .env.example .env
-# Fill in the tokens in .env
+```
+src/
+├── main.py              # FastAPI app, lifespan (orphan recovery)
+├── config.py            # Pydantic settings (env vars)
+├── db.py                # SQLite: init, get_db, update_task_stage
+├── stages.py            # State machine (transitions, agents, QG)
+├── notifications.py     # Notifications (logging)
+├── plane_sync.py        # Status sync with the Plane API
+├── queue_worker.py      # ORCH-1: background queue worker (claim → launch_job)
+├── agents/
+│   └── launcher.py      # AgentLauncher: launch/launch_job, monitor, watchdog, auto-advance
+├── webhooks/
+│   ├── plane.py         # Plane webhook handler
+│   └── gitea.py         # Gitea webhook handler (push, PR, CI status)
+└── qg/
+    └── checks.py        # Quality Gate checks (filesystem + Gitea API)
+data/
+├── orchestrator.db      # SQLite database
+└── runs/                # Agent output logs ({run_id}.log)
+docs/
+├── ARCHITECTURE.md      # Detailed architecture
+├── LESSONS_ET006.md     # Lessons learned from ET-006
+├── BUGFIXES_2026-05-21.md # Bug fixes
+└── SETUP_WEBHOOKS.md    # Webhook setup
+docker-compose.yml       # Deployment config
+Dockerfile               # Python 3.12 + Docker CLI + tini
 ```

-## Running (Docker)
+## Running
+
+### Docker (production)

 ```bash
 docker compose up -d --build
 ```

-## Running (dev)
+### Dev

 ```bash
 pip install -r requirements.txt
 uvicorn src.main:app --reload --port 8500
 ```

-## Tests
+## Configuration

-```bash
-pip install pytest
-pytest tests/ -v
-```
-
-## Environment variables
+All variables use the `ORCH_` prefix:

 | Variable | Description | Default |
 |-----------|----------|---------|
 | `ORCH_PLANE_API_URL` | Plane API URL | `http://localhost:8091` |
 | `ORCH_PLANE_API_TOKEN` | Plane API token | — |
-| `ORCH_PLANE_WEBHOOK_SECRET` | Webhook secret for verification | - |
+| `ORCH_PLANE_WEBHOOK_SECRET` | Webhook secret | — |
+| `ORCH_PLANE_WORKSPACE_SLUG` | Workspace slug | — |
+| `ORCH_PLANE_PROJECT_ID` | Project UUID | — |
 | `ORCH_GITEA_URL` | Gitea URL | `http://localhost:3000` |
 | `ORCH_GITEA_TOKEN` | Gitea API token | — |
 | `ORCH_GITEA_WEBHOOK_SECRET` | Gitea webhook secret | — |
-| `ORCH_CLAUDE_BIN` | Path to the Claude CLI | `/usr/bin/claude` |
-| `ORCH_REPOS_DIR` | Directory with repositories | `/home/slin/repos` |
-| `ORCH_DB_PATH` | Path to the SQLite DB | `/app/data/orchestrator.db` |
+| `ORCH_GITEA_OWNER` | Gitea repo owner | `admin` |
+| `ORCH_DEFAULT_REPO` | Default repository (fallback) | `enduro-trails` |
+| `ORCH_PROJECTS_JSON` | Multi-repo registry (JSON array, ORCH-6) | `""` → default in `src/projects.py` |
+| `ORCH_CLAUDE_BIN` | Path to the Claude CLI | `/opt/claude-code/bin/claude.exe` |
+| `ORCH_REPOS_DIR` | Repos dir (container) | `/repos` |
+| `ORCH_HOST_REPOS_DIR` | Repos dir (host) | `/home/slin/repos` |
+| `ORCH_DB_PATH` | SQLite path | `/app/data/orchestrator.db` |
+| `ORCH_MAX_CONCURRENCY` | Number of jobs the worker runs in parallel (ORCH-1) | `1` |
+| `ORCH_QUEUE_POLL_INTERVAL` | Worker queue poll interval, seconds (ORCH-1) | `2.0` |
+| `ORCH_PREFLIGHT_CACHE_TTL` | Preflight cache TTL (CLI/net), seconds (ORCH-1 resilience) | `45` |
+| `ORCH_BACKOFF_BASE_SECONDS` | Base of the exponential backoff for transient errors (429) | `10` |
+| `ORCH_BACKOFF_MAX_SECONDS` | Backoff ceiling | `600` |
+| `ORCH_TRANSIENT_MAX_ATTEMPTS` | Retries for 429/unavailability | `5` |
+| `ORCH_BREAKER_THRESHOLD` | Consecutive transient failures before the breaker opens | `3` |
+| `ORCH_BREAKER_PAUSE_SECONDS` | Pause while the breaker is open | `300` |
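How the two backoff settings in the table might combine can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the function name `backoff_delay`, the doubling schedule, and the way a `Retry-After` hint is merged in are all assumptions; only the default values 10 and 600 come from the table.

```python
from typing import Optional


def backoff_delay(
    attempt: int,
    base_seconds: float = 10.0,   # default of ORCH_BACKOFF_BASE_SECONDS
    max_seconds: float = 600.0,   # default of ORCH_BACKOFF_MAX_SECONDS
    retry_after: Optional[float] = None,
) -> float:
    """Return the wait (seconds) before retry number `attempt` (1-based).

    Hypothetical helper: doubles the base on every attempt, never waits
    less than a server-provided Retry-After hint, never exceeds the cap.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = base_seconds * (2 ** (attempt - 1))
    if retry_after is not None:
        delay = max(delay, retry_after)
    return min(delay, max_seconds)


if __name__ == "__main__":
    for attempt in range(1, 8):
        print(attempt, backoff_delay(attempt))
```

With the defaults the schedule doubles from 10 seconds and flattens at the 600-second ceiling from the seventh attempt on.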

-## Architecture
+## Job queue (ORCH-1 / F-2b)

+Webhook handlers no longer spawn claude agents synchronously inside the uvicorn process.
+Instead, they put a **job** into the persistent SQLite table `jobs`
+(`enqueue_job`, immediate response), and a background worker (`src/queue_worker.py`)
+picks up jobs within the `ORCH_MAX_CONCURRENCY` limit and launches the agent (`launch_job`,
+the same Popen logic as before).
+
+Benefits:
+- **Restart-safe.** On startup, jobs with status `running` are returned to `queued`
+  (queue-recovery in lifespan), so no work is lost.
+- **Concurrency limit.** The worker never exceeds `ORCH_MAX_CONCURRENCY`.
+- **Retries.** A failed job (exit≠0) is retried while `attempts < max_attempts`,
+  then marked `failed` with a Telegram notification.
+
+Job statuses: `queued → running → done | failed`. Observability is provided by `GET /queue`.
+
+**Resilience layer:** a cheap preflight (CLI/net, cached, no tokens spent) gates the claim;
+429/overload is detected from the log (transient vs permanent), transient failures are retried with
+exponential backoff (`available_at`, Retry-After); a circuit breaker pauses the worker after N
+consecutive transient failures. Details: `docs/ORCH-1_JOB_QUEUE.md`.
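The circuit breaker described above can be sketched roughly like this. The class name, its methods, and the choice to reset the streak on any non-transient outcome are assumptions made for illustration; the real logic lives in the orchestrator and may differ. The defaults mirror `ORCH_BREAKER_THRESHOLD` and `ORCH_BREAKER_PAUSE_SECONDS`.

```python
class CircuitBreaker:
    """Hypothetical breaker: opens after `threshold` consecutive transient failures."""

    def __init__(self, threshold: int = 3, pause_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.pause_seconds = pause_seconds
        self._streak = 0          # consecutive transient failures seen so far
        self._open_until = 0.0    # time until which the worker should pause

    def record(self, transient_failure: bool, now: float) -> None:
        """Register a job outcome observed at time `now` (seconds)."""
        if not transient_failure:
            self._streak = 0      # any other outcome breaks the streak
            return
        self._streak += 1
        if self._streak >= self.threshold:
            self._open_until = now + self.pause_seconds
            self._streak = 0      # start counting afresh after the pause

    def is_open(self, now: float) -> bool:
        """True while the worker should not claim new jobs."""
        return now < self._open_until
```

Passing `now` explicitly (for example `time.monotonic()`) keeps the sketch deterministic and easy to test.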
+
+## Multi-repo: project registry (ORCH-6)
+
+The orchestrator serves several repositories through a project registry
+(`src/projects.py`), keyed by **Plane project id**. The Plane webhook filters events
+by project (unknown project → `ignored`) and resolves `repo` / `work_item_prefix` /
+the Plane project from the mapping.
+
+By default (when `ORCH_PROJECTS_JSON` is empty) two projects are registered:
+
+| Project | Plane project id | repo | prefix |
+|--------|------------------|------|--------|
+| enduro-trails | `7a79f0a9-5278-49cd-9007-9a338f238f9c` | `enduro-trails` | `ET` |
+| orchestrator | `8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a` | `orchestrator` | `ORCH` |
+
+### How to add a new project
+
+1. Make sure the gitea repo is already cloned into `/repos/<repo>` (auto-clone is handled separately).
+2. Find the Plane project uuid (from the project URL in Plane or via the Plane API).
+3. Add an entry to `ORCH_PROJECTS_JSON` in `.env` (a JSON array). **Important:** if
+   you set `ORCH_PROJECTS_JSON`, it fully replaces the default, so list **all**
+   required projects (including enduro-trails and orchestrator):
+
+   ```bash
+   ORCH_PROJECTS_JSON='[
+     {"plane_project_id":"7a79f0a9-5278-49cd-9007-9a338f238f9c","repo":"enduro-trails","work_item_prefix":"ET","name":"enduro-trails"},
+     {"plane_project_id":"8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a","repo":"orchestrator","work_item_prefix":"ORCH","name":"orchestrator"},
+     {"plane_project_id":"<new-uuid>","repo":"<new-repo>","work_item_prefix":"<PREFIX>","name":"<name>"}
+   ]'
+   ```
+
+4. Rebuild: `docker compose up -d --build`.
+5. Verify the resolution:
+   ```bash
+   docker exec orchestrator python3 -c "from src.projects import get_project_by_plane_id as g; print(g('<new-uuid>'))"
+   ```
+
+The `name` field is optional (defaults to `repo`). Details: `docs/ARCHITECTURE.md`.
+
+## Key mechanisms
+
+### Auto-advance
+After an agent finishes successfully (exit_code=0), `_try_advance_stage()` checks the QG, automatically advances the task, and launches the next agent.
+
+### Review bounce
+On REQUEST_CHANGES from the reviewer, the task is rolled back to development and the developer is relaunched (up to 3 attempts). Once the attempts are exhausted, the task is escalated.
+
+### Orphan recovery (M-1)
+On container startup, every run with `finished_at IS NULL` older than 35 minutes is marked exit_code=-1, a per-run warning is logged, and a Telegram notification "manual check/restart needed" is sent (never silently).
+
+### Writing task files (B-1)
+Task files `.task-*.md` are written **directly to the mounted volume `/repos/<repo>/`** (without docker). A write error raises RuntimeError (no silent failure). They are listed in the project's `.gitignore`.
+
+### Agent logs (B-2)
+The agent's stdout/stderr are redirected IMMEDIATELY to `/app/data/runs/{id}.log` at the OS level (no PIPE). The monitor thread calls `proc.wait()` → real exit_code, no zombies.
+
+### Watchdog
+Each agent has a 30-minute timeout. When it is exceeded: SIGKILL + exit_code=-9 is recorded.
+
+### Event routing
+Gitea events are routed by type:
+- `push` → file check, advance architecture/development
+- `pull_request*` (wildcard) → review approved/rejected, PR merge
+- `status` → (legacy) Gitea CI; C-1: no longer authoritative, `failure` is logged at debug level and neither blocks nor alerts (the development QG is the local `check_tests_local`)
+
+## Tests
+
+```bash
+pytest tests/ -v
 ```
-Plane webhook ──┐
-                ├──► Orchestrator ──► Quality Gates ──► Agent Launcher ──► Claude CLI
-Gitea webhook ──┘         │
-                          ▼
-                      SQLite (events, tasks, agent_runs)
-```
+
+## Known limitations
+
+1. **Single-task / shared `/repos` checkout**: only one task can be processed safely at a time, because all agents and `check_tests_local` run `git checkout` in the same `/repos/<repo>` → races between parallel tasks. The fix is a git worktree per task (S-4, separate work).
+2. **Plane sync**: the issue ID mapping may be incorrect (P3, in progress)
+3. **In-process daemon threads**: agents live in uvicorn threads; a restart is caught by orphan recovery. The target is a job queue (F-2b)
+4. **Gitea CI is not configured**: the orchestrator runs the tests locally itself
+5. **Tester timeout**: e2e tests with Playwright can take >25 min on heavy features
+6. **No retry on API errors**: httpx calls to Gitea/Plane have no retry logic
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -3,11 +3,25 @@ services:
    build: .
    container_name: orchestrator
    restart: unless-stopped
-    ports:
-      - "127.0.0.1:8500:8500"
+    # init: true injects docker-init (tini) as PID 1 so reparented grandchild
+    # processes from the claude/node subprocess tree are reaped (no zombies, B-2).
+    init: true
+    network_mode: host
    volumes:
      - ./data:/app/data
-      - /home/slin/repos:/repos:ro
+      - /home/slin/repos:/repos
+      - /var/run/docker.sock:/var/run/docker.sock
+      - /usr/lib/node_modules/@anthropic-ai/claude-code:/opt/claude-code:ro
+      - /usr/bin/node:/usr/bin/node:ro
+      - /home/slin/.claude:/home/slin/.claude
+      - /home/slin/.claude.json:/home/slin/.claude.json:ro
+      - /home/slin/.orchestrator-ssh:/root/.ssh:ro
    env_file: .env
    environment:
      - ORCH_REPOS_DIR=/repos
+      - ORCH_HOST_REPOS_DIR=/home/slin/repos
+      - DEPLOY_SSH_USER=slin
+      - DEPLOY_SSH_HOST=127.0.0.1
+      - DEPLOY_HOOK_SCRIPT=/home/slin/bin/enduro-deploy-hook.sh
+    group_add:
+      - "999"
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,335 @@
+# Orchestrator architecture
+
+## Overview
+
+Orchestrator is an event-driven FastAPI service that manages the lifecycle of development tasks through a multi-agent pipeline. Each task goes through fixed stages, and a specialized Claude CLI agent works at each stage.
+
+## Components
+
+### 1. Webhook Receivers
+
+#### Plane Webhook (`src/webhooks/plane.py`)
+- **Project filter (ORCH-6):** extracts `data.project` (the Plane project uuid) and ignores the event if the project is not in the registry (`known_plane_project_ids()`) → response `{"status":"ignored","reason":"unknown project"}`. This prevents a repeat of the 2026-06-02 incident (a workspace-wide webhook with no filter).
+- Accepts `work_item.created`: resolves repo/prefix/Plane project from the registry by `project`, creates the task in the DB, launches the analyst
+- Accepts `work_item.updated`: status synchronization
+
+#### Project registry (`src/projects.py`, multi-repo, ORCH-6)
+Mapping of **Plane project id → (repo, work_item_prefix, name)**. It lets a single
+orchestrator serve several repositories without mixing them up.
+
+```python
+@dataclass(frozen=True)
+class ProjectConfig:
+    plane_project_id: str   # uuid of the Plane project (registry key)
+    repo: str               # gitea repo name (= folder in /repos)
+    work_item_prefix: str   # ET / ORCH
+    name: str               # human-readable
+```
+
+Resolvers:
+- `get_project_by_plane_id(uuid) -> ProjectConfig | None`: used for filtering/resolving in the plane webhook.
+- `get_project_by_repo(repo) -> ProjectConfig | None`: used when only the repo is known (gitea webhook, plane_sync).
+- `known_plane_project_ids() -> set[str]`: the set of allowed projects (filter).
+
+**Configuration source:** env `ORCH_PROJECTS_JSON` (a JSON array of `ProjectConfig`).
+If it is empty or the JSON is broken, the built-in default registry (enduro-trails + orchestrator) is used,
+so that the system works out of the box. Parsing is tolerant: broken entries are skipped,
+and fully invalid JSON falls back to the default.
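The tolerant parsing described above might look roughly like the sketch below. It is an illustration under assumptions, not the contents of `src/projects.py`: the names `parse_projects` and `DEFAULT_PROJECTS` are hypothetical, and falling back to the default when every entry is broken is a guess about the intended behavior.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    plane_project_id: str
    repo: str
    work_item_prefix: str
    name: str


# Built-in registry, taken from the README table of default projects.
DEFAULT_PROJECTS = (
    ProjectConfig("7a79f0a9-5278-49cd-9007-9a338f238f9c", "enduro-trails", "ET", "enduro-trails"),
    ProjectConfig("8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a", "orchestrator", "ORCH", "orchestrator"),
)


def parse_projects(raw: str) -> dict[str, ProjectConfig]:
    """Parse ORCH_PROJECTS_JSON into {plane_project_id: ProjectConfig} (hypothetical helper)."""
    default = {p.plane_project_id: p for p in DEFAULT_PROJECTS}
    if not raw.strip():
        return default
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return default                      # fully invalid JSON: use the default
    if not isinstance(items, list):
        return default
    registry: dict[str, ProjectConfig] = {}
    for item in items:
        try:
            cfg = ProjectConfig(
                plane_project_id=item["plane_project_id"],
                repo=item["repo"],
                work_item_prefix=item["work_item_prefix"],
                name=item.get("name") or item["repo"],   # name defaults to repo
            )
        except (KeyError, TypeError, AttributeError):
            continue                        # skip broken entries
        registry[cfg.plane_project_id] = cfg
    return registry or default              # assumption: nothing valid means default
```

A resolver such as `get_project_by_plane_id` then reduces to a dictionary lookup on the returned mapping.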
+
+Consequences of multi-repo:
+- **repo per project:** `repo = get_project_by_plane_id(project_id).repo` instead of the hardcoded `default_repo`.
+- **prefix per project:** `get_next_work_item_id(repo, prefix)` numbers independently, e.g. `ORCH-001` vs `ET-010` (`src/db.py`).
+- **plane_sync into the right project:** state/comment are written to the task's own Plane project (resolved by repo via `get_project_by_repo`), not to a single hardcoded `PROJECT_ID` (backward compatibility is kept by defaulting to enduro).
+- **gitea webhook:** a push to a repo outside the registry → `ignored` (does not trigger the pipeline).
+
+#### Gitea Webhook (`src/webhooks/gitea.py`)
+- **push**: checks for artifacts (docs/, src/), advances the stage
+- **pull_request\*** (wildcard): handles review approved/rejected, PR merge
+- **status**: CI green/failure, advances development → review
+
+### 2. State Machine (`src/stages.py`)
+
+Linear pipeline with a single possible rollback (review → development):
+
+```
+STAGE_TRANSITIONS = {
+    created:      → analysis      (agent: None)
+    analysis:     → architecture  (agent: architect,  QG: check_analysis_approved)
+    architecture: → development   (agent: developer,  QG: check_architecture_done)
+    development:  → review        (agent: reviewer,   QG: check_tests_local)
+    review:       → testing       (agent: tester,     QG: check_reviewer_verdict)
+    testing:      → deploy        (agent: deployer,   QG: check_tests_passed)
+    deploy:       → done          (agent: None,       QG: None)
+}
+```
+
+### 3. Quality Gates (`src/qg/checks.py`)
+
+| Check | Verification method |
+|-------|---------------|
+| check_analysis_approved | Filesystem: 4 files + :approved: comment in Plane |
+| check_architecture_done | Filesystem: ADR dir or infra-requirements.md |
+| check_tests_local | The orchestrator runs `make test` itself in the **task worktree** `/repos/_wt/<repo>/<branch>` (judged by exit code). Replaces check_ci_green: Gitea CI is not configured. Worktree isolation → safe with parallel tasks (ORCH-2 / S-4). |
+| check_reviewer_verdict | Filesystem: reads `verdict: APPROVED\|REQUEST_CHANGES` from the YAML frontmatter of `12-review.md` (only the machine-readable field, not substrings in the text) |
+| check_tests_passed | Filesystem: test-report.md contains "PASS" |
+| check_ci_green | (legacy) Gitea API: GET /commits/{branch}/status, no longer used as the development QG |
+| check_review_approved | (legacy) Gitea API: GET /pulls/{n}/reviews, not used in STAGE_TRANSITIONS |
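To illustrate why `check_reviewer_verdict` reads only the machine-readable field, here is a minimal sketch of extracting `verdict:` from a frontmatter block. The function name `read_verdict` and the hand-rolled parsing (no YAML library) are assumptions for the example; the actual check in `src/qg/checks.py` may be implemented differently.

```python
from typing import Optional

ALLOWED_VERDICTS = {"APPROVED", "REQUEST_CHANGES"}


def read_verdict(markdown: str) -> Optional[str]:
    """Return the verdict from the leading frontmatter block, or None (hypothetical helper)."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None                          # no frontmatter at all
    # Locate the closing '---'; an unterminated block is treated as absent.
    end = next((i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"), None)
    if end is None:
        return None
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "verdict":
            value = value.strip().strip("'\"")
            return value if value in ALLOWED_VERDICTS else None
    return None
```

Because only lines between the two `---` markers are inspected, a phrase such as "verdict: APPROVED" in the review body does not pass the gate.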
+
+### 4. Agent Launcher (`src/agents/launcher.py`)
+
+Launches the Claude CLI as a subprocess:
+
+```bash
+claude.exe --print  --system-prompt  --allowedTools Read,Write,Edit,Bash
+```
+
+Each launch:
+1. Records the run in the DB (agent_runs)
+2. Starts the subprocess. **stdout/stderr are redirected IMMEDIATELY to the file `/app/data/runs/{id}.log` at the OS level** (Popen `stdout=log_fh`). No PIPE is held in orchestrator memory → no PIPE deadlock, no reader thread, no zombies (B-2).
+3. Starts the **watchdog thread** (30 min timeout → SIGKILL by pid)
+4. Starts the **monitor thread**: `proc.wait()` (guaranteed reap → real exit_code in the DB) → closes log_fh → git commit/push → auto-advance
+
+### 5. Auto-advance (`launcher._try_advance_stage`)
+
+After an agent finishes successfully:
+1. Determines the current stage of the task
+2. Checks the QG for leaving the stage
+3. If the QG passes, advances the stage
+4. Launches the next agent (if one is defined)
+
+Note: the `review → testing` transition uses `check_reviewer_verdict` (read from the frontmatter of `12-review.md`); `development → review` uses `check_tests_local` (the orchestrator runs the tests itself and does not depend on Gitea CI).
+
+### 6. Review Bounce
+
+On REQUEST_CHANGES:
+1. Counts the developer runs for the task
+2. If < MAX_DEV_RETRIES (3): rolls back to development and relaunches the developer
+3. If >= MAX_DEV_RETRIES: escalation (logging + notification)
+
+## Database Schema
+
+```sql
+-- Tasks
+CREATE TABLE tasks (
+    id INTEGER PRIMARY KEY,
+    work_item_id TEXT,          -- Plane issue identifier (e.g. "ET-006")
+    plane_issue_id TEXT,        -- Plane UUID
+    repo TEXT,
+    branch TEXT,
+    stage TEXT DEFAULT 'created',
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- Agent runs
+CREATE TABLE agent_runs (
+    id INTEGER PRIMARY KEY,
+    task_id INTEGER REFERENCES tasks(id),
+    agent TEXT,                 -- analyst/architect/developer/reviewer/tester
+    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    finished_at TIMESTAMP,
+    exit_code INTEGER,
+    output_path TEXT            -- /app/data/runs/{id}.log
+);
+
+-- Raw events
+CREATE TABLE events (
+    id INTEGER PRIMARY KEY,
+    source TEXT,               -- plane/gitea
+    event_type TEXT,
+    payload TEXT,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+## Deployment
+
+### Docker Compose
+
+```yaml
+services:
+  orchestrator:
+    build: .
+    container_name: orchestrator
+    restart: unless-stopped
+    network_mode: host
+    volumes:
+      - ./data:/app/data                    # SQLite + logs
+      - /home/slin/repos:/repos             # Git repositories
+      - /var/run/docker.sock:/var/run/docker.sock  # Docker CLI
+      - claude-code:/opt/claude-code:ro     # Claude CLI binary
+      - /home/slin/.claude:/home/slin/.claude      # Claude config
+    env_file: .env
+    group_add: ["999"]         # docker group
+```
+
+### Dockerfile
+
+- Base: python:3.12-slim
+- Docker CLI (sibling containers)
+- **tini** as PID 1 (proper zombie reaping)
+- `git config --global safe.directory '*'`
+- ENTRYPOINT: tini → uvicorn
+
+## Data flows
+
+### Happy path (ET-006 example)
+
+```
+1. Plane webhook: work_item.created → task created, analyst launched
+2. Analyst: writes BRD/TRZ/AC/TestPlan → git push docs/
+3. Plane comment :approved: → QG check_analysis_approved → PASS
+4. Auto-advance: analysis → architecture, architect launched
+5. Architect: writes ADR, infra-requirements → git push docs/
+6. Gitea push webhook: ADR detected → QG check_architecture_done → PASS
+7. Auto-advance: architecture → development, developer launched
+8. Developer: writes code in src/ + tests/ → git push, creates PR
+9. Gitea status webhook: CI green → QG check_ci_green → PASS
+10. Auto-advance: development → review, reviewer launched
+11. Reviewer: leaves a review (APPROVED or REQUEST_CHANGES)
+12. Gitea PR webhook: review event → QG check_review_approved → PASS
+13. Advance: review → testing, tester launched
+14. Tester: runs the tests, writes test-report.md → git push
+15. Auto-advance: testing → deploy (QG check_tests_passed → PASS)
+16. PR merge → Gitea PR webhook: action=closed, merged=true → done
+```
+
+### Review bounce path
+
+```
+11. Reviewer: REQUEST_CHANGES
+12. Gitea PR webhook: review_state=REQUEST_CHANGES, stage=review
+13. Rollback: review → development, developer relaunched (attempt N/3)
+14. Developer: fixes the review comments → git push
+15. CI green → development → review, reviewer relaunched
+16. Reviewer: APPROVED → continue happy path
+```
+
+## Resilience
+
+| Mechanism | Description |
+|----------|----------|
+| Watchdog | Each agent: 30 min timeout → SIGKILL + exit_code=-9 |
+| safe.directory | git operations work in any directory |
+| Max retries | Developer: max 3 attempts, then escalation |
+| Zombie-free | stdout goes straight to a file + monitor `proc.wait()` → the process is always reaped (B-2) |
+| Orphan recovery | On startup: orphan runs (finished_at IS NULL, older than 35 min) are marked exit=-1 with a per-run warning + Telegram notification "manual check needed" (M-1) |
+
+## Agents
+
+Each agent is a Claude CLI with:
+- **System prompt**: `.openclaw/agents/{role}.md` (in the repository)
+- **Task file**: `.task-{suffix}.md`, generated by the orchestrator **by writing directly into the task worktree** `/repos/_wt/<repo>/<branch>/` (B-1, without docker; ORCH-2: into an isolated working copy, not the shared `/repos/<repo>`). Listed in the project repository's `.gitignore` (a runtime artifact, never committed).
+- **Tools**: Read, Write, Edit, Bash
+- **Output**: `--print` mode (all output goes to stdout after completion)
+
+| Agent | Artifacts | Time (typical) |
+|-------|-----------|-------------------|
+| analyst | BRD, TRZ, AC, TestPlan | 5-10 min |
+| architect | ADR, infra-requirements, tech-risks | 5-10 min |
+| developer | src/, tests/, PR | 15-30 min |
+| reviewer | review report, PR review | 3-5 min |
+| tester | test-report.md, e2e results | 10-25 min |
+| deployer | merge PR + SSH deploy-hook + smoke | 5-10 min |
+
+## Isolation via git worktree (ORCH-2 / S-4)
+
+Each task (= one git branch) works in an **isolated git worktree** rather than in the shared
+`/repos/<repo>`. This removes the `git checkout` races that occur when two tasks are active at once.
+
+```
+/repos/<repo>                      ← main clone (fetch / worktree management, read-only queries)
+/repos/_wt/<repo>/<safe-branch>    ← worktree of a specific task (the agent's working copy)
+```
+
+Module `src/git_worktree.py`:
+- `get_worktree_path(repo, branch)`: returns the worktree path (does not create it).
+- `ensure_worktree(repo, branch)`: creates (or reuses) a worktree on the required branch;
+  for a new branch, creates it from `origin/main`. Returns the path.
+- `remove_worktree(repo, branch)`: optional cleanup on `done`.
+
+Where the worktree is used:
+- **launcher**: the agent is launched with `cd <worktree>` (no `git checkout` in the cmd); the task file
+  is written into the worktree; commit/push in `_monitor_agent` happen in the worktree.
+- **qg/checks**: reading agent artifacts (`check_analysis_complete`, `check_architecture_done`,
+  `check_tests_passed`, `check_reviewer_verdict`) and `check_tests_local` (`make test`) use the worktree.
+  The artifact functions accept an optional `branch`; without it they fall back to the shared `/repos/<repo>`
+  (backward compatibility).
+- **webhooks/gitea**: `git branch -r --contains <sha>` stays in the main clone because it is a
+  **read-only** query (no checkout/mutation) and creates no races.
+
+> A branch can be checked out in only one worktree at a time,
+> which is exactly the desired property: one task = one branch = one worktree.
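The `/repos/_wt/<repo>/<safe-branch>` layout implies some mapping from a branch name to a single path segment. The source does not say how `<safe-branch>` is derived, so the sketch below is purely an assumption: it replaces every run of characters outside `A-Za-z0-9._-` with a dash. The names `safe_branch_name` and `sketch_worktree_path` are hypothetical and are not the functions in `src/git_worktree.py`.

```python
import re
from pathlib import PurePosixPath

WORKTREE_ROOT = PurePosixPath("/repos/_wt")


def safe_branch_name(branch: str) -> str:
    """Collapse characters that are awkward in a directory name (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    if not cleaned:
        raise ValueError(f"branch name {branch!r} has no usable characters")
    return cleaned


def sketch_worktree_path(repo: str, branch: str) -> PurePosixPath:
    """Compute the worktree path without touching the filesystem."""
    return WORKTREE_ROOT / repo / safe_branch_name(branch)


if __name__ == "__main__":
    print(sketch_worktree_path("enduro-trails", "feature/ET-010"))
```

A real implementation would also need to guard against two different branch names collapsing to the same segment, which this sketch does not attempt.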
+
+## Known limitations
+
+- ~~Shared `/repos` checkout (races between parallel tasks).~~ **RESOLVED (ORCH-2 / S-4):**
+  git worktree per task/branch, see the "Isolation via git worktree" section above.
+- ~~In-process daemon threads (restart → orphans, lost work).~~ **RESOLVED (ORCH-1 / F-2b):**
+  persistent jobs queue + background worker, see the "Job queue (ORCH-1)" section below.
+  The monitor/watchdog daemon threads remain for a single running agent, but on
+  restart its job returns to `queued` (queue-recovery) and is picked up again.
+
+## Job queue (ORCH-1 / F-2b)
+
+Previously the webhook handler **synchronously** spawned `subprocess.Popen` + 2 daemon threads
+right inside the uvicorn process (8 call sites). A restart meant orphans and lost work,
+with no concurrency limit and no retries.
+
+### Flow
+
+```
+webhook (plane/gitea)                 background thread (queue_worker)
+        │                                        │
+  enqueue_job() ---> [ jobs table ] <--- claim_next_job()  (atomic queued->running)
+  (immediate           status=queued                 │
+   200 reply)                                    launch_job(job)
+                                                       │
+                                          AgentLauncher._spawn (Popen claude)
+                                                       │
+                                          _monitor_agent (proc.wait, commit/push,
+                                                       │  advance stage)
+                                                       │
+                                          _finalize_job:
+                                            exit 0  -> mark_job done
+                                            exit !=0 & attempts<max -> requeue (queued)
+                                            exit !=0 & attempts>=max -> failed + Telegram
+```
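The three `_finalize_job` branches at the bottom of the diagram amount to a small decision function. The sketch below restates them in code; the function name `next_job_status` is hypothetical, and it deliberately omits side effects such as the Telegram notification.

```python
def next_job_status(exit_code: int, attempts: int, max_attempts: int = 2) -> str:
    """Map an agent exit code to the job's next status (illustrative only).

    `attempts` is the number of claims so far; the documented default
    for `max_attempts` is 2.
    """
    if exit_code == 0:
        return "done"
    if attempts < max_attempts:
        return "queued"      # requeue for another try
    return "failed"          # retries exhausted; the real code also notifies
```

With the default limit, a job that fails on its first claim is requeued once, and a second failure marks it `failed`.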
+
+### The `jobs` table
+
+| Column | Purpose |
+|--------|------------|
+| `status` | `queued` → `running` → `done` \| `failed` |
+| `attempts` / `max_attempts` | attempt counter (incremented on claim) / retry limit (default 2) |
+| `run_id` | FK to `agent_runs.id` once started |
+| `task_content` | the task spec that is written to the agent's task file |
+| `error` | last error |
+
+`idx_jobs_status (status, id)` provides fast FIFO selection of queued jobs.
+
+### Atomic claim
+
+`claim_next_job()` runs `SELECT queued ORDER BY id LIMIT 1` → `UPDATE ... WHERE id=? AND
+status='queued'` and checks `rowcount`. When two ticks race, only one UPDATE
+moves the row to `running` (rowcount==1); the loser takes the next job.
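The select-then-guarded-update pattern described above can be sketched with the standard `sqlite3` module. This is an illustration under assumptions: the function body and the reduced column set are invented for the example, and the real `claim_next_job()` may differ in signature and transaction handling.

```python
import sqlite3
from typing import Optional


def claim_next_job(conn: sqlite3.Connection) -> Optional[int]:
    """Claim the oldest queued job and return its id, or None if the queue is empty."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the transition atomic: if another tick
        # already claimed this row, the UPDATE matches nothing.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
            "WHERE id = ? AND status = 'queued'",
            (row[0],),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # rowcount == 0: lost the race, loop and try the next queued job.
```

Usage against a throwaway in-memory table (a simplified schema, not the real one):

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, attempts INTEGER DEFAULT 0)")
conn.execute("INSERT INTO jobs (status) VALUES ('queued')")
print(claim_next_job(conn))
```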
+
+### Queue-recovery (restart-safe)
+
+In the `main.py` lifespan, `requeue_running_jobs()` is called **after** the M-1 orphan recovery:
+jobs with status `running` (the worker died during the restart) are returned to `queued`.
+Then the worker starts; on shutdown, `worker.stop()` is called (Event.set + join).
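A minimal sketch of the recovery step, assuming it is a single UPDATE over the `jobs` table. The function body, the returned count, and the simplified schema are assumptions for illustration; the real `requeue_running_jobs()` may also touch other columns.

```python
import sqlite3


def requeue_running_jobs(conn: sqlite3.Connection) -> int:
    """Return every 'running' job to 'queued' and report how many were reset."""
    cur = conn.execute("UPDATE jobs SET status = 'queued' WHERE status = 'running'")
    conn.commit()
    return cur.rowcount
```

Running this before the worker thread starts means no job can be claimed while the reset is in flight.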
+
+### Config
+
+- `ORCH_MAX_CONCURRENCY` (default 1): limit on parallel jobs.
+- `ORCH_QUEUE_POLL_INTERVAL` (default 2.0): poll interval.
+
+Observability: `GET /queue` returns counts by status + the last 10 jobs.
+
+> Compatibility: `launcher.launch()` (direct synchronous launch, `job_id=None`)
+> is kept for backward compatibility. The queue uses `launch_job()`;
+> both share `_spawn()` (the B-2 Popen logic is unchanged).
+- **Gitea CI is not configured.** The development QG is now local (`check_tests_local`);
+  Gitea CI statuses are not authoritative and do not block the pipeline.
+- **Docker inside the orchestrator container is UNAVAILABLE.** Deployment goes only through
+  the SSH hook `enduro-deploy-hook.sh` on the host.
--- a/docs/BACKLOG_PIPELINE.md
+++ b/docs/BACKLOG_PIPELINE.md
@@ -0,0 +1,80 @@
+# Pipeline Design Backlog
+
+Questions that need architectural design work before implementation.
+
+---
+
+## BL-001: Testing / audit outside a work item
+
+**Status:** Open  
+**Added:** 2026-05-23
+
+### Problem
+
+The current pipeline is feature-driven: every run is tied to a Plane issue.
+There is no mechanism for:
+- Standalone UI audits (checking the current state of the application)
+- Regression testing without a new feature
+- Periodic UI health checks
+
+### Open questions
+
+1. Is a separate "audit" task type needed in Plane?
+2. Or is an audit always ad hoc, outside the orchestrator?
+3. If it goes through the orchestrator, what shortened pipeline applies? (`analyst → tester` without dev/review)
+4. Where should the report go? Into Plane? Into a separate docs/audits/?
+5. Who initiates it: Slava via Plane, or Strim via heartbeat?
+
+### Options
+
+| Option | Pros | Cons |
+|---------|-------|--------|
+| Ad hoc via Strim (spawn agents) | Fast, no infrastructure | Not tracked in Plane |
+| Synthetic Plane issue | Tracked | The orchestrator cannot skip stages |
+| New "audit" task type in the orchestrator | Architecturally correct | Requires development |
+
+---
+
+## BL-002: Backlog management / tasks
+
+**Status:** Open  
+**Added:** 2026-05-23
+
+### Problem
+
+The process is undefined: who creates tasks and where, and how they enter the pipeline.
+
+### Open questions
+
+1. **Who creates tasks in Plane?**
+   - Slava directly through the Plane UI?
+   - Strim creates tasks at Slava's request in chat?
+   - Automatically, by keywords from Telegram?
+
+2. **Where should they be created?**
+   - Only in the Plane project "Enduro Trails"?
+   - Strim keeps its own list in the workspace?
+   - Is a separate inbox needed?
+
+3. **What triggers the pipeline?**
+   - Now: a Plane issue with a certain status → webhook → orchestrator
+   - Should we add: Telegram → Strim creates a Plane issue → pipeline?
+
+4. **Prioritization:**
+   - Who decides what to take into work next?
+   - Is there a sprint/kanban?
+
+5. **Plane synchronization (see the current bug):**
+   - Plane is out of sync (ET-001..ET-006 are shown incorrectly)
+   - Should the plane_issue_id mapping in the orchestrator be fixed?
+   - Or is Plane merely decorative, with the real tracking in orchestrator.db?
+
+### Context
+
+- Current chain: Plane webhook → orchestrator → agents
+- Plane sync is broken (known P3 from LESSONS_ET006)
+- orchestrator.db is the single source of truth for task state
+
+---
+
+*A document for discussing the pipeline architecture. Not a roadmap, not a spec.*
--- a/docs/BUGFIXES_2026-05-21.md
+++ b/docs/BUGFIXES_2026-05-21.md
@@ -0,0 +1,62 @@
+# Bugfixes — 2026-05-21
+
+## Context
+
+Task ET-005 (unit-of-measurement switcher) got stuck on the `development → review` transition.
+Diagnosing and fixing it uncovered five orchestrator bugs, all of which are now fixed.
+
+## Bugs fixed
+
+### 1. CI status webhook: empty `branches` in the payload
+
+**File:** `src/webhooks/gitea.py` (handle_ci_status)
+
+**Problem:** Gitea sends the CI status webhook with `branches: []`. The function returned early: it could not determine the branch and did not advance the task.
+
+**Fix:** Fall back to `git branch -r --contains <sha>`, which resolves the branch from the commit SHA, and look for a `feature/*` branch in the output.
+
+### 2. git safe.directory in the container
+
+**File:** Docker runtime (orchestrator container)
+
+**Problem:** `subprocess.run(["git", ...])` inside the container failed with `fatal: detected dubious ownership in repository` because the repo mount is owned by a different user.
+
+**Fix:** Run `git config --global --add safe.directory '*'` at container start. Removed the custom `env={**os.environ, "HOME": "/home/slin"}`, which broke the gitconfig lookup.
+
+### 3. X-Gitea-Event: pull_request_approved was not routed
+
+**File:** `src/webhooks/gitea.py` (webhook router)
+
+**Problem:** Gitea sends the event type `pull_request_approved` when a review is approved, but the router handled only `pull_request`.
+
+**Fix:** Routing extended to `pull_request`, `pull_request_approved`, and `pull_request_review_approved`.
+
+### 4. review.state vs review.type: the new Gitea format
+
+**File:** `src/webhooks/gitea.py` (handle_pr)
+
+**Problem:** The Gitea webhook sends `review.type = "pull_request_review_approved"` instead of `review.state = "APPROVED"`. The code looked only at `review.state`.
+
+**Fix:** Map from `review.type` when `review.state` is empty: `"approved" in type → APPROVED`, `"request_changes"/"rejected" in type → REQUEST_CHANGES`.
+
+### 5. No auto-advance after an agent finishes
+
+**File:** `src/agents/launcher.py`
+
+**Problem:** After the tester finished (exit_code=0) the task stayed in `testing`: there was no mechanism for automatic advancement. For `development → review` the trigger is the CI status webhook and for `review → testing` it is the PR review webhook, but `testing → deploy` has no external trigger.
+
+**Fix:** Added `_try_advance_stage()` to `AgentLauncher`, called from `_monitor_agent` after an agent completes successfully. It checks the QG, advances the stage, and launches the next agent.
+
+## Known issues (not fixed)
+
+### dismiss_stale_approvals
+
+Branch protection `dismiss_stale_approvals: true` on the main branch: the tester pushes a commit after review approval → the approval becomes stale → the merge is blocked.
+
+**Workaround:** Re-approve via claude-bot after every tester push.
+
+**Recommendation:** Either disable `dismiss_stale_approvals` or add auto-re-approve to the orchestrator after a tester push.
+
+## Result
+
+ET-005 went through the full cycle: `analysis → architecture → development → review → testing → deploy → done`
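Reviewer note on fix 4 in `docs/BUGFIXES_2026-05-21.md`: the `review.type` fallback can be sketched as below. This is a minimal illustration, not the code in `src/webhooks/gitea.py`; the function name `resolve_review_state` is hypothetical, and checking the `request_changes`/`rejected` substrings before `approved` is an assumption made here so that a negative verdict wins if a type string ever contains both.

```python
from typing import Optional


def resolve_review_state(review: dict) -> Optional[str]:
    """Return APPROVED / REQUEST_CHANGES, or None when neither field is usable."""
    # Prefer the explicit state when Gitea sends it.
    state = (review.get("state") or "").strip().upper()
    if state:
        return state
    # Otherwise fall back to substring matching on the event-style type string.
    review_type = (review.get("type") or "").lower()
    if "request_changes" in review_type or "rejected" in review_type:
        return "REQUEST_CHANGES"
    if "approved" in review_type:
        return "APPROVED"
    return None


print(resolve_review_state({"type": "pull_request_review_approved"}))
```

Returning `None` for an unrecognized payload lets the caller ignore the event instead of guessing a verdict.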
--- a/docs/BUGFIXES_2026-06-02.md
+++ b/docs/BUGFIXES_2026-06-02.md
@@ -0,0 +1,84 @@
+# Bugfixes 2026-06-02: orchestrator bug fixes
+
+**Source:** `tasks/multi-agent/AUDIT_2026-06-02.md`
+**Goal:** restore the autonomy of the multi-agent pipeline (ET-009: 0 of 6 stages ran autonomously).
+**Executor:** Dev agent (Opus 4.8 Tokenator).
+
+---
+
+## What was fixed
+
+### B-1: writing `.task-*.md` without docker
+**Before:** `launcher._write_task_file()` wrote the file via `docker run --rm -i python:3.12-slim bash -c "cat > ..."`. There is NO `docker` binary in the container → the write failed silently → the agent read a stale task file.
+**After:** a direct write to the mounted volume `/repos/<repo>/<task_file>` with a plain `open(..., "w")`. A write error raises `RuntimeError` (no silent failure).
+**File:** `src/agents/launcher.py` (`_write_task_file`, call site in `launch`).
+**Verification:**
+```bash
+docker exec orchestrator python3 -c "
+import sys; sys.path.insert(0,'/repos/orchestrator')
+from src.agents.launcher import launcher
+launcher._write_task_file('enduro-trails', '.task-test-write.md', 'hello-from-fix')
+print(open('/repos/enduro-trails/.task-test-write.md').read())"
+# => hello-from-fix   (without docker)
+```
+✅ Verified: READBACK = `hello-from-fix`.
+
+### B-2: Popen stdout → file, PIPE thread removed (zombies, lost exit_code)
+**Before:** `Popen(stdout=PIPE)` + a daemon thread with `select`/`readline` + a 120 s startup timeout → PIPE deadlock, zombies on restart, `exit_code=None` in the DB (every ET-009 run).
+**After:** `log_fh = open(output_path, "w")`; `Popen(stdout=log_fh, stderr=STDOUT)`. `_monitor_agent` is reduced to `proc.wait()` + `log_fh.close()`. The PIPE thread and the startup timeout are removed. The pid-based watchdog (`AGENT_TIMEOUT`) is kept.
+**File:** `src/agents/launcher.py` (`launch`, `_monitor_agent`).
+**Verification:** after a run, `SELECT exit_code FROM agent_runs ORDER BY id DESC LIMIT 1` returns a non-NULL value; `ps aux | grep defunct` shows nothing.
+
+### B-3: `.task-*.md` in `.gitignore`, no longer committed
+**Before:** task files were tracked in git (`.task-arch.md`, `.task-dev.md`, `.task-review.md`, `.task.md`) and carried over between tasks.
+**After:** `.task*.md` added to `enduro-trails/.gitignore`; the tracked files were removed from the index (`git rm --cached`).
+**File:** `enduro-trails/.gitignore` (+ untrack). The `main` branch is protected → the changes are in **PR #19** (`chore/gitignore-task-files`).
+**Verification:** `git check-ignore .task.md .task-arch.md` → matched. `git add docs/ src/ tests/` (scoped) does not pick up task files.
+
+### S-5: machine-readable reviewer verdict
+**Before:** `check_reviewer_verdict` searched for the substrings `APPROVED`/`REQUEST_CHANGES` across the whole text (5000 bytes) → false positives on tables.
+**After:** ONLY `verdict:` from the YAML frontmatter of `12-review.md` is read (via `yaml.safe_load`). No verdict / no frontmatter → not-approved. `reviewer.md` updated: it now requires the frontmatter `verdict: APPROVED|REQUEST_CHANGES`.
+**Files:** `src/qg/checks.py` (`check_reviewer_verdict`), `enduro-trails/.openclaw/agents/reviewer.md` (PR #19; the working copy was applied immediately).
+**Verification:** ET-009 `12-review.md` (frontmatter `verdict: APPROVED`) → `(True, 'Reviewer verdict: APPROVED')`. Unit tests cover APPROVED/REQUEST_CHANGES/no-verdict/no-frontmatter/table-in-body.
+
+### S-1: the tests QG is run by the orchestrator itself (not Gitea CI)
+**Before:** the `development → review` QG was `check_ci_green` (Gitea status). CI is not configured → always false → no auto-transition, plus false "CI failed" alerts.
+**After:** a new QG `check_tests_local`: the orchestrator runs `git fetch/checkout <branch>` + `make test` in `/repos/<repo>` and judges by exit code. `stages.py`: `development` QG → `check_tests_local`. Dispatch added to `launcher._try_advance_stage` and `webhooks/plane._try_advance_stage` (args `(repo, branch)`). `webhooks/gitea.handle_ci_status`: `failure` → debug log, no `notify_error`.
+**Files:** `src/qg/checks.py`, `src/stages.py`, `src/agents/launcher.py`, `src/webhooks/plane.py`, `src/webhooks/gitea.py`.
+**Pitfall (known limitation):** `check_tests_local` does a checkout in the shared `/repos`, which is unsafe with parallel tasks (the S-4 worktree fix is tracked separately).
+
+### M-1: proper orphan recovery
+**Before:** `UPDATE agent_runs SET exit_code=-1 WHERE finished_at IS NULL AND started_at < now-35min` silently wrote off zombies.
+**After:** each orphan run is enumerated, marked exit=-1, logged with a per-run `warning` ("manual check needed"), and reported via Telegram. There is no automatic restart (risk of looping). Killing by pid is impossible because the pid is not persisted in the DB (documented).
+**File:** `src/main.py` (lifespan).
+
+---
+
+## Out of scope (separate tasks)
+- S-2/S-3 (deployer rollback in the shared repo), S-4 (git worktree per task), M-3 (single stage engine), F-2b (job queue), M-7 (webhook idempotency). `_auto_merge_pr` is dead code and was left in place (separate cleanup).
+
+## Tests
+- New file `tests/test_launcher.py`: 10 tests (`_write_task_file` writes/raises/no docker; `check_reviewer_verdict` frontmatter cases).
+- `tests/test_qg.py`: 16 passed. `tests/test_launcher.py`: 10 passed.
+- ⚠️ Pre-existing: `tests/test_webhooks.py` has failures (401/signature + cross-file env pollution) that are NOT related to these fixes and predate them. Run in isolation it partially passes; in the full run more tests fail because env/DB state is shared across test files. Cleaning up test_webhooks is a separate task.
+
+## Deploy
+The orchestrator was rebuilt: `cd /home/slin/repos/orchestrator && docker compose up -d --build`. Health: `{"status":"ok"}`.
+
+---
+
+## Also found and fixed during the autonomy test
+
+### git safe.directory (launcher commit/push)
+The test revealed that git inside the container (root) failed with "dubious ownership" on the bind-mounted `/repos` → the agent's auto-commit/push did not go through. Fix: `git config --system --add safe.directory "*"` in the Dockerfile. `_monitor_agent` commit+push now works autonomously (verified: `analyst(ET): auto-commit run_id=47` was pushed to origin).
+
+### init:true (PID-1 reaper): finishing off B-2
+The direct child (bash) was reaped correctly via `proc.wait()`, BUT claude (node) spawns its own child processes; when bash exited they were reparented to PID 1 (uvicorn), which did NOT reap them → grandchild zombies. Fix: `init: true` in docker-compose.yml, so Docker injects `docker-init` (tini) as PID 1. Verified: after a real agent run, `ZOMBIE_COUNT_AFTER=0`.
+
+## Autonomy test (Task 9): RESULT
+Launched via `launcher.launch("analyst", ...)` (NOT base64). Confirmed autonomous:
+- B-1: a fresh `.task.md` was written without docker (which docker = NO_DOCKER_BINARY)
+- B-2: `exit_code=0` in `agent_runs` (runs 46/47/48)
+- zombies: 0 after the run (tini reaper)
+- git: auto-commit + push to origin worked
+- M-1: on restart, orphan recovery logged per-run + Telegram (runs 42/43/44 ET-009)
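Reviewer note on B-2 in `docs/BUGFIXES_2026-06-02.md`: the "stdout to a file, then `proc.wait()`" pattern can be sketched as below. This is an illustration under assumptions, not the launcher's code: `run_logged` is a hypothetical helper, and the timeout handling stands in for the separate watchdog the document mentions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_logged(cmd: list[str], output_path: Path, timeout: float = 60.0) -> int:
    """Run cmd with stdout and stderr redirected to a file; return its exit code."""
    with open(output_path, "w") as log_fh:
        # No PIPE: the child writes straight to the file, so nothing can block
        # on a full pipe buffer and no reader thread is needed.
        proc = subprocess.Popen(cmd, stdout=log_fh, stderr=subprocess.STDOUT)
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            # Reap the killed child so it does not linger as a zombie.
            return proc.wait()


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "agent.log"
    exit_code = run_logged([sys.executable, "-c", "print('hello-from-agent')"], log_path)
    log_text = log_path.read_text()

print(exit_code, log_text.strip())
```

Because `wait()` is always called, the direct child is reaped and an exit code is always available to store; grandchildren still need a PID 1 reaper, as the `init: true` note describes.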
--- a/docs/BUGFIXES_2026-06-02_ORCH2.md
+++ b/docs/BUGFIXES_2026-06-02_ORCH2.md
@@ -0,0 +1,81 @@
+# ORCH-2 / S-4: git worktree per task (isolating the shared /repos)
+
+**Date:** 2026-06-02
+**Branch:** `feature/ORCH-2-worktree`
+**Source:** `AUDIT_2026-06-02.md` (SERIOUS S-4), `DEV_TASK_ORCH2_WORKTREE.md`
+**Executor:** Dev (Opus 4.8 Tokenator)
+
+## Problem (S-4)
+
+All git operations (`launcher.launch` cmd, `_monitor_agent` commit/push, `check_tests_local`)
+ran `git checkout <branch>` in the single shared `/repos/<repo>`. With two active tasks, one
+task's checkout overwrote the other's working copy → races (on ET-009 this produced "two collectors"
+and branch confusion).
+
+## Solution
+
+**git worktree per branch.** Each task (branch) works in an isolated working copy:
+
+```
+/repos/<repo>                      ← main clone (fetch / worktree mgmt / read-only)
+/repos/_wt/<repo>/<safe-branch>    ← task worktree (the agent's working copy)
+```
+
+## Changes
+
+| File | What |
+|------|-----|
+| `src/config.py` | + `worktrees_dir: str = "/repos/_wt"` |
+| `src/git_worktree.py` (new) | `_safe`, `get_worktree_path`, `ensure_worktree`, `remove_worktree` |
+| `src/agents/launcher.py` | `launch()`: the branch is resolved up front → `ensure_worktree`; cmd = `cd <worktree>` without `git checkout`; `_write_task_file(repo, branch, ...)` writes into the worktree; `_monitor_agent` commits/pushes in the worktree (checkout removed); `01-questions.md`/`10-conflict.md` are read from the worktree; the QG dispatcher passes `branch` through |
+| `src/qg/checks.py` | `_repo_path(repo, branch)` helper (worktree if present, otherwise shared); artifact checks gained an optional `branch`; `check_tests_local` → `ensure_worktree` + `make test` in the worktree (the S-4 TODO is removed) |
+| `src/webhooks/plane.py` | the QG dispatcher passes `branch` through; the review-file fallback is read from the worktree |
+| `src/webhooks/gitea.py` | `git branch -r --contains <sha>`: confirmed read-only, left in the main clone (+ comment) |
+| `tests/test_git_worktree.py` (new) | coverage of `_safe`/`get_worktree_path`/`ensure_worktree`/`remove_worktree` + isolation of two branches (real local git repos in tmp, no network) |
+| `tests/test_launcher.py` | `TestWriteTaskFile` updated for the new signature (writes into the worktree) |
+| `docs/ARCHITECTURE.md` | section "Isolation via git worktree"; the item about shared-checkout races removed |
+
+## Compatibility with earlier fixes
+
+- **B-1** (task-file write without docker, direct `open()`): preserved; the path is now the worktree.
+- **B-2** (Popen stdout → file, monitor `proc.wait()` without zombies): untouched.
+- **S-5** (`check_reviewer_verdict` reads only the YAML frontmatter): untouched, only the worktree path was added.
+- **S-1** (`check_tests_local` runs its own `make test` instead of Gitea CI): preserved; tests now run in the worktree.
+
+Backward compatibility of QG dispatch: artifact checks accept `branch` optionally
+(default `None` → shared `/repos/<repo>`), so existing two-argument calls and tests are not broken.
+
+## Verification
+
+```bash
+# Tests (in a container via the image; the host .venv is broken):
+IMG=$(docker inspect orchestrator --format '{{.Config.Image}}')
+docker run --rm -v /home/slin/repos/orchestrator:/code -w /code --entrypoint python3 $IMG -m pytest tests/ -q
+# → 37 passed, 9 failed (pre-existing test_webhooks 401/signature; NOT related to ORCH-2,
+#   identical to the baseline on main).
+
+# test_git_worktree.py in isolation → 9 passed.
+```
+
+### Isolation test (in the running container)
+
+```bash
+docker exec orchestrator python3 -c "
+import sys; sys.path.insert(0,'/app')
+from src.git_worktree import ensure_worktree
+import subprocess
+p1 = ensure_worktree('enduro-trails','feature/wt-test-A')
+p2 = ensure_worktree('enduro-trails','feature/wt-test-B')
+b1 = subprocess.run(['git','-C',p1,'branch','--show-current'],capture_output=True,text=True).stdout.strip()
+b2 = subprocess.run(['git','-C',p2,'branch','--show-current'],capture_output=True,text=True).stdout.strip()
+assert p1!=p2 and b1!=b2, 'NOT ISOLATED'
+print('ISOLATION OK', p1, p2, b1, b2)
+"
+```
+
+(Result of the run on the server: see below / in Stream's report.)
+
+## Limitations / notes
+
+- The job queue (ORCH-1 / F-2b) is **not** part of this task.
+- `remove_worktree` exists, but it is not called automatically on `done` (optional, a separate step).
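Reviewer note on the `/repos/_wt/<repo>/<safe-branch>` layout in `docs/BUGFIXES_2026-06-02_ORCH2.md`: the path mapping can be sketched as below. The actual `_safe` rule in `src/git_worktree.py` is not shown in this patch, so the sanitizing regex here is an assumption, and `safe_branch_name` / `worktree_path` are hypothetical names used only for illustration.

```python
import re
from pathlib import PurePosixPath

WORKTREES_DIR = "/repos/_wt"  # the `worktrees_dir` default listed in the change table


def safe_branch_name(branch: str) -> str:
    """Collapse characters that are awkward in a directory name (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", branch).strip("_")


def worktree_path(repo: str, branch: str, root: str = WORKTREES_DIR) -> str:
    """Build the per-branch working-copy path under the worktrees root."""
    return str(PurePosixPath(root) / repo / safe_branch_name(branch))


path_a = worktree_path("enduro-trails", "feature/wt-test-A")
path_b = worktree_path("enduro-trails", "feature/wt-test-B")
print(path_a)
print(path_b)
```

A lossy rule like this one can map two different branch names (for example `feature/a` and `feature_a`) to the same directory; a real implementation would need to rule that out or detect it.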
--- a/docs/BUGFIXES_2026-06-03.md
+++ b/docs/BUGFIXES_2026-06-03.md
@@ -0,0 +1,82 @@
+# BUGFIXES / CHANGES: 2026-06-03
+
+## ORCH-6: multi-repo (project filter + per-project repo mapping)
+
+**Type:** root fix for the incident + new capability (multi-repo)
+**Branch:** `feature/ORCH-6-multirepo`
+**Plane:** ORCH-6 (project `8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a`)
+**Related incident:** [`INCIDENT_2026-06-02_webhook_autorun.txt`](./INCIDENT_2026-06-02_webhook_autorun.txt)
+
+### Incident context
+
+When tasks ORCH-1..7 were created in Plane (project `orchestrator`), the Plane webhook
+(id `93f0c342-a614-4248-9d0f-c107276f5620`) fired for every task and started the
+pipeline, but **everything went to the `enduro-trails` repo** because `plane.py:91`
+hardcoded `repo = settings.default_repo`. The webhook listened to **the whole workspace with no
+project filter**, producing junk tasks ET-010..016.
+
+Mitigation while the fix is in progress: the Plane webhook is **deactivated** (`is_active=false`).
+
+### Root cause
+
+1. No filter by Plane project: any issue from any project entered the pipeline.
+2. `repo` was hardcoded to the single `default_repo` (enduro-trails).
+3. `work_item_prefix` was always `ET` (db.py).
+4. `plane_sync` called the single hardcoded `PROJECT_ID` (enduro).
+
+### What was done
+
+| File | Change |
+|------|-----------|
+| `src/projects.py` (new) | Project registry: `ProjectConfig` + default list (enduro-trails + orchestrator) + resolvers `get_project_by_plane_id` / `get_project_by_repo` / `known_plane_project_ids`. Overridden via `ORCH_PROJECTS_JSON`; parsing is tolerant (broken JSON / broken entries → fallback to the default). |
+| `src/config.py` | Added `projects_json: str = ""` (env `ORCH_PROJECTS_JSON`). |
+| `src/webhooks/plane.py` | **Project filter**: `data.project` not in the registry → `{"status":"ignored","reason":"unknown project"}`. `repo`/`prefix`/Plane project are resolved from the registry. Plane sync for a task goes to the task's own project. |
+| `src/db.py` | `get_next_work_item_id(repo, prefix="ET")`: numbering per (repo, prefix); `ORCH-001` is independent of `ET-010`. The `ET` default is kept for backward compatibility. |
+| `src/plane_sync.py` | `_resolve_project_id` + `project_id` parameterization (defaults to enduro → existing calls stay backward compatible). |
+| `src/webhooks/gitea.py` | Unknown repo (`get_project_by_repo` → None) → `ignored` in 3 handlers. |
+
+### Tests
+
+- `tests/test_projects.py` (16 tests): resolvers (by plane_id, by repo, unknown→None,
+  known_plane_project_ids), `ORCH_PROJECTS_JSON` parsing (valid / broken JSON / not an array /
+  broken entries → skip / all-bad → fallback), reload with custom JSON.
+- `tests/test_plane_webhook.py` (4 tests, FastAPI TestClient, `launcher.launch` mocked):
+  unknown project → `ignored` + no task/branch/agent; orchestrator project → `repo=orchestrator`,
+  `ORCH-*`; enduro project → `repo=enduro-trails`, `ET-*`; independent prefixes (`ORCH-001`/`ORCH-002`
+  in parallel with `ET-001`).
+
+**Run (in a container, image `orchestrator-orchestrator`):** `57 passed`. The 9 failures in
+`tests/test_webhooks.py` are **pre-existing** (webhook signature 401 / TypeError, unrelated to ORCH-6,
+left untouched).
+
+```bash
+IMG=$(docker inspect orchestrator --format '{{.Config.Image}}')
+docker run --rm -v /home/slin/repos/orchestrator:/code -w /code --entrypoint python3 $IMG -m pytest tests/ -q
+```
+
+### Resolver check (offline, in the running container)
+
+```bash
+docker exec orchestrator python3 -c "
+from src.projects import get_project_by_plane_id, known_plane_project_ids
+o = get_project_by_plane_id('8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a')
+e = get_project_by_plane_id('7a79f0a9-5278-49cd-9007-9a338f238f9c')
+assert o.repo=='orchestrator' and o.work_item_prefix=='ORCH'
+assert e.repo=='enduro-trails' and e.work_item_prefix=='ET'
+assert get_project_by_plane_id('00000000-0000-0000-0000-000000000000') is None
+print('RESOLVE OK:', o.repo, e.repo, '| known:', len(known_plane_project_ids()))
+"
+```
+
+### ⚠️ Important
+
+- The Plane webhook **stays disabled** (`is_active=false`). Enabling it is a separate
+  step for Stream after the PR review.
+- `ORCH_PROJECTS_JSON` (if set) **fully replaces** the default, so list every project you need.
+- Backward compatibility of `plane_sync` is preserved (default project_id = enduro); ET tasks are not broken.
+
+### Re-enable webhook (after review, done by Stream)
+
+```sql
+UPDATE webhooks SET is_active=true WHERE id='93f0c342-a614-4248-9d0f-c107276f5620';
+```
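Reviewer note on `ORCH_PROJECTS_JSON` in `docs/BUGFIXES_2026-06-03.md`: the tolerant parsing described there (broken JSON or a non-array → default, broken entries → skipped, nothing valid → default) can be sketched as below. This is not the code in `src/projects.py`: `load_projects` is a hypothetical name, and the JSON key `plane_project_id` is an assumption (`repo` and `work_item_prefix` appear in the resolver check above).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectConfig:
    plane_project_id: str  # assumed field name
    repo: str
    work_item_prefix: str


# Default registry, using the two project ids quoted in the resolver check.
DEFAULT_PROJECTS = [
    ProjectConfig("7a79f0a9-5278-49cd-9007-9a338f238f9c", "enduro-trails", "ET"),
    ProjectConfig("8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a", "orchestrator", "ORCH"),
]


def load_projects(raw: str) -> list[ProjectConfig]:
    """Parse the override; fall back to the default list on any unusable input."""
    if not raw.strip():
        return list(DEFAULT_PROJECTS)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return list(DEFAULT_PROJECTS)
    if not isinstance(data, list):
        return list(DEFAULT_PROJECTS)
    projects = []
    for entry in data:
        try:
            projects.append(ProjectConfig(
                plane_project_id=str(entry["plane_project_id"]),
                repo=str(entry["repo"]),
                work_item_prefix=str(entry["work_item_prefix"]),
            ))
        except (KeyError, TypeError):
            continue  # skip a broken entry, keep the valid ones
    return projects or list(DEFAULT_PROJECTS)


print([p.repo for p in load_projects("{not json")])
```

A valid override replaces the default list entirely, which matches the note that every needed project must be listed.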
--- a/docs/INCIDENT_2026-06-02_webhook_autorun.txt
+++ b/docs/INCIDENT_2026-06-02_webhook_autorun.txt
@@ -0,0 +1,7 @@
+INCIDENT 2026-06-02: Plane webhook auto-triggered pipeline for ALL ORCH-1..7 tasks
+- Plane webhook (id 93f0c342) fires on ANY issue creation in workspace, no project filter
+- plane.py:91 hardcodes repo=settings.default_repo (enduro-trails)
+- Result: ORCH-x tasks ran analyst/architect in WRONG repo (enduro-trails), created junk ET-010..016
+- MITIGATION: Plane webhook DEACTIVATED (is_active=false) until ORCH-6 adds project filter
+- ROOT FIX = ORCH-6 (multi-repo): filter by plane_project_id + repo mapping per project
+- To re-enable webhook after ORCH-6: UPDATE webhooks SET is_active=true WHERE id=93f0c342...
--- a/docs/LESSONS_ET006.md
+++ b/docs/LESSONS_ET006.md
@@ -0,0 +1,190 @@
+# Lessons Learned — ET-006 (GPX Upload & Visualization)
+
+## Date: 2026-05-22
+## Task: ET-006 (GPX track upload and visualization)
+
+---
+
+## What worked well
+
+### 1. Review bounce: a real bug found and fixed automatically
+The reviewer found a P1: `Math.min.apply(null, array)` throws RangeError on arrays with more than 100K elements.
+The developer fixed it in 6 minutes (attempt 2), and the second review passed cleanly.
+**Takeaway:** the reviewer earns its place in the pipeline: it catches bugs that unit tests miss.
+
+### 2. Auto-advance testing → deploy
+The new `_try_advance_stage()` in the launcher worked without manual intervention.
+
+### 3. Quality of agent artifacts
+- The analyst anticipated REQ-F-13 (persist GPX layers on map style switch), which prevented an architectural bounce-back
+- The architect justified why a Web Worker is not an option (DOMParser is absent from WorkerGlobalScope)
+- Developer: ~1300 lines of production code + 700 lines of tests, all REQs covered
+- Tester: full e2e with Playwright, 48 pass / 0 fail
+
+### 4. Full cycle with a bounce
+```
+analysis → architecture → development → review (REQUEST_CHANGES)
+→ development (fix P1) → review (APPROVED) → testing → deploy → done
+```
+Time: ~6.5 hours (including API waits and e2e tests)
+
+---
+
+## Problems found
+
+### P1. Zombie processes after docker rebuild
+**Symptom:** Monitor threads die on `docker compose up --build`; agent processes remain as zombies.
+**Impact:** Manual intervention for commit/push and stage advance.
+**Root cause:** Python daemon threads do not survive a container restart, but child processes (claude.exe) are inherited by init (PID 1).
+
+### P2. Stale reviews block the merge
+**Symptom:** The tester pushes a commit after review approval → the approval becomes stale → the merge is rejected.
+**Impact:** A manual re-approve before every merge.
+**Root cause:** Branch protection `dismiss_stale_approvals: true`.
+
+### P3. Plane sync 404
+**Symptom:** `plane_issue_id` in the orchestrator DB does not match the real issue UUID in the Plane API.
+**Impact:** State updates in Plane do not work (comments do).
+**Root cause:** The webhook payload contains the ID of the webhook event object, not the issue ID.
+
+### P4. Incomplete event routing
+**Symptom:** The `pull_request_rejected` event type was not routed to `handle_pr`.
+**Impact:** REQUEST_CHANGES from the reviewer did not roll the task back automatically.
+**Root cause:** Gitea uses different event types: `pull_request`, `pull_request_approved`, `pull_request_rejected`.
+
+### P5. The analyst did not start automatically
+**Symptom:** After a task was created via the Plane webhook, the analyst did not start.
+**Impact:** The analyst had to be launched manually.
+**Root cause:** `handle_work_item_created` had no call to `launcher.launch("analyst")`.
+
+### P6. Slow tester (25 min)
+**Symptom:** Playwright e2e tests with headless Chromium on the GPX feature took 25 minutes.
+**Impact:** A long wait; the watchdog timeout (30 min) almost fired.
+**Root cause:** Rendering 700K points + installing dependencies (Playwright, shapely) at runtime.
+
+---
+
+## Solutions
+
+### P1. Zombie processes → Entrypoint + orphan recovery
+
+**Solution A (quick):** Add to the Dockerfile:
+```dockerfile
+RUN git config --global --add safe.directory '*'
+```
+
+**Solution B (full):** Startup recovery in `main.py`:
+```python
+@app.on_event("startup")
+async def recover_orphaned_runs():
+    """Mark orphaned runs (started but never finished) as failed."""
+    conn = get_db()
+    orphans = conn.execute(
+        "UPDATE agent_runs SET finished_at=datetime('now'), exit_code=-1 "
+        "WHERE finished_at IS NULL AND started_at < datetime('now', '-35 minutes')"
+    ).rowcount
+    conn.commit()
+    if orphans:
+        logger.warning(f"Recovered {orphans} orphaned agent runs")
+    # Re-check tasks stuck in intermediate stages
+    stuck = conn.execute(
+        "SELECT id, stage, work_item_id, repo, branch FROM tasks "
+        "WHERE stage NOT IN ('done', 'created')"
+    ).fetchall()
+    for task in stuck:
+        # Try to advance if QG passes
+        ...
+```
+
+**Solution C (robust):** Use `tini` as PID 1 in the container for proper zombie reaping:
+```dockerfile
+RUN apt-get update && apt-get install -y tini
+ENTRYPOINT ["tini", "--"]
+CMD ["uvicorn", "src.main:app", ...]
+```
+
+### P2. Stale reviews → disable dismiss or auto-re-approve
+
+**Solution A (simple):** Disable `dismiss_stale_approvals`:
+```bash
+curl -X PATCH '.../branch_protections/main' -d '{"dismiss_stale_approvals": false}'
+```
+
+**Solution B (better):** Auto-re-approve in the launcher after a tester push:
+```python
+# In _monitor_agent, after a successful push for the tester:
+if agent == "tester":
+    _reapprove_pr(repo, branch)
+```
+
+**Recommendation:** Solution A, which is simpler and safer. In our pipeline the reviewer already checks the code, so stale dismissal adds no value.
+
+### P3. Plane sync → fix the ID mapping
+
+**Solution:** On the `work_item.created` webhook, store the correct `issue_id`:
+```python
+# In handle_work_item_created:
+plane_issue_id = data.get("id")  # This is the issue ID, not the event ID
+# Verify via the Plane API: GET /issues/{id}; on 404, look the issue up by name
+```
+
+**Diagnostics:** Compare `plane_issue_id` in the DB with the real one via:
+```bash
+curl http://localhost:8091/api/v1/workspaces/ag_proj/projects/.../issues/?search=ET-006
+```
+
+### P4. Event routing → wildcard for pull_request_*
+
+**Solution:**
+```python
+if event_type == "push":
+    await handle_push(payload)
+elif event_type.startswith("pull_request"):
+    await handle_pr(payload)
+elif event_type == "status":
+    await handle_ci_status(payload)
+```
+
+### P5. Analyst auto-launch → already fixed
+The patch is applied: `launcher.launch("analyst")` was added to `handle_work_item_created`.
+
+### P6. Slow tester → pre-bake dependencies
+
+**Solution A:** Add Playwright and its dependencies to the Dockerfile:
+```dockerfile
+RUN pip install playwright pytest-playwright shapely mapbox-vector-tile && \
+    playwright install chromium --with-deps
+```
+
+**Solution B:** Split unit/integration and e2e tests. Unit/integration tests are mandatory (fast); e2e tests are optional (enabled by a flag in the task description).
+
+**Solution C:** Raise the tester timeout to 45 minutes:
+```python
+AGENT_CONFIGS = {
+    "tester": {..., "timeout": 2700},  # 45 min
+}
+```
+
+---
+
+## Fix priority
+
+| # | Problem | Priority | Effort | Solution |
+|---|----------|-----------|--------|---------|
+| P1 | Zombie processes | HIGH | Medium | tini + startup recovery |
+| P2 | Stale reviews | HIGH | Low | Disable dismiss_stale_approvals |
+| P4 | Event routing | HIGH | Low | startswith("pull_request") |
+| P5 | Analyst auto-launch | DONE | - | Already fixed |
+| P6 | Tester timeout | MEDIUM | Medium | Pre-bake deps in the Dockerfile |
+| P3 | Plane sync 404 | LOW | Medium | Fix the ID mapping |
+
+---
+
+## ET-006 metrics
+
+- **Total time:** ~6.5 hours (00:20 → 06:45 UTC)
+- **Agent runs:** 7 (analyst, architect, developer×2, reviewer×2, tester)
+- **Manual interventions:** 4 (zombie recovery×2, PR approve, event re-trigger)
+- **Code written by agents:** ~2000 lines (1300 production + 700 tests)
+- **Bugs found by the reviewer:** 1×P1, 3×P2, 6×P3
+- **Bugs fixed by the developer:** all P1 + all P2 + 3×P3
--- a/docs/ORCH-1_JOB_QUEUE.md
+++ b/docs/ORCH-1_JOB_QUEUE.md
@@ -0,0 +1,127 @@
+# ORCH-1 (F-2b): Persistent Job Queue
+
+**Date:** 2026-06-02
+**Branch:** `feature/ORCH-1-job-queue`
+**Source:** AUDIT_2026-06-02 (B-2 / F-2b)
+
+## Problem
+
+Agents were launched **in-process**: `launcher.launch()` synchronously spawned
+`subprocess.Popen` + 2 daemon threads (`_watchdog`, `_monitor_agent`) directly in the
+uvicorn process, from **8 webhook call sites**. Consequences:
+
+- **A restart was a catastrophe.** Daemon threads die, claude processes become orphans,
+  and work is lost (M-1 only marked `exit=-1` and called a human).
+- **No concurrency limit**: N webhooks = N simultaneous claude processes.
+- **No retries**: a crashed agent simply stayed dead.
+
+## Solution
+
+A persistent job queue (SQLite table `jobs`) + a background worker:
+
+1. The webhook handler enqueues a job (`enqueue_job`) → immediate 200 response.
+2. The background worker (`src/queue_worker.py`, a separate daemon thread) picks up
+   jobs subject to `max_concurrency` (`claim_next_job`, atomic) and spawns the agent
+   (`launcher.launch_job`, the same Popen logic).
+3. On completion, `_monitor_agent` → `_finalize_job`:
+   - `exit 0` → `done`;
+   - `exit != 0` & `attempts < max_attempts` → requeue (`queued`);
+   - `exit != 0` & `attempts >= max_attempts` → `failed` + Telegram.
+
+## What changed
+
+| File | Change |
+|------|-----------|
+| `src/db.py` | `jobs` table + index; helpers `enqueue_job`, `claim_next_job` (atomic), `mark_job`, `count_running_jobs`, `requeue_running_jobs`, `get_job`, `job_status_counts`, `recent_jobs` |
+| `src/config.py` | `max_concurrency` (env `ORCH_MAX_CONCURRENCY`, default 1), `queue_poll_interval` (env `ORCH_QUEUE_POLL_INTERVAL`, default 2.0) |
+| `src/agents/launcher.py` | `launch()` → thin wrapper over `_spawn()`; new `launch_job(job)`; `_spawn()` (shared, `job_id` optional); monitor/watchdog accept `job_id`; new `_finalize_job()` (statuses + retries). 4 internal advance calls `self.launch` → `enqueue_job` |
+| `src/webhooks/plane.py` | 4 `launcher.launch` call sites → `enqueue_job` |
+| `src/webhooks/gitea.py` | 4 `launcher.launch` call sites → `enqueue_job` |
+| `src/queue_worker.py` | **NEW**: `QueueWorker` (drain loop + max_concurrency + graceful stop) |
+| `src/main.py` | lifespan: queue recovery (`requeue_running_jobs`) after M-1, worker start/stop; new `GET /queue` |
+| `tests/test_queue.py` | **NEW**: 19 tests (lifecycle, claim atomicity, retries, requeue, observability, worker max_concurrency; Popen fully mocked) |
+
+## Claim atomicity
+
+```sql
+SELECT id FROM jobs WHERE status='queued' ORDER BY id LIMIT 1;
+UPDATE jobs SET status='running', attempts=attempts+1, started_at=datetime('now')
+  WHERE id=? AND status='queued';   -- rowcount==1 => claimed, ==0 => lost the race
+```
+
+Guarantee: a job is never handed out twice, even with concurrent worker ticks
+(verified by `test_concurrent_claims_no_duplicate`: 8 threads, 20 jobs).
+
+## Preserved fixes (NOT broken)
+
+- **B-1** task-file write (direct `open()` into the worktree): unchanged.
+- **B-2** Popen → log_fh (no PIPE), monitor reap: unchanged, only wrapped.
+- **M-1** orphan recovery in `main.py`: kept; queue recovery is added AFTER it.
+- **ORCH-2** worktree per task: unchanged.
+- **ORCH-6** project registry/filter: unchanged.
+
+## Acceptance
+
+| # | Check | Status |
+|---|----------|--------|
+| 1 | webhook enqueues a job (queued) | ✅ enqueue_job |
+| 2 | worker executes queued→running→done | ✅ worker + _finalize_job |
+| 3 | running ≤ max_concurrency | ✅ test_worker_respects_max_concurrency |
+| 4 | retry fail→queued→failed+notify | ✅ test_finalize_job_requeue_then_fail |
+| 5 | restart-safe (running→requeue) | ✅ requeue_running_jobs + lifespan |
+| 6 | M-1 not broken | ✅ kept in lifespan |
+| 7 | tests (new green, 9 pre-existing) | ✅ 76 passed / 9 pre-existing |
+| 8 | `/queue` | ✅ counts + recent |
+
+## Tests
+
+```bash
+IMG=$(docker inspect orchestrator --format '{{.Config.Image}}')
+docker run --rm -v /home/slin/repos/orchestrator:/code -w /code \
+  --entrypoint python3 $IMG -m pytest tests/ -q
+# 110 passed, 9 failed (pre-existing test_webhooks 401/signature/TypeError)
+```
+
+---
+
+## Resilience-слой (ДОПОЛНЕНИЕ: preflight + 429 + backoff + circuit breaker)
+
+Надёжность очереди против недоступности CLI и rate-limit. Два РАЗНЫХ класса
+проблем лечатся по-разному.
+
+### A. Cheap preflight (`src/preflight.py`): burns no tokens
+Before claiming, the worker checks `os.path.exists(CLAUDE_BIN)` and runs `claude --version`
+(5 s timeout, spends NO tokens). The result is cached for `preflight_cache_ttl` (45 s).
+On FAIL the worker does NOT claim (the job stays `queued`) and waits. 🚫 NO prompt-ping.
+
+### B. 429: detected AFTER THE RUN (`src/error_classifier.py`)
+A rate limit cannot be predicted, so failures are classified from the run log. `classify_log_file`
+reads the log tail (16 KB), looks for `429/rate limit/overloaded/quota/503/529/timeout/...`,
+and returns `transient` or `permanent`. It also extracts `Retry-After`.
+
+- **transient** (429/network) → backoff retry with a SEPARATE `transient_attempts` counter
+  (limit `transient_max_attempts=5`), so it does not burn the code-fault budget.
+- **permanent** (code fault) → the ordinary `attempts < max_attempts` (2) requeue, then `failed`.
+
+### C. Backoff + `available_at`
+New columns `jobs.available_at TEXT` and `jobs.transient_attempts INTEGER` (migrated via
+`_ensure_column`). `claim_next_job` filters on `WHERE status='queued' AND (available_at IS NULL
+OR available_at <= datetime('now'))`. On a transient failure: `available_at = now +
+min(2^n * base, max)` (base=10 s, max=600 s); `Retry-After` is honoured (the larger delay wins, with `Retry-After` capped at max).
+
+### D. Circuit breaker (`CircuitBreaker` in queue_worker)
+N=3 consecutive transient failures → **open**: the worker pauses for `breaker_pause_seconds=300`, does NOT
+touch the CLI at all, and sends a Telegram alert. After the pause → **half-open** (tries 1 job);
+recovered (exit 0) → **closed**; transient again → back to open. The state lives in worker
+memory and is exposed in `/queue.resilience`.
+The launcher→breaker link goes through the `launcher.on_outcome` callback (no import cycle).
+
+### Config (config.py)
+`preflight_cache_ttl=45`, `backoff_base_seconds=10`, `backoff_max_seconds=600`,
+`transient_max_attempts=5`, `breaker_threshold=3`, `breaker_pause_seconds=300`.
+
+### Tests
+`tests/test_resilience.py` has 34 tests: preflight (FAIL→queued, cache, force),
+classifier (transient/permanent/Retry-After), backoff (growth/cap/Retry-After,
+`available_at` gating), launcher transient/permanent finalize, breaker
+(open/half-open/closed/re-open, claim blocking).
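The breaker transitions described in section D above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the project's `CircuitBreaker`: the class name `BreakerSketch`, the `allow_claim` method, the injectable `clock`, and the handling of a permanent failure are all hypothetical. Only the `on_outcome(transient=..., recovered=...)` keyword arguments and the threshold/pause defaults come from the diff.

```python
import time


class BreakerSketch:
    """Hypothetical stand-in for the worker's CircuitBreaker (not the real class)."""

    def __init__(self, threshold=3, pause_seconds=300, clock=time.monotonic):
        self.threshold = threshold          # mirrors breaker_threshold
        self.pause_seconds = pause_seconds  # mirrors breaker_pause_seconds
        self.clock = clock                  # injectable so tests need not sleep
        self.state = "closed"
        self.streak = 0                     # consecutive transient failures
        self.opened_at = None
        self.probing = False                # True while the half-open probe job runs

    def allow_claim(self):
        """Return True if the worker may claim a job right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.pause_seconds:
                return False                # still pausing: do not touch the CLI
            self.state = "half-open"        # pause elapsed: allow one probe job
            self.probing = False
        if self.state == "half-open":
            if self.probing:
                return False                # only one probe job at a time
            self.probing = True
        return True

    def on_outcome(self, transient, recovered):
        """Takes the same keyword arguments the launcher passes to on_outcome."""
        if recovered:                       # exit 0: close and reset the streak
            self.state, self.streak, self.probing = "closed", 0, False
        elif transient:
            self.streak += 1
            if self.state == "half-open" or self.streak >= self.threshold:
                self.state, self.opened_at = "open", self.clock()
        else:
            # Assumption: a permanent (code-fault) failure breaks the transient
            # streak and frees the probe slot; the diff does not specify this.
            self.streak, self.probing = 0, False
```

In such a sketch the worker would call `allow_claim()` before each claim, and the launcher would report every finished run through `on_outcome`, so neither side needs to import the other.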
--- a/docs/SETUP_WEBHOOKS.md
+++ b/docs/SETUP_WEBHOOKS.md
@@ -0,0 +1,163 @@
+# Webhook Setup: Plane + Gitea → Orchestrator
+
+## Architecture
+
+```
+Gitea (push/PR/CI) ──→ Nginx proxy ──→ Orchestrator /webhook/gitea
+Plane (work_item/comment) ──→ Nginx proxy ──→ Orchestrator /webhook/plane
+```
+
+External URL: `https://openclaw.mva154.duckdns.org/orchestrator/`
+Internal URL: `http://127.0.0.1:8500/`
+
+---
+
+## Gitea Webhook
+
+**Created automatically via the API.**
+
+- URL: `https://openclaw.mva154.duckdns.org/orchestrator/webhook/gitea`
+- Events: `push`, `pull_request`, `status`
+- Secret: the value of `ORCH_GITEA_WEBHOOK_SECRET` in `.env`
+- Signature header: `X-Gitea-Signature` (HMAC-SHA256 hex digest)
+
+### Verification
+
+```bash
+GITEA_TOKEN=$(grep ORCH_GITEA_TOKEN /home/slin/repos/orchestrator/.env | cut -d= -f2)
+curl -s "http://localhost:3000/api/v1/repos/admin/enduro-trails/hooks" \
+  -H "Authorization: token ${GITEA_TOKEN}" | python3 -m json.tool
+```
+
+### Recreating (if needed)
+
+```bash
+GITEA_WEBHOOK_SECRET=$(openssl rand -hex 20)
+# Update in .env: ORCH_GITEA_WEBHOOK_SECRET=<new_secret>
+
+curl -X POST "http://localhost:3000/api/v1/repos/admin/enduro-trails/hooks" \
+  -H "Authorization: token ${GITEA_TOKEN}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "type": "gitea",
+    "active": true,
+    "config": {
+      "url": "https://openclaw.mva154.duckdns.org/orchestrator/webhook/gitea",
+      "content_type": "json",
+      "secret": "'${GITEA_WEBHOOK_SECRET}'"
+    },
+    "events": ["push", "pull_request", "status"],
+    "branch_filter": "*"
+  }'
+```
+
+---
+
+## Plane Webhook
+
+**Created directly in PostgreSQL** (Plane CE does not expose a webhook API through the external /api/v1/).
+
+- URL: `https://openclaw.mva154.duckdns.org/orchestrator/webhook/plane`
+- Events: `issue` (work_item.created), `issue_comment` (comment.created)
+- Secret: the value of `ORCH_PLANE_WEBHOOK_SECRET` in `.env`
+- Signature header: `X-Plane-Signature` (HMAC-SHA256 hex digest)
+
+### Verification
+
+```bash
+docker exec -e PGPASSWORD=plane plane-app-plane-db-1 psql -U plane -d plane -c \
+  "SELECT id, url, is_active FROM webhooks;"
+```
+
+### Manual setup via the UI (alternative)
+
+1. Open `https://plane.mva154.duckdns.org`
+2. Workspace Settings → Webhooks → Add Webhook
+3. URL: `https://openclaw.mva154.duckdns.org/orchestrator/webhook/plane`
+4. Secret: the value of `ORCH_PLANE_WEBHOOK_SECRET` from `.env`
+5. Events: Issue, Issue Comment
+6. Save
+
+### Recreating via SQL
+
+```bash
+PLANE_WEBHOOK_SECRET=$(openssl rand -hex 20)
+# Update in .env: ORCH_PLANE_WEBHOOK_SECRET=<new_secret>
+
+WORKSPACE_ID=$(docker exec -e PGPASSWORD=plane plane-app-plane-db-1 psql -U plane -d plane -t -A -c \
+  "SELECT id FROM workspaces WHERE slug='ag_proj'")
+
+WEBHOOK_ID=$(cat /proc/sys/kernel/random/uuid)
+
+docker exec -e PGPASSWORD=plane plane-app-plane-db-1 psql -U plane -d plane -c "
+INSERT INTO webhooks (id, created_at, updated_at, deleted_at, workspace_id, url, is_active, secret_key, project, issue, module, cycle, issue_comment, is_internal, version)
+VALUES ('${WEBHOOK_ID}', NOW(), NOW(), NULL, '${WORKSPACE_ID}',
+  'https://openclaw.mva154.duckdns.org/orchestrator/webhook/plane',
+  true, '${PLANE_WEBHOOK_SECRET}', true, true, false, false, true, false, 'v1');
+"
+```
+
+---
+
+## HMAC Signature Verification
+
+Both handlers verify the signature:
+- If the secret is empty in `.env`, verification is skipped (for dev/debug)
+- If the secret is set, a request without a valid signature gets `401 Unauthorized`
+
+### Signature format
+
+| Source | Header | Algorithm | Format |
+|--------|--------|-----------|--------|
+| Gitea | `X-Gitea-Signature` | HMAC-SHA256 | hex digest (no prefix) |
+| Plane | `X-Plane-Signature` | HMAC-SHA256 | hex digest |
+
+### Testing the signature manually
+
+```bash
+SECRET=$(grep ORCH_GITEA_WEBHOOK_SECRET /home/slin/repos/orchestrator/.env | cut -d= -f2)
+BODY='{"ref":"refs/heads/test","repository":{"name":"enduro-trails"},"commits":[]}'
+SIG=$(echo -n "${BODY}" | openssl dgst -sha256 -hmac "${SECRET}" | awk '{print $NF}')
+
+curl -X POST http://localhost:8500/webhook/gitea \
+  -H "Content-Type: application/json" \
+  -H "X-Gitea-Event: push" \
+  -H "X-Gitea-Signature: ${SIG}" \
+  -d "${BODY}"
+# Expected: {"status":"accepted"}
+```
+
+---
+
+## Environment variables (.env)
+
+| Variable | Description |
+|----------|-------------|
+| `ORCH_GITEA_WEBHOOK_SECRET` | HMAC secret for the Gitea webhook |
+| `ORCH_PLANE_WEBHOOK_SECRET` | HMAC secret for the Plane webhook |
+| `ORCH_GITEA_TOKEN` | API token for Gitea |
+| `ORCH_PLANE_API_TOKEN` | API token for Plane |
+
+---
+
+## Troubleshooting
+
+```bash
+# Orchestrator logs
+docker logs orchestrator --tail 50 2>&1 | grep -i "webhook\|signature\|401"
+
+# Events in the DB
+docker exec orchestrator python3 -c "
+import sqlite3
+conn = sqlite3.connect('/app/data/orchestrator.db')
+for r in conn.execute('SELECT id, source, event_type, timestamp FROM events ORDER BY id DESC LIMIT 10').fetchall():
+    print(r)
+"
+
+# Gitea webhook delivery history
+# Gitea UI → Settings → Webhooks → click webhook → Recent Deliveries
+```
+
+---
+
+*Created: 2026-05-21 | Author: Dev agent*
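The signature rules in the "HMAC Signature Verification" section above (empty secret skips the check, otherwise a hex HMAC-SHA256 digest over the raw body is required) can be sketched with the standard library. This is a minimal sketch and not the handlers' actual code: the function name `verify_signature` and its parameters are hypothetical, and it assumes the caller passes the raw request bytes and turns a `False` result into `401 Unauthorized`.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, body: bytes, signature: Optional[str]) -> bool:
    """Hypothetical helper: check a hex HMAC-SHA256 signature over the raw body."""
    if not secret:
        return True   # empty secret in .env: verification is skipped (dev/debug)
    if not signature:
        return False  # secret is set but the header is missing
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing;
    # comparing bytes also tolerates non-ASCII garbage in the header value.
    return hmac.compare_digest(expected.encode(), signature.strip().lower().encode())
```

The digest has to be computed over the bytes exactly as received. Re-serialising parsed JSON before hashing would generally change whitespace or key order and make valid signatures fail.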
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,3 +2,4 @@ fastapi==0.115.0
 uvicorn[standard]==0.30.0
 pydantic-settings==2.5.0
 httpx==0.27.0
+pytest==8.3.3
--- a/src/agents/launcher.py
+++ b/src/agents/launcher.py
@@ -1,11 +1,21 @@
 import subprocess
 import os
+import logging
+import threading
+import signal
 from ..config import settings
-from ..db import get_db
+from ..db import get_db, get_task_by_repo_branch, update_task_stage, enqueue_job
+from ..stages import get_next_stage, get_qg_for_stage, get_agent_for_stage
+from ..git_worktree import ensure_worktree, get_worktree_path
+from ..qg.checks import QG_CHECKS
+from ..notifications import notify_stage_change, notify_qg_failure, notify_agent_started, notify_agent_finished, notify_approve_requested
+from ..plane_sync import notify_stage_change as plane_notify_stage, add_comment as plane_add_comment
+
+logger = logging.getLogger("orchestrator.launcher")


 class AgentLauncher:
-    """Launch Claude CLI agents for specific tasks."""
+    """Launch Claude CLI agents directly (binary mounted into container)."""

    AGENT_CONFIGS = {
        "analyst": {
@@ -17,6 +27,7 @@ class AgentLauncher:
            "system_prompt": ".openclaw/agents/architect.md",
            "task_file": ".task-arch.md",
            "allowed_tools": "Read,Write,Edit,Bash",
+            "model": "opus",
        },
        "developer": {
            "system_prompt": ".openclaw/agents/developer.md",
@@ -27,68 +38,144 @@ class AgentLauncher:
            "system_prompt": ".openclaw/agents/reviewer.md",
            "task_file": ".task-review.md",
            "allowed_tools": "Read,Write,Edit,Bash",
+            "model": "opus",
        },
        "tester": {
            "system_prompt": ".openclaw/agents/tester.md",
            "task_file": ".task-test.md",
            "allowed_tools": "Read,Write,Edit,Bash",
        },
+        "deployer": {
+            "task_file": ".task-deploy.md",
+            "system_prompt": ".openclaw/agents/deployer.md",
+            "allowed_tools": "Read,Write,Edit,Bash",
+        },
    }

-    def launch(self, agent: str, repo: str, task_content: str = None) -> int:
+    CLAUDE_BIN = "/opt/claude-code/bin/claude.exe"
+    AGENT_TIMEOUT = 1800  # 30 minutes
+
+    def launch(self, agent: str, repo: str, task_content: str = None, task_id: int = None) -> int:
        """
-        Launch a Claude CLI agent.
+        Launch a Claude CLI agent directly (legacy synchronous path).
+
+        Kept for backward compatibility (direct callers / existing tests). The
+        ORCH-1 job queue uses launch_job() instead, but both share _spawn().

        Args:
            agent: Agent role (analyst, architect, developer, reviewer, tester)
            repo: Repository name
            task_content: Optional task content to write to task file
+            task_id: Optional task ID to associate with this run

        Returns:
            agent_run_id from DB
        """
+        return self._spawn(agent, repo, task_content, task_id, job_id=None)
+
+    def launch_job(self, job: dict) -> int:
+        """ORCH-1: launch an agent for a claimed queue job.
+
+        Same spawn path as launch(), but threads job['id'] through so the monitor
+        can update the job's status (done / requeue / failed) and link jobs.run_id
+        to the agent_runs row. Returns the agent_run_id.
+        """
+        return self._spawn(
+            job["agent"],
+            job["repo"],
+            job.get("task_content"),
+            job.get("task_id"),
+            job_id=job["id"],
+        )
+
+    def _spawn(self, agent: str, repo: str, task_content: str = None,
+               task_id: int = None, job_id: int = None) -> int:
+        """Shared spawn implementation for launch() and launch_job().
+
+        When job_id is set, the monitor/watchdog drive the jobs table status
+        (ORCH-1). The claude-CLI Popen logic (B-2) and worktree/task-file logic
+        (B-1 / ORCH-2) are unchanged.
+        """
        config = self.AGENT_CONFIGS.get(agent)
        if not config:
            raise ValueError(f"Unknown agent: {agent}")

-        repo_path = os.path.join(settings.repos_dir, repo)
-        if not os.path.isdir(repo_path):
-            raise FileNotFoundError(f"Repo not found: {repo_path}")
+        # Main clone lives at /repos/<repo>; the agent works in an isolated worktree
+        # (ORCH-2 / S-4) so concurrent tasks never fight over a shared checkout.
+        local_repo_path = os.path.join(settings.repos_dir, repo)
+        if not os.path.isdir(local_repo_path):
+            raise FileNotFoundError(f"Repo not found: {local_repo_path}")

-        # Write task file if content provided
+        # Determine branch (needed before we touch the worktree / task file).
+        _br_row = get_db().execute("SELECT branch FROM tasks WHERE id=?", (task_id,)).fetchone() if task_id else None
+        agent_branch = _br_row[0] if _br_row else "main"
+
+        # Ensure the per-branch worktree exists and is on the right branch.
+        work_path = ensure_worktree(repo, agent_branch)
+
+        # Write task file if content provided (B-1: direct write; now into the worktree).
        if task_content:
-            task_path = os.path.join(repo_path, config["task_file"])
-            with open(task_path, "w") as f:
-                f.write(task_content)
+            self._write_task_file(repo, agent_branch, config["task_file"], task_content)

        # Record run in DB
        conn = get_db()
        cursor = conn.execute(
-            "INSERT INTO agent_runs (task_id, agent) VALUES (NULL, ?)",
-            (agent,),
+            "INSERT INTO agent_runs (task_id, agent) VALUES (?, ?)",
+            (task_id, agent),
        )
        run_id = cursor.lastrowid
        conn.commit()

-        # Prepare output log
+        # ORCH-1: link this job to the agent_runs row and stamp started_at.
+        if job_id is not None:
+            conn.execute(
+                "UPDATE jobs SET run_id = ?, started_at = datetime('now') WHERE id = ?",
+                (run_id, job_id),
+            )
+            conn.commit()
+
+        # Prepare output log path
        output_path = f"/app/data/runs/{run_id}.log"
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

-        # Build shell command
+        # Build the claude command
+        task_file = config["task_file"]
+        system_prompt = config["system_prompt"]
+        allowed_tools = config["allowed_tools"]
+
+        model = config.get("model", "")
+        model_flag = f"--model {model} " if model else ""
+
+        # No git fetch/checkout here: ensure_worktree() already put the worktree on
+        # the right branch. The agent simply runs inside its isolated work_path.
        cmd = (
-            f'cd {repo_path} && {settings.claude_bin} --print '
-            f'"$(cat {config["task_file"]})" '
-            f'--system-prompt "$(cat {config["system_prompt"]})" '
-            f'--allowedTools {config["allowed_tools"]}'
+            f'cd {work_path} && '
+            f'{self.CLAUDE_BIN} --print '
+            f'{model_flag}'
+            f'"$(cat {task_file})" '
+            f'--system-prompt "$(cat {system_prompt})" '
+            f'--allowedTools {allowed_tools}'
        )

-        # Launch as background process
-        with open(output_path, "w") as log_file:
-            subprocess.Popen(
-                ["bash", "-c", cmd],
-                stdout=log_file,
-                stderr=subprocess.STDOUT,
-                cwd=repo_path,
+        logger.info(f"Launching agent '{agent}' for repo '{repo}', run_id={run_id}")
+
+        # Launch as background process.
+        # B-2 fix: redirect stdout/stderr straight to the log file at the OS level.
+        # No PIPE in the orchestrator process -> no PIPE deadlock, no reader thread,
+        # no zombies. log_fh is closed by _monitor_agent after proc.wait().
+        log_fh = open(output_path, "w")
+        proc = subprocess.Popen(
+            ["bash", "-c", cmd],
+            stdout=log_fh,
+            stderr=subprocess.STDOUT,
+            env={
+                **os.environ,
+                "HOME": "/home/slin",
+                "GIT_AUTHOR_NAME": "claude-bot",
+                "GIT_AUTHOR_EMAIL": "claude-bot@mva154.local",
+                "GIT_COMMITTER_NAME": "claude-bot",
+                "GIT_COMMITTER_EMAIL": "claude-bot@mva154.local",
+            },
            )

        # Update DB with output path
@@ -99,7 +186,581 @@ class AgentLauncher:
        conn.commit()
        conn.close()

+        # Start timeout watchdog
+        t = threading.Thread(
+            target=self._watchdog,
+            args=(proc.pid, run_id),
+            kwargs={"job_id": job_id},
+            daemon=True,
+        )
+        t.start()
+
+        # Start monitor thread (waits for completion, commits, pushes)
+        # agent_branch already computed above
+        m = threading.Thread(
+            target=self._monitor_agent,
+            args=(proc, run_id, agent, repo, agent_branch, output_path, log_fh),
+            kwargs={"job_id": job_id},
+            daemon=True,
+        )
+        m.start()
+
+        logger.info(f"Agent '{agent}' launched, pid={proc.pid}, run_id={run_id}")
+        notify_agent_started(run_id, agent, task_id)
        return run_id

+    def _watchdog(self, pid: int, run_id: int, timeout: int = None, job_id: int = None):
+        """Kill agent if it exceeds timeout.
+
+        ORCH-1: on a timeout-kill the monitor's proc.wait() returns the kill exit
+        code and drives the job retry/fail logic, so the watchdog itself only needs
+        to SIGKILL and record the agent_runs exit. job_id is accepted for symmetry.
+        """
+        import time
+        if timeout is None:
+            timeout = self.AGENT_TIMEOUT
+        time.sleep(timeout)
+        try:
+            os.kill(pid, signal.SIGKILL)
+            logger.warning(f"Agent run_id={run_id} killed after {timeout}s timeout")
+            conn = get_db()
+            conn.execute(
+                "UPDATE agent_runs SET finished_at=datetime('now'), exit_code=-9 WHERE id=?",
+                (run_id,),
+            )
+            conn.commit()
+            conn.close()
+        except ProcessLookupError:
+            pass  # Already finished
+
+    def _monitor_agent(self, proc, run_id, agent, repo, branch, output_path=None, log_fh=None, job_id=None):
+        """Wait for agent to finish, commit+push results, update DB.
+
+        B-2 fix: stdout already goes straight to the log file via Popen, so we just
+        block on proc.wait() (guaranteed reap -> no zombie, real exit_code) and then
+        close the log file handle. No PIPE, no select loop, no startup timeout here
+        (the watchdog still enforces the overall AGENT_TIMEOUT by pid).
+        """
+        import time as _time
+        _start_ts = _time.time()
+
+        exit_code = proc.wait()
+        if log_fh is not None:
+            try:
+                log_fh.close()
+            except Exception:
+                pass
+        _duration_s = int(_time.time() - _start_ts)
+        logger.info(f"Agent run_id={run_id} ({agent}) finished with exit_code={exit_code}")
+
+        # Update DB
+        conn = get_db()
+        conn.execute(
+            "UPDATE agent_runs SET finished_at=datetime('now'), exit_code=? WHERE id=?",
+            (exit_code, run_id),
+        )
+        conn.commit()
+
+        # Get task_id for notification
+        _row = conn.execute("SELECT task_id FROM agent_runs WHERE id=?", (run_id,)).fetchone()
+        _task_id = _row[0] if _row else None
+        conn.close()
+
+        notify_agent_finished(run_id, agent, exit_code, task_id=_task_id, duration_s=_duration_s)
+
+        # Commit and push any changes — in the per-branch worktree (ORCH-2 / S-4),
+        # NOT in the shared /repos/<repo>. The worktree is already on `branch`
+        # (ensure_worktree did the checkout), so no checkout is needed here.
+        repo_path = get_worktree_path(repo, branch)
+        try:
+            git_env = {
+                **os.environ,
+                "HOME": "/home/slin",
+                "GIT_AUTHOR_NAME": "claude-bot",
+                "GIT_AUTHOR_EMAIL": "claude-bot@mva154.local",
+                "GIT_COMMITTER_NAME": "claude-bot",
+                "GIT_COMMITTER_EMAIL": "claude-bot@mva154.local",
+            }
+            result = subprocess.run(
+                ["git", "-C", repo_path, "status", "--porcelain"],
+                capture_output=True, text=True, timeout=10, env=git_env
+            )
+            if result.stdout.strip():
+                # Add docs/ always
+                subprocess.run(
+                    ["git", "-C", repo_path, "add", "docs/"],
+                    capture_output=True, text=True, timeout=10, env=git_env
+                )
+                # Add src/ and tests/ for developer
+                if agent == "developer":
+                    subprocess.run(
+                        ["git", "-C", repo_path, "add", "src/", "tests/"],
+                        capture_output=True, text=True, timeout=10, env=git_env
+                    )
+                # Commit
+                commit_result = subprocess.run(
+                    ["git", "-C", repo_path, "commit", "-m",
+                     f"{agent}(ET): auto-commit from {agent} run_id={run_id}"],
+                    capture_output=True, text=True, timeout=30, env=git_env
+                )
+                if commit_result.returncode == 0:
+                    push_result = subprocess.run(
+                        ["git", "-C", repo_path, "push", "origin", branch],
+                        capture_output=True, text=True, timeout=60, env=git_env
+                    )
+                    if push_result.returncode == 0:
+                        logger.info(f"Agent run_id={run_id}: committed and pushed to {branch}")
+                        # Auto-create PR after developer pushes
+                        if agent == "developer":
+                            self._ensure_pr(repo, branch, run_id)
+                    else:
+                        logger.error(f"Agent run_id={run_id}: push failed: {push_result.stderr}")
+                else:
+                    logger.warning(f"Agent run_id={run_id}: commit failed: {commit_result.stderr}")
+            else:
+                logger.info(f"Agent run_id={run_id}: no changes to commit")
+        except Exception as e:
+            logger.error(f"Agent run_id={run_id}: post-run git failed: {e}")
+
+        # Handle deployer failure (smoke/healthcheck failed) — Task 7
+        if exit_code != 0 and agent == "deployer":
+            conn = get_db()
+            task_row = conn.execute(
+                "SELECT id, work_item_id FROM tasks WHERE repo=? AND branch=?",
+                (repo, branch),
+            ).fetchone()
+            conn.close()
+            if task_row:
+                _tid, _wid = task_row
+                update_task_stage(_tid, "development")
+                notify_stage_change(_tid, "deploy", "development")
+                plane_notify_stage(_wid, "deploy", "development")
+                from ..plane_sync import set_issue_blocked
+                set_issue_blocked(_wid)
+                plane_add_comment(
+                    _wid,
+                    "\u274c Deploy FAILED (smoke/healthcheck). Rolled back. Developer \u043d\u0443\u0436\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430."
+                )
+                from ..notifications import send_telegram
+                send_telegram(f"\U0001f6a8 {_wid}: Deploy failed! Rolled back. Needs fix.")
+
+        # Notify on any non-zero exit (this includes watchdog kills: -9 / 137)
+        if exit_code is not None and exit_code != 0:
+            conn = get_db()
+            task_row = conn.execute(
+                "SELECT id, work_item_id FROM tasks WHERE repo=? AND branch=?",
+                (repo, branch),
+            ).fetchone()
+            conn.close()
+            if task_row and agent != "deployer":  # deployer handled above
+                _tid, _wid = task_row
+                from ..notifications import send_telegram
+                send_telegram(f"\u26a0\ufe0f {_wid}: Agent {agent} failed (exit_code={exit_code}). Check logs: /app/data/runs/{run_id}.log")
+
+        # Auto-advance stage if agent finished successfully and QG passes
+        if exit_code == 0:
+            self._try_advance_stage(run_id, agent, repo, branch)
+
+        # ORCH-1: drive the job-queue status for queue-launched jobs only.
+        # (Legacy direct launch() has job_id=None and is unaffected.)
+        if job_id is not None:
+            self._finalize_job(job_id, agent, run_id, exit_code, output_path=output_path)
+
+    def _backoff_seconds(self, transient_attempts: int, retry_after: int = None) -> int:
+        """Exponential backoff for transient failures, honouring Retry-After.
+
+        backoff = min(2^transient_attempts * base, max). If the server sent a
+        Retry-After, use the larger of the two (never poll sooner than asked).
+        """
+        base = settings.backoff_base_seconds
+        cap = settings.backoff_max_seconds
+        backoff = min((2 ** max(transient_attempts, 0)) * base, cap)
+        if retry_after is not None and retry_after > 0:
+            backoff = max(backoff, min(retry_after, cap))
+        return int(backoff)
+
+    def _finalize_job(self, job_id: int, agent: str, run_id: int, exit_code, output_path=None):
+        """ORCH-1: update the jobs row after the agent process finished.
+
+        exit_code == 0  -> done (and resets the breaker streak via on_outcome).
+        exit_code != 0  -> classify the failure from the run log tail (token-free):
+          - TRANSIENT (429/overload/network): backoff-requeue with available_at in
+            the future + a SEPARATE transient_attempts budget
+            (settings.transient_max_attempts), honouring Retry-After. Reported to
+            the breaker so it opens after N consecutive transient failures.
+          - PERMANENT (code fault): ordinary attempts < max_attempts requeue,
+            otherwise 'failed' + Telegram.
+        """
+        from ..db import get_job, mark_job
+        from ..error_classifier import classify_log_file
+        try:
+            job = get_job(job_id)
+            if not job:
+                return
+            if exit_code == 0:
+                mark_job(job_id, "done", run_id=run_id)
+                logger.info(f"Job {job_id} ({agent}) done (run_id={run_id})")
+                self._record_outcome(transient=False, recovered=True)
+                return
+
+            # Classify the failure from the agent log tail (no token cost).
+            kind, retry_after = "permanent", None
+            log_path = output_path or f"/app/data/runs/{run_id}.log"
+            try:
+                kind, retry_after = classify_log_file(log_path)
+            except Exception:
+                pass
+
+            if kind == "transient":
+                self._finalize_transient(job_id, agent, run_id, exit_code, job, retry_after)
+            else:
+                self._finalize_permanent(job_id, agent, run_id, exit_code, job)
+        except Exception as e:
+            logger.error(f"Job {job_id}: _finalize_job error: {e}")
+
+    def _finalize_transient(self, job_id, agent, run_id, exit_code, job, retry_after):
+        """Transient (429/overload/net) failure -> backoff requeue or fail when budget out."""
+        from ..db import mark_job, mark_job_transient
+        tattempts = job.get("transient_attempts", 0)
+        tmax = settings.transient_max_attempts
+        err = (f"transient (429/overload) agent {agent} exit={exit_code} "
+               f"(run_id={run_id}); retry_after={retry_after}")
+        self._record_outcome(transient=True, recovered=False)
+        if tattempts < tmax:
+            backoff = self._backoff_seconds(tattempts + 1, retry_after)
+            mark_job_transient(job_id, backoff, error=err)
+            logger.warning(
+                f"Job {job_id} ({agent}) TRANSIENT fail (exit={exit_code}), "
+                f"backoff {backoff}s, transient_attempt {tattempts + 1}/{tmax}"
+            )
+        else:
+            mark_job(job_id, "failed", run_id=run_id, error=err)
+            logger.error(
+                f"Job {job_id} ({agent}) failed after {tattempts} transient attempts"
+            )
+            self._notify_failed(job_id, agent, job, run_id,
+                                f"transient (rate-limit) after {tattempts} attempts")
+
+    def _finalize_permanent(self, job_id, agent, run_id, exit_code, job):
+        """Permanent (code-fault) failure -> normal attempts<max requeue, then fail."""
+        from ..db import mark_job
+        attempts = job.get("attempts", 0)
+        max_attempts = job.get("max_attempts", 2)
+        err = f"agent {agent} exit_code={exit_code} (run_id={run_id})"
+        self._record_outcome(transient=False, recovered=False)
+        if attempts < max_attempts:
+            mark_job(job_id, "queued", run_id=run_id, error=err)
+            logger.warning(
+                f"Job {job_id} ({agent}) failed (exit={exit_code}), "
+                f"requeued (attempt {attempts}/{max_attempts})"
+            )
+        else:
+            mark_job(job_id, "failed", run_id=run_id, error=err)
+            logger.error(
+                f"Job {job_id} ({agent}) failed permanently after "
+                f"{attempts} attempts (exit={exit_code})"
+            )
+            self._notify_failed(job_id, agent, job, run_id,
+                                f"{attempts} attempts (exit={exit_code})")
+
+    def _notify_failed(self, job_id, agent, job, run_id, why):
+        try:
+            from ..notifications import send_telegram
+            send_telegram(
+                f"\U0001f6a8 Job {job_id} ({agent}, repo {job.get('repo')}) "
+                f"failed: {why}. Logs: /app/data/runs/{run_id}.log"
+            )
+        except Exception:
+            pass
+
+    def _record_outcome(self, transient: bool, recovered: bool):
+        """Forward the run outcome to the circuit breaker (if a worker is wired).
+
+        Decoupled via a settable callback (set by QueueWorker.start) so the launcher
+        does not hard-import the worker (avoids a cycle) and tests can run the
+        launcher standalone.
+        """
+        cb = getattr(self, "on_outcome", None)
+        if cb:
+            try:
+                cb(transient=transient, recovered=recovered)
+            except Exception:
+                pass
+
+    def _try_advance_stage(self, run_id: int, agent: str, repo: str, branch: str):
+        """After agent finishes successfully, check QG and advance stage if possible."""
+        try:
+            conn = get_db()
+            task_row = conn.execute(
+                "SELECT id, stage, work_item_id FROM tasks WHERE repo=? AND branch=?",
+                (repo, branch),
+            ).fetchone()
+            conn.close()
+            if not task_row:
+                return
+
+            task_id, current_stage, work_item_id = task_row
+            qg_name = get_qg_for_stage(current_stage)
+            next_stage = get_next_stage(current_stage)
+
+            if not next_stage:
+                return
+
+            # Run QG check if defined
+            if qg_name and qg_name in QG_CHECKS:
+                check_fn = QG_CHECKS[qg_name]
+                if qg_name in ("check_analysis_approved",):
+                    # Requires human approval - post request comment if analyst just finished
+                    if agent == "analyst" and qg_name == "check_analysis_approved" and work_item_id:
+                        files_check = QG_CHECKS.get("check_analysis_complete")
+                        if files_check:
+                            files_ok, _ = files_check(repo, work_item_id, branch)
+                            if files_ok:
+                                # Full artifacts ready -> In Review
+                                from ..plane_sync import set_issue_in_review
+                                set_issue_in_review(work_item_id)
+                                plane_add_comment(
+                                    work_item_id,
+                                    "\U0001f4cb BRD/\u0422\u0417/AC/TestPlan \u0433\u043e\u0442\u043e\u0432\u044b. "
+                                    "\u041f\u0440\u043e\u0448\u0443 review \u0438 \u0440\u0435\u0430\u043a\u0446\u0438\u044e :approved: \u0434\u043b\u044f \u043f\u0440\u043e\u0434\u0432\u0438\u0436\u0435\u043d\u0438\u044f \u0432 Architecture."
+                                )
+                                notify_approve_requested(task_id)
+                                logger.info(f"Task {task_id}: analyst finished, requested :approved: in Plane")
+                            else:
+                                # Check if questions file exists (in the task worktree)
+                                import os as _os
+                                questions_path = _os.path.join(
+                                    get_worktree_path(repo, branch),
+                                    f"docs/work-items/{work_item_id}/01-questions.md"
+                                )
+                                if _os.path.isfile(questions_path):
+                                    # Analyst has questions -> Needs Input
+                                    from ..plane_sync import set_issue_needs_input
+                                    set_issue_needs_input(work_item_id)
+                                    with open(questions_path, "r") as qf:
+                                        questions_text = qf.read()
+                                    plane_add_comment(
+                                        work_item_id,
+                                        f"\u2753 Analyst needs clarification:\n\n{questions_text}"
+                                    )
+                                    from ..notifications import send_telegram
+                                    send_telegram(
+                                        f"\u2753 {work_item_id}: Analyst is asking questions. Reply in Plane."
+                                    )
+                                else:
+                                    # No artifacts and no questions
+                                    plane_add_comment(
+                                        work_item_id,
+                                        "\u26a0\ufe0f Analyst finished without artifacts and without questions. Check the log."
+                                    )
+                    return
+                elif qg_name in ("check_ci_green", "check_tests_local"):
+                    # (repo, branch) signature — already worktree-aware.
+                    passed, reason = check_fn(repo, branch)
+                elif qg_name == "check_tests_passed":
+                    # Artifact check — pass branch so it reads from the worktree.
+                    passed, reason = check_fn(repo, work_item_id or "", branch)
+                else:
+                    # Other artifact checks (check_architecture_done, etc.) — worktree-aware.
+                    passed, reason = check_fn(repo, work_item_id or "", branch)
+
+                if not passed:
+                    logger.info(f"Task {task_id}: QG '{qg_name}' not passed after {agent}: {reason}")
+                    # If reviewer says REQUEST_CHANGES, rollback to development
+                    if agent == "reviewer" and "REQUEST_CHANGES" in reason:
+                        update_task_stage(task_id, "development")
+                        notify_stage_change(task_id, current_stage, "development")
+                        plane_notify_stage(work_item_id, current_stage, "development")
+                        # Count retries
+                        conn2 = get_db()
+                        retry_count = conn2.execute(
+                            "SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='developer'",
+                            (task_id,)
+                        ).fetchone()[0]
+                        conn2.close()
+                        if retry_count < 3:
+                            task_desc = (
+                                f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
+                                f"Stage: development\nNote: REQUEST_CHANGES from reviewer "
+                                f"(attempt {retry_count+1}/3). Fix findings in "
+                                f"docs/work-items/{work_item_id}/12-review.md"
+                            )
+                            new_job = enqueue_job("developer", repo, task_desc, task_id=task_id)
+                            logger.info(f"Task {task_id}: reviewer REQUEST_CHANGES, enqueued developer (job_id={new_job})")
+                        else:
+                            from ..notifications import send_telegram
+                            send_telegram(f"\u26a0\ufe0f {work_item_id}: Max developer retries (3) reached. Manual intervention needed.")
+                            logger.error(f"Task {task_id}: max retries reached")
+
+                    # Task 6: Tester FAIL -> rollback to development
+                    if agent == "tester" and qg_name == "check_tests_passed":
+                        update_task_stage(task_id, "development")
+                        notify_stage_change(task_id, current_stage, "development")
+                        plane_notify_stage(work_item_id, current_stage, "development")
+                        from ..plane_sync import set_issue_in_progress
+                        set_issue_in_progress(work_item_id)
+                        plane_add_comment(
+                            work_item_id,
+                            f"\u274c Tests failed: {reason}. Developer restarted to fix them."
+                        )
+                        conn2 = get_db()
+                        retry_count = conn2.execute(
+                            "SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='developer'",
+                            (task_id,)
+                        ).fetchone()[0]
+                        conn2.close()
+                        if retry_count < 3:
+                            task_desc = (
+                                f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
+                                f"Stage: development\nNote: Tests FAILED. "
+                                f"Fix failures described in docs/work-items/{work_item_id}/13-test-report.md"
+                            )
+                            new_job = enqueue_job("developer", repo, task_desc, task_id=task_id)
+                            logger.info(f"Task {task_id}: tester FAIL, enqueued developer (job_id={new_job})")
+                        else:
+                            from ..notifications import send_telegram
+                            from ..plane_sync import set_issue_blocked
+                            set_issue_blocked(work_item_id)
+                            send_telegram(f"\U0001f6a8 {work_item_id}: Tests still failing after 3 developer retries. Manual intervention needed.")
+
+                    # Task 8: Architect conflict -> rollback to analysis
+                    if agent == "architect" and qg_name == "check_architecture_done":
+                        import os as _os
+                        conflict_path = _os.path.join(
+                            get_worktree_path(repo, branch),
+                            f"docs/work-items/{work_item_id}/10-conflict.md"
+                        )
+                        if _os.path.isfile(conflict_path):
+                            update_task_stage(task_id, "analysis")
+                            notify_stage_change(task_id, current_stage, "analysis")
+                            plane_notify_stage(work_item_id, current_stage, "analysis")
+                            from ..plane_sync import set_issue_in_progress
+                            set_issue_in_progress(work_item_id)
+                            with open(conflict_path, "r") as cf:
+                                conflict_text = cf.read()[:500]
+                            plane_add_comment(
+                                work_item_id,
+                                f"\u26a0\ufe0f Architect found a conflict with the spec. Returning to Analysis.\n\n{conflict_text}"
+                            )
+                            task_desc = (
+                                f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
+                                f"Stage: analysis\nNote: Architect conflict. Revise TRZ. "
+                                f"See docs/work-items/{work_item_id}/10-conflict.md"
+                            )
+                            new_job = enqueue_job("analyst", repo, task_desc, task_id=task_id)
+                            logger.info(f"Task {task_id}: architect conflict, enqueued analyst (job_id={new_job})")
+                            return
+
+                    return
+            elif qg_name:
+                return
+
+            # Advance stage
+            update_task_stage(task_id, next_stage)
+            notify_stage_change(task_id, current_stage, next_stage)
+            plane_notify_stage(work_item_id, current_stage, next_stage)
+            logger.info(f"Task {task_id}: {current_stage} -> {next_stage} (auto-advance after {agent})")
+
+            # Launch next agent if defined
+            next_agent = get_agent_for_stage(next_stage)
+            if next_agent:
+                task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: {next_stage}"
+                new_job_id = enqueue_job(next_agent, repo, task_desc, task_id=task_id)
+                logger.info(f"Task {task_id}: enqueued '{next_agent}' (job_id={new_job_id})")
+
+        except Exception:
+            logger.exception(f"Auto-advance failed for run_id={run_id}")
+
+
+    def _ensure_pr(self, repo: str, branch: str, run_id: int):
+        import httpx
+        owner = settings.gitea_owner
+        headers = {"Authorization": f"token {settings.gitea_token}"}
+        base_url = f"{settings.gitea_url}/api/v1"
+        try:
+            resp = httpx.get(
+                f"{base_url}/repos/{owner}/{repo}/pulls",
+                params={"state": "open", "head": branch},
+                headers=headers, timeout=10
+            )
+            resp.raise_for_status()
+            prs = resp.json()
+            if prs:
+                return prs[0]["number"]
+            parts = branch.split("/")
+            title = parts[-1] if parts else branch
+            resp = httpx.post(
+                f"{base_url}/repos/{owner}/{repo}/pulls",
+                json={"title": f"feat: {title}", "head": branch, "base": "main",
+                      "body": f"Auto-created by orchestrator after developer run_id={run_id}"},
+                headers=headers, timeout=10
+            )
+            resp.raise_for_status()
+            pr_number = resp.json()["number"]
+            logger.info(f"Created PR #{pr_number} for {branch}")
+            return pr_number
+        except Exception as e:
+            logger.error(f"Failed to create PR for {branch}: {e}")
+            return None
+
+    def _auto_merge_pr(self, repo: str, branch: str, task_id: int, work_item_id: str):
+        import httpx
+        owner = settings.gitea_owner
+        headers = {"Authorization": f"token {settings.gitea_token}"}
+        base_url = f"{settings.gitea_url}/api/v1"
+        try:
+            resp = httpx.get(
+                f"{base_url}/repos/{owner}/{repo}/pulls",
+                params={"state": "open", "head": branch},
+                headers=headers, timeout=10
+            )
+            resp.raise_for_status()
+            prs = resp.json()
+            if not prs:
+                pr_number = self._ensure_pr(repo, branch, 0)
+                if not pr_number:
+                    return False
+            else:
+                pr_number = prs[0]["number"]
+            resp = httpx.post(
+                f"{base_url}/repos/{owner}/{repo}/pulls/{pr_number}/merge",
+                json={"Do": "merge"},
+                headers=headers, timeout=30
+            )
+            if resp.status_code in (200, 204):
+                logger.info(f"PR #{pr_number} merged for {branch}")
+                update_task_stage(task_id, "done")
+                notify_stage_change(task_id, "deploy", "done")
+                plane_notify_stage(work_item_id, "deploy", "done")
+                from ..notifications import send_telegram
+                send_telegram(f"\u2705 {work_item_id}: PR #{pr_number} merged! deploy -> done. Task complete.")
+                return True
+            else:
+                logger.error(f"Merge failed for PR #{pr_number}: {resp.status_code} {resp.text}")
+                from ..notifications import send_telegram
+                send_telegram(f"\u26a0\ufe0f {work_item_id}: Auto-merge failed (HTTP {resp.status_code}). Manual merge needed.")
+                return False
+        except Exception as e:
+            logger.error(f"Auto-merge failed for {branch}: {e}")
+            return False
+
+    def _write_task_file(self, repo: str, branch: str, task_file: str, content: str):
+        """Write task file directly into the task's worktree.
+
+        B-1 fix: no docker (direct open()). ORCH-2/S-4: the target is the per-branch
+        worktree (/repos/_wt/<repo>/<branch>), not the shared /repos/<repo>, so the
+        agent reads the task file from its own isolated working copy.
+        Raises RuntimeError on failure instead of silently swallowing errors.
+        """
+        work_path = get_worktree_path(repo, branch)  # /repos/_wt/<repo>/<branch>
+        full_path = os.path.join(work_path, task_file)
+        try:
+            with open(full_path, "w", encoding="utf-8") as f:
+                f.write(content)
+            logger.info(f"Task file written: {full_path} ({len(content)} bytes)")
+        except OSError as e:
+            logger.error(f"Failed to write task file {full_path}: {e}")
+            raise RuntimeError(f"Failed to write task file: {e}") from e
+

 launcher = AgentLauncher()
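Reviewer note (illustrative, not part of the patch): both rollback branches above gate the developer relaunch on `retry_count < 3`, where `retry_count` is the number of `agent_runs` rows for the developer. The sketch below isolates that decision in a hypothetical helper, `next_action`, and assumes `agent_runs` receives one row per developer launch, including the first (non-retry) one.

```python
# Hypothetical helper, not part of the patch: mirrors the `retry_count < 3`
# checks in the reviewer and tester rollback branches.
MAX_DEVELOPER_RUNS = 3  # the literal 3 used in both branches


def next_action(developer_runs: int, limit: int = MAX_DEVELOPER_RUNS) -> str:
    """Return 'enqueue_developer' while the budget lasts, else 'escalate'."""
    return "enqueue_developer" if developer_runs < limit else "escalate"


# Assumption: the count already includes the initial developer run, so it is
# at least 1 when the first rollback happens.
history = [(runs, next_action(runs)) for runs in range(1, 5)]
for runs, action in history:
    print(f"developer runs so far={runs}: {action}")
```

Under that assumption the developer is relaunched at most twice after its initial run, which is worth keeping in mind when reading the "Max developer retries (3)" message.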
--- a/src/config.py
+++ b/src/config.py
@@ -7,19 +7,57 @@ class Settings(BaseSettings):
    plane_api_token: str = ""
    plane_workspace_slug: str = ""
    plane_webhook_secret: str = ""
+    plane_project_id: str = ""

    # Gitea
    gitea_url: str = "http://localhost:3000"
    gitea_token: str = ""
    gitea_webhook_secret: str = ""
+    gitea_owner: str = "admin"
+    default_repo: str = "enduro-trails"
+
+    # ORCH-6: multi-repo project registry. JSON array of
+    #   {plane_project_id, repo, work_item_prefix, name}.
+    # Empty -> built-in default registry in src/projects.py.
+    projects_json: str = ""

    # Claude CLI
-    claude_bin: str = "/usr/bin/claude"
-    repos_dir: str = "/home/slin/repos"
+    claude_bin: str = "/opt/claude-code/bin/claude.exe"
+    repos_dir: str = "/repos"
+    host_repos_dir: str = "/home/slin/repos"
+    worktrees_dir: str = "/repos/_wt"  # ORCH-2 / S-4: isolated worktree per task/branch

    # DB
    db_path: str = "/app/data/orchestrator.db"

+    # ORCH-1 (F-2b): persistent job queue / background worker.
+    # max_concurrency  -> max agent jobs running in parallel (env ORCH_MAX_CONCURRENCY)
+    # queue_poll_interval -> worker loop poll seconds (env ORCH_QUEUE_POLL_INTERVAL)
+    max_concurrency: int = 1
+    queue_poll_interval: float = 2.0
+
+    # ORCH-1b (resilience): preflight + 429/rate-limit + backoff + circuit breaker.
+    # preflight_cache_ttl  -> cache the cheap CLI/network preflight result (seconds);
+    #                         the worker does NOT re-run `claude --version` more often
+    #                         than this (env ORCH_PREFLIGHT_CACHE_TTL).
+    # backoff_base_seconds -> base (seconds) for the backoff: min(base * 2**n, max).
+    # backoff_max_seconds  -> ceiling (seconds) for the transient backoff.
+    # transient_max_attempts -> retry budget for transient (429/overload/network)
+    #                         failures, separate from code-fault `attempts`.
+    # breaker_threshold    -> consecutive transient failures that OPEN the breaker.
+    # breaker_pause_seconds -> how long the breaker stays open before half-open.
+    preflight_cache_ttl: int = 45
+    backoff_base_seconds: int = 10
+    backoff_max_seconds: int = 600
+    transient_max_attempts: int = 5
+    breaker_threshold: int = 3
+    breaker_pause_seconds: int = 300
+
+
+    # Telegram notifications
+    telegram_bot_token: str = ""
+    telegram_chat_id: str = ""
+
    class Config:
        env_prefix = "ORCH_"
        env_file = ".env"
--- a/src/db.py
+++ b/src/db.py
@@ -22,12 +22,14 @@ def init_db():
        CREATE TABLE IF NOT EXISTS tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            plane_id TEXT,
+            work_item_id TEXT,
            repo TEXT NOT NULL,
            branch TEXT,
            stage TEXT DEFAULT 'created',
            agent_running TEXT,
            created_at TEXT DEFAULT (datetime('now')),
-            updated_at TEXT DEFAULT (datetime('now'))
+            updated_at TEXT DEFAULT (datetime('now')),
+            plane_issue_id TEXT
        );
        CREATE TABLE IF NOT EXISTS agent_runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
@@ -38,5 +40,334 @@ def init_db():
            exit_code INTEGER,
            output_path TEXT
        );
+        -- ORCH-1 (F-2b): persistent job queue. Webhook handlers enqueue a job and
+        -- return immediately; a background worker claims jobs (respecting
+        -- max_concurrency), spawns the claude agent, and updates the status.
+        -- Restart-safe: running jobs are requeued on startup (queue-recovery).
+        CREATE TABLE IF NOT EXISTS jobs (
+            id INTEGER PRIMARY KEY AUTOINCREMENT,
+            agent TEXT NOT NULL,
+            repo TEXT NOT NULL,
+            task_id INTEGER,                          -- FK tasks.id (nullable)
+            task_content TEXT,                        -- written to the agent task_file
+            status TEXT NOT NULL DEFAULT 'queued',    -- queued|running|done|failed
+            attempts INTEGER NOT NULL DEFAULT 0,
+            max_attempts INTEGER NOT NULL DEFAULT 2,
+            run_id INTEGER,                           -- agent_runs.id once started
+            error TEXT,                               -- last error message
+            transient_attempts INTEGER NOT NULL DEFAULT 0,  -- ORCH-1 resilience: 429/transient retries
+            available_at TEXT,                        -- ORCH-1 resilience: backoff gate (claim when <= now)
+            created_at TEXT DEFAULT (datetime('now')),
+            started_at TEXT,
+            finished_at TEXT
+        );
+        CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status, id);
    """)
+    # Lightweight migration: add resilience columns to a pre-existing jobs table
+    # (CREATE TABLE IF NOT EXISTS won't add columns to an already-created table).
+    _ensure_column(conn, "jobs", "transient_attempts", "INTEGER NOT NULL DEFAULT 0")
+    _ensure_column(conn, "jobs", "available_at", "TEXT")
    conn.close()
+
+
+def _ensure_column(conn, table: str, column: str, decl: str):
+    """Add a column to `table` if it does not already exist (idempotent migration)."""
+    cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})").fetchall()]
+    if column not in cols:
+        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
+        conn.commit()
+
+
+def get_task_by_plane_id(plane_id: str) -> dict | None:
+    """Find task by Plane work item ID (checks plane_id and plane_issue_id)."""
+    conn = get_db()
+    row = conn.execute(
+        "SELECT * FROM tasks WHERE plane_id = ? OR plane_issue_id = ?", (plane_id, plane_id)
+    ).fetchone()
+    conn.close()
+    if row:
+        return dict(row)
+    return None
+
+
+def get_task_by_repo_branch(repo: str, branch: str) -> dict | None:
+    """Find task by repo and branch name."""
+    conn = get_db()
+    row = conn.execute(
+        "SELECT * FROM tasks WHERE repo = ? AND branch = ?", (repo, branch)
+    ).fetchone()
+    conn.close()
+    if row:
+        return dict(row)
+    return None
+
+
+def update_task_stage(task_id: int, stage: str):
+    """Update task stage and timestamp."""
+    conn = get_db()
+    conn.execute(
+        "UPDATE tasks SET stage = ?, updated_at = datetime('now') WHERE id = ?",
+        (stage, task_id),
+    )
+    conn.commit()
+    conn.close()
+
+
+def get_next_work_item_id(repo: str, prefix: str = "ET") -> str:
+    """Generate next work item ID (e.g., ET-003 / ORCH-001).
+
+    ORCH-6: numbering is per (repo, prefix). The prefix comes from the project
+    registry (proj.work_item_prefix), so orchestrator issues number ORCH-001,
+    ORCH-002 independently of the ET sequence in enduro-trails. Default prefix
+    stays "ET" for backward compatibility with existing callers.
+    """
+    conn = get_db()
+    row = conn.execute(
+        "SELECT work_item_id FROM tasks "
+        "WHERE repo = ? AND work_item_id LIKE ? AND work_item_id IS NOT NULL "
+        "ORDER BY id DESC LIMIT 1",
+        (repo, f"{prefix}-%"),
+    ).fetchone()
+    conn.close()
+
+    if row and row["work_item_id"]:
+        # Parse <PREFIX>-003 -> 3, increment (keep the existing prefix).
+        existing_prefix, num = row["work_item_id"].rsplit("-", 1)
+        prefix = existing_prefix
+        next_num = int(num) + 1
+    else:
+        next_num = 1
+
+    return f"{prefix}-{next_num:03d}"
+
+
+# ---------------------------------------------------------------------------
+# ORCH-1 (F-2b): job queue helpers
+# ---------------------------------------------------------------------------
+
+def enqueue_job(
+    agent: str,
+    repo: str,
+    task_content: str | None = None,
+    task_id: int | None = None,
+    max_attempts: int = 2,
+) -> int:
+    """Enqueue a new job (status='queued'). Returns the new job id.
+
+    This is what webhook handlers call instead of launching an agent in-process:
+    it is a fast DB INSERT that returns immediately. The background worker
+    (queue_worker) picks the job up later.
+    """
+    conn = get_db()
+    cursor = conn.execute(
+        "INSERT INTO jobs (agent, repo, task_id, task_content, max_attempts) "
+        "VALUES (?, ?, ?, ?, ?)",
+        (agent, repo, task_id, task_content, max_attempts),
+    )
+    job_id = cursor.lastrowid
+    conn.commit()
+    conn.close()
+    return job_id
+
+
+def claim_next_job() -> dict | None:
+    """Atomically claim the oldest queued job and mark it 'running'.
+
+    Only due jobs are eligible: `available_at` is NULL or <= now (backoff gate).
+    Atomicity: the UPDATE carries the `status='queued'` guard in its WHERE clause
+    and we check `rowcount`. If two workers race for the same row, only the first
+    UPDATE flips it to 'running' (rowcount==1); the loser sees rowcount==0 and
+    repeats the SELECT. The SELECT does not reserve the row; the guarded UPDATE
+    does. Returns the claimed job dict, or None when no job is due.
+    """
+    conn = get_db()
+    try:
+        while True:
+            row = conn.execute(
+                "SELECT id FROM jobs WHERE status='queued' "
+                "AND (available_at IS NULL OR available_at <= datetime('now')) "
+                "ORDER BY id LIMIT 1"
+            ).fetchone()
+            if not row:
+                return None
+            job_id = row["id"]
+            cur = conn.execute(
+                "UPDATE jobs SET status='running', "
+                "attempts = attempts + 1, started_at = datetime('now') "
+                "WHERE id = ? AND status='queued'",
+                (job_id,),
+            )
+            conn.commit()
+            if cur.rowcount == 1:
+                claimed = conn.execute(
+                    "SELECT * FROM jobs WHERE id = ?", (job_id,)
+                ).fetchone()
+                return dict(claimed)
+            # Lost the race for this row; loop and try the next queued job.
+    finally:
+        conn.close()
+
+
+def mark_job_transient(job_id: int, available_at_sql_offset_seconds: int,
+                       error: str | None = None) -> None:
+    """ORCH-1 resilience: requeue a job after a *transient* failure (429/overload/net).
+
+    Increments `transient_attempts` (separate from the code-fault `attempts`),
+    sets status back to 'queued', and gates re-pickup via `available_at` =
+    now + backoff seconds. started_at/finished_at are cleared.
+    """
+    conn = get_db()
+    sets = [
+        "status='queued'",
+        "transient_attempts = transient_attempts + 1",
+        "available_at = datetime('now', ?)",
+        "started_at = NULL",
+        "finished_at = NULL",
+    ]
+    params: list = [f"+{int(available_at_sql_offset_seconds)} seconds"]
+    if error is not None:
+        sets.append("error = ?")
+        params.append(error)
+    params.append(job_id)
+    conn.execute(f"UPDATE jobs SET {', '.join(sets)} WHERE id = ?", params)
+    conn.commit()
+    conn.close()
+
+
+def mark_job(
+    job_id: int,
+    status: str,
+    run_id: int | None = None,
+    error: str | None = None,
+):
+    """Update a job's status (queued|running|done|failed).
+
+    - run_id (optional): link to the agent_runs row that executed this job.
+    - error (optional): last error message (for failed/retry).
+    - 'done'/'failed' also stamp finished_at.
+    - 'queued' (requeue for retry) clears started_at/finished_at so the next
+      claim treats it as fresh.
+    """
+    conn = get_db()
+    sets = ["status = ?"]
+    params: list = [status]
+    if run_id is not None:
+        sets.append("run_id = ?")
+        params.append(run_id)
+    if error is not None:
+        sets.append("error = ?")
+        params.append(error)
+    if status in ("done", "failed"):
+        sets.append("finished_at = datetime('now')")
+    elif status == "queued":
+        sets.append("started_at = NULL")
+        sets.append("finished_at = NULL")
+    params.append(job_id)
+    conn.execute(f"UPDATE jobs SET {', '.join(sets)} WHERE id = ?", params)
+    conn.commit()
+    conn.close()
+
+
+def count_running_jobs() -> int:
+    """Number of jobs currently in 'running' status (for max_concurrency)."""
+    conn = get_db()
+    n = conn.execute(
+        "SELECT COUNT(*) FROM jobs WHERE status='running'"
+    ).fetchone()[0]
+    conn.close()
+    return int(n)
+
+
+def requeue_running_jobs() -> int:
+    """Queue-recovery: on startup, any job left 'running' belongs to a worker that
+    died on restart -> put it back to 'queued'. `attempts` is left unchanged here;
+    claim_next_job increments it again on the next pickup, so the interrupted run
+    still counts as an attempt. Returns the number of requeued jobs.
+    """
+    conn = get_db()
+    cur = conn.execute(
+        "UPDATE jobs SET status='queued', started_at = NULL "
+        "WHERE status='running'"
+    )
+    conn.commit()
+    n = cur.rowcount
+    conn.close()
+    return int(n)
+
+
+def get_job(job_id: int) -> dict | None:
+    """Fetch a single job by id."""
+    conn = get_db()
+    row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
+    conn.close()
+    return dict(row) if row else None
+
+
+def job_status_counts() -> dict:
+    """Return counts grouped by status (for /queue observability)."""
+    conn = get_db()
+    rows = conn.execute(
+        "SELECT status, COUNT(*) AS n FROM jobs GROUP BY status"
+    ).fetchall()
+    conn.close()
+    counts = {"queued": 0, "running": 0, "done": 0, "failed": 0}
+    for r in rows:
+        counts[r["status"]] = r["n"]
+    return counts
+
+
+def recent_jobs(limit: int = 10) -> list[dict]:
+    """Return the most recent jobs (for /queue observability)."""
+    conn = get_db()
+    rows = conn.execute(
+        "SELECT * FROM jobs ORDER BY id DESC LIMIT ?", (limit,)
+    ).fetchall()
+    conn.close()
+    return [dict(r) for r in rows]
+
+
+# ---------------------------------------------------------------------------
+# ORCH-1b (resilience): transient backoff helpers
+# ---------------------------------------------------------------------------
+
+def requeue_job_transient(job_id: int, delay_seconds: float, error: str | None = None):
+    """ORCH-1b: requeue a job after a TRANSIENT (429/overload/network) failure.
+
+    Unlike a code-fault requeue, this:
+      - increments `transient_attempts` (a separate budget from code-fault attempts)
+      - sets `available_at = now + delay_seconds` so claim_next_job won't pick it
+        up until the backoff window elapses
+      - sets status back to 'queued' and clears started_at/finished_at
+
+    delay_seconds is computed by the caller (exp backoff, capped, Retry-After).
+    """
+    conn = get_db()
+    conn.execute(
+        "UPDATE jobs SET status='queued', "
+        "transient_attempts = transient_attempts + 1, "
+        "available_at = datetime('now', ? || ' seconds'), "
+        "started_at = NULL, finished_at = NULL, "
+        "error = COALESCE(?, error) "
+        "WHERE id = ?",
+        (f"+{int(round(delay_seconds))}", error, job_id),
+    )
+    conn.commit()
+    conn.close()
+
+
+def compute_backoff(transient_attempts: int, retry_after: float | None = None) -> float:
+    """ORCH-1b: exponential backoff (seconds) for a transient failure.
+
+    delay = min(2**transient_attempts * base, max). If the server sent a
+    Retry-After hint we honour it as a floor (the larger of the two wins), but
+    the result is still capped at max, so a hint above the cap is cut to the cap.
+
+    `transient_attempts` is the count AFTER this failure (i.e. how many transient
+    failures have occurred), so the first backoff uses 2**1 * base.
+    """
+    base = getattr(settings, "backoff_base_seconds", 10)
+    cap = getattr(settings, "backoff_max_seconds", 600)
+    exp = min((2 ** max(transient_attempts, 0)) * base, cap)
+    if retry_after is not None and retry_after > 0:
+        return float(min(max(exp, retry_after), cap))
+    return float(exp)
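Reviewer note (illustrative, not part of the patch): to make the backoff arithmetic concrete, the sketch below repeats the `compute_backoff` formula as a stand-alone function. The name `backoff_delay` and its explicit `base`/`cap` parameters are hypothetical; the defaults are the `backoff_base_seconds` and `backoff_max_seconds` values from `src/config.py`.

```python
def backoff_delay(transient_attempts, retry_after=None, base=10, cap=600):
    """Stand-alone copy of the compute_backoff arithmetic with explicit defaults."""
    exp = min((2 ** max(transient_attempts, 0)) * base, cap)
    if retry_after is not None and retry_after > 0:
        # Retry-After acts as a floor, but the cap still applies afterwards.
        return float(min(max(exp, retry_after), cap))
    return float(exp)


# Delays for the 1st..7th consecutive transient failure.
schedule = [backoff_delay(n) for n in range(1, 8)]
print(schedule)  # [20.0, 40.0, 80.0, 160.0, 320.0, 600.0, 600.0]

print(backoff_delay(1, retry_after=90))   # 90.0  (hint above the exponential delay)
print(backoff_delay(3, retry_after=5))    # 80.0  (hint below the exponential delay)
print(backoff_delay(1, retry_after=900))  # 600.0 (hint above the cap is cut to the cap)
```

With these defaults the first five delays add up to 620 seconds, and a Retry-After hint larger than `backoff_max_seconds` is not fully honoured.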
--- a/src/error_classifier.py
+++ b/src/error_classifier.py
@@ -0,0 +1,87 @@
+"""ORCH-1 resilience: classify an agent failure as transient vs permanent.
+
+Rate limits / overload / network blips cannot be reliably predicted in advance,
+so we classify *after the run* by scanning the agent's combined stdout/stderr log
+(B-2 sends both to /app/data/runs/<run_id>.log).
+
+- transient -> 429 / rate limit / overloaded / network / quota-exhausted etc.
+              => backoff + transient retry (separate counter, larger budget).
+- permanent -> a genuine code fault / agent error
+              => normal attempts < max_attempts, then 'failed'.
+
+Also extracts a Retry-After hint (seconds) when the server provided one.
+"""
+import re
+
+# Case-insensitive substrings/patterns that signal a transient/rate-limit issue.
+_TRANSIENT_PATTERNS = [
+    r"\b429\b",
+    r"rate[\s_-]*limit",
+    r"rate_limit_error",
+    r"overloaded",
+    r"overloaded_error",
+    r"too many requests",
+    r"quota",
+    r"insufficient[_\s-]*quota",
+    r"retry[\s-]*after",
+    r"service unavailable",
+    r"\b503\b",
+    r"\b529\b",
+    r"timed out",
+    r"timeout",
+    r"connection (reset|refused|error|aborted)",
+    r"temporarily unavailable",
+    r"econnreset",
+    r"etimedout",
+]
+
+_TRANSIENT_RE = re.compile("|".join(_TRANSIENT_PATTERNS), re.IGNORECASE)
+
+# Retry-After: header style ("Retry-After: 30") or JSON ("retry_after": 30) or
+# "retry after 30 seconds". Returns the integer seconds.
+_RETRY_AFTER_RE = re.compile(
+    r"retry[\s_-]*after[\"']?\s*[:=]?\s*[\"']?\s*(\d+)",
+    re.IGNORECASE,
+)
+
+
+def classify_text(text: str) -> str:
+    """Return 'transient' or 'permanent' for a chunk of log/stderr text."""
+    if not text:
+        return "permanent"
+    return "transient" if _TRANSIENT_RE.search(text) else "permanent"
+
+
+def parse_retry_after(text: str) -> int | None:
+    """Return Retry-After seconds if present in the text, else None."""
+    if not text:
+        return None
+    m = _RETRY_AFTER_RE.search(text)
+    if m:
+        try:
+            return int(m.group(1))
+        except (TypeError, ValueError):
+            return None
+    return None
+
+
+def classify_log_file(path: str, tail_bytes: int = 16384) -> tuple[str, int | None]:
+    """Classify the tail of a log file.
+
+    Reads the last `tail_bytes` of the log (rate-limit messages appear near the
+    end) and returns (classification, retry_after_seconds_or_None).
+    On any read error, treats it as 'permanent' (no special backoff).
+    """
+    if not path:
+        return "permanent", None
+    try:
+        with open(path, "rb") as f:
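+            try:
+                f.seek(-tail_bytes, 2)  # whence=2 is os.SEEK_END: start tail_bytes before EOF
+            except OSError:
+                f.seek(0)  # file is shorter than tail_bytes: read it all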
+            data = f.read()
+        text = data.decode("utf-8", errors="replace")
+    except Exception:
+        return "permanent", None
+    return classify_text(text), parse_retry_after(text)
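Review note: a small sketch of how the classifier behaves on sample log lines. It uses only a subset of the patterns from `src/error_classifier.py`; the function name `classify` and the sample strings are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Subset of _TRANSIENT_PATTERNS, copied for illustration only.
TRANSIENT_RE = re.compile(
    r"\b429\b|rate[\s_-]*limit|overloaded|timeout|connection (reset|refused|error|aborted)",
    re.IGNORECASE,
)
# Same expression as _RETRY_AFTER_RE in the diff.
RETRY_AFTER_RE = re.compile(
    r"retry[\s_-]*after[\"']?\s*[:=]?\s*[\"']?\s*(\d+)",
    re.IGNORECASE,
)


def classify(text: str) -> Tuple[str, Optional[int]]:
    """Return (classification, retry_after_seconds_or_None) for a log chunk."""
    kind = "transient" if text and TRANSIENT_RE.search(text) else "permanent"
    m = RETRY_AFTER_RE.search(text or "")
    return kind, int(m.group(1)) if m else None


if __name__ == "__main__":
    samples = [
        "HTTP 429 Too Many Requests\nRetry-After: 30",
        '{"type": "rate_limit_error", "retry_after": 12}',
        "AssertionError: expected 3, got 4",
        "FAILED tests/test_timeout.py::test_parse - AssertionError",
    ]
    for s in samples:
        print(classify(s))
```

The last sample is worth a look: because `timeout` is matched as a bare substring, ordinary test output that merely mentions the word is classified as transient. Whether that trade-off is acceptable depends on what the agents typically log.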
--- a/src/git_worktree.py
+++ b/src/git_worktree.py
@@ -0,0 +1,107 @@
+"""Git worktree management — isolated working copy per task/branch (ORCH-2 / S-4).
+
+Background
+----------
+Previously every git operation (checkout/commit/push/test) ran in the single shared
+clone ``/repos/<repo>``. With two active tasks a ``git checkout`` of one branch would
+overwrite the working copy of the other -> races (see AUDIT S-4 / ET-009 "two collectors").
+
+Solution
+--------
+Each task (branch) gets an isolated git worktree::
+
+    /repos/<repo>                      <- main clone (fetch / worktree management)
+    /repos/_wt/<repo>/<safe-branch>    <- worktree for one task/branch (agent works here)
+
+A branch can only be checked out in ONE worktree at a time, which is exactly the
+property we want: one task = one branch = one worktree.
+"""
+import os
+import re
+import subprocess
+import logging
+from .config import settings
+
+logger = logging.getLogger("orchestrator.git_worktree")
+
+
+def _safe(branch: str) -> str:
+    """Filesystem-safe branch name for use in a path component."""
+    return re.sub(r"[^A-Za-z0-9._-]", "_", branch)
+
+
+def get_worktree_path(repo: str, branch: str) -> str:
+    """Path of the worktree for (repo, branch). Does NOT create it."""
+    return os.path.join(settings.worktrees_dir, repo, _safe(branch))
+
+
+def _main_repo(repo: str) -> str:
+    return os.path.join(settings.repos_dir, repo)
+
+
+def ensure_worktree(repo: str, branch: str) -> str:
+    """Create (or reuse) an isolated worktree for ``branch``. Returns its path.
+
+    Main clone stays at ``/repos/<repo>``. Worktree lives at
+    ``/repos/_wt/<repo>/<safe-branch>``.
+
+    - If the worktree already exists, it is fetched, ``branch`` is checked out, and
+      it is hard-reset to ``origin/<branch>`` when that remote branch exists.
+    - If the branch exists (locally or on origin) it is checked out into a fresh
+      worktree; otherwise a new branch is created from ``origin/main``.
+    """
+    main_repo = _main_repo(repo)
+    wt = get_worktree_path(repo, branch)
+
+    if not os.path.isdir(main_repo):
+        raise FileNotFoundError(f"Main repo not found: {main_repo}")
+
+    # Always refresh refs in the main clone first.
+    subprocess.run(["git", "-C", main_repo, "fetch", "origin"],
+                   capture_output=True, timeout=60)
+
+    # Reuse existing worktree (.git may be a dir or a file pointer for worktrees).
+    if os.path.isdir(os.path.join(wt, ".git")) or os.path.isfile(os.path.join(wt, ".git")):
+        subprocess.run(["git", "-C", wt, "fetch", "origin"], capture_output=True, timeout=60)
+        subprocess.run(["git", "-C", wt, "checkout", branch], capture_output=True, timeout=30)
+        # Align to remote only if the remote branch exists (avoid wiping local-only work).
+        rb = subprocess.run(
+            ["git", "-C", wt, "rev-parse", "--verify", "--quiet", f"origin/{branch}"],
+            capture_output=True,
+        )
+        if rb.returncode == 0:
+            subprocess.run(["git", "-C", wt, "reset", "--hard", f"origin/{branch}"],
+                           capture_output=True, timeout=30)
+        logger.info(f"Worktree reused: {wt} (branch {branch})")
+        return wt
+
+    os.makedirs(os.path.dirname(wt), exist_ok=True)
+
+    # Try to attach an existing branch (local or remote-tracking) to the new worktree.
+    r = subprocess.run(["git", "-C", main_repo, "worktree", "add", wt, branch],
+                       capture_output=True, text=True, timeout=60)
+    if r.returncode != 0:
+        # Branch doesn't exist yet — create it from origin/main.
+        r2 = subprocess.run(
+            ["git", "-C", main_repo, "worktree", "add", "-b", branch, wt, "origin/main"],
+            capture_output=True, text=True, timeout=60,
+        )
+        if r2.returncode != 0:
+            raise RuntimeError(
+                f"git worktree add failed for {repo}:{branch}: "
+                f"{r.stderr.strip()} | {r2.stderr.strip()}"
+            )
+    logger.info(f"Worktree ready: {wt} (branch {branch})")
+    return wt
+
+
+def remove_worktree(repo: str, branch: str):
+    """Remove the worktree for (repo, branch) — optional cleanup when a task is done."""
+    main_repo = _main_repo(repo)
+    wt = get_worktree_path(repo, branch)
+    subprocess.run(["git", "-C", main_repo, "worktree", "remove", "--force", wt],
+                   capture_output=True, timeout=30)
+    # Prune dangling administrative entries.
+    subprocess.run(["git", "-C", main_repo, "worktree", "prune"],
+                   capture_output=True, timeout=30)
+    logger.info(f"Worktree removed: {wt}")
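Review note: the branch-to-path mapping can be checked without touching git. The sketch below mirrors `_safe` and `get_worktree_path`; `WORKTREES_DIR = "/repos/_wt"` is an assumed value for `settings.worktrees_dir` (inferred from the module docstring), and the branch names are hypothetical.

```python
import posixpath
import re

WORKTREES_DIR = "/repos/_wt"  # assumption: value of settings.worktrees_dir


def safe(branch: str) -> str:
    """Replace every character outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", branch)


def worktree_path(repo: str, branch: str) -> str:
    """Directory that would hold the worktree for (repo, branch)."""
    return posixpath.join(WORKTREES_DIR, repo, safe(branch))


if __name__ == "__main__":
    print(worktree_path("enduro-trails", "feature/ET-009"))
    # Two distinct branch names can map to the same directory:
    print(safe("feature/x") == safe("feature_x"))
```

The mapping is not injective: `feature/x` and `feature_x` resolve to the same directory. If both naming styles can occur in one repo, appending a short hash of the original branch name would be one way to keep the paths distinct.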
--- a/src/main.py
+++ b/src/main.py
@@ -1,14 +1,75 @@
 from fastapi import FastAPI
 from contextlib import asynccontextmanager
+import logging
 from .db import init_db
 from .webhooks.plane import router as plane_router
 from .webhooks.gitea import router as gitea_router

+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+

@asynccontextmanager
 async def lifespan(app: FastAPI):
    init_db()
-    yield
+    # M-1: proper orphan-recovery.
+    # An orphan = an agent_run with no finished_at that is older than the recovery
+    # window. After a uvicorn restart the monitor thread is gone, so its child claude
+    # process (if any) was reparented to init; we cannot kill it by pid (pid is not
+    # persisted). Instead of silently writing exit=-1, we: enumerate each orphan,
+    # mark it exit=-1, log a warning per run, and notify so a human can check/restart.
+    log = logging.getLogger('orchestrator')
+    from .db import get_db
+    conn = get_db()
+    orphan_rows = conn.execute(
+        "SELECT id, task_id, agent FROM agent_runs "
+        "WHERE finished_at IS NULL AND started_at < datetime('now', '-35 minutes')"
+    ).fetchall()
+    for row in orphan_rows:
+        run_id, task_id, agent = row[0], row[1], row[2]
+        conn.execute(
+            "UPDATE agent_runs SET finished_at=datetime('now'), exit_code=-1 WHERE id=?",
+            (run_id,),
+        )
+        log.warning(
+            f"Orphan run {run_id} (task {task_id}, agent {agent}) recovered — "
+            f"manual check needed (process may have been killed on restart)"
+        )
+    conn.commit()
+    conn.close()
+    if orphan_rows:
+        try:
+            from .notifications import send_telegram
+            ids = ", ".join(str(r[0]) for r in orphan_rows)
+            send_telegram(
+                f"\u26a0\ufe0f Orchestrator restart: {len(orphan_rows)} orphaned agent run(s) "
+                f"(run_id: {ids}) marked exit=-1. Manual check/restart needed."
+            )
+        except Exception:
+            pass
+        log.warning(f"Recovered {len(orphan_rows)} orphaned agent runs")
+
+    # ORCH-1 (F-2b): queue-recovery. Any job left in 'running' status belongs to a
+    # worker that died on the previous restart -> put it back to 'queued' so the
+    # worker re-picks it up (restart-safe, no lost work). Runs AFTER M-1.
+    from .db import requeue_running_jobs
+    requeued = requeue_running_jobs()
+    if requeued:
+        log.warning(f"Queue-recovery: requeued {requeued} running job(s) after restart")
+
+    # Start the background job-queue worker (ORCH-1).
+    from .queue_worker import worker
+    worker.start()
+
+    try:
+        yield
+    finally:
+        # Graceful shutdown of the worker (running agents keep going; their jobs
+        # are requeued on next start via queue-recovery if the process dies).
+        worker.stop()


 app = FastAPI(title="Multi-Agent Orchestrator", lifespan=lifespan)
@@ -30,3 +91,17 @@ async def status():
    ).fetchall()
    conn.close()
    return {"active_tasks": [dict(t) for t in tasks]}
+
+
+@app.get("/queue")
+async def queue():
+    """ORCH-1: job-queue observability — status counts + recent jobs."""
+    from .db import job_status_counts, recent_jobs
+    from .queue_worker import worker
+    return {
+        "counts": job_status_counts(),
+        "max_concurrency": worker.max_concurrency,
+        "poll_interval": worker.poll_interval,
+        "resilience": worker.status(),
+        "recent": recent_jobs(10),
+    }
--- a/src/notifications.py
+++ b/src/notifications.py
@@ -0,0 +1,125 @@
+"""Notifications and logging for orchestrator events."""
+
+import logging
+import httpx
+
+logger = logging.getLogger("orchestrator")
+
+# Lazy import to avoid circular imports at module level
+_settings = None
+
+
+def _get_settings():
+    global _settings
+    if _settings is None:
+        from .config import settings
+        _settings = settings
+    return _settings
+
+
+def send_telegram(text: str):
+    """Send notification to Telegram. Fire-and-forget, never raises."""
+    s = _get_settings()
+    if not s.telegram_bot_token or not s.telegram_chat_id:
+        return
+    try:
+        url = f"https://api.telegram.org/bot{s.telegram_bot_token}/sendMessage"
+        httpx.post(
+            url,
+            json={
+                "chat_id": s.telegram_chat_id,
+                "text": text,
+                "parse_mode": "HTML",
+                "disable_notification": False,
+            },
+            timeout=5,
+        )
+    except Exception:
+        pass  # Never crash orchestrator due to notification failure
+
+
+def _get_work_item_id(task_id: int) -> str:
+    """Get work_item_id from DB by task_id."""
+    try:
+        from .db import get_db
+        conn = get_db()
+        row = conn.execute("SELECT work_item_id FROM tasks WHERE id=?", (task_id,)).fetchone()
+        conn.close()
+        return row[0] if row and row[0] else f"task-{task_id}"
+    except Exception:
+        return f"task-{task_id}"
+
+
+def notify_stage_change(task_id: int, old_stage: str, new_stage: str, agent: str = None):
+    """Log and notify stage transition."""
+    work_item_id = _get_work_item_id(task_id)
+    msg = f"\U0001f504 {work_item_id}: {old_stage} \u2192 {new_stage}"
+    if agent:
+        msg += f" (\u0437\u0430\u043f\u0443\u0449\u0435\u043d {agent})"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_agent_started(run_id: int, agent: str, task_id: int):
+    """Notify agent launch."""
+    work_item_id = _get_work_item_id(task_id)
+    msg = f"\U0001f680 {work_item_id}: {agent} \u0437\u0430\u043f\u0443\u0449\u0435\u043d (run_id={run_id})"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_agent_finished(run_id: int, agent: str, exit_code: int, task_id: int = None, duration_s: int = None):
+    """Notify agent completion."""
+    work_item_id = _get_work_item_id(task_id) if task_id else "?"
+    if exit_code == 0:
+        dur = f" ({duration_s // 60} \u043c\u0438\u043d)" if duration_s else ""
+        msg = f"\u2705 {work_item_id}: {agent} \u0437\u0430\u0432\u0435\u0440\u0448\u0438\u043b{dur}"
+    elif exit_code == -9:
+        msg = f"\u23f0 {work_item_id}: {agent} \u0443\u0431\u0438\u0442 \u043f\u043e \u0442\u0430\u0439\u043c\u0430\u0443\u0442\u0443 (30 \u043c\u0438\u043d)"
+    else:
+        msg = f"\u274c {work_item_id}: {agent} \u0443\u043f\u0430\u043b (exit_code={exit_code})"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_qg_result(task_id: int, check: str, passed: bool, reason: str = None):
+    """Notify QG check result."""
+    work_item_id = _get_work_item_id(task_id)
+    if passed:
+        msg = f"\u2705 {work_item_id}: QG {check} \u2014 passed"
+    else:
+        msg = f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_qg_failure(task_id: int, stage: str, check: str, reason: str):
+    """Log and notify QG check failure."""
+    work_item_id = _get_work_item_id(task_id)
+    msg = f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}"
+    logger.warning(msg)
+    send_telegram(msg)
+
+
+def notify_approve_requested(task_id: int):
+    """Notify that analyst requests :approved:."""
+    work_item_id = _get_work_item_id(task_id)
+    msg = f"\U0001f4cb {work_item_id}: BRD/\u0422\u0417/AC \u0433\u043e\u0442\u043e\u0432\u044b. \u0416\u0434\u0443 :approved: \u0432 Plane"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_done(task_id: int):
+    """Notify task completion."""
+    work_item_id = _get_work_item_id(task_id)
+    msg = f"\U0001f389 {work_item_id}: \u0437\u0430\u0434\u0430\u0447\u0430 \u0437\u0430\u0432\u0435\u0440\u0448\u0435\u043d\u0430!"
+    logger.info(msg)
+    send_telegram(msg)
+
+
+def notify_error(task_id: int, error: str):
+    """Log and notify error for a task."""
+    work_item_id = _get_work_item_id(task_id) if task_id else "system"
+    msg = f"\U0001f534 {work_item_id}: ERROR \u2014 {error}"
+    logger.error(msg)
+    send_telegram(msg)
--- a/src/plane_sync.py
+++ b/src/plane_sync.py
@@ -0,0 +1,242 @@
+"""Plane API sync — update issue state and add comments."""
+
+import logging
+import httpx
+from .config import settings
+
+logger = logging.getLogger("orchestrator.plane_sync")
+
+PLANE_BASE = f"{settings.plane_api_url}/api/v1"
+PLANE_HEADERS = {"X-API-Key": settings.plane_api_token}
+WORKSPACE = settings.plane_workspace_slug
+PROJECT_ID = settings.plane_project_id or "7a79f0a9-5278-49cd-9007-9a338f238f9c"
+
+
+def _resolve_project_id(work_item_id: str = None, project_id: str = None) -> str:
+    """ORCH-6: resolve the Plane project id for a sync call.
+
+    Priority:
+      1. explicit project_id arg (caller already knows the project),
+      2. project derived from the task's repo in the DB (by work_item_id),
+      3. legacy default PROJECT_ID (enduro) for backward compatibility.
+    """
+    if project_id:
+        return project_id
+    if work_item_id:
+        try:
+            from .db import get_db
+            from .projects import get_project_by_repo
+            conn = get_db()
+            row = conn.execute(
+                "SELECT repo FROM tasks WHERE work_item_id = ? ORDER BY id DESC LIMIT 1",
+                (work_item_id,),
+            ).fetchone()
+            conn.close()
+            if row and row[0]:
+                proj = get_project_by_repo(row[0])
+                if proj:
+                    return proj.plane_project_id
+        except Exception as e:
+            logger.debug(f"_resolve_project_id fallback for {work_item_id}: {e}")
+    return PROJECT_ID
+
+# Plane state IDs
+PLANE_STATES = {
+    "backlog": "113b24f6-cce8-4be9-9a22-a359b9cf0122",
+    "todo": "2c7d3df3-9eb9-419b-92b7-d7d560bcdd10",
+    "in_progress": "b873d9eb-993c-48cd-97ac-99a9b1623967",
+    "needs_input": "babf08a3-ff4d-41f3-a821-5491aa29a8ac",
+    "in_review": "38fb1f64-aa1e-48a3-92e0-0b109679046b",
+    "blocked": "6c4543f9-ac47-4ef7-ae0f-070020dc9920",
+    "done": "381a2833-3c4e-4be5-bd0f-be84cb946ad8",
+    "cancelled": "b1cae7f9-961d-4889-a179-f3acea697d17",
+}
+
+# Map orchestrator stages to Plane states
+STAGE_TO_STATE = {
+    "created": PLANE_STATES["todo"],
+    "analysis": PLANE_STATES["in_progress"],
+    "architecture": PLANE_STATES["in_progress"],
+    "development": PLANE_STATES["in_progress"],
+    "review": PLANE_STATES["in_progress"],
+    "testing": PLANE_STATES["in_progress"],
+    "deploy": PLANE_STATES["in_progress"],
+    "done": PLANE_STATES["done"],
+}
+
+
+def find_issue_id(work_item_id: str, project_id: str = None) -> str | None:
+    """Find Plane issue UUID by work_item_id (e.g. 'ET-002')."""
+    project_id = _resolve_project_id(work_item_id, project_id)
+    # Primary: lookup from DB (plane_issue_id column)
+    try:
+        from .db import get_db
+        conn = get_db()
+        row = conn.execute(
+            "SELECT plane_issue_id FROM tasks WHERE work_item_id = ? AND plane_issue_id IS NOT NULL",
+            (work_item_id,)).fetchone()
+        conn.close()  # release the connection, as the other DB lookups in this module do
+        if row and row[0]:
+            return row[0]
+    except Exception as e:
+        logger.debug(f"DB lookup failed for {work_item_id}: {e}")
+
+    # Fallback: search via Plane API
+    url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/"
+    try:
+        # First try search by work_item_id
+        resp = httpx.get(url, headers=PLANE_HEADERS, params={"search": work_item_id}, timeout=10)
+        resp.raise_for_status()
+        data = resp.json()
+        results = data.get("results", data if isinstance(data, list) else [])
+        for issue in results:
+            seq = issue.get("sequence_id")
+            identifier = f"ET-{seq:03d}" if seq else ""
+            if identifier == work_item_id or work_item_id in issue.get("name", ""):
+                return issue["id"]
+        # Fallback: get all issues and match by sequence_id number
+        if work_item_id.startswith("ET-"):
+            try:
+                target_num = int(work_item_id.split("-")[1])
+            except (IndexError, ValueError):
+                target_num = None
+            if target_num:
+                resp2 = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
+                resp2.raise_for_status()
+                data2 = resp2.json()
+                results2 = data2.get("results", data2 if isinstance(data2, list) else [])
+                for issue in results2:
+                    if issue.get("sequence_id") == target_num:
+                        return issue["id"]
+    except Exception as e:
+        logger.error(f"Failed to find issue for {work_item_id}: {e}")
+    return None
+
+
+def update_issue_state(work_item_id: str, stage: str, project_id: str = None):
+    """Update Plane issue state based on orchestrator stage."""
+    state_id = STAGE_TO_STATE.get(stage)
+    if not state_id:
+        return
+
+    project_id = _resolve_project_id(work_item_id, project_id)
+    issue_id = find_issue_id(work_item_id, project_id)
+    if not issue_id:
+        logger.warning(f"Issue not found in Plane for {work_item_id}")
+        return
+
+    url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
+    try:
+        resp = httpx.patch(url, headers=PLANE_HEADERS, json={"state": state_id}, timeout=10)
+        resp.raise_for_status()
+        logger.info(f"Plane: {work_item_id} state -> {stage} ({state_id[:8]}...)")
+    except Exception as e:
+        logger.error(f"Failed to update Plane state for {work_item_id}: {e}")
+
+
+def add_comment(work_item_id: str, text: str, project_id: str = None):
+    """Add a comment to Plane issue."""
+    project_id = _resolve_project_id(work_item_id, project_id)
+    issue_id = find_issue_id(work_item_id, project_id)
+    if not issue_id:
+        logger.warning(f"Issue not found in Plane for {work_item_id}, skipping comment")
+        return
+
+    url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/comments/"
+    html = f"<p>{text}</p>"
+    try:
+        resp = httpx.post(url, headers=PLANE_HEADERS, json={"comment_html": html}, timeout=10)
+        resp.raise_for_status()
+        logger.info(f"Plane: comment added to {work_item_id}")
+    except Exception as e:
+        logger.error(f"Failed to add comment to {work_item_id}: {e}")
+
+
+
+def set_issue_needs_input(work_item_id: str, project_id: str = None):
+    """Set issue to 'Needs Input' state — waiting for stakeholder response."""
+    _set_issue_state_direct(work_item_id, PLANE_STATES["needs_input"], project_id)
+
+
+def set_issue_in_review(work_item_id: str, project_id: str = None):
+    """Set issue to 'In Review' state — waiting for :approved: or :rejected:."""
+    _set_issue_state_direct(work_item_id, PLANE_STATES["in_review"], project_id)
+
+
+def set_issue_blocked(work_item_id: str, project_id: str = None):
+    """Set issue to 'Blocked' state — manual intervention needed."""
+    _set_issue_state_direct(work_item_id, PLANE_STATES["blocked"], project_id)
+
+
+def set_issue_in_progress(work_item_id: str, project_id: str = None):
+    """Set issue to 'In Progress' state — agent working."""
+    _set_issue_state_direct(work_item_id, PLANE_STATES["in_progress"], project_id)
+
+
+def _set_issue_state_direct(work_item_id: str, state_id: str, project_id: str = None):
+    """Set issue state directly by state_id."""
+    project_id = _resolve_project_id(work_item_id, project_id)
+    issue_id = find_issue_id(work_item_id, project_id)
+    if not issue_id:
+        logger.warning(f"Issue not found in Plane for {work_item_id}")
+        return
+    url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
+    try:
+        resp = httpx.patch(url, headers=PLANE_HEADERS, json={"state": state_id}, timeout=10)
+        resp.raise_for_status()
+        logger.info(f"Plane: {work_item_id} state -> {state_id[:8]}...")
+    except Exception as e:
+        logger.error(f"Failed to update Plane state for {work_item_id}: {e}")
+
+
+def notify_stage_change(work_item_id: str, old_stage: str, new_stage: str, agent: str = None, project_id: str = None):
+    """Notify Plane about stage transition with links."""
+    project_id = _resolve_project_id(work_item_id, project_id)
+    update_issue_state(work_item_id, new_stage, project_id)
+
+    msg = f"🔄 Stage: {old_stage} → {new_stage}"
+    if agent:
+        msg += f" (launching {agent})"
+
+    # Add relevant links
+    gitea_base = "http://git.mva154.duckdns.org"
+    try:
+        from .db import get_db
+        conn = get_db()
+        row = conn.execute(
+            "SELECT branch, repo FROM tasks WHERE work_item_id=?", (work_item_id,)
+        ).fetchone()
+        conn.close()
+        if row:
+            branch, repo = row
+            msg += chr(10) + "📂 Branch: [" + branch + "](" + gitea_base + "/admin/" + repo + "/src/branch/" + branch + ")"
+            if new_stage in ("review", "testing", "deploy"):
+                import httpx as _httpx
+                from .config import settings
+                _headers = {"Authorization": f"token {settings.gitea_token}"}
+                _resp = _httpx.get(
+                    f"{settings.gitea_url}/api/v1/repos/{settings.gitea_owner}/{repo}/pulls",
+                    params={"state": "open", "head": branch},
+                    headers=_headers, timeout=5
+                )
+                if _resp.status_code == 200:
+                    _prs = _resp.json()
+                    if _prs:
+                        pr_num = _prs[0]["number"]
+                        msg += chr(10) + "🔗 PR: [#" + str(pr_num) + "](" + gitea_base + "/admin/" + repo + "/pulls/" + str(pr_num) + ")"
+    except Exception:
+        pass
+
+    add_comment(work_item_id, msg, project_id)
+
+
+def notify_qg_failure(work_item_id: str, stage: str, check: str, reason: str, project_id: str = None):
+    """Notify Plane about QG failure."""
+    add_comment(work_item_id, f"⚠️ QG failed at {stage}: {check} — {reason}", project_id)
+
+
+def notify_done(work_item_id: str, project_id: str = None):
+    """Mark issue as Done in Plane."""
+    project_id = _resolve_project_id(work_item_id, project_id)
+    update_issue_state(work_item_id, "done", project_id)
+    add_comment(work_item_id, "✅ Task completed! PR merged and deployed.", project_id)
--- a/src/preflight.py
+++ b/src/preflight.py
@@ -0,0 +1,106 @@
+"""ORCH-1 resilience: cheap preflight check (CLI / network available?).
+
+Goal: before the worker claims a job, confirm the claude CLI binary and runtime
+are reachable WITHOUT spending any tokens. We only do local/cheap checks:
+
+  1. os.path.exists(CLAUDE_BIN)          -- instant
+  2. `claude --version` (timeout 5s)     -- spawns CLI, does NOT call the API
+
+The result is cached for `preflight_cache_ttl` seconds so we do not re-run
+`claude --version` on every worker tick.
+
+🚫 We deliberately do NOT do a prompt ping (ping->pong) — that would burn the
+rate limit and add latency. Preflight is local-only.
+"""
+import os
+import time
+import logging
+import subprocess
+
+from .config import settings
+
+logger = logging.getLogger("orchestrator.preflight")
+
+_VERSION_TIMEOUT = 5
+
+
+class _PreflightCache:
+    def __init__(self):
+        self.ts: float = 0.0
+        self.ok: bool = False
+        self.reason: str = "not checked yet"
+
+
+_cache = _PreflightCache()
+
+
+def _claude_bin() -> str:
+    """Resolve the claude binary preflight should check.
+
+    Must match the binary the launcher actually spawns. The launcher hardcodes
+    AgentLauncher.CLAUDE_BIN for the real Popen, so we prefer that; we only fall
+    back to settings.claude_bin / a default if it is somehow unset. (Note: the
+    container's ORCH_CLAUDE_BIN may point elsewhere; preflight follows the path
+    that is genuinely executed, not the unused env override.)
+    """
+    try:
+        from .agents.launcher import AgentLauncher
+        launcher_bin = getattr(AgentLauncher, "CLAUDE_BIN", None)
+        if launcher_bin and os.path.exists(launcher_bin):
+            return launcher_bin
+        # Launcher path missing on disk: still return it so _compute() reports it; only an unset value falls back.
+        return launcher_bin or getattr(settings, "claude_bin", None) or "/opt/claude-code/bin/claude.exe"
+    except Exception:
+        return getattr(settings, "claude_bin", None) or "/opt/claude-code/bin/claude.exe"
+
+
+def _run_version(bin_path: str) -> tuple[bool, str]:
+    """`claude --version` — proves the CLI runs without touching the API."""
+    try:
+        r = subprocess.run(
+            [bin_path, "--version"],
+            capture_output=True,
+            text=True,
+            timeout=_VERSION_TIMEOUT,
+        )
+        if r.returncode == 0:
+            return True, (r.stdout or r.stderr or "").strip()[:120] or "ok"
+        return False, f"--version exit {r.returncode}: {(r.stderr or r.stdout).strip()[:120]}"
+    except subprocess.TimeoutExpired:
+        return False, f"--version timed out after {_VERSION_TIMEOUT}s"
+    except FileNotFoundError:
+        return False, "claude binary not found (FileNotFoundError)"
+    except Exception as e:  # pragma: no cover - defensive
+        return False, f"--version error: {e}"
+
+
+def _compute() -> tuple[bool, str]:
+    bin_path = _claude_bin()
+    if not os.path.exists(bin_path):
+        return False, f"CLAUDE_BIN not found: {bin_path}"
+    return _run_version(bin_path)
+
+
+def check(force: bool = False) -> tuple[bool, str]:
+    """Return (ok, reason). Cached for preflight_cache_ttl seconds.
+
+    force=True bypasses the cache (used by the breaker half-open probe / tests).
+    """
+    now = time.time()
+    ttl = settings.preflight_cache_ttl
+    if not force and _cache.ts > 0 and (now - _cache.ts) < ttl:
+        return _cache.ok, _cache.reason
+    ok, reason = _compute()
+    _cache.ts = now
+    _cache.ok = ok
+    _cache.reason = reason
+    if not ok:
+        logger.warning(f"Preflight FAIL: {reason}")
+    return ok, reason
+
+
+def reset_cache() -> None:
+    """Invalidate the cache (tests / forced recheck)."""
+    _cache.ts = 0.0
+    _cache.ok = False
+    _cache.reason = "reset"
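Review note: the TTL caching in `check()` can be tested deterministically if the clock is injectable. The sketch below is one possible shape, not the code in this diff: the class name `TtlCheck` is hypothetical, and it swaps `time.time()` for `time.monotonic()` so that wall-clock adjustments cannot extend or shorten the cache window.

```python
import time
from typing import Callable, Optional, Tuple

Result = Tuple[bool, str]


class TtlCheck:
    """Cache the (ok, reason) result of a cheap probe for `ttl` seconds."""

    def __init__(self, probe: Callable[[], Result], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._ts: Optional[float] = None          # None means "never checked"
        self._result: Result = (False, "not checked yet")

    def check(self, force: bool = False) -> Result:
        now = self._clock()
        fresh = self._ts is not None and (now - self._ts) < self._ttl
        if not force and fresh:
            return self._result                   # served from cache
        self._result = self._probe()              # re-run the probe
        self._ts = now
        return self._result


if __name__ == "__main__":
    fake_now = [0.0]
    calls = []

    def probe() -> Result:
        calls.append(1)
        return True, "ok"

    c = TtlCheck(probe, ttl=30, clock=lambda: fake_now[0])
    c.check()
    c.check()              # within TTL: cached
    fake_now[0] = 31.0
    c.check()              # TTL expired: probe runs again
    print(len(calls))
```

Passing `force=True` bypasses the cache in the same way the breaker half-open probe does in the module above.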
--- a/src/projects.py
+++ b/src/projects.py
@@ -0,0 +1,127 @@
+"""ORCH-6: Project registry — map Plane project id -> repo / work-item prefix.
+
+Root cause of the 2026-06-02 incident: the Plane webhook listened to the whole
+workspace and hardcoded ``repo = settings.default_repo`` (enduro-trails). Every
+issue from any project was funneled into one repo with one prefix (ET).
+
+This module introduces a small registry keyed by the Plane project uuid so the
+orchestrator can:
+  * filter webhooks by project (ignore unknown projects),
+  * resolve the gitea repo + work-item prefix for a known project,
+  * route Plane sync (state/comment) into the issue's own project.
+
+Source of truth: ``settings.projects_json`` (a JSON array set via the
+``ORCH_PROJECTS_JSON`` env var). If unset/empty/invalid, a built-in default
+registry is used so the system works out of the box.
+"""
+
+import json
+import logging
+from dataclasses import dataclass
+
+from .config import settings
+
+logger = logging.getLogger("orchestrator.projects")
+
+
+@dataclass(frozen=True)
+class ProjectConfig:
+    plane_project_id: str   # uuid of the Plane project (registry key)
+    repo: str               # gitea repo name (== folder under /repos)
+    work_item_prefix: str   # ET / ORCH
+    name: str               # human-readable label
+
+
+# Built-in default registry (used when ORCH_PROJECTS_JSON is empty/invalid).
+# Keep enduro-trails first so existing behaviour is the safe default.
+_DEFAULT_PROJECTS = [
+    ProjectConfig(
+        plane_project_id="7a79f0a9-5278-49cd-9007-9a338f238f9c",
+        repo="enduro-trails",
+        work_item_prefix="ET",
+        name="enduro-trails",
+    ),
+    ProjectConfig(
+        plane_project_id="8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a",
+        repo="orchestrator",
+        work_item_prefix="ORCH",
+        name="orchestrator",
+    ),
+]
+
+
+def _parse_projects_json(raw: str) -> list[ProjectConfig] | None:
+    """Parse ORCH_PROJECTS_JSON. Returns None if empty/invalid (-> use default)."""
+    if not raw or not raw.strip():
+        return None
+    try:
+        data = json.loads(raw)
+    except (ValueError, TypeError) as e:
+        logger.error(f"ORCH_PROJECTS_JSON is not valid JSON, falling back to default: {e}")
+        return None
+    if not isinstance(data, list):
+        logger.error("ORCH_PROJECTS_JSON must be a JSON array, falling back to default")
+        return None
+
+    parsed: list[ProjectConfig] = []
+    for i, item in enumerate(data):
+        if not isinstance(item, dict):
+            logger.error(f"ORCH_PROJECTS_JSON[{i}] is not an object, skipping")
+            continue
+        try:
+            parsed.append(
+                ProjectConfig(
+                    plane_project_id=str(item["plane_project_id"]),
+                    repo=str(item["repo"]),
+                    work_item_prefix=str(item["work_item_prefix"]),
+                    name=str(item.get("name", item["repo"])),
+                )
+            )
+        except KeyError as e:
+            logger.error(f"ORCH_PROJECTS_JSON[{i}] missing required key {e}, skipping")
+            continue
+    if not parsed:
+        logger.error("ORCH_PROJECTS_JSON produced no valid entries, falling back to default")
+        return None
+    return parsed
+
+
+def _load_projects() -> list[ProjectConfig]:
+    parsed = _parse_projects_json(getattr(settings, "projects_json", "") or "")
+    if parsed is not None:
+        logger.info(f"Project registry loaded from ORCH_PROJECTS_JSON: {len(parsed)} project(s)")
+        return parsed
+    return list(_DEFAULT_PROJECTS)
+
+
+# Module-level registry, built once at import.
+PROJECTS: list[ProjectConfig] = _load_projects()
+_BY_PLANE_ID: dict[str, ProjectConfig] = {p.plane_project_id: p for p in PROJECTS}
+_BY_REPO: dict[str, ProjectConfig] = {p.repo: p for p in PROJECTS}
+
+
+def get_project_by_plane_id(plane_project_id: str) -> ProjectConfig | None:
+    """Resolve project config by Plane project uuid. None if unknown."""
+    if not plane_project_id:
+        return None
+    return _BY_PLANE_ID.get(plane_project_id)
+
+
+def get_project_by_repo(repo: str) -> ProjectConfig | None:
+    """Resolve project config by gitea repo name. None if unknown."""
+    if not repo:
+        return None
+    return _BY_REPO.get(repo)
+
+
+def known_plane_project_ids() -> set[str]:
+    """Set of Plane project ids the orchestrator is configured to handle."""
+    return set(_BY_PLANE_ID.keys())
+
+
+def reload_projects() -> None:
+    """Rebuild the registry from current settings (used by tests)."""
+    global PROJECTS, _BY_PLANE_ID, _BY_REPO
+    PROJECTS = _load_projects()
+    _BY_PLANE_ID = {p.plane_project_id: p for p in PROJECTS}
+    _BY_REPO = {p.repo: p for p in PROJECTS}
--- a/src/qg/checks.py
+++ b/src/qg/checks.py
@@ -1,26 +1,285 @@
-# Quality Gate checks placeholder
-# Will be expanded as pipeline matures
+"""Quality Gate checks — real implementations using Gitea/Plane API and filesystem."""
+
+import os
+import logging
+import httpx
+from ..config import settings
+
+logger = logging.getLogger("orchestrator.qg")
+
+from ..git_worktree import get_worktree_path, ensure_worktree


-def check_analysis_complete(task_id: int) -> bool:
-    """Check if analysis artifacts exist."""
-    # TODO: verify .task-arch.md exists in repo
-    return True
+def _repo_path(repo: str, branch: str | None = None) -> str:
+    """Resolve the working path to read agent artifacts from.
+
+    ORCH-2 / S-4: artifacts now live in the per-branch worktree. When a branch is
+    given and its worktree exists on disk, read from there; otherwise fall back to
+    the shared /repos/<repo> clone (keeps backward-compat for 2-arg callers/tests).
+    """
+    if branch:
+        wt = get_worktree_path(repo, branch)
+        if os.path.isdir(wt):
+            return wt
+    return os.path.join(settings.repos_dir, repo)
+
+# Shared httpx client config
+GITEA_HEADERS = {"Authorization": f"token {settings.gitea_token}"}
+GITEA_BASE = f"{settings.gitea_url}/api/v1"


-def check_architecture_approved(task_id: int) -> bool:
-    """Check if architecture was approved in Plane."""
-    # TODO: check Plane comment for :approved:
-    return False
+def check_analysis_complete(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
+    """
+    Check if analysis artifacts exist in the repo branch.
+    Required files:
+      - docs/work-items/<work_item_id>/01-brd.md
+      - docs/work-items/<work_item_id>/02-trz.md
+      - docs/work-items/<work_item_id>/03-acceptance-criteria.md
+      - docs/work-items/<work_item_id>/04-test-plan.yaml
+    """
+    required_files = [
+        f"docs/work-items/{work_item_id}/01-brd.md",
+        f"docs/work-items/{work_item_id}/02-trz.md",
+        f"docs/work-items/{work_item_id}/03-acceptance-criteria.md",
+        f"docs/work-items/{work_item_id}/04-test-plan.yaml",
+    ]
+
+    repo_path = _repo_path(repo, branch)
+    missing = []
+
+    for f in required_files:
+        full_path = os.path.join(repo_path, f)
+        if not os.path.isfile(full_path):
+            missing.append(f)
+
+    if missing:
+        return False, f"Missing files: {', '.join(missing)}"
+    return True, "All analysis artifacts present"


-def check_ci_green(repo: str, branch: str) -> bool:
-    """Check if CI status is green for branch."""
-    # TODO: query Gitea commit status API
-    return False
+def check_architecture_done(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
+    """
+    Check if architecture artifacts exist.
+    Required: docs/work-items/<work_item_id>/06-adr/ (at least 1 file)
+    OR: docs/work-items/<work_item_id>/07-infra-requirements.md
+    """
+    repo_path = _repo_path(repo, branch)
+
+    adr_dir = os.path.join(repo_path, f"docs/work-items/{work_item_id}/06-adr")
+    infra_file = os.path.join(repo_path, f"docs/work-items/{work_item_id}/07-infra-requirements.md")
+
+    if os.path.isdir(adr_dir) and len(os.listdir(adr_dir)) > 0:
+        return True, "ADR directory exists with files"
+
+    if os.path.isfile(infra_file):
+        return True, "Infra requirements file exists"
+
+    return False, "No ADR directory or infra-requirements.md found"


-def check_review_approved(repo: str, pr_number: int) -> bool:
-    """Check if PR has approved review."""
-    # TODO: query Gitea PR reviews API
-    return False
+def check_ci_green(repo: str, branch: str) -> tuple[bool, str]:
+    """
+    Check if CI status is green for branch via Gitea API.
+    GET /repos/{owner}/{repo}/commits/{branch}/status
+    """
+    owner = settings.gitea_owner
+    url = f"{GITEA_BASE}/repos/{owner}/{repo}/commits/{branch}/status"
+
+    try:
+        resp = httpx.get(url, headers=GITEA_HEADERS, timeout=10)
+        if resp.status_code == 404:
+            return False, f"Branch '{branch}' not found or no status"
+        resp.raise_for_status()
+        data = resp.json()
+        state = data.get("state", "unknown")
+        if state == "success":
+            return True, "CI green"
+        return False, f"CI state: {state}"
+    except httpx.HTTPError as e:
+        logger.error(f"Gitea API error checking CI: {e}")
+        return False, f"API error: {e}"
+
+
+def check_review_approved(repo: str, pr_number: int) -> tuple[bool, str]:
+    """
+    Check if PR has at least one approved review and no request_changes.
+    GET /repos/{owner}/{repo}/pulls/{pr_number}/reviews
+    """
+    owner = settings.gitea_owner
+    url = f"{GITEA_BASE}/repos/{owner}/{repo}/pulls/{pr_number}/reviews"
+
+    try:
+        resp = httpx.get(url, headers=GITEA_HEADERS, timeout=10)
+        resp.raise_for_status()
+        reviews = resp.json()
+
+        approved = 0
+        changes_requested = 0
+        for review in reviews:
+            # Skip stale reviews (submitted before newer commits were pushed)
+            if review.get("stale", False):
+                continue
+            state = review.get("state", "").upper()
+            if state == "APPROVED":
+                approved += 1
+            elif state == "REQUEST_CHANGES":
+                changes_requested += 1
+
+        if changes_requested > 0:
+            return False, f"Changes requested ({changes_requested} reviews)"
+        if approved > 0:
+            return True, f"Approved ({approved} reviews)"
+        return False, "No reviews yet"
+    except httpx.HTTPError as e:
+        logger.error(f"Gitea API error checking reviews: {e}")
+        return False, f"API error: {e}"
+
+
+def check_tests_passed(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
+    """
+    Check if test report exists and contains PASS indicator.
+    File: docs/work-items/<work_item_id>/13-test-report.md
+    """
+    repo_path = _repo_path(repo, branch)
+    report_path = os.path.join(repo_path, f"docs/work-items/{work_item_id}/13-test-report.md")
+
+    if not os.path.isfile(report_path):
+        return False, "Test report not found"
+
+    try:
+        with open(report_path, "r") as f:
+            content = f.read()
+        if "PASS" in content or "All tests passed" in content:
+            return True, "Test report indicates PASS"
+        return False, "Test report exists but no PASS indicator found"
+    except OSError as e:
+        return False, f"Error reading test report: {e}"
+
+
+
+def check_analysis_approved(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
+    """
+    Check if analysis is complete AND approved by stakeholder.
+    Requirements:
+      1. All analysis artifacts exist (BRD, TRZ, AC, TestPlan)
+      2. Stakeholder has posted :approved: comment on the Plane issue
+
+    This QG is designed to be triggered by :approved: comment handler,
+    so the approval check verifies file completeness as a safety gate.
+    """
+    # First check files
+    files_ok, files_reason = check_analysis_complete(repo, work_item_id, branch)
+    if not files_ok:
+        return False, files_reason
+
+    # Check for :approved: comment via Plane API
+    try:
+        from ..plane_sync import find_issue_id, PLANE_BASE, PLANE_HEADERS, WORKSPACE, PROJECT_ID
+        from ..projects import get_project_by_repo
+        # ORCH-6: verify approval in the issue's own Plane project.
+        _proj = get_project_by_repo(repo)
+        _pid = _proj.plane_project_id if _proj else PROJECT_ID
+        issue_id = find_issue_id(work_item_id, _pid)
+        if not issue_id:
+            return False, "Cannot find Plane issue to verify approval"
+
+        url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{_pid}/issues/{issue_id}/comments/"
+        resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
+        resp.raise_for_status()
+        comments = resp.json()
+
+        # Handle paginated response
+        if isinstance(comments, dict):
+            comments = comments.get("results", [])
+
+        for comment in comments:
+            body = comment.get("comment_html", "") or comment.get("comment", "")
+            if ":approved:" in body:
+                return True, "Analysis complete and approved by stakeholder"
+
+        return False, "Analysis artifacts present but no :approved: comment found"
+    except Exception as e:
+        logger.warning(f"Failed to check approval for {work_item_id}: {e}")
+        # If we can't reach Plane API but files exist, allow advance
+        # (the :approved: handler already verified the comment exists)
+        return True, f"Files present; Plane API check skipped ({e})"
+
+
+
+
+def check_reviewer_verdict(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
+    """
+    Check reviewer agent verdict from 12-review.md (S-5 fix).
+
+    Reads ONLY the machine-readable `verdict:` field from the YAML frontmatter,
+    so tables / prose that merely mention APPROVED or REQUEST_CHANGES no longer
+    cause false positives/negatives. Returns:
+      (True, ...)  -> verdict: APPROVED
+      (False, ...) -> verdict: REQUEST_CHANGES, missing verdict, or no frontmatter
+    """
+    import yaml
+    repo_path = _repo_path(repo, branch)
+    review_path = os.path.join(repo_path, f"docs/work-items/{work_item_id}/12-review.md")
+
+    if not os.path.isfile(review_path):
+        return False, "Review report not found (12-review.md)"
+
+    try:
+        with open(review_path, "r") as f:
+            content = f.read()
+
+        verdict = None
+        if content.startswith("---"):
+            parts = content.split("---", 2)
+            if len(parts) >= 3:
+                try:
+                    fm = yaml.safe_load(parts[1]) or {}
+                except yaml.YAMLError as e:
+                    return False, f"Invalid YAML frontmatter in review: {e}"
+                verdict = str(fm.get("verdict", "")).upper().strip() if isinstance(fm, dict) else None
+
+        if verdict == "APPROVED":
+            return True, "Reviewer verdict: APPROVED"
+        if verdict == "REQUEST_CHANGES":
+            return False, "Reviewer verdict: REQUEST_CHANGES"
+        return False, f"No machine-readable verdict in frontmatter (got: {verdict!r})"
+    except OSError as e:
+        return False, f"Error reading review: {e}"
+
+
+def check_tests_local(repo: str, branch: str) -> tuple[bool, str]:
+    """
+    S-1 fix: run the project test suite locally and judge by exit code, instead of
+    depending on Gitea CI (which is not configured -> always false).
+
+    ORCH-2 / S-4: tests run inside the per-branch worktree (ensure_worktree), so this
+    is safe for concurrent active tasks — no shared /repos checkout race.
+    """
+    import subprocess
+    try:
+        repo_path = ensure_worktree(repo, branch)
+        r = subprocess.run(
+            ["make", "test"], cwd=repo_path,
+            capture_output=True, text=True, timeout=600,
+        )
+        if r.returncode == 0:
+            return True, "Local tests passed"
+        tail = (r.stdout + r.stderr)[-500:]
+        return False, f"Local tests failed: ...{tail}"
+    except subprocess.TimeoutExpired:
+        return False, "Local tests timed out (600s)"
+    except Exception as e:
+        return False, f"Local test run error: {e}"
+
+
+# Registry for dynamic lookup by name
+QG_CHECKS = {
+    "check_analysis_approved": check_analysis_approved,
+    "check_analysis_complete": check_analysis_complete,
+    "check_architecture_done": check_architecture_done,
+    "check_ci_green": check_ci_green,
+    "check_review_approved": check_review_approved,
+    "check_tests_passed": check_tests_passed,
+    "check_reviewer_verdict": check_reviewer_verdict,
+    "check_tests_local": check_tests_local,
+}
--- a/src/queue_worker.py
+++ b/src/queue_worker.py
@@ -0,0 +1,246 @@
+"""ORCH-1 (F-2b): background job-queue worker with resilience layer.
+
+A single background thread polls the `jobs` table and spawns agents:
+
+    while running:
+        if breaker.open and not cooled_down: sleep; continue   # don't touch CLI
+        if not preflight.ok: sleep; continue                   # CLI/net down -> wait
+        while count_running_jobs() < max_concurrency:
+            job = claim_next_job()              # atomic queued -> running (available_at-gated)
+            if not job: break
+            launcher.launch_job(job)            # spawns claude (Popen) + monitor thread
+        sleep(poll_interval)
+
+Resilience (ADDENDUM):
+  A. Preflight — cheap local CLI/net check (cached, no tokens) gates claiming.
+  B/C. The launcher classifies failures (transient vs permanent) and applies
+       backoff via available_at; the worker only needs to honour available_at
+       (claim_next_job does) and react to transient outcomes via the breaker.
+  D. Circuit breaker — N consecutive transient failures -> open (pause M minutes,
+       no CLI calls, Telegram alert) -> half-open (probe one job) -> closed.
+
+Design: plain daemon thread + threading.Event (the launcher already manages its
+own monitor/watchdog threads + blocking Popen).
+"""
+import time
+import logging
+import threading
+
+from .config import settings
+from .db import claim_next_job, count_running_jobs
+from .agents.launcher import launcher
+from . import preflight
+
+logger = logging.getLogger("orchestrator.queue_worker")
+
+
+class CircuitBreaker:
+    """Trips after `threshold` consecutive transient failures.
+
+    States: closed -> (threshold transient) -> open -> (after pause) half-open
+            -> (recovered) closed | (transient again) open.
+    Thread-safe enough for our single-worker + monitor-thread callbacks (a lock
+    guards the counters).
+    """
+
+    def __init__(self, threshold: int | None = None, pause_seconds: int | None = None):
+        self.threshold = threshold if threshold is not None else settings.breaker_threshold
+        self.pause_seconds = (
+            pause_seconds if pause_seconds is not None else settings.breaker_pause_seconds
+        )
+        self._lock = threading.Lock()
+        self.state = "closed"          # closed | open | half-open
+        self.consecutive_transient = 0
+        self.opened_at = 0.0
+        self._notify = None            # optional callable(message) for alerts
+
+    def set_notifier(self, fn):
+        self._notify = fn
+
+    def record_transient(self):
+        with self._lock:
+            self.consecutive_transient += 1
+            if self.state == "half-open":
+                # Probe failed -> re-open.
+                self._open("circuit re-opened: probe job hit transient again")
+            elif self.consecutive_transient >= self.threshold and self.state == "closed":
+                self._open(
+                    f"circuit OPEN: {self.consecutive_transient} consecutive "
+                    f"transient failures; pausing {self.pause_seconds}s (no CLI calls)"
+                )
+
+    def record_recovered(self):
+        with self._lock:
+            self.consecutive_transient = 0
+            if self.state in ("half-open", "open"):
+                self.state = "closed"
+                logger.info("Circuit CLOSED: recovered")
+
+    def record_permanent(self):
+        # A clean permanent (code-fault) failure breaks the transient streak.
+        with self._lock:
+            self.consecutive_transient = 0
+
+    def _open(self, msg: str):
+        self.state = "open"
+        self.opened_at = time.time()
+        logger.warning(msg)
+        if self._notify:
+            try:
+                self._notify(f"\U0001f534 {msg}")
+            except Exception:
+                pass
+
+    def allow_claim(self) -> bool:
+        """Return True if the worker may attempt to claim/launch a job now.
+
+        - closed    -> yes.
+        - open      -> no until pause elapsed; then transition to half-open (yes, one probe).
+        - half-open -> yes (the single probe).
+        """
+        with self._lock:
+            if self.state == "closed":
+                return True
+            if self.state == "open":
+                if (time.time() - self.opened_at) >= self.pause_seconds:
+                    self.state = "half-open"
+                    logger.info("Circuit HALF-OPEN: probing one job")
+                    return True
+                return False
+            # half-open: allow the probe.
+            return True
+
+    def snapshot(self) -> dict:
+        with self._lock:
+            remaining = 0
+            if self.state == "open":
+                remaining = max(0, int(self.pause_seconds - (time.time() - self.opened_at)))
+            return {
+                "state": self.state,
+                "consecutive_transient": self.consecutive_transient,
+                "pause_remaining_s": remaining,
+            }
+
+
+class QueueWorker:
+    """Background worker that drains the persistent job queue (with resilience)."""
+
+    def __init__(self, max_concurrency: int | None = None, poll_interval: float | None = None,
+                 breaker: CircuitBreaker | None = None):
+        self.max_concurrency = (
+            max_concurrency if max_concurrency is not None else settings.max_concurrency
+        )
+        self.poll_interval = (
+            poll_interval if poll_interval is not None else settings.queue_poll_interval
+        )
+        self.breaker = breaker or CircuitBreaker()
+        self.last_preflight_ok = True
+        self.last_preflight_reason = "not checked"
+        self._stop = threading.Event()
+        self._thread: threading.Thread | None = None
+
+    # --- circuit breaker outcome callback wired into the launcher ----------
+    def _on_outcome(self, transient: bool, recovered: bool):
+        if recovered:
+            self.breaker.record_recovered()
+        elif transient:
+            self.breaker.record_transient()
+        else:
+            self.breaker.record_permanent()
+
+    def _drain_once(self):
+        """Claim and launch jobs until concurrency is full or the queue is empty.
+
+        Gated by the circuit breaker and preflight: if the breaker is open (and
+        not yet cooled down) or preflight fails, we do NOT claim — jobs stay
+        queued and no CLI/tokens are touched.
+        """
+        if not self.breaker.allow_claim():
+            return
+        ok, reason = preflight.check()
+        self.last_preflight_ok = ok
+        self.last_preflight_reason = reason
+        if not ok:
+            logger.info(f"Preflight not ok ({reason}) -> not claiming jobs this tick")
+            return
+
+        # In half-open we only probe a single job, regardless of max_concurrency.
+        half_open = self.breaker.snapshot()["state"] == "half-open"
+        launched = 0
+        while not self._stop.is_set():
+            if half_open and launched >= 1:
+                return
+            if count_running_jobs() >= self.max_concurrency:
+                return
+            job = claim_next_job()
+            if not job:
+                return
+            launched += 1
+            try:
+                run_id = launcher.launch_job(job)
+                logger.info(
+                    f"Worker launched job {job['id']} ({job['agent']}, "
+                    f"repo {job['repo']}) -> run_id={run_id}"
+                )
+            except Exception as e:
+                # Launch itself failed (e.g. repo missing): requeue while attempts
+                # remain, else mark failed, so the job does not wedge as 'running'.
+                logger.error(f"Worker failed to launch job {job['id']}: {e}")
+                try:
+                    from .db import get_job, mark_job
+
+                    j = get_job(job["id"])
+                    attempts = j.get("attempts", 0) if j else 0
+                    max_attempts = j.get("max_attempts", 2) if j else 2
+                    if attempts < max_attempts:
+                        mark_job(job["id"], "queued", error=f"launch error: {e}")
+                    else:
+                        mark_job(job["id"], "failed", error=f"launch error: {e}")
+                except Exception:
+                    pass
+
+    def _run(self):
+        logger.info(
+            f"Queue worker started (max_concurrency={self.max_concurrency}, "
+            f"poll_interval={self.poll_interval}s, breaker_threshold={self.breaker.threshold})"
+        )
+        while not self._stop.is_set():
+            try:
+                self._drain_once()
+            except Exception as e:
+                logger.error(f"Queue worker loop error: {e}")
+            self._stop.wait(self.poll_interval)
+        logger.info("Queue worker stopped")
+
+    def start(self):
+        if self._thread and self._thread.is_alive():
+            return
+        # Wire breaker alerting + launcher outcome callback.
+        try:
+            from .notifications import send_telegram
+            self.breaker.set_notifier(send_telegram)
+        except Exception:
+            pass
+        launcher.on_outcome = self._on_outcome
+        self._stop.clear()
+        self._thread = threading.Thread(
+            target=self._run, name="queue-worker", daemon=True
+        )
+        self._thread.start()
+
+    def stop(self, timeout: float = 5.0):
+        self._stop.set()
+        if self._thread:
+            self._thread.join(timeout=timeout)
+
+    def status(self) -> dict:
+        """Resilience snapshot for /queue."""
+        return {
+            "breaker": self.breaker.snapshot(),
+            "preflight_ok": self.last_preflight_ok,
+            "preflight_reason": self.last_preflight_reason,
+        }
+
+
+# Module-level singleton used by the FastAPI lifespan.
+worker = QueueWorker()
--- a/src/stages.py
+++ b/src/stages.py
@@ -0,0 +1,54 @@
+"""Stage machine for orchestrator pipeline.
+
+Stages:
+  created → analysis → architecture → development → review → testing → deploy → done
+
+Each stage defines:
+  - next: the stage to advance to
+  - agent: the agent to launch when entering the NEXT stage
+  - qg: the quality gate check required to leave this stage
+"""
+
+STAGE_TRANSITIONS = {
+    "created": {"next": "analysis", "agent": "analyst", "qg": None},
+    "analysis": {"next": "architecture", "agent": "architect", "qg": "check_analysis_approved"},
+    "architecture": {"next": "development", "agent": "developer", "qg": "check_architecture_done"},
+    "development": {"next": "review", "agent": "reviewer", "qg": "check_tests_local"},
+    "review": {"next": "testing", "agent": "tester", "qg": "check_reviewer_verdict"},
+    "testing": {"next": "deploy", "agent": "deployer", "qg": "check_tests_passed"},
+    "deploy": {"next": "done", "agent": None, "qg": None},
+    "done": {"next": None, "agent": None, "qg": None},
+}
+
+
+def get_next_stage(current_stage: str) -> str | None:
+    """Get the next stage after current."""
+    transition = STAGE_TRANSITIONS.get(current_stage)
+    if not transition:
+        return None
+    return transition["next"]
+
+
+def get_agent_for_stage(stage: str) -> str | None:
+    """Get the agent to launch when advancing FROM this stage (entering next stage)."""
+    transition = STAGE_TRANSITIONS.get(stage)
+    if not transition:
+        return None
+    return transition["agent"]
+
+
+def get_qg_for_stage(current_stage: str) -> str | None:
+    """Get the QG check function name required to leave current stage."""
+    transition = STAGE_TRANSITIONS.get(current_stage)
+    if not transition:
+        return None
+    return transition["qg"]
+
+
+def get_previous_stage(current_stage: str) -> str | None:
+    """Get the previous stage (for rollback)."""
+    stages = list(STAGE_TRANSITIONS.keys())
+    idx = stages.index(current_stage) if current_stage in stages else -1
+    if idx <= 0:
+        return None
+    return stages[idx - 1]
--- a/src/webhooks/gitea.py
+++ b/src/webhooks/gitea.py
@@ -1,14 +1,54 @@
-from fastapi import APIRouter, Request
+"""Gitea webhook handlers — full implementation."""
+
+import hmac
+import subprocess
+import os
+import hashlib
 import json
-from ..db import get_db
+import logging
+import httpx
+from fastapi import APIRouter, Request, HTTPException
+
+from ..config import settings
+from ..db import get_db, get_task_by_repo_branch, update_task_stage, enqueue_job
+from ..stages import get_next_stage, get_agent_for_stage
+from ..qg.checks import check_ci_green, check_review_approved
+from ..notifications import notify_stage_change, notify_qg_failure, notify_error
+from ..agents.launcher import launcher
+from ..plane_sync import notify_stage_change as plane_notify_stage
+from ..projects import get_project_by_repo
+
+logger = logging.getLogger("orchestrator.webhooks.gitea")

 router = APIRouter()

+# Max retries for developer on request_changes
+MAX_DEV_RETRIES = 3
+
+
+def verify_gitea_signature(body: bytes, signature: str) -> bool:
+    """Verify Gitea webhook HMAC-SHA256 signature."""
+    if not settings.gitea_webhook_secret:
+        return True  # Skip verification if no secret configured
+    expected = hmac.new(
+        settings.gitea_webhook_secret.encode(),
+        body,
+        hashlib.sha256,
+    ).hexdigest()
+    return hmac.compare_digest(expected, signature)
+

@router.post("/gitea")
 async def gitea_webhook(request: Request):
    """Handle Gitea webhook events."""
    body = await request.body()
+
+    # Verify HMAC signature
+    signature = request.headers.get("X-Gitea-Signature", "")
+    if not verify_gitea_signature(body, signature):
+        logger.warning("Gitea webhook: invalid signature")
+        raise HTTPException(status_code=401, detail="Invalid signature")
+
    payload = json.loads(body)

    # Log event
@@ -19,36 +59,253 @@ async def gitea_webhook(request: Request):
        ("gitea", event_type, body.decode()),
    )
    conn.commit()
+    conn.close()

    if event_type == "push":
-        await handle_push(payload, conn)
-    elif event_type == "pull_request":
-        await handle_pr(payload, conn)
+        await handle_push(payload)
+    elif event_type.startswith("pull_request"):
+        await handle_pr(payload)
    elif event_type == "status":
-        await handle_ci_status(payload, conn)
+        await handle_ci_status(payload)

-    conn.close()
    return {"status": "accepted"}


-async def handle_push(payload: dict, conn):
-    """Push event — log for now."""
-    pass
+async def handle_push(payload: dict):
+    """
+    Push event:
+    - If stage=architecture and push contains ADR files → advance to development
+    - If stage=development and push contains src/ → wait for CI
+    """
+    ref = payload.get("ref", "")
+    # Extract branch: refs/heads/feature/ET-003-slug → feature/ET-003-slug
+    if not ref.startswith("refs/heads/"):
+        return
+    branch = ref.removeprefix("refs/heads/")
+
+    repo_name = payload.get("repository", {}).get("name", settings.default_repo)
+
+    # ORCH-6: ignore pushes to repos outside the project registry.
+    if not get_project_by_repo(repo_name):
+        logger.info(f"Gitea push: ignoring unknown repo '{repo_name}'")
+        return
+
+    task = get_task_by_repo_branch(repo_name, branch)
+    if not task:
+        logger.debug(f"Push to '{branch}' — no matching task found")
+        return
+
+    task_id = task["id"]
+    current_stage = task["stage"]
+    work_item_id = task.get("work_item_id", "")
+
+    # Collect modified files from commits
+    modified_files = set()
+    for commit in payload.get("commits", []):
+        modified_files.update(commit.get("added", []))
+        modified_files.update(commit.get("modified", []))
+
+    if current_stage == "architecture":
+        # Check if ADR files were pushed
+        has_adr = any(
+            f"docs/work-items/{work_item_id}/06-adr/" in f
+            or f"docs/work-items/{work_item_id}/07-infra-requirements.md" == f
+            for f in modified_files
+        )
+        if has_adr:
+            # Advance to development
+            next_stage = "development"
+            update_task_stage(task_id, next_stage)
+            notify_stage_change(task_id, current_stage, next_stage)
+            plane_notify_stage(work_item_id, current_stage, next_stage)
+
+            agent = get_agent_for_stage(current_stage)
+            if agent:
+                try:
+                    task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {branch}\nStage: {next_stage}"
+                    job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
+                    logger.info(f"Task {task_id}: push triggered {current_stage} → {next_stage}, enqueued '{agent}' (job_id={job_id})")
+                except Exception as e:
+                    notify_error(task_id, f"Failed to enqueue agent '{agent}': {e}")
+
+    elif current_stage == "development":
+        # Source files pushed — just log, wait for CI
+        has_src = any(f.startswith("src/") for f in modified_files)
+        if has_src:
+            logger.info(f"Task {task_id}: source push detected on '{branch}', waiting for CI")


-async def handle_pr(payload: dict, conn):
-    """PR event — check reviews, CI status."""
+async def handle_ci_status(payload: dict):
+    """
+    CI status update:
+    - If state=success and stage=development → advance to review, launch reviewer
+    - If state=failure → log
+    """
+    state = payload.get("state", "")
+    # Extract branch from the payload's "branches" list
+    branches = payload.get("branches", [])
+    branch = ""
+    if branches:
+        branch = branches[0].get("name", "")
+
+    # Fallback: resolve the branch from the commit SHA via git
+    if not branch:
+        sha = payload.get("sha", "")
+        repo_name = payload.get("repository", {}).get("name", settings.default_repo)
+        # Try to find task by checking git branch containing this SHA.
+        # ORCH-2 / S-4: this is a READ-ONLY query of remote-tracking refs in the main
+        # clone (no checkout / no mutation), so it is safe to keep on /repos/<repo>.
+        try:
+            result = subprocess.run(
+                ["git", "-C", os.path.join(settings.repos_dir, repo_name),
+                 "branch", "-r", "--contains", sha],
+                capture_output=True, text=True, timeout=10,
+            )
+            for line in result.stdout.strip().splitlines():
+                b = line.strip().replace("origin/", "")
+                if b.startswith("feature/"):
+                    branch = b
+                    break
+        except Exception:
+            pass
+        if not branch:
+            logger.debug(f"CI status event: could not determine branch for sha={sha}")
+            return
+
+    repo_name = payload.get("repository", {}).get("name", settings.default_repo)
+
+    # ORCH-6: ignore CI status for repos outside the project registry.
+    if not get_project_by_repo(repo_name):
+        logger.info(f"Gitea CI status: ignoring unknown repo '{repo_name}'")
+        return
+
+    task = get_task_by_repo_branch(repo_name, branch)
+    if not task:
+        return
+
+    task_id = task["id"]
+    current_stage = task["stage"]
+    work_item_id = task.get("work_item_id", "")
+
+    if state == "success" and current_stage == "development":
+        # Verify CI is actually green via API (double-check)
+        passed, reason = check_ci_green(repo_name, branch)
+        if passed:
+            next_stage = "review"
+            update_task_stage(task_id, next_stage)
+            notify_stage_change(task_id, current_stage, next_stage)
+            plane_notify_stage(work_item_id, current_stage, next_stage)
+
+            agent = get_agent_for_stage(current_stage)
+            if agent:
+                try:
+                    task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {branch}\nStage: {next_stage}"
+                    job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
+                    logger.info(f"Task {task_id}: CI green → {next_stage}, enqueued '{agent}' (job_id={job_id})")
+                except Exception as e:
+                    notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
+        else:
+            notify_qg_failure(task_id, current_stage, "check_ci_green", reason)
+
+    elif state == "failure":
+        # S-1: Gitea CI is NOT the authoritative gate anymore (the orchestrator runs
+        # tests locally via check_tests_local). Gitea CI is often unconfigured, so a
+        # "failure"/empty status here is not actionable. Log only, do not alert.
+        logger.debug(f"Task {task_id}: Gitea CI state='failure' on branch '{branch}' "
+                     f"(non-authoritative, suppressed — local tests are the gate)")
+
+
+async def handle_pr(payload: dict):
+    """
+    PR event:
+    - action=reviewed + approved → advance to testing, launch tester
+    - action=reviewed + request_changes → back to development, relaunch developer (max 3x)
+    - action=closed + merged → stage=done
+    """
    action = payload.get("action", "")
    pr = payload.get("pull_request", {})
+    review = payload.get("review", {})

-    if action == "reviewed" and pr.get("state") == "approved":
-        # TODO: QG-5 check -> launch Tester
-        pass
+    # Get branch from PR head
+    head_branch = pr.get("head", {}).get("ref", "")
+    repo_name = payload.get("repository", {}).get("name", settings.default_repo)

+    if not head_branch:
+        return

-async def handle_ci_status(payload: dict, conn):
-    """CI status update — check if all green -> advance."""
-    state = payload.get("state", "")
-    if state == "success":
-        # TODO: Check all required contexts green -> advance stage
-        pass
+    # ORCH-6: ignore PR events for repos outside the project registry.
+    if not get_project_by_repo(repo_name):
+        logger.info(f"Gitea PR: ignoring unknown repo '{repo_name}'")
+        return
+
+    task = get_task_by_repo_branch(repo_name, head_branch)
+    if not task:
+        logger.debug(f"PR event for branch '{head_branch}' — no matching task")
+        return
+
+    task_id = task["id"]
+    current_stage = task["stage"]
+    work_item_id = task.get("work_item_id", "")
+
+    if action == "reviewed":
+        # Gitea sends review.state (older) or review.type (newer format)
+        review_state = review.get("state", "").upper()
+        if not review_state and review.get("type", ""):
+            # Map type field: "pull_request_review_approved" -> "APPROVED"
+            rtype = review.get("type", "")
+            if "approved" in rtype.lower():
+                review_state = "APPROVED"
+            elif "request_changes" in rtype.lower() or "rejected" in rtype.lower():
+                review_state = "REQUEST_CHANGES"
+
+        if review_state == "APPROVED" and current_stage == "review":
+            # Advance to testing
+            pr_number = pr.get("number")
+            passed, reason = check_review_approved(repo_name, pr_number)
+            if passed:
+                next_stage = "testing"
+                update_task_stage(task_id, next_stage)
+                notify_stage_change(task_id, current_stage, next_stage)
+                plane_notify_stage(work_item_id, current_stage, next_stage)
+
+                agent = get_agent_for_stage(current_stage)
+                if agent:
+                    try:
+                        task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {head_branch}\nStage: {next_stage}"
+                        job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
+                        logger.info(f"Task {task_id}: PR approved → {next_stage}, enqueued '{agent}' (job_id={job_id})")
+                    except Exception as e:
+                        notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
+            else:
+                notify_qg_failure(task_id, current_stage, "check_review_approved", reason)
+
+        elif review_state == "REQUEST_CHANGES" and current_stage == "review":
+            # Count all prior developer runs for this task (the initial run included)
+            conn = get_db()
+            retry_count = conn.execute(
+                "SELECT COUNT(*) as cnt FROM agent_runs WHERE task_id = ? AND agent = 'developer'",
+                (task_id,),
+            ).fetchone()["cnt"]
+            conn.close()
+
+            if retry_count < MAX_DEV_RETRIES:
+                # Back to development, relaunch developer
+                update_task_stage(task_id, "development")
+                notify_stage_change(task_id, current_stage, "development")
+                try:
+                    task_desc = (
+                        f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {head_branch}\n"
+                        f"Stage: development\nNote: Changes requested in review (attempt {retry_count + 1}/{MAX_DEV_RETRIES})"
+                    )
+                    job_id = enqueue_job("developer", repo_name, task_desc, task_id=task_id)
+                    logger.info(f"Task {task_id}: changes requested, enqueued developer (attempt {retry_count + 1}, job_id={job_id})")
+                except Exception as e:
+                    notify_error(task_id, f"Failed to relaunch developer: {e}")
+            else:
+                notify_error(task_id, f"Max developer retries ({MAX_DEV_RETRIES}) reached, escalating")
+                logger.error(f"Task {task_id}: max retries reached, needs manual intervention")
+
+    elif action == "closed" and pr.get("merged", False):
+        update_task_stage(task_id, "done")
+        notify_stage_change(task_id, current_stage, "done")
+        logger.info(f"Task {task_id}: PR merged, stage → done")
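Review note on the `reviewed` branch of `handle_pr` above: the state/type mapping can be read as a small pure function, which makes the two Gitea payload shapes easier to test in isolation. The sketch below is illustrative only; `normalize_review_state` is a hypothetical helper name that does not exist in this diff, and it assumes the payload shapes described in the handler's comments (`review.state` in the older format, `review.type` in the newer one).

```python
def normalize_review_state(review: dict) -> str:
    """Map a Gitea review payload to an upper-cased state string.

    Mirrors the handler logic: prefer review["state"], and fall back to
    substring matching on review["type"] when the state is missing.
    """
    state = (review.get("state") or "").upper()
    if state:
        return state
    rtype = (review.get("type") or "").lower()
    if "approved" in rtype:
        return "APPROVED"
    if "request_changes" in rtype or "rejected" in rtype:
        return "REQUEST_CHANGES"
    return ""  # unknown or empty review payload


if __name__ == "__main__":
    print(normalize_review_state({"type": "pull_request_review_approved"}))
```

With such a helper the handler would only compare the result against `"APPROVED"` and `"REQUEST_CHANGES"`, and both payload formats could be covered by plain unit tests without building a full webhook body.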
--- a/src/webhooks/plane.py
+++ b/src/webhooks/plane.py
@@ -1,14 +1,64 @@
-from fastapi import APIRouter, Request
+"""Plane webhook handlers — full implementation."""
+
+import hmac
+import hashlib
+import re
 import json
-from ..db import get_db
+import logging
+import httpx
+from fastapi import APIRouter, Request, HTTPException
+
+from ..config import settings
+from ..db import (
+    get_db,
+    get_task_by_plane_id,
+    get_next_work_item_id,
+    update_task_stage,
+    enqueue_job,
+)
+from ..stages import get_next_stage, get_agent_for_stage, get_qg_for_stage, get_previous_stage
+from ..qg.checks import QG_CHECKS
+from ..notifications import notify_stage_change, notify_qg_failure, notify_error
+from ..agents.launcher import launcher
+from ..plane_sync import (
+    notify_stage_change as plane_notify_stage,
+    notify_qg_failure as plane_notify_qg,
+    notify_done as plane_notify_done,
+)
+from ..projects import (
+    get_project_by_plane_id,
+    get_project_by_repo,
+    known_plane_project_ids,
+)
+
+logger = logging.getLogger("orchestrator.webhooks.plane")

 router = APIRouter()


+def verify_plane_signature(body: bytes, signature: str) -> bool:
+    """Verify Plane webhook HMAC-SHA256 signature."""
+    if not settings.plane_webhook_secret:
+        return True  # Skip verification if no secret configured
+    expected = hmac.new(
+        settings.plane_webhook_secret.encode(),
+        body,
+        hashlib.sha256,
+    ).hexdigest()
+    return hmac.compare_digest(expected, signature)
+
+
 @router.post("/plane")
 async def plane_webhook(request: Request):
    """Handle Plane webhook events."""
    body = await request.body()
+
+    # Verify HMAC signature
+    signature = request.headers.get("X-Plane-Signature", "")
+    if not verify_plane_signature(body, signature):
+        logger.warning("Plane webhook: invalid signature")
+        raise HTTPException(status_code=401, detail="Invalid signature")
+
    payload = json.loads(body)

    # Log event
@@ -18,32 +68,368 @@ async def plane_webhook(request: Request):
        ("plane", payload.get("event", "unknown"), body.decode()),
    )
    conn.commit()
+    conn.close()

    event = payload.get("event")
+    action = payload.get("action", "")
    data = payload.get("data", {})

-    if event == "work_item.created":
-        await handle_work_item_created(data, conn)
-    elif event == "comment.created":
-        await handle_comment(data, conn)
+    # ORCH-6: filter by Plane project. Ignore issues from unknown/unconfigured
+    # projects so a webhook on the whole workspace cannot funnel everything into
+    # the default repo (root cause of the 2026-06-02 incident).
+    project_id = data.get("project") or data.get("project_id") or ""
+    if project_id not in known_plane_project_ids():
+        logger.info(
+            f"Plane webhook: ignoring event '{event}' from unknown project "
+            f"'{project_id}' (known: {len(known_plane_project_ids())})"
+        )
+        return {"status": "ignored", "reason": "unknown project"}
+
+    if (event == "work_item.created") or (event == "issue" and action == "created"):
+        await handle_work_item_created(data, project_id)
+    elif (event == "comment.created") or (event == "issue_comment" and action == "created"):
+        await handle_comment(data, project_id)

-    conn.close()
    return {"status": "accepted"}


-async def handle_work_item_created(data: dict, conn):
-    """New work item -> create task record."""
+async def handle_work_item_created(data: dict, project_id: str = ""):
+    """
+    New work item created in Plane.
+    QG-0: validate title and description length (priority is read but not checked).
+    If valid: create branch, init docs, launch analyst.
+    If invalid: comment with what's missing, set Blocked.
+    """
    plane_id = data.get("id", "")
+    name = data.get("name", "untitled")
+    description = data.get("description_stripped", data.get("description", ""))
+    priority = data.get("priority", {})
+    priority_name = priority if isinstance(priority, str) else priority.get("name", "")
+
+    # ORCH-6: resolve repo / prefix / Plane project from the registry instead of
+    # the single hardcoded default_repo.
+    if not project_id:
+        project_id = data.get("project") or data.get("project_id") or ""
+    proj = get_project_by_plane_id(project_id)
+    if not proj:
+        logger.warning(f"handle_work_item_created: unknown project '{project_id}', ignoring {plane_id}")
+        return
+    repo = proj.repo
+    plane_project_id = proj.plane_project_id
+
+    # QG-0 validation: title 5-80 characters, description >= 20 characters (messages are in Russian)
+    errors = []
+    if not name or len(name) < 5:
+        errors.append("Title \u0441\u043b\u0438\u0448\u043a\u043e\u043c \u043a\u043e\u0440\u043e\u0442\u043a\u0438\u0439 (\u043d\u0443\u0436\u043d\u043e >= 5 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432)")
+    if len(name) > 80:
+        errors.append("Title \u0441\u043b\u0438\u0448\u043a\u043e\u043c \u0434\u043b\u0438\u043d\u043d\u044b\u0439 (\u043c\u0430\u043a\u0441\u0438\u043c\u0443\u043c 80 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432)")
+    if not description or len(description.strip()) < 20:
+        errors.append("Description \u0441\u043b\u0438\u0448\u043a\u043e\u043c \u043a\u043e\u0440\u043e\u0442\u043a\u0438\u0439 (\u043d\u0443\u0436\u043d\u043e >= 20 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432)")
+
+    if errors:
+        # QG-0 failed
+        error_text = "\u26a0\ufe0f QG-0 failed:\n" + "\n".join(f"\u2022 {e}" for e in errors)
+        from ..plane_sync import PLANE_BASE, PLANE_HEADERS, WORKSPACE, PLANE_STATES
+        import httpx as _httpx
+        # Post comment (ORCH-6: route to the issue's own project)
+        url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{plane_project_id}/issues/{plane_id}/comments/"
+        try:
+            _httpx.post(url, headers=PLANE_HEADERS,
+                       json={"comment_html": f"<p>{error_text}</p>"}, timeout=10)
+        except Exception as e:
+            logger.warning(f"QG-0: failed to post comment for {plane_id}: {e}")
+        # Set blocked
+        url2 = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{plane_project_id}/issues/{plane_id}/"
+        try:
+            _httpx.patch(url2, headers=PLANE_HEADERS,
+                        json={"state": PLANE_STATES["blocked"]}, timeout=10)
+        except Exception as e:
+            logger.warning(f"QG-0: failed to set Blocked state for {plane_id}: {e}")
+        logger.info(f"QG-0 failed for {plane_id}: {errors}")
+        return
+
+    # Generate work item ID
+    work_item_id = get_next_work_item_id(repo, proj.work_item_prefix)
+
+    # Create slug from name
+    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")[:30]
+    branch = f"feature/{work_item_id}-{slug}"
+
+    # Insert task into DB
+    conn = get_db()
    conn.execute(
-        "INSERT INTO tasks (plane_id, repo, stage) VALUES (?, ?, ?)",
-        (plane_id, "enduro-trails", "analysis"),
+        "INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) VALUES (?, ?, ?, ?, ?, ?)",
+        (plane_id, work_item_id, repo, branch, "analysis", plane_id),
    )
    conn.commit()
+    conn.close()
+
+    # Create branch in Gitea
+    try:
+        await _create_gitea_branch(repo, branch)
+    except Exception as e:
+        logger.error(f"Failed to create branch '{branch}': {e}")
+        # Task is created, branch creation failed — log but don't crash
+        notify_error(0, f"Branch creation failed: {e}")
+        return
+
+    # Create initial docs structure via Gitea API (create file)
+    try:
+        await _create_initial_docs(repo, branch, work_item_id, name)
+    except Exception as e:
+        logger.error(f"Failed to create initial docs: {e}")
+
+    logger.info(f"Task created: {work_item_id} ({name}), branch={branch}, stage=analysis")
+
+    # Launch analyst agent
+    try:
+        task_row = get_db().execute("SELECT id FROM tasks WHERE work_item_id=?", (work_item_id,)).fetchone()
+        if task_row:
+            task_id = task_row[0]
+            task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: analysis\nTitle: {name}"
+            job_id = enqueue_job("analyst", repo, task_desc, task_id=task_id)
+            logger.info(f"Task {task_id}: enqueued analyst (job_id={job_id})")
+            # Post start comment to Plane: "Analyst started. BRD/spec/AC/TestPlan in progress (expect 8-15 min)."
+            from ..plane_sync import add_comment as _add_comment
+            _add_comment(work_item_id, "\U0001f50d Analyst \u0437\u0430\u043f\u0443\u0449\u0435\u043d. BRD/\u0422\u0417/AC/TestPlan \u0432 \u0440\u0430\u0431\u043e\u0442\u0435 (\u043e\u0436\u0438\u0434\u0430\u0439\u0442\u0435 8-15 \u043c\u0438\u043d).")
+    except Exception as e:
+        logger.error(f"Failed to launch analyst for {work_item_id}: {e}")


-async def handle_comment(data: dict, conn):
-    """Check for :approved: reaction -> advance stage."""
-    comment_body = data.get("comment", "")
+async def handle_comment(data: dict, project_id: str = ""):
+    """
+    Handle comment event — check for :approved: or :rejected:.
+    Advance or rollback stage accordingly.
+    """
+    comment_body = data.get("comment_stripped", data.get("comment", data.get("body", data.get("comment_html", ""))))
+    plane_id = str(data.get("work_item_id") or data.get("issue_id") or data.get("issue") or "")
+
+    if not plane_id:
+        logger.warning("Comment event without work_item_id, skipping")
+        return
+
+    task = get_task_by_plane_id(plane_id)
+    if not task:
+        logger.warning(f"No task found for plane_id={plane_id}")
+        return
+
+    task_id = task["id"]
+    current_stage = task["stage"]
+    repo = task["repo"]
+    work_item_id = task.get("work_item_id", "")
+    branch = task.get("branch", "")
+
+    if ":rejected:" in comment_body:
+        # Extract reason (text after :rejected:)
+        reason = comment_body.split(":rejected:", 1)[-1].strip()[:300]
+
+        if current_stage == "analysis":
+            # Already in analysis — just relaunch analyst with rejection reason
+            from ..plane_sync import set_issue_in_progress
+            set_issue_in_progress(work_item_id)
+            task_desc = (
+                f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
+                f"Stage: analysis\nNote: Stakeholder REJECTED your artifacts. "
+                f"Reason: {reason}\nRevise and improve."
+            )
+            new_job = enqueue_job("analyst", repo, task_desc, task_id=task_id)
+            from ..plane_sync import add_comment as _plane_comment
+            _plane_comment(work_item_id, f"\U0001f504 Analyst \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d. \u041f\u0440\u0438\u0447\u0438\u043d\u0430 \u043e\u0442\u043a\u043b\u043e\u043d\u0435\u043d\u0438\u044f: {reason}")
+            logger.info(f"Task {task_id}: rejected at analysis, enqueued analyst (job_id={new_job})")
+        else:
+            # Rollback to previous stage
+            prev_stage = get_previous_stage(current_stage)
+            if prev_stage:
+                update_task_stage(task_id, prev_stage)
+                from ..plane_sync import set_issue_in_progress
+                set_issue_in_progress(work_item_id)
+                notify_stage_change(task_id, current_stage, prev_stage)
+                plane_notify_stage(work_item_id, current_stage, prev_stage)
+                from ..plane_sync import add_comment as _plane_comment
+                _plane_comment(work_item_id, f"\U0001f504 \u041e\u0442\u043a\u0430\u0442: {current_stage} \u2192 {prev_stage}. \u041f\u0440\u0438\u0447\u0438\u043d\u0430: {reason}")
+                logger.info(f"Task {task_id}: rejected, rolled back {current_stage} \u2192 {prev_stage}")
+        return
+
    if ":approved:" in comment_body:
-        # TODO: Determine which task, advance QG
-        pass
+        from ..plane_sync import set_issue_in_progress
+        set_issue_in_progress(work_item_id)
+        # Try to advance stage
+        await _try_advance_stage(task_id, current_stage, repo, work_item_id, branch)
+        return
+
+    # Task 3: If neither :approved: nor :rejected: — check if this is an answer to questions
+    if current_stage == "analysis":
+        from ..plane_sync import PLANE_STATES, set_issue_in_progress
+        issue_id = task.get("plane_issue_id") or task.get("plane_id")
+        if not issue_id:
+            issue_id = plane_id
+        if issue_id:
+            from ..plane_sync import PLANE_BASE, PLANE_HEADERS, WORKSPACE
+            from ..plane_sync import PROJECT_ID as _DEFAULT_PROJECT_ID
+            # ORCH-6: route to this task's own Plane project (resolved from repo).
+            _proj = get_project_by_repo(repo)
+            _pid = _proj.plane_project_id if _proj else (project_id or _DEFAULT_PROJECT_ID)
+            import httpx as _httpx
+            try:
+                _resp = _httpx.get(
+                    f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{_pid}/issues/{issue_id}/",
+                    headers=PLANE_HEADERS, timeout=10
+                )
+                if _resp.status_code == 200:
+                    issue_data = _resp.json()
+                    if issue_data.get("state") == PLANE_STATES["needs_input"]:
+                        # Task 11: Check analyst retry count (max 3 question rounds)
+                        conn3 = get_db()
+                        analyst_runs = conn3.execute(
+                            "SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='analyst'",
+                            (task_id,)
+                        ).fetchone()[0]
+                        conn3.close()
+
+                        if analyst_runs >= 4:  # initial + 3 retries
+                            from ..plane_sync import set_issue_blocked, add_comment as _pc
+                            set_issue_blocked(work_item_id)
+                            _pc(
+                                work_item_id,
+                                "\U0001f6a8 3 \u0440\u0430\u0443\u043d\u0434\u0430 \u0443\u0442\u043e\u0447\u043d\u0435\u043d\u0438\u0439 \u0438\u0441\u0447\u0435\u0440\u043f\u0430\u043d\u044b. Analyst \u043d\u0435 \u043c\u043e\u0436\u0435\u0442 \u0441\u0444\u043e\u0440\u043c\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u0422\u0417. "
+                                "\u0422\u0440\u0435\u0431\u0443\u0435\u0442\u0441\u044f \u0431\u043e\u043b\u0435\u0435 \u0434\u0435\u0442\u0430\u043b\u044c\u043d\u043e\u0435 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0438\u043b\u0438 \u0432\u0441\u0442\u0440\u0435\u0447\u0430."
+                            )
+                            from ..notifications import send_telegram
+                            send_telegram(f"\U0001f6a8 {work_item_id}: 3 \u0440\u0430\u0443\u043d\u0434\u0430 \u0432\u043e\u043f\u0440\u043e\u0441\u043e\u0432 analyst'\u0430 \u0438\u0441\u0447\u0435\u0440\u043f\u0430\u043d\u044b. \u041d\u0443\u0436\u043d\u0430 \u043f\u043e\u043c\u043e\u0449\u044c.")
+                            return
+
+                        # This is an answer to analyst's questions — relaunch
+                        set_issue_in_progress(work_item_id)
+                        task_desc = (
+                            f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
+                            f"Stage: analysis\nNote: Stakeholder answered your questions. "
+                            f"Read the latest comment in Plane and revise your artifacts.\n"
+                            f"Answer: {comment_body[:500]}"
+                        )
+                        new_job = enqueue_job("analyst", repo, task_desc, task_id=task_id)
+                        from ..plane_sync import add_comment as _pc2
+                        _pc2(work_item_id, "\U0001f504 Analyst \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d \u0441 \u043e\u0442\u0432\u0435\u0442\u0430\u043c\u0438 \u0441\u0442\u0435\u0439\u043a\u0445\u043e\u043b\u0434\u0435\u0440\u0430.")
+                        logger.info(f"Task {task_id}: stakeholder answered questions, enqueued analyst (job_id={new_job})")
+                        return
+            except Exception as e:
+                logger.error(f"Failed to check issue state: {e}")
+
+
+async def _try_advance_stage(
+    task_id: int, current_stage: str, repo: str, work_item_id: str, branch: str
+):
+    """Run QG check for current stage and advance if passed."""
+    qg_name = get_qg_for_stage(current_stage)
+    next_stage = get_next_stage(current_stage)
+
+    if not next_stage:
+        logger.info(f"Task {task_id}: already at terminal stage '{current_stage}'")
+        return
+
+    # Run QG check if one is required
+    if qg_name:
+        qg_func = QG_CHECKS.get(qg_name)
+        if not qg_func:
+            logger.error(f"QG function '{qg_name}' not found in registry")
+            return
+
+        # Determine args based on QG function
+        if qg_name in ("check_analysis_approved", "check_analysis_complete", "check_architecture_done", "check_tests_passed", "check_reviewer_verdict"):
+            # ORCH-2 / S-4: pass branch so artifacts are read from the task worktree.
+            passed, reason = qg_func(repo, work_item_id, branch)
+        elif qg_name in ("check_ci_green", "check_tests_local"):
+            passed, reason = qg_func(repo, branch)
+        elif qg_name == "check_review_approved":
+            # Find PR number by branch via Gitea API
+            import httpx as _httpx
+            from ..config import settings as _s
+            _owner = _s.gitea_owner
+            _url = f"{_s.gitea_url}/api/v1/repos/{_owner}/{repo}/pulls?state=open&limit=50"
+            _headers = {"Authorization": f"token {_s.gitea_token}"}
+            try:
+                _resp = _httpx.get(_url, headers=_headers, timeout=10)
+                _prs = _resp.json()
+                _pr_number = None
+                for _pr in _prs:
+                    if _pr.get("head", {}).get("ref") == branch:
+                        _pr_number = _pr["number"]
+                        break
+                if _pr_number:
+                    passed, reason = qg_func(repo, _pr_number)
+                else:
+                    # No open PR: fall back to a file-based check for a review document
+                    import os
+                    from ..git_worktree import get_worktree_path as _gwp
+                    _wt = _gwp(repo, branch) if os.path.isdir(_gwp(repo, branch)) else os.path.join(_s.repos_dir, repo)
+                    _review_path = os.path.join(_wt, f"docs/work-items/{work_item_id}/12-review.md")
+                    _review_path2 = os.path.join(_wt, f"docs/work-items/{work_item_id}/09-review.md")
+                    if os.path.isfile(_review_path) or os.path.isfile(_review_path2):
+                        passed, reason = True, "Review file exists (file-based approval)"
+                    else:
+                        passed, reason = False, "No open PR found and no review file"
+            except Exception as _e:
+                passed, reason = False, f"Error finding PR: {_e}"
+        else:
+            passed, reason = False, f"Unknown QG: {qg_name}"
+
+        if not passed:
+            notify_qg_failure(task_id, current_stage, qg_name, reason)
+            plane_notify_qg(work_item_id, current_stage, qg_name, reason)
+            return
+
+    # Advance stage
+    update_task_stage(task_id, next_stage)
+    notify_stage_change(task_id, current_stage, next_stage)
+    plane_notify_stage(work_item_id, current_stage, next_stage)
+
+    # Launch agent associated with the current stage's transition
+    agent = get_agent_for_stage(current_stage)
+    if agent:
+        try:
+            task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: {next_stage}"
+            job_id = enqueue_job(agent, repo, task_desc, task_id=task_id)
+            plane_notify_stage(work_item_id, current_stage, next_stage, agent)
+            logger.info(f"Task {task_id}: enqueued agent '{agent}', job_id={job_id}")
+        except Exception as e:
+            notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
+            logger.error(f"Agent launch failed: {e}")
+
+
+async def _create_gitea_branch(repo: str, branch: str):
+    """Create a new branch in Gitea from main."""
+    owner = settings.gitea_owner
+    url = f"{settings.gitea_url}/api/v1/repos/{owner}/{repo}/branches"
+    headers = {"Authorization": f"token {settings.gitea_token}"}
+    payload = {"new_branch_name": branch, "old_branch_name": "main"}
+
+    async with httpx.AsyncClient() as client:
+        resp = await client.post(url, json=payload, headers=headers, timeout=10)
+        if resp.status_code == 409:
+            logger.info(f"Branch '{branch}' already exists")
+            return
+        resp.raise_for_status()
+        logger.info(f"Created branch '{branch}' in {owner}/{repo}")
+
+
+async def _create_initial_docs(repo: str, branch: str, work_item_id: str, name: str):
+    """Create initial business request doc in the feature branch."""
+    owner = settings.gitea_owner
+    file_path = f"docs/work-items/{work_item_id}/00-business-request.md"
+    url = f"{settings.gitea_url}/api/v1/repos/{owner}/{repo}/contents/{file_path}"
+    headers = {"Authorization": f"token {settings.gitea_token}"}
+
+    import base64
+    content = f"# Business Request: {name}\n\nWork Item ID: {work_item_id}\n\n## Description\n\nTBD\n"
+    encoded = base64.b64encode(content.encode()).decode()
+
+    payload = {
+        "message": f"docs: init {work_item_id} business request",
+        "content": encoded,
+        "branch": branch,
+    }
+
+    async with httpx.AsyncClient() as client:
+        resp = await client.post(url, json=payload, headers=headers, timeout=10)
+        if resp.status_code in (201, 422):  # 422 = already exists
+            return
+        resp.raise_for_status()
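Review note on `verify_plane_signature` above: the check is an HMAC-SHA256 over the raw request body, compared in constant time. The standalone sketch below shows the same technique so it can be exercised without FastAPI or `settings`. `verify_signature` and its `secret` parameter are hypothetical names for illustration. Returning `True` when no secret is configured mirrors the behaviour in this diff and is an assumption carried over from it, not a recommendation. Comparing bytes rather than strings is a small deviation: `hmac.compare_digest` raises `TypeError` on non-ASCII `str` input, which a malformed header could trigger.

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Check a hex HMAC-SHA256 signature of `body` against `signature`."""
    if not secret:
        # Mirrors the diff: verification is skipped when no secret is set.
        return True
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    payload = b'{"event": "issue", "action": "created"}'
    sig = hmac.new(b"s3cret", payload, hashlib.sha256).hexdigest()
    print(verify_signature("s3cret", payload, sig))
```

The signature has to be computed over the exact bytes received, which is why the handler reads `request.body()` before calling `json.loads`.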
--- a/tests/test_git_worktree.py
+++ b/tests/test_git_worktree.py
@@ -0,0 +1,152 @@
+"""Tests for src/git_worktree (ORCH-2 / S-4): isolated worktree per task/branch.
+
+Uses real local git repos in tmp (a bare 'origin' + a working main clone) so that
+`git fetch origin`, `git worktree add`, branch creation from origin/main, reuse and
+removal are all exercised without network access.
+"""
+import os
+import subprocess
+import tempfile
+
+import pytest
+
+# Env must be set before importing app modules (same convention as the other suites).
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_wt.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+
+from src import git_worktree
+from src.git_worktree import (
+    _safe,
+    get_worktree_path,
+    ensure_worktree,
+    remove_worktree,
+)
+
+
+def _git(cwd, *args):
+    return subprocess.run(["git", "-C", cwd, *args], capture_output=True, text=True)
+
+
+@pytest.fixture
+def repos(tmp_path, monkeypatch):
+    """Build a bare 'origin' with main + a feature branch, plus a main clone at repos_dir/<repo>.
+
+    Returns the repo name. settings.repos_dir / worktrees_dir are pointed at tmp.
+    """
+    repo = "enduro-trails"
+    repos_dir = tmp_path / "repos"
+    wt_dir = tmp_path / "repos" / "_wt"
+    repos_dir.mkdir(parents=True)
+
+    monkeypatch.setattr(git_worktree.settings, "repos_dir", str(repos_dir))
+    monkeypatch.setattr(git_worktree.settings, "worktrees_dir", str(wt_dir))
+
+    # Bare origin
+    origin = tmp_path / "origin.git"
+    subprocess.run(["git", "init", "--bare", "-b", "main", str(origin)], capture_output=True)
+
+    # Seed repo
+    seed = tmp_path / "seed"
+    seed.mkdir()
+    _git(str(seed), "init", "-b", "main")
+    _git(str(seed), "config", "user.email", "t@t")
+    _git(str(seed), "config", "user.name", "t")
+    (seed / "README.md").write_text("# seed\n")
+    _git(str(seed), "add", ".")
+    _git(str(seed), "commit", "-m", "init")
+    _git(str(seed), "remote", "add", "origin", str(origin))
+    _git(str(seed), "push", "origin", "main")
+    # An existing feature branch on origin
+    _git(str(seed), "checkout", "-b", "feature/existing")
+    (seed / "f.txt").write_text("feature\n")
+    _git(str(seed), "add", ".")
+    _git(str(seed), "commit", "-m", "feat")
+    _git(str(seed), "push", "origin", "feature/existing")
+
+    # Main clone at repos_dir/<repo>
+    main_clone = repos_dir / repo
+    subprocess.run(["git", "clone", str(origin), str(main_clone)], capture_output=True)
+    _git(str(main_clone), "config", "user.email", "t@t")
+    _git(str(main_clone), "config", "user.name", "t")
+    return repo
+
+
+# ---------------------------------------------------------------------------
+# _safe / get_worktree_path
+# ---------------------------------------------------------------------------
+class TestSafeAndPath:
+    def test_safe_replaces_slashes_and_specials(self):
+        assert _safe("feature/ET-001-x") == "feature_ET-001-x"
+        assert _safe("a b/c:d") == "a_b_c_d"
+        assert _safe("keep.dots-and_underscores") == "keep.dots-and_underscores"
+
+    def test_get_worktree_path(self, monkeypatch):
+        monkeypatch.setattr(git_worktree.settings, "worktrees_dir", "/repos/_wt")
+        assert get_worktree_path("repo", "feature/x") == "/repos/_wt/repo/feature_x"
+
+
+# ---------------------------------------------------------------------------
+# ensure_worktree
+# ---------------------------------------------------------------------------
+class TestEnsureWorktree:
+    def test_missing_main_repo_raises(self, tmp_path, monkeypatch):
+        monkeypatch.setattr(git_worktree.settings, "repos_dir", str(tmp_path / "nope"))
+        monkeypatch.setattr(git_worktree.settings, "worktrees_dir", str(tmp_path / "_wt"))
+        with pytest.raises(FileNotFoundError):
+            ensure_worktree("enduro-trails", "main")
+
+    def test_creates_worktree_for_existing_branch(self, repos):
+        wt = ensure_worktree(repos, "feature/existing")
+        assert os.path.isdir(wt)
+        assert wt == get_worktree_path(repos, "feature/existing")
+        # On the right branch
+        cur = _git(wt, "branch", "--show-current").stdout.strip()
+        assert cur == "feature/existing"
+        # Feature file from that branch is present (proves correct checkout)
+        assert os.path.isfile(os.path.join(wt, "f.txt"))
+
+    def test_creates_new_branch_from_origin_main(self, repos):
+        wt = ensure_worktree(repos, "feature/brand-new")
+        assert os.path.isdir(wt)
+        cur = _git(wt, "branch", "--show-current").stdout.strip()
+        assert cur == "feature/brand-new"
+        # Based on main -> README present, no feature file
+        assert os.path.isfile(os.path.join(wt, "README.md"))
+        assert not os.path.isfile(os.path.join(wt, "f.txt"))
+
+    def test_reuse_returns_same_path(self, repos):
+        wt1 = ensure_worktree(repos, "feature/existing")
+        wt2 = ensure_worktree(repos, "feature/existing")
+        assert wt1 == wt2
+        assert os.path.isdir(wt2)
+
+    def test_two_branches_are_isolated(self, repos):
+        a = ensure_worktree(repos, "feature/wt-A")
+        b = ensure_worktree(repos, "feature/wt-B")
+        assert a != b
+        ba = _git(a, "branch", "--show-current").stdout.strip()
+        bb = _git(b, "branch", "--show-current").stdout.strip()
+        assert ba == "feature/wt-A"
+        assert bb == "feature/wt-B"
+        # Writing in A must not affect B
+        with open(os.path.join(a, "only-a.txt"), "w") as f:
+            f.write("a")
+        assert not os.path.isfile(os.path.join(b, "only-a.txt"))
+
+
+# ---------------------------------------------------------------------------
+# remove_worktree
+# ---------------------------------------------------------------------------
+class TestRemoveWorktree:
+    def test_remove_deletes_worktree_dir(self, repos):
+        wt = ensure_worktree(repos, "feature/to-remove")
+        assert os.path.isdir(wt)
+        remove_worktree(repos, "feature/to-remove")
+        assert not os.path.isdir(wt)
+
+    def test_remove_nonexistent_is_noop(self, repos):
+        # Should not raise even if the worktree was never created.
+        remove_worktree(repos, "feature/never-made")
--- a/tests/test_launcher.py
+++ b/tests/test_launcher.py
@@ -0,0 +1,140 @@
+"""Tests for launcher critical functions and reviewer verdict parsing.
+
+Covers the audit-2026-06-02 fixes:
+  - B-1: _write_task_file writes the task file with a plain open() (no docker), now into
+         the per-branch worktree, and raises on write failure instead of failing silently.
+  - S-5: check_reviewer_verdict reads the machine-readable `verdict:` field from
+         the YAML frontmatter only (no fragile substring matching).
+"""
+import os
+import tempfile
+
+import pytest
+
+# Override env before importing app modules (same convention as test_qg.py)
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_launcher.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+
+from src.agents.launcher import AgentLauncher
+from src.qg.checks import check_reviewer_verdict
+
+
+# ---------------------------------------------------------------------------
+# B-1: _write_task_file
+# ---------------------------------------------------------------------------
+class TestWriteTaskFile:
+    """B-1 fix preserved + ORCH-2/S-4: task file now lands in the per-branch worktree.
+
+    _write_task_file(repo, branch, task_file, content) writes to
+    <worktrees_dir>/<repo>/<safe-branch>/<task_file> with a plain open() (no docker).
+    """
+
+    def _wt_dir(self, tmp_path, repo, branch):
+        from src.git_worktree import _safe
+        d = tmp_path / "_wt" / repo / _safe(branch)
+        d.mkdir(parents=True)
+        return d
+
+    def test_writes_to_worktree_path(self, tmp_path, monkeypatch):
+        """Task file is written to the worktree path, content matches (B-1 + S-4)."""
+        monkeypatch.setattr("src.git_worktree.settings.worktrees_dir", str(tmp_path / "_wt"))
+        wt = self._wt_dir(tmp_path, "enduro-trails", "feature/ET-001-x")
+
+        launcher = AgentLauncher()
+        launcher._write_task_file("enduro-trails", "feature/ET-001-x", ".task-dev.md", "hello-content")
+
+        written = wt / ".task-dev.md"
+        assert written.is_file()
+        assert written.read_text() == "hello-content"
+
+    def test_does_not_use_docker(self, tmp_path, monkeypatch):
+        """No subprocess/docker call: if subprocess.run were used it would error here."""
+        monkeypatch.setattr("src.git_worktree.settings.worktrees_dir", str(tmp_path / "_wt"))
+        self._wt_dir(tmp_path, "enduro-trails", "main")
+
+        called = {"run": False}
+
+        def _fail_run(*a, **k):
+            called["run"] = True
+            raise AssertionError("subprocess.run must not be called by _write_task_file")
+
+        monkeypatch.setattr("src.agents.launcher.subprocess.run", _fail_run)
+
+        launcher = AgentLauncher()
+        launcher._write_task_file("enduro-trails", "main", ".task.md", "x")
+        assert called["run"] is False
+
+    def test_raises_on_write_failure(self, tmp_path, monkeypatch):
+        """If the target worktree dir does not exist, raise RuntimeError (no silent fail)."""
+        monkeypatch.setattr("src.git_worktree.settings.worktrees_dir", str(tmp_path / "_wt"))
+        # worktree dir intentionally NOT created -> open() raises OSError
+
+        launcher = AgentLauncher()
+        with pytest.raises(RuntimeError):
+            launcher._write_task_file("nonexistent-repo", "main", ".task.md", "x")
+
+
+# ---------------------------------------------------------------------------
+# S-5: check_reviewer_verdict (frontmatter-only)
+# ---------------------------------------------------------------------------
+@pytest.fixture
+def review_repo(tmp_path, monkeypatch):
+    monkeypatch.setattr("src.qg.checks.settings.repos_dir", str(tmp_path))
+    wi_dir = tmp_path / "enduro-trails" / "docs" / "work-items" / "ET-001"
+    wi_dir.mkdir(parents=True)
+    return wi_dir
+
+
+def _write_review(wi_dir, text):
+    (wi_dir / "12-review.md").write_text(text)
+
+
+class TestCheckReviewerVerdict:
+    def test_approved_in_frontmatter(self, review_repo):
+        _write_review(review_repo, "---\ntype: review\nverdict: APPROVED\n---\n# Review\nbody\n")
+        passed, reason = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is True
+        assert "APPROVED" in reason
+
+    def test_request_changes_in_frontmatter(self, review_repo):
+        _write_review(review_repo, "---\ntype: review\nverdict: REQUEST_CHANGES\n---\n# Review\n")
+        passed, reason = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is False
+        assert "REQUEST_CHANGES" in reason
+
+    def test_lowercase_verdict_normalized(self, review_repo):
+        _write_review(review_repo, "---\nverdict: approved\n---\nbody\n")
+        passed, _ = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is True
+
+    def test_no_verdict_field_is_not_approved(self, review_repo):
+        # Frontmatter present but no verdict -> must NOT approve.
+        _write_review(review_repo, "---\ntype: review\nstatus: done\n---\nbody\n")
+        passed, reason = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is False
+        assert "verdict" in reason.lower()
+
+    def test_no_frontmatter_is_not_approved(self, review_repo):
+        # APPROVED appears only in body/table text -> must NOT cause false positive (S-5).
+        _write_review(review_repo, "# Review\n| Finding | Status |\n|---|---|\n| F-01 | APPROVED |\n")
+        passed, _ = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is False
+
+    def test_request_changes_in_body_does_not_block_approved_frontmatter(self, review_repo):
+        # Body mentions REQUEST_CHANGES in a table, but frontmatter verdict is APPROVED.
+        _write_review(
+            review_repo,
+            "---\nverdict: APPROVED\n---\n# Review\n"
+            "| Item | Old verdict |\n|---|---|\n| x | REQUEST_CHANGES |\n",
+        )
+        passed, reason = check_reviewer_verdict("enduro-trails", "ET-001")
+        assert passed is True
+        assert "APPROVED" in reason
+
+    def test_missing_file(self, review_repo):
+        passed, reason = check_reviewer_verdict("enduro-trails", "ET-999")
+        assert passed is False
+        assert "not found" in reason.lower()
--- a/tests/test_plane_webhook.py
+++ b/tests/test_plane_webhook.py
@@ -0,0 +1,180 @@
+"""ORCH-6: Plane webhook project-filter + repo-resolution tests.
+
+Verifies the core of the 2026-06-02 incident fix:
+  * webhook from an UNKNOWN Plane project  -> {"status": "ignored"} and no task
+  * webhook from the orchestrator project   -> task created with repo=orchestrator
+  * webhook from the enduro project          -> task created with repo=enduro-trails
+
+launcher.launch is mocked so no real agents are spawned. Gitea branch/doc
+creation is mocked (network). FastAPI TestClient drives the real endpoint.
+
+This module configures its own registry via monkeypatch + reload_projects so it
+is independent of ORCH_PROJECTS_JSON set by other test modules.
+"""
+
+import os
+import tempfile
+
+import pytest
+
+# Test DB / disable signature checks (same convention as test_webhooks.py).
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_plane.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
+os.environ.setdefault("ORCH_GITEA_WEBHOOK_SECRET", "")
+os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
+os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
+
+from unittest.mock import patch, AsyncMock  # noqa: E402
+
+from fastapi.testclient import TestClient  # noqa: E402
+
+from src.main import app  # noqa: E402
+from src.db import init_db, get_db  # noqa: E402
+from src import projects as P  # noqa: E402
+from src.projects import reload_projects  # noqa: E402
+
+ORCH_PLANE_ID = "8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a"
+ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
+UNKNOWN_PLANE_ID = "deadbeef-0000-0000-0000-000000000000"
+
+client = TestClient(app)
+
+
+@pytest.fixture(autouse=True)
+def setup(monkeypatch):
+    """Fresh DB + a known two-project registry for each test."""
+    # settings.db_path is resolved once at import; force it to our isolated DB so
+    # this suite is independent of whichever test module imported config first.
+    monkeypatch.setattr(P.settings, "db_path", _test_db)
+    import src.db as _db
+    monkeypatch.setattr(_db.settings, "db_path", _test_db)
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+    init_db()
+
+    # The webhook signature secret may be baked into the runtime env; this suite
+    # focuses on the project filter, so bypass signature verification.
+    monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
+
+    registry_json = (
+        f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
+        f' "work_item_prefix": "ET", "name": "enduro-trails"}},'
+        f' {{"plane_project_id": "{ORCH_PLANE_ID}", "repo": "orchestrator",'
+        f' "work_item_prefix": "ORCH", "name": "orchestrator"}}]'
+    )
+    monkeypatch.setattr(P.settings, "projects_json", registry_json)
+    reload_projects()
+
+    yield
+    monkeypatch.undo()  # revert patched settings first; monkeypatch's own teardown runs after this one
+    reload_projects()  # restore from env
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+
+
+def _post_created(plane_project_id, plane_id="wi-1", name="A valid work item title"):
+    return client.post(
+        "/webhook/plane",
+        json={
+            "event": "work_item.created",
+            "data": {
+                "id": plane_id,
+                "name": name,
+                "description_stripped": "This is a sufficiently long description.",
+                "project": plane_project_id,
+            },
+        },
+    )
+
+
+# ---------------------------------------------------------------------------
+# Filter: unknown project is ignored, no side effects
+# ---------------------------------------------------------------------------
+
+@patch("src.webhooks.plane.launcher")
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+def test_unknown_project_ignored(mock_branch, mock_docs, mock_launcher):
+    resp = _post_created(UNKNOWN_PLANE_ID, plane_id="ignore-me")
+    assert resp.status_code == 200
+    assert resp.json()["status"] == "ignored"
+    assert resp.json().get("reason") == "unknown project"
+
+    # No task, no branch, no agent.
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id='ignore-me'").fetchone()
+    conn.close()
+    assert task is None
+    mock_branch.assert_not_called()
+    mock_launcher.launch.assert_not_called()
+
+
+# ---------------------------------------------------------------------------
+# orchestrator project -> repo=orchestrator, prefix ORCH
+# ---------------------------------------------------------------------------
+
+@patch("src.webhooks.plane.launcher")
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+def test_orchestrator_project_routes_to_orchestrator_repo(mock_branch, mock_docs, mock_launcher):
+    mock_launcher.launch.return_value = 1
+    resp = _post_created(ORCH_PLANE_ID, plane_id="orch-1")
+    assert resp.status_code == 200
+    assert resp.json()["status"] == "accepted"
+
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id='orch-1'").fetchone()
+    conn.close()
+    assert task is not None
+    assert task["repo"] == "orchestrator"
+    assert task["work_item_id"].startswith("ORCH-")
+    assert task["stage"] == "analysis"
+    # Branch created against the orchestrator repo.
+    args = mock_branch.call_args.args
+    assert args[0] == "orchestrator"
+
+
+# ---------------------------------------------------------------------------
+# enduro project -> repo=enduro-trails, prefix ET
+# ---------------------------------------------------------------------------
+
+@patch("src.webhooks.plane.launcher")
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+def test_enduro_project_routes_to_enduro_repo(mock_branch, mock_docs, mock_launcher):
+    mock_launcher.launch.return_value = 1
+    resp = _post_created(ENDURO_PLANE_ID, plane_id="et-1")
+    assert resp.status_code == 200
+    assert resp.json()["status"] == "accepted"
+
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id='et-1'").fetchone()
+    conn.close()
+    assert task is not None
+    assert task["repo"] == "enduro-trails"
+    assert task["work_item_id"].startswith("ET-")
+    args = mock_branch.call_args.args
+    assert args[0] == "enduro-trails"
+
+
+# ---------------------------------------------------------------------------
+# prefixes are independent per repo (ORCH-001 vs ET-001 in parallel)
+# ---------------------------------------------------------------------------
+
+@patch("src.webhooks.plane.launcher")
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+def test_prefixes_independent_per_project(mock_branch, mock_docs, mock_launcher):
+    mock_launcher.launch.return_value = 1
+    _post_created(ORCH_PLANE_ID, plane_id="o1", name="Orchestrator item one")
+    _post_created(ENDURO_PLANE_ID, plane_id="e1", name="Enduro item one")
+    _post_created(ORCH_PLANE_ID, plane_id="o2", name="Orchestrator item two")
+
+    conn = get_db()
+    rows = {r["plane_id"]: r["work_item_id"] for r in
+            conn.execute("SELECT plane_id, work_item_id FROM tasks").fetchall()}
+    conn.close()
+    assert rows["o1"] == "ORCH-001"
+    assert rows["o2"] == "ORCH-002"
+    assert rows["e1"] == "ET-001"
--- a/tests/test_projects.py
+++ b/tests/test_projects.py
@@ -0,0 +1,177 @@
+"""ORCH-6: tests for the project registry (src/projects.py).
+
+Covers resolvers (by plane_id, by repo, unknown -> None, known ids) against the
+built-in default registry, plus ORCH_PROJECTS_JSON parsing (valid + malformed
+-> default fallback).
+
+The pure parser ``_parse_projects_json`` is tested directly so we don't mutate
+the module-global registry. Resolver tests run against the default registry; if
+another test (e.g. test_webhooks) set ORCH_PROJECTS_JSON in the env, we restore
+the default via monkeypatch + reload_projects to keep this file order-independent.
+"""
+
+import pytest
+
+from src import projects as P
+from src.projects import (
+    ProjectConfig,
+    get_project_by_plane_id,
+    get_project_by_repo,
+    known_plane_project_ids,
+    reload_projects,
+    _parse_projects_json,
+    _DEFAULT_PROJECTS,
+)
+
+# Known ids from the default registry / task spec.
+ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
+ORCH_PLANE_ID = "8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a"
+
+
+@pytest.fixture
+def default_registry(monkeypatch):
+    """Force the default (built-in) registry regardless of ORCH_PROJECTS_JSON
+    that other test modules may have set in the process env."""
+    monkeypatch.setattr(P.settings, "projects_json", "")
+    reload_projects()
+    yield
+    monkeypatch.undo()  # revert the patch first so the reload sees the real settings (whatever env says)
+    reload_projects()
+
+
+# ---------------------------------------------------------------------------
+# Resolvers
+# ---------------------------------------------------------------------------
+
+def test_get_project_by_plane_id_orchestrator(default_registry):
+    proj = get_project_by_plane_id(ORCH_PLANE_ID)
+    assert proj is not None
+    assert proj.repo == "orchestrator"
+    assert proj.work_item_prefix == "ORCH"
+    assert proj.plane_project_id == ORCH_PLANE_ID
+
+
+def test_get_project_by_plane_id_enduro(default_registry):
+    proj = get_project_by_plane_id(ENDURO_PLANE_ID)
+    assert proj is not None
+    assert proj.repo == "enduro-trails"
+    assert proj.work_item_prefix == "ET"
+
+
+def test_get_project_by_plane_id_unknown_returns_none(default_registry):
+    assert get_project_by_plane_id("00000000-0000-0000-0000-000000000000") is None
+
+
+def test_get_project_by_plane_id_empty_returns_none(default_registry):
+    assert get_project_by_plane_id("") is None
+    assert get_project_by_plane_id(None) is None
+
+
+def test_get_project_by_repo(default_registry):
+    assert get_project_by_repo("enduro-trails").work_item_prefix == "ET"
+    assert get_project_by_repo("orchestrator").work_item_prefix == "ORCH"
+
+
+def test_get_project_by_repo_unknown_returns_none(default_registry):
+    assert get_project_by_repo("does-not-exist") is None
+    assert get_project_by_repo("") is None
+    assert get_project_by_repo(None) is None
+
+
+def test_known_plane_project_ids(default_registry):
+    ids = known_plane_project_ids()
+    assert isinstance(ids, set)
+    assert ENDURO_PLANE_ID in ids
+    assert ORCH_PLANE_ID in ids
+    assert len(ids) == len(_DEFAULT_PROJECTS)
+
+
+# ---------------------------------------------------------------------------
+# ORCH_PROJECTS_JSON parsing (pure function, no global mutation)
+# ---------------------------------------------------------------------------
+
+def test_parse_empty_returns_none():
+    assert _parse_projects_json("") is None
+    assert _parse_projects_json("   ") is None
+    assert _parse_projects_json(None) is None
+
+
+def test_parse_valid_json():
+    raw = (
+        '[{"plane_project_id": "p-1", "repo": "repo-a", '
+        '"work_item_prefix": "AAA", "name": "Alpha"}]'
+    )
+    parsed = _parse_projects_json(raw)
+    assert parsed is not None
+    assert len(parsed) == 1
+    assert isinstance(parsed[0], ProjectConfig)
+    assert parsed[0].plane_project_id == "p-1"
+    assert parsed[0].repo == "repo-a"
+    assert parsed[0].work_item_prefix == "AAA"
+    assert parsed[0].name == "Alpha"
+
+
+def test_parse_valid_json_multiple():
+    raw = (
+        '[{"plane_project_id": "p-1", "repo": "repo-a", "work_item_prefix": "A"},'
+        ' {"plane_project_id": "p-2", "repo": "repo-b", "work_item_prefix": "B"}]'
+    )
+    parsed = _parse_projects_json(raw)
+    assert len(parsed) == 2
+    # name defaults to repo when omitted
+    assert parsed[0].name == "repo-a"
+    assert parsed[1].repo == "repo-b"
+
+
+def test_parse_malformed_json_returns_none():
+    assert _parse_projects_json("{not valid json") is None
+    assert _parse_projects_json("[}") is None
+
+
+def test_parse_not_an_array_returns_none():
+    # A JSON object (not array) is invalid -> fallback.
+    assert _parse_projects_json('{"plane_project_id": "p-1"}') is None
+
+
+def test_parse_skips_bad_entries_keeps_good():
+    raw = (
+        '[{"repo": "missing-id"},'  # missing required key -> skipped
+        ' {"plane_project_id": "p-2", "repo": "repo-b", "work_item_prefix": "B"}]'
+    )
+    parsed = _parse_projects_json(raw)
+    assert parsed is not None
+    assert len(parsed) == 1
+    assert parsed[0].plane_project_id == "p-2"
+
+
+def test_parse_all_bad_entries_returns_none():
+    # No valid entries -> None (fallback to default).
+    assert _parse_projects_json('[{"repo": "no-id"}, "not-an-object"]') is None
+
+
+def test_reload_from_custom_json(monkeypatch):
+    """End-to-end: set settings.projects_json, reload, resolvers reflect it."""
+    custom = (
+        '[{"plane_project_id": "custom-uuid", "repo": "custom-repo", '
+        '"work_item_prefix": "CUS", "name": "Custom"}]'
+    )
+    monkeypatch.setattr(P.settings, "projects_json", custom)
+    reload_projects()
+    try:
+        assert get_project_by_plane_id("custom-uuid").repo == "custom-repo"
+        assert get_project_by_repo("custom-repo").work_item_prefix == "CUS"
+        assert known_plane_project_ids() == {"custom-uuid"}
+        assert get_project_by_plane_id(ENDURO_PLANE_ID) is None  # defaults must NOT survive an override
+    finally:
+        monkeypatch.undo()  # revert the patch first; otherwise the reload re-applies the custom registry
+        reload_projects()
+
+
+def test_reload_invalid_json_falls_back_to_default(monkeypatch):
+    monkeypatch.setattr(P.settings, "projects_json", "{garbage")
+    reload_projects()
+    try:
+        assert get_project_by_plane_id(ENDURO_PLANE_ID) is not None
+        assert get_project_by_plane_id(ORCH_PLANE_ID) is not None
+    finally:
+        reload_projects()
--- a/tests/test_qg.py
+++ b/tests/test_qg.py
@@ -0,0 +1,188 @@
+import pytest
+import os
+import tempfile
+from unittest.mock import patch, MagicMock
+import httpx
+
+# Override DB path before importing app
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+
+from src.qg.checks import (
+    check_analysis_complete,
+    check_architecture_done,
+    check_ci_green,
+    check_review_approved,
+    check_tests_passed,
+)
+
+
+@pytest.fixture(autouse=True)
+def setup_work_item_dir(tmp_path, monkeypatch):
+    """Create temp repo structure for filesystem checks."""
+    monkeypatch.setattr("src.qg.checks.settings.repos_dir", str(tmp_path))
+    repo_dir = tmp_path / "enduro-trails"
+    repo_dir.mkdir()
+    return repo_dir
+
+
+class TestCheckAnalysisComplete:
+    def test_all_files_present(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
+        wi_dir.mkdir(parents=True)
+        (wi_dir / "01-brd.md").write_text("# BRD")
+        (wi_dir / "02-trz.md").write_text("# TRZ")
+        (wi_dir / "03-acceptance-criteria.md").write_text("# AC")
+        (wi_dir / "04-test-plan.yaml").write_text("tests: []")
+
+        passed, reason = check_analysis_complete("enduro-trails", "ET-001")
+        assert passed is True
+
+    def test_missing_files(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        wi_dir = repo_dir / "docs" / "work-items" / "ET-002"
+        wi_dir.mkdir(parents=True)
+        (wi_dir / "01-brd.md").write_text("# BRD")
+
+        passed, reason = check_analysis_complete("enduro-trails", "ET-002")
+        assert passed is False
+        assert "Missing files" in reason
+
+    def test_no_directory(self, setup_work_item_dir):
+        passed, reason = check_analysis_complete("enduro-trails", "ET-999")
+        assert passed is False
+
+
+class TestCheckArchitectureDone:
+    def test_adr_directory_with_files(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        adr_dir = repo_dir / "docs" / "work-items" / "ET-001" / "06-adr"
+        adr_dir.mkdir(parents=True)
+        (adr_dir / "001-use-postgres.md").write_text("# ADR")
+
+        passed, reason = check_architecture_done("enduro-trails", "ET-001")
+        assert passed is True
+
+    def test_infra_requirements(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
+        wi_dir.mkdir(parents=True)
+        (wi_dir / "07-infra-requirements.md").write_text("# Infra")
+
+        passed, reason = check_architecture_done("enduro-trails", "ET-001")
+        assert passed is True
+
+    def test_empty_adr_directory(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        adr_dir = repo_dir / "docs" / "work-items" / "ET-001" / "06-adr"
+        adr_dir.mkdir(parents=True)
+
+        passed, reason = check_architecture_done("enduro-trails", "ET-001")
+        assert passed is False
+
+    def test_nothing_present(self, setup_work_item_dir):
+        passed, reason = check_architecture_done("enduro-trails", "ET-001")
+        assert passed is False
+
+
+class TestCheckCIGreen:
+    @patch("src.qg.checks.httpx.get")
+    def test_ci_success(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = {"state": "success"}
+        mock_resp.raise_for_status = MagicMock()
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_ci_green("enduro-trails", "feature/ET-001-test")
+        assert passed is True
+        assert "green" in reason.lower()
+
+    @patch("src.qg.checks.httpx.get")
+    def test_ci_pending(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = {"state": "pending"}
+        mock_resp.raise_for_status = MagicMock()
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_ci_green("enduro-trails", "feature/ET-001-test")
+        assert passed is False
+
+    @patch("src.qg.checks.httpx.get")
+    def test_ci_branch_not_found(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 404
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_ci_green("enduro-trails", "nonexistent")
+        assert passed is False
+
+
+class TestCheckReviewApproved:
+    @patch("src.qg.checks.httpx.get")
+    def test_approved(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = [
+            {"state": "APPROVED", "user": {"login": "reviewer1"}}
+        ]
+        mock_resp.raise_for_status = MagicMock()
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_review_approved("enduro-trails", 1)
+        assert passed is True
+
+    @patch("src.qg.checks.httpx.get")
+    def test_changes_requested(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = [
+            {"state": "REQUEST_CHANGES", "user": {"login": "reviewer1"}}
+        ]
+        mock_resp.raise_for_status = MagicMock()
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_review_approved("enduro-trails", 1)
+        assert passed is False
+        assert "Changes requested" in reason
+
+    @patch("src.qg.checks.httpx.get")
+    def test_no_reviews(self, mock_get):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = []
+        mock_resp.raise_for_status = MagicMock()
+        mock_get.return_value = mock_resp
+
+        passed, reason = check_review_approved("enduro-trails", 1)
+        assert passed is False
+
+
+class TestCheckTestsPassed:
+    def test_report_with_pass(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
+        wi_dir.mkdir(parents=True)
+        (wi_dir / "13-test-report.md").write_text("# Test Report\n\nResult: PASS\n")
+
+        passed, reason = check_tests_passed("enduro-trails", "ET-001")
+        assert passed is True
+
+    def test_report_without_pass(self, setup_work_item_dir):
+        repo_dir = setup_work_item_dir
+        wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
+        wi_dir.mkdir(parents=True)
+        (wi_dir / "13-test-report.md").write_text("# Test Report\n\nResult: FAIL\n")
+
+        passed, reason = check_tests_passed("enduro-trails", "ET-001")
+        assert passed is False
+
+    def test_no_report(self, setup_work_item_dir):
+        passed, reason = check_tests_passed("enduro-trails", "ET-001")
+        assert passed is False
+        assert "not found" in reason.lower()
--- a/tests/test_queue.py
+++ b/tests/test_queue.py
@@ -0,0 +1,304 @@
+"""Tests for ORCH-1 (F-2b) persistent job queue.
+
+Covers:
+  - enqueue_job -> claim_next_job -> mark_job lifecycle
+  - claim_next_job atomicity (no double-dispatch of the same job)
+  - retry: fail -> requeue while attempts < max_attempts, then failed
+  - requeue_running_jobs (queue-recovery)
+  - count_running_jobs / job_status_counts / recent_jobs
+  - QueueWorker respects max_concurrency (Popen / launch fully mocked)
+
+The real claude/Popen is NEVER spawned: launcher.launch_job is mocked in worker
+tests, and the launcher finalize logic is exercised directly via mark_job.
+"""
+import os
+import tempfile
+
+import pytest
+
+# Override env before importing app modules (same convention as test_qg.py).
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_queue.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+
+import src.db as db
+from src.db import (
+    init_db,
+    enqueue_job,
+    claim_next_job,
+    mark_job,
+    count_running_jobs,
+    requeue_running_jobs,
+    get_job,
+    job_status_counts,
+    recent_jobs,
+)
+
+
+@pytest.fixture(autouse=True)
+def fresh_db(tmp_path, monkeypatch):
+    """Point the DB at a fresh per-test sqlite file and init the schema."""
+    dbfile = tmp_path / "queue.db"
+    monkeypatch.setattr(db.settings, "db_path", str(dbfile))
+    init_db()
+    yield
+
+
+# ---------------------------------------------------------------------------
+# enqueue / claim / mark lifecycle
+# ---------------------------------------------------------------------------
+class TestLifecycle:
+    def test_enqueue_creates_queued_job(self):
+        jid = enqueue_job("analyst", "enduro-trails", "task body", task_id=7)
+        job = get_job(jid)
+        assert job["status"] == "queued"
+        assert job["agent"] == "analyst"
+        assert job["repo"] == "enduro-trails"
+        assert job["task_content"] == "task body"
+        assert job["task_id"] == 7
+        assert job["attempts"] == 0
+        assert job["max_attempts"] == 2
+
+    def test_claim_marks_running_and_increments_attempts(self):
+        jid = enqueue_job("developer", "repo")
+        claimed = claim_next_job()
+        assert claimed is not None
+        assert claimed["id"] == jid
+        assert claimed["status"] == "running"
+        assert claimed["attempts"] == 1
+        assert count_running_jobs() == 1
+
+    def test_claim_empty_queue_returns_none(self):
+        assert claim_next_job() is None
+
+    def test_claim_is_fifo(self):
+        a = enqueue_job("analyst", "r")
+        b = enqueue_job("developer", "r")
+        assert claim_next_job()["id"] == a
+        assert claim_next_job()["id"] == b
+
+    def test_mark_done(self):
+        jid = enqueue_job("tester", "r")
+        claim_next_job()
+        mark_job(jid, "done", run_id=42)
+        job = get_job(jid)
+        assert job["status"] == "done"
+        assert job["run_id"] == 42
+        assert job["finished_at"] is not None
+        assert count_running_jobs() == 0
+
+    def test_mark_failed_records_error(self):
+        jid = enqueue_job("tester", "r")
+        claim_next_job()
+        mark_job(jid, "failed", run_id=9, error="boom")
+        job = get_job(jid)
+        assert job["status"] == "failed"
+        assert job["error"] == "boom"
+        assert job["finished_at"] is not None
+
+
+# ---------------------------------------------------------------------------
+# claim atomicity — no double dispatch
+# ---------------------------------------------------------------------------
+class TestClaimAtomicity:
+    def test_single_job_claimed_once(self):
+        jid = enqueue_job("analyst", "r")
+        first = claim_next_job()
+        second = claim_next_job()
+        assert first["id"] == jid
+        assert second is None  # already running, not re-dispatched
+
+    def test_concurrent_claims_no_duplicate(self):
+        """Many enqueued jobs claimed from parallel threads -> each claimed once."""
+        import threading
+
+        n = 20
+        for _ in range(n):
+            enqueue_job("developer", "r")
+
+        claimed_ids = []
+        lock = threading.Lock()
+
+        def grab():
+            while True:
+                job = claim_next_job()
+                if job is None:
+                    return
+                with lock:
+                    claimed_ids.append(job["id"])
+
+        threads = [threading.Thread(target=grab) for _ in range(8)]
+        for t in threads:
+            t.start()
+        for t in threads:
+            t.join()
+
+        assert len(claimed_ids) == n
+        assert len(set(claimed_ids)) == n  # no id claimed twice
+        assert count_running_jobs() == n
+
+
+# ---------------------------------------------------------------------------
+# retry semantics (mirrors launcher._finalize_job logic)
+# ---------------------------------------------------------------------------
+class TestRetry:
+    def test_fail_requeues_while_under_max(self):
+        jid = enqueue_job("developer", "r", max_attempts=2)
+        job = claim_next_job()              # attempts=1
+        assert job["attempts"] == 1
+        # attempts(1) < max(2) -> requeue
+        mark_job(jid, "queued", error="exit 1")
+        j = get_job(jid)
+        assert j["status"] == "queued"
+        assert j["error"] == "exit 1"
+        assert j["started_at"] is None      # requeue clears started_at
+
+    def test_fail_fails_when_max_reached(self):
+        jid = enqueue_job("developer", "r", max_attempts=2)
+        claim_next_job()                    # attempts=1 -> requeue
+        mark_job(jid, "queued")
+        job2 = claim_next_job()             # attempts=2
+        assert job2["attempts"] == 2
+        # attempts(2) >= max(2) -> failed
+        mark_job(jid, "failed", error="exit 1")
+        assert get_job(jid)["status"] == "failed"
+
+    def test_finalize_job_done(self):
+        """launcher._finalize_job marks done on exit_code 0 (no Popen needed)."""
+        from src.agents.launcher import AgentLauncher
+        jid = enqueue_job("analyst", "r")
+        claim_next_job()
+        AgentLauncher()._finalize_job(jid, "analyst", run_id=5, exit_code=0)
+        assert get_job(jid)["status"] == "done"
+
+    def test_finalize_job_requeue_then_fail(self, monkeypatch):
+        from src.agents.launcher import AgentLauncher
+        # Silence the Telegram notification side effect.
+        monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
+        lr = AgentLauncher()
+        jid = enqueue_job("developer", "r", max_attempts=2)
+
+        claim_next_job()                    # attempts=1
+        lr._finalize_job(jid, "developer", run_id=1, exit_code=2)
+        assert get_job(jid)["status"] == "queued"  # 1 < 2 -> requeue
+
+        claim_next_job()                    # attempts=2
+        lr._finalize_job(jid, "developer", run_id=2, exit_code=2)
+        assert get_job(jid)["status"] == "failed"  # 2 >= 2 -> failed
+
+
+# ---------------------------------------------------------------------------
+# queue-recovery
+# ---------------------------------------------------------------------------
+class TestRequeueRunning:
+    def test_requeue_running_jobs(self):
+        a = enqueue_job("analyst", "r")
+        b = enqueue_job("developer", "r")
+        claim_next_job()  # a -> running
+        claim_next_job()  # b -> running
+        assert count_running_jobs() == 2
+        n = requeue_running_jobs()
+        assert n == 2
+        assert count_running_jobs() == 0
+        assert get_job(a)["status"] == "queued"
+        assert get_job(b)["status"] == "queued"
+
+    def test_requeue_preserves_attempts(self):
+        jid = enqueue_job("analyst", "r")
+        claim_next_job()  # attempts=1
+        requeue_running_jobs()
+        assert get_job(jid)["attempts"] == 1  # not reset
+
+
+# ---------------------------------------------------------------------------
+# observability helpers
+# ---------------------------------------------------------------------------
+class TestObservability:
+    def test_status_counts(self):
+        enqueue_job("analyst", "r")        # enqueued first -> claimed below (FIFO), running
+        enqueue_job("developer", "r")      # stays queued
+        claim_next_job()
+        counts = job_status_counts()
+        assert counts["running"] == 1
+        assert counts["queued"] == 1
+        assert counts["done"] == 0
+        assert counts["failed"] == 0
+
+    def test_recent_jobs_desc(self):
+        ids = [enqueue_job("analyst", "r") for _ in range(3)]
+        recent = recent_jobs(10)
+        assert [r["id"] for r in recent] == sorted(ids, reverse=True)
+
+
+# ---------------------------------------------------------------------------
+# QueueWorker max_concurrency (launch_job fully mocked — no real Popen)
+# ---------------------------------------------------------------------------
+class TestWorkerConcurrency:
+    @pytest.fixture(autouse=True)
+    def _ok_preflight(self, monkeypatch):
+        # ORCH-1 resilience: the worker gates claims behind preflight; in tests there
+        # is no claude binary, so stub preflight OK to exercise pure queue/concurrency.
+        monkeypatch.setattr("src.queue_worker.preflight.check", lambda *a, **k: (True, "ok"))
+
+    def test_worker_respects_max_concurrency(self, monkeypatch):
+        from src.queue_worker import QueueWorker
+
+        launched = []
+
+        def fake_launch_job(job):
+            # Simulate a long-running agent: the job stays 'running' (we do NOT
+            # mark it done), so the slot remains occupied.
+            launched.append(job["id"])
+            return 100 + job["id"]
+
+        monkeypatch.setattr("src.queue_worker.launcher.launch_job", fake_launch_job)
+
+        for _ in range(5):
+            enqueue_job("developer", "r")
+
+        w = QueueWorker(max_concurrency=2, poll_interval=0.01)
+        w._drain_once()
+
+        # Only max_concurrency jobs may be launched / running at once.
+        assert len(launched) == 2
+        assert count_running_jobs() == 2
+
+    def test_worker_drains_as_slots_free(self, monkeypatch):
+        from src.queue_worker import QueueWorker
+
+        def fake_launch_job(job):
+            # Immediately complete the job so the slot frees for the next claim.
+            mark_job(job["id"], "done", run_id=job["id"])
+            return job["id"]
+
+        monkeypatch.setattr("src.queue_worker.launcher.launch_job", fake_launch_job)
+
+        for _ in range(4):
+            enqueue_job("analyst", "r")
+
+        w = QueueWorker(max_concurrency=1, poll_interval=0.01)
+        w._drain_once()
+
+        # With instant completion and concurrency 1, one drain pass empties the queue.
+        assert job_status_counts()["done"] == 4
+        assert count_running_jobs() == 0
+
+    def test_worker_launch_failure_does_not_wedge_slot(self, monkeypatch):
+        from src.queue_worker import QueueWorker
+
+        def boom(job):
+            raise RuntimeError("repo missing")
+
+        monkeypatch.setattr("src.queue_worker.launcher.launch_job", boom)
+        monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
+
+        enqueue_job("developer", "r", max_attempts=1)
+        w = QueueWorker(max_concurrency=1, poll_interval=0.01)
+        w._drain_once()
+
+        # attempts=1 >= max_attempts=1 -> failed, not stuck running.
+        assert count_running_jobs() == 0
+        counts = job_status_counts()
+        assert counts["failed"] == 1
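The atomicity tests in tests/test_queue.py above rely on claim_next_job never handing the same job to two callers. The implementation in src/db.py is not part of this excerpt, so the following is only a minimal sketch of one way to get that guarantee with sqlite3, assuming a simplified hypothetical `jobs` table and hypothetical helper names (`connect`, `init`, `enqueue`, `claim_next`, `demo`). It takes the write lock with BEGIN IMMEDIATE before reading, so the SELECT and the UPDATE happen as one step.

```python
import os
import sqlite3
import tempfile
import threading


def connect(path):
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / COMMIT statements below are issued explicitly by this code.
    return sqlite3.connect(path, timeout=30, isolation_level=None)


def init(path):
    # Hypothetical, simplified schema: the real jobs table has more columns.
    conn = connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "status TEXT NOT NULL DEFAULT 'queued', "
        "attempts INTEGER NOT NULL DEFAULT 0)"
    )
    conn.close()


def enqueue(path):
    conn = connect(path)
    jid = conn.execute("INSERT INTO jobs DEFAULT VALUES").lastrowid
    conn.close()
    return jid


def claim_next(path):
    """Move the oldest queued job to 'running'; return its id, or None if empty."""
    conn = connect(path)
    try:
        # Take the write lock first; competing claimers wait (up to `timeout`).
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        conn.execute(
            "UPDATE jobs SET status = 'running', attempts = attempts + 1 WHERE id = ?",
            (row[0],),
        )
        conn.execute("COMMIT")
        return row[0]
    finally:
        conn.close()


def demo(n_jobs=10, n_threads=4):
    """Claim n_jobs from n_threads threads; return the list of claimed ids."""
    path = os.path.join(tempfile.mkdtemp(), "queue.db")
    init(path)
    for _ in range(n_jobs):
        enqueue(path)

    claimed, lock = [], threading.Lock()

    def grab():
        while True:
            jid = claim_next(path)
            if jid is None:
                return
            with lock:
                claimed.append(jid)

    threads = [threading.Thread(target=grab) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Calling `demo()` mirrors test_concurrent_claims_no_duplicate: every id should appear exactly once in the returned list. A single `UPDATE ... RETURNING` statement would be an alternative on SQLite 3.35 or newer; the explicit transaction is used here only because it works on older versions too.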
--- a/tests/test_resilience.py
+++ b/tests/test_resilience.py
@@ -0,0 +1,295 @@
+"""ORCH-1 resilience tests: preflight, 429-classifier, backoff, circuit breaker.
+
+No real claude/Popen is ever spawned: preflight subprocess and launcher.launch_job
+are mocked. DB is a fresh per-test sqlite file.
+"""
+import os
+import tempfile
+
+import pytest
+
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_resilience.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+
+import src.db as db
+from src.db import (
+    init_db, enqueue_job, claim_next_job, get_job, count_running_jobs,
+    mark_job_transient,
+)
+from src import preflight, error_classifier
+from src.error_classifier import classify_text, parse_retry_after, classify_log_file
+from src.queue_worker import QueueWorker, CircuitBreaker
+from src.agents.launcher import AgentLauncher
+
+
+@pytest.fixture(autouse=True)
+def fresh_db(tmp_path, monkeypatch):
+    monkeypatch.setattr(db.settings, "db_path", str(tmp_path / "res.db"))
+    init_db()
+    preflight.reset_cache()
+    yield
+
+
+# ---------------------------------------------------------------------------
+# A. Preflight
+# ---------------------------------------------------------------------------
+class TestPreflight:
+    def test_fail_when_bin_missing(self, monkeypatch):
+        monkeypatch.setattr(preflight, "_claude_bin", lambda: "/no/such/claude")
+        ok, reason = preflight.check(force=True)
+        assert ok is False
+        assert "not found" in reason.lower()
+
+    def test_ok_when_version_succeeds(self, monkeypatch, tmp_path):
+        fake_bin = tmp_path / "claude"
+        fake_bin.write_text("#!/bin/sh\necho v1\n")
+        monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
+        monkeypatch.setattr(preflight, "_run_version", lambda b: (True, "1.2.3"))
+        ok, reason = preflight.check(force=True)
+        assert ok is True
+
+    def test_cache_does_not_recheck_within_ttl(self, monkeypatch, tmp_path):
+        fake_bin = tmp_path / "claude"
+        fake_bin.write_text("x")
+        monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
+        monkeypatch.setattr(db.settings, "preflight_cache_ttl", 999)
+
+        calls = {"n": 0}
+
+        def counting_version(b):
+            calls["n"] += 1
+            return True, "ok"
+
+        monkeypatch.setattr(preflight, "_run_version", counting_version)
+        preflight.reset_cache()
+        preflight.check()           # first -> runs version
+        preflight.check()           # cached -> no extra version call
+        preflight.check()
+        assert calls["n"] == 1
+
+    def test_force_bypasses_cache(self, monkeypatch, tmp_path):
+        fake_bin = tmp_path / "claude"
+        fake_bin.write_text("x")
+        monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
+        calls = {"n": 0}
+        monkeypatch.setattr(preflight, "_run_version",
+                            lambda b: (calls.__setitem__("n", calls["n"] + 1), (True, "ok"))[1])
+        preflight.reset_cache()
+        preflight.check()
+        preflight.check(force=True)
+        assert calls["n"] == 2
+
+    def test_worker_does_not_claim_when_preflight_fails(self, monkeypatch):
+        # Preflight FAIL -> job stays queued, launch_job never called.
+        monkeypatch.setattr("src.queue_worker.preflight.check",
+                            lambda *a, **k: (False, "down"))
+        called = {"launch": False}
+        monkeypatch.setattr("src.queue_worker.launcher.launch_job",
+                            lambda job: called.__setitem__("launch", True))
+        jid = enqueue_job("analyst", "r")
+        QueueWorker(max_concurrency=1, poll_interval=0.01)._drain_once()
+        assert called["launch"] is False
+        assert get_job(jid)["status"] == "queued"
+        assert count_running_jobs() == 0
+
+
+# ---------------------------------------------------------------------------
+# B. Error classifier
+# ---------------------------------------------------------------------------
+class TestClassifier:
+    @pytest.mark.parametrize("text", [
+        "Error: 429 Too Many Requests",
+        "anthropic rate limit exceeded",
+        "overloaded_error: server is overloaded",
+        "API quota exhausted",
+        "503 Service Unavailable",
+        "connection reset by peer",
+    ])
+    def test_transient_patterns(self, text):
+        assert classify_text(text) == "transient"
+
+    @pytest.mark.parametrize("text", [
+        "Traceback: KeyError 'foo'",
+        "SyntaxError: invalid syntax",
+        "assertion failed in test",
+        "",
+    ])
+    def test_permanent_patterns(self, text):
+        assert classify_text(text) == "permanent"
+
+    def test_retry_after_header(self):
+        assert parse_retry_after("HTTP/1.1 429\nRetry-After: 42\n") == 42
+
+    def test_retry_after_json(self):
+        assert parse_retry_after('{"error":{"type":"rate_limit","retry_after": 7}}') == 7
+
+    def test_retry_after_absent(self):
+        assert parse_retry_after("just an error") is None
+
+    def test_classify_log_file(self, tmp_path):
+        p = tmp_path / "run.log"
+        p.write_text("...lots of output...\n429 rate limit. Retry-After: 30\n")
+        kind, ra = classify_log_file(str(p))
+        assert kind == "transient"
+        assert ra == 30
+
+    def test_classify_missing_file_is_permanent(self):
+        kind, ra = classify_log_file("/no/such/log")
+        assert kind == "permanent"
+        assert ra is None
+
+
+# ---------------------------------------------------------------------------
+# C. Backoff + available_at gating
+# ---------------------------------------------------------------------------
+class TestBackoff:
+    def test_backoff_grows_exponentially(self):
+        lr = AgentLauncher()
+        # base=10, cap=600 (defaults)
+        b1 = lr._backoff_seconds(1)
+        b2 = lr._backoff_seconds(2)
+        b3 = lr._backoff_seconds(3)
+        assert b1 == 20      # 2^1*10
+        assert b2 == 40      # 2^2*10
+        assert b3 == 80      # 2^3*10
+        assert b2 > b1 and b3 > b2
+
+    def test_backoff_capped(self):
+        lr = AgentLauncher()
+        assert lr._backoff_seconds(20) == 600  # capped at backoff_max_seconds
+
+    def test_retry_after_respected_when_larger(self):
+        lr = AgentLauncher()
+        # transient_attempts=1 -> base backoff 20; Retry-After=120 wins.
+        assert lr._backoff_seconds(1, retry_after=120) == 120
+
+    def test_retry_after_ignored_when_smaller(self):
+        lr = AgentLauncher()
+        assert lr._backoff_seconds(3, retry_after=5) == 80  # computed backoff (80) exceeds Retry-After (5)
+
+    def test_transient_requeue_sets_future_available_at_and_claim_skips(self):
+        jid = enqueue_job("developer", "r")
+        claim_next_job()
+        # Big backoff -> available_at far in the future.
+        mark_job_transient(jid, 3600, error="429")
+        job = get_job(jid)
+        assert job["status"] == "queued"
+        assert job["transient_attempts"] == 1
+        assert job["available_at"] is not None
+        # claim must NOT pick it up while available_at is in the future.
+        assert claim_next_job() is None
+
+    def test_transient_requeue_claimable_when_due(self):
+        jid = enqueue_job("developer", "r")
+        claim_next_job()
+        mark_job_transient(jid, -5, error="429")  # available_at in the past
+        c = claim_next_job()
+        assert c is not None and c["id"] == jid
+
+
+# ---------------------------------------------------------------------------
+# D. Launcher transient/permanent finalize (no Popen)
+# ---------------------------------------------------------------------------
+class TestFinalizeClassified:
+    def test_transient_failure_backoff_requeue(self, tmp_path, monkeypatch):
+        monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
+        log = tmp_path / "1.log"
+        log.write_text("Error 429 rate limit exceeded\n")
+        jid = enqueue_job("developer", "r", max_attempts=2)
+        claim_next_job()
+        AgentLauncher()._finalize_job(jid, "developer", run_id=1, exit_code=1,
+                                      output_path=str(log))
+        job = get_job(jid)
+        assert job["status"] == "queued"
+        assert job["transient_attempts"] == 1
+        assert job["available_at"] is not None     # backoff-gated
+        assert job["attempts"] == 1                 # code-fault budget NOT burned
+
+    def test_permanent_failure_uses_normal_attempts(self, tmp_path, monkeypatch):
+        monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
+        log = tmp_path / "2.log"
+        log.write_text("Traceback: ValueError\n")
+        jid = enqueue_job("developer", "r", max_attempts=2)
+        claim_next_job()
+        AgentLauncher()._finalize_job(jid, "developer", run_id=2, exit_code=1,
+                                      output_path=str(log))
+        job = get_job(jid)
+        assert job["status"] == "queued"
+        assert job["transient_attempts"] == 0       # not transient
+        assert job["available_at"] is None          # no backoff for code-fault
+
+    def test_transient_exhausts_to_failed(self, tmp_path, monkeypatch):
+        monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
+        monkeypatch.setattr(db.settings, "transient_max_attempts", 2)
+        log = tmp_path / "3.log"
+        log.write_text("overloaded_error\n")
+        lr = AgentLauncher()
+        jid = enqueue_job("developer", "r")
+        claim_next_job()
+        lr._finalize_job(jid, "developer", 1, exit_code=1, output_path=str(log))
+        assert get_job(jid)["status"] == "queued"   # transient 1 -> requeue
+        # force claimable and retry
+        mark_job_transient(jid, -1)                  # makes it due; transient=2 now
+        claim_next_job()
+        lr._finalize_job(jid, "developer", 2, exit_code=1, output_path=str(log))
+        assert get_job(jid)["status"] == "failed"    # transient budget exhausted
+
+
+# ---------------------------------------------------------------------------
+# E. Circuit breaker
+# ---------------------------------------------------------------------------
+class TestCircuitBreaker:
+    def test_opens_after_threshold(self):
+        cb = CircuitBreaker(threshold=3, pause_seconds=300)
+        assert cb.allow_claim() is True
+        cb.record_transient()
+        cb.record_transient()
+        assert cb.state == "closed"
+        cb.record_transient()                 # 3rd -> open
+        assert cb.state == "open"
+        assert cb.allow_claim() is False      # paused, no CLI calls
+
+    def test_recovered_resets_streak(self):
+        cb = CircuitBreaker(threshold=3)
+        cb.record_transient()
+        cb.record_transient()
+        cb.record_recovered()
+        assert cb.consecutive_transient == 0
+        assert cb.state == "closed"
+
+    def test_half_open_after_pause_then_closed_on_success(self, monkeypatch):
+        cb = CircuitBreaker(threshold=2, pause_seconds=300)
+        cb.record_transient()
+        cb.record_transient()                 # open
+        assert cb.state == "open"
+        # Simulate the pause elapsing.
+        cb.opened_at -= 301
+        assert cb.allow_claim() is True       # -> half-open (probe)
+        assert cb.state == "half-open"
+        cb.record_recovered()                 # probe succeeded
+        assert cb.state == "closed"
+
+    def test_half_open_reopens_on_transient(self):
+        cb = CircuitBreaker(threshold=2, pause_seconds=300)
+        cb.record_transient(); cb.record_transient()   # open
+        cb.opened_at -= 301
+        cb.allow_claim()                      # half-open
+        assert cb.state == "half-open"
+        cb.record_transient()                 # probe failed -> re-open
+        assert cb.state == "open"
+
+    def test_breaker_blocks_worker_claim(self, monkeypatch):
+        monkeypatch.setattr("src.queue_worker.preflight.check",
+                            lambda *a, **k: (True, "ok"))
+        called = {"launch": False}
+        monkeypatch.setattr("src.queue_worker.launcher.launch_job",
+                            lambda job: called.__setitem__("launch", True))
+        cb = CircuitBreaker(threshold=1, pause_seconds=300)
+        cb.record_transient()                 # open immediately
+        w = QueueWorker(max_concurrency=1, poll_interval=0.01, breaker=cb)
+        enqueue_job("analyst", "r")
+        w._drain_once()
+        assert called["launch"] is False      # breaker open -> no claim, no CLI
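The backoff and circuit-breaker tests in tests/test_resilience.py above pin down behaviour whose implementation (AgentLauncher._backoff_seconds and queue_worker.CircuitBreaker) is not shown in this excerpt. The sketch below is one possible reading that satisfies those assertions, not the project's actual code. The free function `backoff_seconds`, the `clock` parameter, and the choice to leave Retry-After uncapped are assumptions; the attribute and method names on the breaker are taken from the tests.

```python
import time


def backoff_seconds(transient_attempts, retry_after=None, base=10, cap=600):
    """Exponential backoff: base * 2^n, capped, but never below Retry-After.

    Assumption: a server-supplied Retry-After is not itself capped.
    """
    delay = min(cap, base * 2 ** transient_attempts)
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


class CircuitBreaker:
    """closed -> open after `threshold` consecutive transient failures;
    open -> half-open once `pause_seconds` have elapsed (a probe is allowed);
    half-open -> closed on success, or back to open on another transient."""

    def __init__(self, threshold=3, pause_seconds=300, clock=time.monotonic):
        self.threshold = threshold
        self.pause_seconds = pause_seconds
        self.clock = clock  # hypothetical injection point, handy for tests
        self.state = "closed"
        self.consecutive_transient = 0
        self.opened_at = None

    def allow_claim(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.pause_seconds:
                self.state = "half-open"  # let one probe through
                return True
            return False
        return True

    def record_transient(self):
        self.consecutive_transient += 1
        if self.state == "half-open" or self.consecutive_transient >= self.threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def record_recovered(self):
        self.consecutive_transient = 0
        self.state = "closed"
        self.opened_at = None
```

With the defaults, `backoff_seconds(1)`, `backoff_seconds(2)` and `backoff_seconds(3)` give 20, 40 and 80, and any large attempt count saturates at 600, matching the values asserted in TestBackoff. Shifting `opened_at` back by more than `pause_seconds`, as the tests do, is enough to move an open breaker to half-open on the next `allow_claim()`.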
--- a/tests/test_webhooks.py
+++ b/tests/test_webhooks.py
@@ -1,12 +1,41 @@
 import pytest
-from fastapi.testclient import TestClient
 import os
 import tempfile
+from unittest.mock import patch, MagicMock, AsyncMock

 # Override DB path before importing app
-os.environ["ORCH_DB_PATH"] = os.path.join(tempfile.gettempdir(), "test_orchestrator.db")
+_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator.db")
+os.environ["ORCH_DB_PATH"] = _test_db
+os.environ["ORCH_PLANE_WEBHOOK_SECRET"] = ""
+os.environ["ORCH_GITEA_WEBHOOK_SECRET"] = ""
+os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
+os.environ["ORCH_HOST_REPOS_DIR"] = "/home/slin/repos"
+os.environ["ORCH_GITEA_TOKEN"] = "test-token"
+os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
+os.environ["ORCH_GITEA_OWNER"] = "admin"
+os.environ["ORCH_DEFAULT_REPO"] = "enduro-trails"
+# ORCH-6: register the test project so the project filter lets these fixtures
+# through. proj-1 maps to enduro-trails/ET, preserving the ET-001/ET-002 asserts.
+os.environ["ORCH_PROJECTS_JSON"] = (
+    '[{"plane_project_id": "proj-1", "repo": "enduro-trails", '
+    '"work_item_prefix": "ET", "name": "enduro-trails"}]'
+)

+from fastapi.testclient import TestClient
 from src.main import app
+from src.db import init_db, get_db
+
+
+@pytest.fixture(autouse=True)
+def setup_db():
+    """Ensure DB tables exist before each test."""
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+    init_db()
+    yield
+    if os.path.exists(_test_db):
+        os.unlink(_test_db)
+

 client = TestClient(app)

@@ -18,7 +47,16 @@ def test_health():
    assert resp.json()["service"] == "orchestrator"


-def test_plane_webhook_accepts():
+def test_status_endpoint():
+    resp = client.get("/status")
+    assert resp.status_code == 200
+    assert "active_tasks" in resp.json()
+
+
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+def test_plane_webhook_creates_task(mock_docs, mock_branch):
+    """work_item.created → task in DB with stage=analysis."""
    resp = client.post("/webhook/plane", json={
        "event": "work_item.created",
        "data": {"id": "test-123", "name": "Test task", "project": "proj-1"}
@@ -26,32 +64,208 @@ def test_plane_webhook_accepts():
    assert resp.status_code == 200
    assert resp.json()["status"] == "accepted"

+    # Verify task was created
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'test-123'").fetchone()
+    conn.close()
+    assert task is not None
+    assert task["stage"] == "analysis"
+    assert task["work_item_id"] is not None
+    assert "feature/" in task["branch"]

-def test_plane_webhook_comment():
+
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+def test_plane_webhook_generates_sequential_ids(mock_docs, mock_branch):
+    """Multiple work items get sequential IDs."""
+    client.post("/webhook/plane", json={
+        "event": "work_item.created",
+        "data": {"id": "item-1", "name": "First task", "project": "proj-1"}
+    })
+    client.post("/webhook/plane", json={
+        "event": "work_item.created",
+        "data": {"id": "item-2", "name": "Second task", "project": "proj-1"}
+    })
+
+    conn = get_db()
+    tasks = conn.execute("SELECT work_item_id FROM tasks ORDER BY id").fetchall()
+    conn.close()
+    ids = [t["work_item_id"] for t in tasks]
+    assert ids[0] == "ET-001"
+    assert ids[1] == "ET-002"
+
+
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+@patch("src.webhooks.plane.launcher")
+def test_plane_approved_advances_stage(mock_launcher, mock_docs, mock_branch, tmp_path, monkeypatch):
+    """Comment :approved: at stage=analysis → advance to architecture."""
+    # Patch repos_dir for QG check
+    monkeypatch.setattr("src.qg.checks.settings.repos_dir", str(tmp_path))
+
+    # Create task first
+    client.post("/webhook/plane", json={
+        "event": "work_item.created",
+        "data": {"id": "adv-001", "name": "Advance test", "project": "proj-1"}
+    })
+
+    # Get the task to find work_item_id
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'adv-001'").fetchone()
+    conn.close()
+    work_item_id = task["work_item_id"]
+
+    # Create required analysis files
+    wi_dir = tmp_path / "enduro-trails" / "docs" / "work-items" / work_item_id
+    wi_dir.mkdir(parents=True)
+    (wi_dir / "01-brd.md").write_text("# BRD")
+    (wi_dir / "02-trz.md").write_text("# TRZ")
+    (wi_dir / "03-acceptance-criteria.md").write_text("# AC")
+    (wi_dir / "04-test-plan.yaml").write_text("tests: []")
+
+    # Mock launcher
+    mock_launcher.launch.return_value = 1
+
+    # Send approved comment
    resp = client.post("/webhook/plane", json={
        "event": "comment.created",
-        "data": {"comment": "LGTM :approved:"}
+        "data": {
+            "work_item_id": "adv-001",
+            "comment": "Looks good :approved:"
+        }
    })
    assert resp.status_code == 200
-    assert resp.json()["status"] == "accepted"
+
+    # Verify stage advanced
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'adv-001'").fetchone()
+    conn.close()
+    assert task["stage"] == "architecture"
+
+
+@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
+@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
+def test_plane_rejected_rolls_back(mock_docs, mock_branch):
+    """Comment :rejected: rolls back stage."""
+    # Create task
+    client.post("/webhook/plane", json={
+        "event": "work_item.created",
+        "data": {"id": "rej-001", "name": "Reject test", "project": "proj-1"}
+    })
+
+    # Manually set stage to architecture
+    conn = get_db()
+    conn.execute("UPDATE tasks SET stage = 'architecture' WHERE plane_id = 'rej-001'")
+    conn.commit()
+    conn.close()
+
+    # Send rejected comment
+    resp = client.post("/webhook/plane", json={
+        "event": "comment.created",
+        "data": {
+            "work_item_id": "rej-001",
+            "comment": "Not ready :rejected:"
+        }
+    })
+    assert resp.status_code == 200
+
+    # Verify stage rolled back
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'rej-001'").fetchone()
+    conn.close()
+    assert task["stage"] == "analysis"


 def test_gitea_webhook_push():
+    """Push event is accepted."""
    resp = client.post(
        "/webhook/gitea",
-        json={"ref": "refs/heads/feature/test", "repository": {"name": "enduro-trails"}},
+        json={"ref": "refs/heads/feature/test", "repository": {"name": "enduro-trails"}, "commits": []},
        headers={"X-Gitea-Event": "push"}
    )
    assert resp.status_code == 200
    assert resp.json()["status"] == "accepted"


-def test_gitea_webhook_pr():
+@patch("src.webhooks.gitea.launcher")
+def test_gitea_push_with_adr_advances_stage(mock_launcher):
+    """Push with ADR files at architecture stage → advance to development."""
+    mock_launcher.launch.return_value = 1
+
+    # Create a task at architecture stage
+    conn = get_db()
+    conn.execute(
+        "INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage) VALUES (?, ?, ?, ?, ?)",
+        ("push-001", "ET-010", "enduro-trails", "feature/ET-010-test", "architecture"),
+    )
+    conn.commit()
+    conn.close()
+
+    # Push with ADR file
    resp = client.post(
        "/webhook/gitea",
        json={
-            "action": "reviewed",
-            "pull_request": {"state": "approved", "number": 1}
+            "ref": "refs/heads/feature/ET-010-test",
+            "repository": {"name": "enduro-trails"},
+            "commits": [
+                {"added": ["docs/work-items/ET-010/06-adr/001-decision.md"], "modified": []}
+            ],
+        },
+        headers={"X-Gitea-Event": "push"}
+    )
+    assert resp.status_code == 200
+
+    # Verify stage advanced
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'push-001'").fetchone()
+    conn.close()
+    assert task["stage"] == "development"
+    mock_launcher.launch.assert_called_once()
+
+
+@patch("src.webhooks.gitea.check_ci_green")
+@patch("src.webhooks.gitea.launcher")
+def test_gitea_ci_success_advances_to_review(mock_launcher, mock_ci):
+    """CI success at development stage → advance to review."""
+    mock_ci.return_value = (True, "CI green")
+    mock_launcher.launch.return_value = 2
+
+    # Create a task at development stage
+    conn = get_db()
+    conn.execute(
+        "INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage) VALUES (?, ?, ?, ?, ?)",
+        ("ci-001", "ET-011", "enduro-trails", "feature/ET-011-test", "development"),
+    )
+    conn.commit()
+    conn.close()
+
+    # CI status success
+    resp = client.post(
+        "/webhook/gitea",
+        json={
+            "state": "success",
+            "branches": [{"name": "feature/ET-011-test"}],
+            "repository": {"name": "enduro-trails"},
+        },
+        headers={"X-Gitea-Event": "status"}
+    )
+    assert resp.status_code == 200
+
+    # Verify stage advanced
+    conn = get_db()
+    task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'ci-001'").fetchone()
+    conn.close()
+    assert task["stage"] == "review"
+
+
+def test_gitea_webhook_pr():
+    """PR event is accepted."""
+    resp = client.post(
+        "/webhook/gitea",
+        json={
+            "action": "opened",
+            "pull_request": {"head": {"ref": "feature/test"}, "number": 1},
+            "repository": {"name": "enduro-trails"},
        },
        headers={"X-Gitea-Event": "pull_request"}
    )
@@ -59,7 +273,17 @@ def test_gitea_webhook_pr():
    assert resp.json()["status"] == "accepted"


-def test_status_endpoint():
-    resp = client.get("/status")
-    assert resp.status_code == 200
-    assert "active_tasks" in resp.json()
+def test_plane_webhook_event_logged():
+    """Events are logged in the events table."""
+    client.post("/webhook/plane", json={
+        "event": "test.event",
+        "data": {"foo": "bar"}
+    })
+
+    conn = get_db()
+    event = conn.execute(
+        "SELECT * FROM events WHERE event_type = 'test.event'"
+    ).fetchone()
+    conn.close()
+    assert event is not None
+    assert event["source"] == "plane"
Author	SHA1	Message	Date
Dev Agent	c23f000c05	fix(preflight): check the binary the launcher actually spawns (ORCH-1) Container ORCH_CLAUDE_BIN pointed at a non-existent /usr/bin/claude while the launcher spawns the hardcoded /opt/claude-code/bin/claude.exe. Preflight now follows AgentLauncher.CLAUDE_BIN (the genuinely executed path), so it no longer falsely blocks every job in production.	2026-06-03 00:13:44 +03:00
Dev Agent	d0d47058b4	docs(resilience): document preflight/429/backoff/breaker + env vars (ORCH-1)	2026-06-03 00:12:17 +03:00
Dev Agent	a613fd8180	test(resilience): 34 tests for preflight/classifier/backoff/breaker (ORCH-1) Covers preflight FAIL->queued + cache, transient/permanent classifier + Retry-After, exp backoff + available_at gating, launcher transient vs permanent finalize, circuit breaker open/half-open/closed. test_queue worker tests stub preflight OK. Popen never spawned.	2026-06-03 00:12:17 +03:00
Dev Agent	f314ae09e5	feat(worker): preflight gate + circuit breaker + /queue resilience (ORCH-1) QueueWorker gates claims behind preflight and the CircuitBreaker (open -> pause, no CLI calls + Telegram alert; half-open probes one job; closed on recovery). Wires launcher.on_outcome. /queue exposes resilience snapshot.	2026-06-03 00:12:17 +03:00
Dev Agent	90fdd19394	feat(launcher): classify failures, backoff transient retry, breaker outcome (ORCH-1) _finalize_job classifies the run log: transient (429/overload) -> backoff requeue via mark_job_transient with separate transient_attempts budget honouring Retry-After; permanent -> normal attempts<max. on_outcome callback feeds the circuit breaker. _backoff_seconds = min(2^n*base, max) | Retry-After.	2026-06-03 00:12:17 +03:00
Dev Agent	4ef87a3959	feat(resilience): cheap preflight + 429/transient error classifier (ORCH-1) preflight.py: cached CLAUDE_BIN exists + claude --version (no tokens, no prompt-ping). error_classifier.py: classify_log_file -> transient|permanent from log tail + Retry-After parsing.	2026-06-03 00:12:17 +03:00
Dev Agent	0cd9b11fe0	feat(queue): resilience schema + backoff helper + config (ORCH-1) jobs.transient_attempts + available_at columns (idempotent _ensure_column migration); claim_next_job honours available_at; mark_job_transient (backoff requeue with separate transient budget). Config: preflight_cache_ttl, backoff_base/max_seconds, transient_max_attempts, breaker_threshold, breaker_pause_seconds.	2026-06-03 00:12:17 +03:00
Dev Agent	4be168c0ec	docs(queue): document job queue, /queue, env vars (ORCH-1) ARCHITECTURE job-queue section + flow diagram, README /queue endpoint and ORCH_MAX_CONCURRENCY/ORCH_QUEUE_POLL_INTERVAL, new docs/ORCH-1_JOB_QUEUE.md.	2026-06-02 23:58:44 +03:00
Dev Agent	2283b8898b	test(queue): 19 tests for job queue lifecycle/atomicity/retry/worker (ORCH-1) Covers enqueue->claim->mark, atomic claim (no double dispatch, 8-thread race), retry fail->queued->failed, requeue_running_jobs, observability, worker max_concurrency. Popen fully mocked (no real agent spawned).	2026-06-02 23:58:44 +03:00
Dev Agent	b6d4426a48	feat(worker): background queue worker + lifespan + queue-recovery + /queue (ORCH-1) queue_worker.QueueWorker drains the queue respecting max_concurrency. main.py lifespan: queue-recovery (requeue running jobs) after M-1 orphan-recovery, starts worker and stops it on shutdown. New GET /queue endpoint (counts + recent jobs).	2026-06-02 23:58:44 +03:00
Dev Agent	20d6556e22	refactor(webhooks): enqueue_job instead of in-process launch (ORCH-1) All 8 webhook launch points (plane x4, gitea x4) now enqueue a job and return immediately instead of synchronously spawning claude in the uvicorn process.	2026-06-02 23:58:44 +03:00
Dev Agent	3345c2fa0a	feat(launcher): launch_job + job-status finalize with retries (ORCH-1) Refactor launch() into shared _spawn(); add launch_job(job) that threads job_id through monitor/watchdog. _finalize_job marks done / requeue (attempts<max) / failed+notify. Internal advance-chain self.launch -> enqueue_job. B-1/B-2/M-1/ORCH-2 spawn logic unchanged.	2026-06-02 23:58:44 +03:00
Dev Agent	fd3dac7d22	feat(queue): add jobs table + queue helpers and config (ORCH-1) Persistent SQLite job queue (F-2b): jobs table + idx, atomic claim_next_job, enqueue/mark/count/requeue/get helpers. New settings max_concurrency (ORCH_MAX_CONCURRENCY) and queue_poll_interval (ORCH_QUEUE_POLL_INTERVAL).	2026-06-02 23:58:44 +03:00
Slava	b021ff7cb0	Merge pull request 'ORCH-6: multi-repo (project filter + repo/prefix per project)' (#2) from feature/ORCH-6-multirepo into main	2026-06-02 23:42:29 +03:00
Dev Agent	ca81f38330	docs: document multi-repo registry + ORCH-6 bugfix and incident ORCH-6: ARCHITECTURE.md gets a project-registry section; README explains how to add a project via ORCH_PROJECTS_JSON; BUGFIXES_2026-06-03.md records the fix and links the 2026-06-02 webhook autorun incident.	2026-06-02 22:30:51 +03:00
Dev Agent	c1f35a2047	test(projects,webhook): cover registry resolvers + project filter ORCH-6: test_projects.py covers resolvers and ORCH_PROJECTS_JSON parsing (valid/malformed/fallback). test_plane_webhook.py covers the webhook project filter via TestClient (unknown->ignored, orchestrator->orchestrator repo, enduro->enduro-trails, independent ORCH/ET prefixes); launcher mocked. test_webhooks.py: register proj-1 so existing ET fixtures pass.	2026-06-02 22:30:51 +03:00
Dev Agent	a6f6a43c1c	fix(webhooks/gitea): ignore pushes/events for repos outside the registry ORCH-6: get_project_by_repo None -> ignored, so events for unknown repos do not trigger the pipeline.	2026-06-02 22:30:42 +03:00
Dev Agent	171f4eb304	fix(webhooks/plane): filter by project + resolve repo/prefix from registry ORCH-6 / incident 2026-06-02: ignore work items from unknown Plane projects (status=ignored) instead of funneling everything into default_repo. Resolve repo, work-item prefix and Plane sync project from the registry by data.project.	2026-06-02 22:30:42 +03:00
Dev Agent	a87c633003	refactor(plane_sync): parameterize project_id (backward compatible) ORCH-6: sync functions resolve the issue PROJECT_ID via the registry (get_project_by_repo) and accept project_id; default stays enduro so existing ET callers keep working.	2026-06-02 22:30:42 +03:00
Dev Agent	0797f958dc	feat(db): per-project work-item prefix in get_next_work_item_id ORCH-6: get_next_work_item_id(repo, prefix="ET") numbers per (repo, prefix) so orchestrator issues number ORCH-001 independently of the ET sequence. Default prefix stays ET for backward compatibility.	2026-06-02 22:30:42 +03:00
Dev Agent	36d5f25f2a	feat(projects): add project registry (Plane id -> repo/prefix mapping) ORCH-6: src/projects.py introduces ProjectConfig + resolvers (get_project_by_plane_id/by_repo, known_plane_project_ids) keyed by Plane project uuid. Source: ORCH_PROJECTS_JSON env (config.projects_json), with a built-in default registry (enduro-trails + orchestrator) and robust parsing (malformed JSON/entries fall back to default).	2026-06-02 22:30:42 +03:00
Dev Agent	1ebe8afc23	feat(worktree): git worktree per task to isolate shared /repos (ORCH-2 / S-4) - add src/git_worktree.py: ensure/remove/get_worktree_path - config: worktrees_dir=/repos/_wt - launcher: agent runs in per-branch worktree; task-file + commit/push in worktree; no shared checkout - qg/checks: read artifacts + run make test from worktree (branch arg, backward-compatible) - webhooks/plane: pass branch into QG dispatch; review fallback from worktree - webhooks/gitea: keep read-only branch --contains in main clone (documented) - tests: test_git_worktree.py (isolation) + update test_launcher write-task-file - docs: ARCHITECTURE worktree section + BUGFIXES_2026-06-02_ORCH2 Preserves B-1/B-2/S-1/S-5 fixes (paths now point at worktree).	2026-06-02 21:12:06 +03:00
Dev Agent	66a37612fd	docs(bugfixes): add safe.directory, init:true findings and autonomy test result	2026-06-02 20:22:51 +03:00
Dev Agent	57cca14ed3	fix(compose): init:true (PID1 reaper) to reap claude grandchild zombies (B-2)	2026-06-02 20:20:33 +03:00
Dev Agent	5de8462a13	fix(docker): trust /repos for git (safe.directory) so launcher commit/push works	2026-06-02 20:18:44 +03:00
Dev Agent	553e0aae0c	docs: update QG table, task-file write, orphan recovery; add BUGFIXES_2026-06-02	2026-06-02 20:12:29 +03:00
Dev Agent	67b9f814b5	test(launcher): cover _write_task_file and reviewer verdict parsing (L-5)	2026-06-02 20:12:29 +03:00
Dev Agent	212352997e	fix(main): proper orphan recovery with per-run warning + notify (M-1)	2026-06-02 20:12:29 +03:00
Dev Agent	b585701c62	fix(webhooks): dispatch new QGs; stop false Gitea CI alerts (S-1) - plane._try_advance_stage handles check_tests_local + check_reviewer_verdict - gitea.handle_ci_status: failure -> debug log only (CI not authoritative)	2026-06-02 20:12:29 +03:00
Dev Agent	0924783be3	fix(qg): frontmatter-only reviewer verdict + local test gate (S-5, S-1) - check_reviewer_verdict reads verdict: from YAML frontmatter of 12-review.md only - add check_tests_local: orchestrator runs make test in /repos/<repo> - stages: development QG -> check_tests_local	2026-06-02 20:12:29 +03:00
Dev Agent	265a5ef1e6	fix(launcher): write task file to /repos without docker; stdout->file, no PIPE zombies (B-1, B-2) - _write_task_file writes directly to mounted /repos/<repo>, raises on failure - Popen stdout=log_fh at OS level; _monitor_agent simplified to proc.wait()+close - remove PIPE reader thread and startup-timeout (watchdog by pid stays) - dispatch check_tests_local args (repo, branch)	2026-06-02 20:12:29 +03:00
Dev Agent	f575f6bc6a	chore: save WIP changes before audit fixes - notifications: Telegram integration, richer stage/agent/QG notifications - plane_sync: explicit Plane state IDs, needs_input/in_review/blocked helpers, links in comments - launcher: deployer stage, model flag (opus), PR auto-create, REQUEST_CHANGES/tester/architect rollback+retry logic, partial check_reviewer_verdict path - qg/checks: add check_reviewer_verdict (substring-based, will be hardened in S-5) - stages: review->check_reviewer_verdict, testing->deployer agent - config: telegram_bot_token/chat_id settings	2026-06-02 19:57:43 +03:00
claude-bot	8715dd7148	feat(deploy): SSH key mount, deploy env vars, openssh-client in image	2026-06-01 20:03:27 +03:00
Dev Agent	e27e489157	fix(plane-webhook): read issue/comment_stripped fields from Plane comment payload	2026-06-01 19:17:14 +03:00
claude-bot	51f7364532	feat: integrate Analyst into Plane/Orchestrator pipeline - Add git fetch+checkout in agent launch cmd (ensures correct branch) - Add git fetch+checkout in _monitor_agent before commit/push - Post start comment in Plane when analyst launches - Post :approved: request comment after analyst completes successfully - Branch lookup moved before cmd construction for reuse	2026-05-31 20:15:01 +03:00
Dev Agent	81e0e383e0	feat(analysis): add check_analysis_approved QG with stakeholder approval requirement - stages.py: QG renamed to check_analysis_approved (requires :approved: comment) - qg/checks.py: new check_analysis_approved verifies files + Plane :approved: comment - launcher.py: skip auto-advance for analysis stage (requires human approval) - plane.py: route check_analysis_approved in _try_advance_stage - docs/ARCHITECTURE.md: updated QG table and flow description	2026-05-31 15:19:03 +03:00
Dev Agent	0f0b984656	docs: add pipeline design backlog (audit + backlog mgmt)	2026-05-23 09:17:41 +03:00
Dev Agent	267bc58fb2	docs: update README, add ARCHITECTURE.md with full system documentation	2026-05-22 14:09:24 +03:00
Dev Agent	0ad56e1f0a	fix: tini entrypoint, event routing wildcard, orphan recovery	2026-05-22 13:52:46 +03:00
Dev Agent	c326ef0ac4	docs: lessons learned ET-006 — problems and solutions	2026-05-22 13:45:40 +03:00
Dev Agent	b545665e2d	feat: full pipeline fixes - CI status branch lookup, review webhook routing, auto-advance, plane sync - handle_ci_status: fallback git branch -r --contains when branches[] empty - webhook router: handle pull_request_approved event type - handle_pr: map review.type to review.state for new Gitea format - launcher: auto-advance stage after agent completion (_try_advance_stage) - plane_sync: notify Plane on stage changes - stages.py: stage machine with QG definitions - notifications.py: stage change notifications - safe.directory fix for container git operations	2026-05-22 01:57:02 +03:00
Dev Agent	b428163c32	docs: bugfixes 2026-05-21 (5 fixes for CI status, review webhook, auto-advance)	2026-05-22 01:56:47 +03:00
Dev Agent	3116ae67bb	chore: clean up .gitignore, remove cached files from tracking	2026-05-19 15:58:45 +03:00
Dev Agent	95072e000f	fix: tests — add setup_db fixture for init_db in test env	2026-05-19 15:58:37 +03:00
Dev Agent	8859c38a2a	chore: add .gitignore, remove .env from tracking	2026-05-19 15:57:13 +03:00
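
The backoff rule quoted in commit 90fdd19394 (`_backoff_seconds = min(2^n*base, max) | Retry-After`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the launcher's actual code: the function name `backoff_seconds`, its parameters, and the default values are hypothetical, and the sketch assumes that a server-supplied Retry-After value takes precedence over the computed exponential delay, which the commit message does not spell out.

```python
from typing import Optional


def backoff_seconds(
    attempt: int,
    base: float = 30.0,          # hypothetical default, not taken from the repo
    max_seconds: float = 900.0,  # hypothetical default, not taken from the repo
    retry_after: Optional[float] = None,
) -> float:
    """Return the delay before a transient job becomes claimable again.

    Assumption: a positive Retry-After overrides the exponential schedule.
    Otherwise the delay doubles per attempt and is capped at max_seconds.
    """
    if retry_after is not None and retry_after > 0:
        return float(retry_after)
    return float(min((2 ** attempt) * base, max_seconds))


if __name__ == "__main__":
    # Delays for the first few transient attempts with the assumed defaults.
    for n in range(6):
        print(n, backoff_seconds(n))
```

A queue helper such as `mark_job_transient` would presumably add this delay to the current time and store the result in `available_at`, so that `claim_next_job` skips the job until then; that wiring is inferred from the commit messages and is not shown here.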