The pipeline advances only on incoming webhooks; a lost event (502 during a rebuild, no retries from Plane/Gitea, an unresolved sha→branch) leaves a task silently stuck (incident class ORCH-044).

A new background daemon thread, src/reconciler.py (queue_worker pattern), replays the missed transition through the same standard gates/handlers a webhook uses:

- F-1 gate-side: for tasks with stage≠done, no active job, and age(updated_at) ≥ grace_for_stage(stage), run a read-only pre-evaluation of the canonical QG; green → stage_engine.advance_stage(..., finished_agent=None); red → silence (notification spam is structurally impossible). F-1 does not touch analysis (human gate).
- F-2 plane-side: poll the Plane API per project (plane_sync.list_issues_by_state, cursor pagination, never-raise) → replay In Progress/Approved/Rejected through the existing handle_status_start/handle_verdict (async handlers called from a sync thread via asyncio.run).
- F-3: harden sha→branch resolution in handle_ci_status: DB fallback to the repo's single development task (ambiguous → do not resolve), log level debug→info.
- Duplicate guard on creation (db.create_task_atomic under a process-wide Lock): a reconcile↔webhook race does not spawn a second task/branch/worktree/analyst-job (AC-4).
- F-4 observability: an unblock log line + Telegram + a reconcile block in /queue.

Start/stop in main.lifespan (after worker.start() / before worker.stop()), restart-safe, never-raise per unit of work. Kill-switches ORCH_RECONCILE_ENABLED / ORCH_RECONCILE_PLANE_ENABLED plus grace settings. The DB schema and the STAGE_TRANSITIONS/QG_CHECKS registries are unchanged.

Tests: test_reconciler.py, test_reconciler_plane.py, test_gitea_sha_resolve.py, test_config.py (33 new, 563 total, all green).
Documentation updated (golden source): architecture/README.md, INFRA.md, README.md, CHANGELOG.md, adr-0007 → accepted.

Refs: ORCH-053
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
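The F-1 selection rule described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Task` dataclass, its field names, and `is_f1_candidate` are hypothetical and are not taken from src/reconciler.py; only the conditions themselves (stage is not done, analysis is skipped, no active job, age of updated_at at or above the per-stage grace) come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class Task:
    """Hypothetical task shape; the real schema is not shown in the source."""
    stage: str
    updated_at: datetime
    has_active_job: bool


def is_f1_candidate(
    task: Task,
    now: datetime,
    grace_for_stage: Callable[[str], float],
) -> bool:
    """Return True if the task looks stuck and may be pre-evaluated by F-1."""
    # Finished tasks are never swept; analysis is a human gate and is skipped.
    if task.stage in ("done", "analysis"):
        return False
    # A task with a running job is progressing on its own.
    if task.has_active_job:
        return False
    # Only tasks that have been idle for at least the grace period qualify.
    age_s = (now - task.updated_at).total_seconds()
    return age_s >= grace_for_stage(task.stage)


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grace = lambda stage: 300.0 if stage == "development" else 600.0

stuck = Task("development", now - timedelta(seconds=400), has_active_job=False)
fresh = Task("development", now - timedelta(seconds=100), has_active_job=False)
busy = Task("development", now - timedelta(seconds=400), has_active_job=True)
human = Task("analysis", now - timedelta(hours=5), has_active_job=False)
```

In this sketch a candidate is only selected; whether it actually advances would still depend on the read-only gate evaluation being green.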
56 lines
2.9 KiB
Plaintext
ORCH_PLANE_API_URL=http://plane-app-api-1:8000
# External (browser) web URL of Plane for clickable issue links in notifications
# (ORCH-017). Falls back to ORCH_PLANE_API_URL; a loopback fallback is treated as
# "no web URL" and the Plane link is omitted. Example: https://plane.example.org
ORCH_PLANE_WEB_URL=
ORCH_PLANE_API_TOKEN=
ORCH_PLANE_WORKSPACE_SLUG=
ORCH_PLANE_WEBHOOK_SECRET=
ORCH_GITEA_URL=http://localhost:3000
ORCH_GITEA_TOKEN=
ORCH_GITEA_WEBHOOK_SECRET=
ORCH_CLAUDE_BIN=/usr/bin/claude
ORCH_REPOS_DIR=/home/slin/repos
ORCH_DB_PATH=/app/data/orchestrator.db
# ORCH-042: live-tracker mode. edit (DEFAULT) -> the task card is edited in place
# (editMessageText). bump -> on every update the old card is deleted and a fresh
# one is sent silently to the BOTTOM of the chat (deleteMessage + sendMessage +
# repoint). One card per task in both modes. Any value other than "bump" -> edit.
ORCH_TRACKER_MODE=edit
# ORCH-043: merge-gate (auto-rebase onto current origin/main + re-test + merge-lock)
# on the deploy-staging -> deploy edge. Deterministic sub-gate (no LLM) that catches
# the branch up to the CURRENT origin/main, re-tests it, and serialises merges so two
# green parallel branches can't break main.
# ENABLED -> global kill-switch (false -> whole gate is a no-op pass).
# REPOS -> CSV of repos where the gate is REAL; empty -> only the self-hosting
# repo (orchestrator); other repos -> conditional no-op (mirrors ORCH-35).
# RETEST_TIMEOUT_S -> wall-clock budget for the post-rebase re-test.
# RETEST_TARGET -> pytest target for the re-test.
# LOCK_TIMEOUT_S -> max merge-lease age before a stale lease is reclaimed.
# DEFER_DELAY_S -> delay before re-running the gate when the lock is busy.
# DEFER_MAX_ATTEMPTS -> defer retries before escalation (avoids livelock).
ORCH_MERGE_GATE_ENABLED=true
ORCH_MERGE_GATE_REPOS=
ORCH_MERGE_RETEST_TIMEOUT_S=600
ORCH_MERGE_RETEST_TARGET=tests/
ORCH_MERGE_LOCK_TIMEOUT_S=300
ORCH_MERGE_DEFER_DELAY_S=60
ORCH_MERGE_DEFER_MAX_ATTEMPTS=5
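As a rough illustration of how ORCH_MERGE_GATE_ENABLED and ORCH_MERGE_GATE_REPOS might combine, here is a minimal Python sketch. The function names `parse_repos` and `merge_gate_is_real` are hypothetical and do not come from the orchestrator code. The sketch also assumes that a non-empty REPOS list is authoritative (a repo not listed there gets the no-op), which the comments above do not state explicitly.

```python
def parse_repos(csv_value: str) -> set[str]:
    """Split a CSV value into a set of repo names, ignoring blanks and spaces."""
    return {item.strip() for item in csv_value.split(",") if item.strip()}


def merge_gate_is_real(
    repo: str,
    enabled: bool,
    repos_csv: str,
    self_repo: str = "orchestrator",  # the self-hosting repo named in the comments
) -> bool:
    """Return True if the merge gate should actually run for this repo.

    False means the gate is a no-op pass for that repo.
    """
    # Global kill-switch: when disabled, the whole gate is a no-op.
    if not enabled:
        return False
    repos = parse_repos(repos_csv)
    # Empty list: the gate is real only for the self-hosting repo.
    if not repos:
        return repo == self_repo
    # Assumption: a non-empty list fully determines where the gate is real.
    return repo in repos
```

With the defaults shown above (ENABLED=true, REPOS empty), this sketch would run the gate for `orchestrator` only.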
# ORCH-053: stuck-task reconciler (sweeper for lost webhooks). A background daemon
# replays a missed stage transition through the SAME gates/handlers a webhook would,
# fixing tasks that got stuck on a dropped event (502 on rebuild, no Plane/Gitea
# retries, unresolved sha->branch).
# ENABLED -> global kill-switch (self-hosting safety / staged rollout).
# PLANE_ENABLED -> separate flag for the F-2 Plane-API poll (mute only F-2).
# INTERVAL_S -> background sweep period (seconds).
# GRACE_DEFAULT_S -> default "stuck" threshold on tasks.updated_at (seconds).
# GRACE_OVERRIDES_JSON -> per-stage thresholds, e.g. {"development":300}; bad JSON -> default.
# NOTIFY_UNBLOCK -> send a Telegram message when a stuck task is unblocked.
ORCH_RECONCILE_ENABLED=true
ORCH_RECONCILE_PLANE_ENABLED=true
ORCH_RECONCILE_INTERVAL_S=120
ORCH_RECONCILE_GRACE_DEFAULT_S=600
ORCH_RECONCILE_GRACE_OVERRIDES_JSON=
ORCH_RECONCILE_NOTIFY_UNBLOCK=true
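The "bad JSON -> default" behaviour of ORCH_RECONCILE_GRACE_OVERRIDES_JSON could be handled along these lines. This is a sketch, not the project's config code: `parse_grace_overrides` is a hypothetical helper, and silently skipping non-numeric or negative entries is an assumption. `grace_for_stage` is the name used in the commit message, but its real signature is not shown in the source.

```python
import json


def parse_grace_overrides(raw: str) -> dict[str, float]:
    """Parse per-stage grace thresholds; any malformed input yields no overrides."""
    if not raw or not raw.strip():
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Bad JSON: fall back to the default threshold for every stage.
        return {}
    if not isinstance(data, dict):
        return {}
    overrides: dict[str, float] = {}
    for stage, value in data.items():
        # Assumption: skip entries that are not non-negative numbers.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value >= 0:
            overrides[stage] = float(value)
    return overrides


def grace_for_stage(
    stage: str,
    overrides: dict[str, float],
    default_s: float = 600.0,  # matches ORCH_RECONCILE_GRACE_DEFAULT_S above
) -> float:
    """Return the stuck threshold in seconds for a stage."""
    return overrides.get(stage, default_s)


overrides = parse_grace_overrides('{"development":300}')
```

Never raising on a malformed value keeps a typo in the env file from taking the sweeper down; the stage simply uses the default threshold.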