Commit Graph

25 Commits

Author SHA1 Message Date
7d2d77217a feat(reconciler): sweeper for lost webhooks (reconciliation of stuck stages)
The pipeline advances only on incoming webhooks; a lost event (a 502 during a
rebuild, no retries from Plane/Gitea, an unresolved sha→branch) leaves the task
silently stuck (incident class ORCH-044). A new background daemon thread,
src/reconciler.py (queue_worker pattern), replays the missed transition through
the same standard gates/handlers as the webhook:

- F-1 gate-side: for tasks with stage≠done, no active job and age(updated_at) ≥
  grace_for_stage(stage), run a read-only pre-evaluation of the canonical QG;
  green → stage_engine.advance_stage(..., finished_agent=None); red → silence
  (notification spam is structurally impossible). F-1 does not touch analysis
  (human gate).
- F-2 plane-side: poll the Plane API per project (plane_sync.list_issues_by_state,
  cursor pagination, never-raise) → replay In Progress/Approved/Rejected through
  the existing handle_status_start/handle_verdict (async from a sync thread,
  asyncio.run).
- F-3: harden sha→branch in handle_ci_status: DB fallback to the repo's single
  development task (ambiguity → do not resolve), debug→info.
- Anti-duplicate on creation (db.create_task_atomic under a process-wide Lock): a
  reconcile↔webhook race does not spawn a second task/branch/worktree/analyst-job
  (AC-4).
- F-4 observability: unblock log line + Telegram + reconcile block in /queue.

Start/stop in main.lifespan (after worker.start() / before worker.stop()),
restart-safe, never-raise per unit of work. Kill switches ORCH_RECONCILE_ENABLED
/ ORCH_RECONCILE_PLANE_ENABLED plus grace settings. The DB schema and the
STAGE_TRANSITIONS/QG_CHECKS registries are unchanged.

Tests: test_reconciler.py, test_reconciler_plane.py, test_gitea_sha_resolve.py,
test_config.py (33 new, 563 total, all green). Documentation updated (golden source):
architecture/README.md, INFRA.md, README.md, CHANGELOG.md, adr-0007 → accepted.

Refs: ORCH-053

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 20:55:25 +00:00
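A minimal sketch of the F-1 selection rule from the reconciler commit above (stage is not done, no active job, age past the per-stage grace). The Task dataclass, the grace values, and is_reconcile_candidate are hypothetical names for illustration; only grace_for_stage is named in the commit, and its real signature may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical grace periods; the real values come from the grace settings.
GRACE_SECONDS = {"architecture": 600, "development": 900}
DEFAULT_GRACE_SECONDS = 600


@dataclass
class Task:
    task_id: str
    stage: str
    updated_at: datetime
    has_active_job: bool


def grace_for_stage(stage: str) -> timedelta:
    """Return how long a task may sit idle in a stage before it counts as stuck."""
    return timedelta(seconds=GRACE_SECONDS.get(stage, DEFAULT_GRACE_SECONDS))


def is_reconcile_candidate(task: Task, now: datetime) -> bool:
    """Apply the F-1 filter: skip finished, human-gated, busy, or fresh tasks."""
    if task.stage in ("done", "analysis"):  # analysis is a human gate
        return False
    if task.has_active_job:  # an agent is still working on the task
        return False
    return now - task.updated_at >= grace_for_stage(task.stage)
```

A candidate would then go through the read-only QG pre-evaluation; only a green result leads to advance_stage, so a red result produces no notification at all.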
Dev Agent
00325bcab0 fix(plane): resolve issue states per-project instead of hardcoded enduro UUIDs (ORCH-10)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 10s
ORCH-10 root cause: PLANE_STATES was a global dict hardcoding enduro-trails
UUIDs. The webhook's in_progress comparison against that dict only
matched the ET UUID (b873d9eb) and silently ignored the ORCH in_progress UUID
(e331bfb3), blocking pipeline start for all orchestrator-project tasks.

Changes:
- src/plane_sync.py:
  * Rename PLANE_STATES -> _DEFAULT_STATES (enduro UUIDs kept as safe fallback).
  * PLANE_STATES preserved as alias to _DEFAULT_STATES (backward compat).
  * Add get_project_states(project_id) -> {logical_key: state_uuid}:
    fetches Plane API GET /projects/<id>/states/, maps by state name,
    caches per project_id, falls back to _DEFAULT_STATES on API failure.
  * Add _STATES_CACHE: dict, reload_project_states(project_id=None).
  * Add _PLANE_NAME_TO_KEY mapping and _STAGE_TO_STATE_KEY for clean lookup.
  * Add stage_to_state(stage, project_id) using get_project_states().
  * update_issue_state() uses stage_to_state() instead of STAGE_TO_STATE dict.
  * set_issue_{needs_input,in_review,blocked,done,in_progress,stage_state}()
    all resolve state UUID via get_project_states(project_id) instead of
    the global PLANE_STATES dict.

- src/webhooks/plane.py:
  * handle_issue_updated: import get_project_states, resolve proj_states per
    incoming project_id, compare new_state against proj_states["in_progress"],
    proj_states["approved"], proj_states["rejected"].
  * start_pipeline QG-0 blocked path: use get_project_states(plane_project_id)
    instead of PLANE_STATES["blocked"].

- tests/test_orch10_states.py: 23 new tests covering:
  * get_project_states returns correct UUIDs for both ET and ORCH projects.
  * API failure / empty response / None project_id -> _DEFAULT_STATES fallback.
  * Caching and reload_project_states (per-project and full flush).
  * stage_to_state() per-project resolution.
  * Webhook in_progress triggers pipeline for BOTH b873d9eb (ET) and e331bfb3 (ORCH).
  * Webhook approved/rejected routes correctly per project.
  * PLANE_STATES alias and _DEFAULT_STATES backward compat.
2026-06-05 14:23:31 +03:00
dev-bot
9a0298de9d feat(telegram): live editable task tracker (Variant B+), replace 15-message spam
Replace the ~15 separate Telegram messages per task (agent start/finish, stage
transition, QG-pending, tech noise) with ONE live tracker message edited in
place (editMessageText) on every stage transition. Only attention-worthy events
are still sent as SEPARATE, notifying messages: approve-gate, deploy-fail,
agent-fail, task error.

- db.py: idempotent ALTERs — tasks.tracker_message_id, tasks.title,
  tasks.brd_review_started_at/ended_at, agent_runs.model. Helpers for
  tracker message_id + BRD-review clock.
- usage.py: short_model_name() (strip provider/claude- prefix); parse model
  from result-JSON modelUsage; record_usage persists model.
- notifications.py: render_task_tracker(task_id) (stateless render from
  agent_runs), update_task_tracker (sendMessage->store id->editMessageText with
  fallback to a new message, silent), edit_telegram(). Per-stage line
  in↓/out↑·cost·model, ⏸️ Ревью БРД ("BRD review", human time), 💰 totals, finish block
  (⏱️ wall/agents/yours, 🔗 PR · 📦). notify_* are now tracker-only/log-only
  except the four alerts.
- stage_engine.py: stamp brd_review_ended on analysis->architecture advance.
- webhooks/plane.py: persist task title on creation.
- tests/test_telegram_tracker.py: render, short_model_name, send/edit/fallback,
  separate-vs-silent alert behavior.
2026-06-04 11:42:46 +03:00
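The send, store id, then edit-in-place flow of update_task_tracker from the commit above can be sketched as follows. The store, send, and edit parameters are hypothetical stand-ins for the tasks.tracker_message_id column and the Telegram sendMessage/editMessageText calls; the real function signature may differ.

```python
from typing import Callable, Dict


def update_task_tracker(
    task_id: str,
    text: str,
    store: Dict[str, int],             # stand-in for tasks.tracker_message_id
    send: Callable[[str], int],        # stand-in for sendMessage, returns the id
    edit: Callable[[int, str], None],  # stand-in for editMessageText
) -> int:
    """Edit the existing tracker message, or send a new one and remember it."""
    message_id = store.get(task_id)
    if message_id is not None:
        try:
            edit(message_id, text)
            return message_id
        except Exception:
            # Edit failed (for example the message was deleted):
            # fall through and send a fresh tracker message.
            pass
    message_id = send(text)
    store[task_id] = message_id
    return message_id
```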
dev-agent
96c5e6b2f9 fix(pipeline): fetch issue name from Plane API on status-trigger start
issue.updated ships only the changed fields, so name was absent and the branch slug became feature/<id>-untitled. Add fetch_issue_fields (single issue-detail GET returning name+description, reusing the endpoint/token of fetch_issue_description) and pull the name above the slug build. Empty name still falls back to untitled.
2026-06-03 22:42:53 +03:00
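A sketch of the slug fallback described in the commit above: a missing or empty issue name yields feature/<id>-untitled. The function name build_branch_name and the exact slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions; only the branch shape and the untitled fallback come from the commit message.

```python
import re
from typing import Optional


def build_branch_name(work_item_id: str, name: Optional[str]) -> str:
    """Build feature/<id>-<slug>, falling back to 'untitled' for an empty name."""
    # Assumed slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", (name or "").lower()).strip("-")
    return f"feature/{work_item_id}-{slug or 'untitled'}"
```

With the name fetched from the issue-detail API before this call, the untitled branch only appears for issues that truly have no name.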
dev-agent
b91be74692 fix(pipeline): pass issue description to analyst task file
start_pipeline built the analyst .task.md with only the Title, so the analyst received a ~101-byte file and reported the business request as empty even though the description was already fetched. Append the resolved description to task_desc.
2026-06-03 22:42:02 +03:00
Dev Agent
857bad314c feat(webhook): pull reject reason from latest comment
handle_verdict(rejected): the reason is now pulled from the issue's latest Plane
comment (_latest_comment_reason: GET comments, newest by created_at, HTML
stripped) instead of a fixed stub. Slava writes the reason in a comment before
flipping the status to Rejected. Falls back to a fixed note when there is no
comment or the API call fails.

tests: add test_status_only_verdict.py (test_inreview_comment_does_not_revert
[bug 3 root], test_any_comment_no_pipeline_action,
test_approved_status_advances_without_inprogress_reset,
test_rejected_status_pulls_reason_from_comment) and
test_inprogress_from_needs_input_relaunches_analyst in test_status_trigger.py.
Rewrote the comment-based tests (test_verdict_status, test_plane_approved/
rejected in test_webhooks) under the status-only model: comments are no-ops,
verdicts come from status changes.
2026-06-03 22:18:24 +03:00
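The reject-reason lookup from the commit above (newest comment by created_at, HTML stripped, fixed note as fallback) might be sketched like this. The comment_html field name, the fallback text, and the assumption that created_at values are ISO 8601 strings in one timezone (so string order matches time order) are all illustrative.

```python
import html
import re
from typing import Iterable, Mapping

FALLBACK_REASON = "Rejected (no reason given)"  # hypothetical fixed note


def latest_comment_reason(comments: Iterable[Mapping[str, str]]) -> str:
    """Return the newest comment as plain text, or the fixed fallback note."""
    # Assumes created_at is ISO 8601 in a single timezone, so strings sort by time.
    newest = max(comments, key=lambda c: c.get("created_at", ""), default=None)
    if newest is None:
        return FALLBACK_REASON
    # Strip tags, decode entities, and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", newest.get("comment_html", ""))
    text = " ".join(html.unescape(text).split())
    return text or FALLBACK_REASON
```

A caller would wrap the API request in its own try/except and return FALLBACK_REASON on failure, which keeps the never-raise behavior the commit describes.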
Dev Agent
c4be50ee20 fix(webhook): drop redundant in_progress reset on Approved
handle_verdict(approved): removed set_issue_in_progress(work_item_id) before
_try_advance_stage. _try_advance_stage -> advance_stage -> plane_notify_stage
already PATCHes the issue to the NEXT stage status, so the reset only made the
board flicker In Progress before the next stage (part of bug 3).
2026-06-03 22:18:13 +03:00
Dev Agent
6b3e144949 fix(webhook): remove comment-based approve, keep status-only verdict
Status-only verdict model: comments NEVER drive the pipeline. Removed the
whole comment-based control mechanism from handle_comment (:approved: /
:rejected: / answer-to-questions) which caused bug 3 (echo self-hit): the
analyst posts its own "waiting for approval" comment, handle_comment catches
its own comment and reverts In Review -> In Progress. handle_comment is now a
pure logger with no side effects.

handle_status_start: a return to In Progress on an EXISTING task (Slava
answered the analyst's questions in Needs Input) now RELAUNCHES the stage agent
instead of being a no-op. Distinguished from a duplicate In Progress webhook
via has_active_job_for_task() (new db helper): no active job => agent idle =>
relaunch; active job => busy => skip (no double launch).
2026-06-03 22:18:02 +03:00
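The relaunch-versus-duplicate decision from the commit above reduces to a small truth table. In this sketch StartAction and decide_status_start are hypothetical names; in the real handler the has_active_job flag would come from has_active_job_for_task().

```python
from enum import Enum


class StartAction(Enum):
    START_PIPELINE = "start_pipeline"  # no task yet for this issue
    RELAUNCH_AGENT = "relaunch_agent"  # task exists, agent idle
    SKIP = "skip"                      # task exists, agent busy (duplicate webhook)


def decide_status_start(task_exists: bool, has_active_job: bool) -> StartAction:
    """Decide what an In Progress webhook should do for a given task state."""
    if not task_exists:
        return StartAction.START_PIPELINE
    if has_active_job:
        return StartAction.SKIP
    return StartAction.RELAUNCH_AGENT
```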
Dev Agent
ac9f5a05a6 fix(work-item): prevent work_item_id collision and bind branch per task
ET-006 was handed to two different tasks because M-6 derives work_item_id from
the Plane sequence_id, which can collide -> the two tasks shared a branch/worktree
slug prefix and stepped on each other.

2a: ensure_unique_work_item_id() is a uniqueness-guard LAYERED ON TOP of the M-6
derive (derive is untouched): if the derived ET-NNN already exists in tasks for
the repo, it walks forward to the next free number. Applied in start_pipeline
after the derive.

2b (defense-in-depth): worktree is keyed by branch; if the resulting branch is
already owned by another task in the repo, disambiguate it with the unique
work_item_id + plane id so two tasks can never share a worktree.
2026-06-03 21:12:51 +03:00
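The walk-forward guard in step 2a of the commit above can be sketched as below. The signature is an assumption: the real ensure_unique_work_item_id presumably queries the tasks table for the repo, while this version takes the set of existing IDs directly and keeps the zero-padded width of the derived number.

```python
import re
from typing import Set


def ensure_unique_work_item_id(derived: str, existing: Set[str]) -> str:
    """Return derived if free, else the next free PREFIX-NNN above it."""
    if derived not in existing:
        return derived
    match = re.fullmatch(r"([A-Za-z]+)-(\d+)", derived)
    if match is None:
        raise ValueError(f"unexpected work_item_id format: {derived!r}")
    prefix, digits = match.group(1), match.group(2)
    number, width = int(digits), len(digits)
    # Walk forward until the candidate is not taken.
    while f"{prefix}-{number:0{width}d}" in existing:
        number += 1
    return f"{prefix}-{number:0{width}d}"
```

For example, with ET-006 and ET-007 already taken, a second task that derives ET-006 would receive ET-008.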
Dev Agent
fa746105fd fix(webhook): fetch description from Plane API on status-start
Plane issue.updated (status -> In Progress) ships only changed fields, so the
webhook payload has no description and QG-0 wrongly blocked issues. start_pipeline
now pulls the full description from the Plane issue detail API (reusing the same
GET endpoint + shared token as fetch_issue_sequence_id) when the payload field is
empty/short, before QG-0 runs. Empty API -> honest QG-0 fail (truly empty ticket).
2026-06-03 21:12:38 +03:00
Dev Agent
09b1c5e1b9 feat(webhook): start pipeline on In Progress status (not on create)
Feature 1. work_item.created no longer starts the pipeline (soft QG-0 log only);
the issue stays in the backlog until moved to In Progress. The pipeline-start body
is extracted into start_pipeline(); a new issue-updated handler routes a state
change to In Progress -> handle_status_start, which is idempotent: an existing task
for the plane_id is NOT re-created or restarted (protects handle_comment, which also
flips issues to In Progress). Real Plane payload: event=issue, action=updated,
data.state.id. Existing m6/plane_webhook/dedup tests updated to drive the new
trigger; new test_status_trigger.py covers created-no-op / start / idempotent.
2026-06-03 18:18:26 +03:00
Dev Agent
d305521067 feat(plane): per-agent bot authorship for comments
add_comment now accepts an optional author (agent role) and POSTs under the matching Plane bot token via _headers_for(), so Plane shows the real author (Analyst/Architect/Developer/Reviewer/Tester/Deployer/Stream) instead of a single shared account. Unknown/empty roles or missing tokens fall back to the shared orchestrator token (autonomy preserved). GET/PATCH (find_issue_id, set_state) are unchanged and stay on the shared token. Call sites in stage_engine, launcher, webhooks/plane and the plane_sync notify helpers now pass author by stage role; stage transitions use stream. Adds tests/test_plane_author.py.
2026-06-03 10:53:25 +03:00
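The token fallback in the commit above (per-role bot token when available, shared orchestrator token otherwise) could be sketched as follows. The function takes the token map and shared token as explicit parameters for illustration, unlike the real _headers_for(); the X-API-Key header name is an assumption about how the tokens are sent.

```python
from typing import Dict, Mapping, Optional


def headers_for(
    author: Optional[str],
    bot_tokens: Mapping[str, str],  # hypothetical: agent role -> Plane bot token
    shared_token: str,
) -> Dict[str, str]:
    """Pick the bot token for the agent role, else the shared token."""
    # Unknown role, empty role, or missing/empty token all fall back.
    token = bot_tokens.get((author or "").lower()) or shared_token
    return {"X-API-Key": token, "Content-Type": "application/json"}
```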
Dev Agent
1d978caea7 feat(webhook): derive work_item_id from Plane sequence_id (M-6)
2026-06-03 10:02:15 +03:00
Dev Agent
e6a7c6de8d feat(webhook): dedup deliveries by delivery_id (M-7)
2026-06-03 09:18:02 +03:00
Dev Agent
51401a3ba9 refactor(launcher,plane): delegate stage advance to stage_engine
launcher._try_advance_stage and plane._try_advance_stage are now thin
wrappers over stage_engine.advance_stage. The plane webhook calls the sync
engine via asyncio.to_thread so there is exactly one implementation. The
launcher forwards finished_agent so the agent-specific rollback branches still
fire; the webhook passes None (human :approved:), matching prior behavior.

Also fixes the agent-selection bug in the launcher path: it used to enqueue
get_agent_for_stage(next_stage) (skipping a stage, e.g. analysis->architecture
launched developer instead of architect). The unified engine uses
get_agent_for_stage(current_stage), consistent with plane and gitea.
2026-06-03 08:56:25 +03:00
Dev Agent
20d6556e22 refactor(webhooks): enqueue_job instead of in-process launch (ORCH-1)
All 8 webhook launch points (plane x4, gitea x4) now enqueue a job and return
immediately instead of synchronously spawning claude in the uvicorn process.
2026-06-02 23:58:44 +03:00
Dev Agent
171f4eb304 fix(webhooks/plane): filter by project + resolve repo/prefix from registry
ORCH-6 / incident 2026-06-02: ignore work items from unknown Plane
projects (status=ignored) instead of funneling everything into
default_repo. Resolve repo, work-item prefix and Plane sync project from
the registry by data.project.
2026-06-02 22:30:42 +03:00
Dev Agent
1ebe8afc23 feat(worktree): git worktree per task to isolate shared /repos (ORCH-2 / S-4)
- add src/git_worktree.py: ensure/remove/get_worktree_path
- config: worktrees_dir=/repos/_wt
- launcher: agent runs in per-branch worktree; task-file + commit/push in worktree; no shared checkout
- qg/checks: read artifacts + run make test from worktree (branch arg, backward-compatible)
- webhooks/plane: pass branch into QG dispatch; review fallback from worktree
- webhooks/gitea: keep read-only branch --contains in main clone (documented)
- tests: test_git_worktree.py (isolation) + update test_launcher write-task-file
- docs: ARCHITECTURE worktree section + BUGFIXES_2026-06-02_ORCH2

Preserves B-1/B-2/S-1/S-5 fixes (paths now point at worktree).
2026-06-02 21:12:06 +03:00
Dev Agent
b585701c62 fix(webhooks): dispatch new QGs; stop false Gitea CI alerts (S-1)
- plane._try_advance_stage handles check_tests_local + check_reviewer_verdict
- gitea.handle_ci_status: failure -> debug log only (CI not authoritative)
2026-06-02 20:12:29 +03:00
Dev Agent
e27e489157 fix(plane-webhook): read issue/comment_stripped fields from Plane comment payload
2026-06-01 19:17:14 +03:00
51f7364532 feat: integrate Analyst into Plane/Orchestrator pipeline
- Add git fetch+checkout in agent launch cmd (ensures correct branch)
- Add git fetch+checkout in _monitor_agent before commit/push
- Post start comment in Plane when analyst launches
- Post :approved: request comment after analyst completes successfully
- Branch lookup moved before cmd construction for reuse
2026-05-31 20:15:01 +03:00
Dev Agent
81e0e383e0 feat(analysis): add check_analysis_approved QG with stakeholder approval requirement
- stages.py: QG renamed to check_analysis_approved (requires :approved: comment)
- qg/checks.py: new check_analysis_approved verifies files + Plane :approved: comment
- launcher.py: skip auto-advance for analysis stage (requires human approval)
- plane.py: route check_analysis_approved in _try_advance_stage
- docs/ARCHITECTURE.md: updated QG table and flow description
2026-05-31 15:19:03 +03:00
Dev Agent
0ad56e1f0a fix: tini entrypoint, event routing wildcard, orphan recovery
2026-05-22 13:52:46 +03:00
Dev Agent
b545665e2d feat: full pipeline fixes - CI status branch lookup, review webhook routing, auto-advance, plane sync
- handle_ci_status: fallback git branch -r --contains when branches[] empty
- webhook router: handle pull_request_approved event type
- handle_pr: map review.type to review.state for new Gitea format
- launcher: auto-advance stage after agent completion (_try_advance_stage)
- plane_sync: notify Plane on stage changes
- stages.py: stage machine with QG definitions
- notifications.py: stage change notifications
- safe.directory fix for container git operations
2026-05-22 01:57:02 +03:00
Dev Agent
daf8cdad9e feat: orchestrator MVP: webhooks, agent launcher, QG checks
2026-05-19 15:57:00 +03:00