Compare commits

..

89 Commits

Author SHA1 Message Date
Dev Agent
1baae81165 test: reset webhook secret per-test to fix cross-file isolation (CI green)
All checks were successful
CI / test (push) Successful in 10s
CI / test (pull_request) Successful in 10s
Adds autouse fixture _reset_webhook_secrets to tests/conftest.py that
resets the process-wide Pydantic settings singleton before every test:

1. gitea_webhook_secret / plane_webhook_secret → "" (HMAC disabled by
   default). Tests that deliberately test the 401 path
   (test_webhook_dedup.py:268,278) override this with their own monkeypatch
   which runs after autouse fixtures and wins for that test only.

2. db_path → os.environ["ORCH_DB_PATH"] (last written value after all test
   modules are imported). Without this, test_webhook_dedup.py (imported
   first alphabetically) seeds settings.db_path = dedup.db, while
   test_webhooks.py setup_db tries to remove test_orchestrator.db — leaving
   the DB dirty between tests that share a branch name and causing
   get_task_by_repo_branch() to return a stale row with the wrong stage.
   Per-test monkeypatches in test_webhook_dedup.setup_db still override it.

Root cause: both leaks come from the same singleton settings being read once
at import, before any per-test isolation runs. The autouse fixture is the
correct per-test reset point for process-wide singletons.

Result: pytest tests/ → 294 passed, 0 failed (was 10 failed/284 passed).
2026-06-05 00:00:01 +03:00
Dev Agent
e856e0940b test: migrate sequential_ids test to In Progress contract
Some checks failed
CI / test (push) Failing after 9s
CI / test (pull_request) Failing after 9s
2026-06-04 22:38:09 +03:00
Dev Agent
7bbab9c38b test: isolate webhook tests from live Plane API (fix CI)
Some checks failed
CI / test (push) Failing after 9s
CI / test (pull_request) Failing after 9s
2026-06-04 22:15:40 +03:00
a33a971c9c Merge pull request 'docs: Product Vision платформы (MD + PPTX)' (#25) from docs/product-vision into main 2026-06-04 17:37:36 +03:00
Стрим
d0c604bc66 docs: Product Vision платформы (MD + PPTX, 8 слайдов) 2026-06-04 17:37:16 +03:00
83f5020f94 Merge pull request 'fix(qg): gate testing->deploy on machine-readable test verdict, not substring (ET-013)' (#24) from fix/tests-machine-verdict into main 2026-06-04 16:08:10 +03:00
dev-agent
757745a221 fix(qg): gate testing->deploy on machine-readable test verdict, not substring (ET-013)
check_tests_passed did "if PASS in content" over the whole 13-test-report.md
body, so a report explicitly marked verdict: BLOCKED / status: blocked whose
prose mentioned "23 passed" / "PASS" / "All checks passed" passed the gate.
On ET-013 an unfinished feature (P1 AC-19 failed) reached Done.

Now mirrors check_reviewer_verdict (S-5) and check_deploy_status: read ONLY the
YAML frontmatter verdict:/status: fields. Positive tokens (PASS/PASSED/
READY-TO-DEPLOY/GREEN/APPROVED) -> True; negative tokens (BLOCKED/FAILED/...) are
authoritative -> False; missing/empty/no-frontmatter/bad-YAML -> False with reason;
file missing -> not found. Never raises.

Positive token set derived from REAL enduro-trails reports ET-001..ET-014
(inconsistent: PASS, ready-to-deploy+status:PASSED, stage:ready-to-deploy+status:pass,
PASS — ready-to-deploy). Validated: all 9 prior passing WIs stay True, ET-013 -> False.
2026-06-04 16:05:52 +03:00
34894f4684 Merge pull request 'fix(qg): find 14-deploy-log.md in origin/main when absent in feature worktree (false-FAILED deploy)' (#23) from fix/deploy-gate-log-path into main 2026-06-04 13:38:30 +03:00
dev-agent
4e4cc6c724 fix(qg): find 14-deploy-log.md in origin/main when absent in feature worktree
ET-013: deployer writes 14-deploy-log.md and merges deploy artifacts into
main via a separate PR, so the log lands in origin/main, not the feature
branch worktree that check_deploy_status reads via _repo_path(repo, branch).
Result: every successful deploy was falsely failed (Deploy log not found)
and rolled back deploy->development.

Fix: when the log is absent in the worktree, fall back to reading it from
origin/main on the shared clone (git fetch origin main + git show
origin/main:docs/work-items/<WI>/14-deploy-log.md). Lookup order:
worktree -> origin/main -> not found. Fetch/show failures degrade to
not found (never raise). Does not touch the merge-gate in gitea.py.

Tests: origin/main SUCCESS->PASS (ET-013 case), origin/main FAILED->FAILED,
absent everywhere->not found, fetch failure->degrades no exception,
worktree log short-circuits main lookup.
2026-06-04 13:35:35 +03:00
b222d7af27 Merge pull request 'fix(tracker): no duplicate Telegram messages on not-modified/transient edits' (#22) from fix/tracker-edit-not-modified into main 2026-06-04 13:22:46 +03:00
dev-bot
ec9aa74492 fix(tracker): no duplicate Telegram messages on not-modified/transient edits
edit_telegram now returns a distinguishable outcome (ok|not_modified|gone|
failed) instead of a bare bool. update_task_tracker only sends a NEW message
when the original is truly gone; not_modified and transient failures no longer
spawn duplicate trackers or orphan the live one.

render_task_tracker shows "попытка N" on an actively re-run stage (>=2 agent
runs) so the text changes between review<->development cycles. Finished ()
lines are unchanged.

Tests: edit_telegram classification (ok/not_modified/gone/failed via mocked
httpx), update_task_tracker (not_modified/failed -> no send, gone -> send+id),
render attempt marker.
2026-06-04 13:20:40 +03:00
3e5c74ce4f Merge pull request 'feat(telegram): live editable task tracker (Variant B+)' (#21) from feat/telegram-live-tracker into main 2026-06-04 11:46:21 +03:00
dev-bot
9a0298de9d feat(telegram): live editable task tracker (Variant B+), replace 15-message spam
Replace the ~15 separate Telegram messages per task (agent start/finish, stage
transition, QG-pending, tech noise) with ONE live tracker message edited in
place (editMessageText) on every stage transition. Only attention-worthy events
are still sent as SEPARATE, notifying messages: approve-gate, deploy-fail,
agent-fail, task error.

- db.py: idempotent ALTERs — tasks.tracker_message_id, tasks.title,
  tasks.brd_review_started_at/ended_at, agent_runs.model. Helpers for
  tracker message_id + BRD-review clock.
- usage.py: short_model_name() (strip provider/claude- prefix); parse model
  from result-JSON modelUsage; record_usage persists model.
- notifications.py: render_task_tracker(task_id) (stateless render from
  agent_runs), update_task_tracker (sendMessage->store id->editMessageText with
  fallback to a new message, silent), edit_telegram(). Per-stage line
  in↓/out↑·cost·model, ⏸️ Ревью БРД (human time), 💰 totals, finish block
  (⏱️ wall/agents/yours, 🔗 PR · 📦). notify_* are now tracker-only/log-only
  except the four alerts.
- stage_engine.py: stamp brd_review_ended on analysis->architecture advance.
- webhooks/plane.py: persist task title on creation.
- tests/test_telegram_tracker.py: render, short_model_name, send/edit/fallback,
  separate-vs-silent alert behavior.
2026-06-04 11:42:46 +03:00
2801983d7b Merge pull request 'fix(observability): merge-gate on deploy, full token input, Plane Done, artifact links' (#20) from fix/observability-and-merge-gate into main 2026-06-04 11:21:50 +03:00
Dev Agent
61e26a8930 fix(observability): merge-gate on deploy, full token input, Plane Done, artifact links
1. BUG 8 (second door): merge webhook no longer fake-completes a task at the
   deploy stage; done is gated by the deployer verdict (check_deploy_status).
   Other stages keep merge->done.
2. Token accounting: parse+persist cache_creation_input_tokens (new
   idempotent agent_runs column). usage_comment / task_summary now show the
   FULL input (input + cache_read + cache_creation) with a cached breakdown.
   cost_usd untouched.
3. deploy->done success now forces the Plane issue to terminal Done state.
4. All agents (architect/developer/reviewer/tester/deployer) attach artifact
   links to their finish comment via gitea_public_url.

Tests added for each fix; pytest 244 passed / 9 failed (off-limits HMAC group).
2026-06-04 11:17:58 +03:00
2629dffe1b Merge pull request 'fix(deploy): gate deploy->done on deployer verdict, not LLM exit code' (#19) from fix/deploy-verdict-gate into main 2026-06-04 02:46:52 +03:00
dev-agent
e4a9c48395 fix(deploy): gate deploy->done on deployer verdict, not LLM exit code 2026-06-04 02:43:01 +03:00
a0621b9952 Merge pull request 'fix(ci): bounce task back to developer on red CI (capped retries)' (#18) from fix/ci-fail-retry-developer into main 2026-06-04 01:41:01 +03:00
Dev Agent
3a285de11d fix(ci): bounce task back to developer on red CI (capped retries) 2026-06-04 01:39:40 +03:00
7922f6b67b Merge pull request 'fix(qg): use check_ci_green instead of local tests on development stage' (#17) from fix/drop-local-tests-qg into main 2026-06-04 01:24:14 +03:00
Dev Agent
e15d339b14 fix(qg): use check_ci_green instead of local tests on development stage 2026-06-04 01:22:43 +03:00
994f73a78e Merge pull request 'fix(qg): run pytest directly instead of make in check_tests_local' (#16) from fix/qg-pytest-no-make into main 2026-06-04 00:44:40 +03:00
orchestrator-dev
90c9ffe839 fix(qg): run pytest directly instead of make in check_tests_local 2026-06-04 00:43:04 +03:00
b6aa107f93 Merge pull request 'fix(stage): approved verdict advances analysis->architecture instead of re-running gate' (#15) from fix/approved-advances-stage into main 2026-06-03 23:31:45 +03:00
Dev Agent
0b8013cb06 fix(stage): approved verdict advances analysis->architecture instead of re-running gate 2026-06-03 23:30:08 +03:00
b01643fcc3 Merge pull request 'feat(config): external gitea_public_url for clickable doc links' (#14) from fix/gitea-public-url into main 2026-06-03 22:59:17 +03:00
Dev Agent
ca63bc26bb feat(config): external gitea_public_url for clickable doc links 2026-06-03 22:58:18 +03:00
dce9ac806b Merge pull request 'fix(pipeline): description+name to analyst, status-only analyst comment with doc links' (#13) from fix/taskmd-description into main 2026-06-03 22:45:17 +03:00
dev-agent
a9cdb17614 feat(plane): analyst comment asks for Approved status + links docs
The analyst ready-comment used the obsolete :approved: wording (comment-based approve was removed in PR #12). Rewrite it for the status-only model: ask the stakeholder to move the issue to Approved (reject = reason comment + Rejected), and add clickable Gitea links to the analyst docs that actually exist in the worktree.
2026-06-03 22:42:53 +03:00
dev-agent
96c5e6b2f9 fix(pipeline): fetch issue name from Plane API on status-trigger start
issue.updated ships only the changed fields, so name was absent and the branch slug became feature/<id>-untitled. Add fetch_issue_fields (single issue-detail GET returning name+description, reusing the endpoint/token of fetch_issue_description) and pull the name above the slug build. Empty name still falls back to untitled.
2026-06-03 22:42:53 +03:00
dev-agent
b91be74692 fix(pipeline): pass issue description to analyst task file
start_pipeline built the analyst .task.md with only the Title, so the analyst received a ~101-byte file and reported the business request as empty even though the description was already fetched. Append the resolved description to task_desc.
2026-06-03 22:42:02 +03:00
2d392b6fc7 Merge pull request 'fix: status-only verdict — remove comment-based approve + fix bug 3 (echo self-hit)' (#12) from fix/status-only-verdict into main 2026-06-03 22:20:46 +03:00
Dev Agent
857bad314c feat(webhook): pull reject reason from latest comment
handle_verdict(rejected): the reason is now pulled from the issue latest Plane
comment (_latest_comment_reason: GET comments, newest by created_at, HTML
stripped) instead of a fixed stub. Slava writes the reason in a comment before
flipping the status to Rejected. Falls back to a fixed note when there is no
comment / the API call fails.

tests: add test_status_only_verdict.py (test_inreview_comment_does_not_revert
[bug 3 root], test_any_comment_no_pipeline_action,
test_approved_status_advances_without_inprogress_reset,
test_rejected_status_pulls_reason_from_comment) and
test_inprogress_from_needs_input_relaunches_analyst in test_status_trigger.py.
Rewrote the comment-based tests (test_verdict_status, test_plane_approved/
rejected in test_webhooks) under the status-only model: comments are no-ops,
verdicts come from status changes.
2026-06-03 22:18:24 +03:00
Dev Agent
c4be50ee20 fix(webhook): drop redundant in_progress reset on Approved
handle_verdict(approved): removed set_issue_in_progress(work_item_id) before
_try_advance_stage. _try_advance_stage -> advance_stage -> plane_notify_stage
already PATCHes the issue to the NEXT stage status, so the reset only made the
board flicker In Progress before the next stage (part of bug 3).
2026-06-03 22:18:13 +03:00
Dev Agent
6b3e144949 fix(webhook): remove comment-based approve, keep status-only verdict
Status-only verdict model: comments NEVER drive the pipeline. Removed the
whole comment-based control mechanism from handle_comment (:approved: /
:rejected: / answer-to-questions) which caused bug 3 (echo self-hit): the
analyst posts its own "waiting for approval" comment, handle_comment catches
its own comment and reverts In Review -> In Progress. handle_comment is now a
pure logger with no side effects.

handle_status_start: a return to In Progress on an EXISTING task (Slava
answered the analyst questions in Needs Input) now RELAUNCHES the stage agent
instead of being a no-op. Distinguished from a duplicate In Progress webhook
via has_active_job_for_task() (new db helper): no active job => agent idle =>
relaunch; active job => busy => skip (no double launch).
2026-06-03 22:18:02 +03:00
cd73c75cda Merge pull request 'fix: pipeline-start bugs (ET-006) — fetch description on status-start + work_item_id collision guard' (#11) from fix/pipeline-start-bugs into main 2026-06-03 21:14:44 +03:00
Dev Agent
c69e11348b test(pipeline): cover status-start description fetch and work_item_id uniqueness
- test_status_start_fetches_description: empty payload description -> pulled from
  Plane API (mocked) -> QG-0 passes, analyst enqueued.
- test_status_start_empty_api_still_blocks: empty API -> honest QG-0 fail.
- test_work_item_id_uniqueness: ET-006 taken -> next free id, per-repo isolation.
- test_collision_reassigns_in_start_pipeline: end-to-end collision reassignment.
- test_worktree_per_task: two tasks never share a worktree path.
2026-06-03 21:12:59 +03:00
Dev Agent
ac9f5a05a6 fix(work-item): prevent work_item_id collision and bind branch per task
ET-006 was handed to two different tasks because M-6 derives work_item_id from
the Plane sequence_id, which can collide -> the two tasks shared a branch/worktree
slug prefix and stepped on each other.

2a: ensure_unique_work_item_id() is a uniqueness-guard LAYERED ON TOP of the M-6
derive (derive is untouched): if the derived ET-NNN already exists in tasks for
the repo, it walks forward to the next free number. Applied in start_pipeline
after the derive.

2b (defense-in-depth): worktree is keyed by branch; if the resulting branch is
already owned by another task in the repo, disambiguate it with the unique
work_item_id + plane id so two tasks can never share a worktree.
2026-06-03 21:12:51 +03:00
Dev Agent
fa746105fd fix(webhook): fetch description from Plane API on status-start
Plane issue.updated (status -> In Progress) ships only changed fields, so the
webhook payload has no description and QG-0 wrongly blocked issues. start_pipeline
now pulls the full description from the Plane issue detail API (reusing the same
GET endpoint + shared token as fetch_issue_sequence_id) when the payload field is
empty/short, before QG-0 runs. Empty API -> honest QG-0 fail (truly empty ticket).
2026-06-03 21:12:38 +03:00
4773137b52 Merge pull request 'feat: pipeline UX — status-trigger, verdict statuses, stage visibility, token usage' (#10) from feature/pipeline-ux into main 2026-06-03 18:27:07 +03:00
Dev Agent
7fd6529a35 test(conftest): mute Telegram in all tests to stop prod leakage
A pytest run on prod was sending REAL Telegram messages to Slava: some tests
(e.g. test_webhook_dedup advancing a stage) reach notify_stage_change ->
send_telegram, which read the live .env token/chat_id and actually POSTed.

Add an autouse fixture stubbing send_telegram to a no-op for every test. Patch
the SOURCE src.notifications.send_telegram (covers all notify_* helpers and the
many modules that do a local from .notifications import send_telegram inside
functions) AND src.stage_engine.send_telegram (module-level binding, would not be
intercepted by the source patch alone). webhooks/plane, launcher, queue_worker are
patched defensively with raising=False.

Verified: full suite run with FAKE telegram creds + an un-swallowable httpx.post
trip-wire (BaseException, so send_telegram except Exception can not hide it) shows
ZERO calls to api.telegram.org. Without the fixture the trip-wire fires, proving
the guard is real.
2026-06-03 18:23:09 +03:00
Dev Agent
9a702a0216 feat(metrics): per-agent token/cost accounting
Feature 4. claude is now launched with --output-format json; the run-log trailing
result JSON is parsed (defensively, never fatal) for usage + total_cost_usd. New
idempotent ALTERs add input_tokens/output_tokens/cache_read_tokens/cost_usd to
agent_runs; the launcher monitor records usage per run, posts a per-agent finish
comment under that agent bot (e.g. Developer gotov · 45.2k in / 12.1k out · $0.21),
and the deployer posts an end-of-task summary (SUM over agent_runs GROUP BY agent)
on done. New src/usage.py holds parse/format/record/summary helpers; test_usage.py
covers parsing a real CLI JSON blob, NULL-on-garbage, recording, formatting, and the
per-task aggregate.
2026-06-03 18:18:46 +03:00
Dev Agent
38a741d24e feat(webhook): verdict via Approved/Rejected statuses (variant B)
Feature 2. The issue updated dispatch (shipped with the status-trigger handler)
also routes Approved -> _try_advance_stage (== :approved: comment) and Rejected ->
_rollback_stage (== :rejected: comment). The :rejected: comment branch was
refactored into the shared _rollback_stage so both mechanisms behave identically;
a status reject passes Reason: (rejected via status, see latest comment) since no
inline reason arrives with a status change. Comments stay fully working. This
commit adds test_verdict_status.py proving both status and comment paths funnel
into the same advance/rollback logic.
2026-06-03 18:18:36 +03:00
Dev Agent
09b1c5e1b9 feat(webhook): start pipeline on In Progress status (not on create)
Feature 1. work_item.created no longer starts the pipeline (soft QG-0 log only);
the issue stays in the backlog until moved to In Progress. The pipeline-start body
is extracted into start_pipeline(); a new issue updated handler routes a state
change to In Progress -> handle_status_start, which is idempotent: an existing task
for the plane_id is NOT re-created or restarted (protects handle_comment, which also
flips issues to In Progress). Real Plane payload: event=issue, action=updated,
data.state.id. Existing m6/plane_webhook/dedup tests updated to drive the new
trigger; new test_status_trigger.py covers created-no-op / start / idempotent.
2026-06-03 18:18:26 +03:00
Dev Agent
a4668c0303 feat(plane): stage visibility on board + verdict status UUIDs
Feature 3 + Feature 2 infra. Extend the global PLANE_STATES with the 6 new
enduro status UUIDs (architecture/development/review/testing + approved/rejected),
remap STAGE_TO_STATE so the 4 mid-pipeline stages move the issue across its own
board column instead of all sitting in In Progress, and add the
set_issue_stage_state() helper. Needs Input / In Review / Blocked keep their own
explicit setters and stay higher priority. TODO(ORCH-10): statuses are per-project;
resolve per project when more projects are onboarded.
2026-06-03 18:18:17 +03:00
e9fd30528f Merge pull request 'feat(plane): per-agent bot authorship for comments' (#9) from feature/plane-per-agent-author into main 2026-06-03 10:55:29 +03:00
Dev Agent
d305521067 feat(plane): per-agent bot authorship for comments
add_comment now accepts an optional author (agent role) and POSTs under the matching Plane bot token via _headers_for(), so Plane shows the real author (Analyst/Architect/Developer/Reviewer/Tester/Deployer/Stream) instead of a single shared account. Unknown/empty roles or missing tokens fall back to the shared orchestrator token (autonomy preserved). GET/PATCH (find_issue_id, set_state) are unchanged and stay on the shared token. Call sites in stage_engine, launcher, webhooks/plane and the plane_sync notify helpers now pass author by stage role; stage transitions use stream. Adds tests/test_plane_author.py.
2026-06-03 10:53:25 +03:00
Dev Agent
30d6dd0557 feat(config): add per-agent Plane bot token settings
Add 7 optional bot-token fields (plane_bot_analyst..stream) read from the ORCH_PLANE_BOT_* env vars, default empty. Required for per-agent comment authorship; empty values fall back to the shared orchestrator token.
2026-06-03 10:53:17 +03:00
12e2691a24 Merge pull request 'M-6: derive work_item_id from Plane sequence_id' (#8) from feature/ORCH-M6-plane-sequence into main 2026-06-03 10:04:32 +03:00
Dev Agent
c431a3d055 fix(plane_sync): drop hardcoded ET- prefix in find_issue_id (M-6) 2026-06-03 10:02:15 +03:00
Dev Agent
1d978caea7 feat(webhook): derive work_item_id from Plane sequence_id (M-6) 2026-06-03 10:02:15 +03:00
be27f506e3 Merge pull request 'ORCH cleanup L-1/L-2/L-3: stages comment, prune run logs, emoji constants' (#7) from feature/ORCH-cleanup-L1L2L3 into main 2026-06-03 09:55:53 +03:00
Dev Agent
8f11971bfc refactor(plane_sync): extract emoji literals to constants (L-3) 2026-06-03 09:54:43 +03:00
Dev Agent
0653c2437f feat(launcher): prune old run logs (L-2) 2026-06-03 09:53:55 +03:00
Dev Agent
48b7707eb3 docs(stages): fix misleading STAGE_TRANSITIONS comment (L-1) 2026-06-03 09:51:46 +03:00
2fdc6856ba Merge pull request 'ORCH-5: webhook delivery dedup (M-7)' (#6) from feature/ORCH-5-webhook-dedup into main 2026-06-03 09:20:38 +03:00
Dev Agent
4ac449ff63 test(webhook): cover delivery dedup + migration safety (M-7) 2026-06-03 09:18:02 +03:00
Dev Agent
e6a7c6de8d feat(webhook): dedup deliveries by delivery_id (M-7) 2026-06-03 09:18:02 +03:00
Dev Agent
0b924208dc feat(db): add events.delivery_id + partial unique index (M-7) 2026-06-03 09:18:02 +03:00
2f0fd24670 Merge pull request 'ORCH-4: unified stage-engine (M-3)' (#5) from feature/ORCH-4-stage-engine into main 2026-06-03 08:59:51 +03:00
Dev Agent
6abdc220d2 test(stage): cover unified stage_engine + launcher/plane delegation
18 tests: happy-path advance per stage with correct agent (ORCH-4 fix),
QG-fail no-advance, reviewer REQUEST_CHANGES rollback+retry/alert, tester FAIL
rollback+retry/block, architect conflict rollback to analysis, analyst
approved-flow no-advance, and launcher+plane both delegating to the engine.
2026-06-03 08:56:25 +03:00
Dev Agent
51401a3ba9 refactor(launcher,plane): delegate stage advance to stage_engine
launcher._try_advance_stage and plane._try_advance_stage are now thin
wrappers over stage_engine.advance_stage. The plane webhook calls the sync
engine via asyncio.to_thread so there is exactly one implementation. The
launcher forwards finished_agent so the agent-specific rollback branches still
fire; the webhook passes None (human :approved:), matching prior behavior.

Also fixes the agent-selection bug in the launcher path: it used to enqueue
get_agent_for_stage(next_stage) (skipping a stage, e.g. analysis->architecture
launched developer instead of architect). The unified engine uses
get_agent_for_stage(current_stage), consistent with plane and gitea.
2026-06-03 08:56:25 +03:00
Dev Agent
0befc49b1e refactor(stage): extract unified stage_engine.advance_stage (M-3)
Merge the two diverged _try_advance_stage implementations (launcher sync +
plane async) into one synchronous engine. Preserves all launcher business
logic (analyst approved-flow, reviewer REQUEST_CHANGES rollback+retry, tester
FAIL rollback+retry, architect conflict rollback) and the plane
check_review_approved PR-by-branch dispatch. Unifies the QG signature
dispatch. Fixes agent selection: advancing FROM current_stage launches
get_agent_for_stage(current_stage), not next_stage.
2026-06-03 08:56:14 +03:00
fd554c8a5a Merge pull request 'ORCH-7: cleanup + hardening (M-4 dead code + M-2 graceful timeout)' (#4) from feature/ORCH-7-hardening into main 2026-06-03 08:31:26 +03:00
Dev Agent
c167c6930d test(launcher): watchdog graceful kill ordering + timeout config + M-4 removal
Cover M-2: SIGTERM-before-SIGKILL ordering, graceful exit within grace skips
SIGKILL, ProcessLookupError before SIGTERM is tolerated (no _record_kill), and
_resolve_timeout per-agent override / default / malformed-JSON fallback.
Cover M-4: _auto_merge_pr removed, _ensure_pr retained.
2026-06-03 08:28:09 +03:00
Dev Agent
49ecb48eb0 feat(launcher): graceful SIGTERM->SIGKILL + configurable agent timeout (M-2)
The watchdog used to time.sleep(timeout) then immediately SIGKILL, which cut
claude off mid-write and left half-written artifacts. It now sends SIGTERM,
polls os.kill(pid, 0) for up to agent_kill_grace_seconds, and only SIGKILL if
the process is still alive; ProcessLookupError is tolerated at every step.

Timeout is now configurable via config.py: agent_timeout_seconds (default 1800),
agent_kill_grace_seconds (default 20), and agent_timeout_overrides_json for
per-agent overrides (e.g. {"reviewer": 3600}). AGENT_TIMEOUT is kept as a
backward-compatible alias. The recorded exit_code stays -9 so the ORCH-1
monitor retry/fail logic is unchanged (timeout-kills classify as permanent and
requeue within max_attempts, no retry loop).
2026-06-03 08:28:03 +03:00
Dev Agent
237732bc64 refactor(launcher): remove dead _auto_merge_pr (M-4)
_auto_merge_pr had zero callers (merge is handled by the deployer agent).
Removed the method; _ensure_pr (still used by the auto-advance path) is kept.
2026-06-03 08:27:52 +03:00
4e52e192e4 Merge pull request 'ORCH-1 (F-2b): persistent job queue instead of in-process daemon threads' (#3) from feature/ORCH-1-job-queue into main 2026-06-03 08:09:23 +03:00
Dev Agent
c23f000c05 fix(preflight): check the binary the launcher actually spawns (ORCH-1)
Container ORCH_CLAUDE_BIN pointed at a non-existent /usr/bin/claude while the
launcher spawns the hardcoded /opt/claude-code/bin/claude.exe. Preflight now
follows AgentLauncher.CLAUDE_BIN (the genuinely executed path), so it no longer
falsely blocks every job in production.
2026-06-03 00:13:44 +03:00
Dev Agent
d0d47058b4 docs(resilience): document preflight/429/backoff/breaker + env vars (ORCH-1) 2026-06-03 00:12:17 +03:00
Dev Agent
a613fd8180 test(resilience): 34 tests for preflight/classifier/backoff/breaker (ORCH-1)
Covers preflight FAIL->queued + cache, transient/permanent classifier +
Retry-After, exp backoff + available_at gating, launcher transient vs permanent
finalize, circuit breaker open/half-open/closed. test_queue worker tests stub
preflight OK. Popen never spawned.
2026-06-03 00:12:17 +03:00
Dev Agent
f314ae09e5 feat(worker): preflight gate + circuit breaker + /queue resilience (ORCH-1)
QueueWorker gates claims behind preflight and the CircuitBreaker (open ->
pause, no CLI calls + Telegram alert; half-open probes one job; closed on
recovery). Wires launcher.on_outcome. /queue exposes resilience snapshot.
2026-06-03 00:12:17 +03:00
Dev Agent
90fdd19394 feat(launcher): classify failures, backoff transient retry, breaker outcome (ORCH-1)
_finalize_job classifies the run log: transient (429/overload) -> backoff
requeue via mark_job_transient with separate transient_attempts budget honouring
Retry-After; permanent -> normal attempts<max. on_outcome callback feeds the
circuit breaker. _backoff_seconds = min(2^n*base, max) | Retry-After.
2026-06-03 00:12:17 +03:00
Dev Agent
4ef87a3959 feat(resilience): cheap preflight + 429/transient error classifier (ORCH-1)
preflight.py: cached CLAUDE_BIN exists + claude --version (no tokens, no
prompt-ping). error_classifier.py: classify_log_file -> transient|permanent
from log tail + Retry-After parsing.
2026-06-03 00:12:17 +03:00
Dev Agent
0cd9b11fe0 feat(queue): resilience schema + backoff helper + config (ORCH-1)
jobs.transient_attempts + available_at columns (idempotent _ensure_column
migration); claim_next_job honours available_at; mark_job_transient (backoff
requeue with separate transient budget). Config: preflight_cache_ttl,
backoff_base/max_seconds, transient_max_attempts, breaker_threshold,
breaker_pause_seconds.
2026-06-03 00:12:17 +03:00
Dev Agent
4be168c0ec docs(queue): document job queue, /queue, env vars (ORCH-1)
ARCHITECTURE job-queue section + flow diagram, README /queue endpoint and
ORCH_MAX_CONCURRENCY/ORCH_QUEUE_POLL_INTERVAL, new docs/ORCH-1_JOB_QUEUE.md.
2026-06-02 23:58:44 +03:00
Dev Agent
2283b8898b test(queue): 19 tests for job queue lifecycle/atomicity/retry/worker (ORCH-1)
Covers enqueue->claim->mark, atomic claim (no double dispatch, 8-thread race),
retry fail->queued->failed, requeue_running_jobs, observability, worker
max_concurrency. Popen fully mocked (no real agent spawned).
2026-06-02 23:58:44 +03:00
Dev Agent
b6d4426a48 feat(worker): background queue worker + lifespan + queue-recovery + /queue (ORCH-1)
queue_worker.QueueWorker drains the queue respecting max_concurrency. main.py
lifespan: queue-recovery (requeue running jobs) after M-1 orphan-recovery, starts
worker and stops it on shutdown. New GET /queue endpoint (counts + recent jobs).
2026-06-02 23:58:44 +03:00
Dev Agent
20d6556e22 refactor(webhooks): enqueue_job instead of in-process launch (ORCH-1)
All 8 webhook launch points (plane x4, gitea x4) now enqueue a job and return
immediately instead of synchronously spawning claude in the uvicorn process.
2026-06-02 23:58:44 +03:00
Dev Agent
3345c2fa0a feat(launcher): launch_job + job-status finalize with retries (ORCH-1)
Refactor launch() into shared _spawn(); add launch_job(job) that threads job_id
through monitor/watchdog. _finalize_job marks done / requeue (attempts<max) /
failed+notify. Internal advance-chain self.launch -> enqueue_job. B-1/B-2/M-1/ORCH-2
spawn logic unchanged.
2026-06-02 23:58:44 +03:00
Dev Agent
fd3dac7d22 feat(queue): add jobs table + queue helpers and config (ORCH-1)
Persistent SQLite job queue (F-2b): jobs table + idx, atomic claim_next_job,
enqueue/mark/count/requeue/get helpers. New settings max_concurrency
(ORCH_MAX_CONCURRENCY) and queue_poll_interval (ORCH_QUEUE_POLL_INTERVAL).
2026-06-02 23:58:44 +03:00
b021ff7cb0 Merge pull request 'ORCH-6: multi-repo (project filter + repo/prefix per project)' (#2) from feature/ORCH-6-multirepo into main 2026-06-02 23:42:29 +03:00
Dev Agent
ca81f38330 docs: document multi-repo registry + ORCH-6 bugfix and incident
ORCH-6: ARCHITECTURE.md gets a project-registry section; README explains
how to add a project via ORCH_PROJECTS_JSON; BUGFIXES_2026-06-03.md
records the fix and links the 2026-06-02 webhook autorun incident.
2026-06-02 22:30:51 +03:00
Dev Agent
c1f35a2047 test(projects,webhook): cover registry resolvers + project filter
ORCH-6: test_projects.py covers resolvers and ORCH_PROJECTS_JSON parsing
(valid/malformed/fallback). test_plane_webhook.py covers the webhook
project filter via TestClient (unknown->ignored, orchestrator->orchestrator
repo, enduro->enduro-trails, independent ORCH/ET prefixes); launcher
mocked. test_webhooks.py: register proj-1 so existing ET fixtures pass.
2026-06-02 22:30:51 +03:00
Dev Agent
a6f6a43c1c fix(webhooks/gitea): ignore pushes/events for repos outside the registry
ORCH-6: get_project_by_repo None -> ignored, so events for unknown repos
do not trigger the pipeline.
2026-06-02 22:30:42 +03:00
Dev Agent
171f4eb304 fix(webhooks/plane): filter by project + resolve repo/prefix from registry
ORCH-6 / incident 2026-06-02: ignore work items from unknown Plane
projects (status=ignored) instead of funneling everything into
default_repo. Resolve repo, work-item prefix and Plane sync project from
the registry by data.project.
2026-06-02 22:30:42 +03:00
Dev Agent
a87c633003 refactor(plane_sync): parameterize project_id (backward compatible)
ORCH-6: sync functions resolve the issue PROJECT_ID via the registry
(get_project_by_repo) and accept project_id; default stays enduro so
existing ET callers keep working.
2026-06-02 22:30:42 +03:00
Dev Agent
0797f958dc feat(db): per-project work-item prefix in get_next_work_item_id
ORCH-6: get_next_work_item_id(repo, prefix="ET") numbers per (repo, prefix)
so orchestrator issues number ORCH-001 independently of the ET sequence.
Default prefix stays ET for backward compatibility.
2026-06-02 22:30:42 +03:00
Dev Agent
36d5f25f2a feat(projects): add project registry (Plane id -> repo/prefix mapping)
ORCH-6: src/projects.py introduces ProjectConfig + resolvers
(get_project_by_plane_id/by_repo, known_plane_project_ids) keyed by
Plane project uuid. Source: ORCH_PROJECTS_JSON env (config.projects_json),
with a built-in default registry (enduro-trails + orchestrator) and
robust parsing (malformed JSON/entries fall back to default).
2026-06-02 22:30:42 +03:00
47 changed files with 9628 additions and 623 deletions

22
.gitea/workflows/ci.yml Normal file
View File

@@ -0,0 +1,22 @@
name: CI
on:
push:
branches: ["feature/**", "bugfix/**", "hotfix/**", "fix/**", "ci/**"]
pull_request:
branches: [main]
jobs:
test:
runs-on: self-hosted
steps:
- uses: actions/checkout@v4
- name: Install dependencies
run: |
python3 -m pip install --user --upgrade pip
python3 -m pip install --user -r requirements.txt
- name: Test
env:
PYTHONPATH: ${{ github.workspace }}
run: |
export PATH="$HOME/.local/bin:$PATH"
python3 -m pytest tests/ -q

View File

@@ -39,6 +39,7 @@ created → analysis → architecture → development → review → testing →
|--------|------|----------|
| GET | `/health` | Health check |
| GET | `/status` | Активные задачи (stage != done) |
| GET | `/queue` | Очередь задач (ORCH-1): counts по статусам + max_concurrency + последние 10 jobs |
| POST | `/webhook/plane` | Plane webhook receiver |
| POST | `/webhook/gitea` | Gitea webhook receiver |
@@ -52,8 +53,9 @@ src/
├── stages.py # State machine (transitions, agents, QG)
├── notifications.py # Уведомления (логирование)
├── plane_sync.py # Синхронизация статусов с Plane API
├── queue_worker.py # ORCH-1: фоновый воркер очереди (claim → launch_job)
├── agents/
│ └── launcher.py # AgentLauncher: launch, monitor, watchdog, auto-advance
│ └── launcher.py # AgentLauncher: launch/launch_job, monitor, watchdog, auto-advance
├── webhooks/
│ ├── plane.py # Plane webhook handler
│ └── gitea.py # Gitea webhook handler (push, PR, CI status)
@@ -101,11 +103,80 @@ uvicorn src.main:app --reload --port 8500
| `ORCH_GITEA_TOKEN` | Gitea API token | — |
| `ORCH_GITEA_WEBHOOK_SECRET` | Gitea webhook secret | — |
| `ORCH_GITEA_OWNER` | Gitea repo owner | `admin` |
| `ORCH_DEFAULT_REPO` | Default repository | `enduro-trails` |
| `ORCH_DEFAULT_REPO` | Default repository (fallback) | `enduro-trails` |
| `ORCH_PROJECTS_JSON` | Multi-repo реестр (JSON-массив, ORCH-6) | `""` → дефолт в `src/projects.py` |
| `ORCH_CLAUDE_BIN` | Путь к Claude CLI | `/opt/claude-code/bin/claude.exe` |
| `ORCH_REPOS_DIR` | Repos dir (container) | `/repos` |
| `ORCH_HOST_REPOS_DIR` | Repos dir (host) | `/home/slin/repos` |
| `ORCH_DB_PATH` | SQLite path | `/app/data/orchestrator.db` |
| `ORCH_MAX_CONCURRENCY` | Сколько jobs воркер запускает параллельно (ORCH-1) | `1` |
| `ORCH_QUEUE_POLL_INTERVAL` | Период опроса очереди воркером, сек (ORCH-1) | `2.0` |
| `ORCH_PREFLIGHT_CACHE_TTL` | Кэш preflight (CLI/net), сек (ORCH-1 resilience) | `45` |
| `ORCH_BACKOFF_BASE_SECONDS` | База exp-backoff для transient (429) | `10` |
| `ORCH_BACKOFF_MAX_SECONDS` | Потолок backoff | `600` |
| `ORCH_TRANSIENT_MAX_ATTEMPTS` | Ретраи для 429/недоступности | `5` |
| `ORCH_BREAKER_THRESHOLD` | transient подряд до открытия breaker | `3` |
| `ORCH_BREAKER_PAUSE_SECONDS` | Пауза при открытом breaker | `300` |
## Очередь задач (ORCH-1 / F-2b)
Webhook-хэндлеры больше не спавнят claude-агентов синхронно в процессе uvicorn.
Вместо этого они кладут **job** в персистентную SQLite-таблицу `jobs`
(`enqueue_job`, мгновенный ответ), а фоновый воркер (`src/queue_worker.py`)
забирает jobs с учётом `ORCH_MAX_CONCURRENCY` и запускает агента (`launch_job`,
та же Popen-логика, что и раньше).
Преимущества:
- **Рестарт-safe.** При старте jobs со статусом `running` возвращаются в `queued`
(queue-recovery в lifespan) — работа не теряется.
- **Лимит параллелизма.** Воркер не превышает `ORCH_MAX_CONCURRENCY`.
- **Ретраи.** Упавший job (exit≠0) ретраится пока `attempts < max_attempts`,
потом `failed` + Telegram-нотификация.
Статусы job: `queued → running → done | failed`. Наблюдаемость — через `GET /queue`.
**Resilience-слой:** дешёвый preflight (CLI/net, кэш, без токенов) гейтит claim;
429/overload детектится по логу (transient vs permanent), transient ретраится с
exp-backoff (`available_at`, Retry-After); circuit breaker паузит воркер после N
transient подряд. Подробности: `docs/ORCH-1_JOB_QUEUE.md`.
## Multi-repo: реестр проектов (ORCH-6)
Оркестратор обслуживает несколько репозиториев через реестр проектов
(`src/projects.py`), ключ = **Plane project id**. Plane-webhook фильтрует события
по проекту (неизвестный проект → `ignored`) и резолвит `repo` / `work_item_prefix` /
Plane-проект из маппинга.
По умолчанию (если `ORCH_PROJECTS_JSON` пуст) зарегистрированы два проекта:
| Проект | Plane project id | repo | prefix |
|--------|------------------|------|--------|
| enduro-trails | `7a79f0a9-5278-49cd-9007-9a338f238f9c` | `enduro-trails` | `ET` |
| orchestrator | `8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a` | `orchestrator` | `ORCH` |
### Как добавить новый проект
1. Убедись, что gitea-репо уже клонировано в `/repos/<repo>` (авто-clone — отдельно).
2. Узнай Plane project uuid (из URL проекта в Plane или через Plane API).
3. Добавь запись в `ORCH_PROJECTS_JSON` в `.env` (JSON-массив). **Важно:** если
задаёшь `ORCH_PROJECTS_JSON`, он полностью заменяет дефолт — перечисли **все**
нужные проекты (включая enduro-trails и orchestrator):
```bash
ORCH_PROJECTS_JSON='[
{"plane_project_id":"7a79f0a9-5278-49cd-9007-9a338f238f9c","repo":"enduro-trails","work_item_prefix":"ET","name":"enduro-trails"},
{"plane_project_id":"8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a","repo":"orchestrator","work_item_prefix":"ORCH","name":"orchestrator"},
{"plane_project_id":"<новый-uuid>","repo":"<новый-repo>","work_item_prefix":"<PREFIX>","name":"<имя>"}
]'
```
4. Пересобери: `docker compose up -d --build`.
5. Проверь резолв:
```bash
docker exec orchestrator python3 -c "from src.projects import get_project_by_plane_id as g; print(g('<новый-uuid>'))"
```
Поля `name` опционально (по умолчанию = `repo`). Подробности — `docs/ARCHITECTURE.md`.
## Ключевые механизмы

View File

@@ -9,9 +9,39 @@ Orchestrator — event-driven FastAPI сервис, который управл
### 1. Webhook Receivers
#### Plane Webhook (`src/webhooks/plane.py`)
- Принимает `work_item.created` — создаёт задачу в DB, запускает analyst
- **Фильтр по проекту (ORCH-6):** извлекает `data.project` (Plane project uuid) и игнорирует событие, если проект не в реестре (`known_plane_project_ids()`) → ответ `{"status":"ignored","reason":"unknown project"}`. Это предотвращает инцидент 2026-06-02 (webhook на весь workspace без фильтра).
- Принимает `work_item.created` — резолвит repo/prefix/Plane-проект из реестра по `project`, создаёт задачу в DB, запускает analyst
- Принимает `work_item.updated` — синхронизация статусов
#### Реестр проектов (`src/projects.py`, multi-repo, ORCH-6)
Маппинг **Plane project id → (repo, work_item_prefix, name)**. Позволяет одному
оркестратору обслуживать несколько репозиториев, не путая их.
```python
@dataclass(frozen=True)
class ProjectConfig:
plane_project_id: str # uuid Plane-проекта (ключ реестра)
repo: str # имя gitea-репо (= папка в /repos)
work_item_prefix: str # ET / ORCH
name: str # человекочитаемое
```
Резолверы:
- `get_project_by_plane_id(uuid) -> ProjectConfig | None` — для фильтра/резолва в plane-webhook.
- `get_project_by_repo(repo) -> ProjectConfig | None` — когда известен только repo (gitea-webhook, plane_sync).
- `known_plane_project_ids() -> set[str]` — множество разрешённых проектов (фильтр).
**Источник конфигурации:** env `ORCH_PROJECTS_JSON` (JSON-массив `ProjectConfig`).
Если пусто/битый JSON — используется встроенный дефолт-реестр (enduro-trails + orchestrator),
чтобы система работала из коробки. Парсинг устойчив: битые записи пропускаются,
полностью невалидный JSON → fallback на дефолт.
Следствия multi-repo:
- **repo per project:** `repo = get_project_by_plane_id(project_id).repo` вместо хардкода `default_repo`.
- **prefix per project:** `get_next_work_item_id(repo, prefix)` нумерует независимо — `ORCH-001` vs `ET-010` (`src/db.py`).
- **plane_sync в правильный проект:** state/comment пишутся в Plane-проект самой задачи (резолв по repo через `get_project_by_repo`), а не в единственный хардкоженный `PROJECT_ID` (обратная совместимость сохранена дефолтом на enduro).
- **gitea-webhook:** push в repo вне реестра → `ignored` (не триггерит конвейер).
#### Gitea Webhook (`src/webhooks/gitea.py`)
- **push** — проверяет наличие артефактов (docs/, src/), продвигает стадию
- **pull_request\*** (wildcard) — обрабатывает review approved/rejected, PR merge
@@ -234,9 +264,71 @@ services:
- ~~Shared `/repos` checkout (гонки при параллельных задачах).~~ **РЕШЕНО (ORCH-2 / S-4):**
git worktree per task/branch — см. раздел «Изоляция через git worktree» ниже.
- **In-process daemon-потоки.** Агенты запускаются в daemon-потоках uvicorn. При
рестарте uvicorn запущенные агенты осиротевают → ловит orphan-recovery (M-1).
Целевая архитектура — очередь задач (F-2b, отдельно).
- ~~In-process daemon-потоки (рестарт → сироты, потеря работы).~~ **РЕШЕНО (ORCH-1 / F-2b):**
персистентная очередь jobs + фоновый воркер — см. раздел «Очередь задач (ORCH-1)» ниже.
Daemon-потоки monitor/watchdog остаются для одного запущенного агента, но при
рестарте его job возвращается в `queued` (queue-recovery) и переподхватывается.
## Очередь задач (ORCH-1 / F-2b)
Раньше webhook-хэндлер **синхронно** спавнил `subprocess.Popen` + 2 daemon-thread
прямо в процессе uvicorn (8 точек вызова). Рестарт = сироты + потеря работы,
нет лимита параллелизма, нет ретраев.
### Flow
```
webhook (plane/gitea) background thread (queue_worker)
│ │
enqueue_job() ---> [ jobs table ] <--- claim_next_job() (atomic queued->running)
(мгновенный status=queued │
ответ 200) launch_job(job)
AgentLauncher._spawn (Popen claude)
_monitor_agent (proc.wait, commit/push,
│ advance stage)
_finalize_job:
exit 0 -> mark_job done
exit !=0 & attempts<max -> requeue (queued)
exit !=0 & attempts>=max -> failed + Telegram
```
### Таблица `jobs`
| Колонка | Назначение |
|--------|------------|
| `status` | `queued``running``done` \| `failed` |
| `attempts` / `max_attempts` | счётчик попыток (инкремент при claim) / лимит ретраев (default 2) |
| `run_id` | FK на `agent_runs.id` после старта |
| `task_content` | ТЗ, которое пишется в task-файл агента |
| `error` | последняя ошибка |
`idx_jobs_status (status, id)` — быстрый FIFO-выбор queued.
### Атомарный claim
`claim_next_job()` делает `SELECT queued ORDER BY id LIMIT 1``UPDATE ... WHERE id=? AND
status='queued'` и проверяет `rowcount`. При гонке двух тиков лишь один UPDATE
переведёт строку в `running` (rowcount==1); проигравший берёт следующий job.
### Queue-recovery (рестарт-safe)
В `main.py` lifespan **после** M-1 orphan-recovery вызывается `requeue_running_jobs()`:
jobs со статусом `running` (воркер умёр на рестарте) → возвращаются в `queued`.
Потом стартует воркер; на shutdown — `worker.stop()` (Event.set + join).
### Конфиг
- `ORCH_MAX_CONCURRENCY` (default 1) — лимит параллельных jobs.
- `ORCH_QUEUE_POLL_INTERVAL` (default 2.0) — период опроса.
Наблюдаемость: `GET /queue` — counts по статусам + последние 10 jobs.
> Совместимость: `launcher.launch()` (прямой синхронный запуск, `job_id=None`)
> сохранён для обратной совместимости. Очередь использует `launch_job()`;
> оба разделяют `_spawn()` (Popen-логика B-2 не изменена).
- **Gitea CI не настроен.** QG развития теперь локальный (`check_tests_local`);
Gitea CI-статусы не являются authoritative и не блокируют pipeline.
- **Docker внутри контейнера orchestrator НЕДОСТУПЕН.** Деплой идёт только через

View File

@@ -0,0 +1,82 @@
# BUGFIXES / CHANGES — 2026-06-03
## ORCH-6 — Multi-repo: фильтр проекта + маппинг repo per project
**Тип:** root-fix инцидента + новая возможность (multi-repo)
**Ветка:** `feature/ORCH-6-multirepo`
**Plane:** ORCH-6 (project `8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a`)
**Связанный инцидент:** [`INCIDENT_2026-06-02_webhook_autorun.txt`](./INCIDENT_2026-06-02_webhook_autorun.txt)
### Контекст инцидента
При создании задач ORCH-1..7 в Plane (проект `orchestrator`) Plane-webhook
(id `93f0c342-a614-4248-9d0f-c107276f5620`) сработал на каждую задачу и запустил
конвейер — но **всё ушло в репо `enduro-trails`**, потому что `plane.py:91`
хардкодил `repo = settings.default_repo`. Webhook слушал **весь workspace без
фильтра по проекту**, наплодив мусорные ET-010..016.
Митигация на время фикса: Plane-webhook **деактивирован** (`is_active=false`).
### Root cause
1. Нет фильтра по Plane-проекту — любая issue из любого проекта попадала в конвейер.
2. `repo` хардкожен на единственный `default_repo` (enduro-trails).
3. `work_item_prefix` всегда `ET` (db.py).
4. `plane_sync` ходил в единственный хардкоженный `PROJECT_ID` (enduro).
### Что сделано
| Файл | Изменение |
|------|-----------|
| `src/projects.py` (новый) | Реестр проектов: `ProjectConfig` + дефолт-список (enduro-trails + orchestrator) + резолверы `get_project_by_plane_id` / `get_project_by_repo` / `known_plane_project_ids`. Источник переопределения — `ORCH_PROJECTS_JSON`; устойчивый парсинг (битый JSON / битые записи → fallback на дефолт). |
| `src/config.py` | Добавлен `projects_json: str = ""` (env `ORCH_PROJECTS_JSON`). |
| `src/webhooks/plane.py` | **Фильтр по проекту**: `data.project` не в реестре → `{"status":"ignored","reason":"unknown project"}`. Резолв `repo`/`prefix`/Plane-проекта из реестра. Plane-sync для задачи идёт в её собственный проект. |
| `src/db.py` | `get_next_work_item_id(repo, prefix="ET")` — нумерация per (repo, prefix); `ORCH-001` независимо от `ET-010`. Дефолт `ET` сохранён для обратной совместимости. |
| `src/plane_sync.py` | `_resolve_project_id` + параметризация `project_id` (дефолт на enduro → обратная совместимость существующих вызовов). |
| `src/webhooks/gitea.py` | Неизвестный repo (`get_project_by_repo` → None) → `ignored` в 3 хэндлерах. |
### Тесты
- `tests/test_projects.py` (16 тестов): резолверы (by plane_id, by repo, unknown→None,
known_plane_project_ids), парсинг `ORCH_PROJECTS_JSON` (валидный / битый JSON / не массив /
битые записи → skip / all-bad → fallback), reload с кастомным JSON.
- `tests/test_plane_webhook.py` (4 теста, FastAPI TestClient, `launcher.launch` замокан):
unknown project → `ignored` + нет task/branch/agent; orchestrator-проект → `repo=orchestrator`,
`ORCH-*`; enduro-проект → `repo=enduro-trails`, `ET-*`; независимые префиксы (`ORCH-001`/`ORCH-002`
параллельно с `ET-001`).
**Прогон (в контейнере, образ `orchestrator-orchestrator`):** `57 passed`. 9 падений в
`tests/test_webhooks.py`**pre-existing** (webhook signature 401 / TypeError, не связаны с ORCH-6,
не трогались).
```bash
IMG=$(docker inspect orchestrator --format '{{.Config.Image}}')
docker run --rm -v /home/slin/repos/orchestrator:/code -w /code --entrypoint python3 $IMG -m pytest tests/ -q
```
### Проверка резолва (offline, в работающем контейнере)
```bash
docker exec orchestrator python3 -c "
from src.projects import get_project_by_plane_id, known_plane_project_ids
o = get_project_by_plane_id('8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a')
e = get_project_by_plane_id('7a79f0a9-5278-49cd-9007-9a338f238f9c')
assert o.repo=='orchestrator' and o.work_item_prefix=='ORCH'
assert e.repo=='enduro-trails' and e.work_item_prefix=='ET'
assert get_project_by_plane_id('00000000-0000-0000-0000-000000000000') is None
print('RESOLVE OK:', o.repo, e.repo, '| known:', len(known_plane_project_ids()))
"
```
### ⚠️ Важно
- Plane-webhook **остаётся выключенным** (`is_active=false`). Включение — отдельный
шаг Стрим после ревью PR.
- `ORCH_PROJECTS_JSON` (если задан) **полностью заменяет** дефолт — перечислять все нужные проекты.
- Обратная совместимость `plane_sync` сохранена (дефолт project_id = enduro), ET-задачи не сломаны.
### Re-enable webhook (после ревью, делает Стрим)
```sql
UPDATE webhooks SET is_active=true WHERE id='93f0c342-a614-4248-9d0f-c107276f5620';
```

View File

@@ -0,0 +1,7 @@
INCIDENT 2026-06-02: Plane webhook auto-triggered pipeline for ALL ORCH-1..7 tasks
- Plane webhook (id 93f0c342) fires on ANY issue creation in workspace, no project filter
- plane.py:91 hardcodes repo=settings.default_repo (enduro-trails)
- Result: ORCH-x tasks ran analyst/architect in WRONG repo (enduro-trails), created junk ET-010..016
- MITIGATION: Plane webhook DEACTIVATED (is_active=false) until ORCH-6 adds project filter
- ROOT FIX = ORCH-6 (multi-repo): filter by plane_project_id + repo mapping per project
- To re-enable webhook after ORCH-6: UPDATE webhooks SET is_active=true WHERE id=93f0c342...

127
docs/ORCH-1_JOB_QUEUE.md Normal file
View File

@@ -0,0 +1,127 @@
# ORCH-1 (F-2b): Persistent Job Queue
**Дата:** 2026-06-02
**Ветка:** `feature/ORCH-1-job-queue`
**Источник:** AUDIT_2026-06-02 (B-2 / F-2b)
## Проблема
Агенты запускались **in-process**: `launcher.launch()` синхронно спавнил
`subprocess.Popen` + 2 daemon-thread (`_watchdog`, `_monitor_agent`) прямо в
процессе uvicorn, из **8 webhook-точек**. Последствия:
- **Рестарт = катастрофа.** daemon-threads умирают, claude-процессы → сироты,
работа теряется (M-1 лишь помечал `exit=-1` и звал человека).
- **Нет лимита параллелизма** — N webhook'ов = N одновременных claude.
- **Нет ретраев** — упавший агент просто мёртв.
## Решение
Персистентная очередь задач (SQLite-таблица `jobs`) + фоновый воркер:
1. Webhook-хэндлер кладёт job (`enqueue_job`) → мгновенный ответ 200.
2. Фоновый воркер (`src/queue_worker.py`, отдельный daemon-thread) забирает
jobs с учётом `max_concurrency` (`claim_next_job`, атомарно) и спавнит агента
(`launcher.launch_job`, та же Popen-логика).
3. По завершении `_monitor_agent``_finalize_job`:
- `exit 0``done`;
- `exit != 0` & `attempts < max_attempts` → requeue (`queued`);
- `exit != 0` & `attempts >= max_attempts``failed` + Telegram.
## Что изменено
| Файл | Изменение |
|------|-----------|
| `src/db.py` | Таблица `jobs` + индекс; хелперы `enqueue_job`, `claim_next_job` (атомарный), `mark_job`, `count_running_jobs`, `requeue_running_jobs`, `get_job`, `job_status_counts`, `recent_jobs` |
| `src/config.py` | `max_concurrency` (env `ORCH_MAX_CONCURRENCY`, default 1), `queue_poll_interval` (env `ORCH_QUEUE_POLL_INTERVAL`, default 2.0) |
| `src/agents/launcher.py` | `launch()` → тонкая обёртка над `_spawn()`; новый `launch_job(job)`; `_spawn()` (общий, `job_id` опционально); monitor/watchdog принимают `job_id`; новый `_finalize_job()` (статусы + ретраи). 4 внутренних advance-вызова `self.launch``enqueue_job` |
| `src/webhooks/plane.py` | 4 точки `launcher.launch``enqueue_job` |
| `src/webhooks/gitea.py` | 4 точки `launcher.launch``enqueue_job` |
| `src/queue_worker.py` | **НОВЫЙ**`QueueWorker` (drain loop + max_concurrency + graceful stop) |
| `src/main.py` | lifespan: queue-recovery (`requeue_running_jobs`) после M-1, старт/останов воркера; новый `GET /queue` |
| `tests/test_queue.py` | **НОВЫЙ** — 19 тестов (lifecycle, атомарность claim, ретраи, requeue, observability, worker max_concurrency; Popen полностью замокан) |
## Атомарность claim
```sql
SELECT id FROM jobs WHERE status='queued' ORDER BY id LIMIT 1;
UPDATE jobs SET status='running', attempts=attempts+1, started_at=datetime('now')
WHERE id=? AND status='queued'; -- rowcount==1 => claimed, ==0 => проиграл гонку
```
Гарантия: один job не выдаётся дважды даже при параллельных тиках воркера
(проверено `test_concurrent_claims_no_duplicate` — 8 потоков, 20 jobs).
## Сохранённые фиксы (НЕ сломаны)
- **B-1** task-file write (direct `open()` в worktree) — без изменений.
- **B-2** Popen → log_fh (no PIPE), monitor reap — без изменений, только обёрнут.
- **M-1** orphan-recovery в `main.py` — оставлен, queue-recovery добавлен ПОСЛЕ него.
- **ORCH-2** worktree per task — без изменений.
- **ORCH-6** project registry/filter — без изменений.
## Acceptance
| # | Проверка | Статус |
|---|----------|--------|
| 1 | webhook кладёт job (queued) | ✅ enqueue_job |
| 2 | воркер исполняет queued→running→done | ✅ worker + _finalize_job |
| 3 | running ≤ max_concurrency | ✅ test_worker_respects_max_concurrency |
| 4 | ретрай fail→queued→failed+notify | ✅ test_finalize_job_requeue_then_fail |
| 5 | рестарт-safe (running→requeue) | ✅ requeue_running_jobs + lifespan |
| 6 | M-1 не сломан | ✅ оставлен в lifespan |
| 7 | тесты (new green, 9 pre-existing) | ✅ 76 passed / 9 pre-existing |
| 8 | `/queue` | ✅ counts + recent |
## Тесты
```bash
IMG=$(docker inspect orchestrator --format '{{.Config.Image}}')
docker run --rm -v /home/slin/repos/orchestrator:/code -w /code \
--entrypoint python3 $IMG -m pytest tests/ -q
# 110 passed, 9 failed (pre-existing test_webhooks 401/signature/TypeError)
```
---
## Resilience-слой (ДОПОЛНЕНИЕ: preflight + 429 + backoff + circuit breaker)
Надёжность очереди против недоступности CLI и rate-limit. Два РАЗНЫХ класса
проблем лечатся по-разному.
### A. Дешёвый preflight (`src/preflight.py`) — не жжёт токены
Перед claim воркер проверяет: `os.path.exists(CLAUDE_BIN)` + `claude --version`
(timeout 5с, токены НЕ тратит). Результат кэшируется `preflight_cache_ttl` (45с).
FAIL → воркер НЕ claimит (job остаётся `queued`), ждёт. 🚫 НЕТ prompt-ping.
### B. 429 — детект НА ВЫХОДЕ (`src/error_classifier.py`)
rate-limit нельзя предсказать — классифицируем по логу прогона. `classify_log_file`
читает хвост лога (16KB), ищет `429/rate limit/overloaded/quota/503/529/timeout/...`
`transient` или `permanent`. Извлекает `Retry-After`.
- **transient** (429/сеть) → backoff-ретрай с ОТДЕЛЬНЫМ `transient_attempts`
(лимит `transient_max_attempts=5`) — не жжёт code-fault бюджет.
- **permanent** (code-fault) → обычные `attempts < max_attempts` (2), потом `failed`.
### C. Backoff + `available_at`
Колонки `jobs.available_at TEXT` + `jobs.transient_attempts INTEGER` (миграция
`_ensure_column`). `claim_next_job`: `WHERE status='queued' AND (available_at IS NULL
OR available_at <= datetime('now'))`. При transient: `available_at = now +
min(2^n * base, max)` (base=10с, max=600с), `Retry-After` уважается (берёмся max).
### D. Circuit breaker (`CircuitBreaker` в queue_worker)
N=3 transient подряд → **open**: воркер паузит `breaker_pause_seconds=300`, ВООБЩЕ
не дёргает CLI, Telegram-алерт. Через паузу → **half-open** (пробует 1 job);
ожил (exit 0) → **closed**; снова transient → опять open. Состояние в памяти
воркера, отражается в `/queue.resilience`.
Связь launcher→breaker — через callback `launcher.on_outcome` (без import-цикла).
### Конфиг (config.py)
`preflight_cache_ttl=45`, `backoff_base_seconds=10`, `backoff_max_seconds=600`,
`transient_max_attempts=5`, `breaker_threshold=3`, `breaker_pause_seconds=300`.
### Тесты
`tests/test_resilience.py` — 34 теста: preflight (FAIL→queued, кэш, force),
классификатор (transient/permanent/Retry-After), backoff (рост/cap/Retry-After,
`available_at` гейтинг), launcher transient/permanent finalize, breaker
(open/half-open/closed/re-open, блок claim).

132
docs/PRODUCT_VISION.md Normal file
View File

@@ -0,0 +1,132 @@
# Product Vision — Автономная фабрика разработки (Orchestrator)
> Мультиагентная платформа, которая превращает идею или баг в задеплоенный на прод результат — автономно, надёжно и дёшево.
**Версия:** 1.0 · **Дата:** 2026-06-04 · **Статус:** концепция развития
---
## 1. Зачем это (бизнес-взгляд)
### Проблема
Классическая разработка — это люди-бутылочное-горлышко на каждом шаге: аналитик, архитектор, разработчик, ревьюер, тестировщик, деплой-инженер. Каждая передача задачи между ними — потеря времени, контекста и денег. Мелкая фича или баг едут днями.
### Решение
**Orchestrator** — это конвейер из ИИ-агентов, который проводит задачу через все стадии разработки сам: от бизнес-постановки до релиза на прод. Человек ставит задачу и принимает результат. Всё между — автономно.
### Ценность
-**Скорость:** фича проходит полный цикл (анализ → архитектура → код → ревью → тесты → деплой) за ~35 минут без ручных вмешательств.
- 💰 **Стоимость:** работа агентов в разы дешевле команды; адаптивный выбор моделей экономит на простых задачах.
- 🎯 **Автономность:** 0 ручных пинков в штатном прогоне. Человек — постановщик и приёмщик, а не оператор.
- 🛡️ **Надёжность:** многоуровневые гейты качества не пускают недоделку на прод.
- 🔁 **Масштаб:** одна платформа ведёт несколько проектов; саму платформу можно тиражировать на новые хосты.
---
## 2. Как это работает (обзор)
### Конвейер
```
created → analysis → architecture → development → review → testing → deploy → done
```
На каждом переходе стоит **quality gate** — автоматическая проверка, которая не пускает задачу дальше, пока стадия не выполнена честно:
| Переход | Гейт | Что проверяет |
|---|---|---|
| analysis → architecture | check_analysis_approved | BRD/TRZ/AC готовы + апрув человека |
| architecture → development | check_architecture_done | Архитектура/ADR зафиксированы |
| development → review | check_ci_green | CI зелёный (тесты проходят) |
| review → testing | check_reviewer_verdict | Машинный вердикт ревьюера: APPROVED |
| testing → deploy | check_tests_passed | Машинный вердикт тестера (не подделать) |
| deploy → done | check_deploy_status | Деплой реально успешен, лог в origin/main |
### Агенты
- **Analyst** — собирает бизнес-требования, пишет BRD/TRZ/критерии приёмки.
- **Architect** — проектирует решение, фиксирует ADR.
- **Developer** — пишет код в изолированном git-worktree.
- **Reviewer** — ревьюит, выносит машинный вердикт.
- **Tester** — прогоняет тесты, фиксирует результат в отчёте.
- **Deployer** — мержит, тегирует, деплоит на прод, пишет deploy-log.
### Объекты
- **Project** — проект в реестре (Plane project ↔ git-репозиторий ↔ префикс задач).
- **Work-Item** — задача, проходящая конвейер; на каждой стадии накапливает артефакты (00-business-request … 14-deploy-log).
- **Job** — единица работы в очереди (atomic claim, ретраи, restart-safe).
### Интеграции
- **Plane** — управление задачами, статусы как триггеры конвейера, webhooks.
- **Gitea** — репозитории, PR, защита main (pre-receive hook).
- **Telegram** — живой трекер прогресса, апрувы, уведомления.
- **LLM** — модели агентов (сейчас Claude, в планах мультипровайдерность).
---
## 3. Что уже сделано (фундамент)
**Автономный конвейер** — подтверждён живым прогоном: задача от issue до Done без ручных вмешательств (~35 мин).
**Очередь задач** — atomic claim, max_concurrency, ретраи, restart-safe.
**Изоляция через git-worktree** — каждая задача в своём дереве, без конфликтов в shared-репо.
**Машинные гейты качества** — вердикты читаются из структурированных артефактов, а не угадываются по тексту.
**Multi-repo** — платформа ведёт несколько проектов (enduro-trails, сам orchestrator).
**Идемпотентность webhooks** — дедуп по delivery-id, защита от дублей.
**Наблюдаемость** — учёт токенов и стоимости каждой задачи.
**Живой Telegram-трекер** — прогресс редактируется в одном сообщении, без спама.
---
## 4. Куда движемся (дорожная карта)
Развитие сгруппировано в 5 стратегических направлений.
### 🛡️ Надёжность и безопасность
- **Post-deploy мониторинг + авто-rollback** — следить за продом после релиза, откатывать при деградации.
- **Security-гейт** — secret-scanning + аудит зависимостей перед мержем.
- **Бюджетный circuit-breaker** — хард-лимит стоимости на задачу, защита от «убегающих» расходов.
- **Опциональная human-приёмка** — финальный взгляд человека для критичных фич.
### 💰 Экономика и интеллект
- **Мультипровайдерность LLM** — Claude, OpenRouter, другие провайдеры на выбор.
- **Оценка задачи** — прогноз стоимости/времени до старта.
- **Адаптивный выбор модели** — по сложности: тривиальное на дешёвой, сложное на сильной.
- **Багфикс-трек** — упрощённый дешёвый путь для багов (без потери качества).
### 🏗️ Платформа и масштаб
- **Self-hosting** — оркестратор пилит сам себя через собственный конвейер.
- **Саморазвитие** — петля уроков: ловить отклонения → фиксировать → предлагать улучшения.
- **Онбординг проектов** — turnkey-заведение нового проекта в систему.
- **Тиражирование** — развернуть платформу на новой инфраструктуре под ключ.
### 💬 Взаимодействие с человеком
- **UX/UI дизайнер** — макеты интерфейсов на этапе аналитики.
- **Интерактивный аналитик** — живой диалог для уточнения требований и обсуждения макетов.
- **Единые коммент-артефакты** — все агенты прикладывают результаты с кликабельными ссылками.
- **Прямые ссылки в Telegram** — апрув в один клик, без блужданий.
### 🧩 Расширение возможностей
- **Тяжёлые расчёты данных** — опциональная стадия для миграций/обработки больших данных.
- **Android-разработка** — мобильный стек через тот же конвейер.
- **Декомпозиция эпиков** — большая фича → подзадачи → сборка.
- **Управление зависимостями** — задача B ждёт задачу A.
- **Code coverage gate** — защита покрытия тестами от деградации.
- **База знаний проекта** — персистентный контекст для агентов.
---
## 5. Принципы (что для нас неизменно)
1. **Автономность по умолчанию, человек — на ключевых развилках.** Машина делает, человек ставит и принимает.
2. **Качество не приносится в жертву скорости/цене.** Удешевляем аналитику — гейты качества остаются. Урок дорого выученный: срезанная проверка = недоделка на проде.
3. **Машинные вердикты, а не угадывание.** Гейты читают структурированные поля, а не ищут слова в тексте.
4. **Самоизменение — только через PR + ревью + апрув.** Агент, меняющий агентов, всегда под контролем человека.
5. **Документация — сразу, не потом.** Изменил функционал → обновил доки.
6. **Прод — источник правды.** «Деплой прошёл» ≠ «работает». Проверяем реальный результат.
---
## 6. Видение в одну фразу
> **Самодостаточная фабрика разработки, которая размножается, учится на ошибках, оценивает себя, бережёт бюджет и не ломает прод — превращая намерение человека в работающий продукт почти без его участия.**
---
*Документ поддерживается в репозитории orchestrator. Источник дорожной карты — задачи проекта ORCH в Plane (ORCH-7…ORCH-28).*

BIN
docs/PRODUCT_VISION.pptx Normal file

Binary file not shown.

View File

@@ -1,10 +1,12 @@
import subprocess
import os
import json
import logging
import threading
import signal
import time
from ..config import settings
from ..db import get_db, get_task_by_repo_branch, update_task_stage
from ..db import get_db, get_task_by_repo_branch, update_task_stage, enqueue_job
from ..stages import get_next_stage, get_qg_for_stage, get_agent_for_stage
from ..git_worktree import ensure_worktree, get_worktree_path
from ..qg.checks import QG_CHECKS
@@ -14,6 +16,62 @@ from ..plane_sync import notify_stage_change as plane_notify_stage, add_comment
logger = logging.getLogger("orchestrator.launcher")
def prune_run_logs(runs_dir, keep_days=30, keep_max=500, active_paths=None):
"""L-2: best-effort rotation of per-run logs (<runs_dir>/*.log).
A log file is removed if it is older than keep_days OR it is not within the
keep_max most-recent logs (whichever condition is met first). Only *.log
files directly inside runs_dir are considered; non-.log files and
subdirectories are never touched. Files whose path is in active_paths (the
currently running log) are always kept.
Returns the number of files removed. Never raises: any error is logged and
swallowed so log rotation can never bring the app down.
"""
removed = 0
try:
active = set()
for ap in (active_paths or []):
try:
active.add(os.path.realpath(ap))
except Exception:
active.add(ap)
if not os.path.isdir(runs_dir):
return 0
logs = []
for name in os.listdir(runs_dir):
if not name.endswith(".log"):
continue
path = os.path.join(runs_dir, name)
if not os.path.isfile(path):
continue
if os.path.realpath(path) in active:
continue
try:
mtime = os.path.getmtime(path)
except OSError:
continue
logs.append((path, mtime))
logs.sort(key=lambda t: t[1], reverse=True)
cutoff = time.time() - keep_days * 86400
for idx, (path, mtime) in enumerate(logs):
too_old = mtime < cutoff
over_max = idx >= keep_max
if too_old or over_max:
try:
os.remove(path)
removed += 1
except OSError as e:
logger.warning(f"prune_run_logs: failed to remove {path}: {e}")
except Exception as e:
logger.warning(f"prune_run_logs failed for {runs_dir}: {e}")
return removed
class AgentLauncher:
"""Launch Claude CLI agents directly (binary mounted into container)."""
@@ -53,11 +111,17 @@ class AgentLauncher:
}
CLAUDE_BIN = "/opt/claude-code/bin/claude.exe"
AGENT_TIMEOUT = 1800 # 30 minutes
# ORCH-7 (M-2): timeout is now configurable. AGENT_TIMEOUT stays as a
# backward-compatible alias for the default; the actual value (and per-agent
# overrides) live in settings and are resolved via _resolve_timeout().
AGENT_TIMEOUT = settings.agent_timeout_seconds
def launch(self, agent: str, repo: str, task_content: str = None, task_id: int = None) -> int:
"""
Launch a Claude CLI agent.
Launch a Claude CLI agent directly (legacy synchronous path).
Kept for backward compatibility (direct callers / existing tests). The
ORCH-1 job queue uses launch_job() instead, but both share _spawn().
Args:
agent: Agent role (analyst, architect, developer, reviewer, tester)
@@ -68,6 +132,31 @@ class AgentLauncher:
Returns:
agent_run_id from DB
"""
return self._spawn(agent, repo, task_content, task_id, job_id=None)
def launch_job(self, job: dict) -> int:
"""ORCH-1: launch an agent for a claimed queue job.
Same spawn path as launch(), but threads job['id'] through so the monitor
can update the job's status (done / requeue / failed) and link jobs.run_id
to the agent_runs row. Returns the agent_run_id.
"""
return self._spawn(
job["agent"],
job["repo"],
job.get("task_content"),
job.get("task_id"),
job_id=job["id"],
)
def _spawn(self, agent: str, repo: str, task_content: str = None,
task_id: int = None, job_id: int = None) -> int:
"""Shared spawn implementation for launch() and launch_job().
When job_id is set, the monitor/watchdog drive the jobs table status
(ORCH-1). The claude-CLI Popen logic (B-2) and worktree/task-file logic
(B-1 / ORCH-2) are unchanged.
"""
config = self.AGENT_CONFIGS.get(agent)
if not config:
raise ValueError(f"Unknown agent: {agent}")
@@ -98,6 +187,14 @@ class AgentLauncher:
run_id = cursor.lastrowid
conn.commit()
# ORCH-1: link this job to the agent_runs row and stamp started_at.
if job_id is not None:
conn.execute(
"UPDATE jobs SET run_id = ?, started_at = datetime('now') WHERE id = ?",
(run_id, job_id),
)
conn.commit()
# Prepare output log path
output_path = f"/app/data/runs/{run_id}.log"
os.makedirs(os.path.dirname(output_path), exist_ok=True)
@@ -112,9 +209,15 @@ class AgentLauncher:
# No git fetch/checkout here: ensure_worktree() already put the worktree on
# the right branch. The agent simply runs inside its isolated work_path.
# Feature 4 (token usage): --output-format json makes claude emit a single
# result JSON (with usage + total_cost_usd) at the end of stdout. The log
# still captures it; _monitor_agent parses the trailing JSON after the run
# to record per-agent tokens/cost. _monitor_agent's failure handling keys
# off the process exit_code (not stdout shape), so this is safe.
cmd = (
f'cd {work_path} && '
f'{self.CLAUDE_BIN} --print '
f'--output-format json '
f'{model_flag}'
f'"$(cat {task_file})" '
f'--system-prompt "$(cat {system_prompt})" '
@@ -154,6 +257,7 @@ class AgentLauncher:
t = threading.Thread(
target=self._watchdog,
args=(proc.pid, run_id),
kwargs={"job_id": job_id, "agent": agent},
daemon=True,
)
t.start()
@@ -163,6 +267,7 @@ class AgentLauncher:
m = threading.Thread(
target=self._monitor_agent,
args=(proc, run_id, agent, repo, agent_branch, output_path, log_fh),
kwargs={"job_id": job_id},
daemon=True,
)
m.start()
@@ -171,26 +276,102 @@ class AgentLauncher:
notify_agent_started(run_id, agent, task_id)
return run_id
def _watchdog(self, pid: int, run_id: int, timeout: int = None):
"""Kill agent if it exceeds timeout."""
import time
@staticmethod
def _resolve_timeout(agent: str = None) -> int:
"""ORCH-7 (M-2): resolve the wall-clock timeout for an agent.
Per-agent override from settings.agent_timeout_overrides_json (a JSON object
like {"reviewer": 3600}) wins; otherwise the global default
settings.agent_timeout_seconds is used. A malformed override JSON is ignored
(falls back to the default) and only logged, so a bad env never bricks runs.
"""
default = settings.agent_timeout_seconds
raw = (settings.agent_timeout_overrides_json or "").strip()
if agent and raw:
try:
overrides = json.loads(raw)
if isinstance(overrides, dict) and agent in overrides:
return int(overrides[agent])
except (ValueError, TypeError) as e:
logger.warning(f"Invalid agent_timeout_overrides_json, using default: {e}")
return default
def _watchdog(self, pid: int, run_id: int, timeout: int = None,
job_id: int = None, agent: str = None):
"""Kill agent if it exceeds its timeout.
ORCH-1: on a timeout-kill the monitor's proc.wait() returns the kill exit
code and drives the job retry/fail logic, so the watchdog itself only needs
to terminate the process and record the agent_runs exit. job_id is accepted
for symmetry.
ORCH-7 (M-2): graceful shutdown. Instead of an immediate SIGKILL (which cuts
claude off mid-write and leaves half-written artifacts), send SIGTERM first,
give the process up to settings.agent_kill_grace_seconds to flush and exit on
its own, and only SIGKILL if it is still alive after the grace window. If the
process exits during the grace window, SIGKILL is NOT sent.
ProcessLookupError is tolerated at every step (the process may already be
gone). The recorded exit_code stays -9 to match the existing retry/fail
contract regardless of which signal actually reaped it.
"""
if timeout is None:
timeout = self.AGENT_TIMEOUT
timeout = self._resolve_timeout(agent)
time.sleep(timeout)
# Phase 1: SIGTERM (graceful). If the process is already gone, we're done.
try:
os.kill(pid, signal.SIGTERM)
logger.warning(
f"Agent run_id={run_id} exceeded {timeout}s timeout: sent SIGTERM "
f"(pid={pid}), grace={settings.agent_kill_grace_seconds}s"
)
except ProcessLookupError:
logger.info(f"Agent run_id={run_id} already exited before SIGTERM")
return # nothing to record: the monitor's proc.wait() owns the exit
# Phase 2: poll for graceful exit within the grace window.
grace = settings.agent_kill_grace_seconds
poll_interval = 0.5
waited = 0.0
while waited < grace:
time.sleep(poll_interval)
waited += poll_interval
try:
os.kill(pid, 0) # signal 0 = liveness probe, does not kill
except ProcessLookupError:
logger.info(
f"Agent run_id={run_id} exited gracefully after SIGTERM "
f"({waited:.1f}s); no SIGKILL needed"
)
self._record_kill(run_id)
return
# Phase 3: still alive -> hard SIGKILL.
try:
os.kill(pid, signal.SIGKILL)
logger.warning(f"Agent run_id={run_id} killed after {timeout}s timeout")
conn = get_db()
conn.execute(
"UPDATE agent_runs SET finished_at=datetime('now'), exit_code=-9 WHERE id=?",
(run_id,),
logger.warning(
f"Agent run_id={run_id} did not exit within {grace}s grace: sent SIGKILL"
)
conn.commit()
conn.close()
except ProcessLookupError:
pass # Already finished
logger.info(f"Agent run_id={run_id} exited just before SIGKILL")
self._record_kill(run_id)
def _monitor_agent(self, proc, run_id, agent, repo, branch, output_path=None, log_fh=None):
@staticmethod
def _record_kill(run_id: int):
"""Stamp the agent_runs row as timeout-killed (exit_code=-9).
ORCH-1: -9 is the existing kill-exit contract the monitor/retry logic keys
off, so we keep it stable whether the reap came from SIGTERM or SIGKILL.
"""
conn = get_db()
conn.execute(
"UPDATE agent_runs SET finished_at=datetime('now'), exit_code=-9 WHERE id=?",
(run_id,),
)
conn.commit()
conn.close()
def _monitor_agent(self, proc, run_id, agent, repo, branch, output_path=None, log_fh=None, job_id=None):
"""Wait for agent to finish, commit+push results, update DB.
B-2 fix: stdout already goes straight to the log file via Popen, so we just
@@ -225,6 +406,17 @@ class AgentLauncher:
notify_agent_finished(run_id, agent, exit_code, task_id=_task_id, duration_s=_duration_s)
# Feature 4: parse token usage / cost from the (json) run log and record
# it on the agent_runs row. Never fatal — a garbled/missing JSON records
# NULLs and logs a warning so a broken run can't crash the monitor.
try:
from ..usage import parse_usage_from_log, record_usage
_usage = parse_usage_from_log(output_path) if output_path else None
record_usage(run_id, _usage)
except Exception as e:
logger.warning(f"run_id={run_id}: usage accounting failed: {e}")
_usage = None
# Commit and push any changes — in the per-branch worktree (ORCH-2 / S-4),
# NOT in the shared /repos/<repo>. The worktree is already on `branch`
# (ensure_worktree did the checkout), so no checkout is needed here.
@@ -296,7 +488,8 @@ class AgentLauncher:
set_issue_blocked(_wid)
plane_add_comment(
_wid,
"\u274c Deploy FAILED (smoke/healthcheck). Rolled back. Developer \u043d\u0443\u0436\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430."
"\u274c Deploy FAILED (smoke/healthcheck). Rolled back. Developer \u043d\u0443\u0436\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430.",
author="deployer",
)
from ..notifications import send_telegram
send_telegram(f"\U0001f6a8 {_wid}: Deploy failed! Rolled back. Needs fix.")
@@ -314,12 +507,154 @@ class AgentLauncher:
from ..notifications import send_telegram
send_telegram(f"\u26a0\ufe0f {_wid}: Agent {agent} failed (exit_code={exit_code}). Check logs: /app/data/runs/{run_id}.log")
# Feature 4: post the per-agent usage comment under that agent's bot, and
# — for the deployer finishing the task — the per-task usage summary.
if exit_code == 0:
try:
self._post_usage_comments(run_id, agent, repo, branch, _usage)
except Exception as e:
logger.warning(f"run_id={run_id}: usage comment failed: {e}")
# Auto-advance stage if agent finished successfully and QG passes
if exit_code == 0:
self._try_advance_stage(run_id, agent, repo, branch)
# ORCH-1: drive the job-queue status for queue-launched jobs only.
# (Legacy direct launch() has job_id=None and is unaffected.)
if job_id is not None:
self._finalize_job(job_id, agent, run_id, exit_code, output_path=output_path)
def _backoff_seconds(self, transient_attempts: int, retry_after: int = None) -> int:
"""Exponential backoff for transient failures, honouring Retry-After.
backoff = min(2^transient_attempts * base, max). If the server sent a
Retry-After, use the larger of the two (never poll sooner than asked).
"""
base = settings.backoff_base_seconds
cap = settings.backoff_max_seconds
backoff = min((2 ** max(transient_attempts, 0)) * base, cap)
if retry_after is not None and retry_after > 0:
backoff = max(backoff, min(retry_after, cap))
return int(backoff)
def _finalize_job(self, job_id: int, agent: str, run_id: int, exit_code, output_path=None):
"""ORCH-1: update the jobs row after the agent process finished.
exit_code == 0 -> done (and resets the breaker streak via on_outcome).
exit_code != 0 -> classify the failure from the run log tail (token-free):
- TRANSIENT (429/overload/network): backoff-requeue with available_at in
the future + a SEPARATE transient_attempts budget
(settings.transient_max_attempts), honouring Retry-After. Reported to
the breaker so it opens after N consecutive transient failures.
- PERMANENT (code fault): ordinary attempts < max_attempts requeue,
otherwise 'failed' + Telegram.
"""
from ..db import get_job, mark_job
from ..error_classifier import classify_log_file
try:
job = get_job(job_id)
if not job:
return
if exit_code == 0:
mark_job(job_id, "done", run_id=run_id)
logger.info(f"Job {job_id} ({agent}) done (run_id={run_id})")
self._record_outcome(transient=False, recovered=True)
return
# Classify the failure from the agent log tail (no token cost).
kind, retry_after = "permanent", None
log_path = output_path or f"/app/data/runs/{run_id}.log"
try:
kind, retry_after = classify_log_file(log_path)
except Exception:
pass
if kind == "transient":
self._finalize_transient(job_id, agent, run_id, exit_code, job, retry_after)
else:
self._finalize_permanent(job_id, agent, run_id, exit_code, job)
except Exception as e:
logger.error(f"Job {job_id}: _finalize_job error: {e}")
def _finalize_transient(self, job_id, agent, run_id, exit_code, job, retry_after):
"""Transient (429/overload/net) failure -> backoff requeue or fail when budget out."""
from ..db import mark_job, mark_job_transient
tattempts = job.get("transient_attempts", 0)
tmax = settings.transient_max_attempts
err = (f"transient (429/overload) agent {agent} exit={exit_code} "
f"(run_id={run_id}); retry_after={retry_after}")
self._record_outcome(transient=True, recovered=False)
if tattempts < tmax:
backoff = self._backoff_seconds(tattempts + 1, retry_after)
mark_job_transient(job_id, backoff, error=err)
logger.warning(
f"Job {job_id} ({agent}) TRANSIENT fail (exit={exit_code}), "
f"backoff {backoff}s, transient_attempt {tattempts + 1}/{tmax}"
)
else:
mark_job(job_id, "failed", run_id=run_id, error=err)
logger.error(
f"Job {job_id} ({agent}) failed after {tattempts} transient attempts"
)
self._notify_failed(job_id, agent, job, run_id,
f"transient (rate-limit) after {tattempts} attempts")
def _finalize_permanent(self, job_id, agent, run_id, exit_code, job):
"""Permanent (code-fault) failure -> normal attempts<max requeue, then fail."""
from ..db import mark_job
attempts = job.get("attempts", 0)
max_attempts = job.get("max_attempts", 2)
err = f"agent {agent} exit_code={exit_code} (run_id={run_id})"
self._record_outcome(transient=False, recovered=False)
if attempts < max_attempts:
mark_job(job_id, "queued", run_id=run_id, error=err)
logger.warning(
f"Job {job_id} ({agent}) failed (exit={exit_code}), "
f"requeued (attempt {attempts}/{max_attempts})"
)
else:
mark_job(job_id, "failed", run_id=run_id, error=err)
logger.error(
f"Job {job_id} ({agent}) failed permanently after "
f"{attempts} attempts (exit={exit_code})"
)
self._notify_failed(job_id, agent, job, run_id,
f"{attempts} attempts (exit={exit_code})")
def _notify_failed(self, job_id, agent, job, run_id, why):
try:
from ..notifications import send_telegram
send_telegram(
f"\U0001f6a8 Job {job_id} ({agent}, repo {job.get('repo')}) "
f"failed: {why}. Logs: /app/data/runs/{run_id}.log"
)
except Exception:
pass
def _record_outcome(self, transient: bool, recovered: bool):
"""Forward the run outcome to the circuit breaker (if a worker is wired).
Decoupled via a settable callback (set by QueueWorker.start) so the launcher
does not hard-import the worker (avoids a cycle) and tests can run the
launcher standalone.
"""
cb = getattr(self, "on_outcome", None)
if cb:
try:
cb(transient=transient, recovered=recovered)
except Exception:
pass
def _try_advance_stage(self, run_id: int, agent: str, repo: str, branch: str):
"""After agent finishes successfully, check QG and advance stage if possible."""
"""After agent finishes successfully, advance the stage via the unified engine.
ORCH-4 / M-3: the 174-line body that used to live here moved into
src/stage_engine.advance_stage(). This is now a thin wrapper: it looks up
the task by (repo, branch) and delegates. `agent` is forwarded as
finished_agent so the analyst/reviewer/tester/architect rollback branches
still trigger exactly as before. The agent-selection bug (it used to call
get_agent_for_stage(next_stage)) is fixed inside the engine.
"""
try:
conn = get_db()
task_row = conn.execute(
@@ -331,178 +666,82 @@ class AgentLauncher:
return
task_id, current_stage, work_item_id = task_row
qg_name = get_qg_for_stage(current_stage)
next_stage = get_next_stage(current_stage)
if not next_stage:
return
# Run QG check if defined
if qg_name and qg_name in QG_CHECKS:
check_fn = QG_CHECKS[qg_name]
if qg_name in ("check_analysis_approved",):
# Requires human approval - post request comment if analyst just finished
if agent == "analyst" and qg_name == "check_analysis_approved" and work_item_id:
files_check = QG_CHECKS.get("check_analysis_complete")
if files_check:
files_ok, _ = files_check(repo, work_item_id, branch)
if files_ok:
# Full artifacts ready -> In Review
from ..plane_sync import set_issue_in_review
set_issue_in_review(work_item_id)
plane_add_comment(
work_item_id,
"\U0001f4cb BRD/\u0422\u0417/AC/TestPlan \u0433\u043e\u0442\u043e\u0432\u044b. "
"\u041f\u0440\u043e\u0448\u0443 review \u0438 \u0440\u0435\u0430\u043a\u0446\u0438\u044e :approved: \u0434\u043b\u044f \u043f\u0440\u043e\u0434\u0432\u0438\u0436\u0435\u043d\u0438\u044f \u0432 Architecture."
)
notify_approve_requested(task_id)
logger.info(f"Task {task_id}: analyst finished, requested :approved: in Plane")
else:
# Check if questions file exists (in the task worktree)
import os as _os
questions_path = _os.path.join(
get_worktree_path(repo, branch),
f"docs/work-items/{work_item_id}/01-questions.md"
)
if _os.path.isfile(questions_path):
# Analyst has questions -> Needs Input
from ..plane_sync import set_issue_needs_input
set_issue_needs_input(work_item_id)
with open(questions_path, "r") as qf:
questions_text = qf.read()
plane_add_comment(
work_item_id,
f"\u2753 Analyst \u043d\u0443\u0436\u0434\u0430\u0435\u0442\u0441\u044f \u0432 \u0443\u0442\u043e\u0447\u043d\u0435\u043d\u0438\u0438:\n\n{questions_text}"
)
from ..notifications import send_telegram
send_telegram(
f"\u2753 {work_item_id}: Analyst \u0437\u0430\u0434\u0430\u0451\u0442 \u0432\u043e\u043f\u0440\u043e\u0441\u044b. \u041e\u0442\u0432\u0435\u0442\u044c \u0432 Plane."
)
else:
# No artifacts and no questions
plane_add_comment(
work_item_id,
"\u26a0\ufe0f Analyst \u0437\u0430\u0432\u0435\u0440\u0448\u0438\u043b\u0441\u044f \u0431\u0435\u0437 \u0430\u0440\u0442\u0435\u0444\u0430\u043a\u0442\u043e\u0432 \u0438 \u0431\u0435\u0437 \u0432\u043e\u043f\u0440\u043e\u0441\u043e\u0432. \u041f\u0440\u043e\u0432\u0435\u0440\u044c\u0442\u0435 \u043b\u043e\u0433."
)
return
elif qg_name in ("check_ci_green", "check_tests_local"):
# (repo, branch) signature — already worktree-aware.
passed, reason = check_fn(repo, branch)
elif qg_name == "check_tests_passed":
# Artifact check — pass branch so it reads from the worktree.
passed, reason = check_fn(repo, work_item_id or "", branch)
else:
# Other artifact checks (check_architecture_done, etc.) — worktree-aware.
passed, reason = check_fn(repo, work_item_id or "", branch)
if not passed:
logger.info(f"Task {task_id}: QG '{qg_name}' not passed after {agent}: {reason}")
# If reviewer says REQUEST_CHANGES, rollback to development
if agent == "reviewer" and "REQUEST_CHANGES" in reason:
update_task_stage(task_id, "development")
notify_stage_change(task_id, current_stage, "development")
plane_notify_stage(work_item_id, current_stage, "development")
# Count retries
conn2 = get_db()
retry_count = conn2.execute(
"SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='developer'",
(task_id,)
).fetchone()[0]
conn2.close()
if retry_count < 3:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: development\nNote: REQUEST_CHANGES from reviewer "
f"(attempt {retry_count+1}/3). Fix findings in "
f"docs/work-items/{work_item_id}/12-review.md"
)
new_run = self.launch("developer", repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: reviewer REQUEST_CHANGES, relaunched developer (run_id={new_run})")
else:
from ..notifications import send_telegram
send_telegram(f"\u26a0\ufe0f {work_item_id}: Max developer retries (3) reached. Manual intervention needed.")
logger.error(f"Task {task_id}: max retries reached")
# Task 6: Tester FAIL -> rollback to development
if agent == "tester" and qg_name == "check_tests_passed" and not passed:
update_task_stage(task_id, "development")
notify_stage_change(task_id, current_stage, "development")
plane_notify_stage(work_item_id, current_stage, "development")
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
plane_add_comment(
work_item_id,
f"\u274c \u0422\u0435\u0441\u0442\u044b \u043d\u0435 \u043f\u0440\u043e\u0448\u043b\u0438: {reason}. Developer \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430."
)
conn2 = get_db()
retry_count = conn2.execute(
"SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='developer'",
(task_id,)
).fetchone()[0]
conn2.close()
if retry_count < 3:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: development\nNote: Tests FAILED. "
f"Fix failures described in docs/work-items/{work_item_id}/13-test-report.md"
)
new_run = self.launch("developer", repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: tester FAIL, relaunched developer (run_id={new_run})")
else:
from ..notifications import send_telegram
from ..plane_sync import set_issue_blocked
set_issue_blocked(work_item_id)
send_telegram(f"\U0001f6a8 {work_item_id}: Tests still failing after 3 developer retries. Manual intervention needed.")
# Task 8: Architect conflict -> rollback to analysis
if agent == "architect" and qg_name == "check_architecture_done" and not passed:
import os as _os
conflict_path = _os.path.join(
get_worktree_path(repo, branch),
f"docs/work-items/{work_item_id}/10-conflict.md"
)
if _os.path.isfile(conflict_path):
update_task_stage(task_id, "analysis")
notify_stage_change(task_id, current_stage, "analysis")
plane_notify_stage(work_item_id, current_stage, "analysis")
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
with open(conflict_path, "r") as cf:
conflict_text = cf.read()[:500]
plane_add_comment(
work_item_id,
f"\u26a0\ufe0f Architect \u043d\u0430\u0448\u0451\u043b \u043a\u043e\u043d\u0444\u043b\u0438\u043a\u0442 \u0441 \u0422\u0417. \u0412\u043e\u0437\u0432\u0440\u0430\u0442 \u0432 Analysis.\n\n{conflict_text}"
)
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nNote: Architect conflict. Revise TRZ. "
f"See docs/work-items/{work_item_id}/10-conflict.md"
)
new_run = self.launch("analyst", repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: architect conflict, relaunched analyst")
return
return
elif qg_name:
return
# Advance stage
update_task_stage(task_id, next_stage)
notify_stage_change(task_id, current_stage, next_stage)
plane_notify_stage(work_item_id, current_stage, next_stage)
logger.info(f"Task {task_id}: {current_stage} -> {next_stage} (auto-advance after {agent})")
# Launch next agent if defined
next_agent = get_agent_for_stage(next_stage)
if next_agent:
task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: {next_stage}"
new_run_id = self.launch(next_agent, repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: launched '{next_agent}' (run_id={new_run_id})")
from ..stage_engine import advance_stage
advance_stage(
task_id=task_id,
current_stage=current_stage,
repo=repo,
work_item_id=work_item_id,
branch=branch,
finished_agent=agent,
)
except Exception as e:
logger.error(f"Auto-advance failed for run_id={run_id}: {e}")
def _post_usage_comments(self, run_id, agent, repo, branch, usage):
"""Feature 4: post the per-agent usage comment (and Deployer summary).
- Always (on success, with a work_item_id): a per-agent finish comment
with token/cost, authored by the finishing agent's Plane bot.
- When the deployer finishes: also a per-task summary (SUM over
agent_runs GROUP BY agent), authored by the deployer.
"""
from ..usage import usage_comment, task_summary_comment
conn = get_db()
row = conn.execute(
"SELECT id, work_item_id FROM tasks WHERE repo=? AND branch=?",
(repo, branch),
).fetchone()
conn.close()
if not row:
return
task_id, work_item_id = row[0], row[1]
if not work_item_id:
return
# Observability: every agent's finish comment links its artifact(s)
# (reviewer->12-review, tester->13-test-report, deployer->14-deploy-log,
# architect->ADR, developer->PR/branch). For the developer we resolve the
# open PR number so the link points straight at it.
pr_number = None
if agent == "developer":
pr_number = self._open_pr_number(repo, branch)
plane_add_comment(
work_item_id,
usage_comment(
agent,
usage,
repo=repo,
branch=branch,
work_item_id=work_item_id,
pr_number=pr_number,
),
author=agent,
)
if agent == "deployer":
plane_add_comment(
work_item_id, task_summary_comment(task_id), author="deployer"
)
def _open_pr_number(self, repo: str, branch: str):
"""Return the open PR number for `branch`, or None. Never raises."""
try:
import httpx
owner = settings.gitea_owner
headers = {"Authorization": f"token {settings.gitea_token}"}
resp = httpx.get(
f"{settings.gitea_url}/api/v1/repos/{owner}/{repo}/pulls",
params={"state": "open", "head": branch},
headers=headers, timeout=5,
)
if resp.status_code == 200:
prs = resp.json()
if prs:
return prs[0].get("number")
except Exception:
pass
return None
def _ensure_pr(self, repo: str, branch: str, run_id: int):
import httpx
owner = settings.gitea_owner
@@ -534,47 +773,6 @@ class AgentLauncher:
logger.error(f"Failed to create PR for {branch}: {e}")
return None
def _auto_merge_pr(self, repo: str, branch: str, task_id: int, work_item_id: str):
import httpx
owner = settings.gitea_owner
headers = {"Authorization": f"token {settings.gitea_token}"}
base_url = f"{settings.gitea_url}/api/v1"
try:
resp = httpx.get(
f"{base_url}/repos/{owner}/{repo}/pulls",
params={"state": "open", "head": branch},
headers=headers, timeout=10
)
resp.raise_for_status()
prs = resp.json()
if not prs:
pr_number = self._ensure_pr(repo, branch, 0)
if not pr_number:
return False
else:
pr_number = prs[0]["number"]
resp = httpx.post(
f"{base_url}/repos/{owner}/{repo}/pulls/{pr_number}/merge",
json={"Do": "merge"},
headers=headers, timeout=30
)
if resp.status_code in (200, 204):
logger.info(f"PR #{pr_number} merged for {branch}")
update_task_stage(task_id, "done")
notify_stage_change(task_id, "deploy", "done")
plane_notify_stage(work_item_id, "deploy", "done")
from ..notifications import send_telegram
send_telegram(f"\u2705 {work_item_id}: PR #{pr_number} merged! deploy -> done. Task complete.")
return True
else:
logger.error(f"Merge failed for PR #{pr_number}: {resp.status_code} {resp.text}")
from ..notifications import send_telegram
send_telegram(f"\u26a0\ufe0f {work_item_id}: Auto-merge failed (HTTP {resp.status_code}). Manual merge needed.")
return False
except Exception as e:
logger.error(f"Auto-merge failed for {branch}: {e}")
return False
def _write_task_file(self, repo: str, branch: str, task_file: str, content: str):
"""Write task file directly into the task's worktree.

View File

@@ -9,13 +9,30 @@ class Settings(BaseSettings):
plane_webhook_secret: str = ""
plane_project_id: str = ""
# Per-agent Plane bot tokens (feat: per-agent comment authorship).
# When set, add_comment posts under the matching bot so Plane shows the
# real author (Analyst/Architect/...). Empty -> fallback to plane_api_token.
plane_bot_analyst: str = ""
plane_bot_architect: str = ""
plane_bot_developer: str = ""
plane_bot_reviewer: str = ""
plane_bot_tester: str = ""
plane_bot_deployer: str = ""
plane_bot_stream: str = ""
# Gitea
gitea_url: str = "http://localhost:3000"
gitea_public_url: str = "" # external URL for clickable links in comments; falls back to gitea_url
gitea_token: str = ""
gitea_webhook_secret: str = ""
gitea_owner: str = "admin"
default_repo: str = "enduro-trails"
# ORCH-6: multi-repo project registry. JSON array of
# {plane_project_id, repo, work_item_prefix, name}.
# Empty -> built-in default registry in src/projects.py.
projects_json: str = ""
# Claude CLI
claude_bin: str = "/opt/claude-code/bin/claude.exe"
repos_dir: str = "/repos"
@@ -25,6 +42,51 @@ class Settings(BaseSettings):
# DB
db_path: str = "/app/data/orchestrator.db"
# ORCH-1 (F-2b): persistent job queue / background worker.
# max_concurrency -> max agent jobs running in parallel (env ORCH_MAX_CONCURRENCY)
# queue_poll_interval -> worker loop poll seconds (env ORCH_QUEUE_POLL_INTERVAL)
max_concurrency: int = 1
queue_poll_interval: float = 2.0
# ORCH-1b (resilience): preflight + 429/rate-limit + backoff + circuit breaker.
# preflight_cache_ttl -> cache the cheap CLI/network preflight result (seconds);
# the worker does NOT re-run `claude --version` more often
# than this (env ORCH_PREFLIGHT_CACHE_TTL).
# backoff_base_seconds -> base for exponential transient backoff.
# backoff_max_seconds -> ceiling for the transient backoff.
# transient_max_attempts -> retry budget for transient (429/overload/network)
# failures, separate from code-fault `attempts`.
# breaker_threshold -> consecutive transient failures that OPEN the breaker.
# breaker_pause_seconds -> how long the breaker stays open before half-open.
preflight_cache_ttl: int = 45
backoff_base_seconds: int = 10
backoff_max_seconds: int = 600
transient_max_attempts: int = 5
breaker_threshold: int = 3
breaker_pause_seconds: int = 300
# ORCH-7 (M-2): agent timeout + graceful kill.
# agent_timeout_seconds -> default per-agent wall-clock budget; the watchdog
# kills the run after this (env ORCH_AGENT_TIMEOUT_SECONDS).
# agent_kill_grace_seconds-> pause between SIGTERM and SIGKILL so claude can
# flush artifacts before the hard kill
# (env ORCH_AGENT_KILL_GRACE_SECONDS).
# agent_timeout_overrides_json -> optional per-agent override JSON object,
# e.g. {"reviewer": 3600, "architect": 2700}
# (env ORCH_AGENT_TIMEOUT_OVERRIDES_JSON).
agent_timeout_seconds: int = 1800
agent_kill_grace_seconds: int = 20
agent_timeout_overrides_json: str = ""
# L-2: run-log rotation. Old per-run logs in <data>/runs/*.log are pruned at
# app startup (best-effort). A *.log is removed if it is older than
# log_keep_days OR not within the log_keep_max most-recent logs (whichever
# hits first). Only *.log files are touched; the active run log is skipped.
# log_keep_days -> max age in days (env ORCH_LOG_KEEP_DAYS).
# log_keep_max -> max number of newest logs to retain (env ORCH_LOG_KEEP_MAX).
log_keep_days: int = 30
log_keep_max: int = 500
# Telegram notifications
telegram_bot_token: str = ""

478
src/db.py
View File

@@ -40,10 +40,87 @@ def init_db():
exit_code INTEGER,
output_path TEXT
);
-- ORCH-1 (F-2b): persistent job queue. Webhook handlers enqueue a job and
-- return immediately; a background worker claims jobs (respecting
-- max_concurrency), spawns the claude agent, and updates the status.
-- Restart-safe: running jobs are requeued on startup (queue-recovery).
CREATE TABLE IF NOT EXISTS jobs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent TEXT NOT NULL,
repo TEXT NOT NULL,
task_id INTEGER, -- FK tasks.id (nullable)
task_content TEXT, -- written to the agent task_file
status TEXT NOT NULL DEFAULT 'queued', -- queued|running|done|failed
attempts INTEGER NOT NULL DEFAULT 0,
max_attempts INTEGER NOT NULL DEFAULT 2,
run_id INTEGER, -- agent_runs.id once started
error TEXT, -- last error message
transient_attempts INTEGER NOT NULL DEFAULT 0, -- ORCH-1 resilience: 429/transient retries
available_at TEXT, -- ORCH-1 resilience: backoff gate (claim when <= now)
created_at TEXT DEFAULT (datetime('now')),
started_at TEXT,
finished_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs(status, id);
""")
# Lightweight migration: add resilience columns to a pre-existing jobs table
# (CREATE TABLE IF NOT EXISTS won't add columns to an already-created table).
_ensure_column(conn, "jobs", "transient_attempts", "INTEGER NOT NULL DEFAULT 0")
_ensure_column(conn, "jobs", "available_at", "TEXT")
# ORCH-5 (M-7): webhook delivery de-dup. Add events.delivery_id and a PARTIAL
# unique index. Partial (WHERE delivery_id IS NOT NULL) so pre-existing rows
# (which have NULL delivery_id) never collide with each other. Restart-safe:
# _ensure_column is a no-op once the column exists, and CREATE INDEX IF NOT
# EXISTS is a no-op once the index exists, so this is safe on the live prod DB.
_ensure_column(conn, "events", "delivery_id", "TEXT")
conn.execute(
"CREATE UNIQUE INDEX IF NOT EXISTS idx_events_delivery "
"ON events(delivery_id) WHERE delivery_id IS NOT NULL"
)
# Feature 4 (token usage): per-run token / cost accounting. Parsed from the
# claude --output-format json result by the launcher monitor. Idempotent
# ALTERs (no-op once the columns exist) so this is safe on the live prod DB.
_ensure_column(conn, "agent_runs", "input_tokens", "INTEGER")
_ensure_column(conn, "agent_runs", "output_tokens", "INTEGER")
_ensure_column(conn, "agent_runs", "cache_read_tokens", "INTEGER")
# Observability fix: also persist cache-CREATION input tokens. Claude CLI
# reports the real input split across input_tokens (fresh, ~tens) +
# cache_read_input_tokens (cache hit, millions) + cache_creation_input_tokens
# (writing new cache). Without this column the cache_creation slice is lost
# and the "X in" figure understates the true prompt size. Idempotent ALTER.
_ensure_column(conn, "agent_runs", "cache_creation_tokens", "INTEGER")
_ensure_column(conn, "agent_runs", "cost_usd", "REAL")
# Telegram live tracker (feat/telegram-live-tracker): persist the FULL model
# name (e.g. "tokenator/claude-opus-4-8") per agent_runs row so the tracker
# can render a short model tag per stage. Parsed from the run-log result JSON
# (modelUsage key) by the launcher monitor; NULL when unknown. Idempotent ALTER.
_ensure_column(conn, "agent_runs", "model", "TEXT")
# Telegram live tracker: one editable Telegram message per task. We store its
# message_id so each stage transition can editMessageText the same message
# instead of spamming a new one. Idempotent ALTER (safe on the live prod DB).
_ensure_column(conn, "tasks", "tracker_message_id", "INTEGER")
# Telegram live tracker: human-readable task title for the tracker header
# ("🛠️ ET-012 · <title>"). Populated from the Plane work-item name at task
# creation; falls back to the work_item_id when absent. Idempotent ALTER.
_ensure_column(conn, "tasks", "title", "TEXT")
# Telegram live tracker: "BRD review" is the only HUMAN gate time — the delta
# between "BRD ready / approve requested" and the analysis->architecture
# advance (human flipped Plane to Approved). Persisted on the task so the
# tracker can show "твоё время" without recomputing from activity history.
_ensure_column(conn, "tasks", "brd_review_started_at", "TEXT")
_ensure_column(conn, "tasks", "brd_review_ended_at", "TEXT")
conn.commit()
conn.close()
def _ensure_column(conn, table: str, column: str, decl: str):
"""Add a column to `table` if it does not already exist (idempotent migration)."""
cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})").fetchall()]
if column not in cols:
conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
conn.commit()
def get_task_by_plane_id(plane_id: str) -> dict | None:
"""Find task by Plane work item ID (checks plane_id and plane_issue_id)."""
conn = get_db()
@@ -79,21 +156,408 @@ def update_task_stage(task_id: int, stage: str):
conn.close()
def get_next_work_item_id(repo: str) -> str:
"""Generate next work item ID (e.g., ET-003)."""
# ---------------------------------------------------------------------------
# Telegram live tracker helpers (feat/telegram-live-tracker)
# ---------------------------------------------------------------------------
def get_tracker_message_id(task_id: int) -> int | None:
"""Return the stored Telegram tracker message_id for a task, or None."""
conn = get_db()
try:
row = conn.execute(
"SELECT tracker_message_id FROM tasks WHERE id=?", (task_id,)
).fetchone()
finally:
conn.close()
return row[0] if row and row[0] is not None else None
def set_tracker_message_id(task_id: int, message_id: int) -> None:
"""Persist the Telegram tracker message_id for a task (idempotent overwrite)."""
conn = get_db()
try:
conn.execute(
"UPDATE tasks SET tracker_message_id=? WHERE id=?",
(message_id, task_id),
)
conn.commit()
finally:
conn.close()
def mark_brd_review_started(task_id: int) -> None:
"""Stamp when BRD review (the human approve gate) started, if not already set.
Idempotent: only sets it the first time (a retried analyst run must not reset
the clock). The delta to brd_review_ended_at is the only "твоё время".
"""
conn = get_db()
try:
conn.execute(
"UPDATE tasks SET brd_review_started_at=datetime('now') "
"WHERE id=? AND brd_review_started_at IS NULL",
(task_id,),
)
conn.commit()
finally:
conn.close()
def mark_brd_review_ended(task_id: int) -> None:
"""Stamp when BRD review ended (analysis->architecture advance / Approved).
Idempotent: only sets it the first time and only if a start exists.
"""
conn = get_db()
try:
conn.execute(
"UPDATE tasks SET brd_review_ended_at=datetime('now') "
"WHERE id=? AND brd_review_started_at IS NOT NULL "
"AND brd_review_ended_at IS NULL",
(task_id,),
)
conn.commit()
finally:
conn.close()
def get_next_work_item_id(repo: str, prefix: str = "ET") -> str:
"""Generate next work item ID (e.g., ET-003 / ORCH-001).
ORCH-6: numbering is per (repo, prefix). The prefix comes from the project
registry (proj.work_item_prefix), so orchestrator issues number ORCH-001,
ORCH-002 independently of the ET sequence in enduro-trails. Default prefix
stays "ET" for backward compatibility with existing callers.
"""
conn = get_db()
row = conn.execute(
"SELECT work_item_id FROM tasks WHERE repo = ? AND work_item_id IS NOT NULL ORDER BY id DESC LIMIT 1",
(repo,),
"SELECT work_item_id FROM tasks "
"WHERE repo = ? AND work_item_id LIKE ? AND work_item_id IS NOT NULL "
"ORDER BY id DESC LIMIT 1",
(repo, f"{prefix}-%"),
).fetchone()
conn.close()
if row and row["work_item_id"]:
# Parse ET-003 -> 3, increment
prefix, num = row["work_item_id"].rsplit("-", 1)
# Parse <PREFIX>-003 -> 3, increment (keep the existing prefix).
existing_prefix, num = row["work_item_id"].rsplit("-", 1)
prefix = existing_prefix
next_num = int(num) + 1
else:
prefix = "ET"
next_num = 1
return f"{prefix}-{next_num:03d}"
def ensure_unique_work_item_id(work_item_id: str, repo: str) -> str:
"""BUG 2a: guarantee work_item_id uniqueness within (repo) over M-6 derive.
M-6 derives the work_item_id from the Plane sequence_id. That number can
collide (e.g. an issue was deleted and the sequence reused, or two issues
map to the same number) -> the SAME ET-NNN gets handed to two different
tasks, which then physically share a branch/worktree slug prefix and step on
each other (see ET-006: task 8 and task 25).
This is a guard LAYERED ON TOP of the M-6 derive (it does NOT replace it):
given the derived id, if that exact <PREFIX>-NNN already exists in the tasks
table for this repo, walk forward (ET-007, ET-008, ...) until a free number
is found and return that instead. If the derived id is free, it is returned
unchanged.
"""
if not work_item_id or "-" not in work_item_id:
return work_item_id
prefix, num_str = work_item_id.rsplit("-", 1)
try:
num = int(num_str)
except ValueError:
return work_item_id
width = len(num_str)
conn = get_db()
try:
candidate = work_item_id
while conn.execute(
"SELECT 1 FROM tasks WHERE repo = ? AND work_item_id = ? LIMIT 1",
(repo, candidate),
).fetchone() is not None:
num += 1
candidate = f"{prefix}-{num:0{width}d}"
return candidate
finally:
conn.close()
# ---------------------------------------------------------------------------
# ORCH-5 (M-7): idempotent webhook event logging
# ---------------------------------------------------------------------------
def insert_event_dedup(
source: str, event_type: str, payload: str, delivery_id: str
) -> bool:
"""Idempotently log a webhook event keyed by delivery_id.
Returns True if a NEW row was inserted (caller should dispatch the event) and
False if this delivery_id was already present (a duplicate delivery -> caller
must skip dispatch/enqueue). Uses INSERT OR IGNORE against the partial UNIQUE
index idx_events_delivery; rowcount==1 means the row was actually inserted.
"""
conn = get_db()
try:
cur = conn.execute(
"INSERT OR IGNORE INTO events (source, event_type, payload, delivery_id) "
"VALUES (?, ?, ?, ?)",
(source, event_type, payload, delivery_id),
)
conn.commit()
return cur.rowcount == 1
finally:
conn.close()
# ---------------------------------------------------------------------------
# ORCH-1 (F-2b): job queue helpers
# ---------------------------------------------------------------------------
def enqueue_job(
agent: str,
repo: str,
task_content: str | None = None,
task_id: int | None = None,
max_attempts: int = 2,
) -> int:
"""Enqueue a new job (status='queued'). Returns the new job id.
This is what webhook handlers call instead of launching an agent in-process:
it is a fast DB INSERT that returns immediately. The background worker
(queue_worker) picks the job up later.
"""
conn = get_db()
cursor = conn.execute(
"INSERT INTO jobs (agent, repo, task_id, task_content, max_attempts) "
"VALUES (?, ?, ?, ?, ?)",
(agent, repo, task_id, task_content, max_attempts),
)
job_id = cursor.lastrowid
conn.commit()
conn.close()
return job_id
def claim_next_job() -> dict | None:
"""Atomically claim the oldest queued job and mark it 'running'.
Atomicity: the UPDATE carries the `status='queued'` guard in its WHERE clause
and we check `rowcount`. If two worker ticks race for the same row, only the
first UPDATE flips it to 'running' (rowcount==1); the loser sees rowcount==0
and retries the SELECT. We rely on SQLite's default per-connection transaction
so the SELECT+UPDATE pair is consistent. Returns the claimed job dict or None
when the queue is empty.
"""
conn = get_db()
try:
while True:
row = conn.execute(
"SELECT id FROM jobs WHERE status='queued' "
"AND (available_at IS NULL OR available_at <= datetime('now')) "
"ORDER BY id LIMIT 1"
).fetchone()
if not row:
return None
job_id = row["id"]
cur = conn.execute(
"UPDATE jobs SET status='running', "
"attempts = attempts + 1, started_at = datetime('now') "
"WHERE id = ? AND status='queued'",
(job_id,),
)
conn.commit()
if cur.rowcount == 1:
claimed = conn.execute(
"SELECT * FROM jobs WHERE id = ?", (job_id,)
).fetchone()
return dict(claimed)
# Lost the race for this row; loop and try the next queued job.
finally:
conn.close()
def mark_job_transient(job_id: int, available_at_sql_offset_seconds: int,
error: str | None = None) -> None:
"""ORCH-1 resilience: requeue a job after a *transient* failure (429/overload/net).
Increments `transient_attempts` (separate from the code-fault `attempts`),
sets status back to 'queued', and gates re-pickup via `available_at` =
now + backoff seconds. started_at/finished_at are cleared.
"""
conn = get_db()
sets = [
"status='queued'",
"transient_attempts = transient_attempts + 1",
"available_at = datetime('now', ?)",
"started_at = NULL",
"finished_at = NULL",
]
params: list = [f"+{int(available_at_sql_offset_seconds)} seconds"]
if error is not None:
sets.append("error = ?")
params.append(error)
params.append(job_id)
conn.execute(f"UPDATE jobs SET {', '.join(sets)} WHERE id = ?", params)
conn.commit()
conn.close()
def mark_job(
job_id: int,
status: str,
run_id: int | None = None,
error: str | None = None,
):
"""Update a job's status (queued|running|done|failed).
- run_id (optional): link to the agent_runs row that executed this job.
- error (optional): last error message (for failed/retry).
- 'done'/'failed' also stamp finished_at.
- 'queued' (requeue for retry) clears started_at/finished_at so the next
claim treats it as fresh.
"""
conn = get_db()
sets = ["status = ?"]
params: list = [status]
if run_id is not None:
sets.append("run_id = ?")
params.append(run_id)
if error is not None:
sets.append("error = ?")
params.append(error)
if status in ("done", "failed"):
sets.append("finished_at = datetime('now')")
elif status == "queued":
sets.append("started_at = NULL")
sets.append("finished_at = NULL")
params.append(job_id)
conn.execute(f"UPDATE jobs SET {', '.join(sets)} WHERE id = ?", params)
conn.commit()
conn.close()
def has_active_job_for_task(task_id: int) -> bool:
"""True if the task already has a queued or running job.
Used by the status-only verdict model (handle_status_start) to guard against
double-launching an agent when a duplicate In Progress webhook arrives or a
job is still in flight. The events de-dup absorbs identical webhook bodies;
this guards against distinct webhooks while a job is pending/running.
"""
conn = get_db()
row = conn.execute(
"SELECT 1 FROM jobs WHERE task_id = ? AND status IN ('queued','running') LIMIT 1",
(task_id,),
).fetchone()
conn.close()
return row is not None
def count_running_jobs() -> int:
"""Number of jobs currently in 'running' status (for max_concurrency)."""
conn = get_db()
n = conn.execute(
"SELECT COUNT(*) FROM jobs WHERE status='running'"
).fetchone()[0]
conn.close()
return int(n)
def requeue_running_jobs() -> int:
"""Queue-recovery: on startup, any job left 'running' belongs to a worker that
died on restart -> put it back to 'queued'. attempts are kept as-is (the next
claim does NOT re-increment beyond what is needed; claim_next_job increments on
pickup). Returns the number of requeued jobs.
"""
conn = get_db()
cur = conn.execute(
"UPDATE jobs SET status='queued', started_at = NULL "
"WHERE status='running'"
)
conn.commit()
n = cur.rowcount
conn.close()
return int(n)
def get_job(job_id: int) -> dict | None:
"""Fetch a single job by id."""
conn = get_db()
row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
conn.close()
return dict(row) if row else None
def job_status_counts() -> dict:
"""Return counts grouped by status (for /queue observability)."""
conn = get_db()
rows = conn.execute(
"SELECT status, COUNT(*) AS n FROM jobs GROUP BY status"
).fetchall()
conn.close()
counts = {"queued": 0, "running": 0, "done": 0, "failed": 0}
for r in rows:
counts[r["status"]] = r["n"]
return counts
def recent_jobs(limit: int = 10) -> list[dict]:
"""Return the most recent jobs (for /queue observability)."""
conn = get_db()
rows = conn.execute(
"SELECT * FROM jobs ORDER BY id DESC LIMIT ?", (limit,)
).fetchall()
conn.close()
return [dict(r) for r in rows]
# ---------------------------------------------------------------------------
# ORCH-1b (resilience): transient backoff helpers
# ---------------------------------------------------------------------------
def requeue_job_transient(job_id: int, delay_seconds: float, error: str | None = None):
"""ORCH-1b: requeue a job after a TRANSIENT (429/overload/network) failure.
Unlike a code-fault requeue, this:
- increments `transient_attempts` (a separate budget from code-fault attempts)
- sets `available_at = now + delay_seconds` so claim_next_job won't pick it
up until the backoff window elapses
- sets status back to 'queued' and clears started_at/finished_at
delay_seconds is computed by the caller (exp backoff, capped, Retry-After).
"""
conn = get_db()
conn.execute(
"UPDATE jobs SET status='queued', "
"transient_attempts = transient_attempts + 1, "
"available_at = datetime('now', ? || ' seconds'), "
"started_at = NULL, finished_at = NULL, "
"error = COALESCE(?, error) "
"WHERE id = ?",
(f"+{int(round(delay_seconds))}", error, job_id),
)
conn.commit()
conn.close()
def compute_backoff(transient_attempts: int, retry_after: float | None = None) -> float:
"""ORCH-1b: exponential backoff (seconds) for a transient failure.
delay = min(2**transient_attempts * base, max). If the server sent a
Retry-After hint we honour it as a floor (use the larger of the two so we
never poll sooner than the server asked).
`transient_attempts` is the count AFTER this failure (i.e. how many transient
failures have occurred), so the first backoff uses 2**1.
"""
base = getattr(settings, "backoff_base_seconds", 10)
cap = getattr(settings, "backoff_max_seconds", 600)
exp = min((2 ** max(transient_attempts, 0)) * base, cap)
if retry_after is not None and retry_after > 0:
return float(min(max(exp, retry_after), cap))
return float(exp)

87
src/error_classifier.py Normal file
View File

@@ -0,0 +1,87 @@
"""ORCH-1 resilience: classify an agent failure as transient vs permanent.
Rate limits / overload / network blips cannot be reliably predicted in advance,
so we classify *after the run* by scanning the agent's combined stdout/stderr log
(B-2 sends both to /app/data/runs/<run_id>.log).
- transient -> 429 / rate limit / overloaded / network / quota-exhausted etc.
=> backoff + transient retry (separate counter, larger budget).
- permanent -> a genuine code fault / agent error
=> normal attempts < max_attempts, then 'failed'.
Also extracts a Retry-After hint (seconds) when the server provided one.
"""
import re
# Case-insensitive substrings/patterns that signal a transient/rate-limit issue.
_TRANSIENT_PATTERNS = [
r"\b429\b",
r"rate[\s_-]*limit",
r"rate_limit_error",
r"overloaded",
r"overloaded_error",
r"too many requests",
r"quota",
r"insufficient[_\s-]*quota",
r"retry[\s-]*after",
r"service unavailable",
r"\b503\b",
r"\b529\b",
r"timed out",
r"timeout",
r"connection (reset|refused|error|aborted)",
r"temporarily unavailable",
r"econnreset",
r"etimedout",
]
_TRANSIENT_RE = re.compile("|".join(_TRANSIENT_PATTERNS), re.IGNORECASE)
# Retry-After: header style ("Retry-After: 30") or JSON ("retry_after": 30) or
# "retry after 30 seconds". Returns the integer seconds.
_RETRY_AFTER_RE = re.compile(
r"retry[\s_-]*after[\"']?\s*[:=]?\s*[\"']?\s*(\d+)",
re.IGNORECASE,
)
def classify_text(text: str) -> str:
"""Return 'transient' or 'permanent' for a chunk of log/stderr text."""
if not text:
return "permanent"
return "transient" if _TRANSIENT_RE.search(text) else "permanent"
def parse_retry_after(text: str) -> int | None:
"""Return Retry-After seconds if present in the text, else None."""
if not text:
return None
m = _RETRY_AFTER_RE.search(text)
if m:
try:
return int(m.group(1))
except (TypeError, ValueError):
return None
return None
def classify_log_file(path: str, tail_bytes: int = 16384) -> tuple[str, int | None]:
"""Classify the tail of a log file.
Reads the last `tail_bytes` of the log (rate-limit messages appear near the
end) and returns (classification, retry_after_seconds_or_None).
On any read error, treats it as 'permanent' (no special backoff).
"""
if not path:
return "permanent", None
try:
with open(path, "rb") as f:
try:
f.seek(-tail_bytes, 2)
except OSError:
f.seek(0)
data = f.read()
text = data.decode("utf-8", errors="replace")
except Exception:
return "permanent", None
return classify_text(text), parse_retry_after(text)

View File

@@ -51,7 +51,41 @@ async def lifespan(app: FastAPI):
except Exception:
pass
log.warning(f"Recovered {len(orphan_rows)} orphaned agent runs")
yield
# ORCH-1 (F-2b): queue-recovery. Any job left in 'running' status belongs to a
# worker that died on the previous restart -> put it back to 'queued' so the
# worker re-picks it up (restart-safe, no lost work). Runs AFTER M-1.
from .db import requeue_running_jobs
requeued = requeue_running_jobs()
if requeued:
log.warning(f"Queue-recovery: requeued {requeued} running job(s) after restart")
# L-2: rotate old per-run logs at startup (best-effort; never fatal).
try:
import os as _os
from .config import settings as _settings
from .agents.launcher import prune_run_logs
_runs_dir = _os.path.join(_os.path.dirname(_settings.db_path), "runs")
_removed = prune_run_logs(
_runs_dir,
keep_days=_settings.log_keep_days,
keep_max=_settings.log_keep_max,
)
if _removed:
log.info(f"Log rotation: pruned {_removed} old run log(s) from {_runs_dir}")
except Exception as e:
log.warning(f"Log rotation skipped: {e}")
# Start the background job-queue worker (ORCH-1).
from .queue_worker import worker
worker.start()
try:
yield
finally:
# Graceful shutdown of the worker (running agents keep going; their jobs
# are requeued on next start via queue-recovery if the process dies).
worker.stop()
app = FastAPI(title="Multi-Agent Orchestrator", lifespan=lifespan)
@@ -73,3 +107,17 @@ async def status():
).fetchall()
conn.close()
return {"active_tasks": [dict(t) for t in tasks]}
@app.get("/queue")
async def queue():
"""ORCH-1: job-queue observability — status counts + recent jobs."""
from .db import job_status_counts, recent_jobs
from .queue_worker import worker
return {
"counts": job_status_counts(),
"max_concurrency": worker.max_concurrency,
"poll_interval": worker.poll_interval,
"resilience": worker.status(),
"recent": recent_jobs(10),
}

View File

@@ -1,6 +1,24 @@
"""Notifications and logging for orchestrator events."""
"""Notifications and logging for orchestrator events.
feat/telegram-live-tracker (Variant B+): instead of ~15 separate Telegram
messages per task (agent start / finish / stage transition / QG-pending / tech
noise), the orchestrator now maintains ONE live tracker message per task that is
edited in place (editMessageText) on every stage transition. Only events that
NEED Slava's attention are sent as SEPARATE, notifying messages:
* approve-gate (notify_approve_requested) — BRD/TZ/AC ready, flip to Approved
* deploy failed / rolled back — send_telegram from launcher/engine
* agent failed (exit_code != 0) — send_telegram from launcher
* task error (notify_error)
The tracker itself is edited SILENTLY (disable_notification: true). Stage-change,
agent-start, agent-finish and QG-pending no longer emit their own messages — they
just refresh the tracker (or are log-only).
"""
import html
import logging
import httpx
logger = logging.getLogger("orchestrator")
@@ -17,25 +35,115 @@ def _get_settings():
return _settings
def send_telegram(text: str):
"""Send notification to Telegram. Fire-and-forget, never raises."""
# --------------------------------------------------------------------------- #
# Low-level Telegram primitives
# --------------------------------------------------------------------------- #
def send_telegram(text: str, disable_notification: bool = False):
"""Send a notification to Telegram. Fire-and-forget, never raises.
Returns the Telegram message_id on success, else None (so callers that want
to track the message — the tracker — can store it; legacy callers ignore it).
"""
s = _get_settings()
if not s.telegram_bot_token or not s.telegram_chat_id:
return
return None
try:
url = f"https://api.telegram.org/bot{s.telegram_bot_token}/sendMessage"
httpx.post(
resp = httpx.post(
url,
json={
"chat_id": s.telegram_chat_id,
"text": text,
"parse_mode": "HTML",
"disable_notification": False,
"disable_notification": disable_notification,
},
timeout=5,
)
data = resp.json()
if data.get("ok"):
return data["result"]["message_id"]
except Exception:
pass # Never crash orchestrator due to notification failure
return None
# edit_telegram outcome codes -> let update_task_tracker decide what to do:
# "ok" edit applied -> nothing else to do
# "not_modified" Telegram says text is identical (400 "message is not
# modified" / "exactly the same") -> success, NO new message
# "gone" original message can't be edited (deleted / too old /
# invalid id) -> caller must fall back to a NEW message
# "failed" transient failure (network / timeout / 5xx / unknown 400)
# -> caller must NOT send a new message (avoid duplicates)
EDIT_OK = "ok"
EDIT_NOT_MODIFIED = "not_modified"
EDIT_GONE = "gone"
EDIT_FAILED = "failed"
# Telegram error descriptions that mean the message is permanently un-editable
# (it is gone / orphaned) -> fall back to a fresh message.
_GONE_MARKERS = (
"message to edit not found",
"message can't be edited",
"message_id_invalid",
)
# Telegram "nothing changed" -> treat as success, never a duplicate.
_NOT_MODIFIED_MARKERS = (
"message is not modified",
"exactly the same",
)
def edit_telegram(message_id: int, text: str) -> str:
"""Edit an existing Telegram message. Never raises.
Returns a distinguishable outcome (see EDIT_* constants) so the caller can
tell apart "all good" / "nothing changed" / "message gone" / "transient
failure" and only fall back to a NEW message when the original is truly gone.
"""
s = _get_settings()
if not s.telegram_bot_token or not s.telegram_chat_id:
return EDIT_FAILED
try:
url = f"https://api.telegram.org/bot{s.telegram_bot_token}/editMessageText"
resp = httpx.post(
url,
json={
"chat_id": s.telegram_chat_id,
"message_id": message_id,
"text": text,
"parse_mode": "HTML",
},
timeout=5,
)
data = resp.json()
if data.get("ok"):
return EDIT_OK
# ok:false -> inspect the description to classify the 400.
desc = str(data.get("description") or "").lower()
if any(m in desc for m in _NOT_MODIFIED_MARKERS):
# Text is identical between transitions (e.g. repeat review cycle
# renders the same line). Nothing to do, NOT a duplicate.
logger.debug(
f"edit_telegram(mid={message_id}): not modified, skipping"
)
return EDIT_NOT_MODIFIED
if any(m in desc for m in _GONE_MARKERS):
logger.warning(
f"edit_telegram(mid={message_id}): message gone ({desc!r}), "
f"will fall back to a new message"
)
return EDIT_GONE
# Unknown 400 / other non-ok -> transient/unknown, do NOT duplicate.
logger.warning(
f"edit_telegram(mid={message_id}): edit failed ({desc!r})"
)
return EDIT_FAILED
except Exception as e:
# Network / timeout / 5xx -> transient, do NOT duplicate.
logger.warning(f"edit_telegram(mid={message_id}): transient error: {e}")
return EDIT_FAILED
def _get_work_item_id(task_id: int) -> str:
@@ -50,26 +158,355 @@ def _get_work_item_id(task_id: int) -> str:
return f"task-{task_id}"
# --------------------------------------------------------------------------- #
# Live task tracker
# --------------------------------------------------------------------------- #
# Pipeline stages shown in the tracker, in order, with their display label and
# the agent whose agent_runs rows describe that stage's work. "Ревью БРД" is NOT
# an agent stage — it is the human approve gate rendered between Analysis and
# Architecture from the task's brd_review_* timestamps.
_TRACKER_STAGES = [
("analysis", "Analysis", "analyst"),
("architecture", "Architecture", "architect"),
("development", "Development", "developer"),
("review", "Review", "reviewer"),
("testing", "Testing", "tester"),
("deploy", "Deploy", "deployer"),
]
# Map a pipeline stage -> the agent that is RUNNING while the task sits in it.
# (development is entered after architecture finishes, etc.) Used to render the
# "🔄 <Stage> … идёт" line for the currently-active stage.
_BRD_LABEL = "\u0420\u0435\u0432\u044c\u044e \u0411\u0420\u0414" # "Ревью БРД"
_STAGE_ACTIVE_AGENT = {
"analysis": "analyst",
"architecture": "architect",
"development": "developer",
"review": "reviewer",
"testing": "tester",
"deploy": "deployer",
}
def _fmt_minutes(seconds) -> str:
"""Render a duration in whole minutes: 0..59s -> '<1м', else '<n>м'."""
try:
seconds = int(seconds or 0)
except (TypeError, ValueError):
seconds = 0
if seconds <= 0:
return ""
if seconds < 60:
return "<1м"
return f"{seconds // 60}\u043c"
def _parse_sql_ts(ts):
"""Parse a SQLite 'YYYY-MM-DD HH:MM:SS' UTC timestamp -> aware datetime/None."""
if not ts:
return None
from datetime import datetime, timezone
for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S"):
try:
return datetime.strptime(str(ts)[:19], fmt).replace(tzinfo=timezone.utc)
except (ValueError, TypeError):
continue
return None
def _duration_seconds(started, finished):
"""Seconds between two SQL timestamps; None if either is missing/unparseable."""
a = _parse_sql_ts(started)
b = _parse_sql_ts(finished)
if a is None or b is None:
return None
return max(int((b - a).total_seconds()), 0)
def render_task_tracker(task_id: int) -> str:
"""Build the full live-tracker text for a task from the DB (stateless render).
Pulls the task header (work_item_id, title, stage), every agent_runs row, and
the BRD-review timestamps, then renders:
- one '✅ <Stage> <dur> · <in>↓/<out>↑ · <cost> · <model>' line per finished
stage (latest run per stage),
- the '⏸️ Ревью БРД <dur> · твоё время[ ⏳]' line between Analysis/Architecture,
- a '🔄 <Stage> … идёт' line for the active (in-progress) stage,
- the '💰 <in>↓ / <out>↑ · <cost>' totals,
- on done: '⏱️ Всего .. · агенты .. · твоё ..' and a '🔗 PR / 📦' line.
Never raises (returns a minimal fallback string on error).
"""
from .db import get_db
from .usage import fmt_tokens, fmt_cost, _input_total, short_model_name
try:
conn = get_db()
task = conn.execute(
"SELECT id, work_item_id, title, stage, created_at, updated_at, "
"brd_review_started_at, brd_review_ended_at "
"FROM tasks WHERE id=?",
(task_id,),
).fetchone()
if not task:
conn.close()
return f"task-{task_id}"
runs = conn.execute(
"SELECT agent, started_at, finished_at, exit_code, input_tokens, "
"output_tokens, cache_read_tokens, cache_creation_tokens, cost_usd, model "
"FROM agent_runs WHERE task_id=? ORDER BY id ASC",
(task_id,),
).fetchall()
conn.close()
except Exception as e:
logger.warning(f"render_task_tracker({task_id}) DB error: {e}")
return f"task-{task_id}"
work_item_id = task["work_item_id"] or f"task-{task_id}"
title = task["title"] or work_item_id
stage = task["stage"] or "created"
done = stage == "done"
# Latest completed run per agent (a stage may have multiple runs on retry;
# we show the most recent FINISHED, successful run for the stage line).
last_done = {}
agent_runs_by_agent = {}
for r in runs:
agent_runs_by_agent.setdefault(r["agent"], []).append(r)
if r["finished_at"] and (r["exit_code"] == 0 or r["exit_code"] is None):
last_done[r["agent"]] = r
# Totals across ALL runs (every input/output token + cost counts).
total_in = 0
total_out = 0
total_cost = 0.0
agent_seconds = 0
for r in runs:
usage = {
"input_tokens": r["input_tokens"],
"cache_read_tokens": r["cache_read_tokens"],
"cache_creation_tokens": r["cache_creation_tokens"],
}
total_in += _input_total(usage)
total_out += int(r["output_tokens"] or 0)
total_cost += float(r["cost_usd"] or 0.0)
d = _duration_seconds(r["started_at"], r["finished_at"])
if d is not None:
agent_seconds += d
esc_title = html.escape(title)
header = (
f"\U0001f389 {html.escape(work_item_id)} \u00b7 {esc_title} \u2014 \u0413\u041e\u0422\u041e\u0412\u041e"
if done
else f"\U0001f6e0\ufe0f {html.escape(work_item_id)} \u00b7 {esc_title}"
)
bar = "\u2501" * 22
lines = [header, bar]
def _stage_line(label, run):
usage = {
"input_tokens": run["input_tokens"],
"cache_read_tokens": run["cache_read_tokens"],
"cache_creation_tokens": run["cache_creation_tokens"],
}
in_tok = fmt_tokens(_input_total(usage))
out_tok = fmt_tokens(run["output_tokens"])
cost = fmt_cost(run["cost_usd"])
dur = _fmt_minutes(_duration_seconds(run["started_at"], run["finished_at"]))
model = short_model_name(run["model"])
model_suffix = f" \u00b7 {model}" if model else ""
return (
f"\u2705 {label:<13} {dur} \u00b7 "
f"{in_tok}\u2193/{out_tok}\u2191 \u00b7 {cost}{model_suffix}"
)
# BRD review line: between Analysis and Architecture, only once Analysis has
# produced a run (i.e. the gate is live). Time = human review delta.
brd_started = task["brd_review_started_at"]
brd_ended = task["brd_review_ended_at"]
review_seconds = _duration_seconds(brd_started, brd_ended)
for stage_key, label, agent in _TRACKER_STAGES:
run = last_done.get(agent)
# The stage is "in progress" only when it is the task's current stage AND
# there is an unfinished run for its agent (the agent is actually still
# working). A finished run with no in-flight run -> show the \u2705 result,
# even if the task still sits in that stage (just-finished snapshot).
agent_runs = agent_runs_by_agent.get(agent, [])
has_inflight = any(ar["finished_at"] is None for ar in agent_runs)
is_active_stage = (
_STAGE_ACTIVE_AGENT.get(stage) == agent
and stage == stage_key
and (has_inflight or run is None)
)
if is_active_stage:
# Live "\U0001f504 ... \u0438\u0434\u0451\u0442" line. Count how many times THIS stage's
# agent has run for this task; a 2nd+ run means we're re-doing the
# stage (e.g. review->development->review), so show "\u043f\u043e\u043f\u044b\u0442\u043a\u0430 N"
# to make the text change between cycles and to honestly show Slava
# the stage is being re-worked.
attempt = len(agent_runs)
if attempt >= 2:
lines.append(
f"\U0001f504 {label} \u00b7 \u043f\u043e\u043f\u044b\u0442\u043a\u0430 {attempt} "
f"\u2026 \u0438\u0434\u0451\u0442"
)
else:
lines.append(
f"\U0001f504 {label:<13} \u2026 \u00b7 \u0438\u0434\u0451\u0442"
)
elif run is not None:
lines.append(_stage_line(label, run))
# else: not started yet -> not shown.
# Insert the BRD review line right after Analysis.
if stage_key == "analysis" and brd_started:
brd_label = f"{_BRD_LABEL:<13}"
if review_seconds is not None:
dur = _fmt_minutes(review_seconds)
lines.append(
f"\u23f8\ufe0f {brd_label} {dur} \u00b7 \u0442\u0432\u043e\u0451 \u0432\u0440\u0435\u043c\u044f"
)
else:
# Still waiting on the human (ended not stamped yet).
from datetime import datetime, timezone
start_dt = _parse_sql_ts(brd_started)
waited = None
if start_dt is not None:
waited = int(
(datetime.now(timezone.utc) - start_dt).total_seconds()
)
dur = _fmt_minutes(waited) if waited is not None else "\u2026"
lines.append(
f"\u23f8\ufe0f {brd_label} {dur} \u00b7 \u0442\u0432\u043e\u0451 \u0432\u0440\u0435\u043c\u044f \u23f3"
)
lines.append(bar)
lines.append(
f"\U0001f4b0 {fmt_tokens(total_in)}\u2193 / {fmt_tokens(total_out)}\u2191 \u00b7 "
f"{fmt_cost(total_cost)}"
)
if done:
wall = _duration_seconds(task["created_at"], task["updated_at"])
wall_str = _fmt_minutes(wall) if wall is not None else "?"
review_str = _fmt_minutes(review_seconds) if review_seconds else ""
lines.append(
f"\u23f1\ufe0f \u0412\u0441\u0435\u0433\u043e {wall_str} \u00b7 "
f"\u0430\u0433\u0435\u043d\u0442\u044b {_fmt_minutes(agent_seconds)} \u00b7 "
f"\u0442\u0432\u043e\u0451 {review_str}"
)
link = _done_link(task_id, task["work_item_id"])
if link:
lines.append(link)
return "\n".join(lines)
def _done_link(task_id: int, work_item_id) -> str | None:
"""Build the final '🔗 PR #n · 📦 deployed' line. Never raises -> None."""
try:
from .config import settings
from .db import get_db
conn = get_db()
row = conn.execute(
"SELECT repo, branch FROM tasks WHERE id=?", (task_id,)
).fetchone()
conn.close()
if not row:
return None
repo, branch = row["repo"], row["branch"]
pr_part = None
try:
owner = settings.gitea_owner
headers = {"Authorization": f"token {settings.gitea_token}"}
resp = httpx.get(
f"{settings.gitea_url}/api/v1/repos/{owner}/{repo}/pulls",
params={"state": "all", "head": branch},
headers=headers, timeout=5,
)
if resp.status_code == 200:
prs = resp.json()
if prs:
pr_part = f"\U0001f517 PR #{prs[0].get('number')}"
except Exception:
pr_part = None
parts = []
if pr_part:
parts.append(pr_part)
parts.append("\U0001f4e6 deployed")
return " \u00b7 ".join(parts)
except Exception:
return None
def update_task_tracker(task_id: int):
"""Render + push the live tracker for a task. Never raises.
First call (no stored tracker_message_id): sendMessage (silent) and store the
returned message_id. Subsequent calls: editMessageText the stored message.
A NEW message is sent ONLY when the original is truly gone (deleted / too old
/ invalid id). On "not modified" (text unchanged) or transient failures
(network / timeout / 5xx / unknown 400) we do NOT send a new message — that
is exactly what produced duplicate trackers and orphaned (lagging) messages.
The tracker is always sent with disable_notification so it never pings —
only the dedicated alert helpers ping.
"""
try:
from .db import get_tracker_message_id, set_tracker_message_id
text = render_task_tracker(task_id)
mid = get_tracker_message_id(task_id)
if mid is not None:
result = edit_telegram(mid, text)
if result in (EDIT_OK, EDIT_NOT_MODIFIED):
# Edited in place (or nothing to change) -> done, no duplicate.
return
if result == EDIT_FAILED:
# Transient -> don't duplicate; tracker redraws next transition.
logger.debug(
f"update_task_tracker({task_id}): edit failed transiently, "
f"keeping message {mid}"
)
return
# result == EDIT_GONE -> the stored message is gone; fall through
# to send a fresh one and re-point tracker_message_id at it.
new_mid = send_telegram(text, disable_notification=True)
if new_mid is not None:
set_tracker_message_id(task_id, new_mid)
except Exception as e:
logger.warning(f"update_task_tracker({task_id}) failed: {e}")
# --------------------------------------------------------------------------- #
# Stage / agent lifecycle notifications (now tracker-only, no separate message)
# --------------------------------------------------------------------------- #
def notify_stage_change(task_id: int, old_stage: str, new_stage: str, agent: str = None):
"""Log and notify stage transition."""
"""Log a stage transition and refresh the live tracker (no separate message)."""
work_item_id = _get_work_item_id(task_id)
msg = f"\U0001f504 {work_item_id}: {old_stage} \u2192 {new_stage}"
if agent:
msg += f" (\u0437\u0430\u043f\u0443\u0449\u0435\u043d {agent})"
logger.info(msg)
send_telegram(msg)
update_task_tracker(task_id)
def notify_agent_started(run_id: int, agent: str, task_id: int):
"""Notify agent launch."""
"""Log an agent launch and refresh the tracker (no separate message)."""
work_item_id = _get_work_item_id(task_id)
msg = f"\U0001f680 {work_item_id}: {agent} \u0437\u0430\u043f\u0443\u0449\u0435\u043d (run_id={run_id})"
logger.info(msg)
send_telegram(msg)
logger.info(f"\U0001f680 {work_item_id}: {agent} \u0437\u0430\u043f\u0443\u0449\u0435\u043d (run_id={run_id})")
if task_id:
update_task_tracker(task_id)
def notify_agent_finished(run_id: int, agent: str, exit_code: int, task_id: int = None, duration_s: int = None):
"""Notify agent completion."""
"""Log agent completion and refresh the tracker (no separate message).
The agent-FAILED alert (exit_code != 0) is still sent separately by the
launcher via send_telegram; this helper itself only logs + refreshes.
"""
work_item_id = _get_work_item_id(task_id) if task_id else "?"
if exit_code == 0:
dur = f" ({duration_s // 60} \u043c\u0438\u043d)" if duration_s else ""
@@ -79,47 +516,66 @@ def notify_agent_finished(run_id: int, agent: str, exit_code: int, task_id: int
else:
msg = f"\u274c {work_item_id}: {agent} \u0443\u043f\u0430\u043b (exit_code={exit_code})"
logger.info(msg)
send_telegram(msg)
if task_id:
update_task_tracker(task_id)
def notify_qg_result(task_id: int, check: str, passed: bool, reason: str = None):
"""Notify QG check result."""
"""Log a QG check result (NO separate Telegram message: QG-pending is noise).
Kept for callers; QG outcomes are log-only now and reflected by the tracker
through the resulting stage transition.
"""
work_item_id = _get_work_item_id(task_id)
if passed:
msg = f"\u2705 {work_item_id}: QG {check} \u2014 passed"
logger.info(f"\u2705 {work_item_id}: QG {check} \u2014 passed")
else:
msg = f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}"
logger.info(msg)
send_telegram(msg)
logger.warning(f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}")
def notify_qg_failure(task_id: int, stage: str, check: str, reason: str):
"""Log and notify QG check failure."""
"""Log a QG check failure (log-only).
QG-pending / QG-failed are NOT pinged as separate messages anymore (they are
not actionable for Slava). Real rollbacks/deploy-fails are alerted by their
own dedicated send_telegram calls in the engine/launcher.
"""
work_item_id = _get_work_item_id(task_id)
msg = f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}"
logger.warning(msg)
send_telegram(msg)
logger.warning(f"\u26a0\ufe0f {work_item_id}: QG {check} \u2014 failed: {reason}")
def notify_approve_requested(task_id: int):
"""Notify that analyst requests :approved:."""
"""ALERT (separate, notifying): BRD/TZ/AC ready -> flip Plane to Approved.
Also starts the BRD-review clock and refreshes the tracker so the
'⏸️ Ревью БРД · твоё время ⏳' line appears.
"""
work_item_id = _get_work_item_id(task_id)
msg = f"\U0001f4cb {work_item_id}: BRD/\u0422\u0417/AC \u0433\u043e\u0442\u043e\u0432\u044b. \u0416\u0434\u0443 :approved: \u0432 Plane"
try:
from .db import mark_brd_review_started
mark_brd_review_started(task_id)
except Exception as e:
logger.warning(f"notify_approve_requested: brd clock start failed: {e}")
msg = (
f"\U0001f4cb {work_item_id}: BRD/\u0422\u0417/AC \u0433\u043e\u0442\u043e\u0432\u044b. "
f"\u041f\u0435\u0440\u0435\u0432\u0435\u0434\u0438\u0442\u0435 \u0437\u0430\u0434\u0430\u0447\u0443 \u0432 \u0441\u0442\u0430\u0442\u0443\u0441 Approved "
f"\u0432 Plane \u0434\u043b\u044f \u043f\u0440\u043e\u0434\u043e\u043b\u0436\u0435\u043d\u0438\u044f."
)
logger.info(msg)
send_telegram(msg)
update_task_tracker(task_id)
send_telegram(msg) # separate, notifying
def notify_done(task_id: int):
"""Notify task completion."""
"""Task completion: refresh the tracker to its final ГОТОВО form (no separate ping)."""
work_item_id = _get_work_item_id(task_id)
msg = f"\U0001f389 {work_item_id}: \u0437\u0430\u0434\u0430\u0447\u0430 \u0437\u0430\u0432\u0435\u0440\u0448\u0435\u043d\u0430!"
logger.info(msg)
send_telegram(msg)
logger.info(f"\U0001f389 {work_item_id}: \u0437\u0430\u0434\u0430\u0447\u0430 \u0437\u0430\u0432\u0435\u0440\u0448\u0435\u043d\u0430!")
update_task_tracker(task_id)
def notify_error(task_id: int, error: str):
"""Log and notify error for a task."""
"""ALERT (separate, notifying): task error."""
work_item_id = _get_work_item_id(task_id) if task_id else "system"
msg = f"\U0001f534 {work_item_id}: ERROR \u2014 {error}"
logger.error(msg)
send_telegram(msg)
send_telegram(msg) # separate, notifying

View File

@@ -6,12 +6,90 @@ from .config import settings
logger = logging.getLogger("orchestrator.plane_sync")
# L-3: emoji literals used in Plane comment bodies, named for readability.
# Message text stays byte-for-byte identical to the previous output.
EMOJI_STAGE = "\U0001F504" # stage transition
EMOJI_QG_FAIL = "\u26A0\uFE0F" # quality-gate failure
EMOJI_DONE = "\u2705" # task completed
PLANE_BASE = f"{settings.plane_api_url}/api/v1"
PLANE_HEADERS = {"X-API-Key": settings.plane_api_token}
WORKSPACE = settings.plane_workspace_slug
# feat(plane): per-agent comment authorship.
# Map an agent role -> its dedicated Plane bot token (read from config / env).
# When the token is present, add_comment() POSTs under that bot so Plane shows
# the real author. Empty/unknown role -> fallback to the shared orchestrator
# token (PLANE_HEADERS), so commenting stays autonomous.
PLANE_BOT_TOKENS = {
"analyst": settings.plane_bot_analyst,
"architect": settings.plane_bot_architect,
"developer": settings.plane_bot_developer,
"reviewer": settings.plane_bot_reviewer,
"tester": settings.plane_bot_tester,
"deployer": settings.plane_bot_deployer,
"stream": settings.plane_bot_stream,
}
# Map a pipeline stage -> the agent role that owns work in that stage. Used to
# pick an author for rollback/stage notifications targeting a specific stage.
STAGE_AUTHORS = {
"analysis": "analyst",
"architecture": "architect",
"development": "developer",
"review": "reviewer",
"testing": "tester",
"deploy": "deployer",
}
def _headers_for(author: str | None) -> dict:
"""Return X-API-Key headers for the given agent role.
Falls back to the shared orchestrator token (PLANE_HEADERS /
settings.plane_api_token) when the role is None, unknown, or its bot token
is not configured. This keeps comment posting autonomous: a comment is
always written, just attributed to the orchestrator if no bot is set.
"""
tok = PLANE_BOT_TOKENS.get(author or "") if author else None
return {"X-API-Key": tok} if tok else PLANE_HEADERS
PROJECT_ID = settings.plane_project_id or "7a79f0a9-5278-49cd-9007-9a338f238f9c"
# Plane state IDs
def _resolve_project_id(work_item_id: str = None, project_id: str = None) -> str:
"""ORCH-6: resolve the Plane project id for a sync call.
Priority:
1. explicit project_id arg (caller already knows the project),
2. project derived from the task's repo in the DB (by work_item_id),
3. legacy default PROJECT_ID (enduro) for backward compatibility.
"""
if project_id:
return project_id
if work_item_id:
try:
from .db import get_db
from .projects import get_project_by_repo
conn = get_db()
row = conn.execute(
"SELECT repo FROM tasks WHERE work_item_id = ? ORDER BY id DESC LIMIT 1",
(work_item_id,),
).fetchone()
conn.close()
if row and row[0]:
proj = get_project_by_repo(row[0])
if proj:
return proj.plane_project_id
except Exception as e:
logger.debug(f"_resolve_project_id fallback for {work_item_id}: {e}")
return PROJECT_ID
# Plane state IDs.
# TODO(ORCH-10): these UUIDs are PER-PROJECT. The 6 stage-visibility / verdict
# statuses below were created only in the enduro project (7a79f0a9-...). One
# project is in prod today, so a single global dict is acceptable. When more
# projects are onboarded these must be resolved per project (see ORCH-10 in
# BACKLOG.md / the ORCH-6 project registry) — do NOT hardcode globally then.
PLANE_STATES = {
"backlog": "113b24f6-cce8-4be9-9a22-a359b9cf0122",
"todo": "2c7d3df3-9eb9-419b-92b7-d7d560bcdd10",
@@ -21,23 +99,143 @@ PLANE_STATES = {
"blocked": "6c4543f9-ac47-4ef7-ae0f-070020dc9920",
"done": "381a2833-3c4e-4be5-bd0f-be84cb946ad8",
"cancelled": "b1cae7f9-961d-4889-a179-f3acea697d17",
# Feature 3 (stage visibility) — per-stage statuses on the board.
"architecture": "3020bbb7-6122-4663-930c-0315ba8dfa3d",
"development": "9920609b-f140-4e46-ab95-89acda8412c8",
"review": "ba0d802c-5218-41d4-ab43-978b0ea123ed",
"testing": "7855d807-b1bf-42ef-8dae-6cde0df92d02",
# Feature 2 (verdict statuses) — Approved / Rejected.
"approved": "a519a341-dada-4a91-8910-7604f82b79c5",
"rejected": "ba958f3c-5db5-461d-8f82-89425e413b97",
}
# Map orchestrator stages to Plane states
# Feature 3: map an orchestrator stage -> the Plane status to show on the board
# when the pipeline ENTERS that stage. analysis stays driven by the existing
# in_progress/in_review/needs_input logic (no dedicated status). deploy keeps
# in_progress until done. Needs Input / In Review / Blocked remain higher
# priority and are set explicitly elsewhere — do NOT override them from here.
STAGE_VISIBILITY_STATE = {
"architecture": "architecture",
"development": "development",
"review": "review",
"testing": "testing",
}
# Map orchestrator stages to Plane states (used by update_issue_state /
# notify_stage_change). Feature 3: architecture/development/review/testing now
# point at their dedicated board statuses so the task physically moves across
# columns. analysis -> in_progress, deploy -> in_progress, done -> done.
STAGE_TO_STATE = {
"created": PLANE_STATES["todo"],
"analysis": PLANE_STATES["in_progress"],
"architecture": PLANE_STATES["in_progress"],
"development": PLANE_STATES["in_progress"],
"review": PLANE_STATES["in_progress"],
"testing": PLANE_STATES["in_progress"],
"architecture": PLANE_STATES["architecture"],
"development": PLANE_STATES["development"],
"review": PLANE_STATES["review"],
"testing": PLANE_STATES["testing"],
"deploy": PLANE_STATES["in_progress"],
"done": PLANE_STATES["done"],
}
def find_issue_id(work_item_id: str) -> str | None:
def fetch_issue_sequence_id(issue_id: str, project_id: str) -> int | None:
"""M-6: GET the Plane issue by UUID and return its sequence_id (the
authoritative per-project number), or None if unavailable.
Returns None on network error, non-2xx, or a missing field - never raises,
so the webhook handler can fall back to DB increment and stay autonomous.
"""
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
try:
resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp.raise_for_status()
seq = resp.json().get("sequence_id")
return int(seq) if seq is not None else None
except Exception as e:
logger.warning(f"fetch_issue_sequence_id failed for {issue_id}: {e}")
return None
import re as _re
def _strip_html(html: str) -> str:
"""Crude HTML -> text: drop tags and collapse whitespace. Good enough to
feed QG-0's length check when Plane only gives us description_html."""
if not html:
return ""
text = _re.sub(r"<[^>]+>", " ", html)
return _re.sub(r"\s+", " ", text).strip()
def fetch_issue_description(issue_id: str, project_id: str) -> str:
"""BUG 1: GET the Plane issue by UUID and return its description text.
Plane's ``issue.updated`` webhook (e.g. a status change) only carries the
CHANGED fields, so ``description``/``description_stripped`` are usually
absent there. start_pipeline calls this to pull the full description from the
issue detail endpoint so QG-0 does not blow up on an empty payload field.
Reuses the exact GET issue detail endpoint / shared token already used by
``fetch_issue_sequence_id`` (same URL, same PLANE_HEADERS). Prefers
``description_stripped``; falls back to stripping ``description_html``.
Returns "" on network error, non-2xx, or a missing field - never raises, so
a Plane outage degrades to the honest "empty description" QG-0 path instead
of crashing the webhook.
"""
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
try:
resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp.raise_for_status()
body = resp.json()
desc = body.get("description_stripped")
if desc and desc.strip():
return desc
return _strip_html(body.get("description_html") or "")
except Exception as e:
logger.warning(f"fetch_issue_description failed for {issue_id}: {e}")
return ""
def fetch_issue_fields(issue_id: str, project_id: str) -> tuple[str, str]:
"""BUG B: GET the Plane issue by UUID ONCE and return (name, description).
Plane's ``issue.updated`` webhook (e.g. a status change) only carries the
CHANGED fields, so BOTH ``name`` and ``description`` are usually absent in
the payload. start_pipeline needs the real title (for the branch slug) and
the real description (for the analyst .task.md). To avoid issuing two
separate issue-detail GETs (one for name, one for description), this single
request returns both.
Reuses the exact GET issue detail endpoint / shared token already used by
``fetch_issue_sequence_id`` / ``fetch_issue_description``. For the
description it applies the same logic as ``fetch_issue_description``
(prefer ``description_stripped``, fall back to stripping
``description_html``).
Returns ("", "") on network error, non-2xx, or missing body - never raises,
so a Plane outage degrades gracefully (caller keeps its payload fallbacks).
"""
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
try:
resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp.raise_for_status()
body = resp.json()
name = (body.get("name") or "").strip()
desc = body.get("description_stripped")
if desc and desc.strip():
description = desc
else:
description = _strip_html(body.get("description_html") or "")
return name, description
except Exception as e:
logger.warning(f"fetch_issue_fields failed for {issue_id}: {e}")
return "", ""
def find_issue_id(work_item_id: str, project_id: str = None) -> str | None:
"""Find Plane issue UUID by work_item_id (e.g. 'ET-002')."""
project_id = _resolve_project_id(work_item_id, project_id)
# Primary: lookup from DB (plane_issue_id column)
try:
from .db import get_db
@@ -52,49 +250,51 @@ def find_issue_id(work_item_id: str) -> str | None:
logger.debug(f"DB lookup failed for {work_item_id}: {e}")
# Fallback: search via Plane API
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/"
try:
# First try search by work_item_id
resp = httpx.get(url, headers=PLANE_HEADERS, params={"search": work_item_id}, timeout=10)
resp.raise_for_status()
data = resp.json()
results = data.get("results", data if isinstance(data, list) else [])
# M-6: match by sequence_id directly (the authoritative per-project
# number), parsed from the work_item_id suffix - no hardcoded prefix.
try:
target_num = int(work_item_id.rsplit("-", 1)[1])
except (IndexError, ValueError):
target_num = None
for issue in results:
seq = issue.get("sequence_id")
identifier = f"ET-{seq:03d}" if seq else ""
if identifier == work_item_id or work_item_id in issue.get("name", ""):
if target_num is not None and issue.get("sequence_id") == target_num:
return issue["id"]
# Fallback: get all issues and match by sequence_id number
if work_item_id.startswith("ET-"):
try:
target_num = int(work_item_id.split("-")[1])
except (IndexError, ValueError):
target_num = None
if target_num:
resp2 = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp2.raise_for_status()
data2 = resp2.json()
results2 = data2.get("results", data2 if isinstance(data2, list) else [])
for issue in results2:
if issue.get("sequence_id") == target_num:
return issue["id"]
if work_item_id in issue.get("name", ""):
return issue["id"]
# Fallback: get all issues and match by sequence_id number (any prefix)
if target_num is not None:
resp2 = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp2.raise_for_status()
data2 = resp2.json()
results2 = data2.get("results", data2 if isinstance(data2, list) else [])
for issue in results2:
if issue.get("sequence_id") == target_num:
return issue["id"]
except Exception as e:
logger.error(f"Failed to find issue for {work_item_id}: {e}")
return None
def update_issue_state(work_item_id: str, stage: str):
def update_issue_state(work_item_id: str, stage: str, project_id: str = None):
"""Update Plane issue state based on orchestrator stage."""
state_id = STAGE_TO_STATE.get(stage)
if not state_id:
return
issue_id = find_issue_id(work_item_id)
project_id = _resolve_project_id(work_item_id, project_id)
issue_id = find_issue_id(work_item_id, project_id)
if not issue_id:
logger.warning(f"Issue not found in Plane for {work_item_id}")
return
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{issue_id}/"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
try:
resp = httpx.patch(url, headers=PLANE_HEADERS, json={"state": state_id}, timeout=10)
resp.raise_for_status()
@@ -103,51 +303,85 @@ def update_issue_state(work_item_id: str, stage: str):
logger.error(f"Failed to update Plane state for {work_item_id}: {e}")
def add_comment(work_item_id: str, text: str):
"""Add a comment to Plane issue."""
issue_id = find_issue_id(work_item_id)
def add_comment(work_item_id: str, text: str, project_id: str = None, author: str = None):
"""Add a comment to a Plane issue.
feat(plane): when ``author`` (an agent role) maps to a configured bot
token, the comment is POSTed under that bot so Plane shows the real author.
Otherwise it falls back to the shared orchestrator token (see
``_headers_for``). GET/PATCH calls elsewhere keep using PLANE_HEADERS.
"""
project_id = _resolve_project_id(work_item_id, project_id)
issue_id = find_issue_id(work_item_id, project_id)
if not issue_id:
logger.warning(f"Issue not found in Plane for {work_item_id}, skipping comment")
return
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{issue_id}/comments/"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/comments/"
html = f"<p>{text}</p>"
try:
resp = httpx.post(url, headers=PLANE_HEADERS, json={"comment_html": html}, timeout=10)
resp = httpx.post(url, headers=_headers_for(author), json={"comment_html": html}, timeout=10)
resp.raise_for_status()
logger.info(f"Plane: comment added to {work_item_id}")
logger.info(f"Plane: comment added to {work_item_id} (author={author or 'orchestrator'})")
except Exception as e:
logger.error(f"Failed to add comment to {work_item_id}: {e}")
def set_issue_needs_input(work_item_id: str):
def set_issue_needs_input(work_item_id: str, project_id: str = None):
"""Set issue to 'Needs Input' state — waiting for stakeholder response."""
_set_issue_state_direct(work_item_id, PLANE_STATES["needs_input"])
_set_issue_state_direct(work_item_id, PLANE_STATES["needs_input"], project_id)
def set_issue_in_review(work_item_id: str):
def set_issue_in_review(work_item_id: str, project_id: str = None):
"""Set issue to 'In Review' state — waiting for :approved: or :rejected:."""
_set_issue_state_direct(work_item_id, PLANE_STATES["in_review"])
_set_issue_state_direct(work_item_id, PLANE_STATES["in_review"], project_id)
def set_issue_blocked(work_item_id: str):
def set_issue_blocked(work_item_id: str, project_id: str = None):
"""Set issue to 'Blocked' state — manual intervention needed."""
_set_issue_state_direct(work_item_id, PLANE_STATES["blocked"])
_set_issue_state_direct(work_item_id, PLANE_STATES["blocked"], project_id)
def set_issue_in_progress(work_item_id: str):
def set_issue_done(work_item_id: str, project_id: str = None):
"""Observability fix: force the issue into the TERMINAL Done state.
Used by the deploy->done success path so a completed task always reaches the
terminal Plane state (it used to stick on In Progress because the merge
webhook bypassed the stage engine). Uses the existing PLANE_STATES['done']
UUID — the mapping itself is NOT changed.
"""
_set_issue_state_direct(work_item_id, PLANE_STATES["done"], project_id)
def set_issue_in_progress(work_item_id: str, project_id: str = None):
"""Set issue to 'In Progress' state — agent working."""
_set_issue_state_direct(work_item_id, PLANE_STATES["in_progress"])
_set_issue_state_direct(work_item_id, PLANE_STATES["in_progress"], project_id)
def _set_issue_state_direct(work_item_id: str, state_id: str):
def set_issue_stage_state(work_item_id: str, stage: str, project_id: str = None):
"""Feature 3: move the issue to the board status for a pipeline stage.
Only the visible-stage statuses (architecture/development/review/testing)
are driven here — stages without a dedicated status (analysis/deploy) are a
no-op so the existing in_progress/in_review/needs_input logic stays in
charge. By design this does NOT touch Needs Input / In Review / Blocked,
which are higher priority and set explicitly by their own helpers.
"""
state_key = STAGE_VISIBILITY_STATE.get(stage)
if not state_key:
return
_set_issue_state_direct(work_item_id, PLANE_STATES[state_key], project_id)
def _set_issue_state_direct(work_item_id: str, state_id: str, project_id: str = None):
"""Set issue state directly by state_id."""
issue_id = find_issue_id(work_item_id)
project_id = _resolve_project_id(work_item_id, project_id)
issue_id = find_issue_id(work_item_id, project_id)
if not issue_id:
logger.warning(f"Issue not found in Plane for {work_item_id}")
return
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{issue_id}/"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{project_id}/issues/{issue_id}/"
try:
resp = httpx.patch(url, headers=PLANE_HEADERS, json={"state": state_id}, timeout=10)
resp.raise_for_status()
@@ -156,11 +390,12 @@ def _set_issue_state_direct(work_item_id: str, state_id: str):
logger.error(f"Failed to update Plane state for {work_item_id}: {e}")
def notify_stage_change(work_item_id: str, old_stage: str, new_stage: str, agent: str = None):
def notify_stage_change(work_item_id: str, old_stage: str, new_stage: str, agent: str = None, project_id: str = None):
"""Notify Plane about stage transition with links."""
update_issue_state(work_item_id, new_stage)
project_id = _resolve_project_id(work_item_id, project_id)
update_issue_state(work_item_id, new_stage, project_id)
msg = f"🔄 Stage: {old_stage}{new_stage}"
msg = f"{EMOJI_STAGE} Stage: {old_stage}{new_stage}"
if agent:
msg += f" (launching {agent})"
@@ -193,15 +428,29 @@ def notify_stage_change(work_item_id: str, old_stage: str, new_stage: str, agent
except Exception:
pass
add_comment(work_item_id, msg)
# Stage transition is the orchestrator's own voice -> attribute to stream.
add_comment(work_item_id, msg, project_id, author="stream")
def notify_qg_failure(work_item_id: str, stage: str, check: str, reason: str):
def notify_qg_failure(work_item_id: str, stage: str, check: str, reason: str, project_id: str = None):
"""Notify Plane about QG failure."""
add_comment(work_item_id, f"⚠️ QG failed at {stage}: {check}{reason}")
# QG failure belongs to the agent that owns the failing stage.
add_comment(
work_item_id,
f"{EMOJI_QG_FAIL} QG failed at {stage}: {check}{reason}",
project_id,
author=STAGE_AUTHORS.get(stage, "stream"),
)
def notify_done(work_item_id: str):
def notify_done(work_item_id: str, project_id: str = None):
"""Mark issue as Done in Plane."""
update_issue_state(work_item_id, "done")
add_comment(work_item_id, "✅ Task completed! PR merged and deployed.")
project_id = _resolve_project_id(work_item_id, project_id)
update_issue_state(work_item_id, "done", project_id)
# Deploy finished the task -> attribute the completion comment to Deployer.
add_comment(
work_item_id,
f"{EMOJI_DONE} Task completed! PR merged and deployed.",
project_id,
author="deployer",
)

106
src/preflight.py Normal file
View File

@@ -0,0 +1,106 @@
"""ORCH-1 resilience: cheap preflight check (CLI / network available?).
Goal: before the worker claims a job, confirm the claude CLI binary and runtime
are reachable WITHOUT spending any tokens. We only do local/cheap checks:
1. os.path.exists(CLAUDE_BIN) -- instant
2. `claude --version` (timeout 5s) -- spawns CLI, does NOT call the API
The result is cached for `preflight_cache_ttl` seconds so we do not re-run
`claude --version` on every worker tick.
🚫 We deliberately do NOT do a prompt ping (ping->pong) — that would burn the
rate limit and add latency. Preflight is local-only.
"""
import os
import time
import logging
import subprocess
from .config import settings
logger = logging.getLogger("orchestrator.preflight")
_VERSION_TIMEOUT = 5
class _PreflightCache:
def __init__(self):
self.ts: float = 0.0
self.ok: bool = False
self.reason: str = "not checked yet"
_cache = _PreflightCache()
def _claude_bin() -> str:
"""Resolve the claude binary preflight should check.
Must match the binary the launcher actually spawns. The launcher hardcodes
AgentLauncher.CLAUDE_BIN for the real Popen, so we prefer that; we only fall
back to settings.claude_bin / a default if it is somehow unset. (Note: the
container's ORCH_CLAUDE_BIN may point elsewhere; preflight follows the path
that is genuinely executed, not the unused env override.)
"""
try:
from .agents.launcher import AgentLauncher
launcher_bin = getattr(AgentLauncher, "CLAUDE_BIN", None)
if launcher_bin and os.path.exists(launcher_bin):
return launcher_bin
# Launcher path not present -> fall back to configured/default.
return launcher_bin or getattr(settings, "claude_bin", None) or "/opt/claude-code/bin/claude.exe"
except Exception:
return getattr(settings, "claude_bin", None) or "/opt/claude-code/bin/claude.exe"
def _run_version(bin_path: str) -> tuple[bool, str]:
"""`claude --version` — proves the CLI runs without touching the API."""
try:
r = subprocess.run(
[bin_path, "--version"],
capture_output=True,
text=True,
timeout=_VERSION_TIMEOUT,
)
if r.returncode == 0:
return True, (r.stdout or r.stderr or "").strip()[:120] or "ok"
return False, f"--version exit {r.returncode}: {(r.stderr or r.stdout).strip()[:120]}"
except subprocess.TimeoutExpired:
return False, f"--version timed out after {_VERSION_TIMEOUT}s"
except FileNotFoundError:
return False, "claude binary not found (FileNotFoundError)"
except Exception as e: # pragma: no cover - defensive
return False, f"--version error: {e}"
def _compute() -> tuple[bool, str]:
bin_path = _claude_bin()
if not os.path.exists(bin_path):
return False, f"CLAUDE_BIN not found: {bin_path}"
return _run_version(bin_path)
def check(force: bool = False) -> tuple[bool, str]:
"""Return (ok, reason). Cached for preflight_cache_ttl seconds.
force=True bypasses the cache (used by the breaker half-open probe / tests).
"""
now = time.time()
ttl = settings.preflight_cache_ttl
if not force and _cache.ts > 0 and (now - _cache.ts) < ttl:
return _cache.ok, _cache.reason
ok, reason = _compute()
_cache.ts = now
_cache.ok = ok
_cache.reason = reason
if not ok:
logger.warning(f"Preflight FAIL: {reason}")
return ok, reason
def reset_cache() -> None:
"""Invalidate the cache (tests / forced recheck)."""
_cache.ts = 0.0
_cache.ok = False
_cache.reason = "reset"

127
src/projects.py Normal file
View File

@@ -0,0 +1,127 @@
"""ORCH-6: Project registry — map Plane project id -> repo / work-item prefix.
Root cause of the 2026-06-02 incident: the Plane webhook listened to the whole
workspace and hardcoded ``repo = settings.default_repo`` (enduro-trails). Every
issue from any project was funneled into one repo with one prefix (ET).
This module introduces a small registry keyed by the Plane project uuid so the
orchestrator can:
* filter webhooks by project (ignore unknown projects),
* resolve the gitea repo + work-item prefix for a known project,
* route Plane sync (state/comment) into the issue's own project.
Source of truth: ``settings.projects_json`` (a JSON array set via the
``ORCH_PROJECTS_JSON`` env var). If unset/empty/invalid, a built-in default
registry is used so the system works out of the box.
"""
import json
import logging
from dataclasses import dataclass
from .config import settings
logger = logging.getLogger("orchestrator.projects")
@dataclass(frozen=True)
class ProjectConfig:
plane_project_id: str # uuid of the Plane project (registry key)
repo: str # gitea repo name (== folder under /repos)
work_item_prefix: str # ET / ORCH
name: str # human-readable label
# Built-in default registry (used when ORCH_PROJECTS_JSON is empty/invalid).
# Keep enduro-trails first so existing behaviour is the safe default.
_DEFAULT_PROJECTS = [
ProjectConfig(
plane_project_id="7a79f0a9-5278-49cd-9007-9a338f238f9c",
repo="enduro-trails",
work_item_prefix="ET",
name="enduro-trails",
),
ProjectConfig(
plane_project_id="8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a",
repo="orchestrator",
work_item_prefix="ORCH",
name="orchestrator",
),
]
def _parse_projects_json(raw: str) -> list[ProjectConfig] | None:
"""Parse ORCH_PROJECTS_JSON. Returns None if empty/invalid (-> use default)."""
if not raw or not raw.strip():
return None
try:
data = json.loads(raw)
except (ValueError, TypeError) as e:
logger.error(f"ORCH_PROJECTS_JSON is not valid JSON, falling back to default: {e}")
return None
if not isinstance(data, list):
logger.error("ORCH_PROJECTS_JSON must be a JSON array, falling back to default")
return None
parsed: list[ProjectConfig] = []
for i, item in enumerate(data):
if not isinstance(item, dict):
logger.error(f"ORCH_PROJECTS_JSON[{i}] is not an object, skipping")
continue
try:
parsed.append(
ProjectConfig(
plane_project_id=str(item["plane_project_id"]),
repo=str(item["repo"]),
work_item_prefix=str(item["work_item_prefix"]),
name=str(item.get("name", item["repo"])),
)
)
except KeyError as e:
logger.error(f"ORCH_PROJECTS_JSON[{i}] missing required key {e}, skipping")
continue
if not parsed:
logger.error("ORCH_PROJECTS_JSON produced no valid entries, falling back to default")
return None
return parsed
def _load_projects() -> list[ProjectConfig]:
parsed = _parse_projects_json(getattr(settings, "projects_json", "") or "")
if parsed is not None:
logger.info(f"Project registry loaded from ORCH_PROJECTS_JSON: {len(parsed)} project(s)")
return parsed
return list(_DEFAULT_PROJECTS)
# Module-level registry, built once at import.
PROJECTS: list[ProjectConfig] = _load_projects()
_BY_PLANE_ID: dict[str, ProjectConfig] = {p.plane_project_id: p for p in PROJECTS}
_BY_REPO: dict[str, ProjectConfig] = {p.repo: p for p in PROJECTS}
def get_project_by_plane_id(plane_project_id: str) -> ProjectConfig | None:
"""Resolve project config by Plane project uuid. None if unknown."""
if not plane_project_id:
return None
return _BY_PLANE_ID.get(plane_project_id)
def get_project_by_repo(repo: str) -> ProjectConfig | None:
"""Resolve project config by gitea repo name. None if unknown."""
if not repo:
return None
return _BY_REPO.get(repo)
def known_plane_project_ids() -> set[str]:
"""Set of Plane project ids the orchestrator is configured to handle."""
return set(_BY_PLANE_ID.keys())
def reload_projects() -> None:
"""Rebuild the registry from current settings (used by tests)."""
global PROJECTS, _BY_PLANE_ID, _BY_REPO
PROJECTS = _load_projects()
_BY_PLANE_ID = {p.plane_project_id: p for p in PROJECTS}
_BY_REPO = {p.repo: p for p in PROJECTS}

View File

@@ -2,6 +2,7 @@
import os
import logging
import subprocess
import httpx
from ..config import settings
@@ -137,7 +138,16 @@ def check_review_approved(repo: str, pr_number: int) -> tuple[bool, str]:
def check_tests_passed(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
"""
Check if test report exists and contains PASS indicator.
Gate the testing -> deploy transition on the tester's MACHINE-READABLE verdict
in 13-test-report.md frontmatter, NOT on a naive substring search of the body.
ET-013 fix: the previous implementation did `if "PASS" in content`, so a report
explicitly marked `verdict: BLOCKED` / `status: blocked` but whose prose mentioned
"23 passed" / "✅ PASS" / "All checks passed" was treated as a pass, and an
unfinished feature reached Done. This mirrors check_reviewer_verdict (S-5) and
check_deploy_status (БАГ 8): read ONLY the YAML frontmatter `verdict:` / `status:`
fields, never the body.
File: docs/work-items/<work_item_id>/13-test-report.md
"""
repo_path = _repo_path(repo, branch)
@@ -149,12 +159,67 @@ def check_tests_passed(repo: str, work_item_id: str, branch: str | None = None)
try:
with open(report_path, "r") as f:
content = f.read()
if "PASS" in content or "All tests passed" in content:
return True, "Test report indicates PASS"
return False, "Test report exists but no PASS indicator found"
except OSError as e:
return False, f"Error reading test report: {e}"
return _parse_tests_verdict(content)
# Positive / negative verdict tokens, derived from REAL tester reports in
# enduro-trails (ET-001..ET-014). The tester is inconsistent: most write
# `verdict: PASS`, but ET-006 used `verdict: ready-to-deploy` (with `status: PASSED`),
# ET-007 `verdict: PASS — ready-to-deploy`, ET-008 `verdict: stage:ready-to-deploy`
# (with `status: pass`). ET-013 (the bug) used `verdict: BLOCKED` / `status: blocked`.
# We therefore match known positive/negative TOKENS inside the normalized
# verdict/status fields, and treat a negative token as authoritative (a BLOCKED/FAILED
# report never passes, even if another field looks positive).
_TESTS_NEGATIVE_TOKENS = ("BLOCKED", "FAILED", "FAIL", "REQUEST_CHANGES", "REJECT", "RED")
_TESTS_POSITIVE_TOKENS = ("PASSED", "PASS", "READY-TO-DEPLOY", "READY_TO_DEPLOY", "GREEN", "APPROVED")
def _parse_tests_verdict(content: str) -> tuple[bool, str]:
"""Map a 13-test-report.md body to a quality-gate verdict by reading ONLY the
machine-readable `verdict:` (and corroborating `status:`) YAML frontmatter fields.
Rules:
- No frontmatter / bad YAML / neither field present -> (False, reason).
- A negative token (BLOCKED/FAILED/...) in verdict OR status -> (False) and is
authoritative (ET-013 main case: verdict BLOCKED wins over any prose PASS).
- Otherwise a positive token (PASS/PASSED/READY-TO-DEPLOY/...) in verdict OR
status -> (True).
- Anything else (unrecognized / empty verdict) -> (False, reason).
"""
import yaml
if not content.startswith("---"):
return False, "No YAML frontmatter in test report (cannot read machine verdict)"
parts = content.split("---", 2)
if len(parts) < 3:
return False, "Malformed YAML frontmatter in test report"
try:
fm = yaml.safe_load(parts[1]) or {}
except yaml.YAMLError as e:
return False, f"Invalid YAML frontmatter in test report: {e}"
if not isinstance(fm, dict):
return False, "Malformed YAML frontmatter in test report (not a mapping)"
verdict = str(fm.get("verdict", "") or "").upper().strip()
status = str(fm.get("status", "") or "").upper().strip()
if not verdict and not status:
return False, "No machine-readable verdict/status in test report frontmatter"
fields = f"{verdict} {status}"
for neg in _TESTS_NEGATIVE_TOKENS:
if neg in fields:
return False, f"Test verdict: {verdict or status} ({neg})"
for pos in _TESTS_POSITIVE_TOKENS:
if pos in fields:
return True, f"Test verdict: {verdict or status} (PASS)"
return False, f"No recognized PASS verdict in frontmatter (verdict={verdict!r}, status={status!r})"
def check_analysis_approved(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
@@ -175,11 +240,15 @@ def check_analysis_approved(repo: str, work_item_id: str, branch: str | None = N
# Check for :approved: comment via Plane API
try:
from ..plane_sync import find_issue_id, PLANE_BASE, PLANE_HEADERS, WORKSPACE, PROJECT_ID
issue_id = find_issue_id(work_item_id)
from ..projects import get_project_by_repo
# ORCH-6: verify approval in the issue's own Plane project.
_proj = get_project_by_repo(repo)
_pid = _proj.plane_project_id if _proj else PROJECT_ID
issue_id = find_issue_id(work_item_id, _pid)
if not issue_id:
return False, "Cannot find Plane issue to verify approval"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{issue_id}/comments/"
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{_pid}/issues/{issue_id}/comments/"
resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
resp.raise_for_status()
comments = resp.json()
@@ -245,9 +314,17 @@ def check_reviewer_verdict(repo: str, work_item_id: str, branch: str | None = No
def check_tests_local(repo: str, branch: str) -> tuple[bool, str]:
"""
DEPRECATED: replaced by check_ci_green on the development stage (CI is now
configured). Kept for backward-compat; not wired to any stage.
S-1 fix: run the project test suite locally and judge by exit code, instead of
depending on Gitea CI (which is not configured -> always false).
БАГ 5 fix: invoke pytest directly instead of make test. make is not installed
in the orchestrator container, so the previous ["make", "test"] call raised
FileNotFoundError. This reproduces the Makefile test target 1:1
(cd src/api && python -m pytest ../../tests/ -v).
ORCH-2 / S-4: tests run inside the per-branch worktree (ensure_worktree), so this
is safe for concurrent active tasks — no shared /repos checkout race.
"""
@@ -255,7 +332,8 @@ def check_tests_local(repo: str, branch: str) -> tuple[bool, str]:
try:
repo_path = ensure_worktree(repo, branch)
r = subprocess.run(
["make", "test"], cwd=repo_path,
["python", "-m", "pytest", "../../tests/", "-v"],
cwd=os.path.join(repo_path, "src", "api"),
capture_output=True, text=True, timeout=600,
)
if r.returncode == 0:
@@ -268,6 +346,100 @@ def check_tests_local(repo: str, branch: str) -> tuple[bool, str]:
return False, f"Local test run error: {e}"
def _parse_deploy_status(content: str) -> tuple[bool, str]:
"""Parse a 14-deploy-log.md body and map its `deploy_status:` frontmatter to a
quality-gate verdict. Reads ONLY the machine-readable YAML field, never prose.
deploy_status: SUCCESS -> (True, "Deploy status: SUCCESS")
deploy_status: FAILED -> (False, "Deploy status: FAILED")
missing field / no frontmatter / bad YAML -> (False, <reason>)
"""
import yaml
status = None
if content.startswith("---"):
parts = content.split("---", 2)
if len(parts) >= 3:
try:
fm = yaml.safe_load(parts[1]) or {}
except yaml.YAMLError as e:
return False, f"Invalid YAML frontmatter in deploy log: {e}"
status = str(fm.get("deploy_status", "")).upper().strip()
if status == "SUCCESS":
return True, "Deploy status: SUCCESS"
if status == "FAILED":
return False, "Deploy status: FAILED"
return False, f"No machine-readable deploy_status in frontmatter (got: {status!r})"
def _deploy_log_from_main(repo: str, work_item_id: str) -> str | None:
"""Best-effort read of 14-deploy-log.md from origin/main on the shared clone.
The deployer writes 14-deploy-log.md and merges the deploy artifacts into main
via a separate PR (see ET-013), so the file lands in origin/main, NOT in the
feature branch worktree the gate normally reads. This recovers it from main.
Degrades gracefully: any git failure (no clone, network/fetch error, file
absent in main) returns None instead of raising, so the caller falls back to
the plain "not found" verdict. Never raises.
"""
repo_clone = os.path.join(settings.repos_dir, repo)
if not os.path.isdir(os.path.join(repo_clone, ".git")):
return None
rel = f"docs/work-items/{work_item_id}/14-deploy-log.md"
try:
# Refresh origin/main so we see freshly-merged deploy artifacts.
subprocess.run(
["git", "-C", repo_clone, "fetch", "origin", "main"],
check=False, capture_output=True, timeout=30,
)
show = subprocess.run(
["git", "-C", repo_clone, "show", f"origin/main:{rel}"],
check=False, capture_output=True, text=True, timeout=15,
)
except (subprocess.SubprocessError, OSError) as e:
logger.warning("deploy-log origin/main lookup failed for %s/%s: %s", repo, work_item_id, e)
return None
if show.returncode != 0:
return None
return show.stdout
def check_deploy_status(repo: str, work_item_id: str, branch: str | None = None) -> tuple[bool, str]:
"""
БАГ 8 fix: gate the deploy -> done transition on the deployer's machine-readable
verdict in 14-deploy-log.md frontmatter, NOT on the LLM process exit code
(which is always 0 on a successful agent session even when the deploy failed).
Mirrors check_reviewer_verdict (S-5): reads ONLY `deploy_status:` from YAML
frontmatter. Returns:
(True, ...) -> deploy_status: SUCCESS
(False, ...) -> deploy_status: FAILED, missing field, or no frontmatter
ET-013 path-sync fix: the deployer writes 14-deploy-log.md and merges the deploy
artifacts into main via a SEPARATE PR, so the log lands in origin/main, not in
the feature-branch worktree this gate reads via _repo_path(repo, branch). If the
file is absent in the worktree we fall back to reading it from origin/main on the
shared clone. Lookup order: worktree -> origin/main -> not found.
"""
repo_path = _repo_path(repo, branch)
log_path = os.path.join(repo_path, f"docs/work-items/{work_item_id}/14-deploy-log.md")
if os.path.isfile(log_path):
try:
with open(log_path, "r") as f:
content = f.read()
except OSError as e:
return False, f"Error reading deploy log: {e}"
return _parse_deploy_status(content)
# Not in the feature worktree — the deployer may have merged it into main.
main_content = _deploy_log_from_main(repo, work_item_id)
if main_content is not None:
return _parse_deploy_status(main_content)
return False, "Deploy log not found (14-deploy-log.md)"
# Registry for dynamic lookup by name
QG_CHECKS = {
"check_analysis_approved": check_analysis_approved,
@@ -278,4 +450,5 @@ QG_CHECKS = {
"check_tests_passed": check_tests_passed,
"check_reviewer_verdict": check_reviewer_verdict,
"check_tests_local": check_tests_local,
"check_deploy_status": check_deploy_status,
}

246
src/queue_worker.py Normal file
View File

@@ -0,0 +1,246 @@
"""ORCH-1 (F-2b): background job-queue worker with resilience layer.
A single background thread polls the `jobs` table and spawns agents:
while running:
if breaker.open and not cooled_down: sleep; continue # don't touch CLI
if not preflight.ok: sleep; continue # CLI/net down -> wait
while count_running_jobs() < max_concurrency:
job = claim_next_job() # atomic queued -> running (available_at-gated)
if not job: break
launcher.launch_job(job) # spawns claude (Popen) + monitor thread
sleep(poll_interval)
Resilience (ДОПОЛНЕНИЕ):
A. Preflight — cheap local CLI/net check (cached, no tokens) gates claiming.
B/C. The launcher classifies failures (transient vs permanent) and applies
backoff via available_at; the worker only needs to honour available_at
(claim_next_job does) and react to transient outcomes via the breaker.
D. Circuit breaker — N consecutive transient failures -> open (pause M minutes,
no CLI calls, Telegram alert) -> half-open (probe one job) -> closed.
Design: plain daemon thread + threading.Event (the launcher already manages its
own monitor/watchdog threads + blocking Popen).
"""
import time
import logging
import threading
from .config import settings
from .db import claim_next_job, count_running_jobs
from .agents.launcher import launcher
from . import preflight
logger = logging.getLogger("orchestrator.queue_worker")
class CircuitBreaker:
"""Trips after `threshold` consecutive transient failures.
States: closed -> (threshold transient) -> open -> (after pause) half-open
-> (recovered) closed | (transient again) open.
Thread-safe enough for our single-worker + monitor-thread callbacks (a lock
guards the counters).
"""
def __init__(self, threshold: int = None, pause_seconds: int = None):
self.threshold = threshold if threshold is not None else settings.breaker_threshold
self.pause_seconds = (
pause_seconds if pause_seconds is not None else settings.breaker_pause_seconds
)
self._lock = threading.Lock()
self.state = "closed" # closed | open | half-open
self.consecutive_transient = 0
self.opened_at = 0.0
self._notify = None # optional callable(message) for alerts
def set_notifier(self, fn):
self._notify = fn
def record_transient(self):
with self._lock:
self.consecutive_transient += 1
if self.state == "half-open":
# Probe failed -> re-open.
self._open("circuit re-opened: probe job hit transient again")
elif self.consecutive_transient >= self.threshold and self.state == "closed":
self._open(
f"circuit OPEN: {self.consecutive_transient} consecutive "
f"transient failures; pausing {self.pause_seconds}s (no CLI calls)"
)
def record_recovered(self):
with self._lock:
self.consecutive_transient = 0
if self.state in ("half-open", "open"):
self.state = "closed"
logger.info("Circuit CLOSED: recovered")
def record_permanent(self):
# A clean permanent (code-fault) failure breaks the transient streak.
with self._lock:
self.consecutive_transient = 0
def _open(self, msg: str):
self.state = "open"
self.opened_at = time.time()
logger.warning(msg)
if self._notify:
try:
self._notify(f"\U0001f534 {msg}")
except Exception:
pass
def allow_claim(self) -> bool:
"""Return True if the worker may attempt to claim/launch a job now.
- closed -> yes.
- open -> no until pause elapsed; then transition to half-open (yes, one probe).
- half-open -> yes (the single probe).
"""
with self._lock:
if self.state == "closed":
return True
if self.state == "open":
if (time.time() - self.opened_at) >= self.pause_seconds:
self.state = "half-open"
logger.info("Circuit HALF-OPEN: probing one job")
return True
return False
# half-open: allow the probe.
return True
def snapshot(self) -> dict:
with self._lock:
remaining = 0
if self.state == "open":
remaining = max(0, int(self.pause_seconds - (time.time() - self.opened_at)))
return {
"state": self.state,
"consecutive_transient": self.consecutive_transient,
"pause_remaining_s": remaining,
}
class QueueWorker:
"""Background worker that drains the persistent job queue (with resilience)."""
def __init__(self, max_concurrency: int = None, poll_interval: float = None,
breaker: CircuitBreaker = None):
self.max_concurrency = (
max_concurrency if max_concurrency is not None else settings.max_concurrency
)
self.poll_interval = (
poll_interval if poll_interval is not None else settings.queue_poll_interval
)
self.breaker = breaker or CircuitBreaker()
self.last_preflight_ok = True
self.last_preflight_reason = "not checked"
self._stop = threading.Event()
self._thread: threading.Thread | None = None
# --- circuit breaker outcome callback wired into the launcher ----------
def _on_outcome(self, transient: bool, recovered: bool):
if recovered:
self.breaker.record_recovered()
elif transient:
self.breaker.record_transient()
else:
self.breaker.record_permanent()
def _drain_once(self):
"""Claim and launch jobs until concurrency is full or the queue is empty.
Gated by the circuit breaker and preflight: if the breaker is open (and
not yet cooled down) or preflight fails, we do NOT claim — jobs stay
queued and no CLI/tokens are touched.
"""
if not self.breaker.allow_claim():
return
ok, reason = preflight.check()
self.last_preflight_ok = ok
self.last_preflight_reason = reason
if not ok:
logger.info(f"Preflight not ok ({reason}) -> not claiming jobs this tick")
return
# In half-open we only probe a single job, regardless of max_concurrency.
half_open = self.breaker.snapshot()["state"] == "half-open"
launched = 0
while not self._stop.is_set():
if half_open and launched >= 1:
return
if count_running_jobs() >= self.max_concurrency:
return
job = claim_next_job()
if not job:
return
launched += 1
try:
run_id = launcher.launch_job(job)
logger.info(
f"Worker launched job {job['id']} ({job['agent']}, "
f"repo {job['repo']}) -> run_id={run_id}"
)
except Exception as e:
# Launch itself failed (e.g. repo missing): treat as a permanent
# launch error so the job does not wedge as 'running' forever.
logger.error(f"Worker failed to launch job {job['id']}: {e}")
try:
from .db import get_job, mark_job
j = get_job(job["id"])
attempts = j.get("attempts", 0) if j else 0
max_attempts = j.get("max_attempts", 2) if j else 2
if attempts < max_attempts:
mark_job(job["id"], "queued", error=f"launch error: {e}")
else:
mark_job(job["id"], "failed", error=f"launch error: {e}")
except Exception:
pass
def _run(self):
logger.info(
f"Queue worker started (max_concurrency={self.max_concurrency}, "
f"poll_interval={self.poll_interval}s, breaker_threshold={self.breaker.threshold})"
)
while not self._stop.is_set():
try:
self._drain_once()
except Exception as e:
logger.error(f"Queue worker loop error: {e}")
self._stop.wait(self.poll_interval)
logger.info("Queue worker stopped")
def start(self):
if self._thread and self._thread.is_alive():
return
# Wire breaker alerting + launcher outcome callback.
try:
from .notifications import send_telegram
self.breaker.set_notifier(send_telegram)
except Exception:
pass
launcher.on_outcome = self._on_outcome
self._stop.clear()
self._thread = threading.Thread(
target=self._run, name="queue-worker", daemon=True
)
self._thread.start()
def stop(self, timeout: float = 5.0):
self._stop.set()
if self._thread:
self._thread.join(timeout=timeout)
def status(self) -> dict:
"""Resilience snapshot for /queue."""
return {
"breaker": self.breaker.snapshot(),
"preflight_ok": self.last_preflight_ok,
"preflight_reason": self.last_preflight_reason,
}
# Module-level singleton used by the FastAPI lifespan.
worker = QueueWorker()

546
src/stage_engine.py Normal file
View File

@@ -0,0 +1,546 @@
"""Unified stage engine (ORCH-4 / M-3).
Single source of truth for "an agent finished / a human approved -> run the
stage's quality gate and either advance the pipeline or roll it back".
Before ORCH-4 this logic was duplicated in two places that had silently
diverged:
- src/agents/launcher.py::_try_advance_stage (sync, rich business logic:
analyst approved-flow, reviewer REQUEST_CHANGES rollback+retry, tester FAIL
rollback+retry, architect conflict rollback) — but it picked the next agent
with get_agent_for_stage(next_stage), which is WRONG.
- src/webhooks/plane.py::_try_advance_stage (async, leaner, but it had the
check_review_approved PR-by-branch dispatch and used the CORRECT
get_agent_for_stage(current_stage)).
This module merges both into one sync `advance_stage(...)`. launcher calls it
directly; the plane webhook calls it through asyncio.to_thread so there is
exactly one implementation.
Agent-selection bug fix (ORCH-4):
stages.py defines `agent` as "the agent to launch when advancing FROM this
stage". So when advancing current -> next, the correct agent to launch is
get_agent_for_stage(current_stage). launcher's old next_stage lookup skipped a
stage (e.g. analysis->architecture launched 'developer' instead of
'architect'). plane and gitea already used current_stage; we unify on that.
"""
import logging
import os
from dataclasses import dataclass, field
from .db import get_db, update_task_stage, enqueue_job
from .stages import get_next_stage, get_qg_for_stage, get_agent_for_stage
from .git_worktree import get_worktree_path
from .qg.checks import QG_CHECKS
from .notifications import (
notify_stage_change,
notify_qg_failure,
notify_approve_requested,
send_telegram,
)
from .plane_sync import (
notify_stage_change as plane_notify_stage,
notify_qg_failure as plane_notify_qg,
add_comment as plane_add_comment,
set_issue_in_review,
set_issue_needs_input,
set_issue_in_progress,
set_issue_blocked,
set_issue_done,
)
from .config import settings
logger = logging.getLogger("orchestrator.stage_engine")
MAX_DEVELOPER_RETRIES = 3
@dataclass
class AdvanceResult:
"""Outcome of an advance_stage() call (mostly for tests/observability)."""
advanced: bool = False
from_stage: str | None = None
to_stage: str | None = None
enqueued_agent: str | None = None
enqueued_job_id: int | None = None
qg_name: str | None = None
qg_passed: bool | None = None
qg_reason: str | None = None
rolled_back_to: str | None = None
alerted: bool = False
note: str | None = None
notes: list = field(default_factory=list)
def _run_qg(qg_name: str, repo: str, work_item_id: str, branch: str):
"""Dispatch a quality-gate check to the right signature and run it.
Signatures (unified from launcher + plane):
- check_ci_green / check_tests_local -> (repo, branch)
- check_review_approved -> (repo, pr_number) [PR found by branch]
- everything else (artifact checks) -> (repo, work_item_id, branch)
Returns (passed: bool, reason: str).
"""
check_fn = QG_CHECKS.get(qg_name)
if not check_fn:
logger.error(f"QG function '{qg_name}' not found in registry")
return False, f"Unknown QG: {qg_name}"
if qg_name in ("check_ci_green", "check_tests_local"):
# (repo, branch) — already worktree-aware.
return check_fn(repo, branch)
if qg_name == "check_review_approved":
# Special case kept from plane: find the open PR for this branch via
# Gitea, then check it; fall back to a file-based review marker.
return _check_review_approved_by_branch(check_fn, repo, work_item_id, branch)
# All other artifact checks: (repo, work_item_id, branch). Pass branch so the
# check reads from the task worktree (ORCH-2 / S-4).
return check_fn(repo, work_item_id or "", branch)
def _check_review_approved_by_branch(check_fn, repo: str, work_item_id: str, branch: str):
"""check_review_approved dispatch preserved from plane._try_advance_stage.
Finds the open PR whose head ref == branch via the Gitea API and runs
check_review_approved(repo, pr_number). If no open PR exists, falls back to a
file-based review marker (12-review.md / 09-review.md) like the original.
"""
import httpx as _httpx
owner = settings.gitea_owner
url = f"{settings.gitea_url}/api/v1/repos/{owner}/{repo}/pulls?state=open&limit=50"
headers = {"Authorization": f"token {settings.gitea_token}"}
try:
resp = _httpx.get(url, headers=headers, timeout=10)
prs = resp.json()
pr_number = None
for pr in prs:
if pr.get("head", {}).get("ref") == branch:
pr_number = pr["number"]
break
if pr_number:
return check_fn(repo, pr_number)
# No open PR but a review file may exist — check file-based.
wt = get_worktree_path(repo, branch)
if not os.path.isdir(wt):
wt = os.path.join(settings.repos_dir, repo)
review_path = os.path.join(wt, f"docs/work-items/{work_item_id}/12-review.md")
review_path2 = os.path.join(wt, f"docs/work-items/{work_item_id}/09-review.md")
if os.path.isfile(review_path) or os.path.isfile(review_path2):
return True, "Review file exists (file-based approval)"
return False, "No open PR found and no review file"
except Exception as e:
return False, f"Error finding PR: {e}"
def _developer_retry_count(task_id: int) -> int:
"""How many developer runs have already happened for this task."""
conn = get_db()
n = conn.execute(
"SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='developer'",
(task_id,),
).fetchone()[0]
conn.close()
return n
def advance_stage(
task_id: int,
current_stage: str,
repo: str,
work_item_id: str,
branch: str,
finished_agent: str | None = None,
) -> AdvanceResult:
"""Run the current stage's quality gate and advance / roll back the pipeline.
This is the single merged implementation (ORCH-4 / M-3). It is synchronous;
the async plane webhook calls it via asyncio.to_thread.
Args:
task_id: tasks.id
current_stage: the stage the task is currently in
repo: repository name
work_item_id: Plane work item id (may be "" / None)
branch: feature branch
finished_agent: the agent that just finished (launcher path). Drives the
approved/REQUEST_CHANGES/tester/architect branches. In the
plane webhook path it is None, so those agent-specific
branches simply do not trigger (matches old plane behavior).
Returns AdvanceResult describing what happened.
"""
result = AdvanceResult(from_stage=current_stage)
agent = finished_agent
try:
qg_name = get_qg_for_stage(current_stage)
next_stage = get_next_stage(current_stage)
result.qg_name = qg_name
result.to_stage = next_stage
if not next_stage:
logger.info(f"Task {task_id}: already at terminal stage '{current_stage}'")
result.note = "terminal"
return result
# --- Quality gate ----------------------------------------------------
if qg_name and qg_name in QG_CHECKS:
# Human-approval gate: split by path.
if qg_name == "check_analysis_approved":
# Launcher path (analyst just finished): set In Review + ask for
# the Approved status. This gate never advances on its own -- a
# human Approved verdict does that.
if agent == "analyst":
_handle_analysis_approved_flow(
task_id, current_stage, repo, work_item_id, branch, agent, result
)
return result
# Webhook Approved-verdict path (agent is None): the human flipped
# the Plane status to Approved, which IS the approval. The gate is
# satisfied -- do NOT re-run check_analysis_approved (it looks for
# an :approved: *comment* and would block on a status-only
# approval). Mark it passed and fall through to the Advance block.
result.qg_name = qg_name
result.qg_passed = True
result.qg_reason = "approved-via-status"
else:
passed, reason = _run_qg(qg_name, repo, work_item_id, branch)
result.qg_passed = passed
result.qg_reason = reason
if not passed:
logger.info(
f"Task {task_id}: QG '{qg_name}' not passed after {agent}: {reason}"
)
# Behaviour parity:
# - webhook path (finished_agent is None): emit the generic
# QG-failure notification, exactly like the old plane handler.
# - launcher path (finished_agent set): NO generic notification;
# the rollback branches below own their own messaging, exactly
# like the old launcher handler.
if agent is None:
notify_qg_failure(task_id, current_stage, qg_name, reason)
plane_notify_qg(work_item_id, current_stage, qg_name, reason)
_handle_qg_failure_rollbacks(
task_id, current_stage, repo, work_item_id, branch,
agent, qg_name, reason, result,
)
return result
elif qg_name:
# QG name set but not registered — do not advance (launcher behavior).
result.note = f"qg '{qg_name}' not in registry"
return result
# --- Advance ---------------------------------------------------------
update_task_stage(task_id, next_stage)
# Telegram live tracker: the analysis->architecture advance is the human
# Approved gate clearing -> stamp the END of "Ревью БРД" (the only
# human time). Idempotent: only the first stamp counts.
if current_stage == "analysis" and next_stage == "architecture":
try:
from .db import mark_brd_review_ended
mark_brd_review_ended(task_id)
except Exception as e:
logger.warning(f"Task {task_id}: brd review end stamp failed: {e}")
notify_stage_change(task_id, current_stage, next_stage)
plane_notify_stage(work_item_id, current_stage, next_stage)
result.advanced = True
logger.info(
f"Task {task_id}: {current_stage} -> {next_stage} "
f"(auto-advance after {agent})"
)
# --- Terminal sync: deploy -> done must reach Plane's Done -----------
# When the deployer's check_deploy_status passes we advance to the
# terminal 'done' stage. Previously a merged-PR webhook completed the
# task out-of-band and Plane stuck on In Progress. Now done flows through
# here, so explicitly drive the Plane issue into the terminal Done state
# (PLANE_STATES['done'] — mapping unchanged) in addition to the
# stage-change comment above.
if next_stage == "done" and work_item_id:
try:
set_issue_done(work_item_id)
logger.info(
f"Task {task_id}: deploy->done, Plane state forced to Done"
)
except Exception as e:
logger.error(f"Task {task_id}: failed to set Plane Done: {e}")
# --- Launch the next agent (ORCH-4 fix: current_stage, not next) -----
next_agent = get_agent_for_stage(current_stage)
if next_agent:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\n"
f"Branch: {branch}\nStage: {next_stage}"
)
new_job_id = enqueue_job(next_agent, repo, task_desc, task_id=task_id)
result.enqueued_agent = next_agent
result.enqueued_job_id = new_job_id
logger.info(
f"Task {task_id}: enqueued '{next_agent}' (job_id={new_job_id})"
)
return result
except Exception as e:
logger.error(f"advance_stage failed for task_id={task_id}: {e}")
result.note = f"error: {e}"
return result
def _build_analyst_ready_comment(repo: str, work_item_id: str, branch: str) -> str:
"""BUG C: HTML comment posted when analyst artifacts are ready.
Status-only model (PR #12): approval is the **Approved** status, NOT a
``:approved:`` comment and NOT moving back to In Progress. The comment asks
the stakeholder to flip the status and links the documents the analyst
actually produced.
Links point at the Gitea web view:
{gitea_url}/{owner}/{repo}/src/branch/{branch}/docs/work-items/{wid}/<file>
Only files that REALLY exist in the worktree are listed (no invented docs).
"""
text = (
"\u2705 BRD/\u0422\u0417/AC \u0433\u043e\u0442\u043e\u0432\u044b. "
"\u0414\u043b\u044f \u043f\u0440\u043e\u0434\u0432\u0438\u0436\u0435\u043d\u0438\u044f "
"\u043f\u0435\u0440\u0435\u0432\u0435\u0434\u0438\u0442\u0435 \u0437\u0430\u0434\u0430\u0447\u0443 "
"\u0432 \u0441\u0442\u0430\u0442\u0443\u0441 Approved. "
"\u0414\u043b\u044f \u043e\u0442\u043a\u043b\u043e\u043d\u0435\u043d\u0438\u044f \u2014 "
"\u043d\u0430\u043f\u0438\u0448\u0438\u0442\u0435 \u043f\u0440\u0438\u0447\u0438\u043d\u0443 "
"\u043a\u043e\u043c\u043c\u0435\u043d\u0442\u043e\u043c \u0438 \u043f\u0435\u0440\u0435\u0432\u0435\u0434\u0438\u0442\u0435 "
"\u0432 Rejected."
)
# Candidate analyst artifacts (label -> filename). Only existing ones linked.
candidates = [
("Business request", "00-business-request.md"),
("BRD", "01-brd.md"),
("\u0422\u0417 (TRZ)", "02-trz.md"),
("Acceptance Criteria", "03-acceptance-criteria.md"),
("Test Plan", "04-test-plan.yaml"),
("UI Test Cases", "04b-ui-test-cases.md"),
]
rel_dir = f"docs/work-items/{work_item_id}"
try:
wt_dir = os.path.join(get_worktree_path(repo, branch), rel_dir)
except Exception:
wt_dir = None
owner = getattr(settings, "gitea_owner", "admin")
base = (getattr(settings, "gitea_public_url", "") or settings.gitea_url).rstrip("/")
links = []
for label, fname in candidates:
if wt_dir and not os.path.isfile(os.path.join(wt_dir, fname)):
continue
href = f"{base}/{owner}/{repo}/src/branch/{branch}/{rel_dir}/{fname}"
links.append(f'<li><a href="{href}">{label}</a></li>')
if links:
text += "<br><b>\u0414\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u044b:</b><ul>" + "".join(links) + "</ul>"
return text
def _handle_analysis_approved_flow(
task_id, current_stage, repo, work_item_id, branch, agent, result: AdvanceResult
):
"""Analyst approved-flow (launcher only).
Only triggers when the analyst just finished (agent == 'analyst') in the
launcher path. Decides between: artifacts ready -> In Review + request
:approved:; questions file -> Needs Input; otherwise a warning comment.
This gate never advances on its own (human approval does that via the plane
webhook), matching the original launcher behavior.
"""
result.qg_name = "check_analysis_approved"
result.note = "analysis-approval-gate"
if not (agent == "analyst" and work_item_id):
return
files_check = QG_CHECKS.get("check_analysis_complete")
if not files_check:
return
files_ok, _ = files_check(repo, work_item_id, branch)
if files_ok:
# Full artifacts ready -> In Review, ask for the Approved STATUS (BUG C).
set_issue_in_review(work_item_id)
plane_add_comment(
work_item_id,
_build_analyst_ready_comment(repo, work_item_id, branch),
author="analyst",
)
notify_approve_requested(task_id)
result.note = "analysis-in-review"
logger.info(
f"Task {task_id}: analyst finished, requested Approved status in Plane"
)
return
questions_path = os.path.join(
get_worktree_path(repo, branch),
f"docs/work-items/{work_item_id}/01-questions.md",
)
if os.path.isfile(questions_path):
set_issue_needs_input(work_item_id)
with open(questions_path, "r") as qf:
questions_text = qf.read()
plane_add_comment(
work_item_id,
f"\u2753 Analyst \u043d\u0443\u0436\u0434\u0430\u0435\u0442\u0441\u044f \u0432 \u0443\u0442\u043e\u0447\u043d\u0435\u043d\u0438\u0438:\n\n{questions_text}",
author="analyst",
)
send_telegram(
f"\u2753 {work_item_id}: Analyst \u0437\u0430\u0434\u0430\u0451\u0442 \u0432\u043e\u043f\u0440\u043e\u0441\u044b. \u041e\u0442\u0432\u0435\u0442\u044c \u0432 Plane."
)
result.note = "analysis-needs-input"
return
# No artifacts and no questions.
plane_add_comment(
work_item_id,
"\u26a0\ufe0f Analyst \u0437\u0430\u0432\u0435\u0440\u0448\u0438\u043b\u0441\u044f \u0431\u0435\u0437 \u0430\u0440\u0442\u0435\u0444\u0430\u043a\u0442\u043e\u0432 \u0438 \u0431\u0435\u0437 \u0432\u043e\u043f\u0440\u043e\u0441\u043e\u0432. \u041f\u0440\u043e\u0432\u0435\u0440\u044c\u0442\u0435 \u043b\u043e\u0433.",
author="analyst",
)
result.note = "analysis-empty"
def _handle_qg_failure_rollbacks(
task_id, current_stage, repo, work_item_id, branch,
agent, qg_name, reason, result: AdvanceResult,
):
"""All rollback/retry branches from the original launcher, preserved verbatim.
Only fire on the launcher path (finished_agent is set). The webhook path
passes finished_agent=None, so none of these agent-specific branches trigger
— that matches the old plane behavior (it just reported the QG failure).
"""
# Reviewer REQUEST_CHANGES -> rollback to development + retry (max 3).
if agent == "reviewer" and "REQUEST_CHANGES" in (reason or ""):
update_task_stage(task_id, "development")
notify_stage_change(task_id, current_stage, "development")
plane_notify_stage(work_item_id, current_stage, "development")
result.rolled_back_to = "development"
retry_count = _developer_retry_count(task_id)
if retry_count < MAX_DEVELOPER_RETRIES:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: development\nNote: REQUEST_CHANGES from reviewer "
f"(attempt {retry_count+1}/3). Fix findings in "
f"docs/work-items/{work_item_id}/12-review.md"
)
new_job = enqueue_job("developer", repo, task_desc, task_id=task_id)
result.enqueued_agent = "developer"
result.enqueued_job_id = new_job
logger.info(
f"Task {task_id}: reviewer REQUEST_CHANGES, enqueued developer "
f"(job_id={new_job})"
)
else:
send_telegram(
f"\u26a0\ufe0f {work_item_id}: Max developer retries (3) reached. "
f"Manual intervention needed."
)
result.alerted = True
logger.error(f"Task {task_id}: max retries reached")
# Tester check_tests_passed FAIL -> rollback to development + retry (max 3).
if agent == "tester" and qg_name == "check_tests_passed":
update_task_stage(task_id, "development")
notify_stage_change(task_id, current_stage, "development")
plane_notify_stage(work_item_id, current_stage, "development")
result.rolled_back_to = "development"
set_issue_in_progress(work_item_id)
plane_add_comment(
work_item_id,
f"\u274c \u0422\u0435\u0441\u0442\u044b \u043d\u0435 \u043f\u0440\u043e\u0448\u043b\u0438: {reason}. "
f"Developer \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430.",
author="tester",
)
retry_count = _developer_retry_count(task_id)
if retry_count < MAX_DEVELOPER_RETRIES:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: development\nNote: Tests FAILED. "
f"Fix failures described in docs/work-items/{work_item_id}/13-test-report.md"
)
new_job = enqueue_job("developer", repo, task_desc, task_id=task_id)
result.enqueued_agent = "developer"
result.enqueued_job_id = new_job
logger.info(
f"Task {task_id}: tester FAIL, enqueued developer (job_id={new_job})"
)
else:
set_issue_blocked(work_item_id)
send_telegram(
f"\U0001f6a8 {work_item_id}: Tests still failing after 3 developer "
f"retries. Manual intervention needed."
)
result.alerted = True
# Architect conflict (10-conflict.md exists) -> rollback to analysis.
if agent == "architect" and qg_name == "check_architecture_done":
conflict_path = os.path.join(
get_worktree_path(repo, branch),
f"docs/work-items/{work_item_id}/10-conflict.md",
)
if os.path.isfile(conflict_path):
update_task_stage(task_id, "analysis")
notify_stage_change(task_id, current_stage, "analysis")
plane_notify_stage(work_item_id, current_stage, "analysis")
result.rolled_back_to = "analysis"
set_issue_in_progress(work_item_id)
with open(conflict_path, "r") as cf:
conflict_text = cf.read()[:500]
plane_add_comment(
work_item_id,
f"\u26a0\ufe0f Architect \u043d\u0430\u0448\u0451\u043b \u043a\u043e\u043d\u0444\u043b\u0438\u043a\u0442 \u0441 \u0422\u0417. "
f"\u0412\u043e\u0437\u0432\u0440\u0430\u0442 \u0432 Analysis.\n\n{conflict_text}",
author="architect",
)
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nNote: Architect conflict. Revise TRZ. "
f"See docs/work-items/{work_item_id}/10-conflict.md"
)
new_job = enqueue_job("analyst", repo, task_desc, task_id=task_id)
result.enqueued_agent = "analyst"
result.enqueued_job_id = new_job
logger.info(
f"Task {task_id}: architect conflict, enqueued analyst "
f"(job_id={new_job})"
)
# БАГ 8: deployer verdict FAILED -> roll deploy back to development.
# The launcher's exit_code-based guard (launcher.py:475) never fires because
# the LLM process exit code is always 0; this gate fires on the machine-readable
# deploy_status verdict in 14-deploy-log.md instead. Mirrors the launcher block
# (rollback + set_issue_blocked + notify) but is driven by the VERDICT.
if agent == "deployer" and qg_name == "check_deploy_status":
update_task_stage(task_id, "development")
notify_stage_change(task_id, current_stage, "development")
plane_notify_stage(work_item_id, current_stage, "development")
result.rolled_back_to = "development"
set_issue_blocked(work_item_id)
notify_qg_failure(task_id, "deploy", "check_deploy_status", reason)
plane_add_comment(
work_item_id,
f"\u274c Deploy FAILED ({reason}). Rolled back to development. "
f"Developer \u043d\u0443\u0436\u0435\u043d \u0434\u043b\u044f \u0444\u0438\u043a\u0441\u0430.",
author="deployer",
)
send_telegram(
f"\U0001f6a8 {work_item_id}: Deploy FAILED ({reason}). "
f"Rolled back to development. Needs fix."
)
result.alerted = True
logger.error(
f"Task {task_id}: deployer verdict FAILED, rolled back deploy -> "
f"development ({reason})"
)

View File

@@ -5,7 +5,7 @@ Stages:
Each stage defines:
- next: the stage to advance to
- agent: the agent to launch when entering the NEXT stage
- agent: the agent to launch when advancing FROM this stage (NOT the next stage's agent)
- qg: the quality gate check required to leave this stage
"""
@@ -13,10 +13,10 @@ STAGE_TRANSITIONS = {
"created": {"next": "analysis", "agent": "analyst", "qg": None},
"analysis": {"next": "architecture", "agent": "architect", "qg": "check_analysis_approved"},
"architecture": {"next": "development", "agent": "developer", "qg": "check_architecture_done"},
"development": {"next": "review", "agent": "reviewer", "qg": "check_tests_local"},
"development": {"next": "review", "agent": "reviewer", "qg": "check_ci_green"},
"review": {"next": "testing", "agent": "tester", "qg": "check_reviewer_verdict"},
"testing": {"next": "deploy", "agent": "deployer", "qg": "check_tests_passed"},
"deploy": {"next": "done", "agent": None, "qg": None},
"deploy": {"next": "done", "agent": None, "qg": "check_deploy_status"},
"done": {"next": None, "agent": None, "qg": None},
}

464
src/usage.py Normal file
View File

@@ -0,0 +1,464 @@
"""Feature 4: token / cost accounting for agent runs.
claude --output-format json emits a single result JSON object at the end of the
run log with fields:
total_cost_usd
usage.input_tokens / output_tokens / cache_read_input_tokens /
cache_creation_input_tokens
modelUsage, num_turns, duration_ms
This module parses that JSON out of a (text-or-json) run log, records the usage
on the agent_runs row, formats a Plane comment for the finishing agent, and
builds the per-task summary the Deployer posts on deploy/done.
Everything here is defensive: a missing/garbled JSON never raises \u2014 we record
NULL/0 and log a warning so a broken agent run can't crash the monitor.
"""
import json
import logging
from .db import get_db
logger = logging.getLogger("orchestrator.usage")
def parse_usage_from_text(text: str) -> dict | None:
"""Extract the claude result-JSON usage from a run log's text.
The log may contain plain text before/after the JSON; with
--output-format json the JSON is the final object. We scan for the LAST
top-level '{' ... '}' that parses and carries usage/total_cost_usd.
Returns a normalised dict
{input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens,
cost_usd}
(ints / float, missing fields -> 0 / 0.0), or None if no usable JSON found.
"""
if not text:
return None
candidate = _extract_last_json_object(text)
if candidate is None:
return None
usage = candidate.get("usage") or {}
if not isinstance(usage, dict):
usage = {}
cost = candidate.get("total_cost_usd")
if cost is None:
cost = candidate.get("cost_usd")
# If there is neither a usage block nor a cost, this isn't a result object.
if not usage and cost is None:
return None
def _int(v):
try:
return int(v)
except (TypeError, ValueError):
return 0
def _float(v):
try:
return float(v)
except (TypeError, ValueError):
return 0.0
return {
"input_tokens": _int(usage.get("input_tokens")),
"output_tokens": _int(usage.get("output_tokens")),
"cache_read_tokens": _int(
usage.get("cache_read_input_tokens", usage.get("cache_read_tokens"))
),
# The cache-CREATION slice (writing new cache entries) is part of the
# REAL input and used to be dropped on the floor. Persist it so the
# "X in" figure reflects the full prompt size, not just fresh tokens.
"cache_creation_tokens": _int(
usage.get("cache_creation_input_tokens", usage.get("cache_creation_tokens"))
),
"cost_usd": _float(cost),
# Telegram live tracker: the model the run actually used. claude
# --output-format json reports it under modelUsage (a dict keyed by the
# full model id) and/or a top-level "model" field. We keep the FULL name
# here; short_model_name() trims it for the tracker. None when unknown.
"model": _extract_model(candidate),
}
def _extract_model(candidate: dict) -> str | None:
"""Best-effort: pull the model id out of a claude result JSON object.
Prefers modelUsage (a dict keyed by full model ids, e.g.
{"claude-opus-4-8": {...}}) and returns the key with the most output
tokens; falls back to a top-level "model" string. Never raises -> None.
"""
try:
mu = candidate.get("modelUsage")
if isinstance(mu, dict) and mu:
def _out(v):
try:
return int((v or {}).get("outputTokens", 0))
except (TypeError, ValueError, AttributeError):
return 0
best = max(mu.items(), key=lambda kv: _out(kv[1]))
if best and best[0]:
return str(best[0])
model = candidate.get("model")
if isinstance(model, str) and model:
return model
except Exception:
pass
return None
def short_model_name(full: str | None) -> str:
"""Trim a full model id to a short tag for the tracker.
'tokenator/claude-opus-4-8' -> 'opus-4-8'
'vibecode/claude-sonnet-4.6' -> 'sonnet-4.6'
'claude-opus-4-8' -> 'opus-4-8'
Returns '' when full is falsy so callers can omit the ' · <model>' suffix.
"""
if not full:
return ""
name = str(full).strip()
# Drop any provider prefix up to and including the last '/'.
if "/" in name:
name = name.rsplit("/", 1)[-1]
# Drop a leading 'claude-' marketing prefix.
if name.startswith("claude-"):
name = name[len("claude-"):]
return name
def _extract_last_json_object(text: str) -> dict | None:
"""Return the last balanced top-level JSON object in `text` that parses.
Scans from the end for '}' and walks back to the matching '{' using a depth
counter (string-aware), trying json.loads on each candidate. Robust to log
lines or text emitted before the JSON.
"""
# Fast path: the whole stripped text is the JSON.
stripped = text.strip()
try:
obj = json.loads(stripped)
if isinstance(obj, dict):
return obj
except (ValueError, TypeError):
pass
# Otherwise find the last balanced { ... } block.
end = len(text)
while True:
close = text.rfind("}", 0, end)
if close == -1:
return None
depth = 0
in_str = False
esc = False
start = None
for i in range(close, -1, -1):
ch = text[i]
if in_str:
if esc:
esc = False
elif ch == "\\":
esc = True
elif ch == '"':
in_str = False
continue
if ch == '"':
in_str = True
elif ch == "}":
depth += 1
elif ch == "{":
depth -= 1
if depth == 0:
start = i
break
if start is not None:
blob = text[start:close + 1]
try:
obj = json.loads(blob)
if isinstance(obj, dict):
return obj
except (ValueError, TypeError):
pass
end = close # keep scanning earlier in the text
def parse_usage_from_log(path: str) -> dict | None:
"""Read a run log file and parse usage from it. Never raises."""
try:
with open(path, "r", encoding="utf-8", errors="replace") as f:
return parse_usage_from_text(f.read())
except OSError as e:
logger.warning(f"parse_usage_from_log: cannot read {path}: {e}")
return None
def record_usage(run_id: int, usage: dict | None):
"""Write parsed usage onto the agent_runs row. NULLs if usage is None."""
if usage is None:
logger.warning(f"run_id={run_id}: no usage JSON parsed, recording NULLs")
usage = {}
conn = get_db()
try:
conn.execute(
"UPDATE agent_runs SET input_tokens=?, output_tokens=?, "
"cache_read_tokens=?, cache_creation_tokens=?, cost_usd=?, "
"model=COALESCE(?, model) WHERE id=?",
(
usage.get("input_tokens"),
usage.get("output_tokens"),
usage.get("cache_read_tokens"),
usage.get("cache_creation_tokens"),
usage.get("cost_usd"),
usage.get("model"),
run_id,
),
)
conn.commit()
finally:
conn.close()
def fmt_tokens(n) -> str:
"""Format a token count compactly: 1234 -> '1.2k', 2_500_000 -> '2.5M'."""
try:
n = int(n or 0)
except (TypeError, ValueError):
n = 0
if n >= 1_000_000:
return f"{n / 1_000_000:.1f}M"
if n >= 1_000:
return f"{n / 1_000:.1f}k"
return str(n)
def fmt_cost(c) -> str:
"""Format USD cost with 2 decimals: '$0.21'."""
try:
c = float(c or 0.0)
except (TypeError, ValueError):
c = 0.0
return f"${c:.2f}"
# Pretty agent names for comments (mirrors STAGE_AUTHORS roles).
AGENT_DISPLAY = {
"analyst": "Analyst",
"architect": "Architect",
"developer": "Developer",
"reviewer": "Reviewer",
"tester": "Tester",
"deployer": "Deployer",
}
def _input_total(usage: dict) -> int:
"""FULL input = fresh input + cache-read + cache-creation tokens."""
def _i(k):
try:
return int(usage.get(k) or 0)
except (TypeError, ValueError):
return 0
return _i("input_tokens") + _i("cache_read_tokens") + _i("cache_creation_tokens")
def _cached_total(usage: dict) -> int:
"""Cached portion of the input = cache-read + cache-creation tokens."""
def _i(k):
try:
return int(usage.get(k) or 0)
except (TypeError, ValueError):
return 0
return _i("cache_read_tokens") + _i("cache_creation_tokens")
def fmt_in(usage: dict) -> str:
"""Render the input figure as full total with a cached breakdown.
'8.5M in (8.4M cached)' when there is a cache; '45.2k in' when cached==0.
"""
total = _input_total(usage)
cached = _cached_total(usage)
if cached > 0:
return f"{fmt_tokens(total)} in ({fmt_tokens(cached)} cached)"
return f"{fmt_tokens(total)} in"
def usage_comment(
agent: str,
usage: dict | None,
repo: str | None = None,
branch: str | None = None,
work_item_id: str | None = None,
pr_number=None,
) -> str:
"""Build the per-agent finish comment, e.g.
'\U0001f4bb Developer \u0433\u043e\u0442\u043e\u0432 \u00b7 8.5M in (8.4M cached) / 45.8k out \u00b7 $7.29'.
When repo/branch/work_item_id are supplied, the agent's artifact link(s) are
appended (BUG: only analyst used to link its docs). Missing artifacts are
silently skipped — link building never raises.
"""
usage = usage or {}
name = AGENT_DISPLAY.get(agent, agent.capitalize())
icon = AGENT_ICON.get(agent, "\u2705")
line = (
f"{icon} {name} \u0433\u043e\u0442\u043e\u0432 \u00b7 "
f"{fmt_in(usage)} / "
f"{fmt_tokens(usage.get('output_tokens'))} out \u00b7 "
f"{fmt_cost(usage.get('cost_usd'))}"
)
links = artifact_links(agent, repo, branch, work_item_id, pr_number)
if links:
line += "\n" + "\n".join(links)
return line
# Per-agent artifact file under docs/work-items/{wid}/ (architect/developer use
# special handling for ADR dirs / PR links, see artifact_links()).
AGENT_ARTIFACT = {
"reviewer": ("Review", "12-review.md"),
"tester": ("Test report", "13-test-report.md"),
"deployer": ("Deploy log", "14-deploy-log.md"),
}
def artifact_links(
agent: str,
repo: str | None,
branch: str | None,
work_item_id: str | None,
pr_number=None,
) -> list[str]:
"""Markdown link(s) to the finishing agent's artifact(s) in Gitea.
Uses gitea_public_url (falls back to gitea_url) for clickable links, mirroring
the analyst doc links. Returns [] (never raises) when there is nothing to
link or the required context is missing. analyst is intentionally NOT handled
here — its richer doc list lives in stage_engine._build_analyst_ready_comment.
"""
try:
from .config import settings
owner = getattr(settings, "gitea_owner", "admin")
base = (
getattr(settings, "gitea_public_url", "") or getattr(settings, "gitea_url", "")
).rstrip("/")
if not base or not repo:
return []
links: list[str] = []
if agent == "developer":
if branch:
links.append(
f"\U0001f4c2 [Branch {branch}]({base}/{owner}/{repo}/src/branch/{branch})"
)
if pr_number:
links.append(
f"\U0001f517 [PR #{pr_number}]({base}/{owner}/{repo}/pulls/{pr_number})"
)
return links
if agent == "architect":
if branch and work_item_id:
adr_dir = (
f"{base}/{owner}/{repo}/src/branch/{branch}/"
f"docs/work-items/{work_item_id}/06-adr"
)
links.append(f"\U0001f4d0 [ADR]({adr_dir})")
return links
spec = AGENT_ARTIFACT.get(agent)
if spec and branch and work_item_id:
label, fname = spec
href = (
f"{base}/{owner}/{repo}/src/branch/{branch}/"
f"docs/work-items/{work_item_id}/{fname}"
)
links.append(f"\U0001f4c4 [{label}]({href})")
return links
except Exception:
return []
AGENT_ICON = {
"analyst": "\U0001f50d",
"architect": "\U0001f4d0",
"developer": "\U0001f4bb",
"reviewer": "\U0001f50e",
"tester": "\U0001f9ea",
"deployer": "\U0001f680",
}
def task_usage_summary(task_id: int) -> dict:
"""Aggregate agent_runs usage for a task.
total_in counts the FULL input (input + cache_read + cache_creation), and
total_cached counts the cached portion (cache_read + cache_creation).
COALESCE(...,0) keeps pre-existing rows (NULL cache_creation) from breaking.
Returns {total_in, total_cached, total_out, total_cost,
per_agent: [(agent, in, cached, out, cost), ...]}.
"""
conn = get_db()
try:
rows = conn.execute(
"SELECT agent, "
"COALESCE(SUM(input_tokens),0) "
" + COALESCE(SUM(cache_read_tokens),0) "
" + COALESCE(SUM(cache_creation_tokens),0), "
"COALESCE(SUM(cache_read_tokens),0) "
" + COALESCE(SUM(cache_creation_tokens),0), "
"COALESCE(SUM(output_tokens),0), "
"COALESCE(SUM(cost_usd),0.0) "
"FROM agent_runs WHERE task_id=? GROUP BY agent ORDER BY agent",
(task_id,),
).fetchall()
finally:
conn.close()
per_agent = [(r[0], int(r[1]), int(r[2]), int(r[3]), float(r[4])) for r in rows]
total_in = sum(r[1] for r in per_agent)
total_cached = sum(r[2] for r in per_agent)
total_out = sum(r[3] for r in per_agent)
total_cost = sum(r[4] for r in per_agent)
return {
"total_in": total_in,
"total_cached": total_cached,
"total_out": total_out,
"total_cost": total_cost,
"per_agent": per_agent,
}
def task_summary_comment(task_id: int) -> str:
"""Build the Deployer end-of-task summary comment (Feature 4, variant B)."""
s = task_usage_summary(task_id)
cached = s.get("total_cached", 0)
head_in = (
f"{fmt_tokens(s['total_in'])} \u0432\u0445\u043e\u0434 ({fmt_tokens(cached)} cached)"
if cached > 0
else f"{fmt_tokens(s['total_in'])} \u0432\u0445\u043e\u0434"
)
lines = [
f"\U0001f4ca \u0418\u0442\u043e\u0433\u043e \u043f\u043e \u0437\u0430\u0434\u0430\u0447\u0435: "
f"{head_in} / "
f"{fmt_tokens(s['total_out'])} \u0432\u044b\u0445\u043e\u0434 \u00b7 "
f"{fmt_cost(s['total_cost'])}"
]
for agent, ti, tc, to, cost in s["per_agent"]:
name = AGENT_DISPLAY.get(agent, agent.capitalize())
in_str = (
f"{fmt_tokens(ti)} in ({fmt_tokens(tc)} cached)"
if tc > 0
else f"{fmt_tokens(ti)} in"
)
lines.append(
f"\u2022 {name}: {in_str} / {fmt_tokens(to)} out \u00b7 {fmt_cost(cost)}"
)
return "\n".join(lines)

52
src/webhooks/_dedup.py Normal file
View File

@@ -0,0 +1,52 @@
"""ORCH-5 (M-7): webhook delivery de-duplication helper.
Webhook providers (Gitea/Plane) retry deliveries on timeout, network reset, or
manual replay. Without idempotency a retried delivery re-enters the pipeline and
spawns a duplicate run (the ET-009 incident class: parallel conveyors on one
repo). This module computes a stable per-delivery id so the webhook handlers can
INSERT-OR-IGNORE into events and skip the dispatch on a repeat.
delivery_id format: ``f"{source}:{raw_or_hash}"`` where source prefixes
gitea/plane so their id-spaces never collide. ``raw`` is the provider's native
delivery header (a GUID) when present; otherwise we fall back to a sha256 of the
body (a retried identical body yields the same hash).
"""
import hashlib
def _sha256_hex(*parts: str) -> str:
h = hashlib.sha256()
for p in parts:
h.update(p.encode("utf-8", "replace"))
return h.hexdigest()
def gitea_delivery_id(headers, event_type: str, body: bytes) -> str:
"""Compute the delivery_id for a Gitea webhook.
Prefers the ``X-Gitea-Delivery`` header (a per-delivery GUID). Falls back to
sha256(source + event_type + body) so a retried identical body still maps to
one id even if Gitea omitted the header.
"""
raw = (headers.get("X-Gitea-Delivery") or "").strip()
if not raw:
raw = _sha256_hex("gitea", event_type or "", body.decode("utf-8", "replace"))
return f"gitea:{raw}"
def plane_delivery_id(headers, body: bytes) -> str:
"""Compute the delivery_id for a Plane webhook.
Plane does not reliably send a delivery header, so we try a couple of common
names and otherwise fall back to sha256("plane" + body): a retried identical
body yields the same id.
"""
raw = (
headers.get("X-Plane-Delivery")
or headers.get("X-Hook-Delivery")
or ""
).strip()
if not raw:
raw = _sha256_hex("plane", body.decode("utf-8", "replace"))
return f"plane:{raw}"

View File

@@ -10,12 +10,20 @@ import httpx
from fastapi import APIRouter, Request, HTTPException
from ..config import settings
from ..db import get_db, get_task_by_repo_branch, update_task_stage
from ..db import (
get_db,
get_task_by_repo_branch,
update_task_stage,
enqueue_job,
insert_event_dedup,
)
from ._dedup import gitea_delivery_id
from ..stages import get_next_stage, get_agent_for_stage
from ..qg.checks import check_ci_green, check_review_approved
from ..notifications import notify_stage_change, notify_qg_failure, notify_error
from ..agents.launcher import launcher
from ..plane_sync import notify_stage_change as plane_notify_stage
from ..projects import get_project_by_repo
logger = logging.getLogger("orchestrator.webhooks.gitea")
@@ -50,15 +58,17 @@ async def gitea_webhook(request: Request):
payload = json.loads(body)
# Log event
conn = get_db()
# ORCH-5 (M-7): idempotent logging. Compute a stable delivery_id (X-Gitea-Delivery
# GUID, or sha256 fallback) and INSERT OR IGNORE. A repeated delivery (Gitea retry
# / manual replay) returns inserted=False -> log + return {"status":"duplicate"}
# WITHOUT re-dispatching, so the pipeline is not re-triggered (ET-009 class).
# Runs AFTER HMAC verification above.
event_type = request.headers.get("X-Gitea-Event", "unknown")
conn.execute(
"INSERT INTO events (source, event_type, payload) VALUES (?, ?, ?)",
("gitea", event_type, body.decode()),
)
conn.commit()
conn.close()
delivery_id = gitea_delivery_id(request.headers, event_type, body)
inserted = insert_event_dedup("gitea", event_type, body.decode(), delivery_id)
if not inserted:
logger.info(f"Gitea webhook duplicate delivery_id={delivery_id}, skipping dispatch")
return {"status": "duplicate"}
if event_type == "push":
await handle_push(payload)
@@ -84,6 +94,11 @@ async def handle_push(payload: dict):
repo_name = payload.get("repository", {}).get("name", settings.default_repo)
# ORCH-6: ignore pushes to repos outside the project registry.
if not get_project_by_repo(repo_name):
logger.info(f"Gitea push: ignoring unknown repo '{repo_name}'")
return
task = get_task_by_repo_branch(repo_name, branch)
if not task:
logger.debug(f"Push to '{branch}' — no matching task found")
@@ -117,8 +132,8 @@ async def handle_push(payload: dict):
if agent:
try:
task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {branch}\nStage: {next_stage}"
run_id = launcher.launch(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: push triggered {current_stage}{next_stage}, launched '{agent}' (run_id={run_id})")
job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: push triggered {current_stage}{next_stage}, enqueued '{agent}' (job_id={job_id})")
except Exception as e:
notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
@@ -167,6 +182,12 @@ async def handle_ci_status(payload: dict):
return
repo_name = payload.get("repository", {}).get("name", settings.default_repo)
# ORCH-6: ignore CI status for repos outside the project registry.
if not get_project_by_repo(repo_name):
logger.info(f"Gitea CI status: ignoring unknown repo '{repo_name}'")
return
task = get_task_by_repo_branch(repo_name, branch)
if not task:
return
@@ -188,19 +209,38 @@ async def handle_ci_status(payload: dict):
if agent:
try:
task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {branch}\nStage: {next_stage}"
run_id = launcher.launch(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: CI green → {next_stage}, launched '{agent}' (run_id={run_id})")
job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: CI green → {next_stage}, enqueued '{agent}' (job_id={job_id})")
except Exception as e:
notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
else:
notify_qg_failure(task_id, current_stage, "check_ci_green", reason)
elif state == "failure":
# S-1: Gitea CI is NOT the authoritative gate anymore (the orchestrator runs
# tests locally via check_tests_local). Gitea CI is often unconfigured, so a
# "failure"/empty status here is not actionable. Log only, do not alert.
logger.debug(f"Task {task_id}: Gitea CI state='failure' on branch '{branch}' "
f"(non-authoritative, suppressed — local tests are the gate)")
elif state == "failure" and current_stage == "development":
# CI is the authoritative gate for development -> review.
# On red CI: notify, then bounce the task back to the developer (capped retries),
# symmetric to the review REQUEST_CHANGES path.
notify_qg_failure(task_id, current_stage, "check_ci_green", f"Gitea CI failed on branch '{branch}'")
conn = get_db()
retry_count = conn.execute(
"SELECT COUNT(*) as cnt FROM agent_runs WHERE task_id = ? AND agent = 'developer'",
(task_id,),
).fetchone()["cnt"]
conn.close()
if retry_count < MAX_DEV_RETRIES:
# task already on 'development' — no stage change needed, just relaunch developer
try:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {branch}\n"
f"Stage: development\nNote: CI failed, fix and re-push (attempt {retry_count + 1}/{MAX_DEV_RETRIES})"
)
job_id = enqueue_job("developer", repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: CI failed, enqueued developer (attempt {retry_count + 1}, job_id={job_id})")
except Exception as e:
notify_error(task_id, f"Failed to relaunch developer after CI failure: {e}")
else:
notify_error(task_id, f"Max developer retries ({MAX_DEV_RETRIES}) reached after CI failure, escalating")
logger.error(f"Task {task_id}: max retries reached after CI failure, needs manual intervention")
async def handle_pr(payload: dict):
@@ -221,6 +261,11 @@ async def handle_pr(payload: dict):
if not head_branch:
return
# ORCH-6: ignore PR events for repos outside the project registry.
if not get_project_by_repo(repo_name):
logger.info(f"Gitea PR: ignoring unknown repo '{repo_name}'")
return
task = get_task_by_repo_branch(repo_name, head_branch)
if not task:
logger.debug(f"PR event for branch '{head_branch}' — no matching task")
@@ -255,8 +300,8 @@ async def handle_pr(payload: dict):
if agent:
try:
task_desc = f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {head_branch}\nStage: {next_stage}"
run_id = launcher.launch(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: PR approved → {next_stage}, launched '{agent}' (run_id={run_id})")
job_id = enqueue_job(agent, repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: PR approved → {next_stage}, enqueued '{agent}' (job_id={job_id})")
except Exception as e:
notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
else:
@@ -280,8 +325,8 @@ async def handle_pr(payload: dict):
f"Work item: {work_item_id}\nRepo: {repo_name}\nBranch: {head_branch}\n"
f"Stage: development\nNote: Changes requested in review (attempt {retry_count + 1}/{MAX_DEV_RETRIES})"
)
run_id = launcher.launch("developer", repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: changes requested, relaunching developer (attempt {retry_count + 1})")
job_id = enqueue_job("developer", repo_name, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: changes requested, enqueued developer (attempt {retry_count + 1}, job_id={job_id})")
except Exception as e:
notify_error(task_id, f"Failed to relaunch developer: {e}")
else:
@@ -289,6 +334,20 @@ async def handle_pr(payload: dict):
logger.error(f"Task {task_id}: max retries reached, needs manual intervention")
elif action == "closed" and pr.get("merged", False):
# BUG 8 (second door): at the deploy stage `done` is gated by the
# deployer's verdict (check_deploy_status via advance_stage), NOT by the
# fact that the PR was merged. The deployer merges the PR at the START of
# its run, so a merged webhook arrives ~30s later while the deployer is
# still working — blindly setting done here would fake-complete the task
# and discard a later deploy_status: FAILED verdict. advance_stage will
# drive deploy→done (and Plane→Done) when the deployer job finishes.
# For every OTHER stage the merge-driven done behaviour is preserved.
if current_stage == "deploy":
logger.info(
f"Task {task_id}: PR merged at deploy stage — done gated by "
f"deployer verdict (check_deploy_status), ignoring merge-driven done."
)
return
update_task_stage(task_id, "done")
notify_stage_change(task_id, current_stage, "done")
logger.info(f"Task {task_id}: PR merged, stage → done")

View File

@@ -13,8 +13,12 @@ from ..db import (
get_db,
get_task_by_plane_id,
get_next_work_item_id,
ensure_unique_work_item_id,
update_task_stage,
enqueue_job,
insert_event_dedup,
)
from ._dedup import plane_delivery_id
from ..stages import get_next_stage, get_agent_for_stage, get_qg_for_stage, get_previous_stage
from ..qg.checks import QG_CHECKS
from ..notifications import notify_stage_change, notify_qg_failure, notify_error
@@ -24,6 +28,11 @@ from ..plane_sync import (
notify_qg_failure as plane_notify_qg,
notify_done as plane_notify_done,
)
from ..projects import (
get_project_by_plane_id,
get_project_by_repo,
known_plane_project_ids,
)
logger = logging.getLogger("orchestrator.webhooks.plane")
@@ -55,42 +64,293 @@ async def plane_webhook(request: Request):
payload = json.loads(body)
# Log event
conn = get_db()
conn.execute(
"INSERT INTO events (source, event_type, payload) VALUES (?, ?, ?)",
("plane", payload.get("event", "unknown"), body.decode()),
)
conn.commit()
conn.close()
# ORCH-5 (M-7): idempotent logging. Plane rarely sends a delivery header, so the
# delivery_id falls back to sha256("plane" + body) (a retried identical body maps
# to one id). INSERT OR IGNORE; a duplicate returns inserted=False -> log + return
# {"status":"duplicate"} WITHOUT dispatching. Runs AFTER HMAC and BEFORE the ORCH-6
# project filter, so a repeat does no extra work; the FIRST delivery of an unknown
# project still falls through to the filter below and returns {"status":"ignored"}.
event_type = payload.get("event", "unknown")
delivery_id = plane_delivery_id(request.headers, body)
inserted = insert_event_dedup("plane", event_type, body.decode(), delivery_id)
if not inserted:
logger.info(f"Plane webhook duplicate delivery_id={delivery_id}, skipping dispatch")
return {"status": "duplicate"}
event = payload.get("event")
action = payload.get("action", "")
data = payload.get("data", {})
# ORCH-6: filter by Plane project. Ignore issues from unknown/unconfigured
# projects so a webhook on the whole workspace cannot funnel everything into
# the default repo (root cause of the 2026-06-02 incident).
project_id = data.get("project") or data.get("project_id") or ""
if project_id not in known_plane_project_ids():
logger.info(
f"Plane webhook: ignoring event '{event}' from unknown project "
f"'{project_id}' (known: {len(known_plane_project_ids())})"
)
return {"status": "ignored", "reason": "unknown project"}
if (event == "work_item.created") or (event == "issue" and action == "created"):
await handle_work_item_created(data)
# Feature 1: creation NO LONGER starts the pipeline. Slava keeps the
# backlog until he moves an issue to In Progress. We only run a soft
# QG-0 sanity log here (no branch, no analyst, no task row).
await handle_work_item_created(data, project_id)
elif (event == "work_item.updated") or (event == "issue" and action == "updated"):
# Status-only verdict model: status changes drive the pipeline.
# Backlog/Todo/Triage -> In Progress : START pipeline, or relaunch the
# stage agent if returned from
# Needs Input.
# -> Approved : advance to the next stage.
# -> Rejected : rollback (reason from latest comment).
await handle_issue_updated(data, project_id)
elif (event == "comment.created") or (event == "issue_comment" and action == "created"):
await handle_comment(data)
await handle_comment(data, project_id)
return {"status": "accepted"}
async def handle_work_item_created(data: dict):
def _state_id(data: dict) -> str:
"""Extract the new Plane state UUID from an 'issue updated' payload.
Real payload (verified from prod events): data.state is
{id, name, color, group}. Some payloads carry state as a bare UUID string.
"""
New work item created in Plane.
QG-0: validate title, description, priority.
If valid: create branch, init docs, launch analyst.
If invalid: comment with what's missing, set Blocked.
state = data.get("state")
if isinstance(state, dict):
return state.get("id", "") or ""
if isinstance(state, str):
return state
return ""
async def handle_issue_updated(data: dict, project_id: str = ""):
"""Feature 1 & 2: react to a Plane issue status change.
Routes the NEW state UUID (data.state.id) to:
- in_progress : start the pipeline if this issue has no task yet; if a
task already exists and the stage agent is idle (returned from Needs
Input), relaunch the stage agent so it reads Slava's fresh comments.
- approved : advance to the next stage.
- rejected : rollback to the previous stage (reason from latest comment).
Any other status (Needs Input, In Review, Blocked, Done, board stages, etc.)
is ignored here — those are statuses the orchestrator itself sets.
"""
from ..plane_sync import PLANE_STATES
plane_id = str(data.get("id") or "")
new_state = _state_id(data)
if not plane_id or not new_state:
logger.info("issue updated without id/state, ignoring")
return
if new_state == PLANE_STATES["in_progress"]:
await handle_status_start(data, project_id)
elif new_state == PLANE_STATES["approved"]:
await handle_verdict(data, project_id, approved=True)
elif new_state == PLANE_STATES["rejected"]:
await handle_verdict(data, project_id, approved=False)
else:
logger.info(f"issue {plane_id} updated to state {new_state[:8]}..., no pipeline action")
async def handle_status_start(data: dict, project_id: str = ""):
"""An issue moved into In Progress.
Two cases under the status-only verdict model:
1. No task yet for this plane_id -> START the pipeline (start_pipeline).
2. A task already exists -> this is Slava returning the issue from
Needs Input to In Progress after answering the analyst's questions. We
must RELAUNCH the current stage's agent so it reads the fresh comments
from Plane (the answer-to-questions flow used to live in handle_comment;
it is now status-driven).
KEY FORK — telling "answer to questions" apart from a plain duplicate In
Progress webhook (the dedup-protection case):
The tasks table stores no Plane status, and the issue.updated payload only
carries the NEW state (In Progress), so we cannot read the previous status
from here. Instead we use the only reliable local signal: whether the
stage's agent is currently in flight.
- The orchestrator sets In Progress itself while an agent runs. When the
agent FINISHES it leaves the issue in Needs Input or In Review and has
NO queued/running job. So: an existing task with NO active job means the
agent is idle / waiting -> a return to In Progress is a genuine relaunch
request -> enqueue the stage agent.
- If a queued/running job already exists for the task, the agent is busy
(or a duplicate webhook arrived) -> SKIP (no double launch). The events
de-dup at the top of plane_webhook already absorbs identical webhook
bodies; this job guard additionally covers distinct webhooks fired while
a job is still pending/running.
"""
from ..db import has_active_job_for_task
plane_id = str(data.get("id") or "")
existing = get_task_by_plane_id(plane_id)
if not existing:
logger.info(f"Status->In Progress for {plane_id}: starting pipeline")
await start_pipeline(data, project_id)
return
task_id = existing["id"]
current_stage = existing["stage"]
repo = existing["repo"]
work_item_id = existing.get("work_item_id", "")
branch = existing.get("branch", "")
# Duplicate / busy guard: a job is already pending or running for this task.
if has_active_job_for_task(task_id):
logger.info(
f"Status->In Progress for {plane_id}: task {task_id} already has an "
f"active job (stage={current_stage}), not relaunching"
)
return
# Agent is idle -> Slava answered questions and returned the issue to In
# Progress. Relaunch the current stage's agent to read the fresh comments.
from ..plane_sync import STAGE_AUTHORS, add_comment as _add_comment
stage_agent = STAGE_AUTHORS.get(current_stage)
if not stage_agent:
logger.info(
f"Status->In Progress for {plane_id}: no agent for stage "
f"'{current_stage}', not relaunching"
)
return
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: {current_stage}\nNote: Stakeholder returned the issue to In "
f"Progress (answered your questions). Read the latest comments in Plane "
f"and revise your artifacts."
)
job_id = enqueue_job(stage_agent, repo, task_desc, task_id=task_id)
logger.info(
f"Task {task_id}: returned to In Progress (Needs Input answered), "
f"relaunched {stage_agent} for stage {current_stage} (job_id={job_id})"
)
try:
_add_comment(
work_item_id,
"\U0001f504 \u0410\u0433\u0435\u043d\u0442 \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d \u0441 \u043e\u0442\u0432\u0435\u0442\u0430\u043c\u0438 \u0441\u0442\u0435\u0439\u043a\u0445\u043e\u043b\u0434\u0435\u0440\u0430.",
author=stage_agent,
)
except Exception as e:
logger.error(f"Failed to post relaunch comment for {work_item_id}: {e}")
async def handle_verdict(data: dict, project_id: str, approved: bool):
"""Status-only verdict: a Plane status change drives advance / rollback.
Approved status -> _try_advance_stage. We do NOT touch the issue status here:
_try_advance_stage -> advance_stage -> plane_notify_stage already PATCHes the
issue to the NEXT stage's status. The old set_issue_in_progress call reset
the status to In Progress first, which made the board flicker In Progress
before the next stage (part of bug 3); it is removed.
Rejected status -> rollback to the previous stage. The reason is pulled from
the issue's latest comment (Slava writes the reason in a comment before/with
flipping the status to Rejected).
"""
plane_id = str(data.get("id") or "")
task = get_task_by_plane_id(plane_id)
if not task:
logger.warning(f"Verdict status for {plane_id} but no task found, ignoring")
return
task_id = task["id"]
current_stage = task["stage"]
repo = task["repo"]
work_item_id = task.get("work_item_id", "")
branch = task.get("branch", "")
if approved:
# NOTE: no set_issue_in_progress here — _try_advance_stage sets the next
# stage's status itself (advance_stage -> plane_notify_stage).
logger.info(f"Task {task_id}: Approved status -> advance from {current_stage}")
await _try_advance_stage(task_id, current_stage, repo, work_item_id, branch)
return
# Rejected: pull the rejection reason from the issue's latest comment.
issue_id = task.get("plane_issue_id") or task.get("plane_id") or plane_id
reason = _latest_comment_reason(issue_id, repo, project_id)
await _rollback_stage(
task_id, current_stage, repo, work_item_id, branch, reason
)
def _latest_comment_reason(issue_id: str, repo: str, project_id: str = "") -> str:
"""Fetch the issue's most recent comment text (HTML stripped) as the reject
reason. Slava writes the reason in a comment before/with flipping the status
to Rejected.
Returns a fixed fallback when there is no comment / the API call fails.
"""
from ..plane_sync import (
PLANE_BASE,
PLANE_HEADERS,
WORKSPACE,
PROJECT_ID as _DEFAULT_PROJECT_ID,
)
fallback = "Rejected via status, no reason comment"
if not issue_id:
return fallback
_proj = get_project_by_repo(repo)
pid = _proj.plane_project_id if _proj else (project_id or _DEFAULT_PROJECT_ID)
url = (
f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{pid}/issues/"
f"{issue_id}/comments/"
)
try:
resp = httpx.get(url, headers=PLANE_HEADERS, timeout=10)
if resp.status_code != 200:
logger.warning(
f"reject-reason: GET comments for {issue_id} returned "
f"{resp.status_code}"
)
return fallback
payload = resp.json()
comments = payload.get("results", payload) if isinstance(payload, dict) else payload
if not comments:
return fallback
latest = max(comments, key=lambda c: c.get("created_at", "") or "")
raw = (
latest.get("comment_stripped")
or latest.get("comment_html")
or latest.get("comment")
or ""
)
text = re.sub(r"<[^>]+>", "", raw).strip()
return text[:300] if text else fallback
except Exception as e:
logger.error(f"reject-reason: failed to fetch comments for {issue_id}: {e}")
return fallback
async def handle_work_item_created(data: dict, project_id: str = ""):
"""Feature 1: creation does NOT start the pipeline anymore.
The pipeline is started when Slava moves the issue into In Progress
(handle_status_start -> start_pipeline). On creation we only run a SOFT QG-0
sanity check and log the result — NO branch, NO docs, NO analyst, NO task row
— so the issue can sit in the backlog until Slava is ready.
"""
plane_id = data.get("id", "")
name = data.get("name", "untitled")
description = data.get("description_stripped", data.get("description", ""))
priority = data.get("priority", {})
priority_name = priority if isinstance(priority, str) else priority.get("name", "")
repo = settings.default_repo
errors = _qg0_errors(name, description)
if errors:
logger.info(f"work_item.created {plane_id}: soft QG-0 warnings: {errors}")
else:
logger.info(f"work_item.created {plane_id} ('{name}'): in backlog, awaiting In Progress")
# QG-0 validation
def _qg0_errors(name: str, description: str) -> list:
"""QG-0 validation: returns a list of human-readable problems (empty = OK)."""
errors = []
if not name or len(name) < 5:
errors.append("Title \u0441\u043b\u0438\u0448\u043a\u043e\u043c \u043a\u043e\u0440\u043e\u0442\u043a\u0438\u0439 (\u043d\u0443\u0436\u043d\u043e >= 5 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432)")
@@ -99,20 +359,80 @@ async def handle_work_item_created(data: dict):
if not description or len(description.strip()) < 20:
errors.append("Description \u0441\u043b\u0438\u0448\u043a\u043e\u043c \u043a\u043e\u0440\u043e\u0442\u043a\u0438\u0439 (\u043d\u0443\u0436\u043d\u043e >= 20 \u0441\u0438\u043c\u0432\u043e\u043b\u043e\u0432)")
return errors
async def start_pipeline(data: dict, project_id: str = ""):
"""Feature 1: start the pipeline for an issue (moved to In Progress).
This is the body extracted from the old handle_work_item_created: resolve the
project, run QG-0 (hard — blocks on failure), create the work item id +
branch + initial docs, insert the task row, and enqueue the analyst.
Callers (handle_status_start) already guarantee no existing task for this
plane_id, so this never duplicates.
"""
plane_id = data.get("id", "")
name = data.get("name", "untitled")
description = data.get("description_stripped", data.get("description", ""))
# ORCH-6: resolve repo / prefix / Plane project from the registry instead of
# the single hardcoded default_repo.
if not project_id:
project_id = data.get("project") or data.get("project_id") or ""
proj = get_project_by_plane_id(project_id)
if not proj:
logger.warning(f"start_pipeline: unknown project '{project_id}', ignoring {plane_id}")
return
repo = proj.repo
plane_project_id = proj.plane_project_id
# BUG 1 + BUG B: Plane's issue.updated webhook (status change -> In Progress)
# sends only the CHANGED fields, so BOTH description / description_stripped
# AND name are usually empty here even though the issue HAS them. Pull the
# full title + description from the Plane issue detail API in a SINGLE GET
# (fetch_issue_fields: same endpoint + shared token already used by
# fetch_issue_sequence_id) before QG-0 and before the branch slug is built.
# If the API is also empty, QG-0 legitimately fails (truly empty ticket) and
# name falls back to "untitled".
name_missing = (not name) or name.strip().lower() == "untitled" or len(name.strip()) < 3
desc_missing = (not description) or len(description.strip()) < 20
if name_missing or desc_missing:
from ..plane_sync import fetch_issue_fields
fetched_name, fetched_desc = fetch_issue_fields(plane_id, plane_project_id)
if desc_missing and fetched_desc and len(fetched_desc.strip()) >= len(description.strip()):
description = fetched_desc
logger.info(
f"start_pipeline: pulled description from Plane API for {plane_id} "
f"({len(description.strip())} chars)"
)
if name_missing and fetched_name and len(fetched_name.strip()) >= 3:
name = fetched_name
logger.info(
f"start_pipeline: pulled name from Plane API for {plane_id} "
f"('{name}')"
)
# BUG B fallback: if name is still empty/blank after the API pull, keep the
# legacy "untitled" so the slug/branch build never crashes on an empty name.
if not name or not name.strip():
name = "untitled"
# QG-0 validation (hard gate on pipeline start)
errors = _qg0_errors(name, description)
if errors:
# QG-0 failed
error_text = "\u26a0\ufe0f QG-0 failed:\n" + "\n".join(f"\u2022 {e}" for e in errors)
from ..plane_sync import PLANE_BASE, PLANE_HEADERS, WORKSPACE, PROJECT_ID, PLANE_STATES
from ..plane_sync import PLANE_BASE, PLANE_HEADERS, WORKSPACE, PLANE_STATES
import httpx as _httpx
# Post comment
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{plane_id}/comments/"
# Post comment (ORCH-6: route to the issue's own project)
url = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{plane_project_id}/issues/{plane_id}/comments/"
try:
_httpx.post(url, headers=PLANE_HEADERS,
json={"comment_html": f"<p>{error_text}</p>"}, timeout=10)
except Exception:
pass
# Set blocked
url2 = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{plane_id}/"
url2 = f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{plane_project_id}/issues/{plane_id}/"
try:
_httpx.patch(url2, headers=PLANE_HEADERS,
json={"state": PLANE_STATES["blocked"]}, timeout=10)
@@ -121,18 +441,62 @@ async def handle_work_item_created(data: dict):
logger.info(f"QG-0 failed for {plane_id}: {errors}")
return
# Generate work item ID
work_item_id = get_next_work_item_id(repo)
# Generate work item ID.
# M-6: source of truth for the number is the Plane sequence_id. Fetch it by
# issue UUID; if Plane is unavailable, fall back to the DB increment so a
# Plane outage never blocks task creation (autonomy > exact numbering).
from ..plane_sync import fetch_issue_sequence_id
seq = fetch_issue_sequence_id(plane_id, plane_project_id)
if seq is not None:
work_item_id = f"{proj.work_item_prefix}-{seq:03d}"
else:
work_item_id = get_next_work_item_id(repo, proj.work_item_prefix)
logger.warning(
f"Plane sequence_id unavailable for {plane_id}, "
f"fell back to DB increment: {work_item_id}"
)
# BUG 2a: uniqueness-guard LAYERED ON TOP of the M-6 derive above (the derive
# itself is untouched). If the derived ET-NNN is already taken by another
# task in this repo (collision -> two tasks would share branch/worktree, see
# ET-006), bump to the next free number.
_derived = work_item_id
work_item_id = ensure_unique_work_item_id(work_item_id, repo)
if work_item_id != _derived:
logger.warning(
f"work_item_id collision: derived {_derived} already in use for "
f"{repo}; reassigned {plane_id} -> {work_item_id}"
)
# Create slug from name
slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")[:30]
branch = f"feature/{work_item_id}-{slug}"
# BUG 2b (defense-in-depth): the worktree/path is keyed by BRANCH
# (git_worktree.get_worktree_path) and tasks are reverse-resolved by
# (repo, branch). With 2a the work_item_id is unique, so the branch prefix is
# too; but the slug could still collide (e.g. two issues with the same title
# under different ids -> fine) or, worse, an identical branch already exist.
# Guard physically: if this exact branch is already owned by another task in
# this repo, disambiguate with the (now unique) work_item_id so two tasks can
# never share a worktree.
_conn_b = get_db()
_branch_taken = _conn_b.execute(
"SELECT 1 FROM tasks WHERE repo = ? AND branch = ? LIMIT 1", (repo, branch)
).fetchone()
_conn_b.close()
if _branch_taken is not None:
branch = f"feature/{work_item_id}-{plane_id[:8]}"
logger.warning(
f"branch collision for {repo}; disambiguated to unique branch {branch}"
)
# Insert task into DB
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) VALUES (?, ?, ?, ?, ?, ?)",
(plane_id, work_item_id, repo, branch, "analysis", plane_id),
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id, title) "
"VALUES (?, ?, ?, ?, ?, ?, ?)",
(plane_id, work_item_id, repo, branch, "analysis", plane_id, name),
)
conn.commit()
conn.close()
@@ -159,209 +523,133 @@ async def handle_work_item_created(data: dict):
task_row = get_db().execute("SELECT id FROM tasks WHERE work_item_id=?", (work_item_id,)).fetchone()
if task_row:
task_id = task_row[0]
task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: analysis\nTitle: {name}"
run_id = launcher.launch("analyst", repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: launched analyst (run_id={run_id})")
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nTitle: {name}\n\nDescription:\n{description}"
)
job_id = enqueue_job("analyst", repo, task_desc, task_id=task_id)
logger.info(f"Task {task_id}: enqueued analyst (job_id={job_id})")
# Post start comment to Plane
from ..plane_sync import add_comment as _add_comment
_add_comment(work_item_id, "\U0001f50d Analyst \u0437\u0430\u043f\u0443\u0449\u0435\u043d. BRD/\u0422\u0417/AC/TestPlan \u0432 \u0440\u0430\u0431\u043e\u0442\u0435 (\u043e\u0436\u0438\u0434\u0430\u0439\u0442\u0435 8-15 \u043c\u0438\u043d).")
_add_comment(work_item_id, "\U0001f50d Analyst \u0437\u0430\u043f\u0443\u0449\u0435\u043d. BRD/\u0422\u0417/AC/TestPlan \u0432 \u0440\u0430\u0431\u043e\u0442\u0435 (\u043e\u0436\u0438\u0434\u0430\u0439\u0442\u0435 8-15 \u043c\u0438\u043d).", author="analyst")
except Exception as e:
logger.error(f"Failed to launch analyst for {work_item_id}: {e}")
async def handle_comment(data: dict):
async def handle_comment(data: dict, project_id: str = ""):
"""Status-only verdict model: comments NEVER drive the pipeline.
The whole comment-based control mechanism (``:approved:`` / ``:rejected:``
and the analysis answer-to-questions flow) was removed. It caused bug 3
(echo self-hit): the analyst posts its own "waiting for approval" comment,
handle_comment catches its own comment and reverts In Review -> In Progress.
Comments are now logged only — no status change, no enqueue, no side effect.
The pipeline is driven solely by status changes (handle_issue_updated):
- Approved -> advance
- Rejected -> rollback (reason pulled from the latest comment)
- In Progress (returned from Needs Input) -> relaunch the stage agent
"""
Handle comment event — check for :approved: or :rejected:.
Advance or rollback stage accordingly.
plane_id = str(
data.get("work_item_id") or data.get("issue_id") or data.get("issue") or ""
)
logger.info(
f"comment.created for {plane_id}: logged only, no pipeline action "
f"(status-only verdict model)"
)
async def _rollback_stage(
task_id: int, current_stage: str, repo: str, work_item_id: str, branch: str,
reason: str,
):
"""Rollback triggered by a status change to Rejected.
- at analysis: relaunch the analyst with the rejection reason;
- otherwise: roll back to the previous stage and relaunch its agent
(via the existing rollback notify + an enqueue of the prev-stage agent).
"""
comment_body = data.get("comment_stripped", data.get("comment", data.get("body", data.get("comment_html", ""))))
plane_id = str(data.get("work_item_id") or data.get("issue_id") or data.get("issue") or "")
if not plane_id:
logger.warning("Comment event without work_item_id, skipping")
return
task = get_task_by_plane_id(plane_id)
if not task:
logger.warning(f"No task found for plane_id={plane_id}")
return
task_id = task["id"]
current_stage = task["stage"]
repo = task["repo"]
work_item_id = task.get("work_item_id", "")
branch = task.get("branch", "")
if ":rejected:" in comment_body:
# Extract reason (text after :rejected:)
reason = comment_body.split(":rejected:", 1)[-1].strip()[:300]
if current_stage == "analysis":
# Already in analysis — just relaunch analyst with rejection reason
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nNote: Stakeholder REJECTED your artifacts. "
f"Reason: {reason}\nRevise and improve."
)
new_run = launcher.launch("analyst", repo, task_desc, task_id=task_id)
from ..plane_sync import add_comment as _plane_comment
_plane_comment(work_item_id, f"\U0001f504 Analyst \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d. \u041f\u0440\u0438\u0447\u0438\u043d\u0430 \u043e\u0442\u043a\u043b\u043e\u043d\u0435\u043d\u0438\u044f: {reason}")
logger.info(f"Task {task_id}: rejected at analysis, relaunched analyst")
else:
# Rollback to previous stage
prev_stage = get_previous_stage(current_stage)
if prev_stage:
update_task_stage(task_id, prev_stage)
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
notify_stage_change(task_id, current_stage, prev_stage)
plane_notify_stage(work_item_id, current_stage, prev_stage)
from ..plane_sync import add_comment as _plane_comment
_plane_comment(work_item_id, f"\U0001f504 \u041e\u0442\u043a\u0430\u0442: {current_stage} \u2192 {prev_stage}. \u041f\u0440\u0438\u0447\u0438\u043d\u0430: {reason}")
logger.info(f"Task {task_id}: rejected, rolled back {current_stage} \u2192 {prev_stage}")
return
if ":approved:" in comment_body:
if current_stage == "analysis":
# Already in analysis — just relaunch analyst with rejection reason
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
# Try to advance stage
await _try_advance_stage(task_id, current_stage, repo, work_item_id, branch)
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nNote: Stakeholder REJECTED your artifacts. "
f"Reason: {reason}\nRevise and improve."
)
new_job = enqueue_job("analyst", repo, task_desc, task_id=task_id)
from ..plane_sync import add_comment as _plane_comment
_plane_comment(work_item_id, f"\U0001f504 Analyst \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d. \u041f\u0440\u0438\u0447\u0438\u043d\u0430 \u043e\u0442\u043a\u043b\u043e\u043d\u0435\u043d\u0438\u044f: {reason}", author="analyst")
logger.info(f"Task {task_id}: rejected at analysis, enqueued analyst (job_id={new_job})")
return
# Task 3: If neither :approved: nor :rejected: — check if this is an answer to questions
if current_stage == "analysis":
from ..plane_sync import PLANE_STATES, set_issue_in_progress
issue_id = task.get("plane_issue_id") or task.get("plane_id")
if not issue_id:
issue_id = plane_id
if issue_id:
from ..plane_sync import PLANE_BASE, PLANE_HEADERS, WORKSPACE, PROJECT_ID
import httpx as _httpx
try:
_resp = _httpx.get(
f"{PLANE_BASE}/workspaces/{WORKSPACE}/projects/{PROJECT_ID}/issues/{issue_id}/",
headers=PLANE_HEADERS, timeout=10
)
if _resp.status_code == 200:
issue_data = _resp.json()
if issue_data.get("state") == PLANE_STATES["needs_input"]:
# Task 11: Check analyst retry count (max 3 question rounds)
conn3 = get_db()
analyst_runs = conn3.execute(
"SELECT COUNT(*) FROM agent_runs WHERE task_id=? AND agent='analyst'",
(task_id,)
).fetchone()[0]
conn3.close()
if analyst_runs >= 4: # initial + 3 retries
from ..plane_sync import set_issue_blocked, add_comment as _pc
set_issue_blocked(work_item_id)
_pc(
work_item_id,
"\U0001f6a8 3 \u0440\u0430\u0443\u043d\u0434\u0430 \u0443\u0442\u043e\u0447\u043d\u0435\u043d\u0438\u0439 \u0438\u0441\u0447\u0435\u0440\u043f\u0430\u043d\u044b. Analyst \u043d\u0435 \u043c\u043e\u0436\u0435\u0442 \u0441\u0444\u043e\u0440\u043c\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u0422\u0417. "
"\u0422\u0440\u0435\u0431\u0443\u0435\u0442\u0441\u044f \u0431\u043e\u043b\u0435\u0435 \u0434\u0435\u0442\u0430\u043b\u044c\u043d\u043e\u0435 \u043e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0438\u043b\u0438 \u0432\u0441\u0442\u0440\u0435\u0447\u0430."
)
from ..notifications import send_telegram
send_telegram(f"\U0001f6a8 {work_item_id}: 3 \u0440\u0430\u0443\u043d\u0434\u0430 \u0432\u043e\u043f\u0440\u043e\u0441\u043e\u0432 analyst'\u0430 \u0438\u0441\u0447\u0435\u0440\u043f\u0430\u043d\u044b. \u041d\u0443\u0436\u043d\u0430 \u043f\u043e\u043c\u043e\u0449\u044c.")
return
# This is an answer to analyst's questions — relaunch
set_issue_in_progress(work_item_id)
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: analysis\nNote: Stakeholder answered your questions. "
f"Read the latest comment in Plane and revise your artifacts.\n"
f"Answer: {comment_body[:500]}"
)
new_run = launcher.launch("analyst", repo, task_desc, task_id=task_id)
from ..plane_sync import add_comment as _pc2
_pc2(work_item_id, "\U0001f504 Analyst \u043f\u0435\u0440\u0435\u0437\u0430\u043f\u0443\u0449\u0435\u043d \u0441 \u043e\u0442\u0432\u0435\u0442\u0430\u043c\u0438 \u0441\u0442\u0435\u0439\u043a\u0445\u043e\u043b\u0434\u0435\u0440\u0430.")
logger.info(f"Task {task_id}: stakeholder answered questions, relaunched analyst (run_id={new_run})")
return
except Exception as e:
logger.error(f"Failed to check issue state: {e}")
# Rollback to previous stage
prev_stage = get_previous_stage(current_stage)
if not prev_stage:
logger.info(f"Task {task_id}: rejected at {current_stage} but no previous stage")
return
update_task_stage(task_id, prev_stage)
notify_stage_change(task_id, current_stage, prev_stage)
# Feature 3: plane_notify_stage moves the board to the prev stage's status.
plane_notify_stage(work_item_id, current_stage, prev_stage)
# Then put it back to In Progress so the relaunched agent is clearly working.
from ..plane_sync import set_issue_in_progress
set_issue_in_progress(work_item_id)
from ..plane_sync import add_comment as _plane_comment, STAGE_AUTHORS
_plane_comment(
work_item_id,
f"\U0001f504 \u041e\u0442\u043a\u0430\u0442: {current_stage} \u2192 {prev_stage}. \u041f\u0440\u0438\u0447\u0438\u043d\u0430: {reason}",
author=STAGE_AUTHORS.get(prev_stage, "stream"),
)
# Relaunch the previous stage's agent so the rollback actually re-runs work.
# STAGE_AUTHORS maps a stage directly to the role that OWNS work in it
# (analysis->analyst, architecture->architect, ...), which is exactly the
# agent we must re-run on a rollback into prev_stage.
from ..plane_sync import STAGE_AUTHORS as _STAGE_AUTHORS
prev_agent = _STAGE_AUTHORS.get(prev_stage)
if prev_agent:
task_desc = (
f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\n"
f"Stage: {prev_stage}\nNote: Stakeholder REJECTED. Reason: {reason}\n"
f"Revise and improve."
)
new_job = enqueue_job(prev_agent, repo, task_desc, task_id=task_id)
logger.info(
f"Task {task_id}: rejected, rolled back {current_stage} \u2192 {prev_stage}, "
f"enqueued {prev_agent} (job_id={new_job})"
)
else:
logger.info(f"Task {task_id}: rejected, rolled back {current_stage} \u2192 {prev_stage}")
async def _try_advance_stage(
task_id: int, current_stage: str, repo: str, work_item_id: str, branch: str
):
"""Run QG check for current stage and advance if passed."""
qg_name = get_qg_for_stage(current_stage)
next_stage = get_next_stage(current_stage)
"""Thin async wrapper over the unified stage engine (ORCH-4 / M-3).
if not next_stage:
logger.info(f"Task {task_id}: already at terminal stage '{current_stage}'")
return
The QG dispatch (including the check_review_approved PR-by-branch logic) and
the advance/launch logic now live in src/stage_engine.advance_stage(), which
is synchronous. We run it off the event loop via asyncio.to_thread so there
is exactly one implementation shared with the launcher.
# Run QG check if one is required
if qg_name:
qg_func = QG_CHECKS.get(qg_name)
if not qg_func:
logger.error(f"QG function '{qg_name}' not found in registry")
return
finished_agent is None on this webhook path (a human Approved status change,
not a finished agent), so the agent-specific rollback branches inside the
engine intentionally do not trigger — the webhook path only runs the QG and
either advances or reports the failure.
"""
import asyncio
from ..stage_engine import advance_stage
# Determine args based on QG function
if qg_name in ("check_analysis_approved", "check_analysis_complete", "check_architecture_done", "check_tests_passed", "check_reviewer_verdict"):
# ORCH-2 / S-4: pass branch so artifacts are read from the task worktree.
passed, reason = qg_func(repo, work_item_id, branch)
elif qg_name in ("check_ci_green", "check_tests_local"):
passed, reason = qg_func(repo, branch)
elif qg_name == "check_review_approved":
# Find PR number by branch via Gitea API
import httpx as _httpx
from ..config import settings as _s
_owner = _s.gitea_owner
_url = f"{_s.gitea_url}/api/v1/repos/{_owner}/{repo}/pulls?state=open&limit=50"
_headers = {"Authorization": f"token {_s.gitea_token}"}
try:
_resp = _httpx.get(_url, headers=_headers, timeout=10)
_prs = _resp.json()
_pr_number = None
for _pr in _prs:
if _pr.get("head", {}).get("ref") == branch:
_pr_number = _pr["number"]
break
if _pr_number:
passed, reason = qg_func(repo, _pr_number)
else:
# No open PR but review file exists — check file-based
import os
from ..git_worktree import get_worktree_path as _gwp
_wt = _gwp(repo, branch) if os.path.isdir(_gwp(repo, branch)) else os.path.join(_s.repos_dir, repo)
_review_path = os.path.join(_wt, f"docs/work-items/{work_item_id}/12-review.md")
_review_path2 = os.path.join(_wt, f"docs/work-items/{work_item_id}/09-review.md")
if os.path.isfile(_review_path) or os.path.isfile(_review_path2):
passed, reason = True, "Review file exists (file-based approval)"
else:
passed, reason = False, "No open PR found and no review file"
except Exception as _e:
passed, reason = False, f"Error finding PR: {_e}"
else:
passed, reason = False, f"Unknown QG: {qg_name}"
if not passed:
notify_qg_failure(task_id, current_stage, qg_name, reason)
plane_notify_qg(work_item_id, current_stage, qg_name, reason)
return
# Advance stage
update_task_stage(task_id, next_stage)
notify_stage_change(task_id, current_stage, next_stage)
plane_notify_stage(work_item_id, current_stage, next_stage)
# Launch agent associated with the current stage's transition
agent = get_agent_for_stage(current_stage)
if agent:
try:
task_desc = f"Work item: {work_item_id}\nRepo: {repo}\nBranch: {branch}\nStage: {next_stage}"
run_id = launcher.launch(agent, repo, task_desc, task_id=task_id)
plane_notify_stage(work_item_id, current_stage, next_stage, agent)
logger.info(f"Task {task_id}: launched agent '{agent}', run_id={run_id}")
except Exception as e:
notify_error(task_id, f"Failed to launch agent '{agent}': {e}")
logger.error(f"Agent launch failed: {e}")
await asyncio.to_thread(
advance_stage,
task_id,
current_stage,
repo,
work_item_id,
branch,
None,
)
async def _create_gitea_branch(repo: str, branch: str):

73
tests/conftest.py Normal file
View File

@@ -0,0 +1,73 @@
"""Global pytest fixtures.
test(conftest): mute Telegram in ALL tests to stop prod leakage.
Background: a pytest run on prod was sending REAL Telegram messages to Slava,
because some tests (e.g. test_webhook_dedup advancing a stage) reach
notify_stage_change -> send_telegram, which reads the live .env
telegram_bot_token/chat_id and actually POSTs to Telegram.
This autouse fixture stubs send_telegram to a no-op for every test:
- "src.notifications.send_telegram" is the SOURCE. All the notify_* helpers in
notifications.py call the module-global send_telegram, and every other module
that does a *local* `from .notifications import send_telegram` inside a
function resolves it live at call time -> covered by patching the source.
- "src.stage_engine.send_telegram" is patched too, because stage_engine binds
send_telegram as a MODULE-LEVEL name (from .notifications import send_telegram
at import), so a patch of the source alone would not intercept its 3 direct
calls. webhooks/plane and launcher import it locally inside functions, so the
source patch already covers them; they are patched defensively with
raising=False anyway in case that ever changes.
raising=False so a module that doesn't (yet) expose the name never breaks setup.
"""
import pytest
@pytest.fixture(autouse=True)
def _no_telegram(monkeypatch):
_noop = lambda *a, **k: None # noqa: E731
# Source of truth (covers notifications.notify_* and all local re-imports).
monkeypatch.setattr("src.notifications.send_telegram", _noop, raising=False)
# Module-level binding in stage_engine (and defensive coverage elsewhere).
monkeypatch.setattr("src.stage_engine.send_telegram", _noop, raising=False)
monkeypatch.setattr("src.webhooks.plane.send_telegram", _noop, raising=False)
monkeypatch.setattr("src.agents.launcher.send_telegram", _noop, raising=False)
monkeypatch.setattr("src.queue_worker.send_telegram", _noop, raising=False)
yield
@pytest.fixture(autouse=True)
def _reset_webhook_secrets(monkeypatch):
"""Isolate settings singleton between test files (CI cross-file isolation).
settings is a process-wide Pydantic singleton read once at import. Different
test modules set env variables differently at import-time, so those values leak
across files when pytest collects them together (as CI does).
1. webhook secrets: reset to "" so HMAC is disabled by default. Tests that
intentionally test the 401 path (test_webhook_dedup.py:268,278) re-apply
their own monkeypatch AFTER this autouse fixture runs, which overrides the
reset for the duration of that one test only.
2. db_path: reset to the value from ORCH_DB_PATH env var (last written by the
last imported test module). Without this, test_webhook_dedup.py (imported
first, alphabetically) seeds settings.db_path = dedup.db, while
test_webhooks.py's setup_db fixture tries to remove test_orchestrator.db,
leaving the DB dirty across tests that share a branch name and causing
get_task_by_repo_branch() to return a stale row with the wrong stage.
Per-test monkeypatches in test_webhook_dedup.setup_db override this reset.
"""
import os
from src.webhooks import gitea as gitea_mod
from src.webhooks import plane as plane_mod
from src import db as db_mod
monkeypatch.setattr(gitea_mod.settings, "gitea_webhook_secret", "", raising=False)
monkeypatch.setattr(plane_mod.settings, "plane_webhook_secret", "", raising=False)
db_path_env = os.environ.get("ORCH_DB_PATH", "")
if db_path_env:
monkeypatch.setattr(db_mod.settings, "db_path", db_path_env, raising=False)
yield

View File

@@ -0,0 +1,74 @@
"""BUG C: analyst "artifacts ready" comment under the status-only model.
The comment must ask for the **Approved** status (not the obsolete
":approved:" reaction, not moving back to "In Progress") and link only the
docs that actually exist in the worktree.
"""
import os
import tempfile
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
def test_analyst_comment_asks_approved_with_links(monkeypatch, tmp_path):
from src import stage_engine as SE
# Worktree with only SOME of the candidate docs present.
wt = tmp_path / "wt"
docs = wt / "docs" / "work-items" / "ET-011"
docs.mkdir(parents=True)
for fname in ("00-business-request.md", "01-brd.md", "02-trz.md",
"03-acceptance-criteria.md", "04-test-plan.yaml"):
(docs / fname).write_text("x")
# 04b-ui-test-cases.md intentionally absent -> must NOT be linked
monkeypatch.setattr(SE, "get_worktree_path", lambda repo, branch: str(wt))
# public URL set -> links must be built from it (not gitea_url)
monkeypatch.setattr(SE.settings, "gitea_url", "http://localhost:3000")
monkeypatch.setattr(SE.settings, "gitea_public_url", "https://git.mva154.duckdns.org")
monkeypatch.setattr(SE.settings, "gitea_owner", "admin")
html = SE._build_analyst_ready_comment(
"enduro-trails", "ET-011", "feature/ET-011-gpx-upload-feature"
)
# text asks for the Approved STATUS, not the obsolete mechanisms
assert "Approved" in html
assert ":approved:" not in html
assert "In Progress" not in html
assert "Rejected" in html
# clickable links to docs that ACTUALLY exist
assert "<a href=" in html
base = ("https://git.mva154.duckdns.org/admin/enduro-trails/src/branch/"
"feature/ET-011-gpx-upload-feature/docs/work-items/ET-011/")
assert base + "01-brd.md" in html
assert base + "04-test-plan.yaml" in html
# the missing file is NOT invented
assert "04b-ui-test-cases.md" not in html
# internal git url must NOT appear in clickable links
assert "localhost:3000" not in html
def test_analyst_comment_falls_back_to_gitea_url(monkeypatch, tmp_path):
"""When gitea_public_url is empty, links fall back to gitea_url."""
from src import stage_engine as SE
wt = tmp_path / "wt"
docs = wt / "docs" / "work-items" / "ET-011"
docs.mkdir(parents=True)
(docs / "01-brd.md").write_text("x")
monkeypatch.setattr(SE, "get_worktree_path", lambda repo, branch: str(wt))
monkeypatch.setattr(SE.settings, "gitea_url", "http://localhost:3000")
monkeypatch.setattr(SE.settings, "gitea_public_url", "")
monkeypatch.setattr(SE.settings, "gitea_owner", "admin")
html = SE._build_analyst_ready_comment(
"enduro-trails", "ET-011", "feature/ET-011-gpx-upload-feature"
)
base = ("http://localhost:3000/admin/enduro-trails/src/branch/"
"feature/ET-011-gpx-upload-feature/docs/work-items/ET-011/")
assert base + "01-brd.md" in html

View File

@@ -7,6 +7,7 @@ Covers the audit-2026-06-02 fixes:
the YAML frontmatter only (no fragile substring matching).
"""
import os
import signal
import tempfile
import pytest
@@ -20,6 +21,7 @@ os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
from src.agents.launcher import AgentLauncher
from src.qg.checks import check_reviewer_verdict
from src.config import settings
# ---------------------------------------------------------------------------
@@ -138,3 +140,141 @@ class TestCheckReviewerVerdict:
passed, reason = check_reviewer_verdict("enduro-trails", "ET-999")
assert passed is False
assert "not found" in reason.lower()
# ---------------------------------------------------------------------------
# ORCH-7 (M-4): dead code removed
# ---------------------------------------------------------------------------
class TestDeadCodeRemoved:
"""M-4: _auto_merge_pr was never called (merge is the deployer's job) and is
removed. _ensure_pr (used by the auto-advance path) must stay."""
def test_auto_merge_pr_is_gone(self):
assert not hasattr(AgentLauncher, "_auto_merge_pr")
def test_ensure_pr_still_present(self):
assert hasattr(AgentLauncher, "_ensure_pr")
# ---------------------------------------------------------------------------
# ORCH-7 (M-2): configurable timeout + per-agent override
# ---------------------------------------------------------------------------
class TestResolveTimeout:
"""M-2: _resolve_timeout honours a per-agent JSON override, else the default."""
def test_default_when_no_override(self, monkeypatch):
monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
monkeypatch.setattr(settings, "agent_timeout_overrides_json", "")
assert AgentLauncher._resolve_timeout("developer") == 1800
assert AgentLauncher._resolve_timeout(None) == 1800
def test_override_for_specific_agent(self, monkeypatch):
monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
monkeypatch.setattr(
settings, "agent_timeout_overrides_json", '{"reviewer": 3600, "architect": 2700}'
)
assert AgentLauncher._resolve_timeout("reviewer") == 3600
assert AgentLauncher._resolve_timeout("architect") == 2700
# an agent not in the override map falls back to the default
assert AgentLauncher._resolve_timeout("developer") == 1800
def test_malformed_override_falls_back_to_default(self, monkeypatch):
monkeypatch.setattr(settings, "agent_timeout_seconds", 1800)
monkeypatch.setattr(settings, "agent_timeout_overrides_json", "{not-json")
# must not raise, must return the default
assert AgentLauncher._resolve_timeout("reviewer") == 1800
class TestWatchdogGracefulKill:
"""M-2: SIGTERM -> grace -> SIGKILL ordering, with graceful-exit short-circuit
and ProcessLookupError tolerance. The OS process is fully faked: we record the
signals sent and decide liveness from a script, so no real process is touched."""
def _patch_db(self, monkeypatch):
"""Stub get_db so _record_kill does not need a real DB."""
class _Conn:
def execute(self, *a, **k):
return self
def commit(self):
pass
def close(self):
pass
monkeypatch.setattr("src.agents.launcher.get_db", lambda: _Conn())
def test_sigterm_then_sigkill_after_grace(self, monkeypatch):
"""Process stays alive through the whole grace window -> SIGTERM then SIGKILL."""
self._patch_db(monkeypatch)
monkeypatch.setattr(settings, "agent_kill_grace_seconds", 1)
monkeypatch.setattr("src.agents.launcher.time.sleep", lambda s: None)
sent = []
def fake_kill(pid, sig):
sent.append(sig)
# signal 0 (liveness probe) -> always alive; never raise
return None
monkeypatch.setattr("src.agents.launcher.os.kill", fake_kill)
launcher = AgentLauncher()
launcher._watchdog(pid=4242, run_id=1, timeout=0, agent="developer")
assert signal.SIGTERM in sent
assert signal.SIGKILL in sent
# SIGTERM must come before SIGKILL
assert sent.index(signal.SIGTERM) < sent.index(signal.SIGKILL)
def test_graceful_exit_in_grace_skips_sigkill(self, monkeypatch):
"""Process dies during the grace window -> SIGKILL is NOT sent."""
self._patch_db(monkeypatch)
monkeypatch.setattr(settings, "agent_kill_grace_seconds", 5)
monkeypatch.setattr("src.agents.launcher.time.sleep", lambda s: None)
sent = []
state = {"alive": True, "probes": 0}
def fake_kill(pid, sig):
if sig == 0:
state["probes"] += 1
# die on the 2nd liveness probe (within grace)
if state["probes"] >= 2:
raise ProcessLookupError
return None
sent.append(sig)
return None
monkeypatch.setattr("src.agents.launcher.os.kill", fake_kill)
launcher = AgentLauncher()
launcher._watchdog(pid=4242, run_id=2, timeout=0, agent="developer")
assert signal.SIGTERM in sent
assert signal.SIGKILL not in sent
def test_already_dead_before_sigterm(self, monkeypatch):
"""Process already gone at SIGTERM -> ProcessLookupError tolerated, no SIGKILL,
and _record_kill is NOT called (the monitor's proc.wait owns the exit)."""
self._patch_db(monkeypatch)
monkeypatch.setattr("src.agents.launcher.time.sleep", lambda s: None)
sent = []
def fake_kill(pid, sig):
if sig == signal.SIGTERM:
raise ProcessLookupError
sent.append(sig)
return None
recorded = {"called": False}
monkeypatch.setattr(
AgentLauncher, "_record_kill",
staticmethod(lambda rid: recorded.__setitem__("called", True)),
)
monkeypatch.setattr("src.agents.launcher.os.kill", fake_kill)
launcher = AgentLauncher()
# must not raise
launcher._watchdog(pid=4242, run_id=3, timeout=0, agent="developer")
assert signal.SIGKILL not in sent
assert recorded["called"] is False

View File

@@ -0,0 +1,92 @@
"""L-2: tests for prune_run_logs (run-log rotation).
Verifies that old / surplus *.log files are removed while fresh logs, non-.log
files, the active log, and subdirectories are left intact. Function is
best-effort and must never raise.
"""
import os
import time
from src.agents.launcher import prune_run_logs
def _touch(path, age_days=0):
with open(path, "w") as f:
f.write("x")
mtime = time.time() - age_days * 86400
os.utime(path, (mtime, mtime))
return path
def test_old_logs_removed_fresh_kept(tmp_path):
runs = tmp_path
fresh = _touch(str(runs / "1.log"), age_days=1)
old = _touch(str(runs / "2.log"), age_days=40)
removed = prune_run_logs(str(runs), keep_days=30, keep_max=500)
assert removed == 1
assert os.path.exists(fresh)
assert not os.path.exists(old)
def test_non_log_files_untouched(tmp_path):
runs = tmp_path
old_log = _touch(str(runs / "stale.log"), age_days=99)
keep_txt = _touch(str(runs / "notes.txt"), age_days=99)
keep_db = _touch(str(runs / "orchestrator.db"), age_days=99)
prune_run_logs(str(runs), keep_days=30, keep_max=500)
assert not os.path.exists(old_log)
assert os.path.exists(keep_txt)
assert os.path.exists(keep_db)
def test_keep_max_retains_newest(tmp_path):
runs = tmp_path
# 5 logs, all recent (within keep_days), increasing age 0..4 days.
paths = []
for i in range(5):
paths.append(_touch(str(runs / f"{i}.log"), age_days=i))
removed = prune_run_logs(str(runs), keep_days=365, keep_max=2)
# Only the 2 newest (age 0, 1) survive.
assert removed == 3
assert os.path.exists(paths[0])
assert os.path.exists(paths[1])
for p in paths[2:]:
assert not os.path.exists(p)
def test_active_log_never_removed(tmp_path):
runs = tmp_path
active = _touch(str(runs / "active.log"), age_days=99)
other = _touch(str(runs / "other.log"), age_days=99)
removed = prune_run_logs(
str(runs), keep_days=30, keep_max=500, active_paths=[active]
)
assert removed == 1
assert os.path.exists(active)
assert not os.path.exists(other)
def test_subdirs_untouched(tmp_path):
runs = tmp_path
sub = runs / "sub.log"
sub.mkdir() # a directory that happens to end in .log
old_log = _touch(str(runs / "old.log"), age_days=99)
prune_run_logs(str(runs), keep_days=30, keep_max=500)
assert sub.is_dir()
assert not os.path.exists(old_log)
def test_missing_dir_is_noop(tmp_path):
missing = tmp_path / "does-not-exist"
# Must not raise.
assert prune_run_logs(str(missing)) == 0

187
tests/test_m6_sequence.py Normal file
View File

@@ -0,0 +1,187 @@
"""M-6: work_item_id derived from Plane sequence_id (source of truth = Plane).
Covers:
* fetch_issue_sequence_id returns int on a valid Plane response (mocked httpx);
* returns None on network error / missing field WITHOUT raising;
* handle_work_item_created uses prefix-NNN when seq is available, and falls
back to get_next_work_item_id when seq is None (Plane down => autonomy);
* find_issue_id no longer hardcodes 'ET-' and matches an arbitrary prefix
(e.g. ORCH-005) by sequence_id.
"""
import os
import tempfile
import pytest
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_m6.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
from unittest.mock import patch, AsyncMock, MagicMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
import src.plane_sync as plane_sync # noqa: E402
ORCH_PLANE_ID = "8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a"
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}},'
f' {{"plane_project_id": "{ORCH_PLANE_ID}", "repo": "orchestrator",'
f' "work_item_prefix": "ORCH", "name": "orchestrator"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
def _mock_resp(json_body, status=200):
m = MagicMock()
m.json.return_value = json_body
m.raise_for_status.return_value = None
if status >= 400:
def _raise():
raise RuntimeError(f"HTTP {status}")
m.raise_for_status.side_effect = _raise
return m
# ---------------------------------------------------------------------------
# fetch_issue_sequence_id
# ---------------------------------------------------------------------------
def test_fetch_sequence_id_returns_int():
with patch.object(plane_sync.httpx, "get", return_value=_mock_resp({"sequence_id": 42})):
seq = plane_sync.fetch_issue_sequence_id("issue-uuid", "proj-uuid")
assert seq == 42
assert isinstance(seq, int)
def test_fetch_sequence_id_network_error_returns_none():
with patch.object(plane_sync.httpx, "get", side_effect=RuntimeError("connection refused")):
seq = plane_sync.fetch_issue_sequence_id("issue-uuid", "proj-uuid")
assert seq is None # must not raise
def test_fetch_sequence_id_missing_field_returns_none():
with patch.object(plane_sync.httpx, "get", return_value=_mock_resp({"error": "not found"})):
seq = plane_sync.fetch_issue_sequence_id("missing-uuid", "proj-uuid")
assert seq is None
# ---------------------------------------------------------------------------
# handle_work_item_created: seq available -> prefix-NNN
# ---------------------------------------------------------------------------
# Feature 1: pipeline starts on a status change to In Progress, not on creation.
_IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
def _post(plane_id, plane_project_id=ORCH_PLANE_ID, name="A valid work item title"):
return client.post(
"/webhook/plane",
json={
"event": "issue",
"action": "updated",
"data": {
"id": plane_id,
"name": name,
"description_stripped": "This is a sufficiently long description.",
"project": plane_project_id,
"state": {"id": _IN_PROGRESS, "name": "In Progress", "group": "started"},
},
},
)
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=7)
def test_created_uses_plane_sequence_id(mock_fetch, mock_branch, mock_docs, mock_launcher):
mock_launcher.launch.return_value = 1
resp = _post("seq-issue")
assert resp.status_code == 200
conn = get_db()
task = conn.execute("SELECT work_item_id FROM tasks WHERE plane_id='seq-issue'").fetchone()
conn.close()
assert task is not None
assert task["work_item_id"] == "ORCH-007"
mock_fetch.assert_called_once()
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=None)
@patch("src.webhooks.plane.get_next_work_item_id", return_value="ORCH-099")
def test_created_falls_back_to_db_when_plane_down(
mock_next, mock_fetch, mock_branch, mock_docs, mock_launcher
):
"""Plane unavailable (seq=None) => fall back to DB increment; task still created."""
mock_launcher.launch.return_value = 1
resp = _post("fallback-issue")
assert resp.status_code == 200
conn = get_db()
task = conn.execute("SELECT work_item_id FROM tasks WHERE plane_id='fallback-issue'").fetchone()
conn.close()
assert task is not None # autonomy: Plane down does not block creation
assert task["work_item_id"] == "ORCH-099"
mock_next.assert_called_once()
# ---------------------------------------------------------------------------
# find_issue_id: no hardcoded ET- prefix, matches arbitrary prefix by seq
# ---------------------------------------------------------------------------
def test_find_issue_id_matches_arbitrary_prefix_by_sequence():
"""ORCH-005 must resolve via the issue whose sequence_id == 5 (no ET- assumption)."""
issues = {"results": [
{"id": "uuid-a", "sequence_id": 3, "name": "something"},
{"id": "uuid-b", "sequence_id": 5, "name": "ORCH-005: target"},
{"id": "uuid-c", "sequence_id": 9, "name": "other"},
]}
# No DB row for this work_item_id => goes to the Plane API search branch.
with patch.object(plane_sync.httpx, "get", return_value=_mock_resp(issues)):
found = plane_sync.find_issue_id("ORCH-005", project_id="proj-uuid")
assert found == "uuid-b"
def test_find_issue_id_matches_et_prefix_too():
"""Backward compat: ET-002 still resolves by sequence_id == 2."""
issues = {"results": [
{"id": "uuid-x", "sequence_id": 2, "name": "ET item"},
{"id": "uuid-y", "sequence_id": 7, "name": "other"},
]}
with patch.object(plane_sync.httpx, "get", return_value=_mock_resp(issues)):
found = plane_sync.find_issue_id("ET-002", project_id="proj-uuid")
assert found == "uuid-x"

View File

@@ -0,0 +1,213 @@
"""Tests for the two pipeline-start bugs surfaced by the ET-006 live run.
BUG 1: issue.updated (status -> In Progress) ships a payload WITHOUT the
description, so start_pipeline must pull it from the Plane issue API
before QG-0 runs (otherwise QG-0 wrongly blocks the issue).
BUG 2a: M-6 derives work_item_id from the Plane sequence_id, which can collide.
ensure_unique_work_item_id() must hand out the next FREE id instead of
reusing one that is already in the tasks table.
BUG 2b: two tasks with an (artificially) identical work_item_id must not share a
branch/worktree.
launcher / Gitea / Plane network are mocked. Real FastAPI endpoint via
TestClient for the BUG 1 end-to-end path.
"""
import os
import tempfile
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_pipeline_bugs.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
import pytest # noqa: E402
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db, ensure_unique_work_item_id # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
from src.git_worktree import get_worktree_path # noqa: E402
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
BACKLOG = "113b24f6-cce8-4be9-9a22-a359b9cf0122"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
def _insert_task(work_item_id, branch, plane_id="x"):
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) "
"VALUES (?, ?, ?, ?, ?, ?)",
(plane_id, work_item_id, "enduro-trails", branch, "analysis", plane_id),
)
conn.commit()
conn.close()
def _count(plane_id):
conn = get_db()
n = conn.execute("SELECT COUNT(*) FROM tasks WHERE plane_id=?", (plane_id,)).fetchone()[0]
conn.close()
return n
def _task(plane_id):
conn = get_db()
row = conn.execute("SELECT * FROM tasks WHERE plane_id=?", (plane_id,)).fetchone()
conn.close()
return row
# --------------------------------------------------------------------------- #
# BUG 1
# --------------------------------------------------------------------------- #
def _to_in_progress_no_desc(plane_id="bug1"):
"""issue.updated payload WITHOUT description (only changed fields)."""
return client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": plane_id, "name": "A valid backlog item title",
# NO description / description_stripped here, exactly like Plane sends
# on a status change.
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": BACKLOG},
})
@patch("src.webhooks.plane.enqueue_job", return_value=1)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=42)
@patch("src.plane_sync.fetch_issue_fields",
return_value=("A valid backlog item title",
"This is a sufficiently long description fetched from Plane API."))
def test_status_start_fetches_description(
mock_fields, mock_seq, mock_branch, mock_docs, mock_enqueue
):
"""BUG 1: empty description in payload -> start_pipeline pulls it from the
Plane API (single fetch_issue_fields GET) -> QG-0 passes -> task created +
analyst enqueued (NOT blocked)."""
resp = _to_in_progress_no_desc("bug1")
assert resp.status_code == 200
# name + description were pulled from the API in one call
mock_fields.assert_called_once()
# QG-0 passed -> task created and analyst launched (NOT set_issue_blocked)
assert _count("bug1") == 1
assert _task("bug1")["stage"] == "analysis"
mock_enqueue.assert_called_once()
assert mock_enqueue.call_args.args[0] == "analyst"
@patch("src.webhooks.plane.enqueue_job", return_value=1)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=42)
@patch("src.plane_sync.fetch_issue_fields", return_value=("", ""))
def test_status_start_empty_api_still_blocks(
mock_fields, mock_seq, mock_branch, mock_docs, mock_enqueue
):
"""BUG 1 negative path: if the API also returns empty, QG-0 legitimately
fails -> NO task is created (truly empty ticket)."""
resp = _to_in_progress_no_desc("bug1-empty")
assert resp.status_code == 200
mock_fields.assert_called_once()
assert _count("bug1-empty") == 0
mock_enqueue.assert_not_called()
# --------------------------------------------------------------------------- #
# BUG 2a
# --------------------------------------------------------------------------- #
def test_work_item_id_uniqueness():
"""BUG 2a: if ET-006 is already in tasks, the guard returns the next free
id (ET-007), not ET-006 again."""
_insert_task("ET-006", "feature/ET-006-gpx-upload", plane_id="old")
assert ensure_unique_work_item_id("ET-006", "enduro-trails") == "ET-007"
# ET-006 AND ET-007 taken -> next free is ET-008.
_insert_task("ET-007", "feature/ET-007-something", plane_id="old2")
assert ensure_unique_work_item_id("ET-006", "enduro-trails") == "ET-008"
# A free id is returned unchanged.
assert ensure_unique_work_item_id("ET-099", "enduro-trails") == "ET-099"
# Per-repo isolation: a different repo with the same id is not a collision.
assert ensure_unique_work_item_id("ET-006", "other-repo") == "ET-006"
@patch("src.webhooks.plane.enqueue_job", return_value=1)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=6)
@patch("src.plane_sync.fetch_issue_fields",
return_value=("Popup enduro trails feature",
"A sufficiently long description for QG-0 to pass cleanly."))
def test_collision_reassigns_in_start_pipeline(
mock_fields, mock_seq, mock_branch, mock_docs, mock_enqueue
):
"""BUG 2a end-to-end: ET-006 already exists -> a new In Progress issue whose
Plane sequence_id is also 6 must NOT reuse ET-006."""
_insert_task("ET-006", "feature/ET-006-gpx-upload", plane_id="task8")
resp = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "task25", "name": "Popup enduro trails feature",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": BACKLOG},
})
assert resp.status_code == 200
new_id = _task("task25")["work_item_id"]
assert new_id != "ET-006"
assert new_id == "ET-007"
# --------------------------------------------------------------------------- #
# BUG 2b
# --------------------------------------------------------------------------- #
def test_worktree_per_task():
"""BUG 2b: two tasks must not resolve to the same worktree path. With the
uniqueness guard the branches differ, so the worktree paths differ too."""
_insert_task("ET-006", "feature/ET-006-gpx-upload", plane_id="task8")
# The second task gets a unique id via the guard...
new_id = ensure_unique_work_item_id("ET-006", "enduro-trails")
assert new_id == "ET-007"
branch_a = "feature/ET-006-gpx-upload"
branch_b = f"feature/{new_id}-popup-enduro-trails"
wt_a = get_worktree_path("enduro-trails", branch_a)
wt_b = get_worktree_path("enduro-trails", branch_b)
assert wt_a != wt_b, "two tasks must not share a worktree path"

View File

@@ -0,0 +1,99 @@
"""Tests for per-agent Plane comment authorship (feat: per-agent bot author).
Covers:
* _headers_for: role -> bot token; None/unknown/empty token -> shared fallback.
* add_comment: author is propagated into the POST headers; no author keeps
backward-compatible behaviour (shared orchestrator token).
GET/PATCH calls are intentionally NOT covered here: they stay on the shared
token by design and are unchanged by this feature.
"""
import os
# Set env defaults before importing app modules (same convention as the other
# suites) so config/settings load cleanly without a real .env.
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "shared-token")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
from unittest.mock import patch, MagicMock # noqa: E402
from src import plane_sync # noqa: E402
# --------------------------------------------------------------------------- #
# _headers_for
# --------------------------------------------------------------------------- #
def test_headers_for_known_role_uses_bot_token():
"""A known role with a configured token -> that bot's X-API-Key."""
with patch.dict(plane_sync.PLANE_BOT_TOKENS, {"analyst": "analyst-tok"}, clear=False):
assert plane_sync._headers_for("analyst") == {"X-API-Key": "analyst-tok"}
def test_headers_for_none_falls_back_to_shared():
"""author=None -> shared orchestrator headers."""
assert plane_sync._headers_for(None) is plane_sync.PLANE_HEADERS
def test_headers_for_unknown_role_falls_back_to_shared():
"""Unknown role -> shared orchestrator headers."""
assert plane_sync._headers_for("nope") is plane_sync.PLANE_HEADERS
def test_headers_for_empty_token_falls_back_to_shared():
"""Known role but empty/unconfigured token -> shared orchestrator headers."""
with patch.dict(plane_sync.PLANE_BOT_TOKENS, {"tester": ""}, clear=False):
assert plane_sync._headers_for("tester") is plane_sync.PLANE_HEADERS
def test_headers_for_empty_string_author_falls_back_to_shared():
"""author='' -> shared orchestrator headers."""
assert plane_sync._headers_for("") is plane_sync.PLANE_HEADERS
# --------------------------------------------------------------------------- #
# add_comment
# --------------------------------------------------------------------------- #
def _mock_post_ok():
resp = MagicMock()
resp.raise_for_status.return_value = None
return resp
def test_add_comment_with_author_posts_with_bot_headers():
"""add_comment(author='developer') -> httpx.post called with the developer
bot's X-API-Key header."""
with patch.object(plane_sync, "find_issue_id", return_value="issue-uuid"), \
patch.object(plane_sync, "_resolve_project_id", return_value="proj-uuid"), \
patch.dict(plane_sync.PLANE_BOT_TOKENS, {"developer": "dev-tok"}, clear=False), \
patch.object(plane_sync.httpx, "post", return_value=_mock_post_ok()) as mock_post:
plane_sync.add_comment("ET-001", "hello", author="developer")
assert mock_post.called
_, kwargs = mock_post.call_args
assert kwargs["headers"] == {"X-API-Key": "dev-tok"}
def test_add_comment_without_author_uses_shared_token():
"""add_comment without author -> shared orchestrator headers (backward
compatible)."""
with patch.object(plane_sync, "find_issue_id", return_value="issue-uuid"), \
patch.object(plane_sync, "_resolve_project_id", return_value="proj-uuid"), \
patch.object(plane_sync.httpx, "post", return_value=_mock_post_ok()) as mock_post:
plane_sync.add_comment("ET-001", "hello")
assert mock_post.called
_, kwargs = mock_post.call_args
assert kwargs["headers"] is plane_sync.PLANE_HEADERS
def test_add_comment_unknown_author_uses_shared_token():
"""add_comment with an unknown role -> shared orchestrator headers."""
with patch.object(plane_sync, "find_issue_id", return_value="issue-uuid"), \
patch.object(plane_sync, "_resolve_project_id", return_value="proj-uuid"), \
patch.object(plane_sync.httpx, "post", return_value=_mock_post_ok()) as mock_post:
plane_sync.add_comment("ET-001", "hello", author="ghost")
assert mock_post.called
_, kwargs = mock_post.call_args
assert kwargs["headers"] is plane_sync.PLANE_HEADERS

188
tests/test_plane_webhook.py Normal file
View File

@@ -0,0 +1,188 @@
"""ORCH-6: Plane webhook project-filter + repo-resolution tests.
Verifies the core of the 2026-06-02 incident fix:
* webhook from an UNKNOWN Plane project -> {"status": "ignored"} and no task
* webhook from the orchestrator project -> task created with repo=orchestrator
* webhook from the enduro project -> task created with repo=enduro-trails
launcher.launch is mocked so no real agents are spawned. Gitea branch/doc
creation is mocked (network). FastAPI TestClient drives the real endpoint.
This module configures its own registry via monkeypatch + reload_projects so it
is independent of ORCH_PROJECTS_JSON set by other test modules.
"""
import os
import tempfile
import pytest
# Test DB / disable signature checks (same convention as test_webhooks.py).
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_plane.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
ORCH_PLANE_ID = "8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a"
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
UNKNOWN_PLANE_ID = "deadbeef-0000-0000-0000-000000000000"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
"""Fresh DB + a known two-project registry for each test."""
# settings.db_path is resolved once at import; force it to our isolated DB so
# this suite is independent of whichever test module imported config first.
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
# The webhook signature secret may be baked into the runtime env; this suite
# focuses on the project filter, so bypass signature verification.
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}},'
f' {{"plane_project_id": "{ORCH_PLANE_ID}", "repo": "orchestrator",'
f' "work_item_prefix": "ORCH", "name": "orchestrator"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
yield
reload_projects() # restore from env
if os.path.exists(_test_db):
os.unlink(_test_db)
# Feature 1: the pipeline now starts on a status change to In Progress (not on
# creation). _post_created drives that status-change event so these ORCH-6
# routing tests still exercise task creation through the new trigger.
_IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
def _post_created(plane_project_id, plane_id="wi-1", name="A valid work item title"):
return client.post(
"/webhook/plane",
json={
"event": "issue",
"action": "updated",
"data": {
"id": plane_id,
"name": name,
"description_stripped": "This is a sufficiently long description.",
"project": plane_project_id,
"state": {"id": _IN_PROGRESS, "name": "In Progress", "group": "started"},
},
},
)
# ---------------------------------------------------------------------------
# Filter: unknown project is ignored, no side effects
# ---------------------------------------------------------------------------
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
def test_unknown_project_ignored(mock_branch, mock_docs, mock_launcher):
resp = _post_created(UNKNOWN_PLANE_ID, plane_id="ignore-me")
assert resp.status_code == 200
assert resp.json()["status"] == "ignored"
assert resp.json().get("reason") == "unknown project"
# No task, no branch, no agent.
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id='ignore-me'").fetchone()
conn.close()
assert task is None
mock_branch.assert_not_called()
mock_launcher.launch.assert_not_called()
# ---------------------------------------------------------------------------
# orchestrator project -> repo=orchestrator, prefix ORCH
# ---------------------------------------------------------------------------
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
def test_orchestrator_project_routes_to_orchestrator_repo(mock_branch, mock_docs, mock_launcher):
mock_launcher.launch.return_value = 1
resp = _post_created(ORCH_PLANE_ID, plane_id="orch-1")
assert resp.status_code == 200
assert resp.json()["status"] == "accepted"
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id='orch-1'").fetchone()
conn.close()
assert task is not None
assert task["repo"] == "orchestrator"
assert task["work_item_id"].startswith("ORCH-")
assert task["stage"] == "analysis"
# Branch created against the orchestrator repo.
args = mock_branch.call_args.args
assert args[0] == "orchestrator"
# ---------------------------------------------------------------------------
# enduro project -> repo=enduro-trails, prefix ET
# ---------------------------------------------------------------------------
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
def test_enduro_project_routes_to_enduro_repo(mock_branch, mock_docs, mock_launcher):
mock_launcher.launch.return_value = 1
resp = _post_created(ENDURO_PLANE_ID, plane_id="et-1")
assert resp.status_code == 200
assert resp.json()["status"] == "accepted"
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id='et-1'").fetchone()
conn.close()
assert task is not None
assert task["repo"] == "enduro-trails"
assert task["work_item_id"].startswith("ET-")
args = mock_branch.call_args.args
assert args[0] == "enduro-trails"
# ---------------------------------------------------------------------------
# prefixes are independent per repo (ORCH-001 vs ET-001 in parallel)
# ---------------------------------------------------------------------------
@patch("src.webhooks.plane.launcher")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
def test_prefixes_independent_per_project(mock_branch, mock_docs, mock_launcher):
mock_launcher.launch.return_value = 1
_post_created(ORCH_PLANE_ID, plane_id="o1", name="Orchestrator item one")
_post_created(ENDURO_PLANE_ID, plane_id="e1", name="Enduro item one")
_post_created(ORCH_PLANE_ID, plane_id="o2", name="Orchestrator item two")
conn = get_db()
rows = {r["plane_id"]: r["work_item_id"] for r in
conn.execute("SELECT plane_id, work_item_id FROM tasks").fetchall()}
conn.close()
assert rows["o1"] == "ORCH-001"
assert rows["o2"] == "ORCH-002"
assert rows["e1"] == "ET-001"

177
tests/test_projects.py Normal file
View File

@@ -0,0 +1,177 @@
"""ORCH-6: tests for the project registry (src/projects.py).
Covers resolvers (by plane_id, by repo, unknown -> None, known ids) against the
built-in default registry, plus ORCH_PROJECTS_JSON parsing (valid + malformed
-> default fallback).
The pure parser ``_parse_projects_json`` is tested directly so we don't mutate
the module-global registry. Resolver tests run against the default registry; if
another test (e.g. test_webhooks) set ORCH_PROJECTS_JSON in the env, we restore
the default via monkeypatch + reload_projects to keep this file order-independent.
"""
import pytest
from src import projects as P
from src.projects import (
ProjectConfig,
get_project_by_plane_id,
get_project_by_repo,
known_plane_project_ids,
reload_projects,
_parse_projects_json,
_DEFAULT_PROJECTS,
)
# Known ids from the default registry / task spec.
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
ORCH_PLANE_ID = "8da6aa25-a60e-44d6-a1e2-d8ae59aa7d6a"
@pytest.fixture
def default_registry(monkeypatch):
"""Force the default (built-in) registry regardless of ORCH_PROJECTS_JSON
that other test modules may have set in the process env."""
monkeypatch.setattr(P.settings, "projects_json", "")
reload_projects()
yield
# Restore from current settings (whatever env says) after the test.
reload_projects()
# ---------------------------------------------------------------------------
# Resolvers
# ---------------------------------------------------------------------------
def test_get_project_by_plane_id_orchestrator(default_registry):
proj = get_project_by_plane_id(ORCH_PLANE_ID)
assert proj is not None
assert proj.repo == "orchestrator"
assert proj.work_item_prefix == "ORCH"
assert proj.plane_project_id == ORCH_PLANE_ID
def test_get_project_by_plane_id_enduro(default_registry):
proj = get_project_by_plane_id(ENDURO_PLANE_ID)
assert proj is not None
assert proj.repo == "enduro-trails"
assert proj.work_item_prefix == "ET"
def test_get_project_by_plane_id_unknown_returns_none(default_registry):
assert get_project_by_plane_id("00000000-0000-0000-0000-000000000000") is None
def test_get_project_by_plane_id_empty_returns_none(default_registry):
assert get_project_by_plane_id("") is None
assert get_project_by_plane_id(None) is None
def test_get_project_by_repo(default_registry):
assert get_project_by_repo("enduro-trails").work_item_prefix == "ET"
assert get_project_by_repo("orchestrator").work_item_prefix == "ORCH"
def test_get_project_by_repo_unknown_returns_none(default_registry):
assert get_project_by_repo("does-not-exist") is None
assert get_project_by_repo("") is None
assert get_project_by_repo(None) is None
def test_known_plane_project_ids(default_registry):
ids = known_plane_project_ids()
assert isinstance(ids, set)
assert ENDURO_PLANE_ID in ids
assert ORCH_PLANE_ID in ids
assert len(ids) == len(_DEFAULT_PROJECTS)
# ---------------------------------------------------------------------------
# ORCH_PROJECTS_JSON parsing (pure function, no global mutation)
# ---------------------------------------------------------------------------
def test_parse_empty_returns_none():
assert _parse_projects_json("") is None
assert _parse_projects_json(" ") is None
assert _parse_projects_json(None) is None
def test_parse_valid_json():
raw = (
'[{"plane_project_id": "p-1", "repo": "repo-a", '
'"work_item_prefix": "AAA", "name": "Alpha"}]'
)
parsed = _parse_projects_json(raw)
assert parsed is not None
assert len(parsed) == 1
assert isinstance(parsed[0], ProjectConfig)
assert parsed[0].plane_project_id == "p-1"
assert parsed[0].repo == "repo-a"
assert parsed[0].work_item_prefix == "AAA"
assert parsed[0].name == "Alpha"
def test_parse_valid_json_multiple():
raw = (
'[{"plane_project_id": "p-1", "repo": "repo-a", "work_item_prefix": "A"},'
' {"plane_project_id": "p-2", "repo": "repo-b", "work_item_prefix": "B"}]'
)
parsed = _parse_projects_json(raw)
assert len(parsed) == 2
# name defaults to repo when omitted
assert parsed[0].name == "repo-a"
assert parsed[1].repo == "repo-b"
def test_parse_malformed_json_returns_none():
assert _parse_projects_json("{not valid json") is None
assert _parse_projects_json("[}") is None
def test_parse_not_an_array_returns_none():
# A JSON object (not array) is invalid -> fallback.
assert _parse_projects_json('{"plane_project_id": "p-1"}') is None
def test_parse_skips_bad_entries_keeps_good():
raw = (
'[{"repo": "missing-id"},' # missing required key -> skipped
' {"plane_project_id": "p-2", "repo": "repo-b", "work_item_prefix": "B"}]'
)
parsed = _parse_projects_json(raw)
assert parsed is not None
assert len(parsed) == 1
assert parsed[0].plane_project_id == "p-2"
def test_parse_all_bad_entries_returns_none():
# No valid entries -> None (fallback to default).
assert _parse_projects_json('[{"repo": "no-id"}, "not-an-object"]') is None
def test_reload_from_custom_json(monkeypatch):
"""End-to-end: set settings.projects_json, reload, resolvers reflect it."""
custom = (
'[{"plane_project_id": "custom-uuid", "repo": "custom-repo", '
'"work_item_prefix": "CUS", "name": "Custom"}]'
)
monkeypatch.setattr(P.settings, "projects_json", custom)
reload_projects()
try:
assert get_project_by_plane_id("custom-uuid").repo == "custom-repo"
assert get_project_by_repo("custom-repo").work_item_prefix == "CUS"
assert known_plane_project_ids() == {"custom-uuid"}
# The built-in defaults must NOT be present when JSON overrides.
assert get_project_by_plane_id(ENDURO_PLANE_ID) is None
finally:
reload_projects()
def test_reload_invalid_json_falls_back_to_default(monkeypatch):
monkeypatch.setattr(P.settings, "projects_json", "{garbage")
reload_projects()
try:
assert get_project_by_plane_id(ENDURO_PLANE_ID) is not None
assert get_project_by_plane_id(ORCH_PLANE_ID) is not None
finally:
reload_projects()

View File

@@ -17,7 +17,10 @@ from src.qg.checks import (
check_ci_green,
check_review_approved,
check_tests_passed,
check_tests_local,
check_deploy_status,
)
from src.stages import get_qg_for_stage
@pytest.fixture(autouse=True)
@@ -164,25 +167,284 @@ class TestCheckReviewApproved:
class TestCheckTestsPassed:
def test_report_with_pass(self, setup_work_item_dir):
repo_dir = setup_work_item_dir
wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
wi_dir.mkdir(parents=True)
(wi_dir / "13-test-report.md").write_text("# Test Report\n\nResult: PASS\n")
"""ET-013 fix: testing -> deploy gate reads the tester's MACHINE-READABLE verdict
in 13-test-report.md frontmatter (verdict:/status:), NOT a substring of the body.
Mirrors check_reviewer_verdict / check_deploy_status. The old `if "PASS" in content`
let a `verdict: BLOCKED` report whose prose said "23 passed"/"✅ PASS" pass the gate,
shipping an unfinished feature to Done."""
def _write(self, repo_dir, content, wi="ET-001"):
wi_dir = repo_dir / "docs" / "work-items" / wi
wi_dir.mkdir(parents=True)
(wi_dir / "13-test-report.md").write_text(content)
def test_verdict_pass_passes(self, setup_work_item_dir):
# Most common real form (ET-001/002/005/009/011/012/014).
self._write(
setup_work_item_dir,
"---\ntype: test-report\nverdict: PASS\nstatus: pass\n---\n\n# Test Report\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is True
assert "PASS" in reason
def test_verdict_pass_ready_to_deploy_passes(self, setup_work_item_dir):
# ET-007 real form: "PASS — ready-to-deploy".
self._write(
setup_work_item_dir,
"---\nverdict: PASS — ready-to-deploy\nstatus: PASS\n---\n\nbody\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is True
def test_report_without_pass(self, setup_work_item_dir):
repo_dir = setup_work_item_dir
wi_dir = repo_dir / "docs" / "work-items" / "ET-001"
wi_dir.mkdir(parents=True)
(wi_dir / "13-test-report.md").write_text("# Test Report\n\nResult: FAIL\n")
def test_verdict_ready_to_deploy_with_status_passed_passes(self, setup_work_item_dir):
# ET-006 real form: verdict has no PASS word, but status: PASSED.
self._write(
setup_work_item_dir,
"---\nverdict: ready-to-deploy\nstatus: PASSED\n---\n\nbody\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is True
def test_verdict_stage_ready_to_deploy_with_status_pass_passes(self, setup_work_item_dir):
# ET-008 real form: verdict: stage:ready-to-deploy, status: pass.
self._write(
setup_work_item_dir,
"---\nverdict: stage:ready-to-deploy\nstatus: pass\n---\n\nbody\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is True
def test_blocked_verdict_with_pass_in_body_fails(self, setup_work_item_dir):
# THE ET-013 BUG: verdict BLOCKED but body is full of "PASS"/"passed".
self._write(
setup_work_item_dir,
"---\ntype: test-report\nstatus: blocked\nverdict: BLOCKED\n---\n\n"
"23 passed\n✅ PASS (часть AC-18)\nAll checks passed\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
assert "BLOCKED" in reason
def test_failed_verdict_fails(self, setup_work_item_dir):
self._write(
setup_work_item_dir,
"---\nverdict: FAILED\nstatus: failed\n---\n\nbody\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
assert "FAILED" in reason
def test_passed_count_in_body_but_blocked_verdict_fails(self, setup_work_item_dir):
# Body says "23 passed" but frontmatter verdict BLOCKED -> substring no longer fools.
self._write(
setup_work_item_dir,
"---\nverdict: BLOCKED\n---\n\nTests: 23 passed, 0 failed.\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
def test_no_frontmatter_fails(self, setup_work_item_dir):
# Old format / prose only -> no machine verdict -> fail.
self._write(
setup_work_item_dir,
"# Test Report\n\nResult: PASS\nAll tests passed.\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
def test_no_verdict_field_fails(self, setup_work_item_dir):
# Frontmatter present but neither verdict nor status -> fail.
self._write(
setup_work_item_dir,
"---\ntype: test-report\nversion: 1\n---\n\nResult: PASS\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
def test_invalid_yaml_fails_no_exception(self, setup_work_item_dir):
# Broken YAML frontmatter -> False with reason, never raises.
self._write(
setup_work_item_dir,
"---\nverdict: [unclosed\n : : :\n---\n\nbody PASS\n",
)
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
assert "YAML" in reason or "frontmatter" in reason.lower()
def test_no_report(self, setup_work_item_dir):
passed, reason = check_tests_passed("enduro-trails", "ET-001")
assert passed is False
assert "not found" in reason.lower()
class TestCheckDeployStatus:
"""BUG 8: deploy -> done must be gated on the deployer's machine-readable
deploy_status verdict in 14-deploy-log.md frontmatter, NOT the LLM exit code
(always 0). Mirrors check_reviewer_verdict (reads ONLY the frontmatter field)."""
def _write_log(self, repo_dir, content):
wi_dir = repo_dir / "docs" / "work-items" / "ET-011"
wi_dir.mkdir(parents=True)
(wi_dir / "14-deploy-log.md").write_text(content)
def test_success_verdict_passes(self, setup_work_item_dir):
self._write_log(
setup_work_item_dir,
"---\ndeploy_status: SUCCESS\nversion: v0.0.3\n---\n\nDeployed OK.\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is True
assert "SUCCESS" in reason
def test_failed_verdict_fails(self, setup_work_item_dir):
self._write_log(
setup_work_item_dir,
"---\ndeploy_status: FAILED\nversion: v0.0.3\n---\n\npermission denied.\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is False
assert "FAILED" in reason
def test_no_file_fails(self, setup_work_item_dir):
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is False
assert "not found" in reason.lower()
def test_no_field_fails(self, setup_work_item_dir):
# Frontmatter present but no deploy_status field -> must NOT pass.
self._write_log(
setup_work_item_dir,
"---\nversion: v0.0.3\n---\n\nStatus: FAILED (prose only).\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is False
def test_prose_only_no_frontmatter_fails(self, setup_work_item_dir):
# Prose mentioning SUCCESS but no machine-readable frontmatter -> fail.
self._write_log(
setup_work_item_dir,
"# Deploy log\n\nStatus: SUCCESS (prose, not frontmatter).\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is False
# --- ET-013 path-sync fix: log written to origin/main via separate PR ---
def test_origin_main_success_passes_when_absent_in_worktree(self, monkeypatch):
# Deployer merged 14-deploy-log.md into main via a separate PR; it is NOT
# in the feature worktree. Gate must recover it from origin/main -> PASS.
# (This is the exact ET-013 regression.)
monkeypatch.setattr(
"src.qg.checks._deploy_log_from_main",
lambda repo, wi: "---\ndeploy_status: SUCCESS\nversion: v0.0.5\n---\n\nLive.\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-013")
assert passed is True
assert "SUCCESS" in reason
def test_origin_main_failed_fails(self, monkeypatch):
# A genuine FAILED log in main must still fail.
monkeypatch.setattr(
"src.qg.checks._deploy_log_from_main",
lambda repo, wi: "---\ndeploy_status: FAILED\nversion: v0.0.5\n---\n\nboom.\n",
)
passed, reason = check_deploy_status("enduro-trails", "ET-013")
assert passed is False
assert "FAILED" in reason
def test_absent_everywhere_fails(self, monkeypatch):
# Not in worktree and origin/main lookup yields nothing -> not found.
monkeypatch.setattr(
"src.qg.checks._deploy_log_from_main", lambda repo, wi: None
)
passed, reason = check_deploy_status("enduro-trails", "ET-013")
assert passed is False
assert "not found" in reason.lower()
@patch("src.qg.checks.subprocess.run")
@patch("src.qg.checks.os.path.isdir", return_value=True)
def test_fetch_failure_degrades_no_exception(self, mock_isdir, mock_run):
# git fetch/show raising (e.g. network) must degrade to "not found",
# never propagate an exception out of the gate.
import subprocess as _sp
mock_run.side_effect = _sp.TimeoutExpired(cmd="git", timeout=30)
passed, reason = check_deploy_status("enduro-trails", "ET-013")
assert passed is False
assert "not found" in reason.lower()
def test_worktree_log_short_circuits_main_lookup(self, setup_work_item_dir, monkeypatch):
# If the log IS present in the worktree, origin/main must NOT be consulted.
self._write_log(
setup_work_item_dir,
"---\ndeploy_status: SUCCESS\nversion: v0.0.3\n---\n\nDeployed OK.\n",
)
called = {"n": 0}
def _boom(repo, wi):
called["n"] += 1
return None
monkeypatch.setattr("src.qg.checks._deploy_log_from_main", _boom)
passed, reason = check_deploy_status("enduro-trails", "ET-011")
assert passed is True
assert called["n"] == 0
def test_deploy_stage_qg_is_check_deploy_status(self):
assert get_qg_for_stage("deploy") == "check_deploy_status"
def test_registered_in_qg_checks(self):
from src.qg.checks import QG_CHECKS
assert QG_CHECKS.get("check_deploy_status") is check_deploy_status
class TestDevelopmentStageQG:
"""BUG 6: development stage QG is now check_ci_green (CI is the authoritative
gate), not the deprecated check_tests_local."""
def test_development_qg_is_check_ci_green(self):
assert get_qg_for_stage("development") == "check_ci_green"
def test_check_tests_local_is_deprecated_and_unwired(self):
# Kept in the registry for backward-compat, but not wired to any stage.
from src.qg.checks import QG_CHECKS
from src.stages import STAGE_TRANSITIONS
assert "check_tests_local" in QG_CHECKS
wired = {t.get("qg") for t in STAGE_TRANSITIONS.values()}
assert "check_tests_local" not in wired
class TestCheckTestsLocal:
"""BUG 5: check_tests_local must run pytest directly (not make, which is
not installed in the orchestrator container)."""
@patch("src.qg.checks.ensure_worktree")
@patch("subprocess.run")
def test_passes_on_returncode_zero(self, mock_run, mock_wt, tmp_path):
mock_wt.return_value = str(tmp_path)
mock_run.return_value = MagicMock(returncode=0, stdout="ok", stderr="")
passed, reason = check_tests_local("enduro-trails", "feature/ET-001-x")
assert passed is True
assert reason == "Local tests passed"
@patch("src.qg.checks.ensure_worktree")
@patch("subprocess.run")
def test_fails_on_nonzero_returncode(self, mock_run, mock_wt, tmp_path):
mock_wt.return_value = str(tmp_path)
mock_run.return_value = MagicMock(returncode=1, stdout="boom", stderr="trace")
passed, reason = check_tests_local("enduro-trails", "feature/ET-001-x")
assert passed is False
assert "Local tests failed" in reason
@patch("src.qg.checks.ensure_worktree")
@patch("subprocess.run")
def test_invokes_pytest_not_make(self, mock_run, mock_wt, tmp_path):
"""The subprocess call must be pytest, from src/api, against ../../tests/."""
mock_wt.return_value = str(tmp_path)
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
check_tests_local("enduro-trails", "feature/ET-001-x")
args, kwargs = mock_run.call_args
cmd = args[0]
assert "make" not in cmd
assert cmd[:3] == ["python", "-m", "pytest"]
assert "../../tests/" in cmd
assert kwargs["cwd"] == os.path.join(str(tmp_path), "src", "api")

304
tests/test_queue.py Normal file
View File

@@ -0,0 +1,304 @@
"""Tests for ORCH-1 (F-2b) persistent job queue.
Covers:
- enqueue_job -> claim_next_job -> mark_job lifecycle
- claim_next_job atomicity (no double-dispatch of the same job)
- retry: fail -> requeue while attempts < max_attempts, then failed
- requeue_running_jobs (queue-recovery)
- count_running_jobs / job_status_counts / recent_jobs
- QueueWorker respects max_concurrency (Popen / launch fully mocked)
The real claude/Popen is NEVER spawned: launcher.launch_job is mocked in worker
tests, and the launcher finalize logic is exercised directly via mark_job.
"""
import os
import tempfile
import pytest
# Override env before importing app modules (same convention as test_qg.py).
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_queue.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
os.environ["ORCH_GITEA_TOKEN"] = "test-token"
os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
import src.db as db
from src.db import (
init_db,
enqueue_job,
claim_next_job,
mark_job,
count_running_jobs,
requeue_running_jobs,
get_job,
job_status_counts,
recent_jobs,
)
@pytest.fixture(autouse=True)
def fresh_db(tmp_path, monkeypatch):
"""Point the DB at a fresh per-test sqlite file and init the schema."""
dbfile = tmp_path / "queue.db"
monkeypatch.setattr(db.settings, "db_path", str(dbfile))
init_db()
yield
# ---------------------------------------------------------------------------
# enqueue / claim / mark lifecycle
# ---------------------------------------------------------------------------
class TestLifecycle:
def test_enqueue_creates_queued_job(self):
jid = enqueue_job("analyst", "enduro-trails", "task body", task_id=7)
job = get_job(jid)
assert job["status"] == "queued"
assert job["agent"] == "analyst"
assert job["repo"] == "enduro-trails"
assert job["task_content"] == "task body"
assert job["task_id"] == 7
assert job["attempts"] == 0
assert job["max_attempts"] == 2
def test_claim_marks_running_and_increments_attempts(self):
jid = enqueue_job("developer", "repo")
claimed = claim_next_job()
assert claimed is not None
assert claimed["id"] == jid
assert claimed["status"] == "running"
assert claimed["attempts"] == 1
assert count_running_jobs() == 1
def test_claim_empty_queue_returns_none(self):
assert claim_next_job() is None
def test_claim_is_fifo(self):
a = enqueue_job("analyst", "r")
b = enqueue_job("developer", "r")
assert claim_next_job()["id"] == a
assert claim_next_job()["id"] == b
def test_mark_done(self):
jid = enqueue_job("tester", "r")
claim_next_job()
mark_job(jid, "done", run_id=42)
job = get_job(jid)
assert job["status"] == "done"
assert job["run_id"] == 42
assert job["finished_at"] is not None
assert count_running_jobs() == 0
def test_mark_failed_records_error(self):
jid = enqueue_job("tester", "r")
claim_next_job()
mark_job(jid, "failed", run_id=9, error="boom")
job = get_job(jid)
assert job["status"] == "failed"
assert job["error"] == "boom"
assert job["finished_at"] is not None
# ---------------------------------------------------------------------------
# claim atomicity — no double dispatch
# ---------------------------------------------------------------------------
class TestClaimAtomicity:
def test_single_job_claimed_once(self):
jid = enqueue_job("analyst", "r")
first = claim_next_job()
second = claim_next_job()
assert first["id"] == jid
assert second is None # already running, not re-dispatched
def test_concurrent_claims_no_duplicate(self):
"""Many enqueued jobs claimed from parallel threads -> each claimed once."""
import threading
n = 20
for _ in range(n):
enqueue_job("developer", "r")
claimed_ids = []
lock = threading.Lock()
def grab():
while True:
job = claim_next_job()
if job is None:
return
with lock:
claimed_ids.append(job["id"])
threads = [threading.Thread(target=grab) for _ in range(8)]
for t in threads:
t.start()
for t in threads:
t.join()
assert len(claimed_ids) == n
assert len(set(claimed_ids)) == n # no id claimed twice
assert count_running_jobs() == n
# ---------------------------------------------------------------------------
# retry semantics (mirrors launcher._finalize_job logic)
# ---------------------------------------------------------------------------
class TestRetry:
def test_fail_requeues_while_under_max(self):
jid = enqueue_job("developer", "r", max_attempts=2)
job = claim_next_job() # attempts=1
assert job["attempts"] == 1
# attempts(1) < max(2) -> requeue
mark_job(jid, "queued", error="exit 1")
j = get_job(jid)
assert j["status"] == "queued"
assert j["error"] == "exit 1"
assert j["started_at"] is None # requeue clears started_at
def test_fail_fails_when_max_reached(self):
jid = enqueue_job("developer", "r", max_attempts=2)
claim_next_job() # attempts=1 -> requeue
mark_job(jid, "queued")
job2 = claim_next_job() # attempts=2
assert job2["attempts"] == 2
# attempts(2) >= max(2) -> failed
mark_job(jid, "failed", error="exit 1")
assert get_job(jid)["status"] == "failed"
def test_finalize_job_done(self):
"""launcher._finalize_job marks done on exit_code 0 (no Popen needed)."""
from src.agents.launcher import AgentLauncher
jid = enqueue_job("analyst", "r")
claim_next_job()
AgentLauncher()._finalize_job(jid, "analyst", run_id=5, exit_code=0)
assert get_job(jid)["status"] == "done"
def test_finalize_job_requeue_then_fail(self, monkeypatch):
from src.agents.launcher import AgentLauncher
# Silence telegram side-effect.
monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
lr = AgentLauncher()
jid = enqueue_job("developer", "r", max_attempts=2)
claim_next_job() # attempts=1
lr._finalize_job(jid, "developer", run_id=1, exit_code=2)
assert get_job(jid)["status"] == "queued" # 1 < 2 -> requeue
claim_next_job() # attempts=2
lr._finalize_job(jid, "developer", run_id=2, exit_code=2)
assert get_job(jid)["status"] == "failed" # 2 >= 2 -> failed
# ---------------------------------------------------------------------------
# queue-recovery
# ---------------------------------------------------------------------------
class TestRequeueRunning:
def test_requeue_running_jobs(self):
a = enqueue_job("analyst", "r")
b = enqueue_job("developer", "r")
claim_next_job() # a -> running
claim_next_job() # b -> running
assert count_running_jobs() == 2
n = requeue_running_jobs()
assert n == 2
assert count_running_jobs() == 0
assert get_job(a)["status"] == "queued"
assert get_job(b)["status"] == "queued"
def test_requeue_preserves_attempts(self):
jid = enqueue_job("analyst", "r")
claim_next_job() # attempts=1
requeue_running_jobs()
assert get_job(jid)["attempts"] == 1 # not reset
# ---------------------------------------------------------------------------
# observability helpers
# ---------------------------------------------------------------------------
class TestObservability:
def test_status_counts(self):
enqueue_job("analyst", "r") # stays queued
enqueue_job("developer", "r") # first claimed -> running (FIFO)
claim_next_job()
counts = job_status_counts()
assert counts["running"] == 1
assert counts["queued"] == 1
assert counts["done"] == 0
assert counts["failed"] == 0
def test_recent_jobs_desc(self):
ids = [enqueue_job("analyst", "r") for _ in range(3)]
recent = recent_jobs(10)
assert [r["id"] for r in recent] == sorted(ids, reverse=True)
# ---------------------------------------------------------------------------
# QueueWorker max_concurrency (launch_job fully mocked — no real Popen)
# ---------------------------------------------------------------------------
class TestWorkerConcurrency:
@pytest.fixture(autouse=True)
def _ok_preflight(self, monkeypatch):
# ORCH-1 resilience: the worker gates claims behind preflight; in tests there
# is no claude binary, so stub preflight OK to exercise pure queue/concurrency.
monkeypatch.setattr("src.queue_worker.preflight.check", lambda *a, **k: (True, "ok"))
def test_worker_respects_max_concurrency(self, monkeypatch):
from src.queue_worker import QueueWorker
launched = []
def fake_launch_job(job):
# Simulate a long-running agent: the job stays 'running' (we do NOT
# mark it done), so the slot remains occupied.
launched.append(job["id"])
return 100 + job["id"]
monkeypatch.setattr("src.queue_worker.launcher.launch_job", fake_launch_job)
for _ in range(5):
enqueue_job("developer", "r")
w = QueueWorker(max_concurrency=2, poll_interval=0.01)
w._drain_once()
# Only max_concurrency jobs may be launched / running at once.
assert len(launched) == 2
assert count_running_jobs() == 2
def test_worker_drains_as_slots_free(self, monkeypatch):
from src.queue_worker import QueueWorker
def fake_launch_job(job):
# Immediately complete the job so the slot frees for the next claim.
mark_job(job["id"], "done", run_id=job["id"])
return job["id"]
monkeypatch.setattr("src.queue_worker.launcher.launch_job", fake_launch_job)
for _ in range(4):
enqueue_job("analyst", "r")
w = QueueWorker(max_concurrency=1, poll_interval=0.01)
w._drain_once()
# With instant completion and concurrency 1, one drain pass empties the queue.
assert job_status_counts()["done"] == 4
assert count_running_jobs() == 0
def test_worker_launch_failure_does_not_wedge_slot(self, monkeypatch):
from src.queue_worker import QueueWorker
def boom(job):
raise RuntimeError("repo missing")
monkeypatch.setattr("src.queue_worker.launcher.launch_job", boom)
monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
enqueue_job("developer", "r", max_attempts=1)
w = QueueWorker(max_concurrency=1, poll_interval=0.01)
w._drain_once()
# attempts=1 >= max_attempts=1 -> failed, not stuck running.
assert count_running_jobs() == 0
counts = job_status_counts()
assert counts["failed"] == 1

295
tests/test_resilience.py Normal file
View File

@@ -0,0 +1,295 @@
"""ORCH-1 resilience tests: preflight, 429-classifier, backoff, circuit breaker.
No real claude/Popen is ever spawned: preflight subprocess and launcher.launch_job
are mocked. DB is a fresh per-test sqlite file.
"""
import os
import tempfile
import pytest
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_resilience.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
os.environ["ORCH_GITEA_TOKEN"] = "test-token"
os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
import src.db as db
from src.db import (
init_db, enqueue_job, claim_next_job, get_job, count_running_jobs,
mark_job_transient,
)
from src import preflight, error_classifier
from src.error_classifier import classify_text, parse_retry_after, classify_log_file
from src.queue_worker import QueueWorker, CircuitBreaker
from src.agents.launcher import AgentLauncher
@pytest.fixture(autouse=True)
def fresh_db(tmp_path, monkeypatch):
monkeypatch.setattr(db.settings, "db_path", str(tmp_path / "res.db"))
init_db()
preflight.reset_cache()
yield
# ---------------------------------------------------------------------------
# A. Preflight
# ---------------------------------------------------------------------------
class TestPreflight:
def test_fail_when_bin_missing(self, monkeypatch):
monkeypatch.setattr(preflight, "_claude_bin", lambda: "/no/such/claude")
ok, reason = preflight.check(force=True)
assert ok is False
assert "not found" in reason.lower()
def test_ok_when_version_succeeds(self, monkeypatch, tmp_path):
fake_bin = tmp_path / "claude"
fake_bin.write_text("#!/bin/sh\necho v1\n")
monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
monkeypatch.setattr(preflight, "_run_version", lambda b: (True, "1.2.3"))
ok, reason = preflight.check(force=True)
assert ok is True
def test_cache_does_not_recheck_within_ttl(self, monkeypatch, tmp_path):
fake_bin = tmp_path / "claude"
fake_bin.write_text("x")
monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
monkeypatch.setattr(db.settings, "preflight_cache_ttl", 999)
calls = {"n": 0}
def counting_version(b):
calls["n"] += 1
return True, "ok"
monkeypatch.setattr(preflight, "_run_version", counting_version)
preflight.reset_cache()
preflight.check() # first -> runs version
preflight.check() # cached -> no extra version call
preflight.check()
assert calls["n"] == 1
def test_force_bypasses_cache(self, monkeypatch, tmp_path):
fake_bin = tmp_path / "claude"
fake_bin.write_text("x")
monkeypatch.setattr(preflight, "_claude_bin", lambda: str(fake_bin))
calls = {"n": 0}
monkeypatch.setattr(preflight, "_run_version",
lambda b: (calls.__setitem__("n", calls["n"] + 1), (True, "ok"))[1])
preflight.reset_cache()
preflight.check()
preflight.check(force=True)
assert calls["n"] == 2
def test_worker_does_not_claim_when_preflight_fails(self, monkeypatch):
# Preflight FAIL -> job stays queued, launch_job never called.
monkeypatch.setattr("src.queue_worker.preflight.check",
lambda *a, **k: (False, "down"))
called = {"launch": False}
monkeypatch.setattr("src.queue_worker.launcher.launch_job",
lambda job: called.__setitem__("launch", True))
jid = enqueue_job("analyst", "r")
QueueWorker(max_concurrency=1, poll_interval=0.01)._drain_once()
assert called["launch"] is False
assert get_job(jid)["status"] == "queued"
assert count_running_jobs() == 0
# ---------------------------------------------------------------------------
# B. Error classifier
# ---------------------------------------------------------------------------
class TestClassifier:
@pytest.mark.parametrize("text", [
"Error: 429 Too Many Requests",
"anthropic rate limit exceeded",
"overloaded_error: server is overloaded",
"API quota exhausted",
"503 Service Unavailable",
"connection reset by peer",
])
def test_transient_patterns(self, text):
assert classify_text(text) == "transient"
@pytest.mark.parametrize("text", [
"Traceback: KeyError 'foo'",
"SyntaxError: invalid syntax",
"assertion failed in test",
"",
])
def test_permanent_patterns(self, text):
assert classify_text(text) == "permanent"
def test_retry_after_header(self):
assert parse_retry_after("HTTP/1.1 429\nRetry-After: 42\n") == 42
def test_retry_after_json(self):
assert parse_retry_after('{"error":{"type":"rate_limit","retry_after": 7}}') == 7
def test_retry_after_absent(self):
assert parse_retry_after("just an error") is None
def test_classify_log_file(self, tmp_path):
p = tmp_path / "run.log"
p.write_text("...lots of output...\n429 rate limit. Retry-After: 30\n")
kind, ra = classify_log_file(str(p))
assert kind == "transient"
assert ra == 30
def test_classify_missing_file_is_permanent(self):
kind, ra = classify_log_file("/no/such/log")
assert kind == "permanent"
assert ra is None
# ---------------------------------------------------------------------------
# C. Backoff + available_at gating
# ---------------------------------------------------------------------------
class TestBackoff:
def test_backoff_grows_exponentially(self):
lr = AgentLauncher()
# base=10, cap=600 (defaults)
b1 = lr._backoff_seconds(1)
b2 = lr._backoff_seconds(2)
b3 = lr._backoff_seconds(3)
assert b1 == 20 # 2^1*10
assert b2 == 40 # 2^2*10
assert b3 == 80 # 2^3*10
assert b2 > b1 and b3 > b2
def test_backoff_capped(self):
lr = AgentLauncher()
assert lr._backoff_seconds(20) == 600 # capped at backoff_max_seconds
def test_retry_after_respected_when_larger(self):
lr = AgentLauncher()
# transient_attempts=1 -> base backoff 20; Retry-After=120 wins.
assert lr._backoff_seconds(1, retry_after=120) == 120
def test_retry_after_ignored_when_smaller(self):
lr = AgentLauncher()
assert lr._backoff_seconds(3, retry_after=5) == 80 # backoff bigger
def test_transient_requeue_sets_future_available_at_and_claim_skips(self):
jid = enqueue_job("developer", "r")
claim_next_job()
# Big backoff -> available_at far in the future.
mark_job_transient(jid, 3600, error="429")
job = get_job(jid)
assert job["status"] == "queued"
assert job["transient_attempts"] == 1
assert job["available_at"] is not None
# claim must NOT pick it up while available_at is in the future.
assert claim_next_job() is None
def test_transient_requeue_claimable_when_due(self):
jid = enqueue_job("developer", "r")
claim_next_job()
mark_job_transient(jid, -5, error="429") # available_at in the past
c = claim_next_job()
assert c is not None and c["id"] == jid
# ---------------------------------------------------------------------------
# D. Launcher transient/permanent finalize (no Popen)
# ---------------------------------------------------------------------------
class TestFinalizeClassified:
def test_transient_failure_backoff_requeue(self, tmp_path, monkeypatch):
monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
log = tmp_path / "1.log"
log.write_text("Error 429 rate limit exceeded\n")
jid = enqueue_job("developer", "r", max_attempts=2)
claim_next_job()
AgentLauncher()._finalize_job(jid, "developer", run_id=1, exit_code=1,
output_path=str(log))
job = get_job(jid)
assert job["status"] == "queued"
assert job["transient_attempts"] == 1
assert job["available_at"] is not None # backoff-gated
assert job["attempts"] == 1 # code-fault budget NOT burned
def test_permanent_failure_uses_normal_attempts(self, tmp_path, monkeypatch):
monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
log = tmp_path / "2.log"
log.write_text("Traceback: ValueError\n")
jid = enqueue_job("developer", "r", max_attempts=2)
claim_next_job()
AgentLauncher()._finalize_job(jid, "developer", run_id=2, exit_code=1,
output_path=str(log))
job = get_job(jid)
assert job["status"] == "queued"
assert job["transient_attempts"] == 0 # not transient
assert job["available_at"] is None # no backoff for code-fault
def test_transient_exhausts_to_failed(self, tmp_path, monkeypatch):
monkeypatch.setattr("src.notifications.send_telegram", lambda *a, **k: None)
monkeypatch.setattr(db.settings, "transient_max_attempts", 2)
log = tmp_path / "3.log"
log.write_text("overloaded_error\n")
lr = AgentLauncher()
jid = enqueue_job("developer", "r")
claim_next_job()
lr._finalize_job(jid, "developer", 1, exit_code=1, output_path=str(log))
assert get_job(jid)["status"] == "queued" # transient 1 -> requeue
# force claimable and retry
mark_job_transient(jid, -1) # makes it due; transient=2 now
claim_next_job()
lr._finalize_job(jid, "developer", 2, exit_code=1, output_path=str(log))
assert get_job(jid)["status"] == "failed" # transient budget exhausted
# ---------------------------------------------------------------------------
# E. Circuit breaker
# ---------------------------------------------------------------------------
class TestCircuitBreaker:
def test_opens_after_threshold(self):
cb = CircuitBreaker(threshold=3, pause_seconds=300)
assert cb.allow_claim() is True
cb.record_transient()
cb.record_transient()
assert cb.state == "closed"
cb.record_transient() # 3rd -> open
assert cb.state == "open"
assert cb.allow_claim() is False # paused, no CLI calls
def test_recovered_resets_streak(self):
cb = CircuitBreaker(threshold=3)
cb.record_transient()
cb.record_transient()
cb.record_recovered()
assert cb.consecutive_transient == 0
assert cb.state == "closed"
def test_half_open_after_pause_then_closed_on_success(self, monkeypatch):
cb = CircuitBreaker(threshold=2, pause_seconds=300)
cb.record_transient()
cb.record_transient() # open
assert cb.state == "open"
# Simulate the pause elapsing.
cb.opened_at -= 301
assert cb.allow_claim() is True # -> half-open (probe)
assert cb.state == "half-open"
cb.record_recovered() # probe succeeded
assert cb.state == "closed"
def test_half_open_reopens_on_transient(self):
cb = CircuitBreaker(threshold=2, pause_seconds=300)
cb.record_transient(); cb.record_transient() # open
cb.opened_at -= 301
cb.allow_claim() # half-open
assert cb.state == "half-open"
cb.record_transient() # probe failed -> re-open
assert cb.state == "open"
def test_breaker_blocks_worker_claim(self, monkeypatch):
monkeypatch.setattr("src.queue_worker.preflight.check",
lambda *a, **k: (True, "ok"))
called = {"launch": False}
monkeypatch.setattr("src.queue_worker.launcher.launch_job",
lambda job: called.__setitem__("launch", True))
cb = CircuitBreaker(threshold=1, pause_seconds=300)
cb.record_transient() # open immediately
w = QueueWorker(max_concurrency=1, poll_interval=0.01, breaker=cb)
enqueue_job("analyst", "r")
w._drain_once()
assert called["launch"] is False # breaker open -> no claim, no CLI

543
tests/test_stage_engine.py Normal file
View File

@@ -0,0 +1,543 @@
"""ORCH-4 / M-3: tests for the unified stage engine (src/stage_engine.advance_stage).
These verify the MERGED behavior of what used to be two diverged
_try_advance_stage implementations (launcher sync + plane async):
* happy-path advance for every stage launches the CORRECT agent
(the ORCH-4 fix: agent = get_agent_for_stage(current_stage), NOT next_stage);
* a QG failure does not advance;
* reviewer REQUEST_CHANGES -> rollback to development + enqueue developer;
* developer retries > 3 -> telegram alert, no further enqueue;
* tester FAIL -> rollback to development + enqueue developer;
* architect conflict (10-conflict.md) -> rollback to analysis + enqueue analyst;
* launcher AND plane both delegate to the engine.
Network/Plane/Telegram side effects are mocked at the src.stage_engine level so
the engine runs against a real isolated sqlite DB.
"""
import os
import tempfile
import pytest
# Isolated test DB (same convention as the other suites).
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_stage_engine.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
from unittest.mock import MagicMock, patch # noqa: E402
import src.db as _db # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import stage_engine # noqa: E402
from src.stage_engine import advance_stage # noqa: E402
from src.stages import get_agent_for_stage # noqa: E402
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture(autouse=True)
def fresh_db(monkeypatch):
"""Fresh isolated DB per test."""
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
yield
@pytest.fixture(autouse=True)
def silence_side_effects(monkeypatch):
"""Mock all Plane/Telegram/notification side effects in the engine.
Everything imported into src.stage_engine that touches the network or sends
a message becomes a no-op MagicMock so tests are deterministic and offline.
"""
for name in (
"notify_stage_change",
"notify_qg_failure",
"notify_approve_requested",
"send_telegram",
"plane_notify_stage",
"plane_notify_qg",
"plane_add_comment",
"set_issue_in_review",
"set_issue_needs_input",
"set_issue_in_progress",
"set_issue_blocked",
"set_issue_done",
):
monkeypatch.setattr(stage_engine, name, MagicMock())
def _make_task(stage, repo="enduro-trails", branch="feature/ET-001-x", wi="ET-001"):
conn = get_db()
cur = conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage) "
"VALUES (?, ?, ?, ?, ?)",
(f"plane-{wi}", wi, repo, branch, stage),
)
task_id = cur.lastrowid
conn.commit()
conn.close()
return task_id
def _stage(task_id):
conn = get_db()
row = conn.execute("SELECT stage FROM tasks WHERE id=?", (task_id,)).fetchone()
conn.close()
return row[0]
def _jobs():
conn = get_db()
rows = conn.execute("SELECT agent, repo, task_id FROM jobs ORDER BY id").fetchall()
conn.close()
return [dict(r) for r in rows]
def _add_developer_runs(task_id, n):
conn = get_db()
for _ in range(n):
conn.execute(
"INSERT INTO agent_runs (task_id, agent) VALUES (?, 'developer')",
(task_id,),
)
conn.commit()
conn.close()
def _pass(*a, **k):
return (True, "ok")
def _fail(reason):
def _f(*a, **k):
return (False, reason)
return _f
# ---------------------------------------------------------------------------
# Happy path: each stage advances and launches the CORRECT agent (ORCH-4 fix)
# ---------------------------------------------------------------------------
class TestHappyPathAgentSelection:
"""The fixed agent-selection: when advancing FROM current_stage, the engine
must enqueue get_agent_for_stage(current_stage), NOT next_stage.
"""
@pytest.mark.parametrize(
"current_stage,expected_next,expected_agent",
[
("architecture", "development", "developer"),
("development", "review", "reviewer"),
("review", "testing", "tester"),
("testing", "deploy", "deployer"),
],
)
def test_advance_launches_current_stage_agent(
self, monkeypatch, current_stage, expected_next, expected_agent
):
# All QG checks pass for this happy-path suite.
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{k: _pass for k in stage_engine.QG_CHECKS},
)
task_id = _make_task(current_stage)
res = advance_stage(
task_id, current_stage, "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None,
)
assert res.advanced is True
assert res.to_stage == expected_next
assert _stage(task_id) == expected_next
# The ORCH-4 fix: correct agent == get_agent_for_stage(current_stage).
assert expected_agent == get_agent_for_stage(current_stage)
assert res.enqueued_agent == expected_agent
jobs = _jobs()
assert len(jobs) == 1
assert jobs[0]["agent"] == expected_agent
def test_deploy_to_done_no_agent(self, monkeypatch):
"""deploy -> done advances but launches no agent (terminal-ish)."""
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{k: _pass for k in stage_engine.QG_CHECKS},
)
task_id = _make_task("deploy")
res = advance_stage(task_id, "deploy", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None)
assert res.advanced is True
assert _stage(task_id) == "done"
assert res.enqueued_agent is None
assert _jobs() == []
def test_deploy_success_syncs_plane_to_terminal_done(self, monkeypatch):
"""FIX 3: a successful deploy->done forces the Plane issue to terminal Done.
Previously the task could stick on In Progress because the merge webhook
completed it out-of-band. Now the engine drives set_issue_done() on the
deploy->done success transition.
"""
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{k: _pass for k in stage_engine.QG_CHECKS},
)
task_id = _make_task("deploy", wi="ET-012")
res = advance_stage(
task_id, "deploy", "enduro-trails", "ET-012",
"feature/ET-012-x", finished_agent="deployer",
)
assert res.advanced is True
assert _stage(task_id) == "done"
# The terminal Plane sync was invoked with the work item id.
stage_engine.set_issue_done.assert_called_once_with("ET-012")
def test_non_terminal_advance_does_not_force_plane_done(self, monkeypatch):
"""set_issue_done must only fire on the terminal deploy->done transition."""
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{k: _pass for k in stage_engine.QG_CHECKS},
)
task_id = _make_task("review")
advance_stage(
task_id, "review", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None,
)
stage_engine.set_issue_done.assert_not_called()
def test_done_is_terminal(self):
task_id = _make_task("done")
res = advance_stage(task_id, "done", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None)
assert res.advanced is False
assert _stage(task_id) == "done"
# ---------------------------------------------------------------------------
# QG failure: do not advance
# ---------------------------------------------------------------------------
class TestQgFailureDoesNotAdvance:
def test_qg_fail_keeps_stage(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_architecture_done": _fail("not done")},
)
task_id = _make_task("architecture")
res = advance_stage(task_id, "architecture", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="architect")
assert res.advanced is False
assert res.qg_passed is False
assert _stage(task_id) == "architecture"
assert _jobs() == []
def test_webhook_path_emits_qg_failure_notification(self, monkeypatch):
"""finished_agent=None -> generic QG-failure notification fires (plane parity).
development stage QG is now check_ci_green (was check_tests_local).
"""
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_ci_green": _fail("ci red")},
)
task_id = _make_task("development")
advance_stage(task_id, "development", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None)
assert stage_engine.notify_qg_failure.called
assert stage_engine.plane_notify_qg.called
def test_launcher_path_no_generic_qg_notification(self, monkeypatch):
"""finished_agent set -> NO generic QG notification (launcher parity)."""
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_architecture_done": _fail("not done")},
)
task_id = _make_task("architecture")
advance_stage(task_id, "architecture", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="architect")
assert not stage_engine.notify_qg_failure.called
# ---------------------------------------------------------------------------
# Reviewer REQUEST_CHANGES -> rollback to development + enqueue developer
# ---------------------------------------------------------------------------
class TestReviewerRequestChanges:
def test_rollback_and_enqueue_developer(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS,
"check_reviewer_verdict": _fail("verdict: REQUEST_CHANGES")},
)
task_id = _make_task("review")
res = advance_stage(task_id, "review", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="reviewer")
assert res.advanced is False
assert res.rolled_back_to == "development"
assert _stage(task_id) == "development"
jobs = _jobs()
assert len(jobs) == 1
assert jobs[0]["agent"] == "developer"
def test_retry_over_3_alerts_no_enqueue(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS,
"check_reviewer_verdict": _fail("verdict: REQUEST_CHANGES")},
)
task_id = _make_task("review")
_add_developer_runs(task_id, 3) # already at the max
res = advance_stage(task_id, "review", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="reviewer")
assert res.rolled_back_to == "development"
assert res.alerted is True
assert stage_engine.send_telegram.called
# No new developer job enqueued past the retry cap.
assert _jobs() == []
# ---------------------------------------------------------------------------
# Tester FAIL -> rollback to development + enqueue developer
# ---------------------------------------------------------------------------
class TestTesterFail:
def test_rollback_and_enqueue_developer(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_tests_passed": _fail("2 tests failed")},
)
task_id = _make_task("testing")
res = advance_stage(task_id, "testing", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="tester")
assert res.advanced is False
assert res.rolled_back_to == "development"
assert _stage(task_id) == "development"
jobs = _jobs()
assert len(jobs) == 1
assert jobs[0]["agent"] == "developer"
def test_retry_over_3_blocks_and_alerts(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_tests_passed": _fail("still failing")},
)
task_id = _make_task("testing")
_add_developer_runs(task_id, 3)
res = advance_stage(task_id, "testing", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="tester")
assert res.rolled_back_to == "development"
assert res.alerted is True
assert stage_engine.set_issue_blocked.called
assert _jobs() == []
# ---------------------------------------------------------------------------
# BUG 8: deploy verdict gates deploy -> done (not the LLM exit code)
# ---------------------------------------------------------------------------
class TestDeployVerdict:
"""deploy -> done must be gated on check_deploy_status (the deployer's
machine-readable verdict), NOT on the LLM exit code (always 0)."""
def test_failed_verdict_rolls_back_to_development(self, monkeypatch):
# deployer finished (exit_code 0 from launcher), but verdict is FAILED.
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS,
"check_deploy_status": _fail("Deploy status: FAILED")},
)
task_id = _make_task("deploy")
res = advance_stage(task_id, "deploy", "enduro-trails", "ET-011",
"feature/ET-011-x", finished_agent="deployer")
assert res.advanced is False
assert res.rolled_back_to == "development"
assert _stage(task_id) == "development" # NOT done
assert res.alerted is True
assert stage_engine.set_issue_blocked.called
assert stage_engine.send_telegram.called
def test_no_deploy_log_rolls_back(self, monkeypatch):
# No frontmatter field / no file -> check returns False -> rollback.
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS,
"check_deploy_status": _fail("Deploy log not found (14-deploy-log.md)")},
)
task_id = _make_task("deploy")
res = advance_stage(task_id, "deploy", "enduro-trails", "ET-011",
"feature/ET-011-x", finished_agent="deployer")
assert res.advanced is False
assert _stage(task_id) == "development"
def test_success_verdict_advances_to_done(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS,
"check_deploy_status": _pass},
)
task_id = _make_task("deploy")
res = advance_stage(task_id, "deploy", "enduro-trails", "ET-011",
"feature/ET-011-x", finished_agent="deployer")
assert res.advanced is True
assert res.to_stage == "done"
assert _stage(task_id) == "done"
assert res.enqueued_agent is None # no agent leaves deploy
assert _jobs() == []
# ---------------------------------------------------------------------------
# Architect conflict -> rollback to analysis + enqueue analyst
# ---------------------------------------------------------------------------
class TestArchitectConflict:
def test_conflict_rolls_back_to_analysis(self, monkeypatch, tmp_path):
# 10-conflict.md must exist in the worktree path the engine inspects.
wt = tmp_path / "wt"
conflict_dir = wt / "docs" / "work-items" / "ET-001"
conflict_dir.mkdir(parents=True)
(conflict_dir / "10-conflict.md").write_text("conflict with TRZ")
monkeypatch.setattr(stage_engine, "get_worktree_path", lambda repo, branch: str(wt))
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_architecture_done": _fail("conflict")},
)
task_id = _make_task("architecture")
res = advance_stage(task_id, "architecture", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="architect")
assert res.advanced is False
assert res.rolled_back_to == "analysis"
assert _stage(task_id) == "analysis"
jobs = _jobs()
assert len(jobs) == 1
assert jobs[0]["agent"] == "analyst"
def test_no_conflict_file_no_rollback(self, monkeypatch, tmp_path):
wt = tmp_path / "wt"
(wt / "docs").mkdir(parents=True)
monkeypatch.setattr(stage_engine, "get_worktree_path", lambda repo, branch: str(wt))
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_architecture_done": _fail("incomplete")},
)
task_id = _make_task("architecture")
res = advance_stage(task_id, "architecture", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="architect")
assert res.advanced is False
assert res.rolled_back_to is None
assert _stage(task_id) == "architecture"
assert _jobs() == []
# ---------------------------------------------------------------------------
# Analyst approved-flow (analysis gate): never auto-advances
# ---------------------------------------------------------------------------
class TestAnalysisApprovedFlow:
def test_artifacts_ready_requests_approval_no_advance(self, monkeypatch):
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{**stage_engine.QG_CHECKS, "check_analysis_complete": _pass},
)
task_id = _make_task("analysis")
res = advance_stage(task_id, "analysis", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="analyst")
assert res.advanced is False
assert _stage(task_id) == "analysis"
assert stage_engine.set_issue_in_review.called
assert stage_engine.notify_approve_requested.called
assert _jobs() == []
def test_approved_verdict_advances_analysis_to_architecture(self, monkeypatch):
"""BUG 4: a human Approved STATUS (webhook path, finished_agent=None)
must satisfy the analysis gate and advance analysis -> architecture,
enqueuing the architect. The status-only approval must NOT re-run
check_analysis_approved (which looks for an :approved: COMMENT and would
otherwise wrongly block the advance).
"""
# Make check_analysis_approved FAIL if it is ever called: the webhook
# path must bypass it entirely (status == approval). If the engine were
# to re-run the gate, this would block the advance and fail the test.
monkeypatch.setattr(
stage_engine, "QG_CHECKS",
{
**stage_engine.QG_CHECKS,
"check_analysis_approved": _fail("no :approved: comment"),
},
)
# Guard: the approval-flow (launcher-only) must NOT be invoked here.
flow = MagicMock()
monkeypatch.setattr(stage_engine, "_handle_analysis_approved_flow", flow)
task_id = _make_task("analysis")
res = advance_stage(
task_id, "analysis", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent=None,
)
assert res.advanced is True
assert res.to_stage == "architecture"
assert _stage(task_id) == "architecture"
assert res.enqueued_agent == "architect"
# Sanity: agent for analysis is architect, never analyst (no re-run loop).
assert get_agent_for_stage("analysis") == "architect"
jobs = _jobs()
assert len(jobs) == 1
assert jobs[0]["agent"] == "architect"
# The launcher-only approval-flow was NOT called on the webhook path.
flow.assert_not_called()
def test_launcher_path_does_not_advance_and_calls_flow(self, monkeypatch):
"""Regression: the launcher path (finished_agent='analyst') still routes
into _handle_analysis_approved_flow and does NOT advance.
"""
flow = MagicMock()
monkeypatch.setattr(stage_engine, "_handle_analysis_approved_flow", flow)
task_id = _make_task("analysis")
res = advance_stage(
task_id, "analysis", "enduro-trails", "ET-001",
"feature/ET-001-x", finished_agent="analyst",
)
assert res.advanced is not True
assert _stage(task_id) == "analysis"
assert _jobs() == []
flow.assert_called_once()
# ---------------------------------------------------------------------------
# launcher + plane both delegate to the engine
# ---------------------------------------------------------------------------
class TestDelegation:
def test_launcher_calls_engine(self):
from src.agents.launcher import AgentLauncher
task_id = _make_task("development", branch="feature/ET-777-deleg")
with patch("src.stage_engine.advance_stage") as m:
AgentLauncher()._try_advance_stage(
run_id=1, agent="developer", repo="enduro-trails",
branch="feature/ET-777-deleg",
)
m.assert_called_once()
kwargs = m.call_args.kwargs
assert kwargs["task_id"] == task_id
assert kwargs["current_stage"] == "development"
assert kwargs["finished_agent"] == "developer"
def test_plane_calls_engine(self):
import asyncio
from src.webhooks import plane as plane_mod
with patch("src.stage_engine.advance_stage") as m:
asyncio.run(
plane_mod._try_advance_stage(
task_id=5, current_stage="analysis", repo="enduro-trails",
work_item_id="ET-001", branch="feature/ET-001-x",
)
)
m.assert_called_once()
# plane passes positional args; finished_agent (last positional) is None.
args = m.call_args.args
assert args[0] == 5
assert args[1] == "analysis"
assert args[-1] is None

View File

@@ -0,0 +1,94 @@
"""Feature 3: stage visibility on the Plane board.
* PLANE_STATES carries the 6 new per-stage / verdict UUIDs.
* STAGE_TO_STATE maps architecture/development/review/testing to their
dedicated board statuses (not all -> In Progress anymore).
* set_issue_stage_state(work_item_id, stage) PATCHes the correct state UUID
for a visible stage, and is a no-op for stages without one (analysis/deploy).
* Needs Input / In Review / Blocked remain higher priority: their explicit
setters use their own state, never overwritten by the stage map.
httpx is mocked; no network.
"""
import os
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
from unittest.mock import patch, MagicMock # noqa: E402
from src import plane_sync as PS # noqa: E402
EXPECTED_UUIDS = {
"architecture": "3020bbb7-6122-4663-930c-0315ba8dfa3d",
"development": "9920609b-f140-4e46-ab95-89acda8412c8",
"review": "ba0d802c-5218-41d4-ab43-978b0ea123ed",
"testing": "7855d807-b1bf-42ef-8dae-6cde0df92d02",
"approved": "a519a341-dada-4a91-8910-7604f82b79c5",
"rejected": "ba958f3c-5db5-461d-8f82-89425e413b97",
}
def test_plane_states_has_new_uuids():
for key, uuid in EXPECTED_UUIDS.items():
assert PS.PLANE_STATES[key] == uuid
def test_stage_to_state_maps_visible_stages():
assert PS.STAGE_TO_STATE["architecture"] == EXPECTED_UUIDS["architecture"]
assert PS.STAGE_TO_STATE["development"] == EXPECTED_UUIDS["development"]
assert PS.STAGE_TO_STATE["review"] == EXPECTED_UUIDS["review"]
assert PS.STAGE_TO_STATE["testing"] == EXPECTED_UUIDS["testing"]
# analysis / deploy stay on In Progress; done stays Done.
assert PS.STAGE_TO_STATE["analysis"] == PS.PLANE_STATES["in_progress"]
assert PS.STAGE_TO_STATE["deploy"] == PS.PLANE_STATES["in_progress"]
assert PS.STAGE_TO_STATE["done"] == PS.PLANE_STATES["done"]
def _patch_resolution(monkey_targets):
"""Helper: patch find_issue_id + _resolve_project_id to skip the DB/network."""
return monkey_targets
@patch("src.plane_sync.httpx.patch")
@patch("src.plane_sync.find_issue_id", return_value="issue-uuid")
@patch("src.plane_sync._resolve_project_id", return_value="proj-1")
def test_set_issue_stage_state_patches_correct_uuid(mock_proj, mock_find, mock_patch):
resp = MagicMock(); resp.raise_for_status.return_value = None
mock_patch.return_value = resp
PS.set_issue_stage_state("ET-1", "development")
# the PATCH carried the development state UUID
_, kwargs = mock_patch.call_args
assert kwargs["json"]["state"] == EXPECTED_UUIDS["development"]
@patch("src.plane_sync.httpx.patch")
@patch("src.plane_sync.find_issue_id", return_value="issue-uuid")
@patch("src.plane_sync._resolve_project_id", return_value="proj-1")
def test_set_issue_stage_state_noop_for_analysis(mock_proj, mock_find, mock_patch):
# analysis has no dedicated board status -> no PATCH at all.
PS.set_issue_stage_state("ET-1", "analysis")
mock_patch.assert_not_called()
PS.set_issue_stage_state("ET-1", "deploy")
mock_patch.assert_not_called()
@patch("src.plane_sync.httpx.patch")
@patch("src.plane_sync.find_issue_id", return_value="issue-uuid")
@patch("src.plane_sync._resolve_project_id", return_value="proj-1")
def test_priority_states_use_their_own_uuid(mock_proj, mock_find, mock_patch):
"""Needs Input / In Review / Blocked are set explicitly and take priority."""
resp = MagicMock(); resp.raise_for_status.return_value = None
mock_patch.return_value = resp
PS.set_issue_needs_input("ET-1")
assert mock_patch.call_args.kwargs["json"]["state"] == PS.PLANE_STATES["needs_input"]
PS.set_issue_in_review("ET-1")
assert mock_patch.call_args.kwargs["json"]["state"] == PS.PLANE_STATES["in_review"]
PS.set_issue_blocked("ET-1")
assert mock_patch.call_args.kwargs["json"]["state"] == PS.PLANE_STATES["blocked"]

View File

@@ -0,0 +1,200 @@
"""Status-only verdict model (bug 3 fix).
The comment-based control mechanism (:approved: / :rejected: / answer-to-questions)
was removed. The pipeline is driven SOLELY by Plane status changes. These tests
lock in the new behaviour:
* test_inreview_comment_does_not_revert — bug 3 root: an In Review task,
any comment arrives -> status NOT reverted, no agent launched.
* test_any_comment_no_pipeline_action — :approved: / :rejected: / plain
text comment -> no status change, no enqueue.
* test_approved_status_advances_without_inprogress_reset — Approved status
advances WITHOUT an intermediate set_issue_in_progress reset.
* test_rejected_status_pulls_reason_from_comment — Rejected status pulls the
reason from the issue's latest comment (mocked GET comments).
"""
import os
import tempfile
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_status_only.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
import pytest # noqa: E402
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
APPROVED = "a519a341-dada-4a91-8910-7604f82b79c5"
REJECTED = "ba958f3c-5db5-461d-8f82-89425e413b97"
IN_REVIEW = "38fb1f64-aa1e-48a3-92e0-0b109679046b"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
# Seed a task at the 'review' stage for plane_id 'r-1'.
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) "
"VALUES (?, ?, ?, ?, ?, ?)",
("r-1", "ET-700", "enduro-trails", "feature/ET-700-x", "review", "r-1"),
)
conn.commit()
conn.close()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
class _FakeResp:
def __init__(self, status_code, payload):
self.status_code = status_code
self._payload = payload
def json(self):
return self._payload
def _comment(text, plane_id="r-1"):
return client.post("/webhook/plane", json={
"event": "issue_comment", "action": "created",
"data": {"work_item_id": plane_id, "comment_stripped": text,
"project": ENDURO_PLANE_ID},
})
def _status(state_id, plane_id="r-1", old="prev"):
return client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": plane_id, "name": "Status task", "project": ENDURO_PLANE_ID,
"state": {"id": state_id, "name": "X", "group": "started"},
},
"activity": {"field": "state", "new_value": state_id, "old_value": old},
})
def _stage(plane_id="r-1"):
conn = get_db()
row = conn.execute("SELECT stage FROM tasks WHERE plane_id=?", (plane_id,)).fetchone()
conn.close()
return row[0] if row else None
# --------------------------------------------------------------------------- #
# Bug 3 root: In Review must not revert on a comment.
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.enqueue_job")
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.plane_sync._set_issue_state_direct")
@patch("src.plane_sync.update_issue_state")
def test_inreview_comment_does_not_revert(
mock_update_state, mock_set_direct, mock_sip, mock_enqueue
):
"""Bug 3: task in In Review, ANY comment arrives -> status NOT reverted to
In Progress, NO agent launched. The analyst's own 'waiting for approval'
comment used to echo back and self-hit -> reverted In Review -> In Progress.
"""
# analyst's own echo comment
resp = _comment("Готово, жду approved")
assert resp.status_code == 200
# no status changes whatsoever
mock_sip.assert_not_called()
mock_set_direct.assert_not_called()
mock_update_state.assert_not_called()
# no agent launched
mock_enqueue.assert_not_called()
# stage untouched
assert _stage() == "review"
# --------------------------------------------------------------------------- #
# Any comment -> zero pipeline side-effects.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("text", [":approved:", ":rejected: bad", "plain text", ""])
@patch("src.webhooks.plane.enqueue_job")
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.plane_sync._set_issue_state_direct")
def test_any_comment_no_pipeline_action(
mock_set_direct, mock_sip, mock_rollback, mock_advance, mock_enqueue, text
):
resp = _comment(text)
assert resp.status_code == 200
mock_advance.assert_not_called()
mock_rollback.assert_not_called()
mock_sip.assert_not_called()
mock_set_direct.assert_not_called()
mock_enqueue.assert_not_called()
assert _stage() == "review"
# --------------------------------------------------------------------------- #
# Approved status advances WITHOUT in_progress reset.
# --------------------------------------------------------------------------- #
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
def test_approved_status_advances_without_inprogress_reset(mock_advance, mock_sip):
resp = _status(APPROVED)
assert resp.status_code == 200
mock_advance.assert_awaited_once()
# work_item_id passed positionally
assert "ET-700" in mock_advance.call_args.args
# bug 3 (cause B): NO intermediate set_issue_in_progress before advance.
mock_sip.assert_not_called()
# --------------------------------------------------------------------------- #
# Rejected status pulls reason from latest comment.
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.httpx.get")
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
def test_rejected_status_pulls_reason_from_comment(mock_rollback, mock_get):
mock_get.return_value = _FakeResp(200, {"results": [
{"comment_stripped": "old comment", "created_at": "2026-06-03T09:00:00Z"},
{"comment_html": "<p>Needs more test coverage</p>",
"created_at": "2026-06-03T11:30:00Z"},
]})
resp = _status(REJECTED)
assert resp.status_code == 200
mock_rollback.assert_awaited_once()
reason = mock_rollback.call_args.args[-1]
# latest by created_at, HTML stripped
assert "Needs more test coverage" in reason
assert "<p>" not in reason
@patch("src.webhooks.plane.httpx.get")
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
def test_rejected_status_no_comment_uses_fallback(mock_rollback, mock_get):
mock_get.return_value = _FakeResp(200, {"results": []})
resp = _status(REJECTED)
assert resp.status_code == 200
mock_rollback.assert_awaited_once()
reason = mock_rollback.call_args.args[-1]
assert "no reason comment" in reason

View File

@@ -0,0 +1,243 @@
"""Feature 1: pipeline starts on status -> In Progress, not on creation.
* work_item.created / issue created -> NO task, NO branch, NO analyst.
* issue updated -> In Progress (from backlog) -> task created + analyst enqueued.
* a second In Progress update while the agent is busy -> NO duplicate, NO
restart (busy-guard).
* In Progress returned from Needs Input (agent idle) -> agent RELAUNCHED.
launcher / Gitea network are mocked. Real FastAPI endpoint via TestClient.
"""
import os
import tempfile
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_status_trigger.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
import pytest # noqa: E402
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
BACKLOG = "113b24f6-cce8-4be9-9a22-a359b9cf0122"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
def _created(plane_id="st-created"):
return client.post("/webhook/plane", json={
"event": "issue", "action": "created",
"data": {
"id": plane_id, "name": "A valid backlog item title",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": BACKLOG, "name": "Backlog", "group": "backlog"},
},
})
def _to_in_progress(plane_id="st-1"):
return client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": plane_id, "name": "A valid backlog item title",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": BACKLOG},
})
def _count(plane_id):
conn = get_db()
n = conn.execute("SELECT COUNT(*) FROM tasks WHERE plane_id=?", (plane_id,)).fetchone()[0]
conn.close()
return n
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.enqueue_job")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
def test_created_does_not_start_pipeline(mock_branch, mock_docs, mock_enqueue):
resp = _created("st-created")
assert resp.status_code == 200
assert resp.json()["status"] == "accepted"
# No task, no branch, no analyst enqueue.
assert _count("st-created") == 0
mock_branch.assert_not_called()
mock_enqueue.assert_not_called()
@patch("src.webhooks.plane.enqueue_job")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=5)
def test_in_progress_starts_pipeline(mock_seq, mock_branch, mock_docs, mock_enqueue):
mock_enqueue.return_value = 1
resp = _to_in_progress("st-1")
assert resp.status_code == 200
assert resp.json()["status"] == "accepted"
assert _count("st-1") == 1
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id='st-1'").fetchone()
conn.close()
assert task["stage"] == "analysis"
assert task["repo"] == "enduro-trails"
mock_branch.assert_called_once()
# analyst enqueued exactly once
assert mock_enqueue.call_count == 1
assert mock_enqueue.call_args.args[0] == "analyst"
@patch("src.webhooks.plane.enqueue_job")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=5)
def test_repeat_in_progress_while_job_active_does_not_relaunch(
mock_seq, mock_branch, mock_docs, mock_enqueue
):
"""Status-only model busy-guard: a duplicate In Progress webhook that arrives
while the stage agent still has a queued/running job must NOT relaunch the
agent (no double launch).
"""
mock_enqueue.return_value = 1
_to_in_progress("st-2")
assert _count("st-2") == 1
assert mock_enqueue.call_count == 1
# enqueue_job is mocked above, so no real job row exists. Seed an ACTIVE
# (queued) job for the task so has_active_job_for_task() reports the agent as
# busy -> the busy-guard fires.
conn = get_db()
task_id = conn.execute(
"SELECT id FROM tasks WHERE plane_id='st-2'"
).fetchone()[0]
conn.execute(
"INSERT INTO jobs (agent, repo, task_id, status) VALUES (?, ?, ?, 'queued')",
("analyst", "enduro-trails", task_id),
)
conn.commit()
conn.close()
# Second In Progress update. DISTINCT body (different activity old_value) so
# webhook dedup does NOT short-circuit it — this exercises the busy-guard in
# handle_status_start, not the delivery-dedup layer.
resp = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "st-2", "name": "A valid backlog item title",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": "some-other-state"},
})
assert resp.status_code == 200
assert _count("st-2") == 1 # still exactly one task
assert mock_enqueue.call_count == 1 # analyst NOT re-enqueued (busy-guard)
@patch("src.webhooks.plane.add_comment", create=True)
@patch("src.webhooks.plane.enqueue_job")
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=5)
def test_inprogress_from_needs_input_relaunches_analyst(
mock_seq, mock_branch, mock_docs, mock_enqueue, mock_comment
):
"""Status-only answer-to-questions flow: an existing analysis task whose agent
is IDLE (no active job — it went to Needs Input) is returned to In Progress
-> the analyst is relaunched to read Slava's fresh comments.
+ double-webhook protection: a second In Progress while the relaunch job is
active does NOT relaunch again.
"""
mock_enqueue.return_value = 1
# First In Progress: starts the pipeline (creates task + enqueues analyst).
_to_in_progress("st-ni")
assert _count("st-ni") == 1
assert mock_enqueue.call_count == 1
# The analyst finished and asked questions -> Needs Input. In our model that
# means NO active job for the task (enqueue_job is mocked, so no job row).
conn = get_db()
task_id = conn.execute(
"SELECT id FROM tasks WHERE plane_id='st-ni'"
).fetchone()[0]
has_job = conn.execute(
"SELECT COUNT(*) FROM jobs WHERE task_id=? AND status IN ('queued','running')",
(task_id,),
).fetchone()[0]
conn.close()
assert has_job == 0 # agent idle
# Slava answers + returns the issue to In Progress (distinct body).
resp = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "st-ni", "name": "A valid backlog item title",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": "needs-input"},
})
assert resp.status_code == 200
assert _count("st-ni") == 1 # no duplicate task
assert mock_enqueue.call_count == 2 # analyst RELAUNCHED
assert mock_enqueue.call_args.args[0] == "analyst"
# Seed an active job for the relaunch, then a SECOND In Progress webhook must
# NOT relaunch again (busy-guard against double webhooks).
conn = get_db()
conn.execute(
"INSERT INTO jobs (agent, repo, task_id, status) VALUES (?, ?, ?, 'running')",
("analyst", "enduro-trails", task_id),
)
conn.commit()
conn.close()
resp2 = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "st-ni", "name": "A valid backlog item title",
"description_stripped": "A sufficiently long description for QG-0.",
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": "x-y-z"},
})
assert resp2.status_code == 200
assert mock_enqueue.call_count == 2 # still 2 — busy-guard held

View File

@@ -0,0 +1,138 @@
"""Tests for fix/taskmd-description (3 bugs at the analyst pipeline entry/exit):
BUG A: start_pipeline built the analyst .task.md WITHOUT the description body
(only Title), so analyst received a ~101-byte file and reported the
"business request is empty". task_desc must now carry the description.
BUG B: issue.updated ships only changed fields, so `name` is usually absent ->
slug/branch became "untitled". start_pipeline must pull the real name
from the Plane API (single fetch_issue_fields GET, above the slug build)
so the branch slug is NOT "untitled".
BUG C: the analyst "artifacts ready" comment used the obsolete ":approved:"
wording. Under the status-only model it must ask for the **Approved**
status (not ":approved:", not "In Progress") and link the docs that
actually exist.
"""
import os
import tempfile
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_taskmd_desc.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
import pytest # noqa: E402
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
BACKLOG = "113b24f6-cce8-4be9-9a22-a359b9cf0122"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
def _task(plane_id):
conn = get_db()
row = conn.execute("SELECT * FROM tasks WHERE plane_id=?", (plane_id,)).fetchone()
conn.close()
return row
# --------------------------------------------------------------------------- #
# BUG A: description reaches the analyst .task.md
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.enqueue_job", return_value=1)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=11)
@patch("src.plane_sync.fetch_issue_fields",
return_value=("ET-011 real title",
"REAL BUSINESS REQUEST BODY: user wants GPX upload with "
"validation and a results map."))
def test_taskdesc_includes_description(
mock_fields, mock_seq, mock_branch, mock_docs, mock_enqueue
):
resp = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "taskA",
# status change payload: NO name, NO description (only changed field)
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": BACKLOG},
})
assert resp.status_code == 200
mock_enqueue.assert_called_once()
# task_desc is the 3rd positional arg of enqueue_job(agent, repo, task_desc, ...)
task_desc = mock_enqueue.call_args.args[2]
assert "Description:" in task_desc
# the actual description body (not just the Title) is in the file
assert "REAL BUSINESS REQUEST BODY" in task_desc
assert "results map" in task_desc
# --------------------------------------------------------------------------- #
# BUG B: name fetched from Plane API when payload is empty -> slug not untitled
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.enqueue_job", return_value=1)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=11)
@patch("src.plane_sync.fetch_issue_fields",
return_value=("GPX upload feature",
"A sufficiently long description so QG-0 passes cleanly."))
def test_name_fetched_when_payload_empty(
mock_fields, mock_seq, mock_branch, mock_docs, mock_enqueue
):
resp = client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": "taskB",
# NO name, NO description in the payload (Plane status-change shape)
"project": ENDURO_PLANE_ID,
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
"activity": {"field": "state", "new_value": IN_PROGRESS, "old_value": BACKLOG},
})
assert resp.status_code == 200
mock_fields.assert_called_once()
row = _task("taskB")
assert row is not None
branch = row["branch"]
# slug derived from the fetched name -> "gpx-upload-feature", NOT untitled
assert "untitled" not in branch
assert "gpx-upload-feature" in branch
# Title in the analyst task file is the fetched name, not "untitled"
task_desc = mock_enqueue.call_args.args[2]
assert "Title: GPX upload feature" in task_desc

View File

@@ -0,0 +1,518 @@
"""feat/telegram-live-tracker: tests for the live Telegram task tracker.
Covers (per DEV_TASK_TELEGRAM_TRACKER.md):
* short_model_name: provider/claude- prefix trimming.
* render_task_tracker: per-stage line format (in↓/out↑, model, cost, minutes),
the "⏸️ Ревью БРД · твоё время" line, the 💰 totals, and the finish block
(⏱️ three times + 🔗/📦).
* first message -> sendMessage stores message_id; transition -> editMessageText.
* fallback: editMessageText fails -> a NEW message is sent and the id updated.
* which alerts go out SEPARATELY (approve-gate / deploy-fail / agent-fail /
error) vs which do NOT (QG-pending / agent-start / stage-transition).
Isolated temp DB; no network (httpx is patched).
"""
import os
import tempfile
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_tracker.db")
os.environ["ORCH_DB_PATH"] = _test_db
from unittest.mock import MagicMock, patch # noqa: E402
import pytest # noqa: E402
import src.db as db_module # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import notifications as N # noqa: E402
from src import usage as U # noqa: E402
@pytest.fixture(autouse=True)
def setup_db(monkeypatch):
monkeypatch.setattr(db_module.settings, "db_path", _test_db, raising=False)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
# Re-enable send_telegram (conftest stubs it to a no-op); these tests patch
# httpx / the lower-level helpers explicitly per case.
yield
if os.path.exists(_test_db):
os.unlink(_test_db)
# --------------------------------------------------------------------------- #
# helpers to build a task + runs in the DB
# --------------------------------------------------------------------------- #
def _mk_task(stage="development", title="\u0422\u0440\u0435\u043a\u0438 \u0441 \u0437\u0443\u043c\u0430 z5",
wid="ET-012", brd_start=None, brd_end=None):
conn = get_db()
cur = conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, title, "
"brd_review_started_at, brd_review_ended_at) "
"VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
("p1", wid, "enduro-trails", "feature/ET-012-x", stage, title,
brd_start, brd_end),
)
tid = cur.lastrowid
conn.commit()
conn.close()
return tid
def _mk_run(task_id, agent, started, finished, in_tok, out_tok,
cache_read=0, cache_creation=0, cost=0.0, model=None, exit_code=0):
conn = get_db()
cur = conn.execute(
"INSERT INTO agent_runs (task_id, agent, started_at, finished_at, "
"exit_code, input_tokens, output_tokens, cache_read_tokens, "
"cache_creation_tokens, cost_usd, model) "
"VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
(task_id, agent, started, finished, exit_code, in_tok, out_tok,
cache_read, cache_creation, cost, model),
)
rid = cur.lastrowid
conn.commit()
conn.close()
return rid
# --------------------------------------------------------------------------- #
# short_model_name
# --------------------------------------------------------------------------- #
def test_short_model_name():
assert U.short_model_name("tokenator/claude-opus-4-8") == "opus-4-8"
assert U.short_model_name("vibecode/claude-sonnet-4.6") == "sonnet-4.6"
assert U.short_model_name("claude-opus-4-8") == "opus-4-8"
assert U.short_model_name("opus-4-8") == "opus-4-8"
assert U.short_model_name(None) == ""
assert U.short_model_name("") == ""
def test_parse_usage_extracts_model_from_modelusage():
blob = (
'{"total_cost_usd":0.01,'
'"usage":{"input_tokens":10,"output_tokens":5},'
'"modelUsage":{"claude-opus-4-8":{"inputTokens":10,"outputTokens":5}}}'
)
u = U.parse_usage_from_text(blob)
assert u["model"] == "claude-opus-4-8"
# --------------------------------------------------------------------------- #
# render_task_tracker
# --------------------------------------------------------------------------- #
def test_render_in_progress_stage_lines_and_totals():
tid = _mk_task(stage="deploy", brd_start="2026-06-04 10:00:00",
brd_end="2026-06-04 10:08:00")
# Analysis: 10м, 1.1M in (mostly cache) / 39.6k out, $2.38, opus-4-8
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=1000, out_tok=39600, cache_read=1_100_000, cost=2.38,
model="tokenator/claude-opus-4-8")
_mk_run(tid, "architect", "2026-06-04 10:08:00", "2026-06-04 10:17:00",
in_tok=500, out_tok=34400, cache_read=1_500_000, cost=2.24,
model="tokenator/claude-opus-4-8")
_mk_run(tid, "developer", "2026-06-04 10:17:00", "2026-06-04 10:28:00",
in_tok=400, out_tok=45800, cache_read=8_400_000, cost=7.29,
model="tokenator/claude-opus-4-8")
_mk_run(tid, "reviewer", "2026-06-04 10:28:00", "2026-06-04 10:31:00",
in_tok=300, out_tok=12900, cache_read=1_200_000, cost=1.53,
model="vibecode/claude-sonnet-4.6")
_mk_run(tid, "tester", "2026-06-04 10:31:00", "2026-06-04 10:36:00",
in_tok=200, out_tok=19500, cache_read=1_200_000, cost=1.51,
model="vibecode/claude-sonnet-4.6")
# deployer started but not finished -> active "идёт" line.
_mk_run(tid, "deployer", "2026-06-04 10:36:00", None,
in_tok=0, out_tok=0, model=None, exit_code=None)
text = N.render_task_tracker(tid)
# Header in-progress
assert text.startswith("\U0001f6e0\ufe0f ET-012 \u00b7 \u0422\u0440\u0435\u043a\u0438")
# Per-stage format: in↓/out↑ · cost · model
assert "\u2705 Analysis" in text
assert "10\u043c" in text # analysis duration
assert "39.6k\u2191" in text # analysis out
assert "$2.38" in text
assert "opus-4-8" in text
assert "sonnet-4.6" in text # reviewer/tester model
# BRD review line (human time, ended)
assert "\u0420\u0435\u0432\u044c\u044e \u0411\u0420\u0414" in text
assert "\u0442\u0432\u043e\u0451 \u0432\u0440\u0435\u043c\u044f" in text
# Active stage
assert "\U0001f504 Deploy" in text
assert "\u0438\u0434\u0451\u0442" in text
# Totals line present with 💰
assert "\U0001f4b0" in text
# In-progress: no final ⏱️ line
assert "\u0412\u0441\u0435\u0433\u043e" not in text
def test_render_brd_review_waiting_shows_hourglass():
tid = _mk_task(stage="analysis", brd_start="2026-06-04 10:00:00",
brd_end=None)
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=1000, out_tok=39600, cache_read=1_100_000, cost=2.38,
model="tokenator/claude-opus-4-8")
text = N.render_task_tracker(tid)
assert "\u0420\u0435\u0432\u044c\u044e \u0411\u0420\u0414" in text
assert "\u23f3" in text # hourglass while waiting
def test_render_done_has_times_and_links():
tid = _mk_task(stage="done", brd_start="2026-06-04 10:00:00",
brd_end="2026-06-04 10:08:00")
# set created/updated to compute wall clock
conn = get_db()
conn.execute(
"UPDATE tasks SET created_at='2026-06-04 09:00:00', "
"updated_at='2026-06-04 09:56:00' WHERE id=?", (tid,))
conn.commit()
conn.close()
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=1000, out_tok=39600, cache_read=1_100_000, cost=2.38,
model="tokenator/claude-opus-4-8")
_mk_run(tid, "deployer", "2026-06-04 09:50:00", "2026-06-04 09:56:00",
in_tok=400, out_tok=22400, cache_read=1_600_000, cost=1.73,
model="tokenator/claude-opus-4-8")
with patch("src.notifications.httpx") as _hx:
# No PR found -> just "📦 deployed"
_resp = MagicMock(status_code=200)
_resp.json.return_value = []
_hx.get.return_value = _resp
text = N.render_task_tracker(tid)
assert text.startswith("\U0001f389 ET-012")
assert "\u0413\u041e\u0422\u041e\u0412\u041e" in text
# ⏱️ with three times
assert "\u23f1\ufe0f" in text
assert "\u0412\u0441\u0435\u0433\u043e" in text
assert "\u0430\u0433\u0435\u043d\u0442\u044b" in text
assert "\u0442\u0432\u043e\u0451" in text
# 📦 deployed line
assert "\U0001f4e6" in text
def test_render_escapes_html_in_title():
tid = _mk_task(stage="analysis", title="A <b>& B</b>")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.0)
text = N.render_task_tracker(tid)
assert "&lt;b&gt;" in text
assert "&amp;" in text
def test_render_omits_model_when_unknown():
tid = _mk_task(stage="analysis")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.0, model=None)
text = N.render_task_tracker(tid)
# No trailing " · <model>" — line ends at cost.
line = [l for l in text.splitlines() if l.startswith("\u2705 Analysis")][0]
assert line.rstrip().endswith("$0.00")
# --------------------------------------------------------------------------- #
# tracker send / edit / fallback
# --------------------------------------------------------------------------- #
def test_first_call_sends_message_and_stores_id(monkeypatch):
tid = _mk_task(stage="analysis")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", None, in_tok=0, out_tok=0,
exit_code=None)
sent = {}
def _fake_send(text, disable_notification=False):
sent["text"] = text
sent["silent"] = disable_notification
return 555
monkeypatch.setattr(N, "send_telegram", _fake_send)
monkeypatch.setattr(N, "edit_telegram", lambda *a, **k: (_ for _ in ()).throw(AssertionError("should not edit on first call")))
N.update_task_tracker(tid)
from src.db import get_tracker_message_id
assert get_tracker_message_id(tid) == 555
assert sent["silent"] is True # tracker is silent
def test_second_call_edits_existing_message(monkeypatch):
tid = _mk_task(stage="development")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1)
from src.db import set_tracker_message_id
set_tracker_message_id(tid, 777)
edited = {}
monkeypatch.setattr(N, "edit_telegram",
lambda mid, text: edited.update(mid=mid) or N.EDIT_OK)
monkeypatch.setattr(N, "send_telegram",
lambda *a, **k: (_ for _ in ()).throw(AssertionError("should not send when edit succeeds")))
N.update_task_tracker(tid)
assert edited["mid"] == 777
def test_fallback_to_new_message_when_edit_gone(monkeypatch):
"""edit returns 'gone' (message deleted/too old) -> send NEW + update id."""
tid = _mk_task(stage="development")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1)
from src.db import set_tracker_message_id, get_tracker_message_id
set_tracker_message_id(tid, 100)
monkeypatch.setattr(N, "edit_telegram", lambda mid, text: N.EDIT_GONE)
monkeypatch.setattr(N, "send_telegram", lambda text, disable_notification=False: 200)
N.update_task_tracker(tid)
assert get_tracker_message_id(tid) == 200 # id updated to the new message
def test_not_modified_does_not_send_new_message(monkeypatch):
"""edit returns 'not_modified' -> NO new message, id unchanged (no dupe)."""
tid = _mk_task(stage="development")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1)
from src.db import set_tracker_message_id, get_tracker_message_id
set_tracker_message_id(tid, 100)
monkeypatch.setattr(N, "edit_telegram", lambda mid, text: N.EDIT_NOT_MODIFIED)
monkeypatch.setattr(N, "send_telegram",
lambda *a, **k: (_ for _ in ()).throw(AssertionError("must not send on not_modified")))
N.update_task_tracker(tid)
assert get_tracker_message_id(tid) == 100 # unchanged, no duplicate
def test_transient_edit_failure_does_not_send_new_message(monkeypatch):
"""edit returns 'failed' (network/timeout/5xx) -> NO new message (no dupe)."""
tid = _mk_task(stage="development")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1)
from src.db import set_tracker_message_id, get_tracker_message_id
set_tracker_message_id(tid, 100)
monkeypatch.setattr(N, "edit_telegram", lambda mid, text: N.EDIT_FAILED)
monkeypatch.setattr(N, "send_telegram",
lambda *a, **k: (_ for _ in ()).throw(AssertionError("must not send on transient failure")))
N.update_task_tracker(tid)
assert get_tracker_message_id(tid) == 100 # unchanged, no duplicate
# --------------------------------------------------------------------------- #
# edit_telegram outcome classification (httpx mocked)
# --------------------------------------------------------------------------- #
def _edit_resp(ok, description=None):
resp = MagicMock()
body = {"ok": ok}
if description is not None:
body["description"] = description
resp.json.return_value = body
return resp
def _patch_tg_creds(monkeypatch):
monkeypatch.setattr(N._get_settings(), "telegram_bot_token", "T", raising=False)
monkeypatch.setattr(N._get_settings(), "telegram_chat_id", "C", raising=False)
def test_edit_telegram_ok(monkeypatch):
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(True)
assert N.edit_telegram(1, "x") == N.EDIT_OK
def test_edit_telegram_not_modified_is_success(monkeypatch):
# 400 "message is not modified" -> success, not gone, no duplicate
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(
False, "Bad Request: message is not modified: ...")
assert N.edit_telegram(1, "x") == N.EDIT_NOT_MODIFIED
def test_edit_telegram_exactly_the_same_is_not_modified(monkeypatch):
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(
False, "Bad Request: specified new message content and reply markup "
"are exactly the same")
assert N.edit_telegram(1, "x") == N.EDIT_NOT_MODIFIED
def test_edit_telegram_message_not_found_is_gone(monkeypatch):
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(
False, "Bad Request: message to edit not found")
assert N.edit_telegram(1, "x") == N.EDIT_GONE
def test_edit_telegram_cant_be_edited_is_gone(monkeypatch):
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(
False, "Bad Request: message can't be edited")
assert N.edit_telegram(1, "x") == N.EDIT_GONE
def test_edit_telegram_unknown_400_is_failed(monkeypatch):
# unknown 400 -> failed (NOT gone) -> caller won't duplicate
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(
False, "Bad Request: some other unexpected error")
assert N.edit_telegram(1, "x") == N.EDIT_FAILED
def test_edit_telegram_timeout_is_failed(monkeypatch):
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.side_effect = Exception("read timeout")
assert N.edit_telegram(1, "x") == N.EDIT_FAILED
def test_edit_telegram_5xx_is_failed(monkeypatch):
# Telegram 5xx still returns ok:false w/o gone/not_modified markers
_patch_tg_creds(monkeypatch)
with patch("src.notifications.httpx") as hx:
hx.post.return_value = _edit_resp(False, "Internal Server Error")
assert N.edit_telegram(1, "x") == N.EDIT_FAILED
# --------------------------------------------------------------------------- #
# render: repeated stage attempt shows "попытка N"
# --------------------------------------------------------------------------- #
_POPYTKA = "\u043f\u043e\u043f\u044b\u0442\u043a\u0430" # popytka
def test_render_active_stage_shows_attempt_on_second_run():
# Two reviewer runs while in review -> active line shows attempt 2.
tid = _mk_task(stage="review")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
_mk_run(tid, "developer", "2026-06-04 09:10:00", "2026-06-04 09:20:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
# First review run finished (sent back to dev), second review run active.
_mk_run(tid, "reviewer", "2026-06-04 09:20:00", "2026-06-04 09:25:00",
in_tok=10, out_tok=5, cost=0.1, model="vibecode/claude-sonnet-4.6",
exit_code=0)
_mk_run(tid, "reviewer", "2026-06-04 09:30:00", None,
in_tok=0, out_tok=0, exit_code=None)
text = N.render_task_tracker(tid)
active = [l for l in text.splitlines()
if l.startswith("\U0001f504") and "Review" in l][0]
assert _POPYTKA in active
assert "2" in active
assert "\u0438\u0434\u0451\u0442" in active
def test_render_active_stage_no_attempt_on_first_run():
# Single reviewer run -> active line has NO attempt marker.
tid = _mk_task(stage="review")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
_mk_run(tid, "developer", "2026-06-04 09:10:00", "2026-06-04 09:20:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
_mk_run(tid, "reviewer", "2026-06-04 09:20:00", None,
in_tok=0, out_tok=0, exit_code=None)
text = N.render_task_tracker(tid)
active = [l for l in text.splitlines()
if l.startswith("\U0001f504") and "Review" in l][0]
assert _POPYTKA not in active
assert "\u0438\u0434\u0451\u0442" in active
def test_render_finished_lines_unaffected_by_attempt_logic():
# Completed (checkmark) lines never carry an attempt marker.
tid = _mk_task(stage="review")
_mk_run(tid, "analyst", "2026-06-04 09:00:00", "2026-06-04 09:10:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
# developer ran twice (retry) but is a FINISHED stage now.
_mk_run(tid, "developer", "2026-06-04 09:10:00", "2026-06-04 09:15:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
_mk_run(tid, "developer", "2026-06-04 09:16:00", "2026-06-04 09:20:00",
in_tok=10, out_tok=5, cost=0.1, model="tokenator/claude-opus-4-8")
text = N.render_task_tracker(tid)
for l in text.splitlines():
if l.startswith("\u2705"):
assert _POPYTKA not in l
# --------------------------------------------------------------------------- #
# which alerts are SEPARATE vs tracker-only
# --------------------------------------------------------------------------- #
def test_approve_gate_sends_separate_message_and_starts_brd_clock(monkeypatch):
tid = _mk_task(stage="analysis")
calls = []
monkeypatch.setattr(N, "send_telegram",
lambda text, disable_notification=False: calls.append((text, disable_notification)) or 1)
monkeypatch.setattr(N, "update_task_tracker", lambda task_id: None)
N.notify_approve_requested(tid)
# exactly one SEPARATE (notifying) send for the approve gate
assert len(calls) == 1
assert calls[0][1] is False # notifying
assert "Approved" in calls[0][0]
# BRD clock started
conn = get_db()
row = conn.execute("SELECT brd_review_started_at FROM tasks WHERE id=?", (tid,)).fetchone()
conn.close()
assert row[0] is not None
def test_error_sends_separate_message(monkeypatch):
tid = _mk_task(stage="development")
calls = []
monkeypatch.setattr(N, "send_telegram",
lambda text, disable_notification=False: calls.append((text, disable_notification)) or 1)
N.notify_error(tid, "boom")
assert len(calls) == 1
assert calls[0][1] is False # notifying
assert "ERROR" in calls[0][0]
def test_stage_change_does_not_send_separate_message(monkeypatch):
tid = _mk_task(stage="development")
sent = []
monkeypatch.setattr(N, "send_telegram",
lambda text, disable_notification=False: sent.append(text) or 1)
# tracker refresh is allowed (edit/send silent) but must NOT use send_telegram
# for a separate notification; stub update to isolate.
refreshed = []
monkeypatch.setattr(N, "update_task_tracker", lambda task_id: refreshed.append(task_id))
N.notify_stage_change(tid, "development", "review")
assert sent == [] # no separate message
assert refreshed == [tid] # tracker refreshed instead
def test_agent_started_does_not_send_separate_message(monkeypatch):
tid = _mk_task(stage="analysis")
sent = []
monkeypatch.setattr(N, "send_telegram",
lambda text, disable_notification=False: sent.append(text) or 1)
refreshed = []
monkeypatch.setattr(N, "update_task_tracker", lambda task_id: refreshed.append(task_id))
N.notify_agent_started(1, "analyst", tid)
assert sent == []
assert refreshed == [tid]
def test_qg_failure_does_not_send_separate_message(monkeypatch):
tid = _mk_task(stage="development")
sent = []
monkeypatch.setattr(N, "send_telegram",
lambda text, disable_notification=False: sent.append(text) or 1)
N.notify_qg_failure(tid, "development", "check_ci_green", "CI state: pending")
assert sent == [] # QG-pending is log-only, never a separate ping

309
tests/test_usage.py Normal file
View File

@@ -0,0 +1,309 @@
"""Feature 4: token / cost accounting tests.
Covers:
* parse_usage_from_text on a REAL claude --output-format json result blob
(captured live from CLI 2.1.142), including a leading text line.
* parse on garbage / missing JSON -> None (never raises).
* record_usage writes the columns; NULLs when usage is None.
* fmt_tokens / fmt_cost formatting.
* usage_comment string format.
* task_usage_summary / task_summary_comment aggregate over agent_runs.
DB is an isolated temp file; no network or subprocess.
"""
import os
import tempfile
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_usage.db")
os.environ["ORCH_DB_PATH"] = _test_db
import pytest # noqa: E402
from src import db as db_module # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import usage as U # noqa: E402
# Real claude --output-format json result object (captured from CLI 2.1.142).
REAL_RESULT_JSON = (
'{"type":"result","subtype":"success","is_error":false,"duration_ms":1795,'
'"num_turns":1,"result":"Hi!","session_id":"abc",'
'"total_cost_usd":0.0560175,'
'"usage":{"input_tokens":45231,"cache_creation_input_tokens":7418,'
'"cache_read_input_tokens":18500,"output_tokens":12100,'
'"service_tier":"standard"},'
'"modelUsage":{"claude-opus-4-7":{"inputTokens":6,"outputTokens":7}},'
'"permission_denials":[]}'
)
@pytest.fixture(autouse=True)
def setup_db(monkeypatch):
# get_db() reads settings.db_path live; pin it to our isolated DB.
monkeypatch.setattr(db_module.settings, "db_path", _test_db, raising=False)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
yield
if os.path.exists(_test_db):
os.unlink(_test_db)
# --------------------------------------------------------------------------- #
# parsing
# --------------------------------------------------------------------------- #
def test_parse_real_result_json():
u = U.parse_usage_from_text(REAL_RESULT_JSON)
assert u is not None
assert u["input_tokens"] == 45231
assert u["output_tokens"] == 12100
assert u["cache_read_tokens"] == 18500
# FIX 2: cache_creation slice must now be parsed (was dropped before).
assert u["cache_creation_tokens"] == 7418
assert abs(u["cost_usd"] - 0.0560175) < 1e-9
def test_parse_cache_creation_present():
u = U.parse_usage_from_text(REAL_RESULT_JSON)
assert u["cache_creation_tokens"] == 7418
def test_parse_cache_creation_missing_defaults_zero():
blob = (
'{"total_cost_usd":0.01,'
'"usage":{"input_tokens":10,"output_tokens":5,'
'"cache_read_input_tokens":100}}'
)
u = U.parse_usage_from_text(blob)
assert u["cache_creation_tokens"] == 0
assert u["cache_read_tokens"] == 100
def test_parse_with_leading_text():
"""The agent may print text before the trailing JSON; we still find it."""
text = "some agent stdout line\nanother line\n" + REAL_RESULT_JSON
u = U.parse_usage_from_text(text)
assert u is not None
assert u["input_tokens"] == 45231
assert u["output_tokens"] == 12100
def test_parse_garbage_returns_none():
assert U.parse_usage_from_text("not json at all { broken") is None
assert U.parse_usage_from_text("") is None
assert U.parse_usage_from_text(None) is None
def test_parse_json_without_usage_returns_none():
assert U.parse_usage_from_text('{"hello":"world"}') is None
def test_parse_from_log_missing_file_returns_none():
assert U.parse_usage_from_log("/no/such/file.log") is None
# --------------------------------------------------------------------------- #
# record_usage
# --------------------------------------------------------------------------- #
def _new_run(agent="developer", task_id=1):
conn = get_db()
cur = conn.execute("INSERT INTO agent_runs (task_id, agent) VALUES (?, ?)", (task_id, agent))
rid = cur.lastrowid
conn.commit()
conn.close()
return rid
def test_record_usage_writes_columns():
rid = _new_run()
u = U.parse_usage_from_text(REAL_RESULT_JSON)
U.record_usage(rid, u)
conn = get_db()
row = conn.execute(
"SELECT input_tokens, output_tokens, cache_read_tokens, "
"cache_creation_tokens, cost_usd "
"FROM agent_runs WHERE id=?", (rid,)
).fetchone()
conn.close()
assert row["input_tokens"] == 45231
assert row["output_tokens"] == 12100
assert row["cache_read_tokens"] == 18500
# FIX 2: cache_creation column is now persisted.
assert row["cache_creation_tokens"] == 7418
assert abs(row["cost_usd"] - 0.0560175) < 1e-9
def test_record_usage_none_writes_nulls():
rid = _new_run()
U.record_usage(rid, None) # must not raise
conn = get_db()
row = conn.execute("SELECT input_tokens, cost_usd FROM agent_runs WHERE id=?", (rid,)).fetchone()
conn.close()
assert row["input_tokens"] is None
assert row["cost_usd"] is None
# --------------------------------------------------------------------------- #
# formatting
# --------------------------------------------------------------------------- #
def test_fmt_tokens():
assert U.fmt_tokens(6) == "6"
assert U.fmt_tokens(1234) == "1.2k"
assert U.fmt_tokens(45231) == "45.2k"
assert U.fmt_tokens(2_500_000) == "2.5M"
assert U.fmt_tokens(None) == "0"
def test_fmt_cost():
assert U.fmt_cost(0.21) == "$0.21"
assert U.fmt_cost(0.0560175) == "$0.06"
assert U.fmt_cost(None) == "$0.00"
def test_usage_comment_format():
# No cache -> in_total == input_tokens, no cached breakdown shown.
u = {"input_tokens": 45231, "output_tokens": 12100, "cost_usd": 0.21}
c = U.usage_comment("developer", u)
assert "Developer" in c
assert "45.2k in" in c
assert "cached" not in c
assert "12.1k out" in c
assert "$0.21" in c
def test_usage_comment_shows_full_input_with_cached():
"""FIX 2: in = input + cache_read + cache_creation, with cached breakdown."""
u = {
"input_tokens": 81,
"cache_read_tokens": 8_400_000,
"cache_creation_tokens": 100_000,
"output_tokens": 45_800,
"cost_usd": 7.29,
}
c = U.usage_comment("developer", u)
# total in = 8_500_081 -> 8.5M ; cached = 8_500_000 -> 8.5M
assert "8.5M in (8.5M cached)" in c
assert "45.8k out" in c
assert "$7.29" in c
def test_usage_comment_no_cached_when_zero():
u = {"input_tokens": 1234, "cache_read_tokens": 0,
"cache_creation_tokens": 0, "output_tokens": 50, "cost_usd": 0.01}
c = U.usage_comment("developer", u)
assert "1.2k in" in c
assert "cached" not in c
# --------------------------------------------------------------------------- #
# FIX 4: per-agent artifact links in finish comments
# --------------------------------------------------------------------------- #
def _ctx():
return dict(repo="enduro-trails", branch="feature/ET-012-x",
work_item_id="ET-012")
def test_usage_comment_reviewer_links_review_doc():
c = U.usage_comment("reviewer", {"input_tokens": 5}, **_ctx())
assert "12-review.md" in c
assert "ET-012" in c
def test_usage_comment_tester_links_test_report():
c = U.usage_comment("tester", {"input_tokens": 5}, **_ctx())
assert "13-test-report.md" in c
def test_usage_comment_deployer_links_deploy_log():
c = U.usage_comment("deployer", {"input_tokens": 5}, **_ctx())
assert "14-deploy-log.md" in c
def test_usage_comment_developer_links_pr_and_branch():
c = U.usage_comment("developer", {"input_tokens": 5}, pr_number=7, **_ctx())
assert "pulls/7" in c
assert "feature/ET-012-x" in c
def test_usage_comment_architect_links_adr():
c = U.usage_comment("architect", {"input_tokens": 5}, **_ctx())
assert "06-adr" in c
def test_usage_comment_no_links_without_context():
"""Without repo/branch context, no links are appended (no crash)."""
c = U.usage_comment("reviewer", {"input_tokens": 5})
assert "12-review.md" not in c
assert "http" not in c
# --------------------------------------------------------------------------- #
# task summary
# --------------------------------------------------------------------------- #
def test_task_summary_aggregates_over_agents():
# two runs for the same task: developer + tester
for agent, ti, to, cost in [("developer", 1000, 200, 0.10), ("tester", 500, 100, 0.05)]:
rid = _new_run(agent=agent, task_id=42)
U.record_usage(rid, {"input_tokens": ti, "output_tokens": to,
"cache_read_tokens": 0, "cost_usd": cost})
s = U.task_usage_summary(42)
assert s["total_in"] == 1500
assert s["total_out"] == 300
assert abs(s["total_cost"] - 0.15) < 1e-9
agents = {a for a, *_ in s["per_agent"]}
assert agents == {"developer", "tester"}
comment = U.task_summary_comment(42)
assert "1.5k" in comment # total in
assert "$0.15" in comment # total cost
assert "Developer" in comment
assert "Tester" in comment
def test_task_summary_sums_all_three_input_components():
"""FIX 2: total_in = SUM(input + cache_read + cache_creation); total_cached too."""
rid = _new_run(agent="developer", task_id=77)
U.record_usage(rid, {
"input_tokens": 100,
"cache_read_tokens": 2000,
"cache_creation_tokens": 900,
"output_tokens": 50,
"cost_usd": 0.10,
})
rid2 = _new_run(agent="tester", task_id=77)
U.record_usage(rid2, {
"input_tokens": 10,
"cache_read_tokens": 500,
"cache_creation_tokens": 0,
"output_tokens": 5,
"cost_usd": 0.05,
})
s = U.task_usage_summary(77)
# total_in = (100+2000+900) + (10+500+0) = 3510
assert s["total_in"] == 3510
# total_cached = (2000+900) + (500+0) = 3400
assert s["total_cached"] == 3400
assert s["total_out"] == 55
comment = U.task_summary_comment(77)
assert "cached" in comment
def test_task_summary_handles_null_cache_creation():
"""Pre-existing rows (NULL cache_creation) must not break aggregation."""
rid = _new_run(agent="developer", task_id=88)
conn = get_db()
conn.execute(
"UPDATE agent_runs SET input_tokens=100, cache_read_tokens=200, "
"cache_creation_tokens=NULL, output_tokens=10, cost_usd=0.01 WHERE id=?",
(rid,),
)
conn.commit()
conn.close()
s = U.task_usage_summary(88) # must not raise
assert s["total_in"] == 300 # 100 + 200 + (NULL->0)
assert s["total_cached"] == 200

View File

@@ -0,0 +1,171 @@
"""Status-only verdict model: verdict statuses Approved / Rejected.
* issue updated -> Approved : calls _try_advance_stage, with NO intermediate
set_issue_in_progress reset (bug 3 fix).
* issue updated -> Rejected : calls _rollback_stage, with the reason pulled
from the issue's latest comment.
* COMMENTS NEVER trigger the pipeline: a :approved: / :rejected: comment is a
pure no-op (the comment-based control mechanism was removed).
We mock the shared engine entry points (_try_advance_stage / _rollback_stage)
and assert they fire ONLY for the status trigger, never for a comment.
"""
import os
import tempfile
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_verdict.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ.setdefault("ORCH_PLANE_WEBHOOK_SECRET", "")
os.environ.setdefault("ORCH_GITEA_TOKEN", "test-token")
os.environ.setdefault("ORCH_PLANE_API_TOKEN", "test-token")
import pytest # noqa: E402
from unittest.mock import patch, AsyncMock # noqa: E402
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import projects as P # noqa: E402
from src.projects import reload_projects # noqa: E402
ENDURO_PLANE_ID = "7a79f0a9-5278-49cd-9007-9a338f238f9c"
APPROVED = "a519a341-dada-4a91-8910-7604f82b79c5"
REJECTED = "ba958f3c-5db5-461d-8f82-89425e413b97"
client = TestClient(app)
@pytest.fixture(autouse=True)
def setup(monkeypatch):
monkeypatch.setattr(P.settings, "db_path", _test_db)
import src.db as _db
monkeypatch.setattr(_db.settings, "db_path", _test_db)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
monkeypatch.setattr("src.webhooks.plane.verify_plane_signature", lambda body, sig: True)
registry_json = (
f'[{{"plane_project_id": "{ENDURO_PLANE_ID}", "repo": "enduro-trails",'
f' "work_item_prefix": "ET", "name": "enduro-trails"}}]'
)
monkeypatch.setattr(P.settings, "projects_json", registry_json)
reload_projects()
# Seed a task at the 'review' stage for plane_id 'v-1'.
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) "
"VALUES (?, ?, ?, ?, ?, ?)",
("v-1", "ET-500", "enduro-trails", "feature/ET-500-x", "review", "v-1"),
)
conn.commit()
conn.close()
yield
reload_projects()
if os.path.exists(_test_db):
os.unlink(_test_db)
def _status(state_id, plane_id="v-1", old="prev"):
return client.post("/webhook/plane", json={
"event": "issue", "action": "updated",
"data": {
"id": plane_id, "name": "Verdict task", "project": ENDURO_PLANE_ID,
"state": {"id": state_id, "name": "X", "group": "started"},
},
"activity": {"field": "state", "new_value": state_id, "old_value": old},
})
def _comment(text, plane_id="v-1"):
return client.post("/webhook/plane", json={
"event": "issue_comment", "action": "created",
"data": {"work_item_id": plane_id, "comment_stripped": text,
"project": ENDURO_PLANE_ID},
})
class _FakeResp:
def __init__(self, status_code, payload):
self.status_code = status_code
self._payload = payload
def json(self):
return self._payload
def _comments_response(comments):
return _FakeResp(200, {"results": comments})
# --------------------------------------------------------------------------- #
# Approved status -> advance (no in_progress reset)
# --------------------------------------------------------------------------- #
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
def test_approved_status_advances(mock_advance, mock_sip):
resp = _status(APPROVED)
assert resp.status_code == 200
mock_advance.assert_awaited_once()
# advanced the right task (ET-500 at review)
args = mock_advance.call_args.args
assert "ET-500" in args # work_item_id is passed positionally
# bug 3 fix: handle_verdict no longer resets the status to In Progress.
mock_sip.assert_not_called()
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
def test_approved_comment_is_noop(mock_advance, mock_rollback, mock_sip):
"""Status-only model: a :approved: comment NEVER advances the pipeline."""
resp = _comment(":approved:")
assert resp.status_code == 200
mock_advance.assert_not_called()
mock_rollback.assert_not_called()
mock_sip.assert_not_called()
# --------------------------------------------------------------------------- #
# Rejected status -> rollback (reason from latest comment)
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane.httpx.get")
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
def test_rejected_status_rolls_back(mock_rollback, mock_get):
mock_get.return_value = _comments_response(
[{"comment_stripped": "ADR missing tradeoffs",
"created_at": "2026-06-03T10:00:00Z"}]
)
resp = _status(REJECTED)
assert resp.status_code == 200
mock_rollback.assert_awaited_once()
# reason pulled from the latest comment
reason = mock_rollback.call_args.args[-1]
assert "ADR missing tradeoffs" in reason
@patch("src.webhooks.plane.httpx.get")
@patch("src.plane_sync.set_issue_in_progress")
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
def test_rejected_comment_is_noop(mock_advance, mock_rollback, mock_sip, mock_get):
"""Status-only model: a :rejected: comment NEVER rolls back the pipeline."""
resp = _comment(":rejected: bad ADR")
assert resp.status_code == 200
mock_advance.assert_not_called()
mock_rollback.assert_not_called()
mock_sip.assert_not_called()
mock_get.assert_not_called()
# --------------------------------------------------------------------------- #
# Unknown verdict status -> no-op
# --------------------------------------------------------------------------- #
@patch("src.webhooks.plane._rollback_stage", new_callable=AsyncMock)
@patch("src.webhooks.plane._try_advance_stage", new_callable=AsyncMock)
def test_other_status_no_verdict_action(mock_advance, mock_rollback):
# In Review status is not a verdict -> neither advance nor rollback.
resp = _status("38fb1f64-aa1e-48a3-92e0-0b109679046b") # in_review
assert resp.status_code == 200
mock_advance.assert_not_called()
mock_rollback.assert_not_called()

284
tests/test_webhook_dedup.py Normal file
View File

@@ -0,0 +1,284 @@
"""ORCH-5 (M-7): webhook delivery de-duplication tests.
A retried/replayed webhook delivery must be processed exactly once. We mock
enqueue_job (imported into the gitea/plane module namespaces) and assert its
call_count does not grow on a repeat. HMAC is bypassed here by forcing the
webhook secrets empty (the 9 pre-existing 401 webhook tests are a separate
baseline and are NOT touched). A dedicated test keeps the 401-on-bad-signature
guarantee by re-enabling the secret.
"""
import os
import tempfile
from unittest.mock import patch, AsyncMock
import pytest
# Override DB path + project registry BEFORE importing app (same pattern as
# tests/test_webhooks.py).
_test_db = os.path.join(tempfile.gettempdir(), "test_orchestrator_dedup.db")
os.environ["ORCH_DB_PATH"] = _test_db
os.environ["ORCH_REPOS_DIR"] = tempfile.gettempdir()
os.environ["ORCH_GITEA_TOKEN"] = "test-token"
os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
os.environ["ORCH_GITEA_OWNER"] = "admin"
os.environ["ORCH_DEFAULT_REPO"] = "enduro-trails"
os.environ["ORCH_PROJECTS_JSON"] = (
'[{"plane_project_id": "proj-1", "repo": "enduro-trails", '
'"work_item_prefix": "ET", "name": "enduro-trails"}]'
)
from fastapi.testclient import TestClient # noqa: E402
from src.main import app # noqa: E402
from src.db import init_db, get_db # noqa: E402
from src import db as db_module # noqa: E402
from src.webhooks import gitea as gitea_mod # noqa: E402
from src.webhooks import plane as plane_mod # noqa: E402
from src import projects as projects_mod # noqa: E402
@pytest.fixture(autouse=True)
def setup_db(monkeypatch):
# settings is a process-wide singleton; another test module may have fixed
# settings.db_path to its own file at import time. get_db() reads it live, so
# pin it to OUR db for the duration of each test here.
monkeypatch.setattr(db_module.settings, "db_path", _test_db, raising=False)
if os.path.exists(_test_db):
os.unlink(_test_db)
init_db()
yield
if os.path.exists(_test_db):
os.unlink(_test_db)
@pytest.fixture(autouse=True)
def proj_registry():
"""Pin the shared project registry to proj-1/enduro-trails.
The registry (projects.PROJECTS / _BY_PLANE_ID) is a process-wide singleton
built at import; test_projects.py rebuilds it via reload_projects(), which can
leave it on the built-in default where proj-1 is unknown -> ORCH-6 would
ignore our fixtures. Force ours for each test, then rebuild after.
"""
os.environ["ORCH_PROJECTS_JSON"] = (
'[{"plane_project_id": "proj-1", "repo": "enduro-trails", '
'"work_item_prefix": "ET", "name": "enduro-trails"}]'
)
projects_mod.settings.projects_json = os.environ["ORCH_PROJECTS_JSON"]
projects_mod.reload_projects()
yield
projects_mod.reload_projects()
@pytest.fixture(autouse=True)
def no_hmac(monkeypatch):
"""Bypass HMAC so dedup behavior (not signing) is under test.
settings is shared, so override the secret on the module-level settings that
each verify_* function reads.
"""
monkeypatch.setattr(gitea_mod.settings, "gitea_webhook_secret", "", raising=False)
monkeypatch.setattr(plane_mod.settings, "plane_webhook_secret", "", raising=False)
yield
client = TestClient(app)
def _events_count():
conn = get_db()
n = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
return n
# ---------------------------------------------------------------------------
# Migration
# ---------------------------------------------------------------------------
def test_migration_adds_delivery_id_and_index():
"""events has delivery_id + a partial unique index idx_events_delivery."""
conn = get_db()
cols = [r[1] for r in conn.execute("PRAGMA table_info(events)").fetchall()]
idxs = [r[1] for r in conn.execute("PRAGMA index_list(events)").fetchall()]
conn.close()
assert "delivery_id" in cols
assert "idx_events_delivery" in idxs
def test_migration_on_old_db_without_column_does_not_crash():
"""init_db() over a pre-existing events table WITHOUT delivery_id is safe."""
if os.path.exists(_test_db):
os.unlink(_test_db)
import sqlite3
conn = sqlite3.connect(_test_db)
# Old-shape events table (no delivery_id) + a legacy row with NULL delivery_id.
conn.executescript(
"""
CREATE TABLE events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT DEFAULT (datetime('now')),
source TEXT NOT NULL,
event_type TEXT NOT NULL,
payload TEXT NOT NULL,
processed INTEGER DEFAULT 0
);
INSERT INTO events (source, event_type, payload) VALUES ('plane','old','{}');
INSERT INTO events (source, event_type, payload) VALUES ('gitea','old2','{}');
"""
)
conn.commit()
conn.close()
# Should add the column + index without raising and keep the legacy rows.
init_db()
conn = get_db()
cols = [r[1] for r in conn.execute("PRAGMA table_info(events)").fetchall()]
n = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
assert "delivery_id" in cols
assert n == 2 # legacy NULL-delivery rows preserved, partial index lets them coexist
# ---------------------------------------------------------------------------
# Gitea dedup
# ---------------------------------------------------------------------------
@patch.object(gitea_mod, "enqueue_job")
def test_gitea_duplicate_delivery_id_skips_dispatch(mock_enqueue):
"""Repeated X-Gitea-Delivery -> first processed, second {"status":"duplicate"}."""
# Task at architecture so the ADR push would enqueue.
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage) "
"VALUES (?, ?, ?, ?, ?)",
("gd-001", "ET-100", "enduro-trails", "feature/ET-100-x", "architecture"),
)
conn.commit()
conn.close()
body = {
"ref": "refs/heads/feature/ET-100-x",
"repository": {"name": "enduro-trails"},
"commits": [
{"added": ["docs/work-items/ET-100/06-adr/001-d.md"], "modified": []}
],
}
hdrs = {"X-Gitea-Event": "push", "X-Gitea-Delivery": "guid-AAA"}
r1 = client.post("/webhook/gitea", json=body, headers=hdrs)
assert r1.status_code == 200
assert r1.json()["status"] == "accepted"
assert mock_enqueue.call_count == 1
assert _events_count() == 1
# Same delivery id again -> duplicate, no new enqueue, no new event row.
r2 = client.post("/webhook/gitea", json=body, headers=hdrs)
assert r2.status_code == 200
assert r2.json()["status"] == "duplicate"
assert mock_enqueue.call_count == 1
assert _events_count() == 1
@patch.object(gitea_mod, "enqueue_job")
def test_gitea_two_distinct_delivery_ids_both_processed(mock_enqueue):
body = {"ref": "refs/heads/feature/none", "repository": {"name": "enduro-trails"}, "commits": []}
r1 = client.post("/webhook/gitea", json=body,
headers={"X-Gitea-Event": "push", "X-Gitea-Delivery": "guid-1"})
r2 = client.post("/webhook/gitea", json=body,
headers={"X-Gitea-Event": "push", "X-Gitea-Delivery": "guid-2"})
assert r1.json()["status"] == "accepted"
assert r2.json()["status"] == "accepted"
assert _events_count() == 2
def test_gitea_fallback_hash_when_no_delivery_header():
"""No X-Gitea-Delivery -> sha256 fallback; identical body repeat = duplicate."""
body = {"ref": "refs/heads/feature/none", "repository": {"name": "enduro-trails"}, "commits": []}
r1 = client.post("/webhook/gitea", json=body, headers={"X-Gitea-Event": "push"})
r2 = client.post("/webhook/gitea", json=body, headers={"X-Gitea-Event": "push"})
assert r1.json()["status"] == "accepted"
assert r2.json()["status"] == "duplicate"
assert _events_count() == 1
# ---------------------------------------------------------------------------
# Plane dedup
# ---------------------------------------------------------------------------
@patch.object(plane_mod, "enqueue_job")
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
def test_plane_fallback_hash_dedup(mock_docs, mock_branch, mock_enqueue):
"""Repeated identical Plane body -> first accepted+enqueue, repeat duplicate.
Feature 1: the pipeline now starts on a status change to In Progress, not on
creation, so this drives the dedup test with an 'issue updated' event.
"""
IN_PROGRESS = "b873d9eb-993c-48cd-97ac-99a9b1623967"
body = {
"event": "issue",
"action": "updated",
"data": {
"id": "pd-001",
"name": "Dedup plane task",
"description_stripped": "A sufficiently long description for QG-0 to pass.",
"project": "proj-1",
"state": {"id": IN_PROGRESS, "name": "In Progress", "group": "started"},
},
}
r1 = client.post("/webhook/plane", json=body)
assert r1.status_code == 200
assert r1.json()["status"] == "accepted"
assert mock_enqueue.call_count == 1
assert _events_count() == 1
r2 = client.post("/webhook/plane", json=body)
assert r2.status_code == 200
assert r2.json()["status"] == "duplicate"
assert mock_enqueue.call_count == 1 # not re-enqueued
assert _events_count() == 1
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
def test_plane_unknown_project_first_delivery_still_ignored(mock_docs, mock_branch):
"""ORCH-6 intact: first delivery of an unknown project -> {"status":"ignored"}."""
body = {
"event": "work_item.created",
"data": {"id": "unk-001", "name": "Unknown project task", "project": "proj-UNKNOWN"},
}
r1 = client.post("/webhook/plane", json=body)
assert r1.status_code == 200
assert r1.json()["status"] == "ignored"
# Event WAS logged (dedup happens before the project filter), so a retry of the
# SAME body is a duplicate, not re-evaluated.
assert _events_count() == 1
r2 = client.post("/webhook/plane", json=body)
assert r2.json()["status"] == "duplicate"
assert _events_count() == 1
# ---------------------------------------------------------------------------
# HMAC still guarded (acceptance #4) — independent of the dedup path
# ---------------------------------------------------------------------------
def test_gitea_invalid_signature_still_401(monkeypatch):
monkeypatch.setattr(gitea_mod.settings, "gitea_webhook_secret", "s3cr3t", raising=False)
r = client.post(
"/webhook/gitea",
json={"ref": "refs/heads/feature/x", "repository": {"name": "enduro-trails"}, "commits": []},
headers={"X-Gitea-Event": "push", "X-Gitea-Signature": "deadbeef"},
)
assert r.status_code == 401
def test_plane_invalid_signature_still_401(monkeypatch):
monkeypatch.setattr(plane_mod.settings, "plane_webhook_secret", "s3cr3t", raising=False)
r = client.post(
"/webhook/plane",
json={"event": "work_item.created", "data": {"id": "z", "project": "proj-1"}},
headers={"X-Plane-Signature": "deadbeef"},
)
assert r.status_code == 401

View File

@@ -1,4 +1,5 @@
import pytest
import asyncio
import os
import tempfile
from unittest.mock import patch, MagicMock, AsyncMock
@@ -14,6 +15,12 @@ os.environ["ORCH_GITEA_TOKEN"] = "test-token"
os.environ["ORCH_PLANE_API_TOKEN"] = "test-token"
os.environ["ORCH_GITEA_OWNER"] = "admin"
os.environ["ORCH_DEFAULT_REPO"] = "enduro-trails"
# ORCH-6: register the test project so the project filter lets these fixtures
# through. proj-1 maps to enduro-trails/ET, preserving the ET-001/ET-002 asserts.
os.environ["ORCH_PROJECTS_JSON"] = (
'[{"plane_project_id": "proj-1", "repo": "enduro-trails", '
'"work_item_prefix": "ET", "name": "enduro-trails"}]'
)
from fastapi.testclient import TestClient
from src.main import app
@@ -47,13 +54,19 @@ def test_status_endpoint():
assert "active_tasks" in resp.json()
@patch("src.plane_sync.add_comment")
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=None)
@patch("src.plane_sync.fetch_issue_fields", return_value=("Test task", "This is a detailed test description for the task"))
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
def test_plane_webhook_creates_task(mock_docs, mock_branch):
"""work_item.created → task in DB with stage=analysis."""
def test_plane_webhook_creates_task(mock_docs, mock_branch, mock_fetch_fields, mock_fetch_seq, mock_add_comment):
"""work_item.created (via In Progress status) → task in DB with stage=analysis."""
resp = client.post("/webhook/plane", json={
"event": "work_item.created",
"data": {"id": "test-123", "name": "Test task", "project": "proj-1"}
"event": "issue", "action": "updated",
"data": {
"id": "test-123", "name": "Test task", "project": "proj-1",
"state": {"id": "b873d9eb-993c-48cd-97ac-99a9b1623967", "name": "In Progress", "group": "started"},
}
})
assert resp.status_code == 200
assert resp.json()["status"] == "accepted"
@@ -68,17 +81,37 @@ def test_plane_webhook_creates_task(mock_docs, mock_branch):
assert "feature/" in task["branch"]
@patch("src.plane_sync.add_comment")
@patch("src.plane_sync.fetch_issue_sequence_id", return_value=None)
@patch("src.plane_sync.fetch_issue_fields",
side_effect=[
("First task", "This is a detailed description for the first task item"),
("Second task", "This is a detailed description for the second task item"),
])
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
def test_plane_webhook_generates_sequential_ids(mock_docs, mock_branch):
"""Multiple work items get sequential IDs."""
def test_plane_webhook_generates_sequential_ids(
mock_docs, mock_branch, mock_fetch_fields, mock_fetch_seq, mock_add_comment
):
"""Multiple In Progress transitions get sequential IDs (ET-001, ET-002)."""
in_progress_state = {
"id": "b873d9eb-993c-48cd-97ac-99a9b1623967",
"name": "In Progress",
"group": "started",
}
client.post("/webhook/plane", json={
"event": "work_item.created",
"data": {"id": "item-1", "name": "First task", "project": "proj-1"}
"event": "issue", "action": "updated",
"data": {
"id": "item-1", "name": "First task", "project": "proj-1",
"state": in_progress_state,
}
})
client.post("/webhook/plane", json={
"event": "work_item.created",
"data": {"id": "item-2", "name": "Second task", "project": "proj-1"}
"event": "issue", "action": "updated",
"data": {
"id": "item-2", "name": "Second task", "project": "proj-1",
"state": in_progress_state,
}
})
conn = get_db()
@@ -89,27 +122,32 @@ def test_plane_webhook_generates_sequential_ids(mock_docs, mock_branch):
assert ids[1] == "ET-002"
APPROVED_STATE = "a519a341-dada-4a91-8910-7604f82b79c5"
REJECTED_STATE = "ba958f3c-5db5-461d-8f82-89425e413b97"
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
@patch("src.webhooks.plane.launcher")
def test_plane_approved_advances_stage(mock_launcher, mock_docs, mock_branch, tmp_path, monkeypatch):
"""Comment :approved: at stage=analysis advance to architecture."""
"""Status-only model: Approved STATUS at stage=analysis -> advance to
architecture. A comment never triggers this.
"""
# Patch repos_dir for QG check
monkeypatch.setattr("src.qg.checks.settings.repos_dir", str(tmp_path))
# Create task first
client.post("/webhook/plane", json={
"event": "work_item.created",
"data": {"id": "adv-001", "name": "Advance test", "project": "proj-1"}
})
# Get the task to find work_item_id
# Seed an analysis task directly (creation no longer makes a task post-PR#11).
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'adv-001'").fetchone()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) "
"VALUES (?, ?, ?, ?, ?, ?)",
("adv-001", "ET-001", "enduro-trails", "feature/ET-001-x", "analysis", "adv-001"),
)
conn.commit()
conn.close()
work_item_id = task["work_item_id"]
work_item_id = "ET-001"
# Create required analysis files
# Create required analysis files so the analysis QG passes.
wi_dir = tmp_path / "enduro-trails" / "docs" / "work-items" / work_item_id
wi_dir.mkdir(parents=True)
(wi_dir / "01-brd.md").write_text("# BRD")
@@ -117,16 +155,15 @@ def test_plane_approved_advances_stage(mock_launcher, mock_docs, mock_branch, tm
(wi_dir / "03-acceptance-criteria.md").write_text("# AC")
(wi_dir / "04-test-plan.yaml").write_text("tests: []")
# Mock launcher
mock_launcher.launch.return_value = 1
# Send approved comment
# Send Approved STATUS change.
resp = client.post("/webhook/plane", json={
"event": "comment.created",
"event": "issue", "action": "updated",
"data": {
"work_item_id": "adv-001",
"comment": "Looks good :approved:"
}
"id": "adv-001", "name": "Advance test", "project": "proj-1",
"state": {"id": APPROVED_STATE, "name": "Approved", "group": "completed"},
},
})
assert resp.status_code == 200
@@ -137,29 +174,39 @@ def test_plane_approved_advances_stage(mock_launcher, mock_docs, mock_branch, tm
assert task["stage"] == "architecture"
@patch("src.webhooks.plane.httpx.get")
@patch("src.webhooks.plane._create_gitea_branch", new_callable=AsyncMock)
@patch("src.webhooks.plane._create_initial_docs", new_callable=AsyncMock)
def test_plane_rejected_rolls_back(mock_docs, mock_branch):
"""Comment :rejected: rolls back stage."""
# Create task
client.post("/webhook/plane", json={
"event": "work_item.created",
"data": {"id": "rej-001", "name": "Reject test", "project": "proj-1"}
})
def test_plane_rejected_rolls_back(mock_docs, mock_branch, mock_get):
"""Status-only model: Rejected STATUS rolls back stage. A comment never
triggers this; the reason is pulled from the latest comment.
"""
class _R:
status_code = 200
@staticmethod
def json():
return {"results": [
{"comment_stripped": "missing ADR", "created_at": "2026-06-03T10:00:00Z"}
]}
mock_get.return_value = _R()
# Manually set stage to architecture
# Seed an architecture task directly.
conn = get_db()
conn.execute("UPDATE tasks SET stage = 'architecture' WHERE plane_id = 'rej-001'")
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage, plane_issue_id) "
"VALUES (?, ?, ?, ?, ?, ?)",
("rej-001", "ET-002", "enduro-trails", "feature/ET-002-x", "architecture", "rej-001"),
)
conn.commit()
conn.close()
# Send rejected comment
# Send Rejected STATUS change.
resp = client.post("/webhook/plane", json={
"event": "comment.created",
"event": "issue", "action": "updated",
"data": {
"work_item_id": "rej-001",
"comment": "Not ready :rejected:"
}
"id": "rej-001", "name": "Reject test", "project": "proj-1",
"state": {"id": REJECTED_STATE, "name": "Rejected", "group": "cancelled"},
},
})
assert resp.status_code == 200
@@ -181,8 +228,9 @@ def test_gitea_webhook_push():
assert resp.json()["status"] == "accepted"
@patch("src.webhooks.gitea.plane_notify_stage")
@patch("src.webhooks.gitea.launcher")
def test_gitea_push_with_adr_advances_stage(mock_launcher):
def test_gitea_push_with_adr_advances_stage(mock_launcher, mock_plane_notify):
"""Push with ADR files at architecture stage → advance to development."""
mock_launcher.launch.return_value = 1
@@ -214,7 +262,7 @@ def test_gitea_push_with_adr_advances_stage(mock_launcher):
task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'push-001'").fetchone()
conn.close()
assert task["stage"] == "development"
mock_launcher.launch.assert_called_once()
mock_plane_notify.assert_called_once()
@patch("src.webhooks.gitea.check_ci_green")
@@ -252,6 +300,46 @@ def test_gitea_ci_success_advances_to_review(mock_launcher, mock_ci):
assert task["stage"] == "review"
@patch("src.webhooks.gitea.notify_qg_failure")
@patch("src.webhooks.gitea.launcher")
def test_gitea_ci_failure_on_development_notifies_qg_failure(mock_launcher, mock_notify):
"""BUG 6: CI failure at development is now the authoritative QG gate failing.
It must notify QG failure (not silently suppress) and must NOT advance the stage.
"""
conn = get_db()
conn.execute(
"INSERT INTO tasks (plane_id, work_item_id, repo, branch, stage) VALUES (?, ?, ?, ?, ?)",
("ci-fail-001", "ET-011", "enduro-trails", "feature/ET-011-test", "development"),
)
conn.commit()
conn.close()
resp = client.post(
"/webhook/gitea",
json={
"state": "failure",
"branches": [{"name": "feature/ET-011-test"}],
"repository": {"name": "enduro-trails"},
},
headers={"X-Gitea-Event": "status"},
)
assert resp.status_code == 200
# QG failure was reported for the development stage with check_ci_green.
assert mock_notify.called
args, kwargs = mock_notify.call_args
call = list(args) + list(kwargs.values())
assert "development" in call
assert "check_ci_green" in call
# Stage did NOT advance.
conn = get_db()
task = conn.execute("SELECT * FROM tasks WHERE plane_id = 'ci-fail-001'").fetchone()
conn.close()
assert task["stage"] == "development"
def test_gitea_webhook_pr():
"""PR event is accepted."""
resp = client.post(
@@ -281,3 +369,158 @@ def test_plane_webhook_event_logged():
conn.close()
assert event is not None
assert event["source"] == "plane"
# ---------------------------------------------------------------------------
# BUG 7: red CI on development must bounce the task back to the developer
# (capped retries, symmetric to review REQUEST_CHANGES). These are pure-logic
# tests: they invoke handle_ci_status() directly with mocked helpers so they do
# not pass through the TestClient HMAC barrier (baseline 401s are off-limits).
# ---------------------------------------------------------------------------
def _ci_failure_payload():
return {
"state": "failure",
"branches": [{"name": "feature/ET-011-test"}],
"repository": {"name": "enduro-trails"},
}
def _mock_db_with_retry_count(count):
"""Build a get_db() mock whose retry_count query returns `count`."""
conn = MagicMock()
conn.execute.return_value.fetchone.return_value = {"cnt": count}
return conn
@patch("src.webhooks.gitea.notify_error")
@patch("src.webhooks.gitea.notify_qg_failure")
@patch("src.webhooks.gitea.enqueue_job")
@patch("src.webhooks.gitea.update_task_stage")
@patch("src.webhooks.gitea.get_db")
@patch("src.webhooks.gitea.get_task_by_repo_branch")
@patch("src.webhooks.gitea.get_project_by_repo")
def test_ci_failure_development_retries_developer_under_limit(
mock_proj, mock_task, mock_get_db, mock_update_stage,
mock_enqueue, mock_qg, mock_err,
):
"""retry_count < MAX_DEV_RETRIES → relaunch developer, stage untouched."""
from src.webhooks.gitea import handle_ci_status
mock_proj.return_value = {"repo": "enduro-trails"}
mock_task.return_value = {
"id": 1, "stage": "development", "work_item_id": "ET-011",
}
mock_get_db.return_value = _mock_db_with_retry_count(0)
mock_enqueue.return_value = 42
asyncio.run(handle_ci_status(_ci_failure_payload()))
# QG failure was still reported (Slava sees both the failure and the retry).
assert mock_qg.called
# developer was re-enqueued.
assert mock_enqueue.called
assert mock_enqueue.call_args[0][0] == "developer"
# No escalation.
assert not mock_err.called
# Stage stays on development — no update_task_stage in the CI-failure path.
assert not mock_update_stage.called
@patch("src.webhooks.gitea.notify_error")
@patch("src.webhooks.gitea.notify_qg_failure")
@patch("src.webhooks.gitea.enqueue_job")
@patch("src.webhooks.gitea.update_task_stage")
@patch("src.webhooks.gitea.get_db")
@patch("src.webhooks.gitea.get_task_by_repo_branch")
@patch("src.webhooks.gitea.get_project_by_repo")
def test_ci_failure_development_escalates_at_limit(
mock_proj, mock_task, mock_get_db, mock_update_stage,
mock_enqueue, mock_qg, mock_err,
):
"""retry_count >= MAX_DEV_RETRIES → escalate via notify_error, no relaunch."""
from src.webhooks.gitea import handle_ci_status, MAX_DEV_RETRIES
mock_proj.return_value = {"repo": "enduro-trails"}
mock_task.return_value = {
"id": 1, "stage": "development", "work_item_id": "ET-011",
}
mock_get_db.return_value = _mock_db_with_retry_count(MAX_DEV_RETRIES)
asyncio.run(handle_ci_status(_ci_failure_payload()))
# QG failure still reported.
assert mock_qg.called
# developer NOT re-enqueued at the cap.
assert not mock_enqueue.called
# Escalation message mentions CI failure.
assert mock_err.called
err_msg = " ".join(str(a) for a in mock_err.call_args[0])
assert "Max developer retries" in err_msg
assert "after CI failure" in err_msg
# Stage untouched.
assert not mock_update_stage.called
# ---------------------------------------------------------------------------
# BUG 8 (second door): a merged-PR webhook must NOT fake-complete a task that is
# still in the deploy stage. On `deploy` done is gated by the deployer's verdict
# (check_deploy_status via advance_stage), not by the merge event. For every
# other stage the merge->done behaviour is preserved. Pure-logic tests: invoke
# handle_pr() directly with mocked helpers (no HMAC barrier).
# ---------------------------------------------------------------------------
def _merged_pr_payload(branch="feature/ET-012-x"):
return {
"action": "closed",
"pull_request": {
"merged": True,
"number": 7,
"head": {"ref": branch},
},
"repository": {"name": "enduro-trails"},
}
@patch("src.webhooks.gitea.notify_stage_change")
@patch("src.webhooks.gitea.update_task_stage")
@patch("src.webhooks.gitea.get_task_by_repo_branch")
@patch("src.webhooks.gitea.get_project_by_repo")
def test_merge_on_deploy_stage_does_not_set_done(
mock_proj, mock_task, mock_update_stage, mock_notify,
):
"""FIX 1: merge at deploy stage is ignored — done is gated by deployer verdict."""
from src.webhooks.gitea import handle_pr
mock_proj.return_value = {"repo": "enduro-trails"}
mock_task.return_value = {
"id": 1, "stage": "deploy", "work_item_id": "ET-012",
}
asyncio.run(handle_pr(_merged_pr_payload()))
# The merge-driven done path must NOT run on deploy.
assert not mock_update_stage.called
assert not mock_notify.called
@patch("src.webhooks.gitea.notify_stage_change")
@patch("src.webhooks.gitea.update_task_stage")
@patch("src.webhooks.gitea.get_task_by_repo_branch")
@patch("src.webhooks.gitea.get_project_by_repo")
def test_merge_on_non_deploy_stage_sets_done(
mock_proj, mock_task, mock_update_stage, mock_notify,
):
"""FIX 1: merge behaviour is preserved for non-deploy stages (e.g. review)."""
from src.webhooks.gitea import handle_pr
mock_proj.return_value = {"repo": "enduro-trails"}
mock_task.return_value = {
"id": 2, "stage": "review", "work_item_id": "ET-013",
}
asyncio.run(handle_pr(_merged_pr_payload(branch="feature/ET-013-x")))
# Non-deploy stages still get the merge-driven done.
mock_update_stage.assert_called_once_with(2, "done")
assert mock_notify.called