Commit Graph

54 Commits

Author SHA1 Message Date
f81715bd39 fix(infra): run orchestrator containers as host uid 1000:1000 (not root)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 12s
Both compose services (orchestrator, orchestrator-staging) now declare
user: "1000:1000" so pipeline artifacts (git worktree, docs/work-items
commits) are created as slin:slin on the host — git pull/reset under slin
no longer fail with permission errors. docker.sock access preserved via
group_add: ["999"]. SSH mount target aligned with the launcher-forced
HOME=/home/slin (/root/.ssh -> /home/slin/.ssh). launcher.py and Dockerfile
unchanged. INFRA.md and CHANGELOG.md updated; host-prerequisites (P-1..P-4)
documented.

Refs: ORCH-040

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 15:02:33 +00:00
05c17135c1 feat(notifications): add bump mode + russify Telegram live-tracker
All checks were successful
CI / test (push) Successful in 13s
CI / test (pull_request) Successful in 13s
ORCH-042: new ORCH_TRACKER_MODE (Settings.tracker_mode, default edit) selects
the live-tracker card behaviour. bump mode re-creates the card at the bottom of
the chat on every update (delete_telegram + send silently + repoint message_id),
keeping the "one card per task" invariant: <=1 new message per call, repoint
only on successful send, delete result never gates the send. New never-raising
delete_telegram helper. Anything != "bump" resolves to edit (zero regression).

Also russify/cosmetic-fix the card text (both modes): "Подтверждение BRD" label,
 after approve-gate, Russian stage labels, "📦 Внедрено". Docs updated in the
same PR (CHANGELOG, internals.md, .env.example).

Refs: ORCH-042

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 10:13:49 +00:00
28d019a1e2 fix(staging_check): B6 reads registry from running staging instance env
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 16s
B6 false-FAILed because it built the project registry from the
launcher process-env via a host-path hack (sys.path.insert +
importlib.reload), not from the running staging instance. Run from the
host, ORCH_PROJECTS_JSON is unset -> default ET+ORCH registry -> false
FAIL -> spurious deploy-staging -> development rollback.

Variant (v) per ADR-001: remove the host-path hack; canonically run the
suite INSIDE orchestrator-staging via docker exec so src.projects
resolves from /app (PYTHONPATH) with .env.staging. Verdict logic
extracted into pure _evaluate_b6(known) -> (passed, detail) +
_known_project_ids_from_registry() / _run_b6() with deterministic FAIL on
source unavailability. deployer.md and STAGING_CHECK.md updated to the
docker exec command. src/projects.py, .env* and checks A/B4/B5/C
untouched.

Refs: ORCH-048

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 07:03:31 +00:00
03c3d77cac feat(stage-engine): embed verbatim reviewer/tester findings in rollback task_desc
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 11s
При заворотах на development task_desc теперь несёт дословный must-fix текст
(P0/P1 ревьюера, причина FAIL тестера) вместо одной ссылки на файл — developer-
агент видит суть претензий сразу и не повторяет ту же ошибку, экономя retry-
бюджет и токены общего инстанса.

- Новый defensive-модуль src/review_parse.py (never-raise): extract_review_findings
  (P0/P1 из 12-review.md ## Findings), extract_test_failures (фрагмент тела
  13-test-report.md: pytest output / FAIL-строки / Итог), усечение по лимиту.
- Две rollback-ветки stage_engine: встраивают текст + сохраняют ссылку на полный
  файл; graceful-фоллбэк на ссылку-строку при битом/пустом артефакте.
- Последовательность отката, retry-счётчик, поля AdvanceResult, реестр QG_CHECKS
  не менялись.
- Доки: README (Stage Engine / Откаты), CHANGELOG.
- Тесты: tests/test_review_parse.py, test_stage_engine.py::TestRollbackTaskDescEmbedding.

Refs: ORCH-046

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 04:42:11 +00:00
51a76e8169 fix(qg): read result: alongside verdict:/status: in tests gate
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 11s
_parse_tests_verdict now accepts three equal-rank machine-readable
frontmatter fields in 13-test-report.md — result: (canonical tester
output), verdict: and status: (legacy/enduro-trails). Any one non-empty
field suffices; a negative token in any field stays authoritative.

Fixes the producer/consumer contract mismatch where the tester emits
`result: PASS` (per .openclaw/agents/tester.md) but the gate only read
verdict:/status:, causing a testing->development rollback loop until
MAX_DEVELOPER_RETRIES (observed on ORCH-17). Token sets frozen and gate
signature/QG_CHECKS unchanged for full backward compatibility.

Refs: ORCH-047
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 21:03:32 +00:00
stream
0eff781d13 feat(qg): ORCH-045 — poll check_ci_green with retry to fix CI race (pending->success)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 12s
2026-06-05 19:59:06 +00:00
stream
d615747d53 revert(ORCH-017): drop shared check_tests_passed gate change — moved to ORCH-47 (own ADR); keep only approve-ping links
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 11s
2026-06-05 19:28:27 +00:00
e62d51aa77 fix(qg): testing gate reads documented tester result: frontmatter key (ORCH-017)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 11s
check_tests_passed/_parse_tests_verdict gated the testing -> deploy-staging
transition on `verdict:`/`status:` in 13-test-report.md, but the tester agent
prompt (.openclaw/agents/tester*) documents `result: PASS | FAIL` as THE
machine-readable field. A report that followed the contract literally
(ORCH-017: only `result: PASS`, no verdict:/status:) was bounced back to
development with a misleading "Tests FAILED". ORCH-016 only passed because its
report redundantly carried both `verdict:` and `result:`.

Treat `result:` as a first-class machine field alongside verdict/status; a
negative token in any field stays authoritative (ET-013 contract preserved).
Self-hosting QG fix: unblocks every project whose tester emits only `result:`.

Docs updated in-PR: CHANGELOG, architecture README machine-keys note.
Tests: test_qg.py::TestCheckTestsPassed::test_result_pass_only_passes / _fail_only_fails.

Refs: ORCH-017
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 18:34:25 +00:00
69a4aaab99 feat(notifications): direct BRD + Plane links in approve ping (ORCH-017)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 12s
notify_approve_requested now embeds two HTML <a> links into the single
notifying approve-gate message: a Gitea branch-view link to 01-brd.md and a
Plane issue browser link. Adds ORCH_PLANE_WEB_URL (external Plane web URL,
fallback to plane_api_url) with a loopback-guard that omits the Plane link
when the resolved base is localhost/empty (no broken localhost URLs in prod).

Each link is built independently and omitted on missing data; the message and
the "flip to Approved" call to action are always sent as exactly one ping. The
shared send_telegram helper is left untouched (min blast radius for the
self-hosting prod container). Dynamic labels are html.escaped; parse_mode=HTML
preserved. QG registry / stages / approve handler unchanged.

Docs updated in-PR: CHANGELOG, .env.example, INFRA env map.
Tests: test_notify_approve_links.py, test_analysis_approve_flow_links.py.

Refs: ORCH-017
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-05 17:58:00 +00:00
401bf66fe0 feat(agents): configurable LLM model + effort per-agent and per-project (ORCH-41) (#36)
Some checks failed
CI / test (push) Has been cancelled
2026-06-05 19:45:19 +03:00
8da571de86 feat(plane): unified status-comment format with duration line (ORCH-016) (#34) 2026-06-05 17:50:47 +03:00
f375be249f fix(tests): per-project Plane states in webhook tests + close CI hole (ORCH-39) (#35) 2026-06-05 17:36:40 +03:00
Dev Agent
00325bcab0 fix(plane): resolve issue states per-project instead of hardcoded enduro UUIDs (ORCH-10)
All checks were successful
CI / test (push) Successful in 12s
CI / test (pull_request) Successful in 10s
ORCH-10 root cause: PLANE_STATES was a global dict hardcoding enduro-trails
UUIDs. The webhook comparison  only
matched ET UUID (b873d9eb) and silently ignored the ORCH in_progress UUID
(e331bfb3), blocking pipeline start for all orchestrator-project tasks.

Changes:
- src/plane_sync.py:
  * Rename PLANE_STATES -> _DEFAULT_STATES (enduro UUIDs kept as safe fallback).
  * PLANE_STATES preserved as alias to _DEFAULT_STATES (backward compat).
  * Add get_project_states(project_id) -> {logical_key: state_uuid}:
    fetches Plane API GET /projects/<id>/states/, maps by state name,
    caches per project_id, falls back to _DEFAULT_STATES on API failure.
  * Add _STATES_CACHE: dict, reload_project_states(project_id=None).
  * Add _PLANE_NAME_TO_KEY mapping and _STAGE_TO_STATE_KEY for clean lookup.
  * Add stage_to_state(stage, project_id) using get_project_states().
  * update_issue_state() uses stage_to_state() instead of STAGE_TO_STATE dict.
  * set_issue_{needs_input,in_review,blocked,done,in_progress,stage_state}()
    all resolve state UUID via get_project_states(project_id) instead of
    the global PLANE_STATES dict.

- src/webhooks/plane.py:
  * handle_issue_updated: import get_project_states, resolve proj_states per
    incoming project_id, compare new_state against proj_states["in_progress"],
    proj_states["approved"], proj_states["rejected"].
  * start_pipeline QG-0 blocked path: use get_project_states(plane_project_id)
    instead of PLANE_STATES["blocked"].

- tests/test_orch10_states.py: 23 new tests covering:
  * get_project_states returns correct UUIDs for both ET and ORCH projects.
  * API failure / empty response / None project_id -> _DEFAULT_STATES fallback.
  * Caching and reload_project_states (per-project and full flush).
  * stage_to_state() per-project resolution.
  * Webhook in_progress triggers pipeline for BOTH b873d9eb (ET) and e331bfb3 (ORCH).
  * Webhook approved/rejected routes correctly per project.
  * PLANE_STATES alias and _DEFAULT_STATES backward compat.
2026-06-05 14:23:31 +03:00
Dev Agent
e0c14fae5f fix(pipeline): make deploy-staging gate conditional on self-hosting repo (ORCH-35)
All checks were successful
CI / test (push) Successful in 10s
CI / test (pull_request) Successful in 10s
2026-06-05 10:36:46 +03:00
Dev Agent
e0b6e92b09 feat(pipeline): add deploy-staging gate before prod deploy (ORCH-35)
All checks were successful
CI / test (push) Successful in 9s
CI / test (pull_request) Successful in 9s
2026-06-05 10:06:06 +03:00
Dev Agent
1baae81165 test: reset webhook secret per-test to fix cross-file isolation (CI green)
All checks were successful
CI / test (push) Successful in 10s
CI / test (pull_request) Successful in 10s
Adds autouse fixture _reset_webhook_secrets to tests/conftest.py that
resets the process-wide Pydantic settings singleton before every test:

1. gitea_webhook_secret / plane_webhook_secret → "" (HMAC disabled by
   default). Tests that deliberately test the 401 path
   (test_webhook_dedup.py:268,278) override this with their own monkeypatch
   which runs after autouse fixtures and wins for that test only.

2. db_path → os.environ["ORCH_DB_PATH"] (last written value after all test
   modules are imported). Without this, test_webhook_dedup.py (imported
   first alphabetically) seeds settings.db_path = dedup.db, while
   test_webhooks.py setup_db tries to remove test_orchestrator.db — leaving
   the DB dirty between tests that share a branch name and causing
   get_task_by_repo_branch() to return a stale row with the wrong stage.
   Per-test monkeypatches in test_webhook_dedup.setup_db still override it.

Root cause: both leaks come from the same singleton settings being read once
at import, before any per-test isolation runs. The autouse fixture is the
correct per-test reset point for process-wide singletons.

Result: pytest tests/ → 294 passed, 0 failed (was 10 failed/284 passed).
2026-06-05 00:00:01 +03:00
Dev Agent
e856e0940b test: migrate sequential_ids test to In Progress contract
Some checks failed
CI / test (push) Failing after 9s
CI / test (pull_request) Failing after 9s
2026-06-04 22:38:09 +03:00
Dev Agent
7bbab9c38b test: isolate webhook tests from live Plane API (fix CI)
Some checks failed
CI / test (push) Failing after 9s
CI / test (pull_request) Failing after 9s
2026-06-04 22:15:40 +03:00
dev-agent
757745a221 fix(qg): gate testing->deploy on machine-readable test verdict, not substring (ET-013)
check_tests_passed did "if PASS in content" over the whole 13-test-report.md
body, so a report explicitly marked verdict: BLOCKED / status: blocked whose
prose mentioned "23 passed" / "PASS" / "All checks passed" passed the gate.
On ET-013 an unfinished feature (P1 AC-19 failed) reached Done.

Now mirrors check_reviewer_verdict (S-5) and check_deploy_status: read ONLY the
YAML frontmatter verdict:/status: fields. Positive tokens (PASS/PASSED/
READY-TO-DEPLOY/GREEN/APPROVED) -> True; negative tokens (BLOCKED/FAILED/...) are
authoritative -> False; missing/empty/no-frontmatter/bad-YAML -> False with reason;
file missing -> not found. Never raises.

Positive token set derived from REAL enduro-trails reports ET-001..ET-014
(inconsistent: PASS, ready-to-deploy+status:PASSED, stage:ready-to-deploy+status:pass,
PASS — ready-to-deploy). Validated: all 9 prior passing WIs stay True, ET-013 -> False.
2026-06-04 16:05:52 +03:00
dev-agent
4e4cc6c724 fix(qg): find 14-deploy-log.md in origin/main when absent in feature worktree
ET-013: deployer writes 14-deploy-log.md and merges deploy artifacts into
main via a separate PR, so the log lands in origin/main, not the feature
branch worktree that check_deploy_status reads via _repo_path(repo, branch).
Result: every successful deploy was falsely failed (Deploy log not found)
and rolled back deploy->development.

Fix: when the log is absent in the worktree, fall back to reading it from
origin/main on the shared clone (git fetch origin main + git show
origin/main:docs/work-items/<WI>/14-deploy-log.md). Lookup order:
worktree -> origin/main -> not found. Fetch/show failures degrade to
not found (never raise). Does not touch the merge-gate in gitea.py.

Tests: origin/main SUCCESS->PASS (ET-013 case), origin/main FAILED->FAILED,
absent everywhere->not found, fetch failure->degrades no exception,
worktree log short-circuits main lookup.
2026-06-04 13:35:35 +03:00
dev-bot
ec9aa74492 fix(tracker): no duplicate Telegram messages on not-modified/transient edits
edit_telegram now returns a distinguishable outcome (ok|not_modified|gone|
failed) instead of a bare bool. update_task_tracker only sends a NEW message
when the original is truly gone; not_modified and transient failures no longer
spawn duplicate trackers or orphan the live one.

render_task_tracker shows "попытка N" on an actively re-run stage (>=2 agent
runs) so the text changes between review<->development cycles. Finished ()
lines are unchanged.

Tests: edit_telegram classification (ok/not_modified/gone/failed via mocked
httpx), update_task_tracker (not_modified/failed -> no send, gone -> send+id),
render attempt marker.
2026-06-04 13:20:40 +03:00
dev-bot
9a0298de9d feat(telegram): live editable task tracker (Variant B+), replace 15-message spam
Replace the ~15 separate Telegram messages per task (agent start/finish, stage
transition, QG-pending, tech noise) with ONE live tracker message edited in
place (editMessageText) on every stage transition. Only attention-worthy events
are still sent as SEPARATE, notifying messages: approve-gate, deploy-fail,
agent-fail, task error.

- db.py: idempotent ALTERs — tasks.tracker_message_id, tasks.title,
  tasks.brd_review_started_at/ended_at, agent_runs.model. Helpers for
  tracker message_id + BRD-review clock.
- usage.py: short_model_name() (strip provider/claude- prefix); parse model
  from result-JSON modelUsage; record_usage persists model.
- notifications.py: render_task_tracker(task_id) (stateless render from
  agent_runs), update_task_tracker (sendMessage->store id->editMessageText with
  fallback to a new message, silent), edit_telegram(). Per-stage line
  in↓/out↑·cost·model, ⏸️ Ревью БРД (human time), 💰 totals, finish block
  (⏱️ wall/agents/yours, 🔗 PR · 📦). notify_* are now tracker-only/log-only
  except the four alerts.
- stage_engine.py: stamp brd_review_ended on analysis->architecture advance.
- webhooks/plane.py: persist task title on creation.
- tests/test_telegram_tracker.py: render, short_model_name, send/edit/fallback,
  separate-vs-silent alert behavior.
2026-06-04 11:42:46 +03:00
Dev Agent
61e26a8930 fix(observability): merge-gate on deploy, full token input, Plane Done, artifact links
1. BUG 8 (second door): merge webhook no longer fake-completes a task at the
   deploy stage; done is gated by the deployer verdict (check_deploy_status).
   Other stages keep merge->done.
2. Token accounting: parse+persist cache_creation_input_tokens (new
   idempotent agent_runs column). usage_comment / task_summary now show the
   FULL input (input + cache_read + cache_creation) with a cached breakdown.
   cost_usd untouched.
3. deploy->done success now forces the Plane issue to terminal Done state.
4. All agents (architect/developer/reviewer/tester/deployer) attach artifact
   links to their finish comment via gitea_public_url.

Tests added for each fix; pytest 244 passed / 9 failed (off-limits HMAC group).
2026-06-04 11:17:58 +03:00
dev-agent
e4a9c48395 fix(deploy): gate deploy->done on deployer verdict, not LLM exit code 2026-06-04 02:43:01 +03:00
Dev Agent
3a285de11d fix(ci): bounce task back to developer on red CI (capped retries) 2026-06-04 01:39:40 +03:00
Dev Agent
e15d339b14 fix(qg): use check_ci_green instead of local tests on development stage 2026-06-04 01:22:43 +03:00
orchestrator-dev
90c9ffe839 fix(qg): run pytest directly instead of make in check_tests_local 2026-06-04 00:43:04 +03:00
Dev Agent
0b8013cb06 fix(stage): approved verdict advances analysis->architecture instead of re-running gate 2026-06-03 23:30:08 +03:00
Dev Agent
ca63bc26bb feat(config): external gitea_public_url for clickable doc links 2026-06-03 22:58:18 +03:00
dev-agent
a9cdb17614 feat(plane): analyst comment asks for Approved status + links docs
The analyst ready-comment used the obsolete :approved: wording (comment-based approve was removed in PR #12). Rewrite it for the status-only model: ask the stakeholder to move the issue to Approved (reject = reason comment + Rejected), and add clickable Gitea links to the analyst docs that actually exist in the worktree.
2026-06-03 22:42:53 +03:00
dev-agent
96c5e6b2f9 fix(pipeline): fetch issue name from Plane API on status-trigger start
issue.updated ships only the changed fields, so name was absent and the branch slug became feature/<id>-untitled. Add fetch_issue_fields (single issue-detail GET returning name+description, reusing the endpoint/token of fetch_issue_description) and pull the name above the slug build. Empty name still falls back to untitled.
2026-06-03 22:42:53 +03:00
Dev Agent
857bad314c feat(webhook): pull reject reason from latest comment
handle_verdict(rejected): the reason is now pulled from the issue latest Plane
comment (_latest_comment_reason: GET comments, newest by created_at, HTML
stripped) instead of a fixed stub. Slava writes the reason in a comment before
flipping the status to Rejected. Falls back to a fixed note when there is no
comment / the API call fails.

tests: add test_status_only_verdict.py (test_inreview_comment_does_not_revert
[bug 3 root], test_any_comment_no_pipeline_action,
test_approved_status_advances_without_inprogress_reset,
test_rejected_status_pulls_reason_from_comment) and
test_inprogress_from_needs_input_relaunches_analyst in test_status_trigger.py.
Rewrote the comment-based tests (test_verdict_status, test_plane_approved/
rejected in test_webhooks) under the status-only model: comments are no-ops,
verdicts come from status changes.
2026-06-03 22:18:24 +03:00
Dev Agent
c69e11348b test(pipeline): cover status-start description fetch and work_item_id uniqueness
- test_status_start_fetches_description: empty payload description -> pulled from
  Plane API (mocked) -> QG-0 passes, analyst enqueued.
- test_status_start_empty_api_still_blocks: empty API -> honest QG-0 fail.
- test_work_item_id_uniqueness: ET-006 taken -> next free id, per-repo isolation.
- test_collision_reassigns_in_start_pipeline: end-to-end collision reassignment.
- test_worktree_per_task: two tasks never share a worktree path.
2026-06-03 21:12:59 +03:00
Dev Agent
7fd6529a35 test(conftest): mute Telegram in all tests to stop prod leakage
A pytest run on prod was sending REAL Telegram messages to Slava: some tests
(e.g. test_webhook_dedup advancing a stage) reach notify_stage_change ->
send_telegram, which read the live .env token/chat_id and actually POSTed.

Add an autouse fixture stubbing send_telegram to a no-op for every test. Patch
the SOURCE src.notifications.send_telegram (covers all notify_* helpers and the
many modules that do a local from .notifications import send_telegram inside
functions) AND src.stage_engine.send_telegram (module-level binding, would not be
intercepted by the source patch alone). webhooks/plane, launcher, queue_worker are
patched defensively with raising=False.

Verified: full suite run with FAKE telegram creds + an un-swallowable httpx.post
trip-wire (BaseException, so send_telegram except Exception can not hide it) shows
ZERO calls to api.telegram.org. Without the fixture the trip-wire fires, proving
the guard is real.
2026-06-03 18:23:09 +03:00
Dev Agent
9a702a0216 feat(metrics): per-agent token/cost accounting
Feature 4. claude is now launched with --output-format json; the run-log trailing
result JSON is parsed (defensively, never fatal) for usage + total_cost_usd. New
idempotent ALTERs add input_tokens/output_tokens/cache_read_tokens/cost_usd to
agent_runs; the launcher monitor records usage per run, posts a per-agent finish
comment under that agent bot (e.g. Developer gotov · 45.2k in / 12.1k out · $0.21),
and the deployer posts an end-of-task summary (SUM over agent_runs GROUP BY agent)
on done. New src/usage.py holds parse/format/record/summary helpers; test_usage.py
covers parsing a real CLI JSON blob, NULL-on-garbage, recording, formatting, and the
per-task aggregate.
2026-06-03 18:18:46 +03:00
Dev Agent
38a741d24e feat(webhook): verdict via Approved/Rejected statuses (variant B)
Feature 2. The issue updated dispatch (shipped with the status-trigger handler)
also routes Approved -> _try_advance_stage (== :approved: comment) and Rejected ->
_rollback_stage (== :rejected: comment). The :rejected: comment branch was
refactored into the shared _rollback_stage so both mechanisms behave identically;
a status reject passes Reason: (rejected via status, see latest comment) since no
inline reason arrives with a status change. Comments stay fully working. This
commit adds test_verdict_status.py proving both status and comment paths funnel
into the same advance/rollback logic.
2026-06-03 18:18:36 +03:00
Dev Agent
09b1c5e1b9 feat(webhook): start pipeline on In Progress status (not on create)
Feature 1. work_item.created no longer starts the pipeline (soft QG-0 log only);
the issue stays in the backlog until moved to In Progress. The pipeline-start body
is extracted into start_pipeline(); a new issue updated handler routes a state
change to In Progress -> handle_status_start, which is idempotent: an existing task
for the plane_id is NOT re-created or restarted (protects handle_comment, which also
flips issues to In Progress). Real Plane payload: event=issue, action=updated,
data.state.id. Existing m6/plane_webhook/dedup tests updated to drive the new
trigger; new test_status_trigger.py covers created-no-op / start / idempotent.
2026-06-03 18:18:26 +03:00
Dev Agent
a4668c0303 feat(plane): stage visibility on board + verdict status UUIDs
Feature 3 + Feature 2 infra. Extend the global PLANE_STATES with the 6 new
enduro status UUIDs (architecture/development/review/testing + approved/rejected),
remap STAGE_TO_STATE so the 4 mid-pipeline stages move the issue across its own
board column instead of all sitting in In Progress, and add the
set_issue_stage_state() helper. Needs Input / In Review / Blocked keep their own
explicit setters and stay higher priority. TODO(ORCH-10): statuses are per-project;
resolve per project when more projects are onboarded.
2026-06-03 18:18:17 +03:00
Dev Agent
d305521067 feat(plane): per-agent bot authorship for comments
add_comment now accepts an optional author (agent role) and POSTs under the matching Plane bot token via _headers_for(), so Plane shows the real author (Analyst/Architect/Developer/Reviewer/Tester/Deployer/Stream) instead of a single shared account. Unknown/empty roles or missing tokens fall back to the shared orchestrator token (autonomy preserved). GET/PATCH (find_issue_id, set_state) are unchanged and stay on the shared token. Call sites in stage_engine, launcher, webhooks/plane and the plane_sync notify helpers now pass author by stage role; stage transitions use stream. Adds tests/test_plane_author.py.
2026-06-03 10:53:25 +03:00
Dev Agent
1d978caea7 feat(webhook): derive work_item_id from Plane sequence_id (M-6) 2026-06-03 10:02:15 +03:00
Dev Agent
0653c2437f feat(launcher): prune old run logs (L-2) 2026-06-03 09:53:55 +03:00
Dev Agent
4ac449ff63 test(webhook): cover delivery dedup + migration safety (M-7) 2026-06-03 09:18:02 +03:00
Dev Agent
6abdc220d2 test(stage): cover unified stage_engine + launcher/plane delegation
18 tests: happy-path advance per stage with correct agent (ORCH-4 fix),
QG-fail no-advance, reviewer REQUEST_CHANGES rollback+retry/alert, tester FAIL
rollback+retry/block, architect conflict rollback to analysis, analyst
approved-flow no-advance, and launcher+plane both delegating to the engine.
2026-06-03 08:56:25 +03:00
Dev Agent
c167c6930d test(launcher): watchdog graceful kill ordering + timeout config + M-4 removal
Cover M-2: SIGTERM-before-SIGKILL ordering, graceful exit within grace skips
SIGKILL, ProcessLookupError before SIGTERM is tolerated (no _record_kill), and
_resolve_timeout per-agent override / default / malformed-JSON fallback.
Cover M-4: _auto_merge_pr removed, _ensure_pr retained.
2026-06-03 08:28:09 +03:00
Dev Agent
a613fd8180 test(resilience): 34 tests for preflight/classifier/backoff/breaker (ORCH-1)
Covers preflight FAIL->queued + cache, transient/permanent classifier +
Retry-After, exp backoff + available_at gating, launcher transient vs permanent
finalize, circuit breaker open/half-open/closed. test_queue worker tests stub
preflight OK. Popen never spawned.
2026-06-03 00:12:17 +03:00
Dev Agent
2283b8898b test(queue): 19 tests for job queue lifecycle/atomicity/retry/worker (ORCH-1)
Covers enqueue->claim->mark, atomic claim (no double dispatch, 8-thread race),
retry fail->queued->failed, requeue_running_jobs, observability, worker
max_concurrency. Popen fully mocked (no real agent spawned).
2026-06-02 23:58:44 +03:00
Dev Agent
c1f35a2047 test(projects,webhook): cover registry resolvers + project filter
ORCH-6: test_projects.py covers resolvers and ORCH_PROJECTS_JSON parsing
(valid/malformed/fallback). test_plane_webhook.py covers the webhook
project filter via TestClient (unknown->ignored, orchestrator->orchestrator
repo, enduro->enduro-trails, independent ORCH/ET prefixes); launcher
mocked. test_webhooks.py: register proj-1 so existing ET fixtures pass.
2026-06-02 22:30:51 +03:00
Dev Agent
1ebe8afc23 feat(worktree): git worktree per task to isolate shared /repos (ORCH-2 / S-4)
- add src/git_worktree.py: ensure/remove/get_worktree_path
- config: worktrees_dir=/repos/_wt
- launcher: agent runs in per-branch worktree; task-file + commit/push in worktree; no shared checkout
- qg/checks: read artifacts + run make test from worktree (branch arg, backward-compatible)
- webhooks/plane: pass branch into QG dispatch; review fallback from worktree
- webhooks/gitea: keep read-only branch --contains in main clone (documented)
- tests: test_git_worktree.py (isolation) + update test_launcher write-task-file
- docs: ARCHITECTURE worktree section + BUGFIXES_2026-06-02_ORCH2

Preserves B-1/B-2/S-1/S-5 fixes (paths now point at worktree).
2026-06-02 21:12:06 +03:00
Dev Agent
67b9f814b5 test(launcher): cover _write_task_file and reviewer verdict parsing (L-5) 2026-06-02 20:12:29 +03:00
51f7364532 feat: integrate Analyst into Plane/Orchestrator pipeline
- Add git fetch+checkout in agent launch cmd (ensures correct branch)
- Add git fetch+checkout in _monitor_agent before commit/push
- Post start comment in Plane when analyst launches
- Post :approved: request comment after analyst completes successfully
- Branch lookup moved before cmd construction for reuse
2026-05-31 20:15:01 +03:00