orchestrator

Author	SHA1	Message	Date
claude-bot	a46dcbcab3	fix(deploy): terminal-window-aware guard so done tasks hold Done in Plane (ORCH-094) A DB stage=done task with 0 active jobs flapped in Plane between `Awaiting Deploy` and `Monitoring after Deploy` instead of holding `Done` (verified live on ORCH-061, task 47): the three deploy-phase setters were terminal-blind, so any stale/duplicate/unknown caller under the bot token re-stamped an intermediate status over the terminal Done, forever. - New leaf src/deploy_status_guard.py (pure, never-raise, config-gated): decide() -> ALLOW | CONVERGE_DONE | SUPPRESS on the entry of set_issue_awaiting_deploy / set_issue_deploying / set_issue_monitoring. A deploy-phase status is legitimate iff the task is non-terminal OR (done AND post-deploy window active); otherwise done converges to Done idempotently, cancelled is suppressed (FR-2, D1/D2). - D3: move post_deploy.arm_monitor ABOVE the terminal-sync block in advance_stage so window_active is True when the legitimate first Monitoring is set (the task is already DB-done by then); a re-drive after the window closes converges to Done. - D4: run_post_deploy_monitor no-ops without a status PATCH / re-queue when the task became cancelled mid-window (zombie-tick guard, FR-3). - D5: additive `reason` kwarg on the three setters + one structured log line per verdict (work_item/caller/target/db_stage/window_active/verdict); new read-only db.get_task_by_work_item_id; post_deploy.window_active helper. - Flags deploy_status_guard_enabled (kill-switch -> 1:1) / deploy_status_guard_repos (CSV; empty = self-hosting only). STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema untouched (reads existing tasks.stage). Tests: TC-01..TC-12 across 5 new test modules + config flags; updated the reason-kwarg assertions in test_deploy_terminal_sync / test_deploy_approve. Full regress green (1413). Docs: CHANGELOG, CLAUDE.md, docs/architecture/README.md (status -> implemented), .env.example. Refs: ORCH-094 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 23:41:24 +03:00
claude-bot	db4dd275e4	architect(ET): auto-commit from architect run_id=520	2026-06-09 23:41:24 +03:00
claude-bot	8959e0e3f4	architect(ET): auto-commit from architect run_id=519	2026-06-09 23:41:24 +03:00
claude-bot	f36528705e	analyst(ET): auto-commit from analyst run_id=518	2026-06-09 23:41:24 +03:00
Slava	5e01df00eb	docs: init ORCH-094 business request	2026-06-09 23:41:24 +03:00
Slava	fcb40eb4bb	Merge pull request 'docs(ORCH-094): staging gate log — SUCCESS' (#106) from docs/ORCH-094-staging-log into main	2026-06-09 23:40:59 +03:00
claude-bot	b86fc9043f	docs(ORCH-094): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 23:40:46 +03:00
Slava	fbedd0485b	Merge pull request 'fix(merge_gate): retry transient Gitea merge errors (405/5xx) + already-in-main guard (ORCH-093)' (#104) from feature/ORCH-093-bug-merge-gitea-405-5xx-hold-p into main	2026-06-09 22:51:43 +03:00
deploy-finalizer	f9ce5ca1b8	deploy(ORCH-036): finalize SUCCESS for ORCH-093	2026-06-09 22:51:42 +03:00
claude-bot	7863932012	tester(ET): auto-commit from tester run_id=516	2026-06-09 22:47:20 +03:00
claude-bot	74418893d7	reviewer(ET): auto-commit from reviewer run_id=515	2026-06-09 22:47:20 +03:00
claude-bot	0b25fc1527	fix(merge_gate): retry transient Gitea merge errors + already-in-main guard merge_pr now wraps ONLY the mutating POST /pulls/{n}/merge in a bounded exponential-backoff retry-loop on TRANSIENT outcomes (405 "try again later", 408, any 5xx, network/timeout, and 409|422 while the PR is still mergeable); TERMINAL outcomes (403/404/real conflict via mergeable==False) -> fast honest False, so the ORCH-071/081 not-merged HOLD backstop is unchanged. Fixes the ORCH-063 false HOLD + manual re-merge on Gitea's post-push mergeability hiccup. ensure_open_pr gains an "already fully in main" guard (_branch_fully_in_main, git merge-base --is-ancestor HEAD origin/main) BEFORE creating a PR -> new "already-in-main" outcome avoids the garbage empty PR on a re-driven finalizer; _handle_merge_verify skips merge_pr on that outcome and lets the authoritative SHA-in-main check confirm -> done (not a HOLD). git error of the guard fails OPEN to the create path. New ORCH_MERGE_RETRY_* settings (kill-switch merge_retry_enabled -> one-shot, max_attempts=3, backoff base=2/max=5). INV-4 (merge only via Gitea PR-merge API, never push/force-push main), never-raise, STAGE_TRANSITIONS/QG_CHECKS/DB schema unchanged. Docs (README merge-verify section, CLAUDE.md, CHANGELOG, .env.example) updated in the same PR. Tests: test_merge_gate.py TC-01..12, test_config.py TC-13, test_merge_verify.py TC-14..16; full suite green (1389). Refs: ORCH-093 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 22:47:20 +03:00
claude-bot	3d0f51512b	architect(ET): auto-commit from architect run_id=512	2026-06-09 22:47:20 +03:00
claude-bot	520373a694	analyst(ET): auto-commit from analyst run_id=511	2026-06-09 22:47:20 +03:00
Slava	cf0a72a46b	docs: init ORCH-093 business request	2026-06-09 22:47:20 +03:00
claude-bot	1a52fcba9e	docs(ORCH-093): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 22:46:56 +03:00
Slava	33b7fd57ff	Merge pull request 'fix(notifications): tracker card — status completeness, rollback reflection, stage-metric summation (ORCH-091)' (#102) from feature/ORCH-091-bug-to-analyse-stage-deploy-st into main	2026-06-09 22:13:09 +03:00
deploy-finalizer	6feae55a4b	deploy(ORCH-036): finalize SUCCESS for ORCH-091	2026-06-09 22:13:08 +03:00
claude-bot	86b013c872	tester(ET): auto-commit from tester run_id=509	2026-06-09 22:08:52 +03:00
claude-bot	3d6e957cae	reviewer(ET): auto-commit from reviewer run_id=508	2026-06-09 22:08:52 +03:00
claude-bot	328ae78da3	fix(notifications): tracker card — status-map completeness, rollback reflection, stage-metric summation (ORCH-091) Three verified live-card defects in src/notifications.py (ORCH-067/087), all additive and indication-only (STAGE_TRANSITIONS / QG_CHECKS / check_* / transport / DB schema untouched; never-raise; revert = git revert): - Defect 1 (D1): _STAGE_STATUS_LABEL covered 8 of 10 STAGE_TRANSITIONS keys — deploy-staging and cancelled (ORCH-090) fell back to the misleading "To Analyse". Added deploy-staging→"Deploying (staging)", cancelled→"Cancelled"; replaced the runtime fallback for an UNMAPPED stage with a neutral capitalized label (_neutral_stage_label). created stays an explicit "To Analyse"; broken/None input degrades safely. Map completeness is asserted programmatically from STAGE_TRANSITIONS.keys() (single source of truth), not a static list. - Defect 2 (D2): the stage-row loop drew ✅ for any stage with a finished agent run regardless of position — after a rollback the card showed the absurd "✅ Deployment + 🔄 Development". Added read-only _pipeline_pos from the STAGE_TRANSITIONS order and a suppression gate (✅ only when current_pos >= _pipeline_pos(stage_key)); deploy-staging→deploy normalization applied ONLY to the current position; is_active_stage untouched. - Defect 3 (D3): _stage_line took only the LAST run (ORCH-069: developer 3 runs Σ $3.98 rendered ~$0.00). It now aggregates ALL of the agent's runs with the same per-run formulas as the task totals → strict convergence with SUM(agent_runs) by task_id; model/effort/attempt come from the last run. Tests: test_tracker_status_line.py (ORCH-091 TC-01..TC-03 + updated tc06); new test_tracker_rollback_metrics.py (TC-05..TC-08). Full suite green (1370). Docs: CHANGELOG + internals.md (architecture README already updated by architect). Refs: ORCH-091 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 22:08:52 +03:00
claude-bot	c0f2d917bf	architect(ET): auto-commit from architect run_id=506	2026-06-09 22:08:52 +03:00
claude-bot	53022d20f4	architect(ET): auto-commit from architect run_id=505	2026-06-09 22:08:52 +03:00
claude-bot	852da919b9	analyst(ET): auto-commit from analyst run_id=504	2026-06-09 22:08:52 +03:00
Slava	67f7a3abfa	docs: init ORCH-091 business request	2026-06-09 22:08:52 +03:00
Slava	a994b25146	Merge pull request 'docs(ORCH-091): staging gate log — SUCCESS' (#103) from docs/ORCH-091-staging-log into main	2026-06-09 22:08:27 +03:00
claude-bot	36cd6e887b	docs(ORCH-091): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 22:08:14 +03:00
Slava	3b64cddd32	Merge pull request 'feat(cancel): STOP-status task cancellation + relaunch-hole close (ORCH-090)' (#101) from feature/ORCH-090-stop-plane into main	2026-06-09 21:36:12 +03:00
deploy-finalizer	08e6bfc3d5	deploy(ORCH-036): finalize SUCCESS for ORCH-090	2026-06-09 21:36:11 +03:00
claude-bot	5ca9b8fd62	tester(ET): auto-commit from tester run_id=502	2026-06-09 21:31:56 +03:00
claude-bot	07190f69f5	reviewer(ET): auto-commit from reviewer run_id=501	2026-06-09 21:31:56 +03:00
claude-bot	aae65969d5	fix(cancel): narrow STOP critical-window so deploy-park cancel applies (ORCH-090) Review P1: a STOP while a self-hosting task is PARKED on `deploy` awaiting the manual `Confirm Deploy` was classified as a critical merge/deploy window solely because the task still held the per-repo merge-lease (held from merge-gate through deploy->done). That window is fully reversible — nothing is merged or deployed yet (the irreversible merge_pr runs later in _handle_merge_verify, always under an INITIATED marker). So the cancel was DEFERRED to run_deploy_finalizer, which only runs after Phase B (Confirm Deploy) — the very step the operator pressed STOP to avoid. Result: the deferred cancel was never applied, the task wedged non-terminal holding the lease, blocking the repo's serial-gate (ORCH-088) and merges. Fix: gate the merge-lease branch of cancel.in_critical_window on an actively RUNNING actor (_task_has_running_actor). Lease held + running deploy/merge job -> still deferred (genuine in-flight step). Lease held + no running actor (idle deploy parking) -> NOT critical -> immediate full reset, which itself releases the lease (step 3c) and drives the task terminal. INITIATED-marker deferral unchanged. Also fixes review P2 (AC-6): set_task_cancel_requested now returns the first-stamp fact (rowcount), and the deferred branch only notifies on the first transition — a repeated STOP while still deferred no longer spams duplicate notifications. Tests: test_d7_lease_held_idle_parking_is_not_critical, test_d7_lease_held_with_running_actor_still_critical, test_d7_stop_on_deploy_awaiting_confirm_full_resets, test_d7_repeated_stop_in_critical_window_no_duplicate_notify. Full suite green (1349). Refs: ORCH-090 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 21:31:56 +03:00
claude-bot	46c59bad99	reviewer(ET): auto-commit from reviewer run_id=499	2026-06-09 21:31:56 +03:00
claude-bot	ebbf2e7a2d	feat(cancel): STOP-status task cancellation + relaunch-hole close (ORCH-090) Introduce the dedicated Plane STOP status as a single declarative task-cancel mechanism: stop the active agent (graceful SIGTERM cascade), cancel all jobs (terminal `cancelled`, never requeued), remove the worktree + delete the remote feature branch (never main, never force-push), drive the task to the new system-terminal state `cancelled` and tombstone the natural keys so a later "To Analyse" re-creates it from scratch (docs artefacts preserved). STOP during a critical merge/deploy window is deferred until the irreversible step finishes honestly. Also closes the relaunch hole: handle_status_start relaunch is gated to the `analysis` stage; the only pipeline-start entry point remains "To Analyse". Cross-cutting (adr-0026): the "task terminal" predicate is widened {done} -> {done, cancelled} in serial_gate / task_deps / stages sink + reaper/worker requeue guards. STAGE_TRANSITIONS exit-gates / QG_CHECKS / check_* are unchanged (`cancelled` is a sink, not a new edge). Additive, never-raise, restart-safe, under kill-switch ORCH_STOP_STATUS_ENABLED (off -> zero regression). New: src/cancel.py (leaf), src/gitea.py (delete_remote_branch), tasks columns cancelled_at/cancel_requested_at, jobs status `cancelled`, GET /queue `stop` block. Tests: tests/test_stop_status.py (TC-01..TC-14 + D7); full suite green (1345). Docs updated in-PR (architecture README, CLAUDE.md, README.md, .env.example, CHANGELOG). ADR-001 D4 refinement: plane_issue_id is tombstoned too (the lookup ORs on it) — original UUID recoverable from the parseable suffix. Refs: ORCH-090 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 21:31:56 +03:00
claude-bot	ab083ba826	architect(ET): auto-commit from architect run_id=497	2026-06-09 21:31:56 +03:00
claude-bot	96a99a09b7	analyst(ET): auto-commit from analyst run_id=496	2026-06-09 21:31:56 +03:00
Slava	105d6e9cba	docs: init ORCH-090 business request	2026-06-09 21:31:56 +03:00
claude-bot	7b760e54da	docs(ORCH-090): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 21:31:30 +03:00
Slava	6ae611a376	Merge pull request 'ORCH-062 — INFRA: auto-prune docker build cache on mva154' (#100) from feature/ORCH-062-infra-prune-docker-build-cache into main	2026-06-09 19:59:13 +03:00
deploy-finalizer	c816b33c19	deploy(ORCH-036): finalize SUCCESS for ORCH-062	2026-06-09 19:59:13 +03:00
claude-bot	5ead4543ee	tester(ET): auto-commit from tester run_id=494	2026-06-09 19:55:00 +03:00
claude-bot	247915e3d1	reviewer(ET): auto-commit from reviewer run_id=493	2026-06-09 19:55:00 +03:00
claude-bot	664c2e945a	feat(infra): auto-prune docker build cache on mva154 (ORCH-062) Add src/build_cache_pruner.py — a background daemon thread modelled 1:1 on src/disk_watchdog.py that periodically runs STRICTLY `docker builder prune -f --filter until=<until>` (BuildKit GC) on the HOST over ssh. It is the "second half" of the disk-watchdog (ORCH-063): the watchdog signals, the pruner cleans. Removes the root cause of the 07.06.2026 incident (build cache ~11GB -> disk 100% -> whole self-hosting pipeline down) automatically, without an operator. ADR-001 (Variant A): host-over-ssh, same channel as image_freshness/self_deploy (no docker CLI in the image). Touches ONLY the build cache — no image/system prune, no image/container removal, never restarts the docker daemon or the prod container (self-hosting safety). No ssh target -> tick is a no-op. - src/config.py: ORCH_BUILD_CACHE_PRUNE_* flags + defensive validators (interval/timeout >0, until ~ ^\d+[smhdw]?$, notify_min_gb >=0 -> safe default). - src/main.py: start last (after disk_watchdog) / stop first in lifespan; additive read-only build_cache_prune block in GET /queue. - never-raise on two levels (per-command + per-tick); kill-switch ORCH_BUILD_CACHE_PRUNE_ENABLED (false -> daemon does not start, 1:1 as before). - STAGE_TRANSITIONS / QG_CHECKS / check_* / _parse_* / DB schema UNCHANGED; last-run/last-result is in-memory (no migration). - tests/test_build_cache_pruner.py: TC-01..TC-12 (23 cases, docker fully mocked). - .env.example + CHANGELOG.md updated; INFRA.md / architecture docs already carry the component (architecture stage). Refs: ORCH-062 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 19:55:00 +03:00
claude-bot	d2604e42cd	architect(ET): auto-commit from architect run_id=491	2026-06-09 19:55:00 +03:00
claude-bot	621c1352e1	analyst(ET): auto-commit from analyst run_id=490	2026-06-09 19:55:00 +03:00
Slava	e86ea82501	docs: init ORCH-062 business request	2026-06-09 19:55:00 +03:00
claude-bot	1b03f6b3a7	docs(ORCH-062): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived)	2026-06-09 19:54:36 +03:00
Slava	4d74d981da	Merge pull request 'ORCH-063 — Disk-watchdog: mva154 disk monitoring + Telegram alert at ≥85%' (#98) from feature/ORCH-063-infra-mva154-85 into main	2026-06-09 19:13:33 +03:00
deploy-finalizer	2bd3bb75d4	deploy(ORCH-036): finalize SUCCESS for ORCH-063	2026-06-09 19:08:50 +03:00
claude-bot	efd744f766	tester(ET): auto-commit from tester run_id=488	2026-06-09 19:04:36 +03:00
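
Note on a46dcbcab3 (ORCH-094): the verdict rule stated in that commit message (a deploy-phase status is legitimate iff the task is non-terminal, or it is done and the post-deploy window is active; otherwise done converges to Done and cancelled is suppressed) can be sketched roughly as below. This is an illustration only, not the contents of src/deploy_status_guard.py. The `Verdict` enum, the `decide` signature, the `enabled` parameter and the fail-open behaviour on unexpected input are all assumptions made for the sketch.

```python
from enum import Enum


class Verdict(Enum):
    # Hypothetical names mirroring the three outcomes named in the commit message.
    ALLOW = "allow"
    CONVERGE_DONE = "converge_done"
    SUPPRESS = "suppress"


# Terminal stages per ORCH-090 ({done, cancelled}).
TERMINAL_STAGES = {"done", "cancelled"}


def decide(db_stage, window_active, enabled=True):
    """Return a verdict for a deploy-phase status write (assumed signature)."""
    try:
        if not enabled:
            # Kill-switch off: behave as before the guard existed.
            return Verdict.ALLOW
        if db_stage not in TERMINAL_STAGES:
            # Non-terminal task: intermediate statuses are legitimate.
            return Verdict.ALLOW
        if db_stage == "done":
            # Done task: legitimate only while the post-deploy window is active.
            return Verdict.ALLOW if window_active else Verdict.CONVERGE_DONE
        # Remaining terminal stage is "cancelled": drop the write.
        return Verdict.SUPPRESS
    except Exception:
        # Assumption: never raise, fail open so a bad input cannot block a caller.
        return Verdict.ALLOW
```

With this rule, a stale caller that tries to set `Awaiting Deploy` on a done task after the window has closed gets `CONVERGE_DONE`, which matches the "holds Done" behaviour the commit describes.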


757 Commits
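
Note on 0b25fc1527 (ORCH-093): the transient/terminal split and the bounded backoff described in that commit message might look roughly like the sketch below. It is not the real merge_pr. The helper names (`is_transient`, `merge_with_retry`), the callables `post_merge` and `pr_mergeable`, and the exact delay formula `min(cap, base ** attempt)` are assumptions; only the status classes and the defaults (max_attempts=3, base=2, max=5) come from the commit message.

```python
import time
from typing import Callable, Optional


def is_transient(status: Optional[int], still_mergeable: bool) -> bool:
    """Classify one merge outcome; status None stands for a network error or timeout."""
    if status is None:
        return True
    if status in (405, 408) or 500 <= status <= 599:
        return True
    if status in (409, 422):
        # Retry only while the PR is still reported as mergeable.
        return still_mergeable
    # 403, 404 and anything else: terminal, report an honest False.
    return False


def merge_with_retry(
    post_merge: Callable[[], int],      # hypothetical: performs the POST, returns the HTTP status
    pr_mergeable: Callable[[], bool],   # hypothetical: re-reads the PR's mergeable flag
    max_attempts: int = 3,
    base: float = 2.0,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    for attempt in range(1, max_attempts + 1):
        try:
            status: Optional[int] = post_merge()
        except Exception:
            status = None  # treat a raised network error as transient
        if status is not None and 200 <= status < 300:
            return True
        # The mergeable flag only matters for 409/422, so it is queried lazily.
        mergeable = pr_mergeable() if status in (409, 422) else True
        if not is_transient(status, mergeable):
            return False
        if attempt < max_attempts:
            sleep(min(cap, base ** attempt))  # assumed schedule: 2s, then 4s
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The sketch guards only the POST call against exceptions, so a production version would also need to handle a failing `pr_mergeable`.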