orchestrator

Author	SHA1	Message	Date
claude-bot	04d5671e1b	tester(ET): auto-commit from tester run_id=690 [CI / test: push successful in 4m35s, pull_request successful in 4m24s]	2026-06-15 10:42:34 +03:00
claude-bot	1622454d43	reviewer(ET): auto-commit from reviewer run_id=689	2026-06-15 10:42:34 +03:00
claude-bot	651b9af7c3	fix(merge-gate): tolerate re-test infra-timeout + tree-kill spawned pytest Eliminate the false `deploy-staging -> development` rollback that fired when the merge-gate local re-test timed out (infra/resource) on a green CI + tester + staging branch (incident ORCH-109/PR #129: a 516.7s suite blew its 600s budget under CPU starvation from orphaned pytest processes -> timeout misrouted as a code fault -> developer-retry loop -> manual gate). Additive, 5 independent kill-switches, never-raise, self-hosting scope. Untouched byte-for-byte: STAGE_TRANSITIONS, the QG_CHECKS registry, check_branch_mergeable name/semantics, machine-verdict keys, the DB schema. INV-4 (never push/force-push main) and the no-prod-restart rule are preserved. - D1: new stdlib-only leaf src/proc_group.py runs the spawned re-test/coverage pytest in its own process group (start_new_session) and tree-kills the WHOLE group on timeout (os.killpg SIGTERM->grace->SIGKILL); used by merge_gate.retest_branch and coverage_gate.measure_coverage. No orphan leak. Fallback never-break: subprocess_tree_kill_enabled=False / non-POSIX -> the prior subprocess.run. - D2/D3: merge_gate.classify_retest_failure distinguishes timeout/red/lock-busy/other; an infra timeout routes to _handle_merge_gate_infra_retry (bounded re-queue, task stays on deploy-staging, no rollback / no developer-retry); a red re-test / conflict still rolls back (BR-6). Exhaustion -> one infra alert. - D4: skip the local re-test when the pre-merge rebase was a proven no-op (HEAD already CI/tester/staging-validated); fail-safe runs the re-test on any uncertainty. Flag merge_retest_skip_when_current_enabled. - D5: merge_retest_timeout_s 600 -> 900 + _resolve_retest_timeout validation; reaper_max_running_s invariant preserved without change. - D6: in-process counters + read-only merge_gate block in GET /queue; appended ("ORCH-110","classify_retest_failure","src/merge_gate.py") to MAIN_REGRESSION_MARKERS. Docs (README/internals overview/CLAUDE/CHANGELOG/.env.example) updated in the same PR. Tests: tests/test_orch110_*.py (TC-01..TC-12, incl. the red-before/green-after incident regression). Full suite green (1988 passed). Refs: ORCH-110 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 10:42:34 +03:00
claude-bot	cf602b4810	architect(ET): auto-commit from architect run_id=687	2026-06-15 10:42:34 +03:00
claude-bot	3a2a5063e0	analyst(ET): auto-commit from analyst run_id=686	2026-06-15 10:42:34 +03:00
Slava	fe130db788	docs: init ORCH-110 business request	2026-06-15 10:42:34 +03:00
Slava	64ba12122b	Merge pull request 'docs(ORCH-110): staging gate log - SUCCESS (8/10, C9a/C9b infra-waived)' (#133) from docs/ORCH-110-staging-log into main	2026-06-15 10:41:32 +03:00
claude-bot	e34233f323	docs(ORCH-110): staging gate SUCCESS - 15-staging-log.md 8/10 checks PASS, exit 0. C9a/C9b infra-waived (ORCH-061). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> [CI / test: pull_request successful in 3m48s]	2026-06-15 10:41:12 +03:00
Slava	b6c9d27e9c	Merge pull request 'ORCH-111: watchdog proc_blocking alert on long-lived orphaned test processes' (#130) from feature/ORCH-111-bug-watchdog-must-alert-on-lon into main [CI / test: push cancelled]	2026-06-15 09:14:18 +03:00
deploy-finalizer	da599e8736	deploy(ORCH-036): finalize SUCCESS for ORCH-111 [CI / test: push successful in 2m41s, pull_request successful in 3m12s]	2026-06-15 09:14:06 +03:00
claude-bot	2d0d654022	chore(ORCH-111): retrigger merge-gate re-test (2nd host CPU-starvation flake) The deploy-edge merge-gate re-test bounced ORCH-111 back to development again with `3 failed, 1916 passed, 14 errors in 444.79s` - a resource-exhaustion signature, NOT a code defect. This is the SECOND occurrence of the identical flake on this branch (cf. `4311720`). Evidence the branch is sound: - Watchdog-only change (watchdog/** + docker-compose.yml + docs). It touches no src/, no STAGE_TRANSITIONS/QG_CHECKS/check_*, and none of the failing test files (tests/test_stage_engine.py, tests/test_orch109_timeout_model.py). - The failures/errors are OUTSIDE this branch's scope: test_stage_engine.py::TestStagingInfraTolerance tc02/tc13/tc14 and test_orch109_timeout_model.py::TestContractsUnchanged::test_tc12. They pass in isolation (4 passed/5.9s) and were ERRORS (subprocess timeouts), not assertion failures - a systemic host failure, not logic. - No pytest-randomly/-xdist installed -> deterministic order; merge-gate re-test and a local run execute the same order on the same code. - The failed run took 444.79s vs a clean local full run of 204.72s (2x slower): the orphaned-pytest CPU-starvation incident ORCH-111 itself alerts on. By design ORCH-111 only observes; it does not reap (ADR BR-3). Full `pytest tests/` is green locally: 1933 passed, 0 failed, 0 errors in 204.72s (well under the 600s merge_retest budget), and the local run was FASTER than the prior retrigger's (267s) -> host load is currently low. Empty commit to re-run CI + the pipeline now. NOTE (operator): until the orphaned host pytest processes are cleaned up, the merge-gate re-test can keep flaking. ORCH-111 detects them (proc_blocking, default-off) but does not reap them (BR-3) -> manual host cleanup is the durable fix; a follow-up work item for reap/remediation is recommended. Refs: ORCH-111 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> [CI / test: push cancelled, pull_request successful in 3m1s]	2026-06-15 09:13:03 +03:00
claude-bot	d1e8346605	deploy-staging(ORCH-111): staging gate SUCCESS (8/10 PASS, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> [CI / test: push successful in 3m31s, pull_request successful in 4m15s]	2026-06-15 08:47:44 +03:00
claude-bot	3f16b77d2b	tester(ET): auto-commit from tester run_id=682 [CI / test: push successful in 3m3s, pull_request successful in 3m13s]	2026-06-15 08:43:55 +03:00
claude-bot	521a72e702	reviewer(ET): auto-commit from reviewer run_id=681 [CI / test: push successful in 3m48s, pull_request successful in 4m48s]	2026-06-15 08:31:48 +03:00
deploy-finalizer	007a9ad47d	deploy(ORCH-036): finalize FAILED for ORCH-111 [CI / test: push successful in 3m0s, pull_request successful in 3m0s]	2026-06-15 02:43:37 +03:00
claude-bot	27b85144c2	developer(ET): auto-commit from developer run_id=680 [CI / test: push cancelled, pull_request successful in 2m50s]	2026-06-15 02:43:30 +03:00
claude-bot	4311720c39	chore(ORCH-111): retrigger merge-gate re-test (flaked under host CPU starvation) The merge-gate re-test bounced ORCH-111 to development with 1 failed + 40 errors in 488s - a resource-exhaustion signature, NOT a code defect: - This branch is watchdog-only (watchdog/** + compose); it touches no src/, no STAGE_TRANSITIONS/QG_CHECKS/check_*, and no tests/test_stage_engine.py. - The failing tests (test_stage_engine.py::TestStagingInfraTolerance tc02/tc12/tc13/tc14) are outside this branch's scope, pass in isolation (5 passed/19s), and pass right after the new watchdog tests (105 passed). tc14 takes NO fixtures yet "errored" - a systemic/host failure, not logic. - Host load was ~10-12 on a 4-core box at re-test time (the exact orphaned-pytest CPU-starvation incident ORCH-111 alerts on; ORCH-111 by design only observes, it does not reap - BR-3). Evidence the branch is sound: full `pytest tests/` is green locally (1933 passed, 0 failed, 0 errors in 267s, well under the 600s budget) and Gitea CI on the branch HEAD is green (push + pull_request). Empty commit to re-run the pipeline now that host load has dropped (10.5 -> 6). Refs: ORCH-111 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> [CI / test: push successful in 2m52s, pull_request successful in 3m10s]	2026-06-15 02:39:59 +03:00
claude-bot	1fbfb941a9	tester(ET): auto-commit from tester run_id=678 [CI / test: push successful in 4m22s, pull_request successful in 4m27s]	2026-06-15 02:14:17 +03:00
claude-bot	96701a1a2d	reviewer(ET): auto-commit from reviewer run_id=677	2026-06-15 02:14:17 +03:00
claude-bot	2e73ccf090	feat(watchdog): proc_blocking alert for orphaned long-lived test processes Close the observability gap between agent_hung (only tracked jobs by jobs.pid) and orphaned pytest subprocesses the orchestrator launches itself (merge_gate.retest_branch / coverage_gate.measure_coverage). On a timeout-kill of the agent (-9, ORCH-109) the grand-child pytest reparents onto tini and keeps running for days, starving CPU and failing merge-gate re-test - with no alert. Strictly inside the observer (watchdog/** + the watchdog compose service): - watchdog/collectors/proc.py: stdlib-only /proc scan (under pid: host), read-only, never-raise -> []; pure parsers split from I/O (tested on a fake /proc tree). Never reads /proc/<pid>/environ. - watchdog/signals.py: pure proc_signals builder, per-entity ("proc_blocking", pid), active iff age_s > proc_age_s; actionable RU detail. - watchdog/core.py: opt-in tick block (gated on proc_enabled -> zero overhead / byte-for-byte when off) + RECOVERY synthesis for a vanished process through the existing decide()/AlertState (no new anti-spam logic). - watchdog/config.py: WATCHDOG_PROC_{ENABLED(false),AGE_MIN(60),PATTERNS(pytest),COOLDOWN_S(1800)}; default threshold > max(merge_retest_timeout_s=600, coverage_run_timeout_s=900) so a legit in-flight run never crosses it. - docker-compose.yml: pid: host on orchestrator-watchdog ONLY (read-only privilege). Anti-false-positive and no overlap with agent_hung are by construction (cmdline scope + age threshold), not fragile cross-namespace PID matching. Canon synced: WATCHDOG_PROC_* in .env.watchdog.example <-> .env.example block; documented in LITE_SETUP.md and docs/architecture/README.md (architect). src/*, /metrics, schema_version, STAGE_TRANSITIONS, QG_CHECKS, check_, machine-verdict and the DB schema are untouched; deploy rebuilds only the sidecar, prod orchestrator is not restarted (NFR-3). Tests: tests/watchdog/test_proc_blocking_signal.py (TC-01..TC-06), test_proc_collector.py (/proc parsing), test_tick_proc_blocking_integration.py (TC-07), plus pid: host and proc-config assertions. Full pytest tests/ green (1930). Refs: ORCH-111 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 02:14:17 +03:00
claude-bot	7298f11064	architect(ET): auto-commit from architect run_id=675	2026-06-15 02:14:17 +03:00
claude-bot	44adcba389	analyst(ET): auto-commit from analyst run_id=674	2026-06-15 02:14:17 +03:00
Slava	a0526e1def	docs: init ORCH-111 business request	2026-06-15 02:14:17 +03:00
Slava	6a04d0a336	Merge pull request 'docs(ORCH-111): staging gate log - SUCCESS (8/10, C9a/C9b infra-waived)' (#131) from docs/ORCH-111-staging-log into main	2026-06-15 02:13:22 +03:00
claude-bot	afc4e641c0	docs(ORCH-111): staging gate log - SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> [CI / test: pull_request successful in 3m27s]	2026-06-15 02:12:59 +03:00
Slava	fc1d3db505	Merge pull request 'ORCH-109: timeout budgets developer/reviewer + launch-time model telemetry' (#129) from feature/ORCH-109-orch-timeout-budgets-launch-ti into main [CI / test: push cancelled]	2026-06-14 20:47:30 +03:00
deploy-finalizer	f5c93aa3cc	deploy(ORCH-036): finalize SUCCESS for ORCH-109 [CI / test: push successful in 3m7s, pull_request successful in 3m9s]	2026-06-14 20:47:24 +03:00
claude-bot	2028b6cb14	reviewer(ET): auto-commit from reviewer run_id=671 [CI / test: push successful in 3m39s, pull_request successful in 4m23s]	2026-06-14 20:10:25 +03:00
claude-bot	8628e609d9	tester(ET): auto-commit from tester run_id=669 [CI / test: push successful in 4m27s, pull_request successful in 4m8s]	2026-06-14 14:26:11 +03:00
claude-bot	834d8d78b0	reviewer(ET): auto-commit from reviewer run_id=667	2026-06-14 14:26:11 +03:00
claude-bot	bc96977eb7	docs(readme): sync Watchdog section with per-role timeout budgets The front-page README "### Watchdog" section still claimed "timeout 30 minutes", which became incorrect after ORCH-109 (per-role budgets: developer 60m / reviewer 50m / others 30m default, `_resolve_timeout`). Brought in line with docs/architecture/internals.md, and added the Tier-3 backstop reaper_max_running_s=90m. Closes the reviewer's P1 finding (12-review.md). Docs-only: src/**/STAGE_TRANSITIONS/QG_CHECKS/the DB schema are untouched. Refs: ORCH-109 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 14:26:11 +03:00
claude-bot	b81de1536c	reviewer(ET): auto-commit from reviewer run_id=665	2026-06-14 14:26:11 +03:00
claude-bot	bbcaa93cff	docs(changelog): fix duplicated ORCH-105 entry body When the ORCH-109 entry was inserted above the ORCH-105 entry, the ORCH-105 bullet had its body accidentally duplicated (the same "слайдо-источник …" paragraph appeared twice in one bullet). Restore the ORCH-105 entry to its canonical single-bodied form (byte-for-byte identical to origin/main); the legitimate ORCH-109 additions are untouched. Refs: ORCH-109 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 14:26:11 +03:00
claude-bot	6bd7f9ba84	fix(launcher): raise developer/reviewer timeout budgets + stamp model at launch Two additive, isolated launch-subsystem fixes from incident ORCH-104, without touching STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict / DB schema. D1 — launch-time model stamp: write the resolved model into agent_runs.model in the SAME UPDATE as the effort stamp (ORCH-087), so the model is present from launch, survives a timeout-kill (exit_code=-9), and is visible in-flight in /metrics & /queue. record_usage stays an enrichment (model=COALESCE preserves the launch stamp when the usage JSON model is None). never-raise (isolated try/except). D3/D4 — dedicated per-role budgets: agent_timeout_developer_s=3600 / agent_timeout_reviewer_s=3000 with a deterministic _resolve_timeout ladder (overrides_json[agent] > dedicated role key > agent_timeout_seconds=1800; other roles byte-for-byte). Malformed/non-positive config falls back to the global default + WARNING (never-break). reaper_max_running_s raised 3600 -> 5400 in lockstep to keep the ORCH-065 invariant (5400 > 3600 + 20 = 3620). FR-4 (kill / in-flight visibility) and FR-5 (anti-salvage) are structural in the existing code; pinned here by regression tests (tests/test_orch109_timeout_model.py, TC-01..TC-12). Docs: .env.example, config passport, CHANGELOG, CLAUDE.md (README/internals authored by architect in this branch). Refs: ORCH-109 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 14:26:11 +03:00
claude-bot	b025e1bdf4	architect(ET): auto-commit from architect run_id=662	2026-06-14 14:26:11 +03:00
claude-bot	0bb27b7627	analyst(ET): auto-commit from analyst run_id=661	2026-06-14 14:26:11 +03:00
Slava	aa40d530c5	docs: init ORCH-109 business request	2026-06-14 14:26:11 +03:00
claude-bot	f52790004e	docs(ORCH-109): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Canonical staging_check.py (stub) exit 0; all REAL checks green, C9a/C9b waived sandbox-infra (ORCH-061). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 14:24:15 +03:00
Slava	adebb997e6	Merge pull request 'docs(overview): ORCH-105 - Lite installation and Plane usage slides' (#127) from feature/ORCH-105- into main [CI / test: push cancelled]	2026-06-12 08:25:52 +03:00
deploy-finalizer	25ce5a22a9	deploy(ORCH-036): finalize SUCCESS for ORCH-105 [CI / test: push successful in 57s]	2026-06-12 08:25:51 +03:00
claude-bot	c04dba0c0a	tester(ET): auto-commit from tester run_id=650 [CI / test: push successful in 1m0s, pull_request successful in 1m3s]	2026-06-12 08:19:36 +03:00
claude-bot	95df7278e3	reviewer(ET): auto-commit from reviewer run_id=649	2026-06-12 08:19:36 +03:00
claude-bot	d016ac9b4c	docs(overview): ORCH-105 - Lite installation and Plane usage slides Extend the presentation slide source docs/overview/presentation.md with three slides in the ORCH-011 canon (16 → 19, continuous numbering preserved): - Slide "Launching and running a task through Plane" (entry point "To Analyse", statuses = indication, observation: board + Telegram card + comments). - Slide "What the human decides: gates, auto mode, cancellation" (Approved / Confirm Deploy; autoApprove/autoDeploy/Bug without skipping technical checks; STOP). - Slide "Lite installation with scripts" (two platform containers; configuration only; gen_secrets.py/onboard_project.py + docker compose up -d; runbook LITE_SETUP.md; the one-step bootstrap is the adjacent Bundled variant, not Lite). Facts were checked against the golden sources (LITE_SETUP.md, tech-pipeline.md, tech-integrations.md, CLAUDE.md). Anti-drift: a new function test_presentation_covers_lite_and_plane_usage_bits in tests/test_system_docs.py (existing checks are not relaxed). CHANGELOG updated. Docs+tests only: src/*/STAGE_TRANSITIONS/QG_CHECKS/check_/DB schema are byte-for-byte unchanged; python-pptx is not in the prod image; .pptx is not committed to git. The manual .pptx build (TC-07) was verified in the dev venv: "Собрано слайдов: 19" (slides built: 19), exit 0. Refs: ORCH-105 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-12 08:19:36 +03:00
claude-bot	95a09b16b0	architect(ET): auto-commit from architect run_id=647	2026-06-12 08:19:36 +03:00
claude-bot	be5e4e647f	architect(ET): auto-commit from architect run_id=646	2026-06-12 08:19:36 +03:00
claude-bot	05d26a8f3e	analyst(ET): auto-commit from analyst run_id=645	2026-06-12 08:19:36 +03:00
Slava	3f44d51176	docs: init ORCH-105 business request	2026-06-12 08:19:36 +03:00
claude-bot	a8ca4db550	docs(ORCH-105): staging gate log — SUCCESS (8/10, C9a/C9b infra-waived) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-12 08:19:13 +03:00
Slava	4d5e4613e5	Merge pull request 'docs(overview): ORCH-011 - system showcase docs/overview/ (business+tech, 3 audiences, presentation)' (#125) from feature/ORCH-011- into main [CI / test: push cancelled]	2026-06-11 09:42:53 +03:00
deploy-finalizer	a6bf5d1b25	deploy(ORCH-036): finalize SUCCESS for ORCH-011 [CI / test: push successful in 55s]	2026-06-11 09:42:52 +03:00
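
Commit 651b9af7c3 (item D1) describes running the spawned pytest in its own process group and tree-killing the whole group on timeout (`start_new_session`, then `os.killpg` with SIGTERM, a grace period, and SIGKILL). The sketch below illustrates that pattern only; it is not the contents of src/proc_group.py. The names `run_with_tree_kill`, `_signal_group`, and `grace_s` are hypothetical, and the code assumes a POSIX host.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    """Send sig to every process in the group; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_tree_kill(cmd, timeout_s, grace_s=2.0):
    """Run cmd in its own process group and return (returncode, timed_out).

    Hypothetical helper, POSIX-only. On timeout the WHOLE group is signalled,
    so grandchildren spawned by cmd are not left behind as orphans.
    """
    # start_new_session=True calls setsid() in the child, which makes it the
    # leader of a new process group whose id equals the child's pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        pgid = proc.pid
        _signal_group(pgid, signal.SIGTERM)  # polite request first
        try:
            proc.wait(timeout=grace_s)       # grace period for a clean exit
        except subprocess.TimeoutExpired:
            pass
        # Sent unconditionally: the leader may be gone while descendants
        # that ignored SIGTERM are still running in the group.
        _signal_group(pgid, signal.SIGKILL)
        return proc.wait(), True
```

A caller would treat `timed_out=True` as an infrastructure signal rather than a test failure, for example `rc, timed_out = run_with_tree_kill(["pytest", "tests/"], timeout_s=900)`, where 900 is the `merge_retest_timeout_s` value named in the commit.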


795 Commits
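
Commit 6bd7f9ba84 describes a deterministic timeout ladder (`overrides_json[agent]` > dedicated role key > `agent_timeout_seconds=1800`), a fallback to the global default with a WARNING on malformed or non-positive values, and the reaper invariant `5400 > 3600 + 20`. The sketch below is one possible reading of that description, not the project's `_resolve_timeout`. The function names, the module-level constants, and the choice to fall back to the global default (rather than to the role key) when an override is malformed are assumptions; the numeric values come from the commit message.

```python
import json
import logging

log = logging.getLogger("timeout_sketch")

GLOBAL_DEFAULT_S = 1800                                  # agent_timeout_seconds
ROLE_BUDGETS_S = {"developer": 3600, "reviewer": 3000}   # dedicated role keys
REAPER_MARGIN_S = 20                                     # margin quoted in the commit


def _positive_int(value):
    """Return value as a positive int, or None if it is malformed."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        return None
    try:
        number = int(value)
    except ValueError:
        return None
    return number if number > 0 else None


def resolve_timeout(agent, overrides_json=None):
    """Ladder: per-agent override > dedicated role budget > global default."""
    overrides = {}
    if overrides_json:
        try:
            overrides = json.loads(overrides_json)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            overrides = None
        if not isinstance(overrides, dict):
            log.warning("malformed overrides, using global default")
            return GLOBAL_DEFAULT_S
    for table in (overrides, ROLE_BUDGETS_S):
        if agent in table:
            budget = _positive_int(table[agent])
            if budget is None:
                log.warning("bad timeout for %s, using global default", agent)
                return GLOBAL_DEFAULT_S
            return budget
    return GLOBAL_DEFAULT_S


def reaper_invariant_holds(reaper_max_running_s, budgets=ROLE_BUDGETS_S):
    """The reaper must outlast the longest agent budget plus the margin."""
    longest = max([GLOBAL_DEFAULT_S, *budgets.values()])
    return reaper_max_running_s > longest + REAPER_MARGIN_S
```

With these values, `reaper_invariant_holds(5400)` is true and `reaper_invariant_holds(3600)` is false, which matches the commit's reason for raising `reaper_max_running_s` from 3600 to 5400 in lockstep with the developer budget.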