orchestrator

Author	SHA1	Message	Date
deploy-finalizer	ab157324a7	deploy(ORCH-036): finalize SUCCESS for ORCH-126	2026-06-17 11:56:26 +03:00
staging-runner	aca0466162	staging(ORCH-115): staging gate SUCCESS for ORCH-126	2026-06-17 11:50:19 +03:00
test-runner	3b8aca03ee	test(ORCH-116): test gate PASS for ORCH-126	2026-06-17 11:48:44 +03:00
claude-bot	c8632f4b48	reviewer(ET): auto-commit from reviewer run_id=776	2026-06-17 11:47:05 +03:00
claude-bot	d7e7a4d817	fix(queue): enforce queued ⇒ no run-ownership invariant (ORCH-126)	2026-06-17 11:39:26 +03:00
    Queued analyst-jobs hung forever even with ORCH_SERIAL_GATE_ENABLED=false (incident ORCH-124/125, job 2286: queued + run_id=759/760 + pid=35/42 + started_at=NULL, which is physically impossible). No path returning a job to 'queued' reset its run-ownership (run_id/pid); after a container restart a reused pid made pid_alive(stale)=True, so the job-reaper Tier-1 saw a phantom 'running' and at max_concurrency=1 wedged the claim of the whole shared queue.
    Enforce the invariant `status='queued' ⇒ run_id IS NULL AND pid IS NULL AND started_at IS NULL` on existing columns (no schema change):
    - D1 forward-cleanup: requeue_running_jobs / mark_job('queued') / mark_job_transient / reap_running_job('queued') reset run_id=NULL, pid=NULL in the same UPDATE that clears started_at; atomic status-guards preserved.
    - D2 clean claim: claim_next_job resets pid/run_id on the queued->running flip (defense-in-depth) so the row carries pid IS NULL until _spawn stamps it.
    - D4 self-heal + observability: db.find_impossible_queued_jobs / sanitize_impossible_queued run at startup (main.lifespan) and on each reaper tick (JobReaper.sanitize_impossible_queued_once, never-raise); counter impossible_queued_total in the GET /queue reaper block. Kill-switch ORCH_IMPOSSIBLE_QUEUED_SANITIZE_ENABLED (default on; gates only the D4 sweep).
    - D5: reaper Tier-1 unchanged; the fix restores its precondition (pid reflects THIS run).
    Marked invariants ORCH-065/113/114/099 preserved.
    Tests: tests/test_orch126_queued_stale_run.py (TC-01 mandatory regression red->green; TC-02..TC-10). Full pytest tests/ -q green (2189 passed).
    Docs: internals.md (run-ownership invariant section), .env.example, CHANGELOG; cross-cutting adr-0052.
    Refs: ORCH-126
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	3fb7bd6e4c	architect(ET): auto-commit from architect run_id=774	2026-06-17 11:22:30 +03:00
claude-bot	453c5b7d04	analyst(ET): auto-commit from analyst run_id=773	2026-06-17 11:07:33 +03:00
Slava	a5f691fc96	docs: init ORCH-126 business request	2026-06-17 11:00:16 +03:00
deploy-finalizer	895fb3ab44	deploy(ORCH-036): finalize SUCCESS for ORCH-124	2026-06-16 22:46:01 +03:00
staging-runner	9709aa2267	staging(ORCH-115): staging gate SUCCESS for ORCH-124	2026-06-16 22:35:07 +03:00
test-runner	b61a4eb092	test(ORCH-116): test gate PASS for ORCH-124	2026-06-16 22:33:32 +03:00
claude-bot	be8ddfcd57	reviewer(ET): auto-commit from reviewer run_id=772	2026-06-16 22:31:49 +03:00
claude-bot	58e5dfe55d	docs(serial-gate): sync system showcase + clean stray tags (ORCH-124)	2026-06-16 21:50:45 +03:00
    Addresses reviewer REQUEST_CHANGES (run 768) on ORCH-124: docs-only, no src/tests touched, fix scope unchanged.
    P1: update docs/overview/ showcase for the new serial-gate "pause without blocking" axis (changed task-routing functionality, ORCH-011/ORCH-079):
    - tech-pipeline.md: FIFO exception "pause without blocking" next to freeze
    - tech-data-model.md: durable signal tasks.paused_at on the Task row
    - tech-observability.md: paused/reason in serial_gate GET /queue block + operator endpoints POST /serial-gate/pause|resume
    P2: strip leaked tool-call trailing tags (</content>/</invoke>) from 4 golden-source docs of this PR (06-adr/ADR-001, adr-0051, 08-data-requirements.md, 10-tech-risks.md). CHANGELOG "Доки" ("Docs") bullet extended accordingly.
    Full suite green (2178 passed); test_system_docs.py green (machine-checked showcase facts intact).
    Refs: ORCH-124
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	ec932264db	reviewer(ET): auto-commit from reviewer run_id=768	2026-06-16 20:24:55 +03:00
test-runner	c7336dd9ea	test(ORCH-116): test gate FAIL for ORCH-124	2026-06-16 19:51:06 +03:00
claude-bot	7ac83a9731	reviewer(ET): auto-commit from reviewer run_id=766	2026-06-16 19:49:23 +03:00
claude-bot	de4f067655	architect(ET): auto-commit from architect run_id=764	2026-06-16 19:17:43 +03:00
claude-bot	fef5ba15d5	analyst(ET): auto-commit from analyst run_id=763	2026-06-16 17:56:23 +03:00
Slava	569abee5f2	docs: init ORCH-124 business request	2026-06-16 17:24:43 +03:00
deploy-finalizer	274fbd77fc	deploy(ORCH-036): finalize SUCCESS for ORCH-116	2026-06-16 10:27:22 +03:00
staging-runner	b212afbbd0	staging(ORCH-115): staging gate SUCCESS for ORCH-116	2026-06-16 10:21:36 +03:00
claude-bot	3270647d86	tester(ET): auto-commit from tester run_id=758	2026-06-16 10:19:58 +03:00
claude-bot	e12b03b235	reviewer(ET): auto-commit from reviewer run_id=757	2026-06-16 10:11:33 +03:00
claude-bot	c470576202	developer(ET): auto-commit from developer run_id=756	2026-06-16 09:59:29 +03:00
claude-bot	74fccf3a09	fix(testing): reconcile ORCH-116 with merged ORCH-123 (ADR renumber, CHANGELOG, env parity)	2026-06-16 09:56:47 +03:00
    Recovery from the merge-gate rebase-conflict bounce. The feature branch was rebased onto origin/main (which had merged ORCH-123). The single conflicting hunk, docs/architecture/README.md, was resolved during the rebase: kept ORCH-123's host-side staging-runner line AND the ORCH-116 test-runner bullet. This follow-up commit reconciles the remainder:
    - Renumber the global sweeping ADR adr-0049 -> adr-0050. ORCH-123 took adr-0049 (adr-0049-host-side-docker-execution-boundary.md) on main while ORCH-116 was in flight, so ORCH-116 yields to the merged task and moves to the next free number. Mechanical cross-reference reconciliation only (git mv + title + every test-runner reference across README/internals/CLAUDE/CHANGELOG/config.py + 06-adr/ADR-001 + 12-review). Main's adr-0049 host-side references are left byte-for-byte untouched. No design/verdict content was altered.
    - Restore the ORCH-116 CHANGELOG entry that the CHANGELOG auto-merge silently dropped (both ORCH-123 and ORCH-116 inserted at the same [Unreleased] anchor; git kept only ORCH-123).
    - Add the missing ORCH_TEST_RUNNER_* keys to .env.example (parity with the ORCH_STAGING_RUNNER_* block; ORCH-101 canon of start keys).
    Refs: ORCH-116
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
staging-runner	4b14b010de	staging(ORCH-115): staging gate SUCCESS for ORCH-116	2026-06-16 09:37:40 +03:00
claude-bot	4c7b2345b7	reviewer(ET): auto-commit from reviewer run_id=754	2026-06-16 09:37:40 +03:00
claude-bot	a3ea56c751	reviewer(ET): auto-commit from reviewer run_id=743	2026-06-16 09:37:40 +03:00
staging-runner	024e1bfceb	staging(ORCH-115): staging gate FAILED for ORCH-116	2026-06-16 09:37:40 +03:00
claude-bot	b1e00c0a7d	tester(ET): auto-commit from tester run_id=742	2026-06-16 09:37:40 +03:00
claude-bot	e386130fd1	reviewer(ET): auto-commit from reviewer run_id=741	2026-06-16 09:37:40 +03:00
claude-bot	9d16ee473a	feat(testing): deterministic test-runner replacing LLM tester on the testing stage (ORCH-116)	2026-06-16 09:37:40 +03:00
    Second realised slice of the determinization-roadmap (ORCH-118 A5, needs-hybrid-fallback): on the `testing` stage for the self-hosting `orchestrator` repo the LLM `tester` agent is replaced by a deterministic test-runner (src/test_runner.py), intercepted in launch_job BEFORE _spawn (deploy-finalizer / post-deploy-monitor / staging-runner precedent). It runs the regression `python -m pytest <target>` in the task worktree via proc_group (tree-kill) + an optional read-only smoke (/health, /status, /queue + serial_gate), maps the exit-code -> result: PASS|FAIL via the existing self_deploy.map_exit_code_to_status contract, writes 13-test-report.md and initiates the EXISTING check_tests_passed gate exactly as a finished LLM-tester.
    Invariant (NFR-1): only the producer changes; the artifact contract (13-test-report.md / result:), the gate check_tests_passed / _parse_tests_verdict, STAGE_TRANSITIONS and the DB schema are byte-for-byte UNCHANGED. Additive, under a kill-switch (test_runner_enabled), never-raise, fail-closed, self-hosting scope, two-level outcome (tool-error DEFER, anti ORCH-110), hybrid (LLM strictly off-control-path). 52c-`status:` is aligned with the verdict (D6.1) so the three-field _parse_tests_verdict never false-negatives a PASS.
    Docs (ORCH-118 NFR-6, atomic with code): llm-call-sites.md (A5 implemented), llm-determinization-roadmap.md (rank 2 implemented), llm-usage-policy.md, README/internals/overview, tester.md, CLAUDE.md, CHANGELOG.md.
    Coverage: tests/test_orch116_test_runner.py (TC-01..TC-14); LLM anti-drift tests green. Full suite: 2137 passed.
    Refs: ORCH-116
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	74f53b522a	architect(ET): auto-commit from architect run_id=739	2026-06-16 09:36:50 +03:00
claude-bot	9e543551aa	analyst(ET): auto-commit from analyst run_id=738	2026-06-16 09:36:50 +03:00
Slava	c081a5b6ff	docs: init ORCH-116 business request	2026-06-16 09:36:50 +03:00
deploy-finalizer	031130c7f0	deploy(ORCH-036): finalize SUCCESS for ORCH-123	2026-06-16 09:03:29 +03:00
claude-bot	12e3a9e4f3	docs(ORCH-123): staging gate log: staging_status SUCCESS (8/10, C9a/C9b infra-waived)	2026-06-16 08:57:48 +03:00
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	2b71f3887f	tester(ET): auto-commit from tester run_id=752	2026-06-16 08:55:07 +03:00
claude-bot	820e534e77	reviewer(ET): auto-commit from reviewer run_id=751	2026-06-16 08:52:22 +03:00
claude-bot	cc41dd849c	fix(staging): host-side ssh execution + env classification for staging-runner (ORCH-123)	2026-06-16 08:42:36 +03:00
    The ORCH-115 deterministic staging-runner ran `docker exec` FROM INSIDE the prod `orchestrator` container, which ships only `openssh-client git curl` and no `docker` CLI (Dockerfile:11). `Popen(["docker", ...])` hit FileNotFoundError -> a PERMANENT environment defect that was mis-routed as a code-fail rollback `deploy-staging -> development` (burning developer-retries). Incident ORCH-116: every self-hosting task reaching deploy-staging was doomed to a false rollback.
    Fix (adr-0049, additive, flag-gated, never-raise, self-hosting scope; the gate / artifact contract / STAGE_TRANSITIONS / DB schema are byte-for-byte unchanged):
    - D1: build_staging_command() wraps the SAME `docker exec ... staging_check.py ... --mode stub` in `ssh <user@host> '<...>'` so it runs HOST-SIDE over the existing trusted ssh channel (mirror self_deploy / image_freshness). New flag staging_runner_exec_host_side (default True). No docker CLI/SDK added to the image, docker.sock not used in-container (D2 security).
    - D3: three-way classify_staging_outcome (suite-ran / permanent-env / transient-infra), disambiguating the exit=1 collision by scanning stderr.
    - D4: invariant "infra != code-fail": permanent-env / exhausted transient-infra end in an infra-HOLD (no rollback, no developer-retry), NOT a false FAILED rollback (supersedes ORCH-115 D5). A really-executed failing suite still rolls back (anti-over-tolerance). R-2 verified: a held deploy-staging task is not rolled back by the reconciler.
    - D5: prod-like preflight() of the host-side channel at startup (main.lifespan, best-effort, never blocks).
    - D8: snapshot adds permanent_env / exec_host_side / preflight.
    Docs (golden source, same PR): INFRA.md execution-boundary section, architecture/README.md, CLAUDE.md, CHANGELOG.md, .env.example.
    Tests: tests/test_orch123_staging_runner_exec.py (TC-01 mandatory regression red->green; TC-02..TC-14 + R-2). ORCH-115 anti-drift green (3 tests updated for the D1/D4/D8 supersession). Full suite: 2131 passed.
    Refs: ORCH-123
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	2a47744c9d	architect(ET): auto-commit from architect run_id=748	2026-06-16 08:07:55 +03:00
claude-bot	3865b14a1c	analyst(ET): auto-commit from analyst run_id=747	2026-06-16 07:55:43 +03:00
deploy-finalizer	17312ac86f	deploy(ORCH-036): finalize SUCCESS for ORCH-115	2026-06-16 02:21:02 +03:00
claude-bot	a975591a3c	deploy-staging(ORCH-115): staging gate SUCCESS (8/10 PASS, C9a/C9b infra-waived)	2026-06-16 02:15:21 +03:00
claude-bot	aed3ba0cbb	tester(ET): auto-commit from tester run_id=736	2026-06-16 02:11:37 +03:00
claude-bot	e3ce01b824	reviewer(ET): auto-commit from reviewer run_id=735	2026-06-16 02:08:18 +03:00
claude-bot	b50cf1dd08	feat(staging): deterministic staging-runner replacing LLM deployer on deploy-staging (ORCH-115)	2026-06-16 01:59:43 +03:00
    Replace the LLM `deployer` agent on the `deploy-staging` stage (self-hosting orchestrator) with a deterministic staging-runner intercepted in launch_job BEFORE _spawn (the deploy-finalizer / post-deploy-monitor reserved-agent precedent). The runner executes the SAME staging suite, maps the exit-code to `staging_status:` via the existing self_deploy.map_exit_code_to_status contract, writes 15-staging-log.md, and initiates the UNCHANGED check_staging_status gate exactly as a finished LLM-deployer would.
    Invariant (NFR-1): this replaces only the producer of the artifact; the artifact contract, the gate / _parse_staging_status / check_staging_status name, STAGE_TRANSITIONS, the machine-verdict key `staging_status:` and the DB schema are byte-for-byte unchanged. Additive, under a kill-switch + repo-scope CSV, never-raise, fail-safe back to the LLM path.
    Two-level outcome (D5, anti ORCH-110): suite executed -> verdict -> advance (FAILED -> the existing deploy-staging -> development rollback + developer-retry, same as a FAILED LLM verdict); tool-error (suite did not execute) -> bounded DEFER -> fail-closed FAILED + alert on exhaustion (infra != code fault; never a silent advance / false green).
    First implemented slice of the LLM determinization roadmap (ORCH-118 A6, replace-deterministic-now).
    - New leaf src/staging_runner.py (never-raise; proc_group tree-kill + timeout)
    - launch_job intercept + _run_staging_runner_job (mirror _run_deploy_finalizer_job)
    - config: ORCH_STAGING_RUNNER_* keys (enabled/repos/timeout/infra-retry budget)
    - GET /queue staging_runner observability block
    - docs: llm-call-sites/roadmap/usage-policy (A6 implemented; machine blocks + single-transport invariant intact), deployer.md (LLM branch -> fallback), CLAUDE.md, CHANGELOG.md, overview (tech-pipeline/tech-agents/tech-quality-security), .env.example
    - tests/test_orch115_staging_runner.py (TC-01..TC-13); LLM anti-drift green (TC-14)
    Refs: ORCH-115
    Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude-bot	f120e4bd8f	architect(ET): auto-commit from architect run_id=733	2026-06-16 01:37:27 +03:00
claude-bot	ac203c0ccf	analyst(ET): auto-commit from analyst run_id=732	2026-06-16 01:11:35 +03:00
Slava	a353a72f20	docs: init ORCH-115 business request	2026-06-16 01:02:37 +03:00
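
The run-ownership invariant named in commit d7e7a4d817 (ORCH-126), `status='queued' ⇒ run_id IS NULL AND pid IS NULL AND started_at IS NULL`, can be illustrated with a minimal sketch. The function names are borrowed from the commit message, but everything else here is an assumption for illustration: the `jobs` table name, the SQLite backend, the function signatures, and the bodies are hypothetical and are not taken from the repository.

```python
import sqlite3

# Hypothetical schema: only the columns named in the ORCH-126 commit message.
SCHEMA = """
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY,
    status     TEXT NOT NULL,
    run_id     INTEGER,
    pid        INTEGER,
    started_at TEXT
)
"""

# A queued row that still carries any run-ownership field violates the invariant.
IMPOSSIBLE_QUEUED = (
    "status = 'queued' AND "
    "(run_id IS NOT NULL OR pid IS NOT NULL OR started_at IS NOT NULL)"
)


def find_impossible_queued_jobs(conn):
    """Return ids of queued jobs that still carry run-ownership fields."""
    rows = conn.execute(
        f"SELECT id FROM jobs WHERE {IMPOSSIBLE_QUEUED} ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def sanitize_impossible_queued(conn):
    """Clear run-ownership on violating rows in one UPDATE; return rows fixed."""
    cur = conn.execute(
        "UPDATE jobs SET run_id = NULL, pid = NULL, started_at = NULL "
        f"WHERE {IMPOSSIBLE_QUEUED}"
    )
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
        [
            (1, "queued", None, None, None),                 # healthy queued job
            (2, "queued", 759, 35, None),                    # stale run_id/pid on a queued row
            (3, "running", 760, 42, "2026-06-17T08:00:00"),  # legitimately running
        ],
    )
    before = find_impossible_queued_jobs(conn)
    fixed = sanitize_impossible_queued(conn)
    after = find_impossible_queued_jobs(conn)
    return before, fixed, after
```

Calling `demo()` flags only the queued row with stale ownership, clears it, and leaves the running row untouched. Scoping the UPDATE with the same predicate as the SELECT keeps the sweep from touching rows that already satisfy the invariant.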


658 Commits
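
Commit cc41dd849c (ORCH-123) describes wrapping the staging command in `ssh` so it runs host-side, and classifying the outcome three ways by scanning stderr. The sketch below illustrates that idea only. The helper names `wrap_host_side` and `classify_outcome`, the stderr marker strings, and the fallback rule are hypothetical stand-ins, not the repository's `build_staging_command()` or `classify_staging_outcome`.

```python
import shlex

# Hypothetical stderr markers; the real classifier's patterns are not shown in the log.
PERMANENT_ENV_MARKERS = ("command not found", "No such file or directory")
TRANSIENT_INFRA_MARKERS = ("Connection timed out", "Connection refused")


def wrap_host_side(inner_argv, target):
    """Wrap an argv list in ssh so the inner command runs on the remote host.

    shlex.join quotes each argument, so the remote shell sees the same argv.
    """
    return ["ssh", target, shlex.join(inner_argv)]


def classify_outcome(exit_code, stderr):
    """Three-way classification: suite-ran / permanent-env / transient-infra.

    A non-zero exit alone is ambiguous (failing suite vs. broken environment),
    so stderr is scanned before falling back to "the suite actually ran".
    """
    if exit_code == 0:
        return "suite-ran"
    if any(marker in stderr for marker in PERMANENT_ENV_MARKERS):
        return "permanent-env"
    if any(marker in stderr for marker in TRANSIENT_INFRA_MARKERS):
        return "transient-infra"
    return "suite-ran"
```

For example, `wrap_host_side(["docker", "exec", "some-container", "true"], "user@host")` yields an `ssh` argv whose last element is the quoted inner command. Per the commit message, only a suite that really executed and failed should lead to a rollback; the two infra classes end in a hold instead.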