orchestrator

Author	SHA1	Message	Date
post-deploy-monitor	0cbb7ef0bb	docs(ORCH-021): post-deploy HEALTHY/NONE for ORCH-022	2026-06-07 19:24:29 +00:00
deploy-finalizer	e07ee9e574	deploy(ORCH-036): finalize SUCCESS for ORCH-022	2026-06-07 18:42:29 +00:00
claude-bot	8cdb9f194a	tester(ET): auto-commit from tester run_id=331	2026-06-07 18:04:50 +00:00
claude-bot	cb3bdd9c7a	reviewer(ET): auto-commit from reviewer run_id=330	2026-06-07 18:04:50 +00:00
Dev	04233cb3c8	test(ORCH-022): isolate TC-17 worktree under tmp_path (fix CI PermissionError on /repos/_wt) TC-17 seeded 17-security-report.md via get_worktree_path() which resolves to settings.worktrees_dir (default /repos/_wt) -> the test wrote into the real shared host worktree path. In CI that dir is owned by another user -> PermissionError. Monkeypatch git_worktree.settings.worktrees_dir to tmp_path/_wt (same pattern as test_git_worktree.py / test_merge_gate.py). Prod logic untouched.	2026-06-07 18:04:50 +00:00
stream	85ecf50926	ci: re-run after gitea restart (ORCH-022 flaky CI)	2026-06-07 18:04:50 +00:00
claude-bot	30b6187c73	feat(security): security-gate (gitleaks secret-scan + pip-audit) before merge Add a deterministic (no-LLM) security sub-gate on the deploy-staging -> deploy edge, run FIRST (before merge-gate ORCH-043 and image-freshness ORCH-058) so it fails cheaply before any expensive rebase/rebuild, and scans origin/main..HEAD before rebase so a task is never blamed for a CVE introduced by an updated main. Why: the autonomous pipeline merged branches into main with no check for a leaked secret or a vulnerable dependency. For the self-hosting orchestrator (one shared prod instance serving every project from a shared DB) a single leak/CVE landed in the prod of all projects (CLAUDE.md self-hosting, section 8). - New leaf src/security_gate.py (never-raise): gitleaks (offline, fail-closed on tool error => secrets guarantee is unconditional) + pip-audit (best-effort; unreachable CVE feed degrades fail-open + loud warning by default, strict via security_dep_audit_fail_closed). Verdict lives ONLY in 17-security-report.md YAML frontmatter (write -> read-back single source of truth); FAIL is authoritative; missing/broken frontmatter => fail-closed. - check_security_gate thin wrapper registered in QG_CHECKS (lazy import, no cycle). - _handle_security_gate wired FIRST in advance_stage deploy-staging block: FAIL -> rollback to development + developer-retry (cap MAX_DEVELOPER_RETRIES); task_desc carries verbatim findings (ORCH-046 pattern). No merge-lease release (runs before lease acquire). Self-hosting safe: only reads/scans/writes, never deploys. - Conditional rollout (security_gate_enabled + security_gate_repos; empty scope -> self-hosting only). 6 new ORCH_SECURITY_* settings. - Infra: pinned gitleaks Go binary in Dockerfile (+curl/ca-certificates), pip-audit in requirements.txt, versioned .gitleaks.toml at repo root. - STAGE_TRANSITIONS and DB schema unchanged. Docs: docs/architecture/README.md (marked realized), CLAUDE.md (artifact 17), CHANGELOG.md. 
Tests: test_security_gate.py, test_qg_security.py, test_stage_engine_security_gate.py + updated registry/edge snapshots. Refs: ORCH-022 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 18:04:50 +00:00
claude-bot	44db94e462	architect(ET): auto-commit from architect run_id=327	2026-06-07 18:04:50 +00:00
claude-bot	4f24f96169	analyst(ET): auto-commit from analyst run_id=326	2026-06-07 18:04:50 +00:00
Slava	2d20da295e	docs: init ORCH-022 business request	2026-06-07 18:04:50 +00:00
claude-bot	67e98b8296	docs(ORCH-022): staging gate log — staging_status SUCCESS Canonical staging_check.py run inside orchestrator-staging: 8/10 PASS, all REAL checks green, C9a/C9b infra-waived (ORCH-061), exit 0 → advance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 18:04:35 +00:00
stream	cad5e98892	docs(history): lessons 2026-06-07 — autonomy closure (5 tasks: ORCH-58/60/61/21/65 to prod)	2026-06-07 19:24:49 +03:00
Slava	bb03350ec9	Merge pull request 'feat(reaper): job-reaper + stale merge-lease reclaim + idempotent merge finalization (ORCH-065)' (#66) from feature/ORCH-065-bug-zombie-jobs-merge-lease-ru into main	2026-06-07 19:16:23 +03:00
claude-bot	930e65298c	tester(ET): auto-commit from tester run_id=324	2026-06-07 16:14:45 +00:00
claude-bot	cba67a4270	reviewer(ET): auto-commit from reviewer run_id=323	2026-06-07 16:14:45 +00:00
claude-bot	720c31393a	fix(reaper): Tier-2 finalization grace + claim-before-act (no dup advance) Tier-2 reaped a LIVE, still-finalizing monitor: _monitor_agent writes agent_runs.exit_code FIRST, then does git push / PR / Plane comments before _finalize_job, and the agent pid is already dead in that window — so the old "exit_code recorded -> reap now" had no grace and could race a healthy job. Worse, _reap_known_outcome ran the advance (advance_stage -> enqueue_job) BEFORE the atomic claim, so a reaper that lost the race had already enqueued the next stage (dup advance / dup enqueue), violating ADR-001 Р-1. Fix: - Tier-2 grace: reap only once agent_runs.exit_code has been recorded for >= reaper_finalize_grace_s (new setting, default 300s; > max finalization window). A live finalizing monitor is never reaped (FR-1.3/AC-3). New finished_age_s column computed in get_running_jobs. - claim-before-act for exit0: evaluate the canonical QG READ-ONLY (the reconciler pattern) to choose the terminal status, then atomically claim 'done' FIRST; only the claim winner runs the advance. A loser performs no side effects -> no dup advance / dup enqueue. Docs (golden source) updated in the same change: ADR-001, global adr-0011, README, internals, .env.example, CHANGELOG (also fixes the P3 broken adr-0011 link). New tests cover the grace window, lost-claim no-side-effects, and the already-advanced idempotent path. Refs: ORCH-065 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 16:14:45 +00:00
claude-bot	9b7c855df3	reviewer(ET): auto-commit from reviewer run_id=321	2026-06-07 16:14:45 +00:00
claude-bot	a6b444c356	fix(merge): wire pr_already_merged guard into deployer merge path (idempotent re-merge) The pr_already_merged guard was defined + unit-tested but consulted by zero production code, while ADR-001 Р-3 / README / CHANGELOG claimed the merge path consults it before a repeat merge (reviewer P1, ORCH-065 attempt 2/3). The actual merge actor is the LLM deployer agent (it merges the feature PR at the start of the `deploy` stage), so on a reaper re-drive of an already-merged PR the deployer would blindly re-merge → Gitea error → false БАГ-8 rollback; AC-11 ("no second merge") was not met deterministically. Wire the guard at the real consultation point — the deployer prompt — so it runs merge_gate.pr_already_merged before any (re-)merge and no-ops when the PR is already merged. check_branch_mergeable is left untouched (AC-13: check_* behaviour unchanged; it runs on the first deploy-staging→deploy edge, not on a deploy-stage re-drive where the second-merge risk lives). - .openclaw/agents/deployer.md: idempotent pre-merge guard step + general rule. - src/merge_gate.py: docstring names the deployer-prompt consultation point. - docs/architecture/README.md, CHANGELOG.md: state the consultation point so golden-source matches implementation. - tests/test_merge_gate.py: regression test asserting the deployer prompt wires the guard (so it can't silently become dead code again). pytest tests/ -q: 743 passed. Refs: ORCH-065 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 16:14:45 +00:00
claude-bot	dbf14e3d5a	reviewer(ET): auto-commit from reviewer run_id=319	2026-06-07 16:14:45 +00:00
claude-bot	4bebb921ff	feat(reaper): job-reaper + stale merge-lease reclaim + idempotent merge finalization Closes the "zombie jobs" incident class: job status was set only inside the live launcher process, so a process death left jobs.status='running' forever; at max_concurrency=1 one zombie blocked ALL projects' queue (self-hosting risk). Adds a background daemon (src/job_reaper.py) with three-tier liveness (dead-pid streak / known exit_code / max-running backstop) whose only mutating write is an atomic terminal flip guarded by WHERE status='running' (no double-process). For exit0 the canonical QG is the source of truth via gate-driven advance, not "exit0". Also proactively reclaims stale merge-lease (dead pid OR TTL) via file delete only (no git ops), and makes merge finalization idempotent (pr_already_merged guard + up-to-date short-circuit on re-drive). New jobs.pid column via idempotent _ensure_column (no migration); pid stamped in launcher._spawn after Popen. Reaper start/stop in lifespan; "reaper" snapshot in GET /queue. Kill-switches: ORCH_REAPER_ENABLED, ORCH_REAPER_INTERVAL_S, ORCH_REAPER_DEAD_TICKS, ORCH_REAPER_MAX_RUNNING_S, ORCH_LEASE_RECLAIM_ENABLED. Invariants unchanged (AC-13): STAGE_TRANSITIONS, QG_CHECKS registry, check_branch_mergeable signature/behaviour, BUG-8 rollback, hook exit codes. restart-safe, never-raise per unit of background work. Docs: docs/architecture/README.md, CHANGELOG.md, .env.example. Tests: tests/test_job_reaper.py, tests/test_merge_lease_reclaim.py, tests/test_merge_gate.py (TC-16), tests/test_merge_gate_race.py (TC-17), tests/test_queue.py, tests/test_config.py (TC-19/TC-20). 742 passed. Refs: ORCH-065 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 16:14:45 +00:00
claude-bot	9f846b5a50	architect(ET): auto-commit from architect run_id=317	2026-06-07 16:14:45 +00:00
claude-bot	b760b24a48	analyst(ET): auto-commit from analyst run_id=316	2026-06-07 16:14:45 +00:00
Slava	f0ac9d5562	docs: init ORCH-065 business request	2026-06-07 16:14:45 +00:00
claude-bot	987ea810bf	docs(ORCH-065): staging gate SUCCESS — REAL green, C9a/C9b infra-waived Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 16:14:22 +00:00
Slava	f85e449d80	Merge pull request 'feat(post-deploy): post-deploy prod monitoring + auto-rollback (ORCH-021)' (#65) from feature/ORCH-021-post-deploy-rollback into main	2026-06-07 17:42:27 +03:00
claude-bot	1c89ac9df9	tester(ET): auto-commit from tester run_id=313	2026-06-07 14:40:06 +00:00
claude-bot	03d899812c	reviewer(ET): auto-commit from reviewer run_id=312	2026-06-07 14:40:06 +00:00
claude-bot	b9bcdc1545	fix(deploy): drop COPY data/ from Dockerfile so worktree-context staging build succeeds The ORCH-058 staging rebuild (check_staging_image_fresh) builds the image with the task git-worktree as the docker build context. A fresh worktree holds only tracked files, but the Dockerfile did `COPY data/ ./data/` — and `data/` (the SQLite dir) is gitignored, so it is absent from that context: `docker build` failed with exit 1 ("BUILD-STAGING: docker build failed - aborting"), bouncing the task off deploy-staging back to development in a loop. The COPY was dead weight regardless: `data/` is always supplied at runtime as a bind-mount volume (./data:/app/data, see docker-compose.yml) which shadows anything baked into the image. Replace it with `RUN mkdir -p /app/data` so the mountpoint exists without depending on the build context. Regression guard: test_tc08b_dockerfile_does_not_copy_gitignored_data_dir forbids COPY of any gitignored path (the worktree-context invariant). Refs: ORCH-021 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 14:40:06 +00:00
claude-bot	b04fae748e	tester(ET): auto-commit from tester run_id=309	2026-06-07 14:40:06 +00:00
claude-bot	fbfcd84b16	reviewer(ET): auto-commit from reviewer run_id=308	2026-06-07 14:40:06 +00:00
claude-bot	2f4c553fd8	feat(post-deploy): post-deploy prod monitoring + degradation reaction (ORCH-021) Extend pipeline responsibility past deploy->done: after the terminal transition for an applicable repo, arm a ~15min observation window that probes prod and reacts to a degradation the restart-time health-check missed ("green deploy, red prod"). - src/post_deploy.py: new leaf module (config + lazy qg/db only). Sentinel-file restart-safe state (.post-deploy-state-<repo>/<wi>/), no DB migration. probe_signals/classify/decide_action/run_rollback, all never-raise. - Reserved-agent job `post-deploy-monitor` (no-LLM, Variant B, calque of deploy-finalizer): self-requeues each tick via enqueue_job. - Deterministic classify: DEGRADED iff >= fail_threshold consecutive health failures OR window 5xx ratio > 5xx_threshold; fail-safe HEALTHY. - Self-hosting invariant (BR-5/AC-8): a tick NEVER restarts the prod orchestrator container -> orchestrator is ALWAYS ALERT_ONLY. - Conditionality (ORCH-35/36/43/58): kill-switch + CSV repos, empty -> self-hosting only. - QG_CHECKS / STAGE_TRANSITIONS / schema unchanged (AC-12). - Docs: CHANGELOG, CLAUDE artefact list (16-post-deploy-log.md), architecture README, .env.example (ORCH_POST_DEPLOY_*). Refs: ORCH-021 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 14:40:06 +00:00
claude-bot	2bdba532d5	architect(ET): auto-commit from architect run_id=306	2026-06-07 14:40:06 +00:00
claude-bot	db83b89467	analyst(ET): auto-commit from analyst run_id=305	2026-06-07 14:40:06 +00:00
Slava	961c5e9eee	docs: init ORCH-021 business request	2026-06-07 14:40:06 +00:00
claude-bot	84a6f61ba8	docs(ORCH-021): staging gate SUCCESS — refresh 15-staging-log timestamp Re-ran staging_check inside orchestrator-staging (exit 0); all REAL checks green, C9a/C9b waived per ORCH-061. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 14:39:48 +00:00
claude-bot	1af356a343	docs(ORCH-021): staging gate SUCCESS — REAL green, C9a/C9b infra-waived Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 14:25:00 +00:00
Slava	e18947d2d9	Merge pull request 'fix(staging): tolerate sandbox-infra-only FAILs (C9a/C9b) in deploy-staging verdict (ORCH-061)' (#62) from feature/ORCH-061-bug-deploy-staging-development into main	2026-06-07 16:30:07 +03:00
Slava	0ec34d10fc	Merge pull request 'docs(ORCH-061): staging gate SUCCESS — C9a/C9b infra-waived' (#63) from docs/ORCH-061-staging-log into main	2026-06-07 16:29:55 +03:00
claude-bot	bf6a0c095a	docs(ORCH-061): staging gate SUCCESS — REAL green, C9a/C9b infra-waived Validated ORCH-061 infra-tolerance against live staging (8501): all REAL checks pass, only sandbox-infra C9a/C9b fail and are waived → exit 0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 13:29:33 +00:00
claude-bot	39769bdf23	tester(ET): auto-commit from tester run_id=300	2026-06-07 13:21:17 +00:00
claude-bot	de47737f4f	reviewer(ET): auto-commit from reviewer run_id=299	2026-06-07 13:18:47 +00:00
stream	e3f7c1c272	ci: re-trigger after gitea restart (ORCH-061)	2026-06-07 13:14:14 +00:00
stream	32a7aa8c6b	ci: trigger re-run after host disk cleanup (ORCH-061)	2026-06-07 13:08:38 +00:00
stream	fe8586ed78	ci: re-run after host disk cleanup (ORCH-061)	2026-06-07 13:04:38 +00:00
claude-bot	9070489968	fix(staging): tolerate sandbox-infra-only FAILs (C9a/C9b) in deploy-staging verdict The self-hosting orchestrator looped on deploy-staging -> development because scripts/staging_check.py exited 1 on ANY failed check, so two infra-only checks (C9a sandbox branch / C9b analyst-job — caused by SANDBOX bot accounts not being members of the sandbox Plane project, NOT a pipeline regress) forced staging_status: FAILED -> rollback -> loop, burning developer retries and tokens. Direction (б) per ADR-001: classify staging checks as REAL (all pipeline checks, fail-closed) vs SANDBOX_INFRA (narrow allowlist {C9a, C9b}, waivable). New leaf module src/staging_verdict.py (stdlib-only, never-raise): classify_check + compute_staging_verdict fold per-check results into a tolerant-but-fail-closed verdict — any REAL failure -> FAILED/exit1 (safety net holds under any flag); only C9a/C9b failed & tolerant -> SUCCESS/exit0 with waived list; only infra & strict -> FAILED/exit1; any internal error -> FAILED/exit1 (never a false green). staging_check.py now auto-classifies each check (public 3-tuple _items shape kept as an ORCH-048 b6 regression guard), exposes categorized_items(), prints INFRA-WAIVED/VERDICT lines, and exits via the verdict; new --strict flag forces legacy strictness per-run. Kill-switch ORCH_STAGING_INFRA_TOLERANCE_ENABLED (default true) restores legacy strict mode globally. launcher gains action_stage_no_changes_note so "no changes to commit" on action stages is logged as expected, not treated as under-delivery. Contracts unchanged: STAGE_TRANSITIONS, QG_CHECKS registry, staging_status:/ deploy_status: frontmatter, hook exit-code (0/1/2), check_staging_status; no DB migration. Docs: README, STAGING_CHECK.md, deployer.md, .env.example, CHANGELOG.
Refs: ORCH-061 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 12:39:00 +00:00
claude-bot	1d1208c136	architect(ET): auto-commit from architect run_id=297	2026-06-07 12:22:46 +00:00
claude-bot	3ab2690a68	analyst(ET): auto-commit from analyst run_id=296	2026-06-07 12:10:46 +00:00
Slava	3806522041	docs: init ORCH-061 business request	2026-06-07 15:05:55 +03:00
Slava	d4c6cc0f61	Merge pull request 'fix(reconciler): skip escalated / Blocked / Needs-Input tasks in F-1 (ORCH-060)' (#60) from feature/ORCH-060-reconciler-escalated-max-retri into main	2026-06-07 15:01:11 +03:00
claude-bot	210aef6954	deployer(ET): auto-commit from deployer run_id=293	2026-06-07 11:59:00 +00:00
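
The verdict rules listed in commit 9070489968 (ORCH-061) can be read as a small decision function. The sketch below is only an illustration of those rules, not the contents of src/staging_verdict.py: the names classify_check and compute_staging_verdict come from the commit message, but their signatures, the Verdict type, and the (check_id, passed) input shape are assumptions made for this example.

```python
from typing import NamedTuple

# Allowlist named in the commit message; everything else is treated as REAL.
SANDBOX_INFRA = frozenset({"C9a", "C9b"})


class Verdict(NamedTuple):
    # Hypothetical result shape, chosen for this sketch only.
    status: str      # "SUCCESS" or "FAILED"
    exit_code: int   # 0 or 1
    waived: tuple    # ids of failed checks that were waived


def classify_check(check_id):
    """Return "SANDBOX_INFRA" for allowlisted checks, "REAL" otherwise."""
    return "SANDBOX_INFRA" if check_id in SANDBOX_INFRA else "REAL"


def compute_staging_verdict(results, tolerant=True):
    """Fold (check_id, passed) pairs into a tolerant-but-fail-closed verdict.

    Assumed input shape: an iterable of 2-tuples. Any internal error
    yields FAILED, so a malformed input can never produce a false green.
    """
    try:
        failed = [cid for cid, passed in results if not passed]
        real = [c for c in failed if classify_check(c) == "REAL"]
        infra = [c for c in failed if classify_check(c) == "SANDBOX_INFRA"]
        if real:
            # Any REAL failure fails the gate regardless of the tolerance flag.
            return Verdict("FAILED", 1, ())
        if infra and not tolerant:
            # Strict mode: infra-only failures still fail.
            return Verdict("FAILED", 1, ())
        # Either everything passed or only allowlisted checks failed.
        return Verdict("SUCCESS", 0, tuple(infra))
    except Exception:
        return Verdict("FAILED", 1, ())
```

With this shape, a run where only C9a and C9b fail returns SUCCESS with both ids in the waived list, while the same run with tolerant=False returns FAILED.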


326 Commits
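
The claim-before-act pattern described in commits 4bebb921ff and 720c31393a (ORCH-065) can be sketched with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not the code in src/job_reaper.py: the two-column jobs table and the helper names claim_terminal and reap are hypothetical, and only the guard clause WHERE status='running' is taken from the commit message.

```python
import sqlite3


def claim_terminal(conn, job_id, new_status):
    """Atomically flip a running job to a terminal status.

    The UPDATE only matches while the row is still 'running', so at most
    one caller can win; rowcount tells the caller whether it was the winner.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = ? WHERE id = ? AND status = 'running'",
        (new_status, job_id),
    )
    conn.commit()
    return cur.rowcount == 1


def reap(conn, job_id, new_status, advance):
    """Claim first, act second: a caller that loses the claim has no side effects."""
    if not claim_terminal(conn, job_id, new_status):
        return False
    advance(job_id)  # hypothetical callback standing in for the stage advance
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'running')")
    advanced = []
    print(reap(conn, 1, "done", advanced.append))  # first caller wins the claim
    print(reap(conn, 1, "done", advanced.append))  # second caller loses, no advance
    print(advanced)
```

Running the advance only after a successful claim is what prevents the duplicate advance and duplicate enqueue that the fix commit describes.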