orchestrator

Author	SHA1	Message	Date
stream	32a7aa8c6b	ci: trigger re-run after host disk cleanup (ORCH-061)	2026-06-07 13:08:38 +00:00
stream	fe8586ed78	ci: re-run after host disk cleanup (ORCH-061)	2026-06-07 13:04:38 +00:00
claude-bot	9070489968	fix(staging): tolerate sandbox-infra-only FAILs (C9a/C9b) in deploy-staging verdict
The self-hosting orchestrator looped on deploy-staging -> development because scripts/staging_check.py exited 1 on ANY failed check, so two infra-only checks (C9a sandbox branch / C9b analyst-job, caused by SANDBOX bot accounts not being members of the sandbox Plane project, NOT a pipeline regression) forced staging_status: FAILED -> rollback -> loop, burning developer retries and tokens.
Direction (b) per ADR-001: classify staging checks as REAL (all pipeline checks, fail-closed) vs SANDBOX_INFRA (narrow allowlist {C9a, C9b}, waivable). New leaf module src/staging_verdict.py (stdlib-only, never-raise): classify_check + compute_staging_verdict fold per-check results into a tolerant-but-fail-closed verdict:
- any REAL failure -> FAILED/exit1 (safety net holds under any flag);
- only C9a/C9b failed & tolerant -> SUCCESS/exit0 with waived list;
- only infra & strict -> FAILED/exit1;
- any internal error -> FAILED/exit1 (never a false green).
staging_check.py now auto-classifies each check (public 3-tuple _items shape kept as an ORCH-048 b6 regression guard), exposes categorized_items(), prints INFRA-WAIVED/VERDICT lines, and exits via the verdict; new --strict flag forces legacy strictness per-run. Kill-switch ORCH_STAGING_INFRA_TOLERANCE_ENABLED (default true) restores legacy strict mode globally when disabled. launcher gains action_stage_no_changes_note so "no changes to commit" on action stages is logged as expected, not treated as under-delivery.
Contracts unchanged: STAGE_TRANSITIONS, QG_CHECKS registry, staging_status: / deploy_status: frontmatter, hook exit-code (0/1/2), check_staging_status; no DB migration.
Docs: README, STAGING_CHECK.md, deployer.md, .env.example, CHANGELOG.
Refs: ORCH-061 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 12:39:00 +00:00
claude-bot	1d1208c136	architect(ET): auto-commit from architect run_id=297	2026-06-07 12:22:46 +00:00
claude-bot	3ab2690a68	analyst(ET): auto-commit from analyst run_id=296	2026-06-07 12:10:46 +00:00
Slava	3806522041	docs: init ORCH-061 business request	2026-06-07 15:05:55 +03:00
Slava	d4c6cc0f61	Merge pull request 'fix(reconciler): skip escalated / Blocked / Needs-Input tasks in F-1 (ORCH-060)' (#60) from feature/ORCH-060-reconciler-escalated-max-retri into main	2026-06-07 15:01:11 +03:00
claude-bot	210aef6954	deployer(ET): auto-commit from deployer run_id=293	2026-06-07 11:59:00 +00:00
Slava	1820b0244e	Merge pull request 'docs(ORCH-060): staging gate FAILED (8/10) - C9a/C9b E2E' (#61) from docs/ORCH-060-staging-log into main	2026-06-07 14:58:44 +03:00
claude-bot	2f898ede7b	docs(ORCH-060): staging gate FAILED (8/10) - C9a/C9b E2E
Canonical staging_check run inside orchestrator-staging container (ORCH-048). Exit code 1: branch never appeared in sandbox (C9a) and analyst job never enqueued (C9b). staging_status: FAILED → rollback to development per ORCH-35.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 11:58:29 +00:00
claude-bot	829b914ff7	tester(ET): auto-commit from tester run_id=292	2026-06-07 11:54:59 +00:00
claude-bot	55e5e968ae	reviewer(ET): auto-commit from reviewer run_id=291	2026-06-07 11:53:34 +00:00
claude-bot	4db8276f98	fix(reconciler): skip escalated / Blocked / Needs-Input tasks in F-1
Reconciler F-1 could not tell "stuck by a lost webhook" from "escalated: max developer retries reached, waiting for a human". With CI green and a reviewer that kept sending REQUEST_CHANGES up to the cap, every tick re-unblocked development -> review -> rollback -> re-unblock (incident ET-013, infinite bounce: wasted agent runs, Telegram spam, parasitic load on the shared self-hosting instance).
Add two pre-gate guards in Reconciler._reconcile_gate_task (after the existing analysis/no-gate/active-job/grace guards, before the gate pre-evaluation), each an early silent return (no advance, no unblocked_total increment, no notifications):
- Guard 1 (escalated, deterministic, no network, checked first): developer_retry_count(task_id) >= MAX_DEVELOPER_RETRIES. Promote stage_engine._developer_retry_count to public developer_retry_count (single source of truth; private alias kept). Limit from the constant, not a literal 3.
- Guard 2 (explicit human Plane gate, Variant A, no DB migration): new never-raise plane_sync.fetch_issue_state + Reconciler._is_blocked_or_needs_input; any error/None/unresolved project -> conservative skip. New sub-flag ORCH_RECONCILE_SKIP_BLOCKED_ENABLED mutes only the networked Guard 2.
F-2 unchanged: Blocked/Needs Input are outside {in_progress, approved, rejected} so they are never replayed (regression test added). DB schema, STAGE_TRANSITIONS, QG_CHECKS, never-raise, analysis carve-out and kill-switches untouched.
Refs: ORCH-060
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 11:50:02 +00:00
claude-bot	efe437a4aa	architect(ET): auto-commit from architect run_id=289	2026-06-07 11:41:02 +00:00
claude-bot	365c67f45d	analyst(ET): auto-commit from analyst run_id=288	2026-06-07 11:28:57 +00:00
Slava	d6e0df3550	docs: init ORCH-060 business request	2026-06-07 14:24:00 +03:00
Slava	4d4f542b71	Merge pull request '#59 staging gate FAILED - corrected root cause' into main	2026-06-07 14:05:59 +03:00
claude-bot	9e810c89f0	docs(ORCH-058): staging gate FAILED (8/10) - CORRECTED root cause (harness bug, not handler)
Staging check exit code 1 (C9a/C9b). Live inspection inside orchestrator-staging proves the production webhook handler is correct: get_project_states(SANDBOX).in_progress = 84a76f65..., but scripts/staging_check.py hardcodes the enduro fallback b873d9eb... => handler correctly classifies the webhook as "no pipeline action". Fix belongs in scripts/staging_check.py (resolve SANDBOX in_progress dynamically), NOT in handle_status_start or any ORCH-058 image-freshness code. Image under test = ORCH-058 merge commit `094b5e2f`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 11:05:37 +00:00
claude-bot	60e5596e94	docs(ORCH-058): staging gate re-run — staging_status FAILED (8/10, C9a/C9b) E2E pipeline not triggered on staging webhook ("no pipeline action" on state b873d9eb...); reproduces prior FAILED. Rolls task back to development. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 10:42:21 +00:00
Slava	bf60f7a48a	Merge pull request 'docs(ORCH-058): staging gate re-run on fresh image - staging_status FAILED' (#58) from deployer/ORCH-058-staging-verdict into main	2026-06-07 13:22:14 +03:00
claude-bot	637c4e9e2e	docs(ORCH-058): staging gate re-run on fresh image - staging_status FAILED (8/10)
Strategy-A freshness re-validation rebuilt 8501 from merged commit `094b5e2` and re-ran staging_check; E2E C9a/C9b fail (Plane "In Progress"/started webhook -> "no pipeline action", no task/branch/analyst-job). Machine verdict FAILED -> rollback to development. Prod (8500) untouched.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 10:21:37 +00:00
Slava	094b5e2f96	Merge pull request 'feat(ORCH-058): staging-image provenance before BUILD-ONCE prod retag (INV-FRESH)' (#57) from feature/ORCH-058-self-deploy-retag-staging into main	2026-06-07 13:04:07 +03:00
claude-bot	90b6c8d5a8	docs(ORCH-058): staging gate re-run — staging_status SUCCESS (10/10 PASS) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 09:52:41 +00:00
claude-bot	2221d402b1	docs(ORCH-058): staging gate log — staging_status SUCCESS (10/10 PASS) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 09:33:05 +00:00
claude-bot	6ddff5583d	fix(ORCH-058): parametrize staging_check in --build-staging + explicit staging target
Round-3 review follow-up on `c53d625` (P1/P2):
- P1: --build-staging now runs staging_check via parametrized STAGING_CONTAINER / STAGING_CHECK_PATH / STAGING_CHECK_MODE (default orchestrator-staging / bind-mount path / stub) instead of hardcoding $TARGET_SERVICE + the script path. docker exec runs INSIDE the staging container (ORCH-048 canonical: B6 registry isolation), after health, before exit 0. Fail-closed: any non-zero -> exit 1. STAGING only (8501).
- P2a: rebuild_staging_image now passes the STAGING target EXPLICITLY (TARGET_SERVICE/TARGET_PORT/COMPOSE_PROFILE/STAGING_CONTAINER) so the self-rebuild can never drift onto prod 8500 if hook defaults change (AC-9).
- P2b: TC-09 caller<->hook contract tests assert the ssh command carries GIT_SHA + BUILD_CONTEXT + the staging target and never the prod 8500 one; no-ssh-host fails closed.
- P3: consolidated the three duplicate README footers into one.
- Docs (golden source): DEPLOY_HOOK.md step 4 + env rows, README footer, CHANGELOG, Dockerfile ARG GIT_SHA="" comment, .env.example freshness block.
Validates exactly the artefact later BUILD-ONCE retagged to prod (AC-4, ADR-001 step 3). 632 tests pass, ruff clean, bash -n OK.
Refs: ORCH-058
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 09:24:38 +00:00
Dev Agent	c53d625744	ORCH-058: --build-staging runs staging_check.py --mode stub vs fresh 8501 (AC-4)
Per ADR-001 step 3 / AC-4: after the freshly rebuilt staging container is healthy, run staging_check.py --mode stub against the fresh 8501 stand BEFORE reporting success, so the EXACT artefact BUILD-ONCE retagged to prod is the one validated on staging. Fail-closed: staging_check rc!=0 -> exit 1 (not promoted).
- Invoked inside the container (docker exec $TARGET_SERVICE) per the canonical signature in scripts/staging_check.py header, --base-url http://localhost:$TARGET_PORT.
- Targets ONLY 8501 (staging), never 8500 (prod) - AC-9.
- --mode stub: fast, deterministic, no LLM spend (ADR).
- Static regression test test_tc07_build_staging_runs_staging_check_stub_after_health: asserts staging_check.py + --mode stub present, runs after health, before exit 0, fail-closed, and never hard-codes prod 8500.	2026-06-07 12:11:07 +03:00
claude-bot	2ee06ae676	fix(deploy-hook): --build-staging must build from validated worktree, recreate+health 8501
Closes reviewer P0/P1 (ORCH-058 attempt 3): the committed --build-staging hook recomputed GIT_SHA=$(git rev-parse HEAD) in $REPO (prod clone on `main`) and built `docker build ... "$REPO"`, ignoring the caller-supplied BUILD_CONTEXT/GIT_SHA. On the deploy-staging -> deploy edge the PR is not yet merged, so `main` HEAD != the validated SHA -> the staging image got the wrong revision label and Strategy-B's guard fail-closed on EVERY valid self-deploy (AC-6 deadlock). It also only did `docker build` + exit 0, never recreating 8501 nor health-checking, so rebuild_staging_image's rc=0 ("rebuilt and healthy") was a lie (AC-4 unmet).
- Hook --build-staging now honours caller BUILD_CONTEXT (validated worktree) and GIT_SHA, recreates orchestrator-staging on the fresh image and runs the 10x6s health-check; build/health failure -> exit 1 (FAILED contract preserved).
- image_freshness.rebuild_staging_image: document why COMPOSE_PROFILE/TARGET_SERVICE/TARGET_PORT are intentionally omitted (hook STAGING defaults -> 8501 only, P2).
- tests: assert the caller<->hook contract (builds from $BUILD_CONTEXT, no `git rev-parse HEAD` recompute, recreates + health-checks 8501) so the P0 regression can't pass green again (P1).
Refs: ORCH-058
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 08:37:51 +00:00
claude-bot	3b3d587300	docs(ORCH-058): add CHANGELOG entry, .env.example flags, fix README status
Close AC-11 documentation gap left by the prior developer run: the ORCH-058 feature (staging-image provenance before BUILD-ONCE retag) was implemented and green but never recorded in the golden-source docs.
- CHANGELOG.md: add the ORCH-058 [Unreleased]/Added entry (layers A+B, validated_revision anchor, check_staging_image_fresh, EXPECTED_REVISION hook guard, new ORCH_IMAGE_FRESHNESS_* flags, ADR/test refs).
- .env.example (canon): document ORCH_IMAGE_FRESHNESS_ENABLED / ORCH_IMAGE_FRESHNESS_REPOS, mirroring the ORCH-036/043/053 precedent.
- docs/architecture/README.md: footer note design -> implemented, aligning it with the already-updated section.
Refs: ORCH-058
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-07 08:27:57 +00:00
Dev Agent	f0c2986477	ORCH-058: implement fail-closed provenance guard in deploy-hook + GIT_SHA OCI label in Dockerfile
- deploy-hook: REVISION_LABEL/EXPECTED_REVISION (default unset -> backward-compat)
- deploy-hook: fail-closed guard inspects SOURCE_IMAGE revision label before docker tag, normalises <no value>, exit 1 on empty/mismatch
- deploy-hook: new --build-staging mode rebuilds staging image stamping GIT_SHA
- Dockerfile: ARG GIT_SHA + LABEL org.opencontainers.image.revision=$GIT_SHA
Closes TC07/TC08 (tests/test_deploy_hook_provenance.py).	2026-06-07 11:20:38 +03:00
claude-bot	83397570fe	developer(ET): auto-commit from developer run_id=264	2026-06-07 07:46:19 +00:00
claude-bot	dbc32fc106	architect(ET): auto-commit from architect run_id=263	2026-06-07 07:27:38 +00:00
claude-bot	282636fedb	analyst(ET): auto-commit from analyst run_id=262	2026-06-07 07:06:10 +00:00
Slava	e5f9c38e65	docs: init ORCH-058 business request	2026-06-07 10:01:11 +03:00
stream	e4c6401633	docs(history): LESSONS self-deploy bootstrap - cascade of 4 infra bugs (passwd/env/log-perms/stale-staging-image)	2026-06-07 09:52:39 +03:00
stream	115519ebb4	fix(compose): ORCH_DEPLOY_* env for self-deploy (prefix ORCH_, orchestrator hook, host-repo path) — ORCH-36 Phase B	2026-06-07 09:39:51 +03:00
stream	64e031a37f	fix(docker): passwd entry for uid 1000 (slin) — fixes ssh/whoami, unblocks ORCH-36 self-deploy Phase B	2026-06-07 09:27:04 +03:00
claude-bot	01ff71978f	docs(ORCH-036): staging gate SUCCESS — 10/10 checks pass (re-run inside orchestrator-staging) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-06 21:48:50 +00:00
stream	d5915a89b9	docs(history): LESSONS ORCH-036+053 - bootstrap deploy, merge conflict, reconciler in prod	2026-06-07 00:34:36 +03:00
Slava	1ff8d85bb9	Merge pull request 'feat: executable self-deploy - stage deploy triggers host hook (ORCH-036)' (#55) from feature/ORCH-036-orch-36-deploy-b into main	2026-06-07 00:23:40 +03:00
stream	36c1898fac	Merge remote-tracking branch 'origin/main' into feature/ORCH-036-orch-36-deploy-b
# Conflicts:
# .env.example
# CHANGELOG.md
# docs/architecture/README.md
# docs/operations/INFRA.md
# src/config.py	2026-06-07 00:22:19 +03:00
Slava	e2dc9d6df6	Merge pull request 'ORCH-053: sweeper for lost webhooks (reconciliation of stuck stages)' (#56) from feature/ORCH-053-sweeper-webhook-stuck-task into main	2026-06-07 00:20:53 +03:00
claude-bot	c0bcb544cf	tester(ET): auto-commit from tester run_id=201	2026-06-06 21:07:35 +00:00
claude-bot	2be39b398b	reviewer(ET): auto-commit from reviewer run_id=199	2026-06-06 21:07:35 +00:00
claude-bot	d79defeadd	fix(deploy): clear stale self-deploy markers on rollback; document env
Re-deploy after a FAILED prod deploy wedged the task on `deploy`: the sentinel markers (approve-requested/initiated/result) are keyed by the stable work_item_id, so after the БАГ-8 rollback (deploy -> development) and a developer fix, Phase B's idempotency-guard saw a STALE `initiated` and became a no-op: the detached hook never re-launched and the finalizer was never enqueued. Add self_deploy.clear_state (never-raise, idempotent) and call it on the check_deploy_status FAILED rollback and at the start of Phase A, so every fresh prod-deploy pass starts clean.
Also document the new ORCH_SELF_DEPLOY_* / ORCH_DEPLOY_* descriptors in the canonical .env.example (CLAUDE.md rule #8, spec §2.6), modelled on the ORCH-043 merge-gate block (placeholders only, secrets not committed).
Contracts untouched: STAGE_TRANSITIONS, QG_CHECKS, _parse_deploy_status, БАГ-8, merge-gate.
Refs: ORCH-036
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-06 21:07:35 +00:00
claude-bot	9f43e6a0ae	reviewer(ET): auto-commit from reviewer run_id=195	2026-06-06 21:07:35 +00:00
claude-bot	10f2a39a58	feat(deploy): build-once SOURCE_IMAGE retag in hook + deploy-stage docs Add the optional, backward-compatible SOURCE_IMAGE branch to orchestrator-deploy-hook.sh: when set, retag the staging-validated image onto TARGET_IMAGE (docker tag) before `up -d --no-build` instead of rebuilding — guarantees prod runs the exact artefact that passed staging (AC-7 / TC-14). Unset -> prior behaviour; exit-code contract (0/1/2) and health-loop untouched. Update golden-source docs (AC-13): rewrite deployer.md `deploy` stage from "paper SUCCESS" to the executable self-deploy (Phase A/B/C, no self-restart from inside the container) and add the ORCH-036 CHANGELOG entry. Refs: ORCH-036 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-06 21:07:35 +00:00
claude-bot	63187ff102	developer(ET): auto-commit from developer run_id=192	2026-06-06 21:07:35 +00:00
claude-bot	5c5525548d	architect(ET): auto-commit from architect run_id=190	2026-06-06 21:07:35 +00:00
claude-bot	0d0cd6e281	analyst(ET): auto-commit from analyst run_id=189	2026-06-06 21:07:35 +00:00
Slava	480b203a9d	docs: init ORCH-036 business request	2026-06-06 21:07:35 +00:00
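
The verdict rules listed in commit 9070489968 (ORCH-061) can be sketched roughly as below. This is a minimal illustration and not the contents of src/staging_verdict.py: the names classify_check and compute_staging_verdict and the {C9a, C9b} allowlist come from the commit message, while the Verdict dataclass, the (check_id, passed) input shape and the tolerant parameter are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Narrow allowlist of waivable checks, as named in the commit message.
SANDBOX_INFRA_CHECKS = frozenset({"C9a", "C9b"})


def classify_check(check_id: str) -> str:
    """Return "SANDBOX_INFRA" for allowlisted checks, "REAL" for all others."""
    return "SANDBOX_INFRA" if check_id in SANDBOX_INFRA_CHECKS else "REAL"


@dataclass
class Verdict:
    # Hypothetical result container; the real module's shape is not shown in the log.
    status: str                      # "SUCCESS" or "FAILED"
    exit_code: int                   # 0 or 1
    waived: List[str] = field(default_factory=list)


def compute_staging_verdict(
    results: Iterable[Tuple[str, bool]], tolerant: bool = True
) -> Verdict:
    """Fold (check_id, passed) pairs into a tolerant-but-fail-closed verdict."""
    try:
        failed = [check_id for check_id, passed in results if not passed]
        real = [c for c in failed if classify_check(c) == "REAL"]
        infra = [c for c in failed if classify_check(c) == "SANDBOX_INFRA"]
        if real:
            # Any REAL failure fails the gate regardless of the tolerance flag.
            return Verdict("FAILED", 1)
        if infra and not tolerant:
            # Strict mode: infra-only failures are not waived.
            return Verdict("FAILED", 1)
        # Either everything passed, or only allowlisted checks failed and are waived.
        return Verdict("SUCCESS", 0, waived=infra)
    except Exception:
        # Never raise and never report a false green on an internal error.
        return Verdict("FAILED", 1)
```

With tolerant=True, a run in which only C9a and C9b fail yields SUCCESS with both checks listed as waived; passing tolerant=False models the per-run --strict flag or the disabled kill-switch.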


284 Commits
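
The two pre-gate guards described in commit 4db8276f98 (ORCH-060) can be illustrated with a small sketch. It is an assumption-laden outline and not the actual Reconciler code: should_skip_task, the callable parameters and the string state names are hypothetical, and the value 3 for MAX_DEVELOPER_RETRIES is assumed (the commit only says the limit comes from a constant).

```python
from typing import Callable, Optional

MAX_DEVELOPER_RETRIES = 3  # assumed value for illustration
HUMAN_GATE_STATES = frozenset({"Blocked", "Needs Input"})


def should_skip_task(
    task_id: str,
    developer_retry_count: Callable[[str], int],
    fetch_issue_state: Callable[[str], Optional[str]],
    skip_blocked_enabled: bool = True,
) -> bool:
    """Return True when the reconciler should silently leave the task alone."""
    # Guard 1: escalated task. Deterministic, no network, checked first.
    if developer_retry_count(task_id) >= MAX_DEVELOPER_RETRIES:
        return True

    # The sub-flag mutes only the networked Guard 2.
    if not skip_blocked_enabled:
        return False

    # Guard 2: explicit human gate. Any error or unknown state means skip.
    try:
        state = fetch_issue_state(task_id)
    except Exception:
        return True
    if state is None:
        return True
    return state in HUMAN_GATE_STATES
```

Checking the retry count first keeps the common escalated case free of network calls, and treating every lookup failure as a skip errs on the side of not re-unblocking a task that a human may be holding.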