Second realised slice of the determinization-roadmap (ORCH-118 A5,
needs-hybrid-fallback): on the `testing` stage for the self-hosting
`orchestrator` repo the LLM `tester` agent is replaced by a deterministic
test-runner (src/test_runner.py), intercepted in launch_job BEFORE _spawn
(deploy-finalizer / post-deploy-monitor / staging-runner precedent).
It runs the regression `python -m pytest <target>` in the task worktree via
proc_group (tree-kill) + an optional read-only smoke (/health, /status, /queue
+ serial_gate), maps the exit-code -> result: PASS|FAIL via the existing
self_deploy.map_exit_code_to_status contract, writes 13-test-report.md and
initiates the EXISTING check_tests_passed gate exactly as a finished LLM-tester.
Invariant (NFR-1): only the *producer* changes — the artifact contract
(13-test-report.md / result:), the gate check_tests_passed / _parse_tests_verdict,
STAGE_TRANSITIONS and the DB schema are byte-for-byte UNCHANGED. Additive, under
a kill-switch (test_runner_enabled), never-raise, fail-closed, self-hosting scope,
two-level outcome (tool-error DEFER, anti ORCH-110), hybrid (LLM strictly
off-control-path). 52c-`status:` is aligned with the verdict (D6.1) so the
three-field _parse_tests_verdict never false-negatives a PASS.
Docs (ORCH-118 NFR-6, atomic with code): llm-call-sites.md (A5 implemented),
llm-determinization-roadmap.md (rank 2 implemented), llm-usage-policy.md,
README/internals/overview, tester.md, CLAUDE.md, CHANGELOG.md. Coverage:
tests/test_orch116_test_runner.py (TC-01..TC-14); LLM anti-drift tests green.
Full suite: 2137 passed.
Refs: ORCH-116
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The ORCH-115 deterministic staging-runner ran `docker exec` FROM INSIDE the prod
`orchestrator` container, which ships only `openssh-client git curl` — no `docker`
CLI (Dockerfile:11). `Popen(["docker", ...])` hit FileNotFoundError -> a PERMANENT
environment defect that was mis-routed as a code-fail rollback
`deploy-staging -> development` (burning developer-retries). Incident ORCH-116:
every self-hosting task reaching deploy-staging was doomed to a false rollback.
Fix (adr-0049, additive, flag-gated, never-raise, self-hosting scope; the gate /
artifact contract / STAGE_TRANSITIONS / DB schema are byte-for-byte unchanged):
- D1: build_staging_command() wraps the SAME `docker exec ... staging_check.py
... --mode stub` in `ssh <user@host> '<...>'` so it runs HOST-SIDE over the
existing trusted ssh channel (mirror self_deploy / image_freshness). New flag
staging_runner_exec_host_side (default True). No docker CLI/SDK added to the
image, docker.sock not used in-container (D2 security).
- D3: three-way classify_staging_outcome (suite-ran / permanent-env /
transient-infra), disambiguating the exit=1 collision by scanning stderr.
- D4: invariant "infra != code-fail" — permanent-env / exhausted transient-infra
end in an infra-HOLD (no rollback, no developer-retry), NOT a false FAILED
rollback (supersedes ORCH-115 D5). A really-executed failing suite still rolls
back (anti-over-tolerance). R-2 verified: a held deploy-staging task is not
rolled back by the reconciler.
- D5: prod-like preflight() of the host-side channel at startup (main.lifespan,
best-effort, never blocks).
- D8: snapshot adds permanent_env / exec_host_side / preflight.
Docs (golden source, same PR): INFRA.md execution-boundary section,
architecture/README.md, CLAUDE.md, CHANGELOG.md, .env.example.
Tests: tests/test_orch123_staging_runner_exec.py (TC-01 mandatory regression
red->green; TC-02..TC-14 + R-2). ORCH-115 anti-drift green (3 tests updated for
the D1/D4/D8 supersession). Full suite: 2131 passed.
Refs: ORCH-123
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the LLM `deployer` agent on the `deploy-staging` stage (self-hosting
orchestrator) with a deterministic staging-runner intercepted in launch_job
BEFORE _spawn (the deploy-finalizer / post-deploy-monitor reserved-agent
precedent). The runner executes the SAME staging suite, maps the exit-code to
`staging_status:` via the existing self_deploy.map_exit_code_to_status contract,
writes 15-staging-log.md, and initiates the UNCHANGED check_staging_status gate
exactly as a finished LLM-deployer would.
Invariant (NFR-1): this replaces only the *producer* of the artifact — the
artifact contract, the gate / _parse_staging_status / check_staging_status name,
STAGE_TRANSITIONS, the machine-verdict key `staging_status:` and the DB schema are
byte-for-byte unchanged. Additive, under a kill-switch + repo-scope CSV,
never-raise, fail-safe back to the LLM path.
Two-level outcome (D5, anti ORCH-110): suite executed -> verdict -> advance
(FAILED -> the existing deploy-staging -> development rollback + developer-retry,
same as a FAILED LLM verdict); tool-error (suite did not execute) -> bounded DEFER
-> fail-closed FAILED + alert on exhaustion (infra != code fault; never a silent
advance / false green).
First implemented slice of the LLM determinization roadmap (ORCH-118 A6,
replace-deterministic-now).
- New leaf src/staging_runner.py (never-raise; proc_group tree-kill + timeout)
- launch_job intercept + _run_staging_runner_job (mirror _run_deploy_finalizer_job)
- config: ORCH_STAGING_RUNNER_* keys (enabled/repos/timeout/infra-retry budget)
- GET /queue staging_runner observability block
- docs: llm-call-sites/roadmap/usage-policy (A6 implemented; machine blocks +
single-transport invariant intact), deployer.md (LLM branch -> fallback),
CLAUDE.md, CHANGELOG.md, overview (tech-pipeline/tech-agents/tech-quality-security),
.env.example
- tests/test_orch115_staging_runner.py (TC-01..TC-13); LLM anti-drift green (TC-14)
Refs: ORCH-115
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ORCH-118 (inventory-first, docs+tests only): publish an evidence-based map of
every place the orchestrator's control flow consumes (or can consume) an LLM
judgment, mark the control-path axis (C control-path vs P artifact-producer),
define "avoidable LLM control path" as a checkable two-bit predicate, classify
each call-site, and order the deterministic-replacement roadmap. Pin the map to
code with offline structural anti-drift tests.
- docs/architecture/llm-call-sites.md — map + machine-readable inventory block
+ control-path axis + classification + keep-LLM justifications + deterministic
non-agent paths (FR-1/FR-2/FR-3/FR-8).
- docs/architecture/llm-determinization-roadmap.md — ordered candidates BY ROLE,
savings sourced from agent_runs, recommended first slice = deployer staging
(FR-4). No fabricated follow-up Plane-IDs (R3/NFR-6).
- docs/architecture/llm-usage-policy.md — normative principle, keep/replace
criteria via the axis, definition of "avoidable LLM control path" (FR-5/FR-8).
- tests/test_llm_call_site_inventory.py — TC-01/02/03/04/05/06/09/12/13/14.
- tests/test_llm_determinization_docs.py — TC-07/08/11.
- CHANGELOG.md + docs/overview/tech-quality-security.md — golden-source sync (AC-8).
Avoidable LLM control paths = {tester, deployer}; control-path-keep = {reviewer};
not-control-path (P) = {analyst, architect, developer}. Single LLM transport =
launcher._spawn (S0); no alternative transport (TC-12). Runtime untouched:
STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema are
byte-for-byte; no replacement runners implemented (FR-7). Full suite: 2081 passed.
Refs: ORCH-118
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the root class of incident ORCH-114: a pytest/worktree process performed a
REAL write (PATCH issues state=<Done> + comment) against the PRODUCTION Plane
project, because test/staging processes inherit the live Plane token
(PLANE_HEADERS/PROJECT_ID are captured at import — a post-hoc env/token swap is a
no-op) and nothing forced them to write only to the sandbox. Symmetric to the
existing _no_telegram autouse floor.
- New pure never-raise leaf src/plane_write_guard.py (decide/audit_block/
audit_allow), wired into the 3 plane_sync write primitives (update_issue_state /
add_comment / _set_issue_state_direct) via _guard_allows_write, AT CALL TIME,
before any network step. Active ONLY in a test process (pytest in sys.modules /
PYTEST_CURRENT_TEST); live + staging runtimes (uvicorn) are a strict no-op.
- In a test process: default-deny. A write is allowed iff opt-in
(plane_test_write_enabled) AND target project in the sandbox allowlist
(plane_test_sandbox_projects, default = the one SANDBOX id). Prod is blocked even
with opt-in (allowlist sandbox-only); unresolved project -> block (fail-closed).
- Independent second layer: tests/conftest.py::_plane_sandbox_only autouse floor.
Intentionally NO prod-block kill-switch (anti back-door, NFR-6).
- Audit: block -> loud ERROR; sandbox-allow -> INFO.
- Bypass fixtures for the 3 (+1) pre-existing tests that assert on the mocked
write primitive's httpx call (header/URL/state logic), the guard is no Quality
Gate: STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict / DB schema
untouched.
- Tests: tests/test_orch117_plane_write_isolation.py (TC-01 mandatory ORCH-114
regression + TC-02..TC-14). Docs: CLAUDE.md, architecture/README.md,
operations/INFRA.md, .env.example, CHANGELOG.md.
Refs: ORCH-117
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolves the REQUEST_CHANGES findings on ORCH-114 (durable transition-ownership
lease + expected-stage CAS):
P1 — documentation = golden source:
- .env.example: add ORCH_TRANSITION_LEASE_ENABLED / ORCH_TRANSITION_LEASE_REPOS
(canon of 100% start keys, ORCH-101), next to the other gate kill-switches.
- CLAUDE.md: add the ORCH-114 passport section (mechanism, invariant, flags,
ADR links) so a future agent editing advance_stage/reaper/webhooks finds the
ownership invariant in the first mandatory-read doc (ORCH-078 traceability index).
P2 — should-fix:
- docs/overview/ (system showcase, ORCH-011): add transition_lease to
tech-data-model.md (helper tables), tech-observability.md (/queue blocks) and
tech-architecture.md (components).
- ADR-001 D4 alignment: the four side-effectful-edge rollback handlers
(_handle_merge_gate_rollback / _handle_security_gate / _handle_coverage_gate /
_handle_image_freshness) now write `development` through the expected-stage CAS
via a shared _rollback_stage_cas helper (defence against the rollback↔done
contradiction, BR-6) instead of a bare unconditional update_task_stage. Under the
held lease the sole owner always wins; a lost race aborts WITHOUT side effects.
Kill-switch off / out-of-scope repo -> degenerates to the prior write -> 1:1.
- Test isolation: make tests/test_webhooks.py order-independent by pinning the
proj-1 registry per-test (mirrors test_webhook_dedup.proj_registry); it had only
passed by relying on import order. Drop the needless module-level ORCH_DB_PATH
setdefault in test_orch114 (fresh_db already isolates db_path).
New regression tests (TC-11): in-region rollback writes route through CAS;
rollback CAS wins when at expected stage; rollback CAS-lost does NOT clobber `done`;
kill-switch-off rollback degenerates to the unconditional write.
ruff clean (src/stage_engine.py, src/transition_lease.py); full suite 2052 passed.
Refs: ORCH-114
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the root class of the ORCH-110/111/112/113 incident chain: side-effectful
stage transitions had no single ownership. `advance_stage` is re-enterable and wrote
the stage with a bare `UPDATE ... WHERE id=?` (no compare-and-swap), while >=5 actors
(monitor / Plane-webhook / reconciler F-1 / job-reaper / deploy-finalizer) enter the
same transition independently. A concurrent or post-restart re-entry therefore
re-applied irreversible effects (merge_pr / coverage-ratchet / image-rebuild /
prod-deploy initiation) and produced a contradictory rollback<->done (incident
ORCH-111, job 1914 / PR #130).
Two complementary layers, both additive, under one kill-switch, never-raise:
1. Durable transition-lease (new table `transition_lease`) — owner-exclusion on
ENTRY to the side-effectful region: a second actor that sees a LIVE owner does
not start the heavy sub-gates at all (prevention, not post-hoc repair).
2. Expected-stage CAS (`db.update_task_stage_cas`) — atomicity on the stage WRITE:
a lost race aborts with NO side effect. Also closes the 6 paths that write the
stage in bypass of advance_stage (gitea x5 + plane rollback).
Owner liveness = owner_pid + owner_boot_id (NOT a heartbeat — a blocking 900s merge
re-test cannot beat one; ADR-001 D3), making restart recovery free (a fresh boot_id
renders every prior lease stale -> reclaimed by recover_on_startup). The lease has no
own TTL: its hard age ceiling is the reaper Tier-3 backstop reaper_max_running_s, so
the cross-cutting budget invariant ORCH-065/109/110/113 is untouched.
Generalises ORCH-113 finalizer-liveness (process-local, Tier-2, deploy-staging) to a
durable cross-path lease: the reaper consults it on all relevant paths (defer live,
reclaim dead; Tier-3 ignores the marker -> bounded; a reap force-releases the lease);
reconciler F-1 and the Plane webhook defer on an active lease; main.lifespan calls
recover_on_startup() after requeue_running_jobs. finalizer_liveness.py is unchanged
(it remains the kill-switch-off fallback).
Scope self-hosting (transition_lease_repos="" -> orchestrator only; enduro untouched).
Kill-switch ORCH_TRANSITION_LEASE_ENABLED=false -> CAS degenerates to the prior
unconditional update_task_stage, lease inert, reaper -> ORCH-113 fallback (byte-for-
byte pre-ORCH-114). STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys /
existing table schemas — byte-for-byte (one additive table, no epoch column on tasks).
Observability: read-only `transition_lease` block in GET /queue + a Telegram alert on
forced/stale reclaim + optional POST /transition-lease/release?work_item=<id>.
Coverage: tests/test_orch114_transition_ownership.py (TC-01 mandatory regression of
the ORCH-111 class — red before fix, green after; TC-02..TC-14). Full suite green
(2048 passed); the 4 webhook tests that spied on the removed gitea.update_task_stage
were updated to spy on the new commit_stage_cas write path.
ADR: docs/work-items/ORCH-114/06-adr/ADR-001-transition-ownership-lease-and-stage-cas.md
Cross-cutting: docs/architecture/adr/adr-0045-transition-ownership-lease-and-stage-cas.md
Refs: ORCH-114
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>