Recovery from the merge-gate rebase-conflict bounce. The feature branch was
rebased onto origin/main (which had merged ORCH-123). The single conflicting
hunk — docs/architecture/README.md — was resolved during the rebase: kept
ORCH-123's host-side staging-runner line AND the ORCH-116 test-runner bullet.
This follow-up commit reconciles the remainder:
- Renumber the global sweeping ADR adr-0049 -> adr-0050. ORCH-123 took adr-0049
(adr-0049-host-side-docker-execution-boundary.md) on main while ORCH-116 was
in flight, so ORCH-116 yields to the merged task and moves to the next free
number. Mechanical cross-reference reconciliation only (git mv + title + every
test-runner reference across README/internals/CLAUDE/CHANGELOG/config.py +
06-adr/ADR-001 + 12-review). Main's adr-0049 host-side references are left
byte-for-byte untouched. No design/verdict content was altered.
- Restore the ORCH-116 CHANGELOG entry that the CHANGELOG auto-merge silently
dropped (both ORCH-123 and ORCH-116 inserted at the same [Unreleased] anchor;
git kept only ORCH-123).
- Add the missing ORCH_TEST_RUNNER_* keys to .env.example (parity with the
ORCH_STAGING_RUNNER_* block; ORCH-101 canon of start keys).
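A minimal sketch of the "next free number" rule used for the renumbering above. The helper names and the regex are illustrative assumptions, not code from this repo; only the `adr-NNNN-<slug>.md` filename shape is taken from the adr-0049 filename quoted in this message.

```python
import re

# Assumed filename shape: adr-NNNN-<slug>.md (inferred from the adr-0049 file name).
ADR_PATTERN = re.compile(r"^adr-(\d{4})-.+\.md$")


def next_free_adr_number(filenames):
    """Return one more than the highest ADR number already taken."""
    taken = [
        int(match.group(1))
        for name in filenames
        if (match := ADR_PATTERN.match(name))
    ]
    return max(taken, default=0) + 1


def format_adr_id(number):
    """Render the number in the zero-padded adr-NNNN form."""
    return f"adr-{number:04d}"
```

With main already holding adr-0049, the in-flight ADR resolves to adr-0050, which matches the rename in this commit.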
Refs: ORCH-116
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Second realised slice of the determinization-roadmap (ORCH-118 A5,
needs-hybrid-fallback): on the `testing` stage for the self-hosting
`orchestrator` repo the LLM `tester` agent is replaced by a deterministic
test-runner (src/test_runner.py), intercepted in launch_job BEFORE _spawn
(deploy-finalizer / post-deploy-monitor / staging-runner precedent).
It runs the regression `python -m pytest <target>` in the task worktree via
proc_group (tree-kill) + an optional read-only smoke (/health, /status, /queue
+ serial_gate), maps the exit-code -> result: PASS|FAIL via the existing
self_deploy.map_exit_code_to_status contract, writes 13-test-report.md and
initiates the EXISTING check_tests_passed gate exactly as a finished LLM-tester.
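A hedged sketch of the run-then-map step described above. `map_exit_code` here is a hypothetical stand-in, not the real self_deploy.map_exit_code_to_status contract. It assumes pytest's documented exit codes (0 = all passed, 1 = some tests failed, anything else = the suite did not run to a verdict). It uses plain subprocess.run and does not reproduce the proc_group tree-kill.

```python
import subprocess
import sys


def map_exit_code(exit_code):
    """Hypothetical mapping: only 0 and 1 count as a real suite verdict."""
    if exit_code == 0:
        return "PASS"
    if exit_code == 1:
        return "FAIL"
    # pytest 2..5 (interrupted, internal error, usage error, nothing collected)
    # means no verdict was produced, so treat it as a tool error.
    return "TOOL_ERROR"


def run_suite(argv, cwd=None, timeout=600):
    """Run the command and never raise: spawn failures and timeouts are tool errors."""
    try:
        proc = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return "TOOL_ERROR"
    return map_exit_code(proc.returncode)
```

A caller would pass something like `[sys.executable, "-m", "pytest", target]` with `cwd` set to the task worktree. A `TOOL_ERROR` result corresponds to the DEFER level of the two-level outcome, not to a FAIL verdict.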
Invariant (NFR-1): only the *producer* changes — the artifact contract
(13-test-report.md / result:), the gate check_tests_passed / _parse_tests_verdict,
STAGE_TRANSITIONS and the DB schema are byte-for-byte UNCHANGED. Additive, under
a kill-switch (test_runner_enabled), never-raise, fail-closed, self-hosting scope,
two-level outcome (tool-error DEFER, anti ORCH-110), hybrid (LLM strictly
off-control-path). 52c-`status:` is aligned with the verdict (D6.1) so the
three-field _parse_tests_verdict never false-negatives a PASS.
Docs (ORCH-118 NFR-6, atomic with code): llm-call-sites.md (A5 implemented),
llm-determinization-roadmap.md (rank 2 implemented), llm-usage-policy.md,
README/internals/overview, tester.md, CLAUDE.md, CHANGELOG.md. Coverage:
tests/test_orch116_test_runner.py (TC-01..TC-14); LLM anti-drift tests green.
Full suite: 2137 passed.
Refs: ORCH-116
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The ORCH-115 deterministic staging-runner ran `docker exec` FROM INSIDE the prod
`orchestrator` container, which ships only `openssh-client git curl` — no `docker`
CLI (Dockerfile:11). `Popen(["docker", ...])` hit FileNotFoundError -> a PERMANENT
environment defect that was mis-routed as a code-fail rollback
`deploy-staging -> development` (burning developer-retries). Incident ORCH-116:
every self-hosting task reaching deploy-staging was doomed to a false rollback.
Fix (adr-0049, additive, flag-gated, never-raise, self-hosting scope; the gate /
artifact contract / STAGE_TRANSITIONS / DB schema are byte-for-byte unchanged):
- D1: build_staging_command() wraps the SAME `docker exec ... staging_check.py
... --mode stub` in `ssh <user@host> '<...>'` so it runs HOST-SIDE over the
existing trusted ssh channel (mirror self_deploy / image_freshness). New flag
staging_runner_exec_host_side (default True). No docker CLI/SDK added to the
image, docker.sock not used in-container (D2 security).
- D3: three-way classify_staging_outcome (suite-ran / permanent-env /
transient-infra), disambiguating the exit=1 collision by scanning stderr.
- D4: invariant "infra != code-fail" — permanent-env / exhausted transient-infra
end in an infra-HOLD (no rollback, no developer-retry), NOT a false FAILED
rollback (supersedes ORCH-115 D5). A really-executed failing suite still rolls
back (anti-over-tolerance). R-2 verified: a held deploy-staging task is not
rolled back by the reconciler.
- D5: prod-like preflight() of the host-side channel at startup (main.lifespan,
best-effort, never blocks).
- D8: snapshot adds permanent_env / exec_host_side / preflight.
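A rough sketch of D1 and D3 together, under stated assumptions. The function names, the stderr markers, and the container name in the usage note are hypothetical; the real build_staging_command / classify_staging_outcome may differ. The only external fact relied on is that ssh exits with 255 when the connection itself fails.

```python
import shlex

# Hypothetical stderr markers; the real patterns are not given in this message.
PERMANENT_ENV_MARKERS = ("command not found", "no such file or directory")
TRANSIENT_INFRA_MARKERS = ("connection refused", "connection timed out")


def wrap_host_side(inner_argv, ssh_target):
    """Quote the inner command for the remote shell and run it over ssh."""
    remote_command = shlex.join(inner_argv)
    return ["ssh", ssh_target, remote_command]


def classify_outcome(exit_code, stderr):
    """Three-way outcome: suite-ran / permanent-env / transient-infra."""
    if exit_code == 0:
        return "suite-ran"
    text = stderr.lower()
    # A missing binary or path will not heal on retry.
    if any(marker in text for marker in PERMANENT_ENV_MARKERS):
        return "permanent-env"
    # ssh reports its own failures as 255; network markers are retryable.
    if exit_code == 255 or any(m in text for m in TRANSIENT_INFRA_MARKERS):
        return "transient-infra"
    # A non-zero exit with no infra marker is read as a suite that really ran and failed.
    return "suite-ran"
```

For example, `wrap_host_side(["docker", "exec", "example-container", "python", "staging_check.py", "--mode", "stub"], "user@host")` yields an ssh argv whose last element is the single quoted remote command. Only the `suite-ran` branch should ever feed a rollback, which is the D4 invariant.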
Docs (golden source, same PR): INFRA.md execution-boundary section,
architecture/README.md, CLAUDE.md, CHANGELOG.md, .env.example.
Tests: tests/test_orch123_staging_runner_exec.py (TC-01 mandatory regression
red->green; TC-02..TC-14 + R-2). ORCH-115 anti-drift green (3 tests updated for
the D1/D4/D8 supersession). Full suite: 2131 passed.
Refs: ORCH-123
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the LLM `deployer` agent on the `deploy-staging` stage (self-hosting
orchestrator) with a deterministic staging-runner intercepted in launch_job
BEFORE _spawn (the deploy-finalizer / post-deploy-monitor reserved-agent
precedent). The runner executes the SAME staging suite, maps the exit-code to
`staging_status:` via the existing self_deploy.map_exit_code_to_status contract,
writes 15-staging-log.md, and initiates the UNCHANGED check_staging_status gate
exactly as a finished LLM-deployer would.
Invariant (NFR-1): this replaces only the *producer* of the artifact — the
artifact contract, the gate / _parse_staging_status / check_staging_status name,
STAGE_TRANSITIONS, the machine-verdict key `staging_status:` and the DB schema are
byte-for-byte unchanged. Additive, under a kill-switch + repo-scope CSV,
never-raise, fail-safe back to the LLM path.
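A small sketch of how a kill-switch plus repo-scope CSV could gate the intercept. The function names and the "empty CSV means no repo is in scope" choice are assumptions for illustration; the real ORCH_STAGING_RUNNER_* semantics are defined in config, not here.

```python
def parse_repo_csv(raw):
    """Split a comma-separated repo list, ignoring blanks and stray whitespace."""
    return {item.strip() for item in raw.split(",") if item.strip()}


def should_intercept(stage, repo, enabled, repos_csv):
    """Return True only when the deterministic runner should replace the LLM agent."""
    if not enabled:  # kill-switch off: always fall back to the LLM path
        return False
    if stage != "deploy-staging":
        return False
    return repo in parse_repo_csv(repos_csv)
```

Any False result leaves the existing LLM path untouched, which keeps the change additive.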
Two-level outcome (D5, anti ORCH-110): suite executed -> verdict -> advance
(FAILED -> the existing deploy-staging -> development rollback + developer-retry,
same as a FAILED LLM verdict); tool-error (suite did not execute) -> bounded DEFER
-> fail-closed FAILED + alert on exhaustion (infra != code fault; never a silent
advance / false green).
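The two-level outcome can be sketched as a pure decision function. The function name, the parameter names, and the action labels are illustrative assumptions; only the branching follows the description above.

```python
def decide(suite_executed, suite_passed, infra_attempts_used, infra_retry_budget):
    """Map the runner result to the next pipeline action.

    Level 1: the suite really executed, so its verdict is trusted.
    Level 2: the suite did not execute (tool error), so no verdict exists.
    """
    if suite_executed:
        # A real verdict: green advances, red takes the existing rollback.
        return "advance" if suite_passed else "rollback"
    if infra_attempts_used < infra_retry_budget:
        # Bounded DEFER: retry later without consuming a developer retry.
        return "defer"
    # Budget exhausted: fail closed and alert, never a silent advance.
    return "fail-closed-alert"
```

`suite_passed` is ignored when `suite_executed` is False, so a tool error can never be read as green.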
First implemented slice of the LLM determinization roadmap (ORCH-118 A6,
replace-deterministic-now).
- New leaf src/staging_runner.py (never-raise; proc_group tree-kill + timeout)
- launch_job intercept + _run_staging_runner_job (mirror _run_deploy_finalizer_job)
- config: ORCH_STAGING_RUNNER_* keys (enabled/repos/timeout/infra-retry budget)
- GET /queue staging_runner observability block
- docs: llm-call-sites/roadmap/usage-policy (A6 implemented; machine blocks +
single-transport invariant intact), deployer.md (LLM branch -> fallback),
CLAUDE.md, CHANGELOG.md, overview (tech-pipeline/tech-agents/tech-quality-security),
.env.example
- tests/test_orch115_staging_runner.py (TC-01..TC-13); LLM anti-drift green (TC-14)
Refs: ORCH-115
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ORCH-118 (inventory-first, docs+tests only): publish an evidence-based map of
every place the orchestrator's control flow consumes (or can consume) an LLM
judgment, mark the control-path axis (C control-path vs P artifact-producer),
define "avoidable LLM control path" as a checkable two-bit predicate, classify
each call-site, and order the deterministic-replacement roadmap. Pin the map to
code with offline structural anti-drift tests.
- docs/architecture/llm-call-sites.md — map + machine-readable inventory block
+ control-path axis + classification + keep-LLM justifications + deterministic
non-agent paths (FR-1/FR-2/FR-3/FR-8).
- docs/architecture/llm-determinization-roadmap.md — ordered candidates BY ROLE,
savings sourced from agent_runs, recommended first slice = deployer staging
(FR-4). No fabricated follow-up Plane-IDs (R3/NFR-6).
- docs/architecture/llm-usage-policy.md — normative principle, keep/replace
criteria via the axis, definition of "avoidable LLM control path" (FR-5/FR-8).
- tests/test_llm_call_site_inventory.py — TC-01/02/03/04/05/06/09/12/13/14.
- tests/test_llm_determinization_docs.py — TC-07/08/11.
- CHANGELOG.md + docs/overview/tech-quality-security.md — golden-source sync (AC-8).
Avoidable LLM control paths = {tester, deployer}; control-path-keep = {reviewer};
not-control-path (P) = {analyst, architect, developer}. Single LLM transport =
launcher._spawn (S0); no alternative transport (TC-12). Runtime untouched:
STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema are
byte-for-byte unchanged; no replacement runners implemented (FR-7). Full suite:
2081 passed.
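One way to read the two-bit predicate, as a sketch only. The first bit is the control-path axis (C vs P) named above. The second bit, "a deterministic replacement exists", is an assumption about what the other bit is. The field names are hypothetical and do not mirror the machine-readable inventory block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    role: str
    on_control_path: bool            # C (True) vs P (False)
    deterministic_replacement: bool  # assumed second bit


def classify(site):
    """Avoidable means both bits are set; the second bit is irrelevant off the control path."""
    if not site.on_control_path:
        return "not-control-path"
    return "avoidable" if site.deterministic_replacement else "control-path-keep"


INVENTORY = [
    CallSite("tester", True, True),
    CallSite("deployer", True, True),
    CallSite("reviewer", True, False),
    CallSite("analyst", False, False),
    CallSite("architect", False, False),
    CallSite("developer", False, False),
]


def roles_in(category):
    return {site.role for site in INVENTORY if classify(site) == category}
```

With these bits, `roles_in("avoidable")` reproduces the {tester, deployer} set stated above.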
Refs: ORCH-118
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the root class of incident ORCH-114: a pytest/worktree process performed a
REAL write (PATCH issues state=<Done> + comment) against the PRODUCTION Plane
project, because test/staging processes inherit the live Plane token
(PLANE_HEADERS/PROJECT_ID are captured at import — a post-hoc env/token swap is a
no-op) and nothing forced them to write only to the sandbox. Symmetric to the
existing _no_telegram autouse floor.
- New pure never-raise leaf src/plane_write_guard.py (decide/audit_block/
audit_allow), wired into the 3 plane_sync write primitives (update_issue_state /
add_comment / _set_issue_state_direct) via _guard_allows_write, AT CALL TIME,
before any network step. Active ONLY in a test process (pytest in sys.modules /
PYTEST_CURRENT_TEST); live + staging runtimes (uvicorn) are a strict no-op.
- In a test process: default-deny. A write is allowed iff opt-in
(plane_test_write_enabled) AND target project in the sandbox allowlist
(plane_test_sandbox_projects, default = the one SANDBOX id). Prod is blocked even
with opt-in (allowlist sandbox-only); unresolved project -> block (fail-closed).
- Independent second layer: tests/conftest.py::_plane_sandbox_only autouse floor.
Intentionally NO prod-block kill-switch (anti back-door, NFR-6).
- Audit: block -> loud ERROR; sandbox-allow -> INFO.
- Bypass fixtures for the 3 (+1) pre-existing tests that assert on the mocked
write primitive's httpx call (header/URL/state logic). The guard is not a
Quality Gate: STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict / DB
schema are untouched.
- Tests: tests/test_orch117_plane_write_isolation.py (TC-01 mandatory ORCH-114
regression + TC-02..TC-14). Docs: CLAUDE.md, architecture/README.md,
operations/INFRA.md, .env.example, CHANGELOG.md.
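The guard's decision rule can be sketched as below. This is a minimal illustration of the rules listed above, not the contents of src/plane_write_guard.py. The signatures and the project ids used in the tests are hypothetical.

```python
import os
import sys


def in_test_process(modules=None, environ=None):
    """Detect a pytest process: pytest imported or PYTEST_CURRENT_TEST set."""
    modules = sys.modules if modules is None else modules
    environ = os.environ if environ is None else environ
    return "pytest" in modules or "PYTEST_CURRENT_TEST" in environ


def decide(project_id, *, test_process, opt_in, sandbox_projects):
    """Return True if the write may proceed, False if it must be blocked."""
    if not test_process:
        return True   # live and staging runtimes: strict no-op
    if not opt_in:
        return False  # default-deny inside a test process
    if not project_id:
        return False  # unresolved project: fail-closed
    # Opt-in alone is not enough; the target must be in the sandbox allowlist.
    return project_id in sandbox_projects
```

Evaluating this at call time, rather than at import, matters because the Plane headers and project id are captured at import and cannot be swapped afterwards.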
Refs: ORCH-117
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>