"""ORCH-117: fail-closed guard for Plane WRITE primitives from a test/worktree process. Leaf module — pure, never-raise in the live path, config-gated. Mirrors the leaf pattern of ``src/deploy_status_guard.py`` / ``src/serial_gate.py`` / ``src/cancel.py``: it imports only ``config`` (and stdlib ``os``/``sys``), never ``plane_sync`` / ``stage_engine`` — the three write primitives that need a verdict call :func:`decide`, the guard does not live there. The incident (established fact, ORCH-114). A test/worktree process performed a REAL write to Plane against the **production** project: ``PATCH …/issues/… state=`` plus a "Stage: deploy → done" comment, i.e. ``notify_stage_change("ORCH-114", "deploy","done")`` run from pytest mutated a live board issue ("false Done"). The root: test/staging processes inherit the live Plane token (``PLANE_HEADERS`` / ``PROJECT_ID`` are captured as literals at module import, so a post-hoc env/token swap is a no-op, NFR-4), and *nothing* forced them to write only to the sandbox. The precedent. ``tests/conftest.py::_no_telegram`` is an autouse fixture muting ``send_telegram`` in ALL tests, exactly because "pytest on prod sent REAL Telegram messages to Slava". The symmetric protection for Plane WRITES did not exist — this is that protection. The fix (ADR-001 D1/D3): a low choke-point on the entry of the three write primitives, evaluated **at call time** (not at import). The guard is active **only in a test process** (``pytest``-in-process detection) — for the live orchestrator runtime and the staging runtime (both ``uvicorn src.main:app``, no pytest in the process) it is a strict no-op (byte-for-byte, NFR-2/NFR-3). In a test process a write is allowed **iff** simultaneously (a) the dedicated opt-in flag is ON **and** (b) the target project ∈ the sandbox-allowlist; otherwise BLOCK (default-deny). A non-resolvable target → BLOCK (fail-closed, NFR-1). An allowed sandbox write is audited at INFO; a block is audited LOUDLY at ERROR so an ORCH-114-class incident is obvious, not silent (FR-5 / D7). Deliberately NO kill-switch for the prod-block (ADR-001 D4 / FR-7 / NFR-6): a guard that makes a prod write from pytest *physically impossible* must not ship with a config that re-opens it (that would be the very back-door NFR-6 forbids). The only reversible regulator is the sandbox-bound opt-in (``plane_test_write_enabled`` + ``plane_test_sandbox_projects``); "disable the guard" ≠ "allow prod from pytest" — that transition does not exist by design. The independent conftest floor (``_plane_sandbox_only``, ADR-001 D5) is the second sandbox-bound layer. This is bugfix-isolation, NOT a Quality Gate and NOT a stage: ``STAGE_TRANSITIONS`` / ``QG_CHECKS`` / ``check_*`` / machine-verdict keys / the DB schema are byte-for-byte untouched. """ from __future__ import annotations import logging import os import sys from .config import settings logger = logging.getLogger("orchestrator.plane_write_guard") # Verdicts returned by decide() (the calling primitive executes them). ALLOW = True BLOCK = False # Reasons (stable slugs — asserted by tests / read in audit lines). R_LIVE_RUNTIME = "live-runtime" # not a test process -> no-op (prod/staging). R_AMBIGUOUS = "ambiguous-target" # project_id empty/unresolved -> "don't know => don't write". R_OPT_IN_DISABLED = "opt-in-disabled" # test process, opt-in OFF -> default-deny. R_PROD_IN_TEST = "prod-project-in-test" # test process, project NOT in sandbox allowlist. R_SANDBOX_OPT_IN = "sandbox-opt-in" # test process, opt-in ON + sandbox project -> ALLOW. R_GUARD_ERROR = "guard-error" # internal error inside the test-path -> fail-closed BLOCK. # Operation tokens (one per call site) — used only for the audit line. OP_STATE = "state" # update_issue_state / _set_issue_state_direct (httpx.patch) OP_COMMENT = "comment" # add_comment (httpx.post) def _in_test_process() -> bool: """True iff this Python process is a pytest/worktree test process (ADR-001 D2). ``"pytest" in sys.modules`` is true for the whole pytest run (collection + execution) in THIS process, which is exactly the worktree ``python -m pytest`` process from the incident. The live orchestrator and the staging runtime start via ``uvicorn src.main:app`` and never import pytest into their process, so the detection never fires there (NFR-2/NFR-3, AC-5/AC-6). ``PYTEST_CURRENT_TEST`` is a secondary confirming signal pytest sets for the duration of a test body. Both are read at call time (NFR-4). Never raises: on any error we treat the process as NOT-a-test (-> live ALLOW), so the guard can never accidentally wedge a legitimate prod write. """ try: if "pytest" in sys.modules: return True if os.environ.get("PYTEST_CURRENT_TEST"): return True except Exception: # noqa: BLE001 - never-raise; ambiguity -> "not a test" (live ALLOW). return False return False def _sandbox_allowlist() -> set[str]: """Sanitised set of sandbox project ids from ``plane_test_sandbox_projects``. Empty/blank CSV -> empty set (then EVERY project blocks in a test process, fail-closed). Never raises. """ try: raw = (settings.plane_test_sandbox_projects or "").strip() except Exception: # noqa: BLE001 - never-raise. return set() if not raw: return set() return {tok.strip() for tok in raw.split(",") if tok.strip()} def decide(project_id: str | None, op: str, work_item_id: str | None = None) -> tuple[bool, str]: """Decide whether a Plane WRITE primitive may hit the network (ADR-001 D3). Returns ``(ok, reason)``. Steps: 1. ``not _in_test_process()`` -> ALLOW (``live-runtime``: prod/staging no-op). 2. ``project_id`` empty/None/unresolved -> BLOCK (``ambiguous-target``, NFR-1). 3. opt-in flag OFF -> BLOCK (``opt-in-disabled``, default-deny). 4. ``project_id`` ∉ sandbox allowlist -> BLOCK (``prod-project-in-test``, AC-3). 5. otherwise -> ALLOW (``sandbox-opt-in``, audit INFO). never-raise: the live path returns at step 1 BEFORE the try-block, so a prod write can never be wedged by a guard bug. Once we know we are in a test process, any internal error fails CLOSED to BLOCK (``guard-error``) — the defect surfaces loudly rather than re-opening prod (AC-9 / FR-7). """ # Step 1 — outside any test process the guard is a strict no-op. Evaluated FIRST # and OUTSIDE the try-block so a live prod/staging write is never affected. if not _in_test_process(): return ALLOW, R_LIVE_RUNTIME # From here on we are in a test process: default-deny, fail-closed on any error. try: pid = (project_id or "").strip() if not pid: return BLOCK, R_AMBIGUOUS # step 2 if not bool(getattr(settings, "plane_test_write_enabled", False)): return BLOCK, R_OPT_IN_DISABLED # step 3 if pid not in _sandbox_allowlist(): return BLOCK, R_PROD_IN_TEST # step 4 — sandbox-only, even with opt-in. return ALLOW, R_SANDBOX_OPT_IN # step 5 except Exception as e: # noqa: BLE001 - never-raise; in-test -> fail CLOSED. logger.error( "plane_write_guard.decide error in test-process -> BLOCK (fail-closed): %s", e ) return BLOCK, R_GUARD_ERROR def audit_block(project_id: str | None, op: str, work_item_id: str | None, reason: str) -> None: """Loud structured audit of a BLOCKED write (FR-5 / D7). Logged at ERROR so an ORCH-114-class incident (a pytest mutating a non-sandbox project) is obvious in the run log, not silent. Never raises. """ try: logger.error( "plane_write_guard BLOCKED Plane %s write from a test process: " "project_id=%s work_item=%s reason=%s " "(ORCH-117 fail-closed; this would have mutated a non-sandbox Plane " "project from pytest — cf. the ORCH-114 incident)", op, project_id, work_item_id, reason, ) except Exception: # noqa: BLE001 - logging must never raise. pass def audit_allow(project_id: str | None, op: str, work_item_id: str | None, reason: str = R_SANDBOX_OPT_IN) -> None: """Audit (INFO) an ALLOWED real sandbox write from a test process (FR-5 / D7). Only the ``sandbox-opt-in`` case is audited here — the ``live-runtime`` ALLOW (prod/staging) is the normal hot path and is intentionally NOT logged to avoid spamming the production log. Never raises. """ try: logger.info( "plane_write_guard ALLOWED sandbox Plane %s write from a test process: " "project_id=%s work_item=%s reason=%s", op, project_id, work_item_id, reason, ) except Exception: # noqa: BLE001 - logging must never raise. pass def snapshot() -> dict: """Read-only view of the guard state (optional observability, D7). Never raises.""" try: return { "in_test_process": _in_test_process(), "opt_in_enabled": bool(getattr(settings, "plane_test_write_enabled", False)), "sandbox_allowlist": sorted(_sandbox_allowlist()), } except Exception as e: # noqa: BLE001 - never-raise. return {"error": str(e)}