feat(coverage): deterministic test-coverage gate (ORCH-027) #109

Merged
admin merged 11 commits from feature/ORCH-027-code-coverage into main 2026-06-10 01:30:55 +03:00

ORCH-027 — Coverage gate (deterministic test-coverage gate)

Deterministic (no-LLM) coverage sub-gate on the deploy-staging → deploy edge that blocks coverage degradation before a task branch merges into main. Mirrors the security-gate (ORCH-022).

What

  • Leaf src/coverage_gate.py (never-raise) + thin check_coverage_gate in QG_CHECKS + _handle_coverage_gate splice in advance_stage — run AFTER merge-gate (coverage measured on the caught-up HEAD that lands in main) and BEFORE image-freshness (fail before the expensive docker rebuild). Order: security → merge → coverage → image-freshness.
  • Measure: pytest --cov=src --cov-report=json in the per-branch worktree → totals.percent_covered. Tool error → fail-open + WARNING by default (coverage_tool_fail_closed flips to strict).
  • Decision (pure): compute_coverage_verdict(measured, baseline, floor, policy, epsilon), with policy absolute | baseline | both plus epsilon (anti-flap); baseline None → bootstrap (absolute-only).
  • Baseline: additive coverage_baseline table (CREATE TABLE IF NOT EXISTS) + ratchet-up in _handle_merge_verify (deploy → done): atomic compare-and-set under the held merge-lease, never decreases.
  • FAIL → rollback to development (+ developer-retry, cap 3) + release merge-lease.
  • Observability: artefact 18-coverage-report.md (coverage_status: frontmatter, single source of truth); GET /queue coverage block; Telegram on FAIL; optional POST /coverage/baseline override.
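
A minimal sketch of how the pure decision step could look, for readers who want the policy semantics spelled out. Only the function name and its five parameters come from this PR; the "PASS"/"FAIL" return values, the default arguments, and the choice to apply epsilon to the baseline comparison only are assumptions, not the merged implementation.

```python
from typing import Optional

PASS = "PASS"  # assumed verdict labels; the real gate may use other values
FAIL = "FAIL"


def compute_coverage_verdict(
    measured: float,
    baseline: Optional[float],
    floor: float,
    policy: str = "both",
    epsilon: float = 0.1,
) -> str:
    """Pure decision: no I/O, no DB access, same inputs give the same verdict."""
    absolute_ok = measured >= floor

    # Bootstrap: with no stored baseline, only the absolute floor can apply.
    if baseline is None:
        return PASS if absolute_ok else FAIL

    # Anti-flap: a drop of at most `epsilon` points below baseline is tolerated.
    baseline_ok = measured >= baseline - epsilon

    if policy == "absolute":
        ok = absolute_ok
    elif policy == "baseline":
        ok = baseline_ok
    else:
        # "both" (assumed to also be the fallback for unknown policy names).
        ok = absolute_ok and baseline_ok
    return PASS if ok else FAIL
```

With a floor of 70 and a baseline of 85, a measurement of 84.95 passes under `both` because it sits inside the epsilon band, while 84.0 fails; under `absolute` the same 84.0 passes because only the floor is checked.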

Invariants

  • STAGE_TRANSITIONS / existing check_* semantics / verdict keys (verdict:/result:/deploy_status:/staging_status:/security_status:) — byte-for-byte unchanged (NFR-5/AC-8). New DB table additive.
  • Kill-switch coverage_gate_enabled; scope coverage_gate_repos (empty → self-hosting only) → enduro-trails untouched.
  • never-raise; self-hosting safe (no deploy / restart / push to main).

Tests

tests/test_coverage_gate.py (TC-01..TC-15): verdict modes/borders/epsilon, ratchet up-only + bootstrap + per-repo isolation, conditionality/kill-switch, fail-open/closed, never-raise, report write/read-back, AST self-hosting safety, advance_stage rollback + lease-release, real pytest-cov measurement + timeout, snapshot + registry/transitions unchanged. Frozen QG-registry anti-regress tests + deploy-staging edge tests updated. Full suite green (1466 passed).

Infra precondition

pytest-cov==5.0.0 added to requirements.txt — must be in the prod/staging image. First applicable merge bootstraps the baseline.

Docs: README / adr-0029-coverage-gate.md / PIPELINE_DOCS / 18-coverage-report.md template (architecture stage) + CHANGELOG / CLAUDE.md / .env.example (this PR).

Refs: ORCH-027

🤖 Generated with Claude Code

admin added 10 commits 2026-06-10 01:26:27 +03:00
Introduce a deterministic (no-LLM) coverage sub-gate that blocks coverage
degradation before a task branch merges into `main`. Existing gates judge only by
the FACT of passing (check_ci_green / check_tests_passed / merge-gate re-test), not
by completeness — so a batch autonomous run (ORCH-088) silently erodes coverage.

Pattern mirrors the security-gate (ORCH-022): leaf src/coverage_gate.py (never-raise)
+ thin check_coverage_gate in QG_CHECKS + _handle_coverage_gate splice in advance_stage,
run AFTER merge-gate (measured on the caught-up HEAD that lands in main) and BEFORE
image-freshness (fail before the expensive docker rebuild).

- measure_coverage: pytest --cov=src --cov-report=json in the per-branch worktree ->
  line coverage %; None on tool error -> fail-open + WARNING by default (FR-6).
- compute_coverage_verdict (pure): absolute | baseline | both + epsilon (NFR-4 anti-flap);
  baseline None -> bootstrap (absolute-only).
- coverage_baseline DB table (additive, CREATE TABLE IF NOT EXISTS) + ratchet-up in
  _handle_merge_verify (deploy->done): atomic compare-and-set under merge-lease, never
  decreases; bootstrap on first merge.
- Artefact 18-coverage-report.md (coverage_status: frontmatter, single source of truth);
  GET /queue `coverage` block; FAIL -> Telegram; optional POST /coverage/baseline override.
- Flags ORCH_COVERAGE_* (kill-switch + self-hosting-only scope) -> enduro untouched;
  STAGE_TRANSITIONS / existing check_* / verdict keys byte-for-byte unchanged (NFR-5/AC-8).
- pytest-cov==5.0.0 added to requirements.txt.
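
The ratchet-up on the coverage_baseline table can be illustrated roughly as follows. This is a sketch under stated assumptions: it uses SQLite, the column names `repo` and `percent` and the helper name `ratchet_baseline` are hypothetical, and the merge-lease that the real code holds around this step is not modelled.

```python
import sqlite3


def ratchet_baseline(conn: sqlite3.Connection, repo: str, measured: float) -> float:
    """Raise the stored baseline for `repo` to `measured` if it is higher.

    Returns the baseline in effect afterwards. Column names are hypothetical.
    """
    with conn:  # one transaction: commit on success, roll back on error
        # Additive schema change: safe to run on every call.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS coverage_baseline ("
            "repo TEXT PRIMARY KEY, percent REAL NOT NULL)"
        )
        # Bootstrap: the first merge for a repo stores its measurement as-is.
        conn.execute(
            "INSERT OR IGNORE INTO coverage_baseline (repo, percent) VALUES (?, ?)",
            (repo, measured),
        )
        # Compare-and-set: the WHERE clause makes the update a no-op unless
        # the new value is strictly higher, so the baseline never decreases.
        conn.execute(
            "UPDATE coverage_baseline SET percent = ? WHERE repo = ? AND percent < ?",
            (measured, repo, measured),
        )
    row = conn.execute(
        "SELECT percent FROM coverage_baseline WHERE repo = ?", (repo,)
    ).fetchone()
    return row[0]
```

Putting the comparison inside the UPDATE statement, rather than reading the value and comparing in Python, keeps the check and the write in a single statement; each repo has its own row, which gives the per-repo isolation the tests describe.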

Tests: tests/test_coverage_gate.py (TC-01..TC-15). Frozen QG-registry anti-regress
tests + deploy-staging edge tests updated for the new sub-gate. Full suite green.

Docs: README / adr-0029 / PIPELINE_DOCS / 18-coverage-report.md template (architecture
stage) + CHANGELOG / CLAUDE.md / .env.example (this PR).

Refs: ORCH-027
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
measure_coverage hardcoded "python" for the coverage subprocess; the prod container
and the CI runner expose "python3" (a bare "python" may be absent), and pytest-cov
lives in exactly the running interpreter's environment. Use sys.executable so the
measurement always runs under the same interpreter as the orchestrator.
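
For illustration, a measurement helper built around sys.executable might look like the sketch below. The pytest flags and the totals.percent_covered field are the ones named in this PR; the helper name `read_percent`, the `timeout_s` parameter and its default, and the removal of a stale report before the run are assumptions rather than the merged code.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Optional


def read_percent(report: Path) -> float:
    """Extract totals.percent_covered from a coverage JSON report."""
    data = json.loads(report.read_text())
    return float(data["totals"]["percent_covered"])


def measure_coverage(worktree: str, timeout_s: float = 600.0) -> Optional[float]:
    """Return line coverage % for `worktree`, or None on any tool error.

    Never raises: the caller decides whether None means fail-open or fail-closed.
    """
    try:
        # --cov-report=json writes coverage.json into the working directory.
        report = Path(worktree) / "coverage.json"
        # Drop any stale report so a failed run cannot reuse old numbers.
        report.unlink(missing_ok=True)
        subprocess.run(
            # sys.executable is the interpreter running this process, so
            # pytest-cov is resolved from the same environment.
            [sys.executable, "-m", "pytest", "--cov=src", "--cov-report=json"],
            cwd=worktree,
            capture_output=True,
            timeout=timeout_s,
            check=False,  # failing tests still produce a usable report
        )
        return read_percent(report)
    except Exception:
        # Missing tool, timeout, missing or malformed report: all map to None.
        return None
```

Invoking `sys.executable -m pytest` instead of a bare `python` or `pytest` avoids depending on what happens to be on PATH in the container or CI runner.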

Refs: ORCH-027
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reviewer P1 (ORCH-027 attempt 2): inserting the ORCH-027 changelog
block duplicated the adjacent ORCH-095 entry — its paragraph body was
repeated verbatim, corrupting a golden-source doc and another work
item's artifact (CLAUDE.md §3). Remove the duplicate half, leaving a
single ORCH-095 body. ORCH-027 entry untouched (already correct).

Refs: ORCH-027

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tester(ET): auto-commit from tester run_id=539
All checks were successful
CI / test (push) Successful in 45s
CI / test (pull_request) Successful in 43s
c2369db808
admin force-pushed feature/ORCH-027-code-coverage from 5fde2ea737 to c2369db808 2026-06-10 01:26:27 +03:00
admin added 1 commit 2026-06-10 01:30:52 +03:00
deploy(ORCH-036): finalize SUCCESS for ORCH-027
All checks were successful
CI / test (push) Successful in 43s
CI / test (pull_request) Successful in 43s
dffd151434
admin merged commit 567c27e1d9 into main 2026-06-10 01:30:55 +03:00

Reference: admin/orchestrator#109