fix(serial-gate): pause-without-blocking via per-task park signal (ORCH-124)

Fixes incident ORCH-116/ORCH-123: serial_gate defined a repo's "active task"
purely by machine stage (tasks.stage NOT IN ('done','cancelled')). Plane statuses
Backlog/Blocked/Needs-Input (layer-B indication, ORCH-066) do NOT change
tasks.stage (layer A), so a paused predecessor was indistinguishable from an active
one and held the FIFO gate closed against an urgent successor — the urgent fix
could not start until the paused task was formally done.

Introduces an explicit, durable, DB-resolvable per-task "park" signal — additive
nullable column tasks.paused_at (pattern of cancelled_at/track) — and a new
ORTHOGONAL scheduler "pause" axis. The serial-gate "active task" predicate becomes
`stage NOT IN ('done','cancelled') AND paused_at IS NULL` across all three points
(build_claim_clause / repo_has_active_task / _per_repo_snapshot). The terminal set
{done,cancelled} in serial_gate/task_deps/stages.py is byte-for-byte unchanged
(adr-0026 not regressed): task_deps/stages.py do NOT read paused_at, so a paused
declared dependency and an active repo_freeze STILL block (pause never bypasses
them — different axes). Anti-stale-base on resume relies on the existing deferred
branch cut (ORCH-088) + pre-merge auto_rebase_onto_main + merge-gate re-test
(ORCH-026/093/110) — no new rebase machinery.
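
A minimal sketch of the additive column plus the new predicate, assuming a SQLite-backed `tasks` table. The `repo` column, the `'build'` stage value, and the helper bodies are hypothetical stand-ins for illustration, not the project's actual `_ensure_column` or `repo_has_active_task`; only the predicate string is taken from the message above.

```python
import sqlite3

# Predicate quoted from the commit message: a task blocks the gate only if it
# is non-terminal AND not parked.
ACTIVE_PREDICATE = "stage NOT IN ('done','cancelled') AND paused_at IS NULL"


def ensure_column(conn, table, column, decl):
    """Add a column only if it is missing (idempotent, additive migration).

    Hypothetical stand-in for the project's _ensure_column helper.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


def repo_has_active_task(conn, repo):
    """Return True if any task in the repo still holds the serial gate."""
    row = conn.execute(
        f"SELECT 1 FROM tasks WHERE repo = ? AND {ACTIVE_PREDICATE} LIMIT 1",
        (repo,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
# Assumed pre-migration schema: no paused_at column yet.
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, repo TEXT, stage TEXT)")
ensure_column(conn, "tasks", "paused_at", "TEXT")
ensure_column(conn, "tasks", "paused_at", "TEXT")  # second call changes nothing

conn.execute("INSERT INTO tasks (repo, stage) VALUES ('orchestrator', 'build')")
before = repo_has_active_task(conn, "orchestrator")

# Park the task: its stage is unchanged, yet it no longer counts as a blocker.
conn.execute("UPDATE tasks SET paused_at = datetime('now') WHERE id = 1")
after_pause = repo_has_active_task(conn, "orchestrator")

# Resume: clearing paused_at makes the task hold the gate again.
conn.execute("UPDATE tasks SET paused_at = NULL WHERE id = 1")
after_resume = repo_has_active_task(conn, "orchestrator")

print(before, after_pause, after_resume)
```

In this sketch the stage value is never touched, which mirrors the point that pause is an axis orthogonal to the machine stage.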

Additive, under an independent sub-flag, never-raise, restart-safe; hot-claim
fail-OPEN and freeze fail-CLOSED preserved. STAGE_TRANSITIONS / QG_CHECKS / check_*
/ machine-verdict keys / existing table schemas are byte-for-byte untouched (this is
a queue-scheduler + observability change, not a Quality Gate).

- src/db.py: additive tasks.paused_at column (_ensure_column) + set/clear/is helpers
- src/serial_gate.py: _pause_layer_enabled() + pause-term in the 3 points; `paused`
  list + per-job `reason` (freeze>dependency>active-task>null) in the /queue snapshot
- src/config.py + .env.example: serial_gate_pause_enabled (default True = true no-op)
- src/main.py: POST /serial-gate/pause|resume?work_item=<id> (modeled on unfreeze)
- tests/test_orch124_serial_gate_pause.py: TC-01 mandatory incident regression test + TC-02..15
- CHANGELOG.md: [Unreleased] entry
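
The per-job `reason` precedence listed above (freeze>dependency>active-task>null) could be resolved roughly as follows. This is a hedged sketch: the function name and its boolean inputs are hypothetical, and the real snapshot code in `src/serial_gate.py` may derive these conditions differently.

```python
from typing import Optional


def blocking_reason(frozen: bool, dependency_blocked: bool,
                    active_task: bool) -> Optional[str]:
    """Pick the single highest-priority reason a queued job cannot start.

    Order follows the commit message: freeze > dependency > active-task > null.
    """
    if frozen:
        return "freeze"
    if dependency_blocked:
        return "dependency"
    if active_task:
        return "active-task"
    return None  # nothing blocks the job


# A paused predecessor no longer counts as an active task, but a freeze or a
# declared dependency would still be reported as the blocking reason.
print(blocking_reason(False, False, False))
print(blocking_reason(True, False, False))
```

Checking the conditions in a fixed order means a job blocked on several axes at once always reports the same reason.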

ADR: docs/work-items/ORCH-124/06-adr/ADR-001-serial-gate-pause-without-blocking.md
Cross-cutting: docs/architecture/adr/adr-0051-serial-gate-pause-without-blocking.md

Refs: ORCH-124

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 19:35:55 +03:00
parent de4f067655
commit 87af857082
8 changed files with 683 additions and 8 deletions


@@ -376,6 +376,84 @@ async def serial_gate_unfreeze(repo: str = ""):
    return {"ok": True, "repo": repo, "cleared": cleared, "frozen": frozen}


@app.post("/serial-gate/pause")
async def serial_gate_pause(work_item: str = ""):
    """ORCH-124 (adr-0051 D7): park a task so the serial gate stops counting it as
    an active FIFO blocker, so that an urgent successor may overtake it.

    Explicit, durable, DB-resolvable operator intent (NOT a Plane-status gesture):
    stamps ``tasks.paused_at`` so the offline hot-claim SQL reads it locally without
    a network call. Pause does NOT bypass a ``repo_freeze`` or a declared dependency
    (different axes) and is NOT terminal (distinct from STOP/cancel). Modeled on
    ``POST /serial-gate/unfreeze``; never-raise. Pausing a terminal (done/cancelled)
    task is a no-op. When the pause sub-flag is off the call is a no-op + warning
    (the pause-term is omitted from the gate, so a column write would be latent).
    """
    from . import db
    from . import serial_gate

    if not work_item or not work_item.strip():
        return {"ok": False, "error": "missing 'work_item'", "work_item": work_item}
    work_item = work_item.strip()
    if not serial_gate._pause_layer_enabled():
        return {"ok": False, "error": "serial_gate_pause_enabled is off (no-op)",
                "work_item": work_item}
    task = db.get_task_by_work_item_id(work_item)
    if not task:
        return {"ok": False, "error": "unknown work_item", "work_item": work_item}
    task_id = task["id"]
    stage = task.get("stage")
    if stage in ("done", "cancelled"):
        return {"ok": False, "error": f"task is terminal (stage={stage})",
                "work_item": work_item, "task_id": task_id, "stage": stage}
    ok = db.set_task_paused(task_id)
    # Re-read the row so the response reports the stored timestamp.
    refreshed = db.get_task_by_work_item_id(work_item) or {}
    paused_at = refreshed.get("paused_at")
    if ok:
        try:
            from .notifications import send_telegram, link_for
            send_telegram(
                f"⏸️ {link_for(work_item)}: task PAUSED for the serial gate "
                f"(task {task_id}, stage={stage}). An urgent successor in the repo "
                f"may overtake it; resume via POST /serial-gate/resume."
            )
        except Exception:
            # Notification is best-effort; the endpoint must never raise.
            pass
    return {"ok": ok, "work_item": work_item, "task_id": task_id,
            "stage": stage, "paused_at": paused_at}


@app.post("/serial-gate/resume")
async def serial_gate_resume(work_item: str = ""):
    """ORCH-124 (adr-0051 D7 / AC-10): resume a parked task so that it re-enters the
    serial gate (holds it as active again / re-enters FIFO with the deferred branch
    cut, D8). Inverse of ``POST /serial-gate/pause``; idempotent (resuming a task
    that is not paused clears nothing). Anti-stale-base on resume is guaranteed by
    the EXISTING mechanisms (deferred branch cut + pre-merge auto_rebase_onto_main +
    merge-gate re-test, ORCH-088/093/110), with no new rebase machinery. never-raise.
    """
    from . import db

    if not work_item or not work_item.strip():
        return {"ok": False, "error": "missing 'work_item'", "work_item": work_item}
    work_item = work_item.strip()
    task = db.get_task_by_work_item_id(work_item)
    if not task:
        return {"ok": False, "error": "unknown work_item", "work_item": work_item}
    task_id = task["id"]
    was_paused = task.get("paused_at") is not None
    ok = db.clear_task_paused(task_id)
    # Notify only when a pause was actually lifted, so repeat calls stay quiet.
    if ok and was_paused:
        try:
            from .notifications import send_telegram, link_for
            send_telegram(
                f"▶️ {link_for(work_item)}: task UNPAUSED (task {task_id}); "
                f"it participates in the serial gate again."
            )
        except Exception:
            # Notification is best-effort; the endpoint must never raise.
            pass
    return {"ok": ok, "work_item": work_item, "task_id": task_id,
            "was_paused": was_paused, "paused_at": None}


@app.post("/transition-lease/release")
async def transition_lease_release(work_item: str = ""):
    """ORCH-114 (adr-0045 / D10): operator manual reclaim of a stuck transition-lease.