feat(cancel): STOP-status task cancellation + relaunch-hole close (ORCH-090)

Introduce the dedicated Plane STOP status as a single declarative task-cancel
mechanism: stop the active agent (graceful SIGTERM cascade), cancel all jobs
(terminal `cancelled`, never requeued), remove the worktree + delete the remote
feature branch (never main, never force-push), drive the task to the new
system-terminal state `cancelled` and tombstone the natural keys so a later
"To Analyse" re-creates it from scratch (docs artefacts preserved). STOP during a
critical merge/deploy window is deferred until the irreversible step has
actually finished. Also closes the relaunch hole: handle_status_start relaunch
is gated to the `analysis` stage; the only pipeline-start entry point remains
"To Analyse".
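
Illustration of the "graceful SIGTERM" step only, not the code in
src/cancel.py. A minimal sketch, assuming a single child process: the helper
name stop_gracefully, the grace period, and the escalation to a hard kill are
assumptions made for the example, and a real cascade would also have to reach
the agent's child processes.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask a child process to exit, then force it if it ignores the request.

    Hypothetical helper: name, grace period and the kill fallback are
    assumptions for illustration only.
    """
    if proc.poll() is not None:
        # Already exited: nothing to stop.
        return proc.returncode
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort after the grace period
        return proc.wait()


# Stand-in for a long-running agent process.
agent = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_gracefully(agent)
```

The helper never raises on an already-dead process, which matches the
"never-raise, restart-safe" intent described in this message.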

Cross-cutting (adr-0026): the "task terminal" predicate is widened {done} ->
{done, cancelled} in serial_gate / task_deps / stages sink + reaper/worker
requeue guards. STAGE_TRANSITIONS exit-gates / QG_CHECKS / check_* are unchanged
(`cancelled` is a sink, not a new edge). Additive, never-raise, restart-safe,
under kill-switch ORCH_STOP_STATUS_ENABLED (off -> zero regression).
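
Illustration of the widened predicate, modelled on the serial-gate query in
the diff. A minimal sketch against an in-memory SQLite table: the three-column
schema and the TERMINAL_STAGES constant are assumptions for the example, and
the connection is passed in rather than taken from db.get_db().

```python
import sqlite3

# Terminal set after this change; the constant name is hypothetical.
TERMINAL_STAGES = ("done", "cancelled")


def repo_has_active_task(conn, repo, exclude_task_id=None):
    """Return True if the repo still has a task outside the terminal set."""
    placeholders = ",".join("?" for _ in TERMINAL_STAGES)
    sql = f"SELECT 1 FROM tasks WHERE repo=? AND stage NOT IN ({placeholders})"
    params = [repo, *TERMINAL_STAGES]
    if exclude_task_id is not None:
        sql += " AND id != ?"
        params.append(exclude_task_id)
    sql += " LIMIT 1"
    return conn.execute(sql, params).fetchone() is not None


conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real tasks table has more columns.
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, repo TEXT, stage TEXT)")
conn.executemany(
    "INSERT INTO tasks (repo, stage) VALUES (?, ?)",
    [("demo", "done"), ("demo", "cancelled")],
)
only_terminal = repo_has_active_task(conn, "demo")

conn.execute("INSERT INTO tasks (repo, stage) VALUES ('demo', 'analysis')")
with_live_task = repo_has_active_task(conn, "demo")
excluding_live_task = repo_has_active_task(conn, "demo", exclude_task_id=3)
```

With the old `stage != 'done'` filter the cancelled row alone would have kept
the gate closed for the repo.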

New: src/cancel.py (leaf), src/gitea.py (delete_remote_branch), tasks columns
cancelled_at/cancel_requested_at, jobs status `cancelled`, GET /queue `stop` block.
Tests: tests/test_stop_status.py (TC-01..TC-14 + D7); full suite green (1345).
Docs updated in-PR (architecture README, CLAUDE.md, README.md, .env.example,
CHANGELOG). ADR-001 D4 refinement: plane_issue_id is tombstoned too (the lookup
ORs on it) — original UUID recoverable from the parseable suffix.
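
Illustration of tombstoning a natural key so that a later lookup no longer
matches it while the original value stays recoverable. A minimal sketch of one
possible scheme: the marker string, the use of the task id in the suffix, and
both function names are assumptions, not the format this commit writes.

```python
# Hypothetical marker; the real suffix format is not shown in this commit.
TOMBSTONE_MARK = "#cancelled-"


def tombstone(value: str, task_id: int) -> str:
    """Append a parseable suffix so equality lookups on `value` stop matching."""
    return f"{value}{TOMBSTONE_MARK}{task_id}"


def recover(value: str) -> str:
    """Strip the suffix again; values without a tombstone pass through unchanged."""
    original, sep, _ = value.rpartition(TOMBSTONE_MARK)
    return original if sep else value


issue_id = "3f2b8c1e-9a47-4d1c-8e55-0b7a6c2d4f10"  # example UUID
dead_key = tombstone(issue_id, 42)
restored = recover(dead_key)
```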

Refs: ORCH-090

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:01:57 +03:00
committed by orchestrator-deployer
parent ab083ba826
commit ebbf2e7a2d
27 changed files with 1394 additions and 38 deletions

@@ -110,14 +110,19 @@ def repo_has_active_task(repo: str, exclude_task_id: int | None = None) -> bool:
     try:
         conn = db.get_db()
         try:
+            # ORCH-090 (adr-0026): terminal set is {done,cancelled}. A cancelled
+            # task must NOT count as "active" or it would block the repo's serial
+            # gate forever.
             if exclude_task_id is not None:
                 row = conn.execute(
-                    "SELECT 1 FROM tasks WHERE repo=? AND id != ? AND stage != 'done' LIMIT 1",
+                    "SELECT 1 FROM tasks WHERE repo=? AND id != ? "
+                    "AND stage NOT IN ('done','cancelled') LIMIT 1",
                     (repo, exclude_task_id),
                 ).fetchone()
             else:
                 row = conn.execute(
-                    "SELECT 1 FROM tasks WHERE repo=? AND stage != 'done' LIMIT 1",
+                    "SELECT 1 FROM tasks WHERE repo=? "
+                    "AND stage NOT IN ('done','cancelled') LIMIT 1",
                     (repo,),
                 ).fetchone()
             return row is not None
@@ -264,10 +269,12 @@ def build_claim_clause() -> str:
         repo_scope = f"AND jobs.repo IN ({repo_in}) "
     else:
         repo_scope = ""
+    # ORCH-090 (adr-0026): {done,cancelled} are both terminal — an EARLIER
+    # cancelled task no longer holds the FIFO serial gate closed.
     active_clause = (
         "EXISTS (SELECT 1 FROM tasks t2 "
         "WHERE t2.repo = jobs.repo AND t2.id < jobs.task_id "
-        "AND t2.stage != 'done') "
+        "AND t2.stage NOT IN ('done','cancelled')) "
     )
     if _freeze_layer_enabled():
         freeze_clause = (
@@ -329,9 +336,10 @@ def _per_repo_snapshot(repo: str) -> dict:
     try:
         conn = db.get_db()
         try:
+            # ORCH-090 (adr-0026): terminal set {done,cancelled}.
             row = conn.execute(
                 "SELECT work_item_id, stage FROM tasks "
-                "WHERE repo=? AND stage != 'done' ORDER BY id LIMIT 1",
+                "WHERE repo=? AND stage NOT IN ('done','cancelled') ORDER BY id LIMIT 1",
                 (repo,),
             ).fetchone()
             if row: