The merge-gate's auto_rebase_onto_main silently dropped the ORCH-119
CHANGELOG bullet during a same-anchor 3-way merge: origin/main's
ORCH-120/126 entries were kept while the ORCH-119 insertion was lost.
Re-spliced the entry verbatim under ## [Unreleased] alongside 120/126.
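A post-merge guard of this shape could catch the same loss automatically. This is a minimal sketch that assumes every CHANGELOG bullet carries its ORCH-NNN id. `changelog_ids` and `dropped_ids` are hypothetical helpers and are not part of the merge-gate:

```python
import re

# Hypothetical helper: ORCH-NNN ids that appear under "## [Unreleased]".
_ID = re.compile(r"\bORCH-\d+\b")

def changelog_ids(text: str) -> set[str]:
    ids: set[str] = set()
    in_unreleased = False
    for line in text.splitlines():
        if line.startswith("## "):
            # A new second-level heading starts a new section; only
            # the [Unreleased] section is scanned.
            in_unreleased = line.strip() == "## [Unreleased]"
            continue
        if in_unreleased:
            ids.update(_ID.findall(line))
    return ids

def dropped_ids(feature: str, main: str, merged: str) -> set[str]:
    # Every id present on either side before the merge must survive it.
    return (changelog_ids(feature) | changelog_ids(main)) - changelog_ids(merged)
```

A non-empty result from `dropped_ids` after auto_rebase_onto_main would flag exactly the case described above: both sides inserted at the same anchor, and one insertion vanished.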
Refs: ORCH-119
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Description section of 00-business-request.md always contained the literal `TBD`,
dropping the source-backed request context from the Plane issue. Render the ACTUAL issue
`description` on both creation paths:
- Direct path A (serial_gate N/A): start_pipeline passes `description` to
_create_initial_docs.
- Deferred path B (ORCH-088, the dominant path when self-hosting): persist `description`
durably in the additive `tasks.description` column within the same atomic INSERT
in create_task_atomic (race-safe against the ORCH-053 anti-duplicate claim), and read
it back at claim time in launcher._spawn -> _materialize_deferred_branch (no network
call in the hot claim path, NFR-4).
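Path B's ordering guarantee can be sketched with sqlite3. This is a minimal illustration under stated assumptions: the two-column `tasks` schema, the 'analysis' initial stage, and the helpers `create_task_atomic_sketch` / `description_at_claim` are hypothetical stand-ins for create_task_atomic and the claim-time read:

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Simplified base table; description is an additive nullable column.
    conn.execute("CREATE TABLE tasks (work_item TEXT PRIMARY KEY, stage TEXT)")
    conn.execute("ALTER TABLE tasks ADD COLUMN description TEXT")
    return conn

def create_task_atomic_sketch(conn: sqlite3.Connection, work_item: str,
                              description: str) -> bool:
    # description is written in the SAME INSERT that creates the row, so a
    # concurrent claim can never observe the task without it.
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO tasks (work_item, stage, description) "
            "VALUES (?, 'analysis', ?)",
            (work_item, description),
        )
    return cur.rowcount == 1  # False when a duplicate row already exists

def description_at_claim(conn: sqlite3.Connection, work_item: str):
    # Read back from the local DB at claim time: no network call.
    row = conn.execute(
        "SELECT description FROM tasks WHERE work_item = ?", (work_item,)
    ).fetchone()
    return row[0] if row else None
```

Because the description lands in the same statement that creates the row, any claim that can see the task can also see its description.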
Adds a pure render helper, _render_business_request, with a fail-safe fallback marker
for empty, None, or unreadable descriptions, so rendering never breaks task creation; a
Gitea 422 remains a no-op (idempotent). STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys
and the base CREATE TABLE tasks are byte-for-byte unchanged; the ORCH-088
anti-stale-base invariant is preserved (only the data source is enriched).
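A sketch of the fail-safe rendering contract, assuming a plain Markdown layout. The name `render_business_request_sketch`, its signature, and the fallback text are hypothetical, since the commit does not spell out the real marker:

```python
# Hypothetical marker text; the real fallback marker is not given here.
FALLBACK_MARKER = "(description unavailable; see the source Plane issue)"

def _description_text(description: object) -> str:
    """Best-effort text extraction; returns "" for anything unusable."""
    if isinstance(description, bytes):
        try:
            description = description.decode("utf-8")
        except UnicodeDecodeError:
            return ""  # unreadable payload -> fallback
    if not isinstance(description, str):
        return ""  # None or an unexpected type -> fallback
    return description.strip()

def render_business_request_sketch(title: str, description: object) -> str:
    """Pure and never-raise: always returns a complete document."""
    body = _description_text(description) or FALLBACK_MARKER
    return f"# {title}\n\n## Description\n\n{body}\n"
```

Keeping the helper pure (no I/O, no exceptions) is what lets both creation paths call it without any risk to task creation itself.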
Tests: tests/test_orch119_business_request.py (TC-01 mandatory red->green
regression; TC-02..TC-07). Updated the ORCH-088 serial-gate spy for the additive
_create_initial_docs argument.
Refs: ORCH-119
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address reviewer P1 (axis ORCH-011/ORCH-079, agent rule #6): the showcase
described the serial-gate pause as purely operator-driven, but ORCH-120
added an engine-driven auto-park/unpark on analyst Needs Input.
- tech-pipeline.md: the pause paragraph now names both sources (operator +
engine auto-park on Needs Input, flag analyst_needs_input_autopause_enabled,
self-hosting scope, symmetric unpark on resume).
- tech-observability.md: the pause item in GET /queue now covers both sources.
- tech-agents.md: the when-applicable 01-questions.md signal channel for analyst
(table row + explanatory note; not a machine-verdict, not a deliverable).
- CHANGELOG: the ORCH-120 entry now includes a line about the showcase update.
tests/test_system_docs.py green (29 passed). src/STAGE_TRANSITIONS/QG_CHECKS
untouched: docs-only.
Refs: ORCH-120
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Activates and completes the previously dead "analyst asks BLOCKING questions ->
01-questions.md -> Needs Input" path. Four coordinated changes, additive, under
kill-switch, self-hosting scope, never-raise; STAGE_TRANSITIONS / QG_CHECKS /
check_* / machine-verdict keys / DB schema are byte-for-byte UNCHANGED (the flow
is a pre-gate engine branch, NOT a Quality Gate; 01-questions.md is a SIGNAL
artifact, NOT a machine-verdict).
- D1 contract + canon: analyst.md documents the 01-questions.md channel (blocking
questions -> Needs Input, do NOT fabricate deliverables) + resume behaviour; new
skeleton docs/_templates/01-questions.md; PIPELINE_DOCS.md manifest row + a note
on the 01- prefix.
- D2 freshness-supersede (DQ-2): pure offline mtime predicate questions_active in
the new leaf src/analyst_questions.py (a full FRESH package supersedes a stale
untouched 01-questions.md -> no Needs-Input loop, AC-6).
- D3 priority: questions take priority over "files ready" in
_handle_analysis_approved_flow (_decide_analysis_outcome + _emit_analysis_*);
with the gate off or the repo out of scope, the ORIGINAL order runs byte-for-byte (AC-9).
- D4 auto-park: on Needs Input, set_task_paused parks the task via the ORCH-124
pause axis so the repo serial-gate FIFO is not wedged while waiting for a human
(AC-4). D5: resume + unpark (clear_task_paused) happen in handle_status_start
(analysis branch).
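The D2/D3 logic can be sketched as pure functions. Assumptions: the deliverable file names, the outcome strings, and both function names are hypothetical; only the 01-questions.md name and the mtime-based supersede rule come from this commit:

```python
from pathlib import Path

# Hypothetical analyst deliverables; the real package list is defined elsewhere.
DELIVERABLES = ("01-requirements.md", "02-acceptance.md")
QUESTIONS = "01-questions.md"

def questions_active_sketch(work_dir: Path) -> bool:
    """Offline mtime check: are the analyst's blocking questions still live?"""
    q = work_dir / QUESTIONS
    if not q.is_file():
        return False
    files = [work_dir / name for name in DELIVERABLES]
    if not all(f.is_file() for f in files):
        return True  # no full package yet: the questions stand
    q_mtime = q.stat().st_mtime
    # Only a FULL package written strictly after the questions supersedes them.
    return not all(f.stat().st_mtime > q_mtime for f in files)

def decide_outcome_sketch(work_dir: Path, gate_on: bool, in_scope: bool) -> str:
    # D3: when the gate applies, live questions win over "files ready".
    if gate_on and in_scope and questions_active_sketch(work_dir):
        return "needs_input"
    files_ready = all((work_dir / n).is_file() for n in DELIVERABLES)
    return "files_ready" if files_ready else "missing"
```

Requiring every deliverable to be strictly newer than the questions means a stale package left over from before the questions can never silently clear them, while a fresh package ends the Needs Input loop (AC-6).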
Flags (config.py, safe defaults): analyst_questions_gate_enabled /
analyst_questions_gate_repos (empty -> self-hosting only) /
analyst_needs_input_autopause_enabled.
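A sketch of how the three flags might compose. The dataclass, the default values shown, and the self-hosting repo name `orchestrator` are assumptions for illustration; the real defaults live in config.py:

```python
from dataclasses import dataclass

SELF_HOSTING_REPO = "orchestrator"  # assumption: name of the self-hosting repo

@dataclass(frozen=True)
class AnalystQuestionsFlags:
    # Hypothetical mirror of the three config.py flags; defaults are illustrative.
    analyst_questions_gate_enabled: bool = True
    analyst_questions_gate_repos: frozenset[str] = frozenset()
    analyst_needs_input_autopause_enabled: bool = True

    def gate_applies(self, repo: str) -> bool:
        if not self.analyst_questions_gate_enabled:
            return False  # kill-switch off: original behaviour everywhere
        # An empty allow-list means "self-hosting only".
        repos = self.analyst_questions_gate_repos or frozenset({SELF_HOSTING_REPO})
        return repo in repos

    def should_autopause(self, repo: str) -> bool:
        # Auto-park is nested under the gate, never active on its own.
        return self.gate_applies(repo) and self.analyst_needs_input_autopause_enabled
```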
Tests: test_orch120_analyst_needs_input.py (TC-01 regression + TC-02/03/06/09/10),
test_orch120_serial_gate_needs_input.py (TC-04), test_orch120_resume_unpark.py
(TC-05), test_orch120_questions_artifact_canon.py (TC-08), plus an assertion in
test_agent_prompts_canon.py (TC-07). Full suite green (2205 passed).
Refs: ORCH-120
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Queued analyst jobs hung indefinitely even with ORCH_SERIAL_GATE_ENABLED=false
(incident ORCH-124/125, job 2286: queued + run_id=759/760 + pid=35/42 +
started_at=NULL, a physically impossible state). No code path that returned a job
to 'queued' reset its run ownership (run_id/pid). After a container restart the
stale pid was reused, so pid_alive(stale) returned True: job-reaper Tier-1 saw a
phantom 'running' job and, at max_concurrency=1, blocked claims on the entire shared queue.
Enforce the invariant `status='queued' ⇒ run_id IS NULL AND pid IS NULL AND
started_at IS NULL` on existing columns (no schema change):
- D1 forward-cleanup: requeue_running_jobs / mark_job('queued') /
mark_job_transient / reap_running_job('queued') reset run_id=NULL, pid=NULL
in the same UPDATE that clears started_at; atomic status-guards preserved.
- D2 clean claim: claim_next_job resets pid/run_id on the queued->running flip
(defense-in-depth) so the row carries pid IS NULL until _spawn stamps it.
- D4 self-heal + observability: db.find_impossible_queued_jobs /
sanitize_impossible_queued run at startup (main.lifespan) and on each reaper
tick (JobReaper.sanitize_impossible_queued_once, never-raise); counter
impossible_queued_total in the GET /queue reaper block. Kill-switch
ORCH_IMPOSSIBLE_QUEUED_SANITIZE_ENABLED (default on; gates only the D4 sweep).
- D5: reaper Tier-1 is unchanged; the fix restores its precondition (pid reflects
THIS run). Marked invariants ORCH-065/113/114/099 are preserved.
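The D1, D2, and D4 statements can be sketched against an in-memory SQLite table. The schema and the `*_sketch` function names are hypothetical simplifications; the real requeue_running_jobs, claim_next_job, and sanitize_impossible_queued carry more guards and columns:

```python
import sqlite3

def make_jobs_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL,"
        " run_id INTEGER, pid INTEGER, started_at TEXT)"
    )
    return conn

def requeue_running_jobs_sketch(conn: sqlite3.Connection) -> int:
    # D1: clear run ownership in the SAME UPDATE that clears started_at;
    # the status guard in WHERE keeps the transition atomic.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status='queued', started_at=NULL, run_id=NULL, pid=NULL"
            " WHERE status='running'"
        )
    return cur.rowcount

def claim_next_job_sketch(conn: sqlite3.Connection):
    # D2: the queued->running flip also nulls pid/run_id, so the row carries
    # pid IS NULL until the spawner stamps the real one.
    with conn:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status='queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET status='running', started_at=datetime('now'),"
            " run_id=NULL, pid=NULL WHERE id=? AND status='queued'",
            (row[0],),
        )
    return row[0] if cur.rowcount == 1 else None

def sanitize_impossible_queued_sketch(conn: sqlite3.Connection) -> int:
    # D4: self-heal rows that already violate the queued invariant.
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET run_id=NULL, pid=NULL, started_at=NULL"
            " WHERE status='queued' AND (run_id IS NOT NULL OR pid IS NOT NULL"
            " OR started_at IS NOT NULL)"
        )
    return cur.rowcount
```

The returned row counts play the role of the impossible_queued_total counter: each sanitized row is one incident-shaped job like 2286.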
Tests: tests/test_orch126_queued_stale_run.py (TC-01 mandatory regression
red->green; TC-02..TC-10). Full pytest tests/ -q green (2189 passed).
Docs: internals.md (run-ownership invariant section), .env.example, CHANGELOG;
cross-cutting adr-0052.
Refs: ORCH-126
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses reviewer REQUEST_CHANGES (run 768) on ORCH-124 — docs-only,
no src/tests touched, fix scope unchanged.
P1: update docs/overview/ showcase for the new serial-gate "pause without
blocking" axis (changed task-routing functionality, ORCH-011/ORCH-079):
- tech-pipeline.md: FIFO exception "pause without blocking" next to freeze
- tech-data-model.md: durable signal tasks.paused_at on the Task row
- tech-observability.md: paused/reason in serial_gate GET /queue block +
operator endpoints POST /serial-gate/pause|resume
P2: strip leaked tool-call trailing tags (</content>/</invoke>) from 4
golden-source docs of this PR (06-adr/ADR-001, adr-0051,
08-data-requirements.md, 10-tech-risks.md).
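Debris like this can be checked mechanically. A minimal sketch, assuming the leaked tags only ever appear at the very end of a file; `strip_leaked_tags` is a hypothetical helper, not something this PR adds:

```python
import re

# One or more </content> / </invoke> tags, each optionally preceded by
# whitespace, followed only by whitespace up to the end of the text.
_TRAILING_TAGS = re.compile(r"(?:\s*</(?:content|invoke)>)+\s*\Z")

def strip_leaked_tags(text: str) -> str:
    """Remove trailing tool-call tags; keep exactly one final newline."""
    cleaned = _TRAILING_TAGS.sub("", text)
    return text if cleaned == text else cleaned + "\n"
```

Anchoring on `\Z` leaves any legitimate mention of these tags inside the body untouched.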
CHANGELOG "Доки" (Docs) bullet extended accordingly. Full suite green (2178 passed);
test_system_docs.py green (machine-checked showcase facts intact).
Refs: ORCH-124
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The deterministic test-runner gate (full `pytest tests/`) failed on
test_orch123_staging_runner_exec.py::test_r2_held_deploy_staging_not_rolled_back
once ORCH-124 reached the testing stage.
Root cause (a pre-existing latent regression, surfaced but not introduced by
ORCH-124): the fixture isolated `worktrees_dir` but not `repos_dir`.
`check_staging_status` falls back to `<repos_dir>/<repo>` (and its
origin/main) when the feature worktree is absent. After ORCH-123 merged,
the real `/repos/orchestrator/docs/work-items/ORCH-123/15-staging-log.md`
(verdict SUCCESS) exists on disk, so the intended-RED staging gate read it
and went green -> advance_stage was called -> the R-2 assertion failed.
Order-dependent: the test passed in isolation but failed in the full suite.
Fix: isolate `settings.repos_dir` to an empty tmp subdir in the fixture
(mirroring the existing worktrees_dir isolation) so the staging gate is
deterministically "not found" -> red, regardless of suite ordering. The
ORCH-123 R-2 invariant (a held deploy-staging task is never rolled back to
development, adr-0049/ADR-001 D4) is preserved and strengthened — the fix
only restores the test's stated premise. src/** / STAGE_TRANSITIONS /
QG_CHECKS / check_* untouched (test-only change).
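The fallback and the fix can be sketched as follows. Assumptions: `find_staging_log`, `isolated_settings`, the settings object, and the worktree path layout are hypothetical; only `worktrees_dir`, `repos_dir`, and the 15-staging-log.md path come from this commit:

```python
from pathlib import Path
from types import SimpleNamespace

def find_staging_log(settings, repo: str, work_item: str):
    """Hypothetical lookup mirroring the worktree -> repos_dir fallback."""
    rel = Path("docs/work-items") / work_item / "15-staging-log.md"
    worktree = Path(settings.worktrees_dir) / repo / work_item  # assumed layout
    for base in (worktree, Path(settings.repos_dir) / repo):
        if (base / rel).is_file():
            return base / rel
    return None  # "not found" -> the staging gate stays red

def isolated_settings(root: Path) -> SimpleNamespace:
    # BOTH lookup roots point at empty tmp subdirs, so no real checkout on
    # the host can leak into the test, whatever the suite ordering.
    wt, repos = root / "worktrees", root / "repos"
    wt.mkdir(parents=True)
    repos.mkdir(parents=True)
    return SimpleNamespace(worktrees_dir=wt, repos_dir=repos)
```

In an actual pytest fixture the same effect would typically come from `tmp_path` plus `monkeypatch.setattr(settings, "repos_dir", ...)`, next to the existing worktrees_dir patch.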
Refs: ORCH-124
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes incident ORCH-116/ORCH-123: serial_gate defined a repo's "active task"
purely by machine stage (tasks.stage NOT IN ('done','cancelled')). Plane statuses
Backlog/Blocked/Needs-Input (layer-B indication, ORCH-066) do NOT change
tasks.stage (layer A), so a paused predecessor was indistinguishable from an active
one and held the FIFO gate closed against an urgent successor — the urgent fix
could not start until the paused task was formally done.
Introduces an explicit, durable, DB-resolvable per-task "park" signal (an additive
nullable column tasks.paused_at, following the cancelled_at/track pattern) and a new
ORTHOGONAL scheduler "pause" axis. The serial-gate "active task" predicate becomes
`stage NOT IN ('done','cancelled') AND paused_at IS NULL` across all three points
(build_claim_clause / repo_has_active_task / _per_repo_snapshot). The terminal set
{done,cancelled} in serial_gate/task_deps/stages.py is byte-for-byte unchanged
(adr-0026 not regressed): task_deps/stages.py do NOT read paused_at, so a paused
declared dependency and an active repo_freeze STILL block (pause never bypasses
them — different axes). Anti-stale-base on resume relies on the existing deferred
branch cut (ORCH-088) + pre-merge auto_rebase_onto_main + merge-gate re-test
(ORCH-026/093/110) — no new rebase machinery.
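The new predicate can be sketched on a simplified `tasks` table in SQLite. The `repo` column, the function name, and the `pause_layer_on` switch are assumptions for illustration; the real predicate lives in build_claim_clause, repo_has_active_task, and _per_repo_snapshot. Freeze and dependency checks are deliberately absent here because they never read paused_at:

```python
import sqlite3

TERMINAL = ("done", "cancelled")  # terminal set, unchanged

def repo_has_active_task_sketch(conn: sqlite3.Connection, repo: str,
                                pause_layer_on: bool = True) -> bool:
    sql = "SELECT 1 FROM tasks WHERE repo = ? AND stage NOT IN (?, ?)"
    if pause_layer_on:
        # Orthogonal pause axis: a parked task no longer holds the FIFO gate.
        sql += " AND paused_at IS NULL"
    row = conn.execute(sql + " LIMIT 1", (repo, *TERMINAL)).fetchone()
    return row is not None
```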
Additive, under an independent sub-flag, never-raise, restart-safe; hot-claim
fail-OPEN and freeze fail-CLOSED preserved. STAGE_TRANSITIONS / QG_CHECKS / check_*
/ machine-verdict keys / existing table schemas are byte-for-byte untouched (this is
a queue-scheduler + observability change, not a Quality Gate).
- src/db.py: additive tasks.paused_at column (_ensure_column) + set/clear/is helpers
- src/serial_gate.py: _pause_layer_enabled() + pause-term in the 3 points; `paused`
list + per-job `reason` (freeze>dependency>active-task>null) in the /queue snapshot
- src/config.py + .env.example: serial_gate_pause_enabled (default True = true no-op)
- src/main.py: POST /serial-gate/pause|resume?work_item=<id> (modeled on unfreeze)
- tests/test_orch124_serial_gate_pause.py: TC-01 mandatory incident regression + TC-02..15
- CHANGELOG.md: [Unreleased] entry
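The per-job `reason` precedence in the /queue snapshot can be sketched as a pure function. The exact string values and the boolean inputs are assumptions; the point is that freeze and dependency are checked first and are computed without consulting paused_at, so a pause can only remove the active-task reason:

```python
from typing import Optional

def gate_reason_sketch(frozen: bool, blocked_by_dependency: bool,
                       repo_has_active_task: bool) -> Optional[str]:
    """Highest-priority blocking reason for a queued job, or None."""
    if frozen:
        return "freeze"
    if blocked_by_dependency:
        return "dependency"
    if repo_has_active_task:  # already excludes parked tasks
        return "active-task"
    return None
```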
ADR: docs/work-items/ORCH-124/06-adr/ADR-001-serial-gate-pause-without-blocking.md
Cross-cutting: docs/architecture/adr/adr-0051-serial-gate-pause-without-blocking.md
Refs: ORCH-124
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>