fix(launcher): runs log dir from settings, not hardcoded /app (CI fix)

test_spawn_stamps_resolved_effort failed in CI with a PermissionError on '/app': launcher._spawn hardcoded output_path='/app/data/runs/{run_id}.log' and called os.makedirs('/app/data/runs'). /app exists in the container but not on the CI host (act_runner hostexecutor), so makedirs raised and CI went red.

Root-cause fix (not just the test): the base directory for per-run logs moves to Settings.runs_dir (env ORCH_RUNS_DIR, default '/app/data/runs', identical to the prod layout). A new helper, _run_log_path(run_id), is the single source of the path; it is used in _spawn and replaces the three former inline strings in logs/alerts. The test monkeypatches settings.runs_dir to tmp_path, which makes it environment-independent (verified by a run with /app forcibly inaccessible).

pytest tests/ -q: 1090 passed. STAGE_TRANSITIONS, QG_CHECKS, and the DB schema are untouched. Docs: README env table, CHANGELOG.

Refs: ORCH-087
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:55:08 +03:00
parent a7b27f2235
commit 81fc2df8a8
5 changed files with 24 additions and 4 deletions
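
The pattern described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the project's code: the real `Settings` class derives from `BaseSettings`, while the dataclass here is a stand-in that reads `ORCH_RUNS_DIR` by hand, and `run_log_path` / `prepare_run_log` are hypothetical names modelled on `_run_log_path` and the `makedirs` step in `_spawn`.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Stand-in for the project's settings object (assumption: the real one
    # maps ORCH_RUNS_DIR to runs_dir automatically). The default keeps the
    # container layout.
    runs_dir: str = field(
        default_factory=lambda: os.environ.get("ORCH_RUNS_DIR", "/app/data/runs")
    )


settings = Settings()


def run_log_path(run_id):
    """Single source of truth for a per-run log path."""
    # Read settings.runs_dir at call time so later overrides are honoured.
    return os.path.join(settings.runs_dir, f"{run_id}.log")


def prepare_run_log(run_id):
    """Create the log directory on demand and return the log file path."""
    output_path = run_log_path(run_id)
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    return output_path
```

Because the helper resolves `settings.runs_dir` on every call rather than at import time, pointing the setting at a writable directory is enough to redirect both the `makedirs` call and every message that mentions the log path.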
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,7 @@
 Format: [Keep a Changelog](https://keepachangelog.com/). One entry per meaningful PR/task.

 ## [Unreleased]
+- **CI fix: per-run log path moved from hardcoded `/app/data/runs` to `settings.runs_dir`** (ORCH-087, `fix`): the test `tests/test_launcher.py::TestEffortStamp::test_spawn_stamps_resolved_effort` failed in CI (`PermissionError: [Errno 13] … '/app'`): green locally in the container (where `/app` exists), red on the CI host (act_runner hostexecutor, user without access to `/app`). **Root cause:** `launcher._spawn` hardcoded `output_path="/app/data/runs/{run_id}.log"` plus `os.makedirs('/app/data/runs')`, and the test called `_spawn` without mocking the path → makedirs raised on the inaccessible `/app`. **Fix (root cause, not just the test):** the base directory for per-run logs is moved to `Settings.runs_dir` (env `ORCH_RUNS_DIR`, default `/app/data/runs`, matching the prod layout 1:1); the new helper `launcher._run_log_path(run_id)` = `<settings.runs_dir>/{run_id}.log` is now the single source of the path (used in `_spawn` and in place of the three former inline strings in logs/alerts). The test `monkeypatch`es `settings.runs_dir` to `tmp_path` → environment-independent (confirmed by a run with `/app` forcibly inaccessible). `STAGE_TRANSITIONS`/`QG_CHECKS`/DB schema: unchanged. Documentation: `README.md` (env table), `CHANGELOG.md`.
 - **Live tracker: cleanup of orphaned cards + effort in the stage line + honest total time** (ORCH-087, `fix`): "frozen" orphans periodically showed up in the chat: an old card titled `📍 To Analyse` stayed attached to a task that had actually reached `deploy` (screenshot ORCH-082). **Root cause (G0/ADR-001):** the pointer `tasks.tracker_message_id` is a scalar (it knows only the LAST `message_id`), so when bump mode desynchronized (dominant causes: a race between two `update_task_tracker` calls, and `delete` fail + `send` ok) the reference to the previous card was lost for good → the orphan was never deleted and never updated again (rendering works correctly; what froze was the lost mid). **Fix (bump stays the default: the "card at the bottom" feature, ORCH-042/067):**
  - **G1 (full mid accounting):** additive ledger table `tracker_messages(task_id, message_id, created_at, deleted_at)` (`src/db.py`) + helpers `add_tracker_message`/`get_open_tracker_messages`/`mark_tracker_message_deleted`. On every bump ALL open mids (`deleted_at IS NULL`) are cleaned up, not just the scalar: success/"already gone" (`_DELETE_GONE_MARKERS`) → `deleted_at` is set; a transient `delete` failure → the row stays for a retry; the new mid goes into the ledger and `set_tracker_message_id` is called ONLY on a successful `send` (R-3/BR-6). The residual race self-heals within one transition (no lock is introduced). The scalar `tracker_message_id` is kept (BC). Known limitation: Telegram's 48h window (orphans older than that cannot be deleted).
  - **G3 (deploy cycle):** the key `confirm_deploy` was added to `_LIVE_BRANCH_LABELS` («⏳ Confirm Deploy — подтвердите прод-деплой», without a base-alias) → the chain `Awaiting Deploy → Deploying → Confirm Deploy → Monitoring → Done` is now complete.
--- a/README.md
+++ b/README.md
@@ -121,6 +121,7 @@ uvicorn src.main:app --reload --port 8500
 | `ORCH_REPOS_DIR` | Repos dir (container) | `/repos` |
 | `ORCH_HOST_REPOS_DIR` | Repos dir (host) | `/home/slin/repos` |
 | `ORCH_DB_PATH` | SQLite path | `/app/data/orchestrator.db` |
+| `ORCH_RUNS_DIR` | Base directory for per-run agent logs (`<runs_dir>/{run_id}.log`, ORCH-087) | `/app/data/runs` |
 | `ORCH_MAX_CONCURRENCY` | Number of jobs the worker runs in parallel (ORCH-1) | `1` |
 | `ORCH_QUEUE_POLL_INTERVAL` | Worker queue polling interval, seconds (ORCH-1) | `2.0` |
 | `ORCH_PREFLIGHT_CACHE_TTL` | Preflight cache TTL (CLI/net), seconds (ORCH-1 resilience) | `45` |
--- a/src/agents/launcher.py
+++ b/src/agents/launcher.py
@@ -223,6 +223,16 @@ def resolve_agent_effort(agent: str, project_id: str = None) -> str:
    return value


+def _run_log_path(run_id):
+    """Absolute path of a per-run agent log: ``<settings.runs_dir>/<run_id>.log``.
+
+    ORCH-087: single source of truth for the log path so it follows
+    ``settings.runs_dir`` everywhere (no hardcoded ``/app/data/runs``), which keeps
+    ``_spawn`` writable on non-container hosts (CI) where ``/app`` is inaccessible.
+    """
+    return os.path.join(settings.runs_dir, f"{run_id}.log")
+
+
 def prune_run_logs(runs_dir, keep_days=30, keep_max=500, active_paths=None):
    """L-2: best-effort rotation of per-run logs (<runs_dir>/*.log).

@@ -461,7 +471,7 @@ class AgentLauncher:
            conn.commit()

        # Prepare output log path
-        output_path = f"/app/data/runs/{run_id}.log"
+        output_path = _run_log_path(run_id)
        os.makedirs(os.path.dirname(output_path), exist_ok=True)

        # Build the claude command
@@ -823,7 +833,7 @@ class AgentLauncher:
            if task_row and agent != "deployer":  # deployer handled above
                _tid, _wid = task_row
                from ..notifications import send_telegram, link_for
-                send_telegram(f"\u26a0\ufe0f {link_for(_wid, _tid)}: Agent {agent} failed (exit_code={exit_code}). Check logs: /app/data/runs/{run_id}.log")
+                send_telegram(f"\u26a0\ufe0f {link_for(_wid, _tid)}: Agent {agent} failed (exit_code={exit_code}). Check logs: {_run_log_path(run_id)}")

        # Feature 4 + ORCH-016: post the unified per-agent status comment under
        # that agent's bot, threading the wall-clock duration we just measured
@@ -885,7 +895,7 @@ class AgentLauncher:

            # Classify the failure from the agent log tail (no token cost).
            kind, retry_after = "permanent", None
-            log_path = output_path or f"/app/data/runs/{run_id}.log"
+            log_path = output_path or _run_log_path(run_id)
            try:
                kind, retry_after = classify_log_file(log_path)
            except Exception:
@@ -948,7 +958,7 @@ class AgentLauncher:
            from ..notifications import send_telegram
            send_telegram(
                f"\U0001f6a8 Job {job_id} ({agent}, repo {job.get('repo')}) "
-                f"failed: {why}. Logs: /app/data/runs/{run_id}.log"
+                f"failed: {why}. Logs: {_run_log_path(run_id)}"
            )
        except Exception:
            pass
--- a/src/config.py
+++ b/src/config.py
@@ -44,6 +44,10 @@ class Settings(BaseSettings):
    repos_dir: str = "/repos"
    host_repos_dir: str = "/home/slin/repos"
    worktrees_dir: str = "/repos/_wt"  # ORCH-2 / S-4: isolated worktree per task/branch
+    # ORCH-087: base dir for per-run agent logs (<runs_dir>/<run_id>.log). Lifted out
+    # of the hardcoded '/app/data/runs' so tests (and any non-container host) can point
+    # it at a writable path; default preserves the container layout.
+    runs_dir: str = "/app/data/runs"

    # DB
    db_path: str = "/app/data/orchestrator.db"
--- a/tests/test_launcher.py
+++ b/tests/test_launcher.py
@@ -363,6 +363,10 @@ class TestEffortStamp:
        repo = "orchestrator"
        (tmp_path / repo).mkdir()
        monkeypatch.setattr(L.settings, "repos_dir", str(tmp_path), raising=False)
+        # ORCH-087: per-run log dir must be writable on a non-container host (CI runs
+        # as a plain user where '/app' is denied). Point it at tmp_path so _spawn's
+        # makedirs/open never touch the hardcoded '/app/data/runs'.
+        monkeypatch.setattr(L.settings, "runs_dir", str(tmp_path / "runs"), raising=False)
        monkeypatch.setattr(L, "ensure_worktree", lambda r, b: str(tmp_path / repo))
        monkeypatch.setattr("src.projects.get_project_by_repo", lambda r: None)
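
The test-side isolation in the hunk above can be reproduced without pytest. The sketch below is an assumption-laden stand-in, not the project's test: `settings` is a plain `SimpleNamespace`, `spawn_log` is a hypothetical reduction of `_spawn` to its log-preparation step, and `unittest.mock.patch.object` plays the role that `monkeypatch.setattr` plays in the real test.

```python
import os
import tempfile
from types import SimpleNamespace
from unittest import mock

# Stand-in for the project's settings object, with the container default.
settings = SimpleNamespace(runs_dir="/app/data/runs")


def run_log_path(run_id):
    # Resolved at call time, so a patched runs_dir takes effect immediately.
    return os.path.join(settings.runs_dir, f"{run_id}.log")


def spawn_log(run_id):
    """Hypothetical reduction of _spawn: create the log dir, open the log."""
    path = run_log_path(run_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write("started\n")
    return path


def check_spawn_uses_configured_dir():
    """Redirect runs_dir to a temp dir so nothing touches /app."""
    with tempfile.TemporaryDirectory() as tmp:
        runs = os.path.join(tmp, "runs")
        with mock.patch.object(settings, "runs_dir", runs):
            path = spawn_log("run-1")
            assert path == os.path.join(runs, "run-1.log")
            assert os.path.isfile(path)
        # The patch is undone on exit, so the default is back in place.
        assert settings.runs_dir == "/app/data/runs"
        return path
```

The check passes on any host with a writable temp directory, whether or not `/app` exists, which is the property the commit aims for in CI.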