Add the `watchdog/` package (thin Python-3.12 stdlib-only daemon) and the
`orchestrator-watchdog` compose service — the brain half of the domain-0
observability pair. F1a (ORCH-099) exposes GET /metrics raw signal; F1b reads it,
augments with host / container / dependency probes, runs each signal through a
generalised pure decision function (decide(signal_active, prev, now, cooldown),
a strict superset of disk_watchdog.decide_action) with per-signal in-memory
dedup/throttle/recovery, and alerts over its OWN independent Telegram channel.
Key properties (ADR-001):
- Observer separated from observed: separate container; /metrics not answering is
itself the master `orch_down` alarm (debounced K ticks — no flap on a hiccup).
- Strictly read-only: docker.sock GET-only + mounted :ro (double guard), host
paths :ro, no DB/disk writes, no process control — self-hosting-safe.
- never-raise on three levels (per-source/per-tick/per-send) + WATCHDOG_ENABLED
kill-switch (disabled -> inert idle-loop, not exit).
- Disk anti-duplicate (D6): disk_watchdog (ORCH-063) stays sole owner of the 85%
alert; sidecar carries orch_down + an opt-in 97% ceiling (default off).
- NO import from src/** (C-1); src/**, STAGE_TRANSITIONS, QG_CHECKS, check_*, DB
schema — untouched. env_file optional so a missing .env.watchdog never breaks
`docker compose up` for the prod orchestrator.
Tests: tests/watchdog/ (TC-01…TC-13) + full tests/ regression green (TC-14).
Docs: CHANGELOG, .env.example canon (WATCHDOG_*); architecture README + adr-0033
authored at the architecture stage.
Refs: ORCH-100
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both compose services (orchestrator, orchestrator-staging) now declare
user: "1000:1000" so pipeline artifacts (git worktree, docs/work-items
commits) are created as slin:slin on the host — git pull/reset under slin
no longer fail with permission errors. docker.sock access preserved via
group_add: ["999"]. SSH mount target aligned with the launcher-forced
HOME=/home/slin (/root/.ssh -> /home/slin/.ssh). launcher.py and Dockerfile
unchanged. INFRA.md and CHANGELOG.md updated; host-prerequisites (P-1..P-4)
documented.
Refs: ORCH-040
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add orchestrator-staging compose service under profile 'staging'
so normal 'docker compose up -d' does NOT start it.
- Port 8501 via command override; network_mode: host (no ports mapping needed).
- DB isolation via separate volume ./data/staging:/app/data — physically
separate from prod ./data/orchestrator.db on the host.
- ORCH_DB_PATH=/app/data/orchestrator.db explicit in env (same container
path, isolated by volume mount).
- Add .env.staging.example with all required keys and placeholders.
- Update .gitignore: add .env.staging and data/staging/ exclusions.
- Add docs/STAGING.md: how to start staging, architecture table, roadmap.
Refs: ORCH-31 (Stage 1 of 5)