Adds src/disk_watchdog.py, a background daemon thread modelled on
reconciler/job_reaper. It measures host-FS fill through the mounted bind-paths
(/repos, /app/data) with shutil.disk_usage and sends the operator a Telegram
alert when usage is >= threshold (default 85%). This fills the missing
proactive signal: on 07.06.2026 the mva154 host disk silently hit 100% and
stalled the whole self-hosting pipeline.
- Pure decide_action(used_pct, threshold, prev, now, realert_s): alerts on an
upward threshold crossing, re-alerts after the cooldown, and signals a single
recovery once usage drops below threshold (unit-tested without a thread/timer;
clock injected).
- measure_paths: shutil.disk_usage per path, dedup by st_dev, per-path
never-raise (a broken path never fails the tick).
- Config flags ORCH_DISK_MONITOR_* with defensive validation (threshold must be
1..100 and intervals positive; invalid values fall back to the default with a
warning). When the kill-switch is active, the daemon does not start.
- Additive disk_monitor block in GET /queue; start/stop in main.lifespan.
- Never-raise throughout (per-path/per-tick/per-send). STAGE_TRANSITIONS,
QG_CHECKS, check_* and the DB schema are untouched; no migration is needed
because the anti-spam state is kept in memory.
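As a rough illustration of the two pure helpers listed above, the sketch below
shows one way decide_action and measure_paths could look. The AlertState shape
of prev, the action strings and the return types are assumptions made for this
example only; they are not taken from src/disk_watchdog.py.

```python
from __future__ import annotations

import os
import shutil
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertState:
    # Hypothetical shape of `prev`; the real module may track this differently.
    alerting: bool = False
    last_alert_at: Optional[float] = None


def decide_action(used_pct: float, threshold: float, prev: AlertState,
                  now: float, realert_s: float) -> tuple[str, AlertState]:
    """Pure decision step: no thread, no timer, the clock is passed in as `now`."""
    if used_pct >= threshold:
        if not prev.alerting:
            # Upward crossing: first alert.
            return "alert", AlertState(True, now)
        if prev.last_alert_at is None or now - prev.last_alert_at >= realert_s:
            # Still above threshold and the cooldown has elapsed.
            return "realert", AlertState(True, now)
        return "none", prev
    if prev.alerting:
        # Dropped below threshold: report recovery exactly once.
        return "recovered", AlertState(False, None)
    return "none", prev


def measure_paths(paths: list[str]) -> list[tuple[str, float]]:
    """Return (path, used_pct) per distinct device; a broken path is skipped."""
    seen_devices: set[int] = set()
    results: list[tuple[str, float]] = []
    for path in paths:
        try:
            device = os.stat(path).st_dev
            if device in seen_devices:
                continue  # same filesystem already measured via another path
            usage = shutil.disk_usage(path)
            seen_devices.add(device)
            used_pct = usage.used / usage.total * 100.0 if usage.total else 0.0
            results.append((path, used_pct))
        except Exception:
            # Broad on purpose: one unreadable path must not fail the tick.
            continue
    return results
```

Because decide_action takes the clock as an argument, a test can step through
alert, cooldown and recovery by passing increasing `now` values, with no
sleeping or mocking.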
Tests: tests/test_disk_watchdog.py (TC-01..TC-12, 18 cases); full suite green
(1296). Docs: INFRA.md, .env.example, CHANGELOG.md (architecture/README.md +
ADRs authored at architecture stage).
Refs: ORCH-063
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>