Add the `watchdog/` package (thin Python-3.12 stdlib-only daemon) and the `orchestrator-watchdog` compose service: the brain half of the domain-0 observability pair.

F1a (ORCH-099) exposes the raw signal at GET /metrics; F1b reads it, augments it with host / container / dependency probes, runs each signal through a generalised pure decision function (decide(signal_active, prev, now, cooldown), a strict superset of disk_watchdog.decide_action) with per-signal in-memory dedup/throttle/recovery, and alerts over its OWN independent Telegram channel.

Key properties (ADR-001):
- Observer separated from observed: separate container; /metrics not answering is itself the master `orch_down` alarm (debounced K ticks, so no flap on a hiccup).
- Strictly read-only: docker.sock GET-only + mounted :ro (double guard), host paths :ro, no DB/disk writes, no process control; self-hosting-safe.
- never-raise on three levels (per-source/per-tick/per-send) + WATCHDOG_ENABLED kill-switch (disabled -> inert idle-loop, not exit).
- Disk anti-duplicate (D6): disk_watchdog (ORCH-063) stays sole owner of the 85% alert; sidecar carries orch_down + an opt-in 97% ceiling (default off).
- NO import from src/** (C-1); src/**, STAGE_TRANSITIONS, QG_CHECKS, check_*, DB schema are untouched.

env_file is optional, so a missing .env.watchdog never breaks `docker compose up` for the prod orchestrator.

Tests: tests/watchdog/ (TC-01…TC-13) + full tests/ regression green (TC-14).
Docs: CHANGELOG, .env.example canon (WATCHDOG_*); architecture README + adr-0033 authored at the architecture stage.

Refs: ORCH-100
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
55 lines
1.6 KiB
Python
"""Host collector: /proc/meminfo parsing + disk reads (never-raise)."""
|
|
import os
|
|
import tempfile
|
|
|
|
from watchdog.collectors import host as host_mod
|
|
|
|
|
|
def test_mem_used_pct_from_meminfo():
|
|
content = "MemTotal: 1000 kB\nMemFree: 100 kB\nMemAvailable: 250 kB\n"
|
|
with tempfile.NamedTemporaryFile("w", suffix=".meminfo", delete=False) as f:
|
|
f.write(content)
|
|
path = f.name
|
|
try:
|
|
pct = host_mod.read_mem_used_pct(path)
|
|
# used = (1 - 250/1000) * 100 = 75.0
|
|
assert pct == 75.0
|
|
finally:
|
|
os.unlink(path)
|
|
|
|
|
|
def test_mem_used_pct_missing_file_is_none():
|
|
assert host_mod.read_mem_used_pct("/no/such/meminfo") is None
|
|
|
|
|
|
def test_mem_used_pct_garbage_is_none():
|
|
with tempfile.NamedTemporaryFile("w", delete=False) as f:
|
|
f.write("totally not meminfo\n")
|
|
path = f.name
|
|
try:
|
|
assert host_mod.read_mem_used_pct(path) is None
|
|
finally:
|
|
os.unlink(path)
|
|
|
|
|
|
def test_disk_used_pct_real_path():
|
|
pct = host_mod.read_disk_used_pct("/")
|
|
assert pct is None or (0.0 <= pct <= 100.0)
|
|
|
|
|
|
def test_disk_used_pct_missing_path_is_none():
|
|
assert host_mod.read_disk_used_pct("/no/such/path/xyz") is None
|
|
|
|
|
|
def test_max_disk_used_pct_picks_worst(monkeypatch):
|
|
monkeypatch.setattr(
|
|
host_mod, "read_disk_used_pct",
|
|
lambda p: {"/a": 10.0, "/b": 80.0, "/c": None}.get(p),
|
|
)
|
|
assert host_mod.max_disk_used_pct(["/a", "/b", "/c"]) == ("/b", 80.0)
|
|
|
|
|
|
def test_max_disk_used_pct_all_unreadable(monkeypatch):
|
|
monkeypatch.setattr(host_mod, "read_disk_used_pct", lambda p: None)
|
|
assert host_mod.max_disk_used_pct(["/a", "/b"]) is None
|