Add the `watchdog/` package (thin, Python 3.12, stdlib-only daemon) and the `orchestrator-watchdog` compose service: the brain half of the domain-0 observability pair.

F1a (ORCH-099) exposes the raw signal at GET /metrics. F1b reads it, augments it with host / container / dependency probes, and runs each signal through a generalised pure decision function, decide(signal_active, prev, now, cooldown), a strict superset of disk_watchdog.decide_action, with per-signal in-memory dedup/throttle/recovery. It alerts over its OWN independent Telegram channel.

Key properties (ADR-001):
- Observer separated from observed: separate container; /metrics not answering is itself the master `orch_down` alarm (debounced over K ticks, so a single hiccup does not flap).
- Strictly read-only: docker.sock GET-only and mounted :ro (double guard), host paths :ro, no DB/disk writes, no process control; safe for self-hosting.
- Never-raise on three levels (per-source / per-tick / per-send), plus a WATCHDOG_ENABLED kill switch (disabled -> inert idle loop, not exit).
- Disk anti-duplicate (D6): disk_watchdog (ORCH-063) stays the sole owner of the 85% alert; the sidecar carries orch_down plus an opt-in 97% ceiling (off by default).
- NO import from src/** (C-1); src/**, STAGE_TRANSITIONS, QG_CHECKS, check_*, and the DB schema are untouched.

env_file is optional, so a missing .env.watchdog never breaks `docker compose up` for the prod orchestrator.

Tests: tests/watchdog/ (TC-01…TC-13) plus a green full tests/ regression run (TC-14).
Docs: CHANGELOG, .env.example canon (WATCHDOG_*); architecture README and adr-0033 authored at the architecture stage.

Refs: ORCH-100
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
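The commit describes decide(signal_active, prev, now, cooldown) only by its signature and behaviour (dedup, throttle, recovery), plus a K-tick debounce in front of `orch_down`. The block below is a minimal sketch of how such a pure step could look, not the watchdog's actual code or disk_watchdog.decide_action: the `SignalState` record, the `"alert"`/`"recovery"` action strings, the remind-after-cooldown policy and the `Debouncer` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SignalState:
    """Hypothetical per-signal memory kept between ticks (in-memory only)."""
    active: bool = False
    last_alert: Optional[float] = None  # time of the last alert sent, in seconds


def decide(signal_active: bool, prev: SignalState, now: float,
           cooldown: float) -> Tuple[Optional[str], SignalState]:
    """Pure decision step: returns (action, next_state).

    action is "alert", "recovery" or None; no I/O, so it is trivially testable.
    """
    if signal_active:
        if not prev.active:
            # Rising edge: alert at once.
            return "alert", SignalState(active=True, last_alert=now)
        if prev.last_alert is None or now - prev.last_alert >= cooldown:
            # Still active and the cooldown has elapsed: throttled reminder.
            return "alert", SignalState(active=True, last_alert=now)
        # Still active inside the cooldown window: dedup, stay silent.
        return None, prev
    if prev.active:
        # Falling edge: announce recovery once and reset the throttle.
        return "recovery", SignalState()
    return None, prev


class Debouncer:
    """Hypothetical helper: report failure only after k consecutive failing ticks."""

    def __init__(self, k: int) -> None:
        self.k = max(1, int(k))  # clamp instead of raising (never-raise spirit)
        self.streak = 0

    def update(self, failed: bool) -> bool:
        self.streak = self.streak + 1 if failed else 0
        return self.streak >= self.k


if __name__ == "__main__":
    # /metrics probe results over 7 ticks, 30 s apart; K = 3, cooldown = 600 s.
    debounce = Debouncer(k=3)
    state = SignalState()
    for tick, probe_failed in enumerate([True, False, True, True, True, True, False]):
        orch_down = debounce.update(probe_failed)
        action, state = decide(orch_down, state, now=tick * 30.0, cooldown=600.0)
        if action:
            print(tick, action)
    # prints "4 alert" then "6 recovery": the lone failure at tick 0 never alerts.
```

Keeping the state transition pure means the loop that owns I/O (probing, sending to Telegram) stays thin, and every dedup/throttle/recovery edge can be unit-tested with plain values.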
"""Dependency ping collector: reachable / unreachable / 5xx (never-raise)."""

from watchdog.collectors import deps as deps_mod

from .conftest import http_error, make_opener


def test_ping_reachable():
    assert deps_mod.ping("http://x", 1.0, opener=make_opener(status=200)) is True


def test_ping_4xx_still_reachable():
    # A 4xx proves the host is up (we ping for liveness, not auth).
    assert deps_mod.ping("http://x", 1.0, opener=make_opener(exc=http_error(404))) is True


def test_ping_5xx_is_down():
    assert deps_mod.ping("http://x", 1.0, opener=make_opener(exc=http_error(503))) is False


def test_ping_timeout_is_down():
    assert deps_mod.ping(
        "http://x", 1.0, opener=make_opener(exc=TimeoutError())
    ) is False


def test_ping_all_mixed():
    def opener_factory(url):
        return make_opener(status=200) if "good" in url else make_opener(
            exc=ConnectionError()
        )

    def opener(req, timeout=None):
        url = req.full_url if hasattr(req, "full_url") else req
        return opener_factory(url)(req, timeout)

    res = deps_mod.ping_all(
        {"good": "http://good", "bad": "http://bad"}, 1.0, opener=opener
    )
    assert res == {"good": True, "bad": False}