feat(metrics): lightweight read-only GET /metrics raw-signal endpoint (ORCH-099) #111
Branch: `feature/ORCH-099-fnd-f1a-metrics-agent-liveness`
ORCH-099, FND/F1a: lightweight read-only `GET /metrics` (raw signal for sidecar F1b)
Adds a versioned, strictly read-only, never-raise JSON endpoint `GET /metrics` that exposes the orchestrator's own raw state for the future observability sidecar F1b (`watchdog/`): active task stages, job queue, agent liveness, and cost/tokens. The orchestrator emits ONLY the raw signal that it alone knows; thresholds, alerts, and history live in the separate sidecar (observer separated from observed, BRD §1).
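A minimal sketch of a versioned, never-raise envelope in the spirit of `build_metrics()`. This is a stand-alone illustration, not the code in `src/metrics.py`: the collector mapping, the name `build_metrics_sketch`, the `SCHEMA_VERSION` value, the ISO-8601 `generated_at` format, and the `{"error": ...}` marker for a failed section are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict

SCHEMA_VERSION = 1  # hypothetical value; the real one is defined in src/metrics.py
SECTIONS = ("stages", "queue", "agents", "cost")


def _safe_section(collect: Callable[[], Any]) -> Any:
    """Run one section collector; a failure becomes a marker instead of an exception."""
    try:
        return collect()
    except Exception as exc:  # never-raise: one broken source must not sink the response
        return {"error": type(exc).__name__}  # hypothetical marker shape


def build_metrics_sketch(collectors: Dict[str, Callable[[], Any]], clk_tck: int) -> dict:
    """Assemble the versioned envelope, then fill each section independently."""
    envelope: Dict[str, Any] = {
        "schema_version": SCHEMA_VERSION,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "clk_tck": clk_tck,
    }
    for name in SECTIONS:
        # A section with no registered collector is reported as null.
        envelope[name] = _safe_section(collectors.get(name, lambda: None))
    return envelope
```

Isolating each section means, for example, that an unavailable queue source degrades only `queue` while `stages`, `agents`, and `cost` still report. Exposing `clk_tck` lets the consumer convert raw tick counts to seconds (`ticks / clk_tck`) without the orchestrator interpreting anything.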
What changed
- `src/metrics.py` (new leaf module): `build_metrics() -> dict`, never-raise per section (the `serial_gate.snapshot()` pattern). Envelope: `schema_version` / `generated_at` / `clk_tck`, plus the sections `stages` / `queue` / `agents` / `cost`. `_read_cpu_ticks(pid)` reads `utime + stime` from `/proc/<pid>/stat` and returns `null` when the pid is `None` or dead, or on non-Linux hosts; it never raises.
- `src/main.py`: thin `@app.get("/metrics")` wrapper (in the style of `GET /queue`).
- `src/db.py`: read-only helpers `get_running_agents()` (a dedicated SELECT, not an extension of the hot-path `get_running_jobs()`), `agent_cost_totals()`, and `queue_retry_stats()`; the `job_status_counts()` default dict gains the `cancelled` key.
- `src/config.py`: `metrics_endpoint_enabled` kill-switch (default `True`), read from env `ORCH_METRICS_ENABLED` via an explicit `validation_alias`, so that the documented switch actually controls the flag.
- docs: README API table row + CHANGELOG entry (contract section already added by architect); `.env.example` entry for `ORCH_METRICS_ENABLED`.
Invariants
- Strictly read-only / never-raise: `STAGE_TRANSITIONS` / `QG_CHECKS` / `check_*` / machine-verdict keys / DB schema untouched; `/health`, `/status`, `/queue` unchanged byte-for-byte.
- Self-hosting-safe: physically cannot affect the shared prod pipeline.
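To illustrate the read-only invariant, here is a sketch of two helpers in the style of the `src/db.py` additions, plus a before/after dump comparison similar in spirit to the read-only DB snapshot test. It assumes an SQLite backend and a hypothetical `jobs` / `agent_runs` schema with made-up status values and columns; the real tables, columns, and helper bodies are not shown in this PR, and `snapshot()` / `make_demo_db()` are illustrative names only.

```python
import sqlite3
from typing import Dict, List

# Hypothetical status set; this PR only states that "cancelled" joins the defaults.
JOB_STATUSES = ("queued", "running", "done", "failed", "cancelled")


def job_status_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Jobs per status; every known status is present, even with a count of 0."""
    counts = {status: 0 for status in JOB_STATUSES}
    for status, n in conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status"):
        counts[status] = n
    return counts


def agent_cost_totals(conn: sqlite3.Connection) -> dict:
    """Token and cost totals; an empty table yields zeros rather than NULL."""
    tokens, cost = conn.execute(
        "SELECT COALESCE(SUM(tokens), 0), COALESCE(SUM(cost_usd), 0.0) FROM agent_runs"
    ).fetchone()
    return {"tokens": tokens, "cost_usd": cost}


def snapshot(conn: sqlite3.Connection) -> List[str]:
    """Full SQL dump (schema + rows), compared before/after to show nothing was written."""
    return list(conn.iterdump())


def make_demo_db() -> sqlite3.Connection:
    """In-memory DB with a hypothetical schema: three jobs, no agent runs yet."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE agent_runs (id INTEGER PRIMARY KEY, tokens INTEGER, cost_usd REAL);
        INSERT INTO jobs (status) VALUES ('queued'), ('running'), ('running');
        """
    )
    return conn
```

On `make_demo_db()`, `job_status_counts()` returns `{'queued': 1, 'running': 2, 'done': 0, 'failed': 0, 'cancelled': 0}`, `agent_cost_totals()` returns `{'tokens': 0, 'cost_usd': 0.0}` for the empty table, and `snapshot()` is identical before and after both calls. Pre-seeding every status with 0 keeps the JSON shape stable whether or not any job currently has that status.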
Tests
- `tests/test_metrics.py` (TC-01..TC-11): envelope + 4 sections, terminal exclusion, queue fields, liveness raw + `cpu_ticks` on a live pid, never-raise on `pid=None` / dead pid / throwing source / unavailable breaker, cost aggregate + empty table, endpoint via handler, read-only DB snapshot before/after, additivity vs `/health`, `/status`, `/queue`, empty state, kill-switch.
- Env-alias tests in `test_config.py`.
Full suite green (1482).
ADR: `docs/work-items/ORCH-099/06-adr/ADR-001-metrics-endpoint.md`; cross-cutting: `docs/architecture/adr/adr-0030-metrics-endpoint.md`.
Refs: ORCH-099
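The liveness reader exercised by the live-pid and dead-pid tests can be approximated as below. `read_cpu_ticks` is a hypothetical stand-in for `_read_cpu_ticks(pid)`, written from the `/proc/[pid]/stat` field layout documented in proc(5); obtaining `clk_tck` through `os.sysconf("SC_CLK_TCK")` is likewise an assumption about how the envelope field is filled.

```python
import os
import sys
from typing import Optional


def read_cpu_ticks(pid: Optional[int]) -> Optional[int]:
    """utime + stime of `pid` in clock ticks, or None when unavailable; never raises."""
    if pid is None or not sys.platform.startswith("linux"):
        return None
    try:
        with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as fh:
            raw = fh.read()
    except OSError:  # dead pid, no /proc mount, permission problem, ...
        return None
    try:
        # Field 2 (comm) is parenthesised and may contain spaces or ')', so split
        # after the LAST ')'. The remainder then starts at field 3 (state).
        rest = raw[raw.rindex(")") + 1:].split()
        # proc(5) numbers utime and stime as fields 14 and 15, i.e. rest[11], rest[12].
        return int(rest[11]) + int(rest[12])
    except (ValueError, IndexError):  # truncated or unexpected stat line
        return None


def clk_tck() -> Optional[int]:
    """Kernel clock ticks per second, needed to turn ticks into seconds."""
    try:
        return os.sysconf("SC_CLK_TCK")
    except (AttributeError, ValueError, OSError):  # no sysconf (e.g. Windows)
        return None
```

A consumer such as the sidecar could sample the value twice and divide the difference by `clk_tck` to get the CPU seconds spent in between; per this PR, that interpretation lives outside the orchestrator.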
🤖 Generated with Claude Code
FND/F1a: add a versioned read-only JSON endpoint GET /metrics that exposes the orchestrator's own raw state for the future observability sidecar F1b: active task stages, job queue, agent liveness (pid/runtime/cpu_ticks), and cost/tokens. The orchestrator emits ONLY raw signal it alone knows; thresholds/alerts/history live in the separate sidecar (observer separated from observed, BRD §1).

- src/metrics.py: new leaf collector build_metrics() (never-raise per section, serial_gate.snapshot() pattern); envelope schema_version/generated_at/clk_tck + stages/queue/agents/cost. _read_cpu_ticks(pid) reads utime+stime from /proc/<pid>/stat (null on None/dead pid or non-Linux host; never raises).
- src/main.py: thin @app.get("/metrics") wrapper (style of GET /queue).
- src/db.py: read-only helpers get_running_agents() (dedicated SELECT, not an extension of the hot-path get_running_jobs()), agent_cost_totals(), queue_retry_stats(); job_status_counts() default dict gains the cancelled key.
- src/config.py: metrics_endpoint_enabled kill-switch (default True), env ORCH_METRICS_ENABLED via explicit validation_alias so the documented switch actually controls the flag.
- docs: README API table row + CHANGELOG entry (contract section already added by architect); .env.example ORCH_METRICS_ENABLED.

Strictly read-only / never-raise: STAGE_TRANSITIONS / QG_CHECKS / check_* / machine-verdict keys / DB schema untouched; /health, /status, /queue unchanged byte-for-byte.

Tests: tests/test_metrics.py (TC-01..TC-11) + env-alias tests in test_config.py. Full suite green (1482).

Refs: ORCH-099
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

e0d78f3035 to fda1bea9b8