feat(metrics): lightweight read-only GET /metrics raw-signal endpoint (ORCH-099)
FND/F1a: add a versioned read-only JSON endpoint GET /metrics that exposes the
orchestrator's own raw state for the future observability sidecar F1b — active
task stages, job queue, agent-liveness (pid/runtime/cpu_ticks), and cost/tokens.
The orchestrator emits ONLY raw signal it alone knows; thresholds/alerts/history
live in the separate sidecar (observer separated from observed, BRD §1).
- src/metrics.py: new leaf collector build_metrics() (never-raise per section,
serial_gate.snapshot() pattern); envelope schema_version/generated_at/clk_tck +
stages/queue/agents/cost. _read_cpu_ticks(pid) reads utime+stime from
/proc/<pid>/stat (null on None/dead/non-Linux pid — never raises).
- src/main.py: thin @app.get("/metrics") wrapper (style of GET /queue).
- src/db.py: read-only helpers get_running_agents() (dedicated SELECT, not an
extension of the hot-path get_running_jobs()), agent_cost_totals(),
queue_retry_stats(); job_status_counts() default dict gains the cancelled key.
- src/config.py: metrics_endpoint_enabled kill-switch (default True), env
ORCH_METRICS_ENABLED via explicit validation_alias so the documented switch
actually controls the flag.
- docs: README API table row + CHANGELOG entry (contract section already added
by architect); .env.example ORCH_METRICS_ENABLED.
Strictly read-only / never-raise: STAGE_TRANSITIONS / QG_CHECKS / check_* /
machine-verdict keys / DB schema untouched; /health//status//queue byte-for-byte.
Tests: tests/test_metrics.py (TC-01..TC-11) + env-alias tests in test_config.py.
Full suite green (1482).
Refs: ORCH-099
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -1,7 +1,7 @@
|
||||
import logging
|
||||
import re
|
||||
|
||||
from pydantic import field_validator
|
||||
from pydantic import Field, field_validator
|
||||
from pydantic_settings import BaseSettings
|
||||
|
||||
|
||||
@@ -819,6 +819,17 @@ class Settings(BaseSettings):
|
||||
# 200 (was hardcoded 80). Invalid/empty value -> default (graceful, no crash).
|
||||
qg0_title_max: int = 200
|
||||
|
||||
# ORCH-099 (D8): operator off-switch for the read-only GET /metrics endpoint.
|
||||
# The env var is ORCH_METRICS_ENABLED (explicit validation_alias — the documented
|
||||
# contract name, ADR-001 D8 / README — overriding the default ORCH_ + field-name
|
||||
# mapping so the documented switch actually controls the flag). Default True ->
|
||||
# the endpoint is available out of the box (zero regression vs BRD). False ->
|
||||
# /metrics returns a minimal parsable body {"schema_version": 1, "enabled": false}
|
||||
# (200, NOT 404) so the F1b sidecar sees the off-switch explicitly. The endpoint
|
||||
# is inert / read-only anyway; the flag is a cheap self-hosting insurance on the
|
||||
# shared prod instance.
|
||||
metrics_endpoint_enabled: bool = Field(True, validation_alias="ORCH_METRICS_ENABLED")
|
||||
|
||||
@field_validator("qg0_title_max", mode="before")
|
||||
@classmethod
|
||||
def _qg0_title_max_default(cls, v):
|
||||
|
||||
Reference in New Issue
Block a user