Commit Graph

3 Commits

Author SHA1 Message Date
10d5e9fcaa feat(security): security-gate (gitleaks secret-scan + pip-audit) before merge
Some checks failed
CI / test (push) Failing after 19s
CI / test (pull_request) Failing after 18s
Add a deterministic (no-LLM) security sub-gate on the deploy-staging -> deploy
edge, run FIRST (before merge-gate ORCH-043 and image-freshness ORCH-058) so it
fails cheaply before any expensive rebase/rebuild, and scans origin/main..HEAD
before rebase so a task is never blamed for a CVE introduced by an updated main.

Why: the autonomous pipeline merged branches into main with no check for a leaked
secret or a vulnerable dependency. For the self-hosting orchestrator (one shared
prod instance serving every project from a shared DB) a single leak/CVE landed in
the prod of all projects (CLAUDE.md self-hosting, section 8).

- New leaf src/security_gate.py (never-raise): gitleaks (offline, fail-closed on
  tool error => secrets guarantee is unconditional) + pip-audit (best-effort;
  unreachable CVE feed degrades fail-open + loud warning by default, strict via
  security_dep_audit_fail_closed). Verdict lives ONLY in 17-security-report.md
  YAML frontmatter (write -> read-back single source of truth); FAIL is
  authoritative; missing/broken frontmatter => fail-closed.
- check_security_gate thin wrapper registered in QG_CHECKS (lazy import, no cycle).
- _handle_security_gate wired FIRST in advance_stage deploy-staging block: FAIL ->
  rollback to development + developer-retry (cap MAX_DEVELOPER_RETRIES); task_desc
  carries verbatim findings (ORCH-046 pattern). No merge-lease release (runs before
  lease acquire). Self-hosting safe: only reads/scans/writes, never deploys.
- Conditional rollout (security_gate_enabled + security_gate_repos; empty scope ->
  self-hosting only). 6 new ORCH_SECURITY_* settings.
- Infra: pinned gitleaks Go binary in Dockerfile (+curl/ca-certificates), pip-audit
  in requirements.txt, versioned .gitleaks.toml at repo root.
- STAGE_TRANSITIONS and DB schema unchanged.

Docs: docs/architecture/README.md (marked realized), CLAUDE.md (artifact 17),
CHANGELOG.md. Tests: test_security_gate.py, test_qg_security.py,
test_stage_engine_security_gate.py + updated registry/edge snapshots.

Refs: ORCH-022

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 17:23:13 +00:00
2f4c553fd8 feat(post-deploy): post-deploy prod monitoring + degradation reaction (ORCH-021)
Extend pipeline responsibility past deploy->done: after the terminal
transition for an applicable repo, arm a ~15min observation window that
probes prod and reacts to a degradation the restart-time health-check
missed ("green deploy, red prod").

- src/post_deploy.py: new leaf module (config + lazy qg/db only).
  Sentinel-file restart-safe state (.post-deploy-state-<repo>/<wi>/),
  no DB migration. probe_signals/classify/decide_action/run_rollback,
  all never-raise.
- Reserved-agent job `post-deploy-monitor` (no-LLM, Variant B, calque of
  deploy-finalizer): self-requeues each tick via enqueue_job.
- Deterministic classify: DEGRADED iff >= fail_threshold consecutive
  health failures OR window 5xx ratio > 5xx_threshold; fail-safe HEALTHY.
- Self-hosting invariant (BR-5/AC-8): a tick NEVER restarts the prod
  orchestrator container -> orchestrator is ALWAYS ALERT_ONLY.
- Conditionality (ORCH-35/36/43/58): kill-switch + CSV repos, empty ->
  self-hosting only.
- QG_CHECKS / STAGE_TRANSITIONS / schema unchanged (AC-12).
- Docs: CHANGELOG, CLAUDE artefact list (16-post-deploy-log.md),
  architecture README, .env.example (ORCH_POST_DEPLOY_*).

Refs: ORCH-021

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 14:40:06 +00:00
Dev Agent
7c68d1d812 docs(orchestrator): adopt enduro doc canon + CLAUDE.md + ADR (ORCH-9)
All checks were successful
CI / test (pull_request) Successful in 9s
2026-06-05 12:33:55 +03:00