feat(lessons): machine lessons-journal — additive table + observer leaf (ORCH-098)
Step 1 ("Foundation", F2) of the self-improvement epic: formalise free-text
"lessons" from memory/ into a machine-readable `lessons` table — the foundation
for the future retrospective agent (E2), the RICE prioritiser (E3) and Стрим.
- src/lessons.py: pure never-raise observer leaf (record/get/update/snapshot),
kill-switch only, NO repo scope (observer-only; records about any repo incl.
enduro; repo cut on the read side). Slug-convention constants.
- src/db.py: additive idempotent `lessons` table in init_db() (+3 indexes);
nullable attribution columns from the start (NFR-6, _ensure_column forward-safe);
helpers record_lesson/get_lessons/update_lesson/lessons_snapshot/
lessons_recent_dup_exists (auto-dedup window).
- 4 auto-detectors (best-effort, source="auto", deduped): gate_failure
(_handle_qg_failure_rollbacks), merge_hold (_handle_merge_verify HOLD),
transient_retry (launcher._finalize_transient budget-exhaustion), deploy_degraded
(post-deploy DEGRADED -> set_repo_freeze).
- src/main.py: GET /lessons, POST /lessons, POST /lessons/{id} + read-only
`lessons` block in GET /queue; off-switch -> {"enabled": false}.
- src/config.py: lessons_enabled / lessons_query_limit_default / lessons_dedup_window_s.
- tests/test_lessons.py: TC-01..TC-12 (unit + integration), all green.
- Docs: CLAUDE.md, docs/architecture/README.md (component + schema + API), CHANGELOG.
Invariant: the journal is an OBSERVER, not a Quality Gate — STAGE_TRANSITIONS /
QG_CHECKS / check_* / machine-verdict / existing table schemas are byte-for-byte
untouched; enduro not affected. never-raise on every public fn + injection.
Refs: ORCH-098
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@@ -1016,6 +1016,20 @@ class AgentLauncher:
|
||||
)
|
||||
self._notify_failed(job_id, agent, job, run_id,
|
||||
f"transient (rate-limit) after {tattempts} attempts")
|
||||
# ORCH-098 (FR-3c / D3): auto-record a `transient_retry` lesson ONLY on
|
||||
# budget EXHAUSTION (not on each backoff — that would be noise; the
|
||||
# valuable signal is "transients exhausted"). best-effort, never-raise,
|
||||
# deduped; can't escape into the queue-worker path.
|
||||
try:
|
||||
from ..lessons import record as record_lesson, LessonType
|
||||
record_lesson(
|
||||
LessonType.TRANSIENT_RETRY,
|
||||
task_id=job.get("task_id"), repo=job.get("repo"), agent=agent,
|
||||
root_cause=f"transient retry budget exhausted ({tattempts}/{tmax})",
|
||||
detail=err, source="auto",
|
||||
)
|
||||
except Exception as e: # noqa: BLE001 - never break the queue worker
|
||||
logger.warning(f"Job {job_id}: lessons transient_retry record failed: {e}")
|
||||
|
||||
def _finalize_permanent(self, job_id, agent, run_id, exit_code, job):
|
||||
"""Permanent (code-fault) failure -> normal attempts<max requeue, then fail."""
|
||||
|
||||
Reference in New Issue
Block a user