ORCH-7: cleanup + hardening (M-4 dead code + M-2 graceful timeout) #4
ORCH-7: cleanup + hardening (M-4 + M-2). Small focused PR, no pipeline behavior change.
M-4: remove dead code
_auto_merge_pr had 0 callers (merge is the deployer agent job). Method removed.
_ensure_pr (used by auto-advance) kept.
grep auto_merge src/ = 0 matches.
M-2: graceful timeout + configurable
_watchdog: SIGTERM -> poll os.kill(pid, 0) up to agent_kill_grace_seconds -> SIGKILL only if still alive. ProcessLookupError tolerated at every step.
Recorded exit_code stays -9 so ORCH-1 retry/fail is unchanged (timeout-kill classifies permanent, bounded requeue, no loop).
agent_timeout_seconds=1800, agent_kill_grace_seconds=20, agent_timeout_overrides_json="" (per-agent JSON override). AGENT_TIMEOUT kept as backward-compat alias.
_resolve_timeout(agent) picks override else default.
Tests
_resolve_timeout override/default/malformed-JSON.
Deploy
/health ok; /queue ok (breaker closed, preflight ok).
Do NOT merge. Стрим merges after review.
The watchdog used to call time.sleep(timeout) and then SIGKILL immediately, which cut claude off mid-write and left half-written artifacts. It now sends SIGTERM, polls os.kill(pid, 0) for up to agent_kill_grace_seconds, and sends SIGKILL only if the process is still alive; ProcessLookupError is tolerated at every step. The timeout is now configurable via config.py: agent_timeout_seconds (default 1800), agent_kill_grace_seconds (default 20), and agent_timeout_overrides_json for per-agent overrides (e.g. {"reviewer": 3600}). AGENT_TIMEOUT is kept as a backward-compatible alias. The recorded exit_code stays -9, so the ORCH-1 monitor retry/fail logic is unchanged: timeout-kills classify as permanent and requeue within max_attempts, with no retry loop.
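For reviewers, the two behaviors described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function names (resolve_timeout, is_alive, terminate_gracefully), the poll_interval parameter, and the returned status strings are hypothetical, and falling back to the default on malformed JSON is a guess at what the malformed-JSON test expects.

```python
import json
import os
import signal
import time

DEFAULT_TIMEOUT_SECONDS = 1800  # mirrors agent_timeout_seconds
KILL_GRACE_SECONDS = 20         # mirrors agent_kill_grace_seconds


def resolve_timeout(agent, overrides_json="", default=DEFAULT_TIMEOUT_SECONDS):
    """Return the per-agent override if present and valid, else the default."""
    if not overrides_json:
        return default
    try:
        overrides = json.loads(overrides_json)
    except json.JSONDecodeError:
        # Assumption: malformed JSON silently falls back to the default.
        return default
    if not isinstance(overrides, dict):
        return default
    value = overrides.get(agent)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
        return default
    return value


def is_alive(pid):
    """Probe with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_gracefully(pid, grace_seconds=KILL_GRACE_SECONDS, poll_interval=0.1):
    """SIGTERM, poll for up to grace_seconds, then SIGKILL only if still alive.

    Returns "gone", "terminated" or "killed" (hypothetical status strings).
    Note: an exited child that has not been reaped still answers the probe,
    so this assumes some other code waits on the child.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "gone"  # already exited before the timeout fired
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "terminated"
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited between the last poll and the kill
    return "killed"
```

With these definitions, resolve_timeout("reviewer", '{"reviewer": 3600}') returns 3600, while an agent without an entry, an empty string, or unparsable JSON all return 1800. Using time.monotonic() for the grace deadline keeps the poll loop unaffected by wall-clock adjustments.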