ACME Agent Supply Tools
Reliability, diagnostics, and recovery infrastructure for operators running AI agent systems.
RadCheck
Read-only reliability scan. Get a 0–100 score that surfaces stall risk, silence gaps, compaction pressure, and hygiene issues. Nothing changes on your system.
OCTriage
Entry-point triage terminal for incident assessment. Collect evidence, classify failures, and generate proof bundles without touching live state.
Lazarus Lite
Maps backup coverage and validates restore assumptions before incidents force the test. Know your recovery posture before you need it.
Sentinel
Always-on runtime guardrails. Detects stalls and silent failures during live operation — the problems that don't crash but quietly break your workflows.
SphinxGate
Enforces allow/deny routing policy across multi-model teams. Deterministic, inspectable routing decisions with full audit trail.
Agent911
Unified control surface for multi-agent operations. Health signals, anomaly detection, recovery guidance, and proof bundles. The 2am cockpit.
FindMyAgent
Live agent presence and heartbeat visibility. Know which agents are up, which are degraded, and which went silent.