ACME Agent Supply Co.

Documentation

Platform documentation for reliability-minded operators.


OpenClaw Reliability Stack

How OpenClaw observes, protects, and recovers AI agent systems.

OpenClaw Reliability Stack Architecture

ACME Agent Supply Co. · Architecture Reference · Serial 77A · v4 · Reliability Loop

OpenClaw reliability platform architecture showing layers for observing, protecting, and recovering AI agent systems. Layers include Operator Console (OCTriage entry point), Observe Layer (FindMyAgent, RadCheck, Observe), Protection Layer (Watchdog, Sentinel, SphinxGate), Recovery Layer (ORP, Agent911, Lazarus), Memory Integrity Layer (DriftGuard, Elixir), and Agent Runtime Environment (Claude, OpenAI, local, and specialized agents).

LAYER 0 · Operator Console
Entry point. Start here.

The entry point to the OpenClaw system. OCTriage captures a deterministic proof bundle before any recovery action begins.

OCTriage [entry]: OCTriage is the first-response triage terminal. It captures a deterministic, read-only proof bundle before any recovery action. Run OCTriage first, every time. The diagram shows the command octriage -watch with a sample readout of STATUS HEALTHY and RELIABILITY 87 / 100.

LAYER 1 · Observe Layer
Visibility. Signals before stalls.

The Observe Layer aggregates signals about agent health and system behavior. Tools include FindMyAgent for agent presence and liveness, RadCheck for reliability scoring from 0 to 100, and Observe for signal aggregation.

FindMyAgent [monitoring]: FindMyAgent provides real-time agent presence state, liveness monitoring, progress signals, and needs-attention flags. Being alive does not mean the agent is working correctly.

RadCheck: RadCheck measures system reliability scores from 0 to 100 by analyzing agent health signals and telemetry across five domains: gateway, sessions, disk, memory drift, and agent churn. Free and read-only.
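RadCheck is described above as producing a single 0 to 100 score from five domains, but the page does not say how the domains are combined. The following is a minimal sketch under an assumed equal weighting. The function name `reliability_score`, the 0.0 to 1.0 input scale, and the weighting are all hypothetical and are not RadCheck's actual formula or API.

```python
# The five domain names come from the text above; everything else is hypothetical.
DOMAINS = ("gateway", "sessions", "disk", "memory_drift", "agent_churn")


def reliability_score(domain_health: dict[str, float]) -> int:
    """Combine per-domain health values (0.0 to 1.0) into one 0-100 score.

    Assumption: every domain carries equal weight. The real scoring
    formula is not documented on this page.
    """
    missing = [d for d in DOMAINS if d not in domain_health]
    if missing:
        raise ValueError(f"missing domains: {missing}")

    total = 0.0
    for domain in DOMAINS:
        # Clamp each input so one bad reading cannot push the score out of range.
        total += min(1.0, max(0.0, domain_health[domain]))

    return round(100 * total / len(DOMAINS))


if __name__ == "__main__":
    sample = {d: 1.0 for d in DOMAINS}
    sample["gateway"] = 0.5  # a degraded gateway pulls the overall score down
    print(reliability_score(sample))
```

Requiring all five domains, rather than silently defaulting a missing one, keeps an incomplete reading from looking healthier than it is.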
Observe [operator action]: The Observe module provides unified aggregation of runtime signals from all connected agents. Its output feeds into RadCheck scoring and Sentinel anomaly detection.

LAYER 2 · Protection Layer
Continuous guardrails. Always on.

The Protection Layer continuously monitors runtime behavior and detects anomalies. Tools include Watchdog for heartbeat supervision, Sentinel for silent failure detection, and SphinxGate for token discipline and routing guardrails.

Watchdog [monitoring]: Watchdog provides heartbeat supervision with liveness probes, cron-safe cadence checks, and lock collision alerts. Port availability does not confirm process health.

Sentinel [always on]: Sentinel provides continuous detection of silent failures, output deviation, and stuck runs. It watches for failure modes that logs do not surface.

SphinxGate [routing guard]: SphinxGate enforces token discipline and routing guardrails. It separates interactive and background agent lanes, enforces token budgets, and provides a routing audit trail.

LAYER 3 · Recovery Layer
Evidence before recovery. Always.

The Recovery Layer enables deterministic recovery operations. Tools include ORP (OpenClaw Recovery Protocol) for evidence-first recovery doctrine, Agent911 for read-only recovery cockpit diagnostics, and Lazarus for backup readiness verification.

ORP (OpenClaw Recovery Protocol): ORP is the recovery doctrine layer. It sequences recovery in the correct order: evidence capture, then diagnosis, then safe recovery, then verification. The order is mandatory.

Agent911: Agent911 provides recovery cockpit diagnostics and read-only triage capabilities during system recovery operations. It aggregates protection proofs and guides operators through playbooks.
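The mandatory ORP ordering described above (evidence capture, diagnosis, safe recovery, verification) can be illustrated as a small state machine that refuses out-of-order steps. This is only a sketch of the idea: the `Step` and `RecoveryRun` names are hypothetical and do not represent ORP's actual implementation or interface.

```python
from enum import IntEnum


class Step(IntEnum):
    """The four ORP steps named in the text, numbered in their required order."""
    EVIDENCE_CAPTURE = 1
    DIAGNOSIS = 2
    SAFE_RECOVERY = 3
    VERIFICATION = 4


class RecoveryRun:
    """Hypothetical tracker that only accepts steps in the documented order."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    def complete(self, step: Step) -> None:
        if len(self.completed) == len(Step):
            raise RuntimeError("all steps already completed")
        # The next allowed step is always the one after the last completed step.
        expected = Step(len(self.completed) + 1)
        if step is not expected:
            raise RuntimeError(
                f"out of order: expected {expected.name}, got {step.name}"
            )
        self.completed.append(step)

    @property
    def done(self) -> bool:
        return len(self.completed) == len(Step)


if __name__ == "__main__":
    run = RecoveryRun()
    run.complete(Step.EVIDENCE_CAPTURE)
    try:
        run.complete(Step.SAFE_RECOVERY)  # skips DIAGNOSIS, so it is rejected
    except RuntimeError as err:
        print(err)
```

Rejecting a skipped step outright, instead of logging a warning, mirrors the "evidence before recovery" rule: a recovery action cannot run until the proof bundle and diagnosis exist.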
Lazarus: Lazarus verifies backup readiness before recovery is needed. Recovery readiness degrades silently. Lazarus confirms you can return to a clean state before an incident forces you to try.

LAYER 4 · Memory Integrity Layer
Long-horizon stability.

The Memory Integrity Layer ensures long-term system stability and prevents behavioral drift. Tools include DriftGuard for tracking behavioral drift across agents and Elixir for deterministic agent rehydration.

DriftGuard [monitoring]: DriftGuard tracks behavioral drift across agents and identifies deviations from expected system behavior. It provides drift snapshots, policy deviation flags, and run-to-run delta tracking. Systems drift before they fail.

Elixir [BOOT → ORIENT]: Elixir provides deterministic agent rehydration following the BOOT, DIGEST, ANCHORS, ORIENTATION sequence. Agents that use Elixir return to coherent operational state reliably after restarts or context loss.

LAYER 5 · Agent Runtime Environment
The systems being protected.

The Agent Runtime Environment represents the external agent systems that OpenClaw observes, protects, and recovers. This includes Anthropic Claude agents, OpenAI agents, self-hosted local model agents, and specialized or multi-modal agent systems.

Claude Agents: Anthropic Claude agent runtime environment. OpenClaw provides observability, protection, and recovery for agents running on Anthropic models.

OpenAI Agents: OpenAI agent runtime environment. OpenClaw provides observability, protection, and recovery for agents running on OpenAI models.

Local Agents: Self-hosted local model agent runtime environment. OpenClaw supports agents running on locally deployed language models and custom infrastructure.
Specialized Agents: Custom and multi-modal agent runtime environments including specialized agent architectures, domain-specific models, and multi-agent orchestration systems.

Diagram flow labels: signals, alerts, proofs.

Observe → Protect → Recover. In that order.

ACME · Field Supply Division · Serial 77A

OpenClaw Reliability Stack - architecture for observing, protecting, and recovering AI agent systems.
