1 · Detection
Health signals
Expose state via status endpoints; add synthetic probes and scheduled checks. Track SLO-style thresholds: latency, error rate, webhook success.
→ healthy | degraded | critical
Offer · €15,000 – €40,000
Built for messy legacy stacks, not pristine demos — automated recovery only inside a whitelist. Every action is logged, reversible where it matters, and overrideable by humans. Detection starts with signals; diagnosis may add bounded AI-assisted triage that never rewires billing without governance.
Not unsupervised remediation. The objective is stabler uptime and clearer incident evidence—for AI, legal, and fintech stacks where reliability and trust affect revenue directly.
Architecture
Healthy / degraded / critical — then diagnose root class, recover from an allow-list, steer from a control plane, and prove outcomes.
What automation must not touch without human governance
Stack context: Workers, KV/D1/R2 patterns, Cron triggers, operator UI — complements the runtime control plane and execution governance story.