🎓 What to try
Brew a retry storm
Set client load ~600, retries 3 with backoff "none", then hit 🐢 DB brownout. Watch db load EXPLODE past capacity — the failures cause retries, retries cause load, load causes failures. That loop is the storm.
Backoff is the rain
Same storm, but switch api→db backoff to "exponential + jitter". Amplification drops and the system often recovers on its own. Jitter prevents synchronized retry waves (thundering herd).
Breakers cut the loop
Enable the api→db circuit breaker during a storm. It opens (fail fast), db load drops to ~0, db cools down, the half-open probe tests it, and the breaker closes. Users saw fast errors instead of hangs — and the system healed itself.
Timeouts: too tight vs too loose
With db at +800ms, a 500ms timeout fails everything (latency > timeout); a 3000ms timeout makes users wait 3s to fail. There is no free lunch — pick timeouts from the callee's real p99.