Theo Zourzouvillys

Field Note 13

Fail fast and push back: retries, load shedding, and flow control

Tags: reliability, architecture, infra, resilience

TL;DR

How a system behaves at the edge of its capacity decides whether a load spike is a blip or an outage — and the behaviors that save you have to be designed in from the start, because they’re a contract between caller and callee, not a feature you bolt on later. The rules:

  • Add client-side retries from day one — exponential backoff, jitter, idempotency, and a retry budget/circuit breaker so retries can’t become a storm. Honor Retry-After.
  • Load-shed quickly. When you’re over capacity, reject fast and cheap at the front door (e.g. 429/503 with Retry-After) rather than accepting the work and failing slowly. A fast failure the caller can act on beats a slow one that has already timed out.
  • Push shed failures back to the source to retry — don’t retry internally. Internal retries deep in the stack multiply load (retries compound at every layer) and hide the backpressure signal. The original caller has the context to retry sanely; let the failure propagate to it.
  • Flow-control everywhere. Every boundary applies backpressure: bounded concurrency, admission control, and bounded queues. A full bounded queue is the “slow down” signal — let it shed, don’t let it grow.
  • Don’t take more work than you can realistically finish in reasonable time. Accept work only if you can complete it within its deadline; drop work whose deadline has already passed instead of doing doomed work.

Context

Most systems are tested where they have headroom, so their overload behavior is whatever fell out by accident, and what usually falls out is the worst option: accept everything, queue it without bound, slow down, time out, and retry internally. That combination is how a brief spike becomes a metastable failure (Metastable Failures in Distributed Systems, HotOS 2021): the system stays down even after the original trigger is gone, because it’s now generating its own load, often through retries amplifying it, and it won’t recover on its own until that load is removed or capacity is added. Three behaviors drive this:

  • Unbounded queues absorb the overload invisibly until latency and memory explode; by the time an item is processed, the caller has long since given up, so the work is wasted, and it displaced work that still mattered.
  • Slow failures hold connections, threads, and memory while they fail, so overload in one place becomes resource exhaustion everywhere — a cascading failure.
  • Retries layered at every hop turn one client request into an exponential fan-out: if each of three layers makes up to three attempts (one try plus two retries), one request becomes up to twenty-seven calls at the bottom of the stack, arriving precisely when the downstream is already drowning. This is the classic retry storm.
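The multiplication is easy to check. A minimal sketch, where the hypothetical `worst_case_calls` takes each layer's maximum attempts (first try plus retries) and assumes every attempt is rejected, so each layer runs its full retry loop:

```python
import math
from collections.abc import Sequence


def worst_case_calls(attempts_per_layer: Sequence[int]) -> int:
    """Requests that reach the bottom of the stack if every attempt fails.

    Each entry is the maximum number of attempts (first try plus retries)
    one layer makes against the layer below it. Every attempt from above
    triggers a full retry loop below, so the counts multiply.
    """
    return math.prod(attempts_per_layer)


print(worst_case_calls([3, 3, 3]))  # 27: every layer retries
print(worst_case_calls([3, 1, 1]))  # 3: only the outermost caller retries
print(worst_case_calls([4, 4, 4]))  # 64: "three retries" read as four attempts
```

The last line is why it helps to state retry limits as total attempts: "retries three times" and "three attempts" differ by more than a factor of two once three layers multiply them.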

You can’t retrofit your way out of this cheaply, because retries, idempotency, deadlines, and backpressure are part of the interface between services. If clients were written without retries and backoff, every existing caller is already wrong; if a queue was unbounded, every producer feeding it assumed it would always accept. These properties have to be there from the start.

Recommendation

Design for the overloaded case explicitly, and make the whole path push back.

Build retries into clients from the start — and make them safe.

  • Backoff with jitter. Exponential backoff so retries space out; jitter so a thousand clients don’t retry in lockstep and re-synchronize the spike.
  • Honor Retry-After. When a server sheds or rate-limits, it should say when to come back (Retry-After on 429/503); clients obey it. This converts blind retry into coordinated retry and is the single cheapest defense against retry storms.
  • Bound retries. A retry budget (retries capped as a fraction of total requests) and/or a circuit breaker so a struggling dependency gets less traffic, not more. Retrying forever is how you keep a downstream dead.
  • Idempotency first. Retries are only safe if the operation is idempotent (idempotency keys for writes). Build that in alongside the retry, not after the first double-charge.
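A minimal client-side sketch of these four rules together, assuming a hypothetical transport callable `send(idempotency_key)` that raises a hypothetical `Rejected` error on a retryable 429/503. The backoff uses "full jitter" (a uniform delay up to the exponential cap), and the retry budget is a simplified token bucket; both are one common choice among several, and every name and constant here is illustrative. A circuit breaker is not shown; the budget plays a similar role of cutting retry traffic when a dependency is unhealthy.

```python
import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Optional


class Rejected(Exception):
    """Hypothetical error for a retryable 429/503, carrying the raw Retry-After value."""

    def __init__(self, retry_after: Optional[str] = None):
        super().__init__("request rejected; retryable")
        self.retry_after = retry_after


class RetryBudget:
    """Simplified token bucket: retries may use at most ~`ratio` of request volume.

    Each first attempt deposits `ratio` tokens; each retry spends one. When a
    dependency fails everywhere, the bucket drains and retries stop.
    """

    def __init__(self, ratio: float = 0.1, max_tokens: float = 10.0):
        self.ratio = ratio
        self.max_tokens = max_tokens
        self.tokens = max_tokens

    def record_request(self) -> None:
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend(self) -> bool:
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delay(retry: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**retry)] so clients don't sync up."""
    return random.uniform(0.0, min(cap, base * 2**retry))


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Retry-After is either delta-seconds ("120") or an HTTP-date; return seconds."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: fall back to our own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retries(
    send: Callable[[str], str],
    idempotency_key: str,
    budget: RetryBudget,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry rejected calls with jittered backoff, honoring Retry-After and the budget.

    The same idempotency key goes out on every attempt so the server can
    deduplicate a write that succeeded but whose response was lost.
    """
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return send(idempotency_key)
        except Rejected as err:
            if attempt == max_attempts - 1 or not budget.try_spend():
                raise  # out of attempts or out of budget: surface the failure
            delay = backoff_delay(attempt)
            hint = parse_retry_after(err.retry_after)
            if hint is not None:
                delay = max(delay, hint)  # never come back sooner than the server asked
            sleep(delay)
    raise ValueError("max_attempts must be >= 1")
```

A caller would generate the key once per logical operation, for example `str(uuid.uuid4())`, and reuse it across every attempt of that operation.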

Shed load fast, at admission. Decide whether you can serve a request before doing expensive work — cheaply, at the front door. If you’re over your concurrency or queue limit, reject immediately with a clear, retryable signal and a Retry-After. Fast rejection lets the caller back off and try elsewhere/later; slow rejection just burns both sides’ resources and usually times out anyway. Shed the least important work first where you can (load-shedding by priority — see ZFN-2).
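A minimal sketch of admission-time shedding, assuming a hypothetical `AdmissionController` that counts in-flight requests. Non-critical work is shed once 80% of the slots are busy (an illustrative threshold), which leaves headroom for critical work; anything rejected gets a 503 with a Retry-After before any expensive work starts.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Response:
    status: int
    body: str = ""
    headers: dict = field(default_factory=dict)


class AdmissionController:
    """Decide at the front door, in constant time, whether a request may start."""

    def __init__(self, max_in_flight: int, low_priority_share: float = 0.8,
                 retry_after_s: int = 1):
        self.max_in_flight = max_in_flight
        # Non-critical work may only use this many slots; the rest is headroom.
        self.low_priority_limit = int(max_in_flight * low_priority_share)
        self.retry_after_s = retry_after_s
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self, critical: bool) -> bool:
        limit = self.max_in_flight if critical else self.low_priority_limit
        with self._lock:
            if self._in_flight >= limit:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def handle(self, work: Callable[[], str], critical: bool = False) -> Response:
        if not self.try_admit(critical):
            # Cheap, immediate, retryable rejection: no work done, nothing queued.
            return Response(503, "overloaded", {"Retry-After": str(self.retry_after_s)})
        try:
            return Response(200, work())
        finally:
            self.release()
```

The fixed one-second Retry-After is a placeholder; a server could instead derive it from how quickly in-flight work is draining.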

Retry at the source, not in the middle. When a layer sheds, propagate the failure up to the original caller and let it decide whether and when to retry. Don’t bury retries inside intermediate services: they compound across layers, they retry work the caller may no longer want, and they suppress the backpressure that should reach the edge. Retry at one level — the outermost one that owns the request and its budget.
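A minimal sketch of that shape, with hypothetical `Database`, `Service`, and `edge_call` names: the middle tier has no retry loop and lets the shed error, with its retry hint, propagate unchanged; only the outermost caller retries, so the bottom tier sees at most `max_attempts` calls per request.

```python
import time
from typing import Callable


class Overloaded(Exception):
    """A shed rejection that carries the server's retry hint up the stack."""

    def __init__(self, retry_after_s: float):
        super().__init__(f"overloaded; retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


class Database:
    """Bottom tier (simulated): sheds its first `shed_first_n` calls."""

    def __init__(self, shed_first_n: int):
        self.shed_first_n = shed_first_n
        self.calls = 0

    def query(self, q: str) -> str:
        self.calls += 1
        if self.calls <= self.shed_first_n:
            raise Overloaded(retry_after_s=0.5)
        return f"rows for {q}"


class Service:
    """Middle tier: no retry loop. Overloaded propagates to its caller as-is."""

    def __init__(self, db: Database):
        self.db = db

    def handle(self, q: str) -> str:
        return self.db.query(q)


def edge_call(service: Service, q: str, max_attempts: int = 3,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Outermost caller: the one place that owns the retry decision."""
    for attempt in range(max_attempts):
        try:
            return service.handle(q)
        except Overloaded as err:
            if attempt == max_attempts - 1:
                raise
            sleep(err.retry_after_s)  # wait as long as the bottom tier asked
    raise ValueError("max_attempts must be >= 1")
```

With three attempts at the edge and none below, a request costs the database at most three calls, instead of twenty-seven when three layers each make three attempts.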

Flow-control everywhere; bound every queue. Every boundary needs backpressure, not silent buffering:

  • Bound every queue (ZFN-12). An unbounded queue is a latent outage; a bounded one that rejects when full is a working backpressure signal.
  • Bound concurrency / admission at each tier (max in-flight, connection limits) so you process at a sustainable rate instead of accepting everything and thrashing.
  • Propagate deadlines and cancel doomed work. Carry a deadline with each request; if it’s already expired by the time you’d start (a stale queue item, a caller that’s gone), drop it rather than spend capacity on a result no one will use.
  • Take only what you can finish in time. Admission control means accepting work only when you can realistically complete it within its deadline. Promising more than you can deliver just converts into timeouts and wasted work under load.
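A minimal single-threaded sketch of these rules at one boundary, assuming a hypothetical `BoundedWorkQueue` served by one worker and a fixed per-job service-time estimate (a real system would measure it). It rejects when full, rejects work it estimates cannot finish before its deadline, and drops jobs that expired while waiting. It is not thread-safe as written.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    payload: str
    deadline: float  # absolute time, on the queue's clock, when the caller stops caring


class AdmissionRejected(Exception):
    """Backpressure signal: the caller should back off and retry later."""


class BoundedWorkQueue:
    def __init__(self, capacity: int, est_service_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.est_service_s = est_service_s  # assumed average time to process one job
        self.clock = clock
        self._items: deque = deque()
        self.dropped_expired = 0

    def submit(self, job: Job) -> None:
        if len(self._items) >= self.capacity:
            raise AdmissionRejected("queue full")
        # Rough completion estimate: every job ahead of this one, plus this one.
        est_finish = self.clock() + (len(self._items) + 1) * self.est_service_s
        if est_finish > job.deadline:
            raise AdmissionRejected("cannot finish before deadline")
        self._items.append(job)

    def run_next(self, process: Callable[[Job], str]) -> Optional[str]:
        """Process the next job that is still wanted; skip ones already past deadline."""
        while self._items:
            job = self._items.popleft()
            if self.clock() >= job.deadline:
                self.dropped_expired += 1  # doomed work: nobody is waiting for it
                continue
            # `process` receives the deadline so it can pass the remaining budget downstream.
            return process(job)
        return None
```

When `submit` raises, the rejection should travel back to the caller as a retryable error (a 503 with Retry-After at an HTTP edge) rather than being retried locally.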

Consequences

Easier:

  • Spikes degrade gracefully: you serve what you can at full speed and cleanly reject the rest, instead of slowing everything to a crawl and toppling over.
  • No metastable lock-up — bounded retries, Retry-After, and edge-only retries stop the system from feeding its own overload, so it recovers when the trigger passes.
  • Backpressure reaches the source, where the real decision lives: slow down, retry later, or drop.

Harder:

  • Callers must handle rejection and retry properly — this only works if clients cooperate, which is why it has to be in the SDK/contract from the start.
  • Idempotency, retry budgets, deadline propagation, and admission control are real engineering with real edge cases; “just retry on error” is easier to write and is exactly the trap.
  • Load shedding means deliberately failing some requests now to keep the system alive — a trade you have to be willing to make explicit (and shed by priority, not at random, where you can).
  • Bounded queues and admission limits need sizing and tuning, and a too-tight limit sheds work you could have served.

References

Changelog

  • 2026-06-12: First published as a Field Note.