Field Note 18

Enforce a quota at ingress on every endpoint — even unabused ones

By: Theo Zourzouvillys
Published: June 12, 2026
Tags: reliability, security, api, infra, multi-tenancy

TL;DR

Every endpoint has a quota, and it’s enforced at the ingress (the gateway/edge, before the request reaches application logic), from day one — even for endpoints nobody is abusing yet. No endpoint is unlimited by default. Limits are keyed by the dimensions that matter — per tenant, per principal/API key, per IP, and a global cap — and rejected requests get a clear, retryable signal (429 with Retry-After, per ZFN-13).

The reason to do this before there’s a problem: “unlimited” is a promise you didn’t mean to make. The first runaway client, retry storm, infinite loop, or compromised credential turns an unmetered endpoint into an outage or a bill, and by then clients depend on there being no limit, so adding one breaks them. A quota that’s present from the start is just part of the contract.

Context

Rate limiting tends to get added reactively: an endpoint gets hammered, there’s an incident, and a limit is bolted on afterward. By then you’re in the worst position to add it. You don’t know a safe default, because real traffic has been shaped by the absence of one. Some client has built a batch job that fires ten thousand requests in a burst and considers that normal. Adding a limit now is a breaking change you have to negotiate, announce, and stage — instead of a property the API always had.

And the failure modes a quota guards against don’t require malice:

A buggy client in a tight loop, or a retry storm with no backoff (ZFN-13), aimed at your cheapest-looking endpoint.
One tenant’s traffic starving everyone else on shared capacity — the noisy-neighbour problem that per-tenant limits and bulkheads (ZFN-2, ZFN-15) exist to contain.
A leaked API key used to exfiltrate data or rack up cost as fast as the key will allow — a quota caps the blast radius of a compromise.
An expensive endpoint (a report, a search, a fan-out) that’s fine at low volume and falls over the first time someone scripts it.

None of these announce themselves. The quota is cheap insurance you want already in place when they arrive, which is why “even if it’s not being abused” is the whole point.

Recommendation

Make a quota a default property of every endpoint, enforced at the front door.

No endpoint ships unlimited. Every endpoint has an explicit limit, even if generous. A sane default that you tighten later beats no limit that you scramble to add during an incident.
Enforce at ingress, cheaply, before the work. Check the limit at the gateway/edge before the request reaches expensive application logic — admission control, the same fail-fast-at-the-door move as load shedding (ZFN-13). One central mechanism, applied consistently, not re-implemented per service.
Key limits by the right dimensions. Per tenant (fairness and blast radius), per principal/API key (compromise containment), per IP (against crude abuse), per endpoint (protect the expensive ones), plus a global ceiling. Distinguish the kinds of limit: rate (requests per second), quota (requests per day/month), and concurrency (in-flight at once) — you usually want all three.
Reject clearly and retryably. Return 429 Too Many Requests (or the protocol’s equivalent) with Retry-After, so well-behaved clients back off instead of hammering — the coordinated-retry contract from ZFN-13. Surface remaining quota in response headers where it helps.
Treat limits as per-tenant configuration. Limits are control-plane config (ZFN-16, ZFN-17): set per plan/tier, adjustable per tenant, pushed to and enforced at the ingress data plane. This is also how you sell tiers and grant a trusted customer more headroom without code changes.
Observe usage so you can set and tune limits. Measure per-tenant/per-endpoint usage from the start; it’s how you pick non-arbitrary defaults, spot abuse, and right-size limits before they either bite real users or fail to protect you.
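A minimal sketch of the admission check described above, assuming a single-process, in-memory token bucket per dimension (a real deployment needs a shared or partitioned store, per the distributed-rate-limiting caveat under Consequences). `TokenBucket`, `IngressLimiter`, `DEFAULT_LIMITS`, and every number are hypothetical. Each dimension and each endpoint resolves to a finite default, a request is charged only if all of its buckets have a token, and a rejection returns 429 with a `Retry-After` derived from the most constrained bucket.

```python
from __future__ import annotations

import math
import time

Limit = tuple[float, float]  # (sustained requests per second, burst capacity)

# Hypothetical defaults: every dimension gets a finite limit; nothing is unlimited.
DEFAULT_LIMITS: dict[str, Limit] = {
    "global": (1000.0, 2000.0),
    "tenant": (100.0, 200.0),
    "api_key": (50.0, 100.0),
    "ip": (20.0, 40.0),
    "endpoint": (100.0, 200.0),
}


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)

    def wait_time(self) -> float:
        """Seconds until one token is available (0.0 if one is available now)."""
        return 0.0 if self.tokens >= 1.0 else (1.0 - self.tokens) / self.rate


class IngressLimiter:
    def __init__(self, limits: dict[str, Limit] | None = None,
                 endpoint_overrides: dict[str, Limit] | None = None) -> None:
        self.limits = limits or DEFAULT_LIMITS
        # Tighter limits for expensive endpoints; all others fall back to the default.
        self.endpoint_overrides = endpoint_overrides or {}
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def _limit_for(self, dim: str, value: str) -> Limit:
        if dim == "endpoint" and value in self.endpoint_overrides:
            return self.endpoint_overrides[value]
        return self.limits[dim]

    def admit(self, tenant: str, api_key: str, ip: str, endpoint: str,
              now: float | None = None) -> tuple[int, dict[str, str]]:
        """Run before any application logic. 200 = pass through, 429 = rejected."""
        now = time.monotonic() if now is None else now
        keys = [("global", "*"), ("tenant", tenant), ("api_key", api_key),
                ("ip", ip), ("endpoint", endpoint)]
        buckets = []
        for dim, value in keys:
            bucket = self.buckets.get((dim, value))
            if bucket is None:
                rate, burst = self._limit_for(dim, value)
                bucket = self.buckets[(dim, value)] = TokenBucket(rate, burst, now)
            bucket.refill(now)
            buckets.append(bucket)
        wait = max(b.wait_time() for b in buckets)
        if wait > 0.0:
            # Charge no bucket on rejection; tell the client when to retry (whole seconds).
            return 429, {"Retry-After": str(math.ceil(wait))}
        for b in buckets:
            b.tokens -= 1.0
        return 200, {}
```

Here the per-endpoint bucket is keyed by endpoint alone, so it caps the total load an expensive endpoint can take across all tenants; keying it by `(tenant, endpoint)` would make that cap per tenant instead. Which one you want depends on whether you are protecting the backend or enforcing fairness.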

Scope. This is ingress quota/rate enforcement for inbound requests. It complements — doesn’t replace — internal concurrency bounds and bounded queues (ZFN-13), which protect the tiers behind the front door.
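Rate and quota limits count requests over time; neither bounds how many are in flight at once, which is what a burst of slow requests exhausts. A minimal sketch of the concurrency kind of limit at the ingress, assuming an in-process counter per tenant guarded by a lock (`ConcurrencyLimiter` and `slot` are hypothetical names). It rejects immediately at the cap instead of queueing, in keeping with failing fast at the door.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ConcurrencyLimiter:
    """Caps in-flight requests per key and rejects at the cap instead of queueing."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self._in_flight: dict[str, int] = {}
        self._lock = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        with self._lock:
            current = self._in_flight.get(key, 0)
            if current >= self.max_in_flight:
                return False
            self._in_flight[key] = current + 1
            return True

    def release(self, key: str) -> None:
        with self._lock:
            remaining = self._in_flight[key] - 1
            if remaining:
                self._in_flight[key] = remaining
            else:
                del self._in_flight[key]  # keep the map from growing with idle keys

    @contextmanager
    def slot(self, key: str) -> Iterator[bool]:
        """Yield True if admitted (released on exit), False if the caller should reject."""
        admitted = self.try_acquire(key)
        try:
            yield admitted
        finally:
            if admitted:
                self.release(key)
```

An ingress would wrap the forwarded request in `slot(tenant)` and answer 429 with `Retry-After` when it yields False. This bounds concurrency per tenant at the front door only; the internal bounds and bounded queues behind it are still needed.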

Consequences

Easier:

A runaway client, retry storm, or leaked key is capped automatically instead of becoming an outage or a surprise bill — the limit was always there.
Multi-tenant fairness is enforced: one tenant can’t consume the whole shared budget.
Limits exist from day one, so they’re part of the contract clients build against, not a breaking change you have to retrofit and negotiate.
Per-tenant quotas double as a product lever (plans, tiers, trusted-customer headroom).
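Treating per-tenant limits as control-plane config can be sketched as a resolution step: each tenant's plan supplies defaults, a per-tenant override grants extra headroom, and the result is an immutable snapshot the ingress reads without calling back to the control plane. The plan names, numbers, and the `Limits`, `build_snapshot`, and `limits_for` names are hypothetical; falling back to the lowest tier for an unknown tenant is an assumption of this sketch, chosen so that nothing ever resolves to unlimited.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Any, Mapping


@dataclass(frozen=True)
class Limits:
    requests_per_second: float
    burst: int
    max_in_flight: int
    requests_per_day: int


# Hypothetical plan table; real values should come from observed usage.
PLAN_DEFAULTS: Mapping[str, Limits] = MappingProxyType({
    "free": Limits(requests_per_second=5, burst=10, max_in_flight=4, requests_per_day=10_000),
    "pro": Limits(requests_per_second=50, burst=100, max_in_flight=32, requests_per_day=1_000_000),
})


def build_snapshot(tenant_plans: Mapping[str, str],
                   overrides: Mapping[str, Mapping[str, Any]]) -> Mapping[str, Limits]:
    """Control plane: resolve every tenant to concrete limits.

    An unknown plan raises KeyError and an unknown field raises TypeError,
    so a bad config fails here instead of being pushed to the ingress.
    """
    resolved = {}
    for tenant, plan in tenant_plans.items():
        resolved[tenant] = replace(PLAN_DEFAULTS[plan], **overrides.get(tenant, {}))
    return MappingProxyType(resolved)


def limits_for(snapshot: Mapping[str, Limits], tenant: str) -> Limits:
    """Data plane: read the in-memory snapshot; unknown tenants get the lowest tier."""
    return snapshot.get(tenant, PLAN_DEFAULTS["free"])
```

Because `limits_for` only reads a snapshot already held in memory, the ingress keeps enforcing its last-known-good limits even while the control plane is unavailable.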

Harder:

Real infrastructure: a consistent ingress enforcement mechanism, a place to store and push per-tenant limits, and the usage accounting behind them (distributed rate limiting has its own subtleties).
Picking defaults takes data and judgment; too tight rejects legitimate use, too loose protects nothing. (Start with generous limits plus monitoring, then tighten.)
Legitimate bursty workloads need accommodation — burst allowances, higher tiers, or a batch path — so the limit doesn’t punish good use.
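The advice to start generous and monitor pairs naturally with the quota kind of limit (requests per day). A minimal sketch, assuming a fixed daily window aligned to UTC midnight and an in-memory counter per `(tenant, endpoint)`: the same counter enforces the quota and reports utilisation, which is the data you would use to tighten the limit later. `DailyQuota` is a hypothetical name, and `X-RateLimit-Remaining` is a common but non-standard header chosen here for illustration.

```python
from __future__ import annotations

import math
from collections import Counter

SECONDS_PER_DAY = 86_400  # Unix time has no leap seconds, so days align with UTC midnight


class DailyQuota:
    """Fixed-window daily quota per (tenant, endpoint) that also records usage."""

    def __init__(self, limit_per_day: int) -> None:
        self.limit_per_day = limit_per_day
        self.window: int | None = None  # index of the current UTC day
        self.used: Counter[tuple[str, str]] = Counter()

    def check(self, tenant: str, endpoint: str, now: float) -> tuple[int, dict[str, str]]:
        """Admit or reject one request at Unix time `now`."""
        day = int(now // SECONDS_PER_DAY)
        if day != self.window:
            # New day: start a fresh window (a real system would export the old counts first).
            self.window = day
            self.used.clear()
        key = (tenant, endpoint)
        if self.used[key] >= self.limit_per_day:
            reset_in = (day + 1) * SECONDS_PER_DAY - now
            return 429, {"Retry-After": str(math.ceil(reset_in))}
        self.used[key] += 1
        return 200, {"X-RateLimit-Remaining": str(self.limit_per_day - self.used[key])}

    def utilisation(self) -> dict[tuple[str, str], float]:
        """Fraction of today's quota each key has used: the input for tuning limits."""
        return {key: count / self.limit_per_day for key, count in self.used.items()}
```

A fixed window is the simplest choice but lets a client spend a whole day's quota in one burst just before and again just after midnight; pairing it with a rate limit keeps those bursts bounded.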

References

ZFN-13 (“Fail fast and push back: retries, load shedding, and flow control”): quotas are admission control at the front door; reject with 429 + Retry-After and let clients back off.
ZFN-2 (“Engineering priority ordering”) and ZFN-15 (“Partition customer data by tenant from day one”): per-tenant limits enforce fairness and contain blast radius.
ZFN-16 (“Separate the data plane from the control plane”) and ZFN-17 (“Separate configuration, state, and ephemeral data”): limits are control-plane config pushed to the enforcing data plane.
Amazon Builders’ Library (“Fairness in multi-tenant systems”): token-bucket / leaky-bucket rate limiting as the usual primitives.

Changelog

2026-06-12: First published as a Field Note.