Theo Zourzouvillys

Field Note 15 (current)

Partition customer data by tenant from day one

By Theo Zourzouvillys
Published
Tags: architecture, data, multi-tenancy, scalability, security

TL;DR

Treat tenant as a first-class partition dimension of your data model from the very beginning. Never assume “one database holds all customers.” Concretely: every record is owned by a tenant, every query is scoped to a single tenant, you never do direct cross-tenant joins, and code reaches a tenant’s data through a tenant → location directory (which physical shard/database/region) rather than a hardcoded connection.

You do not have to actually shard on day one — run a single physical database if that’s all you need. What you must not do is foreclose sharding: keep the logical model partitioned so that moving a tenant to its own shard, its own region, or dedicated hardware later is an operational change, not a rewrite. If you build on the assumption that everything lives in one shared datastore and tenants are freely joined together, you will hit a wall — a scaling ceiling, a blast radius, a compliance ask you can’t satisfy — at the worst possible time, and unwinding it is one of the most painful migrations there is.
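A minimal sketch of the tenant → location directory, in Python. `TenantDirectory`, `Location`, and the region and shard names are hypothetical, and a real directory would live in a durable, replicated store rather than in process memory. On day one every tenant resolves to one default location. Relocating a tenant is a single mapping change that no calling code needs to know about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where one tenant's data physically lives (hypothetical fields)."""
    region: str
    shard: str


class TenantDirectory:
    """Hypothetical tenant -> location lookup.

    Tenants without an explicit placement resolve to the default, so day one
    can be a single database with every tenant mapped to it.
    """

    def __init__(self, default: Location) -> None:
        self._default = default
        self._placements: dict[str, Location] = {}

    def locate(self, tenant_id: str) -> Location:
        # Data-access code asks here instead of using a hardcoded connection.
        return self._placements.get(tenant_id, self._default)

    def place(self, tenant_id: str, location: Location) -> None:
        # Relocation is a mapping change (after the data has been copied);
        # no caller of locate() changes.
        self._placements[tenant_id] = location


directory = TenantDirectory(default=Location(region="us-east", shard="main"))
print(directory.locate("acme"))    # Location(region='us-east', shard='main')

# Later, a customer requires EU residency: copy its data, then repoint it.
directory.place("acme", Location(region="eu-west", shard="eu-1"))
print(directory.locate("acme"))    # Location(region='eu-west', shard='eu-1')
print(directory.locate("globex"))  # Location(region='us-east', shard='main')
```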

Context

The frictionless early choice is a single shared database with a customer_id column and queries that range freely across everyone. It’s simple and it works — right up until any of these arrive, usually together and usually under load:

  • A scaling ceiling. One database is one box (plus read replicas). When the working set, write throughput, or table size outgrows the biggest instance you can buy, you have nowhere to go — you can’t split it, because everything assumes one shared store.
  • Blast radius. One tenant’s runaway query, lock, hot partition, or schema migration degrades everyone. There’s no bulkhead between customers (ZFN-2).
  • Noisy neighbours and no tiering. A few huge tenants dominate a shared store, and you can’t move them to dedicated capacity without the ability to place a tenant somewhere specific.
  • Data residency and compliance. A customer requires their data in a particular region, or a clean, provable per-tenant delete/export (GDPR). Both are trivial if a tenant is a unit you can locate and operate on, and nearly impossible if everyone is interleaved in shared tables.
  • Isolation as security. A single missing WHERE tenant_id = ? is a cross-tenant data leak — the data-layer form of the confused-deputy problem (ZFN-10). Scoping every access to a tenant is a security boundary, not only a scaling one.

The cruelty is that none of this hurts early, so the shared-everything assumption gets baked deep into the schema and the query layer. Retrofitting partitioning then means rewriting data access, untangling cross-tenant joins, and migrating live customer data with zero loss. That’s the migration people lose quarters to. Designing for it up front costs very little; backing into it later costs an enormous amount.

Recommendation

Make tenant a first-class partition key now; defer physical sharding until you need it.

  • Tenant-key everything. Every record carries its owning tenant. Every read and write is scoped to one tenant; tenant scope is enforced centrally (a data-access layer, row-level security, or a per-tenant connection/schema) so it can’t be forgotten at a call site.
  • Never join across tenants. This is the rule that keeps the model shardable: if no query joins or ranges across tenants, a tenant’s data can be relocated to its own shard/region without breaking a single query. Cross-tenant joins are what weld tenants together and make sharding impossible later.
  • Route through a tenant → location directory. Code resolves where a tenant’s data lives (shard, database, region) through a lookup, never a hardcoded “the database.” On day one every tenant can map to the same single instance — the indirection is what lets you split later by changing the mapping, not the code.
  • Keep IDs partition-safe. Don’t assume globally sequential IDs from one store; use tenant-scoped identifiers, or globally unique identifiers tagged with their tenant, and avoid cross-tenant foreign keys, so records stay valid after a tenant moves.
  • Do cross-tenant work on a separate path. Org-wide analytics, billing rollups, and admin views go through an offline/analytical path — replicate to a warehouse via CDC/ETL and aggregate there — not by joining across the live operational partitions. Keep the operational store strictly per-tenant.
  • Run a single shard until you don’t need to. Don’t build a sharding control plane prematurely. One physical database behind the directory and the tenant-scoped access layer is fine — you’ve kept the option to shard cheap, which is the whole point. Add real shards when scale, blast-radius, residency, or tiering demands it.
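One way to enforce tenant scope centrally is a data-access layer bound to a single tenant. The sketch below uses Python’s built-in `sqlite3` as a stand-in for the operational database. `TenantScope`, `new_id`, and the `invoices` table are hypothetical. Callers never write the tenant predicate, because every statement the layer issues carries it. IDs are random UUIDs tagged with their tenant rather than values drawn from a central sequence.

```python
import sqlite3
import uuid
from typing import Optional


def new_id(tenant_id: str) -> str:
    # Globally unique without a central sequence, and tagged with its owner,
    # so the ID stays valid if the tenant's rows move to another shard.
    return f"{tenant_id}:{uuid.uuid4().hex}"


class TenantScope:
    """Data-access handle bound to exactly one tenant (hypothetical API).

    Callers never write the tenant predicate; every statement issued here
    includes it, so it cannot be forgotten at a call site.
    """

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self.tenant_id = tenant_id

    def insert_invoice(self, amount_cents: int) -> str:
        invoice_id = new_id(self.tenant_id)
        self._conn.execute(
            "INSERT INTO invoices (tenant_id, id, amount_cents) VALUES (?, ?, ?)",
            (self.tenant_id, invoice_id, amount_cents),
        )
        return invoice_id

    def list_invoices(self) -> list[tuple[str, int]]:
        return self._conn.execute(
            "SELECT id, amount_cents FROM invoices WHERE tenant_id = ? ORDER BY id",
            (self.tenant_id,),
        ).fetchall()

    def get_invoice(self, invoice_id: str) -> Optional[tuple[str, int]]:
        # Even a primary-key lookup is scoped: another tenant's ID finds nothing.
        return self._conn.execute(
            "SELECT id, amount_cents FROM invoices WHERE tenant_id = ? AND id = ?",
            (self.tenant_id, invoice_id),
        ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (tenant_id TEXT NOT NULL, id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL, PRIMARY KEY (tenant_id, id))"
)
acme = TenantScope(conn, "acme")
globex = TenantScope(conn, "globex")

inv = acme.insert_invoice(1200)
globex.insert_invoice(500)
print(globex.get_invoice(inv))  # None: the ID exists, but not in this tenant
```

Binding the tenant once, in the constructor, rather than passing it per call means a handler holding the wrong scope fails closed: it cannot see any other tenant’s rows.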

Scope. This is about operational customer/tenant data. Genuinely global, non-tenant reference data (say, a shared catalog) can live in its own store — just don’t let operational tenant data join against it in a way that assumes co-location.
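The following is a sketch of one way to honour that boundary. It assumes the shared catalog and the tenant’s operational data sit in separate stores (two in-memory `sqlite3` databases here; the tables and `order_lines_with_names` are hypothetical). The code reads the tenant’s rows with a tenant-scoped query, then resolves reference keys against the catalog in one batched lookup. The combination happens in application code, so it keeps working wherever the tenant is placed.

```python
import sqlite3

# Two separate stores (hypothetical): global reference data and one tenant shard.
catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT NOT NULL)")
catalog.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("SKU-1", "Widget"), ("SKU-2", "Gadget")],
)

shard = sqlite3.connect(":memory:")
shard.execute(
    "CREATE TABLE order_lines (tenant_id TEXT NOT NULL, sku TEXT NOT NULL,"
    " qty INTEGER NOT NULL)"
)
shard.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [("acme", "SKU-1", 3), ("acme", "SKU-2", 1), ("globex", "SKU-1", 7)],
)


def order_lines_with_names(tenant_id: str) -> list[tuple[str, str, int]]:
    # Step 1: tenant-scoped read from the tenant's own store.
    lines = shard.execute(
        "SELECT sku, qty FROM order_lines WHERE tenant_id = ? ORDER BY sku",
        (tenant_id,),
    ).fetchall()
    if not lines:
        return []
    # Step 2: batched key lookup in the global store; no SQL JOIN spans both.
    skus = sorted({sku for sku, _ in lines})
    placeholders = ",".join("?" for _ in skus)  # only "?" markers are interpolated
    names = dict(
        catalog.execute(
            f"SELECT sku, name FROM products WHERE sku IN ({placeholders})", skus
        )
    )
    return [(sku, names.get(sku, "<unknown>"), qty) for sku, qty in lines]


print(order_lines_with_names("acme"))  # [('SKU-1', 'Widget', 3), ('SKU-2', 'Gadget', 1)]
```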

Consequences

Easier:

  • You can actually scale out — add shards, move big tenants to dedicated capacity, place a tenant’s data in a required region — because the model never assumed one store.
  • Blast radius shrinks to a tenant (or a shard), not the whole customer base; per-tenant restore, migrate, export, and hard-delete become routine operations.
  • Tenant isolation is enforced and auditable, closing the cross-tenant-leak class of bug.
  • The painful “shard a live monolith database” migration never has to happen.
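To make “routine” concrete, here is a toy relocation that assumes the rules above hold. Two in-memory `sqlite3` databases stand in for shards, and a plain dict stands in for the directory. `export_tenant`, `delete_tenant`, and `move_tenant` are hypothetical helpers. Because every row carries its tenant and nothing joins across tenants, export, copy, and hard delete are each a single tenant-scoped statement. The sketch deliberately ignores writes that arrive during the copy. A production move would need a write freeze or a change-data-capture catch-up before flipping the directory entry.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE invoices (tenant_id TEXT NOT NULL, id TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL, PRIMARY KEY (tenant_id, id))"
)

# Two stand-in shards and a stand-in directory (tenant -> shard name).
shards = {name: sqlite3.connect(":memory:") for name in ("main", "dedicated-1")}
for shard in shards.values():
    shard.execute(SCHEMA)
directory = {"acme": "main", "globex": "main"}

with shards["main"]:
    shards["main"].executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [("acme", "a1", 100), ("acme", "a2", 250), ("globex", "g1", 900)],
    )


def export_tenant(conn: sqlite3.Connection, tenant_id: str) -> list[tuple]:
    # One tenant's data is one predicate away; this doubles as a per-tenant export.
    return conn.execute(
        "SELECT tenant_id, id, amount_cents FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()


def delete_tenant(conn: sqlite3.Connection, tenant_id: str) -> None:
    # Hard delete touches only this tenant's rows.
    conn.execute("DELETE FROM invoices WHERE tenant_id = ?", (tenant_id,))


def move_tenant(tenant_id: str, target: str) -> None:
    source = shards[directory[tenant_id]]
    dest = shards[target]
    rows = export_tenant(source, tenant_id)
    with dest:  # commit the copy before anyone is routed to it
        dest.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    directory[tenant_id] = target  # the operational change: flip the mapping
    with source:  # then remove the old copy
        delete_tenant(source, tenant_id)


move_tenant("acme", "dedicated-1")
print(directory["acme"])                             # dedicated-1
print(export_tenant(shards["dedicated-1"], "acme"))  # [('acme', 'a1', 100), ('acme', 'a2', 250)]
print(export_tenant(shards["main"], "acme"))         # []
print(export_tenant(shards["main"], "globex"))       # [('globex', 'g1', 900)]
```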

Harder:

  • More discipline up front: a tenant-scoped access layer, a location directory, and the standing rule against cross-tenant joins — even while everything still lives on one instance and the structure looks like overhead.
  • Cross-tenant reporting needs a separate analytical path instead of a convenient JOIN, so org-wide questions take a deliberate pipeline.
  • Some global invariants and uniqueness constraints are harder across partitions and need designing (tenant-scoped uniqueness, or a separate authority).
  • You carry the indirection before you reap the scaling benefit — accepted, because the alternative is not being able to shard when you must.
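For the uniqueness point, a common pattern is to scope the constraint to the tenant, so it can be checked entirely inside whichever shard holds that tenant. Below is a minimal `sqlite3` sketch with a hypothetical `users` table: the same email may exist in two tenants, but not twice in one. A rule that must hold across all tenants (say, a globally unique login) is the case that still needs a separate authority.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The invariant is "unique within a tenant", so it is declared on (tenant_id, email).
conn.execute(
    "CREATE TABLE users (tenant_id TEXT NOT NULL, email TEXT NOT NULL,"
    " UNIQUE (tenant_id, email))"
)

conn.execute("INSERT INTO users VALUES (?, ?)", ("acme", "ops@example.com"))
# The same email in another tenant is fine: the check never spans partitions.
conn.execute("INSERT INTO users VALUES (?, ?)", ("globex", "ops@example.com"))

duplicate_rejected = False
try:
    conn.execute("INSERT INTO users VALUES (?, ?)", ("acme", "ops@example.com"))
except sqlite3.IntegrityError:
    duplicate_rejected = True  # a second copy inside one tenant is refused

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```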

References

  • ZFN-10 — tenant scoping as a security boundary; a missing tenant predicate is the data-layer confused deputy (cross-tenant leak).
  • ZFN-2 — per-tenant/per-shard bulkheads keep one customer from taking the rest down.
  • ZFN-1 — this is a hard-to-reverse architectural choice worth committing to deliberately and early.

Changelog

  • 2026-06-12: First published as a Field Note.