---
id: 15
title: "Partition customer data by tenant from day one"
status: current
date: 2026-06-12
authors:
  - "Theo Zourzouvillys"
tags: [architecture, data, multi-tenancy, scalability, security]
summary: "Make customer data tenant-partitioned from day one: tenant-scope every query, never join across tenants, route through a tenant→location directory. Run one physical database at first — but keep the model shardable. Retrofitting isolation onto a shared DB is brutal."
supersedes: null
superseded_by: null
aliases: []
---

## TL;DR

Treat **tenant as a first-class partition dimension** of your data model from the very beginning.
Never assume "one database holds all customers." Concretely: every record is owned by a tenant, every
query is **scoped to a single tenant**, you **never do direct cross-tenant joins**, and code reaches a
tenant's data through a **tenant → location directory** (which physical shard/database/region) rather
than a hardcoded connection.

You do **not** have to actually shard on day one — run a single physical database if that's all you
need. What you must not do is *foreclose* sharding: keep the logical model partitioned so that moving a
tenant to its own shard, its own region, or dedicated hardware later is an **operational** change, not
a rewrite. If you build on the assumption that everything lives in one shared datastore and tenants are
freely joined together, you will hit a wall — a scaling ceiling, a blast radius, a compliance ask you
can't satisfy — at the worst possible time, and unwinding it is one of the most painful migrations
there is.

## Context

The frictionless early choice is a single shared database with a `customer_id` column and queries that
range freely across everyone. It's simple and it works — right up until any of these arrive, usually
together and usually under load:

- **A scaling ceiling.** One database is one box (plus read replicas). When the working set, write
  throughput, or table size outgrows the biggest instance you can buy, you have nowhere to go — you
  can't split it, because everything assumes one shared store.
- **Blast radius.** One tenant's runaway query, lock, hot partition, or schema migration degrades
  *everyone*. There's no bulkhead between customers ([ZFN-2](/zfn/2-engineering-priority-ordering/)).
- **Noisy neighbours and no tiering.** A few huge tenants dominate a shared store, and you can't move
  them to dedicated capacity without the ability to place a tenant somewhere specific.
- **Data residency and compliance.** A customer requires their data in a particular region, or a
  clean, provable **per-tenant delete/export** (GDPR). Both are trivial if a tenant is a unit you can
  locate and operate on, and nearly impossible if everyone is interleaved in shared tables.
- **Isolation as security.** A single missing `WHERE tenant_id = ?` is a cross-tenant data leak — the
  data-layer form of the confused-deputy problem ([ZFN-10](/zfn/10-verify-resource-owner/)). Scoping
  every access to a tenant is a security boundary, not only a scaling one.
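
One way to design out the missing-predicate failure in the last item is to bind the tenant once, when the access object is constructed, so that call sites never write the tenant filter themselves. The following is a minimal sketch under stated assumptions, not a prescribed implementation: `TenantScopedStore`, `make_demo_db`, and the `invoices` table are hypothetical names, and SQLite stands in for whatever operational database is actually in use.

```python
import sqlite3


class TenantScopedStore:
    """Hypothetical data-access wrapper: every statement is bound to one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def add_invoice(self, invoice_id: str, amount_cents: int) -> None:
        # The tenant comes from the store, never from the caller.
        self._conn.execute(
            "INSERT INTO invoices (tenant_id, invoice_id, amount_cents) "
            "VALUES (?, ?, ?)",
            (self._tenant_id, invoice_id, amount_cents),
        )

    def list_invoices(self) -> list[tuple[str, int]]:
        # The tenant predicate lives here, in exactly one place.
        rows = self._conn.execute(
            "SELECT invoice_id, amount_cents FROM invoices "
            "WHERE tenant_id = ? ORDER BY invoice_id",
            (self._tenant_id,),
        )
        return rows.fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table whose primary key is tenant-scoped (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        " tenant_id TEXT NOT NULL,"
        " invoice_id TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL,"
        " PRIMARY KEY (tenant_id, invoice_id))"
    )
    return conn
```

Two stores built over the same connection for different tenants see only their own rows, even when both use the same `invoice_id`. Database-level row-level security or per-tenant schemas are alternative ways to get the same guarantee.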

The cruelty is that none of this hurts early, so the shared-everything assumption gets baked deep into
the schema and the query layer — and then retrofitting partitioning means rewriting data access,
untangling cross-tenant joins, and migrating live customer data with zero loss. That's the migration
people lose quarters to. Designing for it up front costs very little; backing into it later is enormously expensive.

## Recommendation

**Make tenant a first-class partition key now; defer *physical* sharding until you need it.**

- **Tenant-key everything.** Every record carries its owning tenant. Every read and write is scoped to
  one tenant; tenant scope is enforced centrally (a data-access layer, row-level security, or a
  per-tenant connection/schema) so it can't be forgotten at a call site.
- **Never join across tenants.** This is the rule that keeps the model shardable: if no query joins or
  ranges across tenants, a tenant's data can be relocated to its own shard/region without breaking a
  single query. Cross-tenant joins are what weld tenants together and turn later sharding into a rewrite.
- **Route through a tenant → location directory.** Code resolves *where* a tenant's data lives (shard,
  database, region) through a lookup, never a hardcoded "the database." On day one every tenant can map
  to the same single instance — the indirection is what lets you split later by changing the mapping,
  not the code.
- **Keep IDs partition-safe.** Don't assume globally-sequential IDs from one store; use tenant-scoped or
  globally-unique-but-tenant-tagged identifiers, and avoid cross-tenant foreign keys, so records stay
  valid after a tenant moves.
- **Do cross-tenant work on a separate path.** Org-wide analytics, billing rollups, and admin views go
  through an **offline/analytical** path — replicate to a warehouse via CDC/ETL and aggregate there —
  **not** by joining across the live operational partitions. Keep the operational store strictly
  per-tenant.
- **Run a single shard until you need more.** Don't build a sharding control plane prematurely.
  One physical database behind the directory and the tenant-scoped access layer is fine — you've kept
  the *option* to shard cheap, which is the whole point. Add real shards when scale, blast-radius,
  residency, or tiering demands it.
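
The directory and partition-safe ID rules above can be sketched together. This is an illustration under assumptions rather than a reference design: `TenantDirectory`, `Shards`, `new_note_id`, `move_tenant`, and the `notes` table are all hypothetical, in-memory SQLite databases stand in for physical shards, and the move shown here ignores concurrent writes, which a real migration would have to pause or replay.

```python
import sqlite3
import uuid


class TenantDirectory:
    """Hypothetical tenant -> location lookup; a location is just a shard name."""

    def __init__(self, default_shard: str) -> None:
        self._default = default_shard
        self._placement: dict[str, str] = {}

    def shard_for(self, tenant_id: str) -> str:
        # Day one: nothing is placed explicitly, so everyone maps to the default.
        return self._placement.get(tenant_id, self._default)

    def place(self, tenant_id: str, shard: str) -> None:
        self._placement[tenant_id] = shard


class Shards:
    """Holds one in-memory SQLite database per shard name (stand-in for real stores)."""

    def __init__(self, names: list[str]) -> None:
        self._conns: dict[str, sqlite3.Connection] = {}
        for name in names:
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE notes (tenant_id TEXT NOT NULL, note_id TEXT NOT NULL,"
                " body TEXT NOT NULL, PRIMARY KEY (tenant_id, note_id))"
            )
            self._conns[name] = conn

    def conn(self, name: str) -> sqlite3.Connection:
        return self._conns[name]


def new_note_id(tenant_id: str) -> str:
    # Tenant-tagged and globally unique: stays valid whichever shard holds it.
    return f"{tenant_id}:{uuid.uuid4().hex}"


def add_note(directory: TenantDirectory, shards: Shards, tenant_id: str, body: str) -> str:
    conn = shards.conn(directory.shard_for(tenant_id))
    note_id = new_note_id(tenant_id)
    conn.execute("INSERT INTO notes VALUES (?, ?, ?)", (tenant_id, note_id, body))
    return note_id


def list_notes(directory: TenantDirectory, shards: Shards, tenant_id: str) -> list[str]:
    conn = shards.conn(directory.shard_for(tenant_id))
    rows = conn.execute(
        "SELECT body FROM notes WHERE tenant_id = ? ORDER BY rowid", (tenant_id,)
    )
    return [row[0] for row in rows]


def move_tenant(directory: TenantDirectory, shards: Shards, tenant_id: str, target: str) -> None:
    """Copy one tenant's rows to the target shard, flip the mapping, then clean up."""
    source = shards.conn(directory.shard_for(tenant_id))
    dest = shards.conn(target)
    rows = source.execute(
        "SELECT tenant_id, note_id, body FROM notes WHERE tenant_id = ? ORDER BY rowid",
        (tenant_id,),
    ).fetchall()
    dest.executemany("INSERT INTO notes VALUES (?, ?, ?)", rows)
    directory.place(tenant_id, target)
    source.execute("DELETE FROM notes WHERE tenant_id = ?", (tenant_id,))
```

In this sketch `add_note` and `list_notes` never change when a tenant moves; only the directory entry does. That is the property the cross-tenant join ban protects, because every query touches exactly one tenant and therefore exactly one location.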

**Scope.** This is about operational customer/tenant data. Genuinely global, non-tenant reference data
(say, a shared catalog) can live in its own store — just don't let operational tenant data join against
it in a way that assumes co-location.

## Consequences

**Easier:**

- You can actually scale out — add shards, move big tenants to dedicated capacity, place a tenant's data
  in a required region — because the model never assumed one store.
- Blast radius shrinks to a tenant (or a shard), not the whole customer base; per-tenant restore,
  migrate, export, and hard-delete become routine operations.
- Tenant isolation is enforced and auditable, closing the cross-tenant-leak class of bug.
- The painful "shard a live monolith database" migration never has to happen.
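
As an illustration of why per-tenant export and hard-delete become routine, here is a minimal sketch that assumes every tenant-owned table carries a `tenant_id` column. The table names, `TENANT_TABLES`, `export_tenant`, and `hard_delete_tenant` are hypothetical, and SQLite stands in for the operational store.

```python
import json
import sqlite3

# Hypothetical registry of every table that holds tenant-owned rows.
TENANT_TABLES = ("orders", "comments")


def make_demo_db() -> sqlite3.Connection:
    """Create two tenant-keyed tables in memory (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id TEXT NOT NULL,"
        " total_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE comments (tenant_id TEXT NOT NULL, comment_id TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def export_tenant(conn: sqlite3.Connection, tenant_id: str) -> str:
    """Return all of one tenant's rows as a JSON document, table by table."""
    dump = {}
    for table in TENANT_TABLES:  # fixed names from the registry, not user input
        cur = conn.execute(f"SELECT * FROM {table} WHERE tenant_id = ?", (tenant_id,))
        columns = [col[0] for col in cur.description]
        dump[table] = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps(dump, sort_keys=True)


def hard_delete_tenant(conn: sqlite3.Connection, tenant_id: str) -> int:
    """Delete one tenant's rows from every registered table in one transaction."""
    deleted = 0
    with conn:  # commits on success, rolls back if any statement raises
        for table in TENANT_TABLES:
            cur = conn.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
            deleted += cur.rowcount
    return deleted
```

Both operations are a loop over a known table list with a single predicate. With interleaved, cross-referenced data there is no such loop to write.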

**Harder:**

- More discipline up front: a tenant-scoped access layer, a location directory, and the standing rule
  against cross-tenant joins — even while everything still lives on one instance and the structure looks
  like overhead.
- Cross-tenant reporting needs a separate analytical path instead of a convenient `JOIN`, so org-wide
  questions take a deliberate pipeline.
- Some global invariants and uniqueness constraints are harder across partitions and need designing
  (tenant-scoped uniqueness, or a separate authority).
- You carry the indirection before you reap the scaling benefit — accepted, because the alternative is
  not being *able* to shard when you must.
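
Tenant-scoped uniqueness, mentioned above, can be as simple as putting the tenant into the constraint so that it never needs to look beyond one partition. A small sketch, with a hypothetical `users` table and helper names, using SQLite for illustration:

```python
import sqlite3


def make_users_db() -> sqlite3.Connection:
    """Create a table where email is unique per tenant, not globally (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " tenant_id TEXT NOT NULL,"
        " email TEXT NOT NULL,"
        " UNIQUE (tenant_id, email))"
    )
    return conn


def add_user(conn: sqlite3.Connection, tenant_id: str, email: str) -> bool:
    """Insert a user; return False if that tenant already has this email."""
    try:
        conn.execute(
            "INSERT INTO users (tenant_id, email) VALUES (?, ?)", (tenant_id, email)
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the constraint is keyed by tenant, it can be checked entirely within whichever shard holds that tenant. An invariant that must hold across all tenants (a globally unique login name, for example) cannot be expressed this way and would need the separate authority noted above.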

## References

- [ZFN-10](/zfn/10-verify-resource-owner/) — tenant scoping as a security boundary; a missing tenant
  predicate is the data-layer confused deputy (cross-tenant leak).
- [ZFN-2](/zfn/2-engineering-priority-ordering/) — per-tenant/per-shard bulkheads keep one customer
  from taking the rest down.
- [ZFN-1](/zfn/1-engineering-decision-records/) — this is a hard-to-reverse architectural choice worth
  committing to deliberately and early.

## Changelog

- **2026-06-12**: First published as a Field Note.
