Field Note 17 current
Separate configuration, state, and ephemeral data
TL;DR
Customer data in a SaaS product is not one uniform thing. It usually splits cleanly into three categories with very different characteristics, and you should model and store them according to their differences rather than jamming everything into one shape:
- Configuration — the mostly-static setup of the product for a tenant: rules, settings, policies, entitlements, schemas. Changes rarely (an admin action), read constantly, small and bounded.
- State — the durable operational/business data the product manages: records, transactions, entities. Grows and changes as the product is used; the system of record.
- Ephemeral / session — short-lived and often reconstructable: sessions, in-flight workflow state, presence, caches, temporary tokens. High churn, disposable, doesn’t want durable-store guarantees.
For the mostly-static, bounded parts (configuration especially, and sometimes a tenant’s whole application state), consider loading the entire set as a single, versioned snapshot rather than working out per request which pieces to fetch on demand. A snapshot is easier to validate (the whole set is checked for consistency at once), trivial to cache or hold in memory, and gives every request an internally consistent view. This is the same configuration-as-control-plane-state move as ZFN-16: load it, validate it, hold it, fail static on last-known-good.
Context
When all customer data is modelled as one undifferentiated pile — every entity in the same store, read the same way, with the same durability and consistency assumptions — you end up paying the wrong cost everywhere. The categories genuinely differ:
- Configuration is read on nearly every request and written almost never. Fetching it piecemeal, per request, from the operational database is pure overhead: the same rarely-changing rows, queried endlessly, reassembled into the same in-memory structure each time, with no guarantee that the rows you read are even mutually consistent if someone is mid-edit.
- Operational state is the opposite: it grows without bound, changes constantly, and is the thing that actually needs durability, partitioning (ZFN-15), and careful query design. You can’t hold all of it in memory and you shouldn’t try.
- Ephemeral/session data wants none of the durable store’s guarantees. Persisting it in your system of record makes that store hotter and more fragile for data you could rebuild or simply lose. It belongs in something built for churn (an in-memory store, a cache, short-TTL storage), where its failure is isolated.
Conflating them means config reads compete with operational writes, session churn pollutes the durable store, and you reason about consistency and durability for everything at the strictness the hardest case needs. Separating them lets each take the treatment it actually wants.
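The torn-read risk of piecemeal config fetching can be shown with a small, single-threaded simulation. This is a sketch under stated assumptions, not a description of any real schema: the `plan` / `max_seats` pair, `VALID_COMBINATIONS`, and the helper names are all hypothetical, and the "admin write" is interleaved by hand rather than by a real concurrent transaction.

```python
from dataclasses import dataclass

# Hypothetical invariant: a plan and its seat limit must always agree.
VALID_COMBINATIONS = {("free", 5), ("team", 50)}

# Piecemeal model: config lives as separate, mutable rows.
rows = {"plan": "free", "max_seats": 5}


def admin_upgrade_rows():
    """An admin edit that touches two related rows."""
    rows["plan"] = "team"
    rows["max_seats"] = 50


def read_piecemeal(interleaved_write):
    """Read the two rows with separate 'queries'; a write lands in between."""
    plan = rows["plan"]
    interleaved_write()
    max_seats = rows["max_seats"]
    return (plan, max_seats)


# Snapshot model: the whole config is one immutable, versioned object.
@dataclass(frozen=True)
class ConfigSnapshot:
    version: int
    plan: str
    max_seats: int


published = {"current": ConfigSnapshot(version=1, plan="free", max_seats=5)}


def admin_upgrade_snapshot():
    """Publish a complete new version instead of editing rows in place."""
    old = published["current"]
    published["current"] = ConfigSnapshot(old.version + 1, "team", 50)


def read_snapshot(interleaved_write):
    """Take one reference up front and use it for the whole request."""
    snap = published["current"]
    plan = snap.plan
    interleaved_write()
    max_seats = snap.max_seats
    return (plan, max_seats)


torn = read_piecemeal(admin_upgrade_rows)
consistent = read_snapshot(admin_upgrade_snapshot)

print("piecemeal:", torn, "valid:", torn in VALID_COMBINATIONS)
print("snapshot: ", consistent, "valid:", consistent in VALID_COMBINATIONS)
```

The piecemeal reader ends up with a combination that never existed as a whole, while the snapshot reader sees the old version in full; the next request picks up the new version.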
Recommendation
Identify the categories in your data model and treat each according to its nature.
- Name the split. For each kind of customer data, ask: is this configuration (static, small, read-heavy, admin-changed), state (durable, growing, changes with use), or ephemeral (short-lived, high-churn, reconstructable)? The boundaries are usually clearer than they first look.
- Snapshot the static, bounded data; don’t fetch it on demand. For configuration, and for any tenant dataset that is bounded and mostly static, load the whole set as one versioned snapshot, validate it as a unit, and cache it or keep it in memory. The advantages compound:
  - Easier to validate. A whole snapshot can be checked for internal consistency and invariants at once (cross-references resolve, rules are coherent) instead of discovering a broken reference at request time on the one path that hits it (ZFN-14 validation applies here too).
  - Consistent view. Every request sees one coherent version; no torn reads across config that’s mid-update.
  - Cheap to serve. An in-memory copy answers config reads without a network round trip and puts no load on the operational store.
  - Versioned and reloadable. Bump a version, rebuild and re-validate the snapshot, swap it in; atomic config rollout and easy rollback follow from the same mechanism.
- Treat configuration as control-plane state. The snapshot is exactly the ZFN-16 pattern: config is control-plane data, pushed to and cached by the serving path, which fails static on the last-known-good snapshot if the source is unreachable. No per-request control-plane lookup.
- Give operational state and ephemeral data their own homes. Keep durable state in the partitioned system of record; keep ephemeral/session data in a store built for churn with short TTLs, where losing it is survivable and its load doesn’t touch the durable path.
- Know when not to snapshot. The whole-snapshot move works because the data is bounded and changes slowly. Large, unbounded, or high-churn data (most operational state, anything per-event) can’t be loaded wholesale; that’s where on-demand queries, partitioning, and proper indexing belong. The skill is matching the access strategy to the category, not applying one everywhere.
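The load, validate, hold, fail-static cycle recommended above might look roughly like the following. It is a minimal sketch, not the ZFN-16 implementation: `Snapshot`, `SnapshotHolder`, `validate`, the `rules` / `policies` shape, and the `fetch` callable standing in for the config source are all hypothetical names chosen for illustration.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Mapping, Optional


class SnapshotInvalid(Exception):
    """Raised when a candidate snapshot fails whole-set validation."""


@dataclass(frozen=True)
class Snapshot:
    version: int
    rules: Mapping[str, str]      # rule name -> name of the policy it uses
    policies: Mapping[str, int]   # policy name -> numeric limit


def validate(raw: dict) -> Snapshot:
    """Check the whole set at once: every rule must reference a known policy."""
    policies = raw["policies"]
    dangling = sorted(r for r, p in raw["rules"].items() if p not in policies)
    if dangling:
        raise SnapshotInvalid(f"rules reference unknown policies: {dangling}")
    # Read-only views so request handlers cannot mutate the shared snapshot.
    return Snapshot(
        raw["version"],
        MappingProxyType(dict(raw["rules"])),
        MappingProxyType(dict(policies)),
    )


class SnapshotHolder:
    """Holds the last-known-good snapshot and swaps in validated new ones."""

    def __init__(self, fetch: Callable[[], dict]):
        self._fetch = fetch                    # hypothetical config source
        self._lock = threading.Lock()          # serializes concurrent reloads
        self._current: Optional[Snapshot] = None

    def reload(self) -> bool:
        """Try to load a new snapshot; on any failure keep the old one."""
        try:
            candidate = validate(self._fetch())
        except Exception:
            return False                       # fail static on last-known-good
        with self._lock:
            self._current = candidate          # swap a single reference
        return True

    def get(self) -> Snapshot:
        """Return one coherent snapshot for the duration of a request."""
        snap = self._current
        if snap is None:
            raise RuntimeError("no valid snapshot has been loaded yet")
        return snap


# Demonstration with an in-memory stand-in for the config source.
feed = {"next": {"version": 1, "rules": {"export": "standard"},
                 "policies": {"standard": 100}}}


def fetch() -> dict:
    value = feed["next"]
    if isinstance(value, Exception):
        raise value                            # simulate an unreachable source
    return value


holder = SnapshotHolder(fetch)
first_ok = holder.reload()

feed["next"] = {"version": 2, "rules": {"export": "missing"},
                "policies": {"standard": 100}}
broken_ok = holder.reload()                    # dangling reference is rejected

feed["next"] = ConnectionError("config source unreachable")
outage_ok = holder.reload()                    # outage is survived

print(first_ok, broken_ok, outage_ok, holder.get().version)
```

In this sketch a rollback is simply another publish: the source offers the earlier content again and `reload` swaps it in after the same validation. Whether versions must increase monotonically is a policy choice the sketch leaves open.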
Consequences
Easier:
- Config reads stop hammering the operational store and stop competing with writes; they’re served from memory, consistently, at a known version.
- Validating a whole config/state snapshot up front catches inconsistencies before they reach production, instead of surfacing as a runtime error on a rare path.
- Atomic config rollout and rollback fall out of versioned snapshots; ephemeral data can fail or be flushed without endangering durable records.
- Each category scales and is operated on its own terms.
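The isolation of ephemeral data can be sketched with a tiny in-process TTL store. This is illustrative only: `EphemeralStore`, the injected `clock`, and the `durable_orders` dict standing in for the system of record are hypothetical, and a real deployment would more likely use a dedicated cache or in-memory store.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class EphemeralStore:
    """Short-TTL key/value store for data that is safe to lose."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock                     # injectable for testing
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]                # expire lazily on read
            return None
        return value

    def flush(self) -> None:
        """Drop everything; callers must be able to rebuild or re-login."""
        self._items.clear()


# Demonstration with a fake clock so the behaviour is deterministic.
now = [0.0]
sessions = EphemeralStore(ttl_seconds=30, clock=lambda: now[0])
durable_orders = {"order-1": "paid"}            # stand-in for the system of record

sessions.put("sess-abc", {"user": "u1"})
now[0] = 10.0
fresh = sessions.get("sess-abc")                # still within the TTL

now[0] = 31.0
expired = sessions.get("sess-abc")              # past the TTL, treated as gone

sessions.put("sess-def", {"user": "u2"})
sessions.flush()                                # losing sessions is survivable
after_flush = sessions.get("sess-def")

print(fresh, expired, after_flush, durable_orders)
```

The point of the separate store is that `flush`, expiry, or outright loss never touches `durable_orders`; the two have independent failure modes.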
Harder:
- More than one storage strategy to build and operate, and a snapshot build/validate/publish pipeline for the static data.
- Snapshots can go stale; you need versioning, a refresh/reload path, and bounded-staleness reasoning (the ZFN-16 trade-off).
- The category boundaries take judgment, and some data is genuinely in between (slowly-changing state) and needs a deliberate call on which treatment fits.
- Holding state in memory bounds how much there can be — fine for config, a real constraint you must respect for anything that can grow.
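Two of the costs above, staleness and the memory bound, can be made explicit as guards around the held snapshot. The sketch below is one possible shape, with hypothetical names and limits (`admit`, `is_within_bound`, `max_bytes`, `max_staleness_seconds`); it uses the JSON-serialized size as a rough proxy for memory footprint and assumes the snapshot data is JSON-serializable.

```python
import json
from dataclasses import dataclass


class SnapshotTooLarge(Exception):
    """Raised when a dataset is too big to be treated as a snapshot."""


@dataclass(frozen=True)
class LoadedSnapshot:
    version: int
    data: dict
    loaded_at: float          # seconds, from whatever clock the caller uses


def admit(version: int, data: dict, now: float,
          max_bytes: int = 1_000_000) -> LoadedSnapshot:
    """Refuse to hold data that has outgrown the 'small and bounded' premise."""
    size = len(json.dumps(data).encode("utf-8"))
    if size > max_bytes:
        raise SnapshotTooLarge(f"{size} bytes exceeds limit of {max_bytes}")
    return LoadedSnapshot(version, data, now)


def staleness(snap: LoadedSnapshot, now: float) -> float:
    """Age of the held snapshot in seconds."""
    return now - snap.loaded_at


def is_within_bound(snap: LoadedSnapshot, now: float,
                    max_staleness_seconds: float = 300.0) -> bool:
    """True while the snapshot is younger than the agreed staleness bound."""
    return staleness(snap, now) <= max_staleness_seconds


# Demonstration with explicit timestamps instead of a real clock.
snap = admit(version=7, data={"feature_x": True}, now=1000.0)
ok_soon = is_within_bound(snap, now=1100.0)     # 100 s old
ok_later = is_within_bound(snap, now=1400.0)    # 400 s old

try:
    admit(version=8, data={"blob": "x" * 100}, now=1000.0, max_bytes=50)
    rejected = False
except SnapshotTooLarge:
    rejected = True

print(ok_soon, ok_later, rejected)
```

Under the fail-static approach, a snapshot that exceeds the staleness bound would typically keep serving while raising an alert rather than being dropped; that choice is an assumption of this sketch and belongs to the ZFN-16 trade-off.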
References
- ZFN-16 — configuration is control-plane state; a cached, fail-static snapshot is how the data plane holds it.
- ZFN-15 — operational state is per-tenant and partitioned; the snapshot is per-tenant too.
- ZFN-14 — validate the snapshot against a schema; a whole consistent set is the easiest thing to validate.
Changelog
- 2026-06-12: First published as a Field Note.