---
id: 5
title: "Make workload identity a platform-owned service"
status: current
date: 2026-06-12
authors:
  - "Theo Zourzouvillys"
tags: [security, auth, infra, platform]
summary: "Workload identity belongs in shared platform infrastructure, not reimplemented per service. A small token service mints short-lived tokens any service verifies. Shared keys are a fine first step; asymmetric signing the better end-state — don't let 'no PKI' block it."
supersedes: null
superseded_by: null
aliases: []
---

## TL;DR

How one service proves its identity to another is a problem every service in a decomposed system
has, and it should be solved **once, as platform infrastructure** — not reimplemented per team and
not owned by whichever product happened to build it first. The shape I recommend: a small
**security-token service (STS)** plus a shared library. Together they mint **short-lived** identity
tokens for workloads, and any service consumes the same library to mint and to verify them.

**Start simple; don't let "we don't have PKI" block you.** A perfectly good first step is tokens
signed with a **shared symmetric key** (HMAC) the issuer and verifiers hold — far better than
per-service schemes or long-lived static secrets, and adoptable in an afternoon. The better
end-state is **asymmetric** signing, where only the issuer can mint and verifiers hold just a public
key. Ideally the signing key is rooted in a **KMS/HSM** and verifiers anchor to that root, so there
is **no public key-distribution endpoint (no JWKS) to operate**. Pick the rung you can run well
today and climb later; the important move is having *one shared mechanism*, not having PKI on day
one.

New peer-authentication code consumes the platform module; introducing a *new* per-service
identity scheme is a deliberate, reviewed exception, not a local choice.

## Context

Authentication and authorization decide *whether a workload may act*. They don't, on their own,
give you a good answer to *how a workload proves which workload it is* to its peers — and once a
system is decomposed into many services, every one of them needs that answer. Left unsolved at the
platform level, it gets solved many times: each team picks its own peer-identity mechanism, the
schemes interoperate poorly, they rotate differently, they fail differently, and you end up with an
archipelago of auth code that is impossible to audit as one thing.

A few forces make a shared mechanism the right call:

- **More than one service needs it.** A capability the whole platform authenticates with should not
  be owned by one product, and should not be copy-pasted into every consumer.
- **One mechanism beats many.** Convergence on a single, well-understood mechanism means one thing
  to audit, one rotation story, one verifier, and one set of cross-language ergonomics.
- **Verification shouldn't require a key-distribution service.** If every verifier has to fetch and
  cache a rotating public-key set from an endpoint, that endpoint becomes critical infrastructure
  with its own availability and trust problems. Anchoring trust to a root you already hold avoids
  it.

## Recommendation

**Treat workload identity as platform infrastructure with a single shared contract.** Concretely:

- **A platform-owned module (and a service where a boundary needs it).** The minting, certificate
  management, verification, and token-shape logic live in one library, owned by the platform or
  security function — not by a product team. An STS *service* form exists for callers that can't or
  shouldn't hold signing material directly (it issues delegated, short-lived credentials); the
  in-process library covers the rest. Both are the same capability.

- **Short-lived tokens, signed by a key the platform controls.** Tokens are short-lived; how they're
  signed is a progression, not a prerequisite:
  - **Shared key (HMAC) — the fine first step.** The issuer and verifiers hold a shared symmetric
    secret. Simple, no certificate machinery, and already a large improvement. Its weakness is that
    every verifier can also *mint* (it holds the signing secret), so a verifier compromise is a
    minting compromise — manage and rotate the shared key accordingly.
  - **Asymmetric — the better step.** Only the issuer holds the private key; verifiers hold the
    public key and can verify but not mint. A verifier compromise no longer lets an attacker forge
    identities.
  - **KMS/HSM-rooted, CA-anchored — the ideal.** A locally held, frequently-rotated **leaf** key
    whose certificate is signed by a **CA private key held in a KMS/HSM** that is never extractable;
    the token carries the chain (e.g. an `x5c` header) and a verifier validates it by anchoring to the
    CA it already trusts. There is **no JWKS endpoint** to publish or keep available: verifiers need
    only the CA certificate (or the KMS public key).

  Choose the highest rung you can operate well now; the shared contract should let you raise it later
  without every consumer changing how it *calls* the library.

- **Producers and verifiers use the shared code, not reimplementations.** A service that needs a
  token shape or claim the contract doesn't offer proposes a change to the contract; it does not
  fork its own.

- **The contract is a seam.** Once many services mint and verify against it, the token shape,
  claims, and roles are hard to change. Evolve it additively and review changes the way you'd review
  any cross-team interface — guard the module with code owners.

**Pair it with sender-constraint.** Identity tokens are still bearer tokens unless you bind them to
a holder key, so a stolen token is replayable. Combine this with proof-of-possession binding
([ZFN-6](/zfn/6-sender-constrained-tokens-dpop/)) so theft of a token alone isn't enough — or
sign the request/message itself ([ZFN-7](/zfn/7-sign-the-message/)).

**Scope.** This note is about *workload (service) identity* — one service proving which service it
is to another. It's complementary to how a workload authenticates to a *cloud provider*
([ZFN-9](/zfn/9-no-long-lived-cloud-keys/)); a single service typically uses both.

## Consequences

**Easier:**

- One workload-identity mechanism across the platform — one rotation story, one verifier, one audit
  surface — instead of per-team schemes.
- New services get authenticated identity by consuming a module, not by building crypto.
- At the top rung, verification needs no key-distribution service: trust anchors to a KMS-held root
  and there is no JWKS endpoint to operate. You can still start far simpler with a shared key.

**Harder:**

- Whoever owns the module takes on a capability the whole system depends on — with the on-call,
  versioning, and cross-language maintenance burden that implies. This is a real organizational
  shift, not just a code move.
- The contract becomes load-bearing; evolution must be additive and reviewed as a seam.
- Consumers that hold signing or private-key material inherit a blast-radius obligation: bound how
  long keys stay resident, handle rotation and eviction carefully, and never log key material.
- Picking one mechanism forecloses others (mTLS-only, SPIFFE) for the cases they might have fit
  better. The bet is that a single well-supported path beats a best-fit-per-case patchwork.

**New obligations:**

- New peer-authentication code uses the platform module; a *new* per-service scheme requires a
  reviewed exception, not a local decision.
- Contract changes (claims, roles, trust distribution) are code-owner-guarded and additive where
  possible.
- Consumers holding key material document and bound their custody (residency, rotation, eviction,
  no-logging).
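
One way a consumer might make that custody concrete is sketched below in Python, standard library
only. `HeldKey` and `KeyRing` are hypothetical names for this example, and the policy shown (a
per-key `not_after` deadline, eviction on lookup, key bytes excluded from `repr`) is an assumption
about what "bounded" could mean, not a description of the platform module.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HeldKey:
    key_id: str
    secret: bytes = field(repr=False)  # keep key bytes out of repr() and log lines
    not_after: float = 0.0             # epoch seconds after which the key is evicted


class KeyRing:
    """Holds the current signing key plus not-yet-expired predecessors."""

    def __init__(self) -> None:
        self._keys: Dict[str, HeldKey] = {}
        self._current_id: Optional[str] = None

    def rotate(self, key: HeldKey) -> None:
        """Install a new signing key; older keys remain for verification only."""
        self._keys[key.key_id] = key
        self._current_id = key.key_id

    def evict_expired(self, now: Optional[float] = None) -> List[str]:
        """Drop keys past their deadline and return the evicted key ids."""
        current = time.time() if now is None else now
        expired = [kid for kid, key in self._keys.items() if key.not_after <= current]
        for kid in expired:
            del self._keys[kid]
            if kid == self._current_id:
                self._current_id = None
        return expired

    def signing_key(self) -> HeldKey:
        if self._current_id is None:
            raise LookupError("no current signing key")
        return self._keys[self._current_id]

    def verification_key(self, key_id: str, now: Optional[float] = None) -> HeldKey:
        """Look up a key by id, evicting anything expired first."""
        self.evict_expired(now)
        try:
            return self._keys[key_id]
        except KeyError:
            raise LookupError(f"unknown or evicted key: {key_id}") from None
```

Keeping the previous key until its `not_after` gives tokens minted just before a rotation time to
expire naturally, which is one reason short token lifetimes make rotation cheaper.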

## References

- [ZFN-6](/zfn/6-sender-constrained-tokens-dpop/) — sender-constrained tokens (DPoP): bind these
  identity tokens to a holder key so a stolen one is useless.
- [ZFN-7](/zfn/7-sign-the-message/) — signing the request (and ideally response) itself, an
  alternative or complement to bearer identity tokens.
- [ZFN-9](/zfn/9-no-long-lived-cloud-keys/) — authenticating to a cloud provider (the other
  half of a service's identity story).

## Changelog

- **2026-06-12**: First published as a Field Note.
