Theo Zourzouvillys

Field Note 3 current

Default-encrypt internal service traffic

By Theo Zourzouvillys
Tags: security, infra, transport

TL;DR

All external traffic, meaning anything leaving infrastructure you control (public APIs, browser/mobile clients, webhook deliveries, partner integrations, third-party vendors), uses TLS, with no exceptions. All internal service-to-service traffic is authenticated and encrypted by default (mTLS or a service-mesh equivalent). An internal call site may skip transport encryption, but never authentication, if the traffic provably stays within a single VPC or traverses only links the cloud provider encrypts on your behalf, and the threat model genuinely permits exposure to a network-layer adversary. Every carve-out site carries an inline comment of the form "ZFN-3 carve-out: <network-perimeter guarantee>; <threat-model justification>" so it is greppable and auditable. Network-perimeter changes invalidate dependent carve-outs and trigger a re-audit before the change ships.

Context

As a system decomposes into independent services, traffic that used to be in-process function calls becomes bytes on a wire. Every wire is a potential interception point and a surface for an attacker who has gained a foothold in the network.

Two extreme positions are both wrong:

  • “Everything internal can be plaintext because it’s in our VPC.” Cloud misconfigurations happen. Peering changes happen. A foothold on one host can pivot. Incident reports are full of cases where the network turned out not to be as isolated as the original architect assumed.
  • “Everything must be encrypted end-to-end, no exceptions.” This is the right default, but as an absolute it produces friction engineers route around. A loopback call between two processes on the same host doesn’t benefit from TLS; forcing it everywhere produces ceremony and cert management without proportionate benefit, and erodes trust in the rule overall.

The priority ordering in ZFN-2 puts security first. The right shape is therefore a secure default with a narrow, conditioned, audited carve-out for cases where the network genuinely already provides the same guarantee transport encryption would.

Recommendation

External traffic is always encrypted. Any traffic that crosses out of infrastructure you control — public APIs, browser and mobile clients, webhook deliveries to customers, partner integrations, calls to third-party vendors — uses TLS, with no exceptions. The internal carve-out below does not apply at external boundaries. If a downstream you depend on only supports plaintext, treat it as unsupported until it offers TLS (or tunnel it through something that does). When evaluating a vendor or partner integration, transport-encryption support is a hard requirement.

Internal traffic is encrypted by default. All internal service-to-service traffic is authenticated at the application layer and encrypted in transit (mTLS, or an equivalent provided by the service mesh).

Internal carve-out. A specific internal call site, connection, or service pair may skip transport encryption (the mTLS or mesh-provided encryption above) if all of the following hold:

  1. The traffic provably stays within a single VPC, or traverses only links the cloud provider encrypts on your behalf (e.g. a private interconnect documented as encrypted in transit).
  2. The threat model for this specific traffic genuinely does not require defense against a network-layer adversary — i.e. what’s on the wire, and what it can trigger, are acceptable to expose to whoever has compromised the perimeter.
  3. Authentication is still enforced at the application layer. The carve-out covers transport encryption only. Authentication is never carved out — the receiver always verifies the caller’s identity, because the network is not allowed to be a substitute for identity.

Local documentation requirement: every site that takes the internal carve-out — the YAML config, the Terraform module, the client construction in code — carries an inline comment in this exact form:

ZFN-3 carve-out: <network-perimeter guarantee>; <threat-model justification>

Concrete examples:

# ZFN-3 carve-out: traffic stays inside the prod-us-east VPC and never
# leaves; payload is already-public rate-limit counters with no PII and no
# auth-bearing tokens.
encryption: none

// ZFN-3 carve-out: Unix domain socket between two processes in the same
// pod, so bytes never reach a network interface; payload is metrics only.
conn, err := net.Dial("unix", "/var/run/app/metrics.sock")

The literal string ZFN-3 carve-out: is the convention so every use site is greppable:

git grep -nE 'ZFN-3 carve-out:'
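Grep finds the markers but does not check that each one carries both required clauses. A small validator could do that in CI. The following is a rough sketch under assumptions: ParseCarveOuts and the CarveOut type are hypothetical, and it only understands whole-line "#" and "//" comments, joining wrapped comment lines before splitting on the first semicolon.

```go
package main

import (
	"fmt"
	"strings"
)

const marker = "ZFN-3 carve-out:"

// CarveOut is one parsed marker comment.
type CarveOut struct {
	Line          int    // 1-based line number of the marker
	Guarantee     string // text before the first ';'
	Justification string // text after the first ';'
}

// commentBody returns the text of a whole-line "#" or "//" comment.
func commentBody(line string) (string, bool) {
	s := strings.TrimSpace(line)
	for _, prefix := range []string{"//", "#"} {
		if strings.HasPrefix(s, prefix) {
			return strings.TrimSpace(strings.TrimPrefix(s, prefix)), true
		}
	}
	return "", false
}

// ParseCarveOuts finds every marker in src, joins the comment lines that
// continue it, and reports markers missing either clause.
func ParseCarveOuts(src string) ([]CarveOut, []error) {
	var found []CarveOut
	var errs []error
	lines := strings.Split(src, "\n")
	for i := 0; i < len(lines); i++ {
		body, ok := commentBody(lines[i])
		idx := strings.Index(body, marker)
		if !ok || idx < 0 {
			continue
		}
		start := i
		text := body[idx+len(marker):]
		// Absorb following comment lines until code or another marker.
		for i+1 < len(lines) {
			next, isComment := commentBody(lines[i+1])
			if !isComment || strings.Contains(next, marker) {
				break
			}
			text += " " + next
			i++
		}
		guarantee, justification, hasSep := strings.Cut(text, ";")
		guarantee = strings.TrimSpace(guarantee)
		justification = strings.TrimSpace(justification)
		if !hasSep || guarantee == "" || justification == "" {
			errs = append(errs, fmt.Errorf("line %d: want %q", start+1,
				marker+" <guarantee>; <justification>"))
			continue
		}
		found = append(found, CarveOut{Line: start + 1, Guarantee: guarantee, Justification: justification})
	}
	return found, errs
}

func main() {
	src := "# ZFN-3 carve-out: traffic stays inside the prod-us-east VPC and never\n" +
		"# leaves; payload is already-public rate-limit counters with no PII and no\n" +
		"# auth-bearing tokens.\n" +
		"encryption: none\n"
	found, errs := ParseCarveOuts(src)
	fmt.Println(len(found), len(errs))
}
```

A check like this catches malformed markers only. Whether the stated guarantee is true still needs a human reviewer.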

Re-evaluating carve-outs: any change to the network perimeter (peering, VPC merge, region migration, mesh topology change) invalidates the assumption underlying every dependent carve-out. The team making the network change owns running the audit and either re-justifying or removing the affected carve-outs before the change goes to production.
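Finding the dependent carve-outs is the mechanical part of that audit. As a rough sketch, assume (hypothetically) that each guarantee clause names its perimeter by an identifier such as prod-us-east, and that the team keeps a list of known perimeter names. A selection step might then flag every site that names the changed perimeter, plus every site that names no known perimeter at all, since those cannot be shown to be unaffected. The Site type and NeedsReaudit function are illustrative names only.

```go
package main

import (
	"fmt"
	"strings"
)

// Site is one carve-out found in the codebase (hypothetical shape).
type Site struct {
	File      string
	Line      int
	Guarantee string // the <network-perimeter guarantee> clause
}

// mentionsAny reports whether lowercased text contains any of the names.
func mentionsAny(text string, names []string) bool {
	for _, n := range names {
		if strings.Contains(text, strings.ToLower(n)) {
			return true
		}
	}
	return false
}

// NeedsReaudit returns the sites whose guarantee names the changed
// perimeter, and also sites that name no known perimeter (fail closed).
func NeedsReaudit(sites []Site, changed string, known []string) []Site {
	var out []Site
	for _, s := range sites {
		g := strings.ToLower(s.Guarantee)
		if strings.Contains(g, strings.ToLower(changed)) || !mentionsAny(g, known) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// File names and line numbers below are made up for the example.
	sites := []Site{
		{File: "ratelimit.yaml", Line: 12, Guarantee: "traffic stays inside the prod-us-east VPC and never leaves"},
		{File: "billing.tf", Line: 40, Guarantee: "traffic stays inside the prod-eu-west VPC"},
		{File: "metrics.go", Line: 7, Guarantee: "Unix domain socket inside one pod"},
	}
	known := []string{"prod-us-east", "prod-eu-west"}
	for _, s := range NeedsReaudit(sites, "prod-us-east", known) {
		fmt.Printf("%s:%d\n", s.File, s.Line)
	}
}
```

Substring matching is a weak heuristic, which is why the sketch errs toward flagging too much: a reviewer can dismiss a false positive quickly, while a missed site stays silently unjustified.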

Out of scope: peer identity

This note commits to authentication happening at the application layer on every internal call, regardless of whether transport encryption is in use. It deliberately does not specify how peer identity is established and verified. Several mechanisms are viable — mTLS client certificates, signed service tokens, IAM-issued workload identity, mesh-issued SPIFFE IDs, OAuth client credentials — each with its own trade-offs in operational complexity, key rotation, observability, and cross-language ergonomics. That mechanism choice is a separate note; the constraint here is only that authentication must happen at the app layer.

Consequences

Easier:

  • External boundaries have a single non-negotiable rule (always TLS) — no per-case debates where the cost of a mistake is highest.
  • High-volume internal RPCs in tight loops can opt out of mTLS overhead when the security model genuinely permits it, with the cost being a comment.
  • Local-development setup doesn’t require wrangling certificates for trivial intra-host cases.
  • The default-encrypted stance covers the historically-dominant failure mode: “the network turned out not to be what we thought.”

Harder:

  • Every carve-out is a small audit obligation in perpetuity. Reviewers must check the stated justification matches reality, and you must re-audit when the perimeter changes.
  • The carve-out is a slippery slope by nature. The local-documentation rule, the explicit threat-model clause, and the greppable marker exist to fight the slide, but they work only if used honestly.
  • Authentication-without-encryption is a less-common operational mode and the tooling for it is sometimes thinner than the all-mTLS path.

New obligations:

  • Use the ZFN-3 carve-out: comment marker exactly. Anything that looks like a carve-out but doesn’t use the marker is a bug, not a carve-out.
  • A periodic audit (quarterly is reasonable) walks every ZFN-3 carve-out: site and confirms the stated guarantee still holds.
  • A network change that invalidates a carve-out assumption isn’t complete until the dependent sites are reviewed and either re-justified or converted back to fully encrypted.
  • Carve-outs that no longer justify themselves are removed (switched back to the default), not weakened.

References

  • ZFN-1 — the carve-out-with-local-documentation mechanism this note uses.
  • ZFN-2 — the security-first priority ordering that motivates the default.

Changelog

  • 2026-06-12: First published as a Field Note.