---
id: 3
title: "Default-encrypt internal service traffic"
status: current
date: 2026-06-12
authors:
  - "Theo Zourzouvillys"
tags: [security, infra, transport]
summary: "All external traffic is TLS, no exceptions. Internal traffic is encrypted by default; an internal call site may skip transport encryption (never authentication) only under a documented, audited carve-out anchored to a network-perimeter guarantee."
supersedes: null
superseded_by: null
aliases: []
---

## TL;DR

All **external** traffic — anything leaving infrastructure you control (public APIs,
browser/mobile clients, webhook deliveries, partner integrations, third-party vendors) — uses
TLS, with no exceptions. All **internal** service-to-service traffic is encrypted by default
(mTLS or a service-mesh equivalent). An internal call site may skip *transport encryption* (but
**never** authentication) if the traffic provably stays within a guaranteed network perimeter
(a single VPC, or only links the cloud provider encrypts in transit) *and* the threat model
genuinely permits exposure to a network-layer adversary. Every carve-out site carries an inline
`ZFN-3 carve-out: <perimeter guarantee>; <threat-model justification>` comment so it's greppable
and auditable. Network-perimeter changes invalidate dependent carve-outs and trigger a re-audit
*before* the change ships.

## Context

As a system decomposes into independent services, traffic that used to be in-process function
calls becomes bytes on a wire. Every wire is a potential interception point and a surface for an
attacker who has gained a foothold in the network.

Two extreme positions are both wrong:

- **"Everything internal can be plaintext because it's in our VPC."** Cloud misconfigurations
  happen. Peering changes happen. A foothold on one host can pivot. Incident reports are full of
  cases where the network turned out not to be as isolated as the original architect assumed.
- **"Everything must be encrypted end-to-end, no exceptions."** This is the right default, but as
  an absolute it produces friction engineers route around. A loopback call between two processes
  on the same host doesn't benefit from TLS; forcing it everywhere produces ceremony and cert
  management without proportionate benefit, and erodes trust in the rule overall.

The engineering priority ordering in [ZFN-2](/zfn/2-engineering-priority-ordering/) puts security
first. The right shape is therefore an encrypted default with a narrow, conditioned, audited
carve-out for cases where the network already provides the guarantee that transport encryption
would.

## Recommendation

**External traffic is always encrypted.** Any traffic that crosses out of infrastructure you
control — public APIs, browser and mobile clients, webhook deliveries to customers, partner
integrations, calls to third-party vendors — uses TLS, with no exceptions. The internal carve-out
below does not apply at external boundaries. If a downstream dependency supports only plaintext,
treat it as unsupported until it offers TLS (or tunnel it through something that does). When
evaluating a vendor or partner integration, transport-encryption support is a hard requirement.
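
As a minimal sketch of enforcing the external rule in code, assuming the outbound call is HTTP-based (a webhook delivery, say), a client could reject any target that is not `https` before dialing. The helper name `requireTLS` is hypothetical, and non-HTTP protocols would need their own equivalent check.

```go
package main

import (
	"fmt"
	"net/url"
)

// requireTLS returns an error unless rawURL is an absolute https URL with a
// host. Hypothetical helper for illustration; it is not defined by this note.
func requireTLS(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parse external URL: %w", err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("external URL %q must use https, got scheme %q", rawURL, u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("external URL %q has no host", rawURL)
	}
	return nil
}
```

Calling this at configuration-load time, rather than per request, would surface a plaintext webhook or vendor endpoint before any traffic is sent.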

**Internal traffic is encrypted by default.** All internal service-to-service traffic uses
application-layer authentication *and* transport encryption (mTLS, or an equivalent provided by
the service mesh).

**Internal carve-out.** A specific internal call site, connection, or service-pair may skip
transport **encryption** if *all* of the following hold:

1. The traffic provably stays within a single VPC, **or** traverses only links the cloud provider
   encrypts on your behalf (e.g. a private interconnect documented as encrypted in transit).
2. The threat model for this specific traffic genuinely does not require defense against a
   network-layer adversary — i.e. what's on the wire, and what it can trigger, are acceptable to
   expose to whoever has compromised the perimeter.
3. **Authentication is still enforced at the application layer.** The carve-out covers transport
   encryption only. Authentication is *never* carved out — the receiver always verifies the
   caller's identity, because the network is not allowed to be a substitute for identity.
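
The three conditions combine as "(single VPC or provider-encrypted links) and threat model permits and authentication enforced". A small sketch of that logic follows, using a hypothetical `carveOutClaim` type that a review checklist or lint tool might fill in; the type and field names are assumptions.

```go
package main

// carveOutClaim records a reviewer's answers for one call site.
// Hypothetical type for illustration; field names are not defined by this note.
type carveOutClaim struct {
	StaysInSingleVPC           bool // condition 1, first alternative
	OnlyProviderEncryptedLinks bool // condition 1, second alternative
	ThreatModelAllowsExposure  bool // condition 2
	AppLayerAuthEnforced       bool // condition 3, never optional
}

// maySkipEncryption reports whether all three conditions hold.
func (c carveOutClaim) maySkipEncryption() bool {
	perimeterOK := c.StaysInSingleVPC || c.OnlyProviderEncryptedLinks
	return perimeterOK && c.ThreatModelAllowsExposure && c.AppLayerAuthEnforced
}
```

Note that no combination of perimeter and threat-model answers can compensate for `AppLayerAuthEnforced` being false.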

**Local documentation requirement:** every site that takes the internal carve-out — the YAML
config, the Terraform module, the client construction in code — carries an inline comment in this
exact form:

```
ZFN-3 carve-out: <network-perimeter guarantee>; <threat-model justification>
```

Concrete examples:

```yaml
# ZFN-3 carve-out: traffic stays inside the prod-us-east VPC and never
# leaves; payload is already-public rate-limit counters with no PII and no
# auth-bearing tokens.
encryption: none
```

```go
// ZFN-3 carve-out: Unix domain socket between two processes in the same pod,
// so the connection never touches a network interface; payload is process
// metrics with no secrets or PII.
conn, err := net.Dial("unix", "/var/run/app/metrics.sock")
```

The literal string `ZFN-3 carve-out:` is the convention so every use site is greppable:

```sh
git grep -nE 'ZFN-3 carve-out:'
```
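
Grep finds the marker but does not check that both halves of the comment are present. A minimal sketch of that check follows, assuming the comment has already been joined into one string with comment prefixes removed; `parseCarveOut` is a hypothetical helper, not an existing tool.

```go
package main

import (
	"errors"
	"strings"
)

// carveOutMarker is the literal prefix this note requires.
const carveOutMarker = "ZFN-3 carve-out:"

// parseCarveOut splits a carve-out comment into its perimeter guarantee and
// threat-model justification. Hypothetical helper; it assumes the whole
// comment is passed as a single string.
func parseCarveOut(comment string) (guarantee, justification string, err error) {
	_, rest, found := strings.Cut(comment, carveOutMarker)
	if !found {
		return "", "", errors.New("marker not found")
	}
	g, j, found := strings.Cut(rest, ";")
	g, j = strings.TrimSpace(g), strings.TrimSpace(j)
	if !found || g == "" || j == "" {
		return "", "", errors.New("carve-out needs '<guarantee>; <justification>'")
	}
	return g, j, nil
}
```

A comment that names only the perimeter, with no justification after the semicolon, is reported as an error rather than silently accepted.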

**Re-evaluating carve-outs:** any change to the network perimeter (peering, VPC merge, region
migration, mesh topology change) invalidates the assumption underlying every dependent carve-out.
The team making the network change owns running the audit and either re-justifying or removing the
affected carve-outs *before* the change goes to production.
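
One way to scope that audit, sketched under the assumption that each carve-out site has been tagged with a structured perimeter identifier (the marker comment itself is free text, so this tagging is hypothetical):

```go
package main

import "sort"

// carveOutSite is one location that takes the carve-out.
// Hypothetical type; the Perimeter identifier scheme is an assumption.
type carveOutSite struct {
	Location  string // e.g. "deploy/ratelimit.yaml:12"
	Perimeter string // e.g. "vpc:prod-us-east"
}

// sitesToReaudit returns the sorted locations of every site whose perimeter
// is touched by a planned network change.
func sitesToReaudit(sites []carveOutSite, changedPerimeters []string) []string {
	changed := make(map[string]bool, len(changedPerimeters))
	for _, p := range changedPerimeters {
		changed[p] = true
	}
	var out []string
	for _, s := range sites {
		if changed[s.Perimeter] {
			out = append(out, s.Location)
		}
	}
	sort.Strings(out)
	return out
}
```

The returned list is the set of sites that must be re-justified or converted back to the encrypted default before the change ships.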

## Out of scope: peer identity

This note commits to *authentication happening at the application layer* on every internal call,
regardless of whether transport encryption is in use. It deliberately does **not** specify *how*
peer identity is established and verified. Several mechanisms are viable — mTLS client
certificates, signed service tokens, IAM-issued workload identity, mesh-issued SPIFFE IDs, OAuth
client credentials — each with its own trade-offs in operational complexity, key rotation,
observability, and cross-language ergonomics. That mechanism choice is a separate note; the
constraint here is only that authentication must happen at the app layer.
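
Purely to illustrate what "authenticated but not encrypted" can mean, and without choosing among the mechanisms above, here is a sketch in which the caller signs each request with an HMAC over a shared key and the receiver verifies it. The function names, the shared-key model, and the message layout are all assumptions; a real scheme would also need replay protection (a timestamp or nonce) and key rotation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// requestMAC computes HMAC-SHA256 over callerID, a NUL separator, and body.
// Assumes callerID contains no NUL byte. Hypothetical layout for illustration.
func requestMAC(key []byte, callerID string, body []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(callerID))
	mac.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
	mac.Write(body)
	return mac.Sum(nil)
}

// signRequest returns the hex-encoded signature the caller attaches.
func signRequest(key []byte, callerID string, body []byte) string {
	return hex.EncodeToString(requestMAC(key, callerID, body))
}

// verifyRequest reports whether sigHex is a valid signature for the request.
// hmac.Equal compares in constant time.
func verifyRequest(key []byte, callerID string, body []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	return hmac.Equal(got, requestMAC(key, callerID, body))
}
```

The body travels in the clear, so a network-layer observer can read it, but cannot forge a request or impersonate a caller without the key. That is exactly the exposure the carve-out's threat-model clause has to accept.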

## Consequences

**Easier:**

- External boundaries have a single non-negotiable rule (always TLS) — no per-case debates where
  the cost of a mistake is highest.
- High-volume internal RPCs in tight loops can opt out of mTLS overhead when the security model
  genuinely permits it, at the cost of a documented, audited comment at the call site.
- Local-development setup doesn't require wrangling certificates for trivial intra-host cases.
- The default-encrypted stance covers the historically dominant failure mode: "the network turned
  out not to be what we thought."

**Harder:**

- Every carve-out is a small audit obligation in perpetuity. Reviewers must check the stated
  justification matches reality, and you must re-audit when the perimeter changes.
- Any carve-out is inherently a slippery slope. The local-documentation rule, the explicit
  threat-model clause, and the greppable marker exist to fight the slide, but they work only if
  used honestly.
- Authentication without encryption is a less common operational mode, and the tooling for it is
  sometimes thinner than for the all-mTLS path.

**New obligations:**

- Use the `ZFN-3 carve-out:` comment marker exactly. Anything that looks like a carve-out but
  doesn't use the marker is a bug, not a carve-out.
- A periodic audit (quarterly is reasonable) walks every `ZFN-3 carve-out:` site and confirms the
  stated guarantee still holds.
- A network change that invalidates a carve-out assumption isn't complete until the dependent sites
  are reviewed and either re-justified or converted back to fully encrypted.
- Carve-outs that no longer justify themselves are removed (switched back to the default), not
  weakened.

## References

- [ZFN-1](/zfn/1-engineering-decision-records/) — the carve-out-with-local-documentation
  mechanism this note uses.
- [ZFN-2](/zfn/2-engineering-priority-ordering/) — the security-first priority ordering that
  motivates the default.

## Changelog

- **2026-06-12**: First published as a Field Note.
