Field Note 11 · current

Route outbound HTTP through an isolated egress proxy

By: Theo Zourzouvillys
Published: June 12, 2026
Tags: security, infra, network, ssrf

TL;DR

Don’t let application compute open arbitrary outbound connections directly. Any service that makes outbound HTTP — fetching user-supplied URLs, delivering webhooks, calling third-party APIs, link previews, importing remote files, SSO/OIDC metadata fetches — is an SSRF surface: an attacker-influenced URL turns your app into a deputy that can reach internal services, the cloud metadata endpoint (169.254.169.254), and localhost admin ports. Route all such egress through a proxy that runs on separate compute which physically cannot reach anything internal. The proxy’s network — not the application’s URL-validation code — is the enforcement boundary: even a fully compromised app behind it can only reach the public internet.

Build it in tiers, simplest first: (1) a SOCKS proxy on an isolated subnet whose ACLs allow only routes to the public internet and deny anything aimed at internal/trusted ranges; (2) a dedicated VPC (no peering inward) with a load balancer fronting an autoscaled SOCKS fleet; (3) a gRPC egress service you call to make the request with metadata (which tenant, what purpose), so egress becomes a governed, per-tenant capability — trust chains, allowed TLS versions, header policy, audit.

Context

A surprising amount of application code makes outbound HTTP, and much of it is driven by input you don’t fully control: a customer configures a webhook URL, pastes a link to preview, points an importer at a remote file, registers an OIDC issuer, supplies an avatar URL. The moment a destination is influenced from outside, you have Server-Side Request Forgery: the attacker doesn’t reach the target directly — they make your server reach it, from inside your network, with your network’s trust.

What that buys an attacker is exactly the things your perimeter was supposed to protect:

The cloud metadata endpoint (169.254.169.254, fd00:ec2::254): on a misconfigured host this hands out IAM credentials. This is the classic SSRF-to-cloud-takeover chain, and the reason ZFN-9 (No long-lived cloud keys; workloads authenticate by federated identity) matters: federated, short-lived identity plus IMDSv2 shrink the prize, but blocking the route removes it.
Internal services: admin endpoints, databases, other tenants’ service instances, and the service mesh. None of these expect to be reached from the open internet, but all will happily answer a request that originates inside the VPC.
localhost: debug servers, sidecars, and management ports bound to 127.0.0.1.

Per-call URL validation in application code is necessary but not sufficient: it’s defeated by DNS rebinding (the name resolves to a public IP at validation time and an internal IP at connect time), by redirects to an internal host, by IPv6 and odd encodings, and simply by the next engineer who adds an outbound call and forgets to validate. You cannot make “every outbound call site validates correctly, forever” a reliable property. You can make “the compute that opens outbound connections has no network route to internal services” a reliable property by topology, the way ZFN-4 (Incident tooling must not depend on what it recovers) isolates recovery tooling. That’s the move.
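The rebinding gap is easiest to see in a toy model. The sketch below is illustrative only: `RebindingResolver` is a hypothetical stand-in for attacker-controlled DNS (it is not a real resolver), and the hostname and addresses are made up. It contrasts a validate-then-connect flow, which looks the name up twice, with a flow that resolves once and connects to the address it validated.

```python
import ipaddress


class RebindingResolver:
    """Hypothetical attacker-controlled DNS: each lookup returns the next answer."""

    def __init__(self, answers):
        self._answers = list(answers)
        self._calls = 0

    def resolve(self, hostname: str) -> str:
        # Return the next scripted answer; repeat the last one once exhausted.
        answer = self._answers[min(self._calls, len(self._answers) - 1)]
        self._calls += 1
        return answer


def is_internal(ip: str) -> bool:
    # Simplified check for the demo: private, loopback, and link-local ranges.
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local


def naive_fetch_target(resolver, hostname: str) -> str:
    """Validate with one lookup, then 'connect' using a second lookup."""
    if is_internal(resolver.resolve(hostname)):  # lookup 1: validation time
        raise ValueError("blocked internal destination")
    return resolver.resolve(hostname)            # lookup 2: connect time


def pinned_fetch_target(resolver, hostname: str) -> str:
    """Resolve once and connect to the exact address that was validated."""
    ip = resolver.resolve(hostname)
    if is_internal(ip):
        raise ValueError("blocked internal destination")
    return ip


# First answer is public, second is the metadata endpoint.
attacker_dns = ["93.184.216.34", "169.254.169.254"]
naive_target = naive_fetch_target(RebindingResolver(attacker_dns), "evil.example")
pinned_target = pinned_fetch_target(RebindingResolver(attacker_dns), "evil.example")
print(naive_target)   # 169.254.169.254 (validation passed, connection goes inward)
print(pinned_target)  # 93.184.216.34
```

Pinning closes this one gap in code, but redirects, encodings, and forgotten call sites remain, which is why the note treats the network as the boundary rather than any single check.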

Recommendation

Send all outbound HTTP from application compute through a dedicated egress proxy that lives on compute with no path to internal services. The proxy makes the actual outbound connection; the application only ever talks to the proxy. Implement it at the tier you can operate well, and climb:

Tier 1: SOCKS on an isolated subnet. Run a SOCKS proxy in a subnet whose network ACLs / route tables permit egress only to public-internet routes and deny anything destined for internal or trusted ranges: RFC1918 (10/8, 172.16/12, 192.168/16), your VPC CIDRs, link-local (169.254.0.0/16, including the metadata IP), 127/8, and the IPv6 equivalents (::1, fe80::/10, and fc00::/7, which covers fd00:ec2::254). Also deny traffic that originates from trusted IP ranges from being relayed back inward. The subnet has no route to your services, so a compromised proxy (or app) can’t pivot. Simple, and already a large improvement.
Tier 2: a dedicated egress VPC. Put the SOCKS fleet in its own VPC with no peering, no transit gateway, and no route back to your application or data VPCs. An internal load balancer fronts an autoscaled proxy fleet; because SOCKS is a raw TCP protocol, this needs a layer-4 balancer (an NLB) rather than an HTTP-only ALB. The only way in is the LB endpoint and the only way out is the internet gateway. Isolation at the VPC boundary is stronger and far easier to reason about and audit than per-subnet ACLs.
Tier 3: a gRPC (or HTTP) egress service. Instead of (or alongside) raw SOCKS, expose a first-party service that performs the outbound request on the caller’s behalf and accepts metadata with each call: which tenant/customer it’s for, the purpose, the caller identity. Because it’s a real service, not a dumb socket, it can enforce policy that a SOCKS proxy can’t:
- Per-tenant configuration — custom CA trust chains / pinned certs, allowed TLS versions and ciphers, required or forbidden headers, destination allow/deny lists, per-tenant rate limits and timeouts, response-size caps.
- SSRF hardening in one place — resolve DNS once and connect to the resolved IP (re-checking it’s public) to defeat rebinding; re-validate every redirect hop; restrict schemes to https/http; strip hop-by-hop and internal headers.
- Audit — every outbound call is attributable to a tenant and purpose, logged centrally. Egress becomes a governed capability, not ambient socket access.
This is a platform-owned service in the sense of ZFN-5 (Make workload identity a platform-owned service): one team owns the egress contract, everyone consumes it, and security policy lives at the seam.
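The "SSRF hardening in one place" bullet can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a complete egress service: the function names (`is_public`, `pinned_targets`, `is_allowed`) are hypothetical, and the deny list mirrors only the ranges named in this note, so treat it as a starting point rather than an exhaustive list.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Illustrative deny list based on the ranges named in this note; not exhaustive.
DENIED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "127.0.0.0/8", "::1/128", "fe80::/10", "fc00::/7",
)]
ALLOWED_SCHEMES = {"http", "https"}


def is_public(ip_text: str) -> bool:
    """True only for addresses that are safe to connect to from the egress tier."""
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])  # drop any IPv6 zone id
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the IPv4 rules.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    if ip.is_multicast or not ip.is_global:
        return False
    return not any(ip in net for net in DENIED_NETWORKS)


def pinned_targets(url: str) -> list:
    """Validate a URL and return the (ip, port) pairs the caller must connect to."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        raise ValueError(f"rejected URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Resolve exactly once; the connection must use these addresses, not the name.
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    targets = [(info[4][0], port) for info in infos]
    for ip, _ in targets:
        if not is_public(ip):
            raise ValueError(f"rejected destination: {ip}")
    return targets


def is_allowed(url: str) -> bool:
    """Convenience wrapper: False for rejected or unresolvable destinations."""
    try:
        pinned_targets(url)
    except (ValueError, OSError):
        return False
    return True


print(is_allowed("http://169.254.169.254/latest/meta-data/"))  # False
print(is_allowed("http://127.0.0.1:8080/admin"))               # False
print(is_allowed("ftp://8.8.8.8/file"))                        # False
```

In a service built this way, redirects would be followed manually, with every `Location` value passed back through `pinned_targets` before the next hop, and the TLS handshake would still use the original hostname for SNI and certificate verification while the socket connects to the pinned IP.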

Scope. This is about untrusted or third-party-bound egress from application compute. Purely internal service-to-service traffic isn’t egress and is governed by ZFN-3 (Default-encrypt internal service traffic). A small set of well-known first-party destinations may be reached directly if that path is itself locked down, but the default for anything internet-bound, and anything whose destination is influenced by input, is the proxy. Make the proxy the path of least resistance so engineers don’t route around it.

Consequences

Easier:

SSRF stops being catastrophic. The worst an attacker gets from a forged request is “reach a public address” — they can’t pivot to internal services or steal instance credentials, because the compute making the call has no route inward.
The security property is topological and centrally enforced, not “every call site validated correctly.” New outbound code inherits the protection by construction.
At Tier 3, egress is per-tenant configurable and fully audited — answering “who did we call, for which customer, and why?” becomes trivial, and tenant-specific TLS/cert requirements have a home.
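To make the Tier 3 idea concrete, here is a small sketch of a request envelope that carries tenant, purpose, and caller, checked against a per-tenant policy and recorded in an audit log. Every class and field name here (`TenantPolicy`, `EgressRequest`, `EgressService`, `x-internal-auth`) is hypothetical and chosen for illustration; a real service would expose this over gRPC or HTTP and cover far more policy (TLS versions, trust chains, rate limits).

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class TenantPolicy:
    """Hypothetical per-tenant egress policy."""
    allowed_host_suffixes: tuple
    allowed_schemes: frozenset = frozenset({"https"})
    forbidden_headers: frozenset = frozenset({"x-internal-auth"})


@dataclass(frozen=True)
class EgressRequest:
    """What a caller sends: the destination plus who it is for and why."""
    tenant: str
    purpose: str
    caller: str
    url: str
    headers: tuple = ()  # (name, value) pairs


@dataclass
class EgressService:
    policies: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, req: EgressRequest) -> bool:
        """Decide whether the request is within the tenant's policy, and log it."""
        policy = self.policies.get(req.tenant)
        parts = urlsplit(req.url)
        host = parts.hostname or ""
        allowed = (
            policy is not None
            and parts.scheme in policy.allowed_schemes
            and any(host == s or host.endswith("." + s)
                    for s in policy.allowed_host_suffixes)
            and not any(name.lower() in policy.forbidden_headers
                        for name, _ in req.headers)
        )
        # Every decision is attributable: tenant, purpose, caller, destination.
        self.audit_log.append({
            "tenant": req.tenant, "purpose": req.purpose,
            "caller": req.caller, "host": host, "allowed": allowed,
        })
        return allowed


service = EgressService({"acme": TenantPolicy(("hooks.example.com",))})
ok = service.authorize(
    EgressRequest("acme", "webhook", "billing-svc", "https://hooks.example.com/deliver"))
lookalike = service.authorize(
    EgressRequest("acme", "webhook", "billing-svc", "https://hooks.example.com.evil.net/"))
unknown_tenant = service.authorize(
    EgressRequest("nobody", "preview", "web", "https://hooks.example.com/"))
print(ok, lookalike, unknown_tenant)  # True False False
```

Denied requests are logged as well as allowed ones, so "who did we call, for which customer, and why?" includes the attempts that policy stopped.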

Harder:

Real infrastructure to run: an isolated subnet/VPC, a proxy fleet, a load balancer, and (Tier 3) a service with its own availability and on-call. Egress is now a dependency on the request path.
Every HTTP client in every service must be pointed at the proxy — SDKs, libraries, and tools that don’t honor proxy settings need handling, and “someone added a client that bypasses the proxy” is a failure mode to guard against (egress-deny by default at the network layer makes the bypass fail rather than silently leak).
Latency and a potential bottleneck on the egress path; the fleet must scale and degrade sensibly.
Streaming, websockets, and long-lived connections need explicit support through the proxy.
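Pointing a client at the proxy can be as small as the sketch below. It assumes the third-party `requests` library with its SOCKS extra (`requests[socks]`) for the session helper; the proxy hostname is a made-up placeholder, and 1080 is simply the conventional SOCKS port. The `requests` import is kept inside the function so the proxy-map helper works without it.

```python
def egress_proxies(host: str, port: int = 1080) -> dict:
    """Proxy map in the format the `requests` library accepts."""
    # socks5h (not socks5) asks the proxy to resolve DNS, so hostnames are
    # resolved from the isolated network rather than from application compute.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def egress_session(host: str, port: int = 1080):
    """Build a requests.Session whose traffic goes through the egress proxy."""
    import requests  # third-party; SOCKS support needs the `requests[socks]` extra

    session = requests.Session()
    session.proxies = egress_proxies(host, port)
    # Ignore proxy-related environment variables (HTTP_PROXY, NO_PROXY, ...)
    # so they cannot silently route this client around the egress proxy.
    session.trust_env = False
    return session


# Hypothetical proxy endpoint name, for illustration only.
proxies = egress_proxies("egress-proxy.internal.example")
print(proxies["https"])  # socks5h://egress-proxy.internal.example:1080
```

Client-side configuration like this is a convenience, not the control: a client that skips it should fail because application subnets have no direct internet route, not succeed quietly.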

New obligations:

Application subnets/VPCs deny direct internet egress at the network layer, so the proxy is the only way out and a bypass fails closed rather than leaking.
The proxy network is kept provably unable to reach internal ranges; any change that adds a route inward (peering, transit gateway, a shared subnet) is a blocking security review.
New outbound-HTTP code uses the egress path; introducing a direct-egress client is a reviewed exception, documented at the call site.
At Tier 3, tenant egress policy (trust chains, TLS, allow-lists) is owned and audited like any other security configuration.
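The "provably unable to reach internal ranges" obligation lends itself to an automated check. The sketch below is one possible shape for it, assuming the egress network's routes have already been exported as `(cidr, target)` pairs; that input format, the function name `inward_routes`, and the example route IDs are all hypothetical. It flags any route that is neither the network's own local route nor a default route to an internet gateway and that overlaps an internal range.

```python
import ipaddress

# Internal/trusted IPv4 ranges named in this note; extend with your VPC CIDRs.
INTERNAL_RANGES = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "127.0.0.0/8",
)]


def inward_routes(routes):
    """Return the routes that would give the egress network a path inward."""
    offending = []
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        if target == "local":
            continue  # the egress network's own CIDR
        if target.startswith("igw-") and net.prefixlen == 0:
            continue  # default route to the internet gateway
        # A default route to anything else, or any route overlapping an
        # internal range, is a path inward and should block the change.
        if net.prefixlen == 0 or any(net.overlaps(r) for r in INTERNAL_RANGES):
            offending.append((cidr, target))
    return offending


egress_routes = [
    ("10.20.0.0/16", "local"),
    ("0.0.0.0/0", "igw-0abc"),
    ("10.0.0.0/16", "pcx-0def"),  # a peering route someone added
]
print(inward_routes(egress_routes))  # [('10.0.0.0/16', 'pcx-0def')]
```

Run in CI against proposed infrastructure changes, a check like this turns "adding a route inward is a blocking security review" from a convention into a failing build.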

References

ZFN-9 (No long-lived cloud keys; workloads authenticate by federated identity): SSRF to the metadata endpoint is the classic credential-theft chain; federated identity shrinks the prize, isolating egress removes the route.
ZFN-10 (Pin the expected owner on cross-account resource calls; confused-deputy defense): the resource-ownership sibling of this note; both are “untrusted input drives a privileged action,” one at the network layer, one at the resource layer.
ZFN-5 (Make workload identity a platform-owned service): the Tier-3 egress service as platform-owned capability with a governed seam.
ZFN-4 (Incident tooling must not depend on what it recovers): the same “isolate by topology, not by code” reasoning, applied to recovery tooling.
OWASP: Server-Side Request Forgery and the SSRF Prevention Cheat Sheet.
AWS IMDSv2: defense in depth for the metadata endpoint behind the egress isolation.

Changelog

2026-06-12: First published as a Field Note.