Theo Zourzouvillys

Field Note 14 (current)

Define every API with a schema, and generate the clients

By Theo Zourzouvillys
Tags: architecture, api, process, correctness

TL;DR

Every API — service-to-service RPC, public HTTP, internal endpoints, and event payloads — gets a machine-readable schema (OpenAPI, Protobuf/gRPC, GraphQL SDL, JSON Schema, Avro/AsyncAPI for events) that is the single source of truth for the contract. Generate the clients, and the server stubs/types, from that schema; do not hand-write request construction, URL string-building, and bespoke JSON (de)serialization. A hand-rolled client is a second, untyped copy of the contract that silently drifts from the real one.

Practically: the schema is reviewed and versioned like code; producer and consumer are both generated from it so they cannot disagree; generated code is never edited by hand; the boundary validates against the schema; and CI checks schema compatibility so a breaking change fails the build instead of an integration. Don’t raw-dog your APIs.

Context

When you “just call the endpoint” — build the URL by hand, set headers, serialize a map to JSON, parse the response into untyped objects — you’ve written a copy of the API contract in code, by hand, with no checking. It works the day you write it and rots from there:

  • A field gets renamed or its type changes on the server, and nothing tells the caller until it breaks at runtime, in production, often only on the one code path that touches that field.
  • Required parameters get missed, enums get stringly-typed, optionality is guessed at, and every caller re-implements the same parsing slightly differently.
  • The “documentation” is a wiki page that disagrees with the server, and a new consumer reverse-engineers the real shape from traffic.
  • Each language re-hand-writes the client, so the Go caller, the TypeScript caller, and the Python caller all behave subtly differently against the same API.
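A minimal Python sketch of such a hand-rolled client shows how the first failure mode plays out. The base URL, the `/v1/users/` path, the `user_name` field, and its later rename to `display_name` are all hypothetical. The point is that the rename produces no error at all, only a `None` where a name used to be.

```python
import json
from urllib.parse import quote

def user_url(base: str, user_id: str) -> str:
    # Path assembled by hand; nothing checks it against the server's routes.
    return f"{base}/v1/users/{quote(user_id)}"

def parse_user(body: str) -> dict:
    data = json.loads(body)
    # Field names are guesses; a server-side rename is not an error here.
    return {"id": data.get("id"), "name": data.get("user_name")}

before = '{"id": "u1", "user_name": "Ada"}'
after = '{"id": "u1", "display_name": "Ada"}'  # the server renamed the field

print(user_url("https://api.example.com", "u 1"))  # https://api.example.com/v1/users/u%201
print(parse_user(before))  # {'id': 'u1', 'name': 'Ada'}
print(parse_user(after))   # {'id': 'u1', 'name': None}
```

Nothing in this code knows what the contract is, so nothing can tell you when it changes.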

This is the seam problem from ZFN-1 in its most concrete form: two teams meet at an interface, and a prose description of that interface drifts. A schema with generated clients on both sides makes drift a build error instead of an incident — the contract is executable, and both sides are produced from the same source, so they cannot fall out of sync without something failing to compile.

Recommendation

Schema is the source of truth; clients and stubs are generated; the contract is enforced in CI.

  • Pick the right schema language for the protocol. Protobuf for gRPC; OpenAPI for REST/HTTP; GraphQL SDL for GraphQL; Smithy when you want a single protocol-agnostic service model that generates clients and servers and can target more than one protocol; JSON Schema for JSON documents and config; Avro/Protobuf + a schema registry, or AsyncAPI, for event payloads on your queues, topics, and journals (ZFN-12). Events are APIs too — schematize them.

  • Generate the clients — and the server interfaces. Use codegen for the client and the server-side request/response types and handler interfaces, in every language you support. Generating both sides from one schema is the whole point: a mismatch can’t survive a build. Hand-writing the transport, the URL/path construction, and the (de)serialization is the thing to eliminate.

  • Never edit generated code. Generated artifacts are build outputs, not source: regenerate them, don’t patch them. If you need better ergonomics, wrap them in your own thin, hand-written layer that lives outside the generated files, and keep the generated code pristine so that regeneration never has hand edits to clobber.

  • Prefer schema-first; if you go code-first, the schema is still the published, reviewed artifact. Designing the contract deliberately beats letting it fall out of an implementation. Code-first is acceptable only if the generated schema is the artifact that you publish and review and that others generate against, not an afterthought.

  • Validate at the boundary. Use the schema to validate requests and responses at the edge, so malformed input is rejected by the contract rather than crashing three layers in. (For untrusted input this is also a security boundary.)

  • Validate conformance in tests (always) and in production (by sampling). Codegen guarantees the types line up; it does not guarantee the running system stays within the contract. So check the real thing: make every unit and integration/functional test assert that the requests it sends and the responses it receives conform to the schema. Tests are where a contract violation is cheapest to catch, and where validating every message costs next to nothing. In production, continuously validate a sample of live requests and responses against the schema and alert on violations. Validating every call can be too costly at scale, but a steady sample is enough to surface drift, undocumented fields, and a producer or consumer that has quietly diverged from the contract.

  • Check compatibility in CI. Because the schema is machine-readable, a tool can diff it and fail the build on a breaking change (e.g. buf for Protobuf, an OpenAPI diff/lint for HTTP). This is what makes “evolve additively, never break consumers” an enforced rule instead of a hope — the same additive-seam discipline applies whether the seam is an RPC or an event schema.

  • Publish the schema and version it. The schema file lives in version control, is reviewed in PRs, and is published where consumers (and tools, and LLM agents) can fetch it. It doubles as accurate, always-current documentation.
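To illustrate “Generate the clients, and the server interfaces”, here is roughly the shape of what a generator emits from one schema, written out by hand in Python so it runs on its own. `GetUserRequest`, `User`, `UsersService`, `UsersClient`, and the in-process `loopback` transport are hypothetical stand-ins, not the output of any particular tool. In Python a mismatch surfaces through a type checker or at construction time rather than from a compiler, but the structure is the same in Go, TypeScript, or Java: both sides import the same generated types.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Callable

# ---- Roughly what a generator might emit from one schema (hypothetical) ----

@dataclass(frozen=True)
class GetUserRequest:
    user_id: str

@dataclass(frozen=True)
class User:
    id: str
    display_name: str

# Transport: takes an operation name and a JSON body, returns a JSON body.
Transport = Callable[[str, str], str]

class UsersService(ABC):
    """Server interface: an implementation must provide every schema operation."""

    @abstractmethod
    def get_user(self, req: GetUserRequest) -> User: ...

class UsersClient:
    """Client: serializes exactly the types the server interface accepts."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get_user(self, req: GetUserRequest) -> User:
        body = self._transport("GetUser", json.dumps(asdict(req)))
        # Unknown or missing fields raise TypeError here instead of becoming None.
        return User(**json.loads(body))

def loopback(service: UsersService) -> Transport:
    """Server-side dispatch, normally generated too; calls the service in-process."""
    def transport(operation: str, body: str) -> str:
        if operation == "GetUser":
            resp = service.get_user(GetUserRequest(**json.loads(body)))
            return json.dumps(asdict(resp))
        raise ValueError(f"unknown operation: {operation}")
    return transport

# ---- Hand-written: only the business logic, against the generated types ----

class InMemoryUsers(UsersService):
    def get_user(self, req: GetUserRequest) -> User:
        return User(id=req.user_id, display_name="Ada")

client = UsersClient(loopback(InMemoryUsers()))
print(client.get_user(GetUserRequest(user_id="u1")))  # User(id='u1', display_name='Ada')
```

A payload with an unexpected field name, such as `{"id": "u1", "user_name": "Ada"}`, makes `User(**...)` raise `TypeError` rather than quietly yielding `None`.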
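For “Never edit generated code”, one way to keep ergonomics out of the generated files is a thin wrapper in its own module. In this sketch `GeneratedUsersClient` stands in for a generated client, and `Users` is a hypothetical hand-written wrapper that adds a convenience method and a cache; regenerating the first class cannot touch the second.

```python
from dataclasses import dataclass
from typing import Dict

# ---- generated/users_client.py (hypothetical): regenerated, never edited ----

@dataclass(frozen=True)
class User:
    id: str
    display_name: str

class GeneratedUsersClient:
    def get_user(self, user_id: str) -> User:
        # A real generated client would make a network call here.
        return User(id=user_id, display_name="Ada")

# ---- users.py: hand-written wrapper, outside the generated files ----

class Users:
    """Adds convenience and caching on top of the generated client."""

    def __init__(self, client: GeneratedUsersClient) -> None:
        self._client = client
        self._cache: Dict[str, User] = {}

    def display_name(self, user_id: str) -> str:
        if user_id not in self._cache:
            self._cache[user_id] = self._client.get_user(user_id)
        return self._cache[user_id].display_name

users = Users(GeneratedUsersClient())
print(users.display_name("u1"))  # Ada
```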
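When a contract is code-first, the schema still has to be extracted and published. The sketch below derives a small JSON Schema from a Python dataclass using only the standard library. `schema_for` is hypothetical and handles only scalar and `Optional` fields, whereas real code-first frameworks cover far more; the emitted JSON is what would be committed, reviewed, and handed to other teams' generators.

```python
import json
from dataclasses import MISSING, dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_for(cls: type) -> dict:
    """Derive a small JSON Schema from a dataclass (scalars and Optional only)."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for f in fields(cls):
        t = hints[f.name]
        args = get_args(t)
        if get_origin(t) is Union and type(None) in args:
            # Optional[X] becomes {"type": [X, "null"]}.
            inner = next(a for a in args if a is not type(None))
            properties[f.name] = {"type": [_JSON_TYPES[inner], "null"]}
        else:
            properties[f.name] = {"type": _JSON_TYPES[t]}
        # Fields without a default are required in the published contract.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }

@dataclass
class User:
    id: str
    display_name: str
    age: Optional[int] = None

# The emitted JSON, not the Python class, is what gets reviewed and published.
print(json.dumps(schema_for(User), indent=2))
```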
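A toy version of “Validate at the boundary”: a hypothetical `validate` function implementing a small subset of JSON Schema (`type`, `enum`, `required`, `properties`, `additionalProperties`), and a handler that rejects non-conforming input with a 400 before any business logic runs. The `CREATE_ORDER` schema is made up for the example, and in practice you would use a complete validator for your schema language rather than this subset.

```python
from typing import Any, Dict, List, Tuple

_PY_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}

def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return violations of a toy JSON Schema subset; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so exclude it from numeric types.
        is_bool = isinstance(value, bool)
        if not isinstance(value, _PY_TYPES[expected]) or (
            is_bool and expected in ("integer", "number")
        ):
            return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required")
        for name, sub in value.items():
            if name in props:
                errors.extend(validate(sub, props[name], f"{path}.{name}"))
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{name}: not allowed")
    return errors

# Hypothetical request schema for a CreateOrder endpoint.
CREATE_ORDER = {
    "type": "object",
    "required": ["sku", "quantity"],
    "properties": {
        "sku": {"type": "string"},
        "quantity": {"type": "integer"},
        "priority": {"type": "string", "enum": ["standard", "express"]},
    },
    "additionalProperties": False,
}

def handle_create_order(body: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    # Reject at the edge; code past this point only sees conforming input.
    errors = validate(body, CREATE_ORDER)
    if errors:
        return 400, {"errors": errors}
    return 201, {"sku": body["sku"], "quantity": body["quantity"]}

print(handle_create_order({"sku": "A1", "quantity": 2}))  # (201, {'sku': 'A1', 'quantity': 2})
print(handle_create_order({"sku": "A1", "quantity": "2", "priority": "asap"}))
```

The second call is rejected with two violations (the string quantity and the unknown priority), and the handler body never runs.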
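The two halves of “Validate conformance in tests and in production” can be sketched together. `assert_conforms` is a hypothetical test helper that checks every message; `ConformanceSampler` checks a random fraction of live messages and counts violations per operation so they can feed a metric and an alert. The `validate_user` validator, the `GetUser` operation name, and the 5% sampling rate are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Optional

# A validator returns violation messages; an empty list means the message conforms.
Validator = Callable[[dict], List[str]]

def assert_conforms(validate: Validator, message: dict) -> None:
    """Test helper: every request/response in a test is checked (100% rate)."""
    problems = validate(message)
    assert not problems, f"contract violation: {problems}"

class ConformanceSampler:
    """Production check: validate a random fraction of live messages."""

    def __init__(self, validate: Validator, rate: float,
                 rng: Optional[random.Random] = None) -> None:
        self.validate = validate
        self.rate = rate
        self.rng = rng if rng is not None else random.Random()
        self.checked = 0
        self.violations: Counter = Counter()  # (operation, problem) -> count

    def observe(self, operation: str, message: dict) -> None:
        if self.rng.random() >= self.rate:
            return  # not sampled: the only cost is one random draw
        self.checked += 1
        for problem in self.validate(message):
            # In a real service this would increment a metric that drives an alert.
            self.violations[(operation, problem)] += 1

# Hypothetical validator for a User message with two required string fields.
def validate_user(message: dict) -> List[str]:
    known = ("id", "display_name")
    problems = [f"{k}: required string" for k in known
                if not isinstance(message.get(k), str)]
    problems += [f"undocumented field: {k}" for k in message if k not in known]
    return problems

assert_conforms(validate_user, {"id": "u1", "display_name": "Ada"})

sampler = ConformanceSampler(validate_user, rate=0.05, rng=random.Random(1))
for _ in range(10_000):
    # A producer that has quietly renamed display_name to user_name.
    sampler.observe("GetUser", {"id": "u1", "user_name": "Ada"})
print(sampler.checked, sampler.violations.most_common())
```

Roughly one message in twenty is validated, yet the renamed field shows up immediately as two distinct violations on `GetUser`.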
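The core of “Check compatibility in CI” is a diff over two schema versions. The sketch below uses a deliberately simplified, hypothetical schema model (field name mapped to a type and a required flag) and reports the kinds of change that typically break existing producers or consumers; tools such as Buf apply much richer rule sets to real schemas. `breaking_changes` and `ci_gate` are illustrative names, and `ci_gate` returns a non-zero exit status on any breaking change, which is what fails the build.

```python
from typing import Dict, List

# Simplified, hypothetical schema model: field name -> {"type": ..., "required": ...}
Fields = Dict[str, Dict[str, object]]

def breaking_changes(old: Fields, new: Fields) -> List[str]:
    """List changes that would break existing producers or consumers.
    Adding an optional field is the only kind of change treated as safe."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"field '{name}' removed")  # breaks existing readers
            continue
        if new[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type {spec['type']} -> {new[name]['type']}")
        if new[name]["required"] and not spec["required"]:
            problems.append(f"field '{name}' became required")  # breaks old writers
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"new required field '{name}'")  # breaks old writers
    return problems

def ci_gate(old: Fields, new: Fields) -> int:
    """Return a process exit status: non-zero fails the build."""
    problems = breaking_changes(old, new)
    for p in problems:
        print(f"BREAKING: {p}")
    return 1 if problems else 0

v1 = {"id": {"type": "string", "required": True},
      "display_name": {"type": "string", "required": True}}
v2_additive = {**v1, "avatar_url": {"type": "string", "required": False}}
v2_breaking = {"id": {"type": "integer", "required": True},
               "name": {"type": "string", "required": True}}

print(ci_gate(v1, v2_additive))  # 0
print(ci_gate(v1, v2_breaking))  # three BREAKING lines, then 1
```

In a real pipeline the script would end with something like `sys.exit(ci_gate(previous, proposed))`, comparing the schema on the main branch against the one in the pull request.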

Consequences

Easier:

  • Whole classes of bug disappear: wrong field names, type mismatches, missed required fields, and inconsistent parsing become build or type-check errors, not production incidents.
  • Adding a consumer or a language is generating a client, not reverse-engineering the wire format.
  • The contract is one reviewed, versioned, executable artifact — accurate docs and a real change-control point at the seam, with breaking changes caught mechanically.
  • Producer and consumer provably agree on the shape of every message, which is exactly what lets independent teams move without constant coordination.

Harder:

  • Up-front toolchain cost: codegen wired into the build, generated artifacts managed, schema linting and compatibility checks in CI. Real setup, paid once.
  • Some schema languages are awkward at the edges (deeply dynamic payloads, polymorphism, partial updates) and you’ll occasionally fight the generator or model around a limitation.
  • Schema-first asks you to design the contract before the implementation, which is more discipline than returning whatever the handler happens to produce — that discipline is the point.
  • Generated clients can be heavier or less idiomatic than a hand-tuned one; wrap them rather than abandoning generation.

References

  • ZFN-1 — APIs are contracts at team seams; a schema makes the contract executable and drift a build error.
  • ZFN-12 — event payloads on queues/topics/journals are APIs too; schematize them and enforce compatibility.
  • OpenAPI Specification; Protocol Buffers and Buf (lint + breaking-change detection); Smithy — a protocol-agnostic service IDL with client/server codegen; JSON Schema; AsyncAPI for event-driven APIs.

Changelog

  • 2026-06-12: First published as a Field Note.