Field Note 12 current
Queues, topics, and journals are different tools — don't conflate them
TL;DR
“Messaging” hides three primitives with genuinely different semantics, and reaching for the wrong one — or assuming they’re interchangeable — causes real architectural pain:
- Queue — competing consumers. Each message is handled by one worker, independently, with per-message ack/retry/dead-letter. Built for distributing work and independent per-item progress. Usually unordered (or ordered only per key).
- Topic — fan-out. One message delivered to N independent subscribers, each getting its own copy. Built for decoupled broadcast. Typically fire-and-forget: no history, no replay.
- Journal (an append-only log/stream) — an ordered, durable, replayable record. Consumers track their own offset, can replay from any point, and new consumers can read history. Built for ordering, durability, and reprocessing.
Pick by the guarantee you need, and don’t be shy about using more than one in a single pipeline. Prefer a journal over a topic where you can — it’s largely a superset (independent consumers, plus history and replay). But don’t use a journal where head-of-line blocking is a problem — its ordering means one stuck item blocks everything behind it; that’s a queue’s job. And when you use a queue, design the parallelism up front: how many consumers, and how you’ll bound in-flight concurrency so you don’t overwhelm what’s downstream.
Context
The words get used loosely — “put it on the queue”, “publish to the topic”, “stream it” — as if they were one thing. They aren’t, and the differences are exactly the ones that bite later:
- A topic gives you fan-out now but no memory. Add a new subscriber tomorrow and it sees only what arrives after it subscribed; lose a message and it’s gone. People discover this when they need to replay or onboard a consumer against history and find there’s nothing to read.
- A journal gives you order, durability, and replay, but order is a double-edged sword. Processing is strictly sequential within a partition, so a single slow or poison record causes head-of-line blocking: everything behind it waits. You can’t selectively ack item #5 and move on, the way you can with a queue.
- A queue gives you independent per-item progress and easy horizontal scale via competing consumers — but typically not total ordering, and it will happily let you spin up so much parallelism that you flatten the database or trip a third party’s rate limit.
Conflating them shows up as: trying to get replay out of a topic; trying to get independent per-item retry out of a journal partition; treating a queue as if it were an ordered log; or adding a second consumer to a queue and being surprised that each message goes to only one of them. None of these are bugs in the tool — they’re a mismatch between the guarantee you needed and the primitive you chose.
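To make the three sets of semantics concrete, here is a minimal in-memory sketch. The class names (`Queue`, `Topic`, `Journal`) and their methods are illustrative assumptions, not any real broker's API, and the sketch ignores acks, persistence, and partitions. It only shows who gets which message.

```python
from collections import deque


class Queue:
    """Competing consumers: each message is handed to exactly one worker."""

    def __init__(self):
        self._pending = deque()

    def send(self, msg):
        self._pending.append(msg)

    def receive(self):
        # Whichever worker asks first takes the message; no other worker sees it.
        return self._pending.popleft() if self._pending else None


class Topic:
    """Fan-out with no memory: only subscribers present at publish time get a copy."""

    def __init__(self):
        self._inboxes = []

    def subscribe(self):
        inbox = []
        self._inboxes.append(inbox)
        return inbox

    def publish(self, msg):
        for inbox in self._inboxes:
            inbox.append(msg)


class Journal:
    """Append-only log: records are retained and each reader supplies its own offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset):
        return self._records[offset:]


queue = Queue()
queue.send("a")
queue.send("b")
worker_1_got = queue.receive()  # one worker takes "a"
worker_2_got = queue.receive()  # a second worker takes "b", never "a"

topic = Topic()
early = topic.subscribe()
topic.publish("e1")
late = topic.subscribe()        # subscribed after "e1" was published
topic.publish("e2")

journal = Journal()
journal.append("e1")
journal.append("e2")
new_reader_view = journal.read(0)  # a reader added later can still start at 0
```

Adding a second worker to the queue splits the messages between the two; the late topic subscriber never sees `e1`; the late journal reader sees everything.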
Recommendation
Choose the primitive by the guarantee, and combine them deliberately.
Choose by what you need:
- Need ordering, durability, replay, history, or multiple independent consumers that may be added later? → journal.
- Need independent per-item completion, per-message retry/dead-letter, and competing-consumer parallelism, with ordering not required (or only per key)? → queue.
- Need stateless broadcast to N subscribers and genuinely don’t care about history? → topic — but first ask whether a journal with multiple consumer groups is the better fit, because it gives you the same fan-out plus replay.
Prefer journals over topics where possible. A journal with independent consumer offsets does nearly everything a topic does and keeps the history, so you can reprocess, onboard new consumers against the past, and debug by replay. Default to the journal; reach for a bare topic when you specifically want fire-and-forget with no retention and no replay obligation.
But don’t use a journal where head-of-line blocking is a problem. If a single slow, stuck, or poison item must not hold up the items behind it — independent jobs, per-tenant work that shouldn’t block other tenants, anything with wildly variable per-item latency — that’s a queue. Partitioning a journal by key scopes the blocking to one key’s stream, which helps, but if any single item can stall and others must keep flowing, stop fighting the log and use competing consumers.
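The difference is easy to see in a toy simulation. The two consumer functions below are hypothetical stand-ins, not real client code: one advances an offset strictly in order, the other settles each message independently. Real systems would retry before giving up; this sketch skips retries to isolate the blocking behavior.

```python
def consume_journal(records, handle):
    """In-order consumer: the offset advances only after a record succeeds."""
    offset = 0
    done = []
    while offset < len(records):
        if not handle(records[offset]):
            break  # stuck here: every record behind this one waits
        done.append(records[offset])
        offset += 1
    return done, offset


def consume_queue(messages, handle):
    """Per-message settlement: a failed message is set aside, the rest keep flowing."""
    done, failed = [], []
    for msg in messages:
        if handle(msg):
            done.append(msg)
        else:
            failed.append(msg)
    return done, failed


def handle(item):
    # Pretend one item can never be processed.
    return item != "poison"


items = ["a", "poison", "b", "c"]
journal_done, committed_offset = consume_journal(items, handle)
queue_done, queue_failed = consume_queue(items, handle)
```

The journal consumer finishes only `a` and sits at offset 1 with `b` and `c` untouched; the queue consumer finishes `a`, `b`, and `c` and leaves `poison` to be dead-lettered.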
When you use a queue, design the parallelism and bound the concurrency. Competing consumers scale out trivially, which is exactly the trap: unbounded in-flight work overwhelms whatever is downstream — a database’s connections, a third party’s rate limit, an egress path (ZFN-11). Decide deliberately:
- Bound in-flight concurrency — a max-in-flight / prefetch limit and a concurrency cap, not “as many workers as autoscaling will give me.”
- Rate-limit and apply backpressure toward fragile downstreams; let the queue absorb the burst (load-leveling is a queue’s superpower) rather than passing it straight through.
- Isolate workloads so one tenant or message type can’t consume all the parallelism — the bulkhead idea from ZFN-2.
- Plan for poison messages — retry with backoff and a dead-letter queue, so one bad item doesn’t wedge a worker or get retried forever.
- If you need ordering for a subset, use a partition/group key — but keep the key narrow, because within a key you’ve just reintroduced head-of-line blocking.
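A minimal sketch of the first and fourth points, using only Python's standard `asyncio`. The constants, `process_all`, and `flaky_handler` are assumptions made up for illustration; a real broker client would express the in-flight cap as a prefetch setting and would typically redeliver with a delay rather than sleep inside the worker.

```python
import asyncio

MAX_IN_FLIGHT = 4       # assumed cap, sized to what the downstream can take
MAX_ATTEMPTS = 3        # assumed retry budget before dead-lettering
BASE_BACKOFF_S = 0.01   # assumed base delay, doubled on each retry


async def process_all(messages, handler):
    """Run handler over messages with bounded concurrency, retries, and a DLQ."""
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)
    dead_letters = []
    in_flight = 0
    peak = 0

    async def run_one(msg):
        nonlocal in_flight, peak
        async with limiter:  # at most MAX_IN_FLIGHT messages pass this point
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                for attempt in range(1, MAX_ATTEMPTS + 1):
                    try:
                        await handler(msg)
                        return
                    except Exception:
                        if attempt == MAX_ATTEMPTS:
                            dead_letters.append(msg)  # give up, park it
                            return
                        # Exponential backoff. Note this holds a slot while
                        # waiting, which is a simplification of this sketch.
                        await asyncio.sleep(BASE_BACKOFF_S * 2 ** (attempt - 1))
            finally:
                in_flight -= 1

    await asyncio.gather(*(run_one(m) for m in messages))
    return dead_letters, peak


async def flaky_handler(msg):
    await asyncio.sleep(0.001)  # stands in for a downstream call
    if msg == "bad":
        raise ValueError("cannot process")


batch = ["m1", "m2", "bad", "m3", "m4", "m5", "m6", "m7"]
dead_letters, peak_in_flight = asyncio.run(process_all(batch, flaky_handler))
```

However many messages arrive, the downstream never sees more than `MAX_IN_FLIGHT` concurrent calls, and the one unprocessable message ends up in `dead_letters` instead of being retried forever.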
Combine them in one pipeline when it fits. A common, healthy shape: events land in a journal (the ordered, durable source of truth, replayable), a processor consumes it in order and enqueues independent work onto a queue (parallel, retryable, with bounded concurrency), and broad notifications fan out via a topic or additional journal consumer groups. Each primitive does the job it’s good at; the mistake isn’t using several, it’s using one where another’s guarantees were needed.
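That shape can be sketched in a few lines of in-memory Python. Everything here is hypothetical (the `order_placed` event, the `send_receipt` job, the `dispatcher` and `audit` group names); a list stands in for the journal and a deque for the queue.

```python
from collections import deque

journal = []                               # ordered, retained source of truth
offsets = {"dispatcher": 0, "audit": 0}    # one read position per consumer group
work_queue = deque()                       # independent jobs for competing workers


def poll(group):
    """Return the records this group has not read yet and advance its offset."""
    batch = journal[offsets[group]:]
    offsets[group] = len(journal)
    return batch


def dispatch():
    """Consume the journal in order and turn each event into an independent job."""
    for event in poll("dispatcher"):
        work_queue.append({"job": "send_receipt", "order_id": event["order_id"]})


for order_id in (101, 102, 103):
    journal.append({"type": "order_placed", "order_id": order_id})

dispatch()

# Workers would drain the queue in parallel; a single loop stands in for them.
completed = []
while work_queue:
    completed.append(work_queue.popleft()["order_id"])

# The audit group reads the same history later, at its own pace.
audit_view = [event["order_id"] for event in poll("audit")]
```

The journal still holds all three events after the jobs are done, so a consumer group added later can read them from the start, while a stuck job in the queue would not hold up the dispatcher.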
Consequences
Easier:
- The architecture matches the guarantees: you get replay where you need replay, independent progress where you need it, and fan-out where you need it — instead of discovering the gap in an incident.
- Bounded queue concurrency protects downstreams and turns bursts into manageable load rather than outages.
- Choosing the journal by default keeps history and replay available, which pays off every time you add a consumer or need to reprocess.
Harder:
- More than one primitive in a pipeline is more moving parts to operate, monitor, and reason about — justified by fit, not used for its own sake.
- Journals demand partitioning and ordering-key design, and you must actively watch for head-of-line blocking and consumer lag.
- Queues demand explicit concurrency, retry, and dead-letter design; “it just scales” is how you DDoS your own database.
- Picking the right primitive requires understanding these distinctions, which is exactly the knowledge this note exists to make explicit.
References
- ZFN-2 — bulkheads/partitioning to keep one workload from starving others; the same instinct bounds queue concurrency.
- ZFN-11 — a downstream that unbounded queue parallelism can easily overwhelm.
- Jay Kreps — The Log: What every software engineer should know about real-time data’s unifying abstraction.
- Martin Kleppmann, Designing Data-Intensive Applications — ch. 11 (stream processing) on logs vs. message brokers.
Changelog
- 2026-06-12: First published as a Field Note.