Field Note 39 · current

Break loops, not spirals

By: Theo Zourzouvillys
Published: June 12, 2026
Tags: architecture, events, reliability, llm

TL;DR

Event-driven systems echo: handlers emit events that wake handlers that emit events. When a cycle forms, there are two cases, and they deserve opposite treatment:

A loop is a lineage that recurs with the same data — same type, same subject, same content fingerprint. It produces no new information. It is a bug in motion: detect it and break it, loudly.
A spiral is a lineage that recurs with different data each cycle — refinement, convergence, two parties genuinely advancing a piece of work. It is legitimate: never break it; bound it with budgets and rate limits instead.

The mechanism that makes the distinction enforceable is causal provenance stamped by the runtime — cause, lineage root, depth, producing context, and a fingerprint of the content — never supplied by the thing being watched. The test that separates the two cases is information: did this cycle produce anything new?

Context

Feedback cycles are as old as automation: auto-responder mail storms (RFC 3834 exists because of them), webhook ping-pong between two SaaS products, two sync jobs endlessly “correcting” the same field, remediation that re-triggers the alert it remediates. The classic defence is a hop count: IP stops packets from circulating forever with a TTL (RFC 791). It works, but it’s blunt: a depth cap can’t tell a loop from a long legitimate chain. It just makes both fail at N.
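To make the bluntness concrete, here is a toy sketch (the `MAX_DEPTH` value and the helper names are hypothetical) that applies a TTL-style cap to two event chains: one that repeats the same payload every hop, and one that produces a new draft every hop. The cap sees only depth, so it cuts both at exactly the same point.

```python
MAX_DEPTH = 8  # hypothetical cap, playing the role of an IP TTL

def admit(depth: int, max_depth: int = MAX_DEPTH) -> bool:
    """A TTL-style check: the only input is how many hops the event has travelled."""
    return depth < max_depth

def run_chain(payloads: list, max_depth: int = MAX_DEPTH) -> int:
    """Walk a chain hop by hop and return how many hops were handled before the cap."""
    handled = 0
    for depth, _payload in enumerate(payloads):  # the payload is never inspected
        if not admit(depth, max_depth):
            break
        handled += 1
    return handled

loop = ["ping"] * 20                             # same data every hop: a real loop
refinement = [f"draft-{i}" for i in range(20)]   # new data every hop: legitimate work
print(run_chain(loop), run_chain(refinement))    # 8 8
```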

Two things make this acute now. First, LLM agents are increasingly wired event-to-event: an agent that wakes on events and emits events is a feedback cycle waiting for a topology change, and each turn of the cycle burns real money and can mutate real state at machine speed. Second, and this is why the blunt defences are no longer acceptable, the agentic patterns you actually want all recur: iterative refinement, multi-round exchange between agents, poll-and-converge workflows. Asynchronous propagation between stores (ZFN-24) multiplies the places where a cycle can close. A loop-killer that triggers on recurrence alone, or on depth alone, punishes exactly the work the system exists to do.

Recommendation

Stamp provenance in the runtime, not in the client. Every event carries the id of the event that caused it, the root of its lineage, its depth, the producing context (which handler, agent, or wiring emitted it), and a fingerprint of its content. The emitting code does not get to supply any of this — a buggy producer omits it, a misbehaving one forges it. Treat self-reported provenance like a client-supplied clock: a useful hint, never the basis of enforcement.
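A minimal sketch of runtime-owned stamping, assuming a Python runtime where handlers are plain functions that return bare `(type, subject, payload)` tuples. `Provenance`, `Event`, `ingest`, and `run_handler` are hypothetical names, and the hash is a placeholder (normalisation is a separate concern). The point is structural: the handler never constructs an envelope, and the producer name comes from the wiring that invoked it, not from the handler’s output.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class Provenance:
    cause_id: Optional[str]  # id of the event that caused this one; None for a lineage root
    root_id: str             # id of the first event in the lineage
    depth: int               # hops from the root
    producer: str            # the handler, agent, or wiring that emitted it
    fingerprint: str         # hash of the payload content

@dataclass(frozen=True)
class Event:
    id: str
    type: str
    subject: str
    payload: dict
    provenance: Provenance

# A handler receives an event and returns bare (type, subject, payload) bodies.
# It has no way to build a Provenance: only the runtime does that.
Body = Tuple[str, str, dict]
Handler = Callable[[Event], List[Body]]

def _fingerprint(payload: dict) -> str:
    # Placeholder: a real fingerprint normalises the payload before hashing.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()

def _stamp(cause: Optional[Event], producer: str, body: Body) -> Event:
    type_, subject, payload = body
    event_id = str(uuid.uuid4())
    if cause is None:
        prov = Provenance(None, event_id, 0, producer, _fingerprint(payload))
    else:
        prov = Provenance(cause.id, cause.provenance.root_id, cause.provenance.depth + 1,
                          producer, _fingerprint(payload))
    return Event(event_id, type_, subject, payload, prov)

def ingest(producer: str, body: Body) -> Event:
    """Stamp an event that starts a new lineage, e.g. an external trigger."""
    return _stamp(None, producer, body)

def run_handler(producer: str, handler: Handler, cause: Event) -> List[Event]:
    """Invoke a handler; the runtime stamps everything it emits. `producer` comes from the wiring."""
    return [_stamp(cause, producer, body) for body in handler(cause)]

root = ingest("webhook:crm", ("ticket.opened", "ticket-42", {"title": "Printer on fire"}))

def triage(event: Event) -> List[Body]:
    return [("ticket.labelled", event.subject, {"label": "urgent"})]

(child,) = run_handler("agent:triage", triage, root)
print(child.provenance.depth, child.provenance.cause_id == root.id)  # 1 True
```

Anything a handler tucks into its payload (a fake `root_id`, say) is just content to the runtime: it is hashed into the fingerprint, never trusted as lineage.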

Define a loop precisely: same lineage, same data. A loop is a recurrence, within one causal lineage, of the same (type, subject, content-fingerprint). Depth alone is not a loop. Volume alone is not a loop. Precision here is the whole point — every false positive is a legitimate spiral you broke.
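One way to turn the definition into a check, sketched under assumptions: events arrive already stamped (reduced here to a hypothetical `Stamp` record), and lineage state lives in an in-memory dict that a real system would need to store, index, and expire. Depth is carried but deliberately left out of the key.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass(frozen=True)
class Stamp:
    root_id: str
    depth: int
    type: str
    subject: str
    fingerprint: str

class LoopDetector:
    """Flags a recurrence of (type, subject, fingerprint) within one lineage."""

    def __init__(self) -> None:
        # root_id -> keys already seen in that lineage (unbounded here; real storage needs expiry)
        self._seen: Dict[str, Set[Tuple[str, str, str]]] = {}

    def is_loop(self, event: Stamp) -> bool:
        key = (event.type, event.subject, event.fingerprint)  # depth is not part of the key
        seen = self._seen.setdefault(event.root_id, set())
        if key in seen:
            return True
        seen.add(key)
        return False

d = LoopDetector()
echo = [Stamp("r1", i, "field.corrected", "cust-7", "fp-A") for i in range(3)]
print([d.is_loop(e) for e in echo])    # [False, True, True]
spiral = [Stamp("r2", i, "draft.revised", "doc-9", f"fp-{i}") for i in range(3)]
print([d.is_loop(e) for e in spiral])  # [False, False, False]
```

This keys on the lineage root, as the definition does. A stricter variant would compare only against the event’s own ancestor chain, so that two sibling branches of a fan-out emitting the same notification are not counted as a loop; which one you want depends on your fan-out patterns.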

Fingerprint the meaning, not the envelope. Normalise before hashing: strip ids, timestamps, sequence numbers — anything that changes on re-emission without changing the content. Too strict and every cycle looks “new” because a timestamp moved, so you miss loops; too loose and distinct work collides, so you break spirals. This is a per-payload-type judgment call; make it deliberately.
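A sketch of per-type normalisation, assuming JSON-serialisable payloads and a hand-maintained, hypothetical `VOLATILE_FIELDS` table naming the fields that change on re-emission. Unknown types get no stripping, which errs toward the “too strict” side (missed loops) rather than toward breaking spirals.

```python
import hashlib
import json
from typing import Any, Dict, Set

# Hypothetical per-type table: fields that change on re-emission without changing meaning.
VOLATILE_FIELDS: Dict[str, Set[str]] = {
    "ticket.comment": {"id", "created_at", "seq", "request_id"},
}

def _strip(value: Any, volatile: Set[str]) -> Any:
    """Recursively drop volatile keys from dicts (including dicts nested in lists)."""
    if isinstance(value, dict):
        return {k: _strip(v, volatile) for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [_strip(v, volatile) for v in value]
    return value

def fingerprint(event_type: str, payload: Dict[str, Any]) -> str:
    volatile = VOLATILE_FIELDS.get(event_type, set())  # unknown type: strip nothing
    normalised = _strip(payload, volatile)
    # Canonical JSON: sorted keys and fixed separators, so key order cannot change the hash.
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"id": "c1", "created_at": "2026-06-12T10:00:00Z", "body": "Please retry"}
b = {"id": "c2", "created_at": "2026-06-12T10:00:05Z", "body": "Please retry"}
c = {"id": "c3", "created_at": "2026-06-12T10:00:09Z", "body": "Please retry after 5 min"}
print(fingerprint("ticket.comment", a) == fingerprint("ticket.comment", b))  # True
print(fingerprint("ticket.comment", a) == fingerprint("ticket.comment", c))  # False
```

Stripping by key name at every nesting level is itself a judgment call: if an `id` inside a nested object identifies which thing the content is about, removing it makes distinct work collide.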

Break loops loudly. The suppressed firing is itself recorded; the detected loop raises an alarm that names the wiring responsible; a repeat offender trips a kill switch on that wiring, not on the whole system. Silent suppression converts an obvious bug into an unexplained absence — the most expensive kind of failure to debug.
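A minimal sketch of “loudly”, assuming loop detection has already happened upstream and that the wiring name is known for each firing. `LoopBreaker`, its `kill_threshold` of three offences, and the log format are hypothetical; in practice the alarm would route to the wiring’s owner rather than to a local logger.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field

log = logging.getLogger("loopbreaker")

@dataclass
class LoopBreaker:
    kill_threshold: int = 3                             # hypothetical: offences before the kill switch
    suppressed: list = field(default_factory=list)      # every suppressed firing is recorded
    offences: Counter = field(default_factory=Counter)  # loop count per wiring
    disabled: set = field(default_factory=set)          # wirings whose kill switch has tripped

    def on_loop(self, wiring: str, event_id: str, root_id: str) -> None:
        """Call when a firing on `wiring` was detected as a loop and suppressed."""
        self.suppressed.append({"wiring": wiring, "event": event_id, "root": root_id})
        self.offences[wiring] += 1
        # The alarm names the wiring responsible, not just "a loop somewhere".
        log.error("loop suppressed: wiring=%s root=%s event=%s offence=%d",
                  wiring, root_id, event_id, self.offences[wiring])
        if self.offences[wiring] >= self.kill_threshold and wiring not in self.disabled:
            self.disabled.add(wiring)  # only this wiring; the rest of the system keeps running
            log.critical("kill switch tripped: wiring=%s", wiring)

    def allowed(self, wiring: str) -> bool:
        return wiring not in self.disabled

breaker = LoopBreaker()
for i in range(3):
    breaker.on_loop("sync:crm->billing", event_id=f"e{i}", root_id="r1")
print(breaker.allowed("sync:crm->billing"), breaker.allowed("agent:triage"))  # False True
```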

Bound spirals with budgets, never with loop-breakers. Rate limits, cost and token budgets, and a generous depth ceiling as a backstop, alarmed when hit (ZFN-13 is the same discipline applied to retries). A spiral that exhausts its budget surfaces as “this lineage hit its budget: is it converging?”, which is a conversation with its owner. A loop is a defect to fix. Don’t let one mechanism mangle both.
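A sketch of a per-lineage governor, with hypothetical limits (`max_events`, `max_cost`, `depth_ceiling`) chosen only for illustration. It never looks at content or fingerprints, so it cannot mistake a spiral for a loop; it just stops spending and raises one alarm per lineage, phrased as a question for the owner.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

@dataclass(frozen=True)
class Budget:
    max_events: int = 200     # hypothetical per-lineage event budget
    max_cost: float = 5.0     # hypothetical spend budget, e.g. model-call dollars
    depth_ceiling: int = 100  # generous backstop, far above normal refinement depth

class SpiralGovernor:
    """Bounds each lineage by budget; never inspects content, so it never 'detects a loop'."""

    def __init__(self, budget: Budget) -> None:
        self.budget = budget
        self.usage: Dict[str, Tuple[int, float]] = {}  # root_id -> (events, cost)
        self.exhausted: Set[str] = set()
        self.alarms: List[str] = []

    def admit(self, root_id: str, depth: int, cost: float) -> bool:
        events, spent = self.usage.get(root_id, (0, 0.0))
        b = self.budget
        if depth > b.depth_ceiling:
            reason = f"depth {depth} exceeds ceiling {b.depth_ceiling}"
        elif events >= b.max_events:
            reason = f"{b.max_events}-event budget spent"
        elif spent + cost > b.max_cost:
            reason = f"cost budget {b.max_cost:.2f} spent"
        else:
            self.usage[root_id] = (events + 1, spent + cost)
            return True
        if root_id not in self.exhausted:  # alarm once per lineage
            self.exhausted.add(root_id)
            self.alarms.append(f"lineage {root_id} hit its budget ({reason}): is it converging?")
        return False

gov = SpiralGovernor(Budget(max_events=3))
results = [gov.admit("r1", depth=i, cost=0.10) for i in range(5)]
print(results)     # [True, True, True, False, False]
print(gov.alarms)
```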

Exempt the must-deliver class. If your domain has signal that must always reach a human — pages, distress calls, life-safety alerts — loop-breaking may coalesce duplicates for presentation, but it must never eat the signal. A deduplicated page is fine; a suppressed one is an outage with a cover story.
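A sketch of the exemption, assuming a hypothetical set of must-deliver event types and an `is_loop` flag supplied by whatever detector sits upstream. Ordinary traffic flagged as a loop is suppressed; must-deliver traffic is never dropped, only folded into an alert that is still open, and an acknowledged alert stops absorbing repeats so that a recurrence pages again.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MUST_DELIVER = {"page", "distress", "life_safety"}  # hypothetical type names

@dataclass
class Notifier:
    delivered: List[dict] = field(default_factory=list)  # what actually reaches a human
    open_alerts: Dict[Tuple[str, str, str], dict] = field(default_factory=dict)

    def submit(self, type_: str, subject: str, fp: str, is_loop: bool) -> str:
        key = (type_, subject, fp)
        if type_ in MUST_DELIVER:
            # Never dropped: a duplicate of a still-open alert is folded into it, with a count.
            if key in self.open_alerts:
                self.open_alerts[key]["count"] += 1
                return "coalesced"
            alert = {"type": type_, "subject": subject, "count": 1}
            self.open_alerts[key] = alert
            self.delivered.append(alert)
            return "delivered"
        if is_loop:
            return "suppressed"  # ordinary traffic may be broken (and recorded elsewhere)
        self.delivered.append({"type": type_, "subject": subject, "count": 1})
        return "delivered"

    def acknowledge(self, type_: str, subject: str, fp: str) -> None:
        # After acknowledgement, a recurrence pages afresh instead of hiding in a closed alert.
        self.open_alerts.pop((type_, subject, fp), None)

n = Notifier()
print(n.submit("page", "db-primary", "fp1", is_loop=False))   # delivered
print(n.submit("page", "db-primary", "fp1", is_loop=True))    # coalesced
print(n.submit("digest", "db-primary", "fp2", is_loop=True))  # suppressed
n.acknowledge("page", "db-primary", "fp1")
print(n.submit("page", "db-primary", "fp1", is_loop=True))    # delivered
```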

Consequences

Easier:

You can wire agents and handlers event-to-event without the standing fear that the next topology change melts the system or the bill. A closed cycle becomes an alarm, not an incident.
Legitimate iterative work survives: refinement and convergence are no longer punished for recurring, because recurrence alone is no longer the trigger.
Every “why did this fire?” has a walkable answer. Postmortems read the lineage instead of reconstructing causality from timestamps.

Harder:

Provenance is envelope plumbing through every hop — including boundaries you don’t control. An event that leaves through a webhook and returns through an API has its lineage severed unless you propagate correlation through the round-trip; external echoes are the loops you’ll still miss.
Fingerprint normalisation is per-payload-type design work, and both failure directions cost you (missed loops, broken spirals).
Lineage state needs storage and an index, and loop detection is a read on the hot path of every fan-out.
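For the external round-trip, one approach (an assumption, not the only option) is to send an opaque token out with the webhook, in whatever reference field the partner echoes back, and keep the lineage server-side. Because the returning call carries only the token, the external system can sever the lineage but cannot forge its root or depth. `RoundTripCorrelation` is a hypothetical name, and a real table would need expiry.

```python
import secrets
from typing import Dict, Optional, Tuple

class RoundTripCorrelation:
    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, int]] = {}  # token -> (root_id, depth)

    def outbound(self, root_id: str, depth: int) -> str:
        """Mint an opaque token to attach to an outgoing webhook."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (root_id, depth)
        return token

    def inbound(self, token: Optional[str]) -> Optional[Tuple[str, int]]:
        """Resume the lineage for a returning call; None means treat it as a new root."""
        if token is None or token not in self._pending:
            return None
        root_id, depth = self._pending[token]
        return root_id, depth + 1

corr = RoundTripCorrelation()
token = corr.outbound("root-1", depth=3)
print(corr.inbound(token))      # ('root-1', 4)
print(corr.inbound("made-up"))  # None
```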

New obligations:

Provenance fields live in the event envelope from day one — retrofitting causality onto an existing stream is miserable.
Suppressed firings and budget-exhausted lineages are visible somewhere, and someone looks.
Every standing wiring has an owner, so the alarm has a name to page.

References

ZFN-13 — Fail fast and push back: retries, load shedding, and flow control — budgets, backoff, and shedding; the same instinct that bounds retries bounds spirals.
ZFN-12 — Queues, topics, and journals are different tools — don't conflate them — the journal as the ordered, replayable event substrate the lineage rides on.
ZFN-24 — One transactional store per write; propagate changes asynchronously — asynchronous propagation between stores, which multiplies the surfaces where cycles can close.
RFC 3834 — Recommendations for Automatic Responses to Electronic Mail — the mail-storm rules: mark automation, never auto-respond to automation.
RFC 791 — Internet Protocol (TTL) — the canonical depth cap, and its bluntness.

Changelog

2026-06-12: First published as a Field Note.