Field Note 39 current
Break loops, not spirals
TL;DR
Event-driven systems echo: handlers emit events that wake handlers that emit events. When a cycle forms, there are two cases, and they deserve opposite treatment:
- A loop is a lineage that recurs with the same data — same type, same subject, same content fingerprint. It produces no new information. It is a bug in motion: detect it and break it, loudly.
- A spiral is a lineage that recurs with different data each cycle — refinement, convergence, two parties genuinely advancing a piece of work. It is legitimate: never break it; bound it with budgets and rate limits instead.
The mechanism that makes the distinction enforceable is causal provenance stamped by the runtime — cause, lineage root, depth, producing context, and a fingerprint of the content — never supplied by the thing being watched. The test that separates the two cases is information: did this cycle produce anything new?
Context
Feedback cycles are as old as automation: auto-responder mail storms (RFC 3834 exists because of them), webhook ping-pong between two SaaS products, two sync jobs endlessly “correcting” the same field, remediation that re-triggers the alert it remediates. The classic defence is a hop count; IP stopped circulating packets with a TTL (RFC 791). It works, but it’s blunt: a depth cap can’t tell a loop from a long legitimate chain. It just makes both fail at N.
Two things make this acute now. First, LLM agents are increasingly wired event-to-event: an agent that wakes on events and emits events is a feedback cycle waiting for a topology change, and each turn of the cycle burns real money and can mutate real state at machine speed. Second — and this is why the blunt defences are no longer acceptable — the agentic patterns you actually want all recur: iterative refinement, multi-round exchange between agents, poll-and-converge workflows. Asynchronous propagation between stores (ZFN-24) multiplies the places where a cycle can close. A loop-killer that triggers on recurrence alone, or on depth alone, punishes exactly the work the system exists to do.
The politest infinite loop
The modern mail storm is two agents subscribed to each other’s output, each politely improving and acknowledging the other’s work, forever. Every individual step looks locally reasonable — that’s what makes it dangerous. Nothing is “stuck”; the lineage is just never going to produce new information again, and the meter is running.
Recommendation
Stamp provenance in the runtime, not in the client. Every event carries the id of the event that caused it, the root of its lineage, its depth, the producing context (which handler, agent, or wiring emitted it), and a fingerprint of its content. The emitting code does not get to supply any of this — a buggy producer omits it, a misbehaving one forges it. Treat self-reported provenance like a client-supplied clock: a useful hint, never the basis of enforcement.
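As a rough illustration of runtime-side stamping, here is a minimal Python sketch. The `Envelope` fields mirror the five provenance items named above; the names `Envelope`, `stamp`, and `producer` are hypothetical, and the plain canonical-JSON hash stands in for whatever fingerprint the payload type really needs. The handler supplies only type, subject, and payload; everything else is derived by the runtime from the causing event.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    event_id: str
    event_type: str
    subject: str
    payload: dict
    cause_id: Optional[str]  # id of the event that caused this one
    root_id: str             # root of the causal lineage
    depth: int               # hops from the root
    producer: str            # handler, agent, or wiring that emitted it
    fingerprint: str         # hash of the content


def stamp(event_type: str, subject: str, payload: dict, producer: str,
          cause: Optional[Envelope] = None) -> Envelope:
    """Runtime-side emit. `producer` is filled in by the runtime, which
    knows which handler it invoked; handler code never sets provenance."""
    event_id = uuid.uuid4().hex
    # Assumption: a plain canonical-JSON hash; real payloads need normalising.
    content = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return Envelope(
        event_id=event_id,
        event_type=event_type,
        subject=subject,
        payload=payload,
        cause_id=cause.event_id if cause else None,
        root_id=cause.root_id if cause else event_id,
        depth=cause.depth + 1 if cause else 0,
        producer=producer,
        fingerprint=hashlib.sha256(content.encode("utf-8")).hexdigest(),
    )
```

A root event gets itself as lineage root and depth 0; anything emitted while handling it inherits the root and increments the depth.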
Define a loop precisely: same lineage, same data. A loop is a recurrence, within one causal lineage, of the same (type, subject, content-fingerprint). Depth alone is not a loop. Volume alone is not a loop. Precision here is the whole point — every false positive is a legitimate spiral you broke.
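The definition translates into a small membership check. The sketch below is one possible reading, not a reference implementation: it assumes a lineage is identified by its root id (so sibling branches of the same fan-out share one "seen" set, which a stricter ancestor-chain check would not), and the names `Stamped` and `LoopDetector` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Stamped(NamedTuple):
    root_id: str       # lineage root, stamped by the runtime
    event_type: str
    subject: str
    fingerprint: str   # content fingerprint, stamped by the runtime


class LoopDetector:
    def __init__(self) -> None:
        # Per lineage: every (type, subject, fingerprint) seen so far.
        self._seen = defaultdict(set)

    def is_loop(self, ev: Stamped) -> bool:
        """True only when this lineage has already produced the same
        (type, subject, fingerprint). Depth and volume play no part."""
        key = (ev.event_type, ev.subject, ev.fingerprint)
        seen = self._seen[ev.root_id]
        if key in seen:
            return True
        seen.add(key)
        return False
```

A recurrence with a new fingerprint passes, which is what keeps spirals alive; the same triple in a different lineage also passes, because it is not a recurrence of anything.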
Fingerprint the meaning, not the envelope. Normalise before hashing: strip ids, timestamps, sequence numbers, and anything else that changes on re-emission without changing the content. Normalise too little and every cycle looks “new” because a timestamp moved, so you miss loops; normalise too much and distinct work collides, so you break spirals. This is a per-payload-type judgment call; make it deliberately.
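One way to sketch normalise-then-hash in Python, assuming JSON-like payloads. The set of volatile field names is a made-up example; in practice it would be chosen per payload type, which is exactly the judgment call described above.

```python
import hashlib
import json

# Hypothetical volatile fields for one payload type; choose these per type.
VOLATILE_FIELDS = frozenset({"id", "event_id", "timestamp", "updated_at", "seq"})


def normalise(value, volatile=VOLATILE_FIELDS):
    """Recursively drop fields that change on re-emission without
    changing the content."""
    if isinstance(value, dict):
        return {k: normalise(v, volatile)
                for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [normalise(v, volatile) for v in value]
    return value


def fingerprint(payload: dict, volatile=VOLATILE_FIELDS) -> str:
    # Sorted keys and fixed separators make the encoding order-independent.
    canonical = json.dumps(normalise(payload, volatile),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two emissions that differ only in `id` and `timestamp` hash to the same value; change the body and the fingerprint changes with it.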
Break loops loudly. The suppressed firing is itself recorded; the detected loop raises an alarm that names the wiring responsible; a repeat offender trips a kill switch on that wiring, not on the whole system. Silent suppression converts an obvious bug into an unexplained absence — the most expensive kind of failure to debug.
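A minimal sketch of the three behaviours, assuming a wiring is identified by a string and that "repeat offender" means a fixed number of detected loops; the class name `LoopBreaker` and the threshold `kill_after` are hypothetical. The alarm here is just a log line, standing in for whatever paging or alerting the system uses.

```python
import logging
from collections import Counter

log = logging.getLogger("loopbreaker")


class LoopBreaker:
    def __init__(self, kill_after: int = 3) -> None:
        self.kill_after = kill_after  # assumed threshold for the kill switch
        self.suppressed = []          # record of every suppressed firing
        self.offences = Counter()     # detected loops per wiring
        self.disabled = set()         # wirings whose kill switch has tripped

    def allows(self, wiring: str) -> bool:
        return wiring not in self.disabled

    def on_loop(self, wiring: str, root_id: str,
                event_type: str, subject: str) -> None:
        # 1. The suppressed firing is itself recorded.
        self.suppressed.append({"wiring": wiring, "root_id": root_id,
                                "type": event_type, "subject": subject})
        # 2. The alarm names the wiring responsible.
        self.offences[wiring] += 1
        log.error("loop detected: wiring=%s lineage=%s type=%s subject=%s",
                  wiring, root_id, event_type, subject)
        # 3. A repeat offender is disabled; other wirings are untouched.
        if self.offences[wiring] >= self.kill_after:
            self.disabled.add(wiring)
            log.critical("kill switch tripped: wiring=%s", wiring)
```

The kill switch is scoped to one wiring on purpose: the rest of the system keeps running while the owner of the named wiring investigates.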
Bound spirals with budgets, never with loop-breakers. Rate limits, cost and token budgets, and a generous depth ceiling as a backstop — alarmed when hit (ZFN-13 is the same discipline applied to retries). A spiral that exhausts its budget surfaces as “this lineage hit its budget — is it converging?”, which is a conversation with its owner. A loop is a defect to fix. Don’t let one mechanism mangle both.
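A per-lineage budget could look roughly like the following. The limits and the name `LineageBudget` are illustrative assumptions, and time-based rate limiting is left out to keep the sketch short. Note that nothing here inspects content: a budget bounds a spiral without judging whether it is one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineageBudget:
    max_events: int = 200   # assumed limits; tune per workload
    max_cost: float = 5.0   # for example, currency units or token cost
    max_depth: int = 50     # generous ceiling, a backstop only
    events: int = 0
    cost: float = 0.0

    def charge(self, depth: int, cost: float) -> Optional[str]:
        """Account for one event in this lineage. Returns None while
        within budget, otherwise the name of the exhausted limit so the
        alarm can say which bound was hit."""
        self.events += 1
        self.cost += cost
        if self.events > self.max_events:
            return "events"
        if self.cost > self.max_cost:
            return "cost"
        if depth > self.max_depth:
            return "depth"
        return None
```

A non-None result is the cue to raise the "is it converging?" alarm to the lineage's owner, rather than to treat the lineage as a defect.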
Exempt the must-deliver class. If your domain has signal that must always reach a human — pages, distress calls, life-safety alerts — loop-breaking may coalesce duplicates for presentation, but it must never eat the signal. A deduplicated page is fine; a suppressed one is an outage with a cover story.
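The exemption can be expressed as a routing decision taken after loop detection. This is a sketch under assumptions: the event type names in `MUST_DELIVER` are invented, and "coalesce" means the signal is still delivered, grouped with its duplicates for presentation.

```python
# Hypothetical event types that must always reach a human.
MUST_DELIVER = frozenset({"page", "distress_call", "life_safety_alert"})


def route(event_type: str, is_loop: bool) -> str:
    """Decide what happens to an event once loop detection has run."""
    if not is_loop:
        return "deliver"
    if event_type in MUST_DELIVER:
        # Duplicates may be grouped for presentation, never dropped.
        return "coalesce"
    return "suppress"
```

The ordering matters: the must-deliver check sits in front of suppression, so no loop-breaking path can reach "suppress" for that class.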
Consequences
Easier:
- You can wire agents and handlers event-to-event without the standing fear that the next topology change melts the system or the bill. A closed cycle becomes an alarm, not an incident.
- Legitimate iterative work survives: refinement and convergence are no longer punished for recurring, because recurrence alone is no longer the trigger.
- Every “why did this fire?” has a walkable answer. Postmortems read the lineage instead of reconstructing causality from timestamps.
Harder:
- Provenance is envelope plumbing through every hop — including boundaries you don’t control. An event that leaves through a webhook and returns through an API has its lineage severed unless you propagate correlation through the round-trip; external echoes are the loops you’ll still miss.
- Fingerprint normalisation is per-payload-type design work, and both failure directions cost you (missed loops, broken spirals).
- Lineage state needs storage and an index, and loop detection is a read on the hot path of every fan-out.
New obligations:
- Provenance fields live in the event envelope from day one — retrofitting causality onto an existing stream is miserable.
- Suppressed firings and budget-exhausted lineages are visible somewhere, and someone looks.
- Every standing wiring has an owner, so the alarm has a name to page.
References
- ZFN-13 — budgets, backoff, and shedding; the same instinct that bounds retries bounds spirals.
- ZFN-12 — the journal as the ordered, replayable event substrate the lineage rides on.
- ZFN-24 — asynchronous propagation between stores, which multiplies the surfaces where cycles can close.
- RFC 3834 — Recommendations for Automatic Responses to Electronic Mail — the mail-storm rules: mark automation, never auto-respond to automation.
- RFC 791 — Internet Protocol (TTL) — the canonical depth cap, and its bluntness.
Changelog
- 2026-06-12: First published as a Field Note.