---
id: 39
title: "Break loops, not spirals"
status: current
date: 2026-06-12
authors:
  - "Theo Zourzouvillys"
tags: [architecture, events, reliability, llm]
summary: "Event-driven and agentic systems echo. A loop — a lineage recurring with the same data — produces nothing new; detect it with runtime-stamped provenance and break it loudly. A spiral — recurring with new data — is legitimate work; bound it with budgets, never loop-breakers."
supersedes: null
superseded_by: null
aliases: []
references:
  - id: rfc3834
    title: "Recommendations for Automatic Responses to Electronic Mail (RFC 3834)"
    url: https://www.rfc-editor.org/rfc/rfc3834
    abstract: "The IETF's answer to auto-responder mail storms: respond to less, mark automatic responses so other automation can recognise them, and never auto-respond to an automatic response. Exists because two vacation responders pointed at each other will happily fill both mailboxes forever."
  - id: ip-ttl
    title: "Internet Protocol — Time to Live (RFC 791)"
    url: https://www.rfc-editor.org/rfc/rfc791
    abstract: "IP's TTL field decrements at every hop and the datagram is discarded at zero, so a routing loop cannot circulate packets indefinitely. The canonical depth cap — and the canonical illustration of its bluntness: it cannot tell a loop from a long legitimate path; both die at N hops."
---

## TL;DR

Event-driven systems echo: handlers emit events that wake handlers that emit events. When a cycle
forms, there are two cases, and they deserve opposite treatment:

- A **loop** is a lineage that recurs with the **same data** — same type, same subject, same
  content fingerprint. It produces no new information. It is a bug in motion: **detect it and
  break it, loudly.**
- A **spiral** is a lineage that recurs with **different data** each cycle — refinement,
  convergence, two parties genuinely advancing a piece of work. It is legitimate: **never break
  it; bound it** with budgets and rate limits instead.

The mechanism that makes the distinction enforceable is **causal provenance stamped by the
runtime** — cause, lineage root, depth, producing context, and a fingerprint of the content —
never supplied by the thing being watched. The test that separates the two cases is information:
**did this cycle produce anything new?**

## Context

Feedback cycles are as old as automation: auto-responder mail storms ([RFC 3834](ref:rfc3834)
exists because of them), webhook ping-pong between two SaaS products, two sync jobs endlessly
"correcting" the same field, remediation that re-triggers the alert it remediates. The classic
defence is a hop count: IP keeps a routing loop from circulating a packet forever with a
[TTL](ref:ip-ttl). It works, but it's blunt: a depth cap can't tell a loop from a long legitimate
chain. It just makes both fail at N hops.

Two things make this acute now. First, LLM agents are increasingly wired event-to-event: an agent
that wakes on events and emits events is a feedback cycle waiting for a topology change, and each
turn of the cycle burns real money and can mutate real state at machine speed. Second — and this
is why the blunt defences are no longer acceptable — the agentic patterns you actually *want*
all recur: iterative refinement, multi-round exchange between agents, poll-and-converge
workflows. Asynchronous propagation between stores
([ZFN-24](/zfn/24-one-transactional-store-per-write/)) multiplies the places where a cycle can
close. A loop-killer that triggers on recurrence alone, or on depth alone, punishes exactly the
work the system exists to do.

> [!aside] The politest infinite loop
>
> The modern mail storm is two agents subscribed to each other's output, each politely improving
> and acknowledging the other's work, forever. Every individual step looks locally reasonable —
> that's what makes it dangerous. Nothing is "stuck"; the lineage is just never going to produce
> new information again, and the meter is running.

## Recommendation

**Stamp provenance in the runtime, not in the client.** Every event carries the id of the event
that caused it, the root of its lineage, its depth, the producing context (which handler, agent,
or wiring emitted it), and a fingerprint of its content. The emitting code does not get to supply
any of this — a buggy producer omits it, a misbehaving one forges it. Treat self-reported
provenance like a client-supplied clock: a useful hint, never the basis of enforcement.
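Here is a minimal sketch of runtime-side stamping in Python, assuming a single in-process runtime. `Provenance`, `Event`, `stamp`, and `dispatch` are hypothetical names. The fingerprint here is plain canonical JSON with no normalisation. What the sketch shows is the shape: handlers return bare `(type, subject, payload)` tuples, and the runtime derives every provenance field from the event it delivered. There is no field a producer could omit or forge.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Provenance:
    event_id: str
    cause_id: Optional[str]  # event that triggered this one (None for a root)
    root_id: str             # first event in the lineage
    depth: int               # hops from the root
    producer: str            # handler, agent, or wiring that emitted it
    fingerprint: str         # hash of the content (no normalisation in this sketch)


@dataclass(frozen=True)
class Event:
    type: str
    subject: str
    payload: dict
    provenance: Provenance


def _fingerprint(type_: str, subject: str, payload: dict) -> str:
    canonical = json.dumps([type_, subject, payload], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp(type_: str, subject: str, payload: dict, *,
          cause: Optional[Event], producer: str) -> Event:
    """Runtime-side constructor: provenance is derived, never accepted from the caller."""
    event_id = uuid.uuid4().hex
    if cause is None:
        root_id, depth, cause_id = event_id, 0, None
    else:
        root_id = cause.provenance.root_id
        depth = cause.provenance.depth + 1
        cause_id = cause.provenance.event_id
    prov = Provenance(event_id, cause_id, root_id, depth, producer,
                      _fingerprint(type_, subject, payload))
    return Event(type_, subject, payload, prov)


# A handler sees the incoming event and returns bare content only.
Handler = Callable[[Event], Iterable[tuple[str, str, dict]]]


def dispatch(handler: Handler, producer: str, incoming: Event) -> list[Event]:
    # The runtime, not the handler, knows `producer` and `incoming`, so it stamps.
    return [stamp(t, s, p, cause=incoming, producer=producer)
            for (t, s, p) in handler(incoming)]
```

For example, a triage handler woken by a root event emits a child at depth 1 that carries the root's id. The handler never sees or sets either field.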

**Define a loop precisely: same lineage, same data.** A loop is a recurrence, within one causal
lineage, of the same (type, subject, content-fingerprint). Depth alone is not a loop. Volume
alone is not a loop. Precision here is the whole point — every false positive is a legitimate
spiral you broke.
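The detection rule itself can be sketched as follows. The sketch assumes events already carry runtime-stamped fields, held in the hypothetical `Stamped` record. The key is `(type, subject, fingerprint)`, scoped to a lineage root. Depth is carried but deliberately left out of the key, and nothing counts volume. State lives in an unbounded in-memory dict, purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    root_id: str      # lineage root, stamped by the runtime
    type: str
    subject: str
    fingerprint: str  # hash of the normalised content
    depth: int        # carried, but deliberately not part of the loop key


class LoopDetector:
    """A loop is a recurrence of (type, subject, fingerprint) within one lineage."""

    def __init__(self) -> None:
        # root_id -> keys already seen in that lineage. Unbounded and in-memory:
        # a real deployment needs an indexed store that expires finished lineages.
        self._seen: dict[str, set[tuple[str, str, str]]] = {}

    def is_loop(self, ev: Stamped) -> bool:
        key = (ev.type, ev.subject, ev.fingerprint)
        seen = self._seen.setdefault(ev.root_id, set())
        if key in seen:
            return True  # same lineage, same data: nothing new was produced
        seen.add(key)
        return False     # new data (a spiral step) or a first occurrence
```

Successive `draft.revised` events on one document are a spiral while the content keeps changing. Once an earlier fingerprint reappears in the same lineage, for example two jobs flipping a field back and forth, the detector reports a loop. The same content under a different root is a separate lineage and is not flagged.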

**Fingerprint the meaning, not the envelope.** Normalise before hashing: strip ids, timestamps,
sequence numbers — anything that changes on re-emission without changing the content. Too strict
and every cycle looks "new" because a timestamp moved, so you miss loops; too loose and distinct
work collides, so you break spirals. This is a per-payload-type judgment call; make it
deliberately.
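Here is one way to normalise before hashing, sketched in Python. The per-type `VOLATILE_KEYS` table and its contents are assumptions for illustration. Deciding what belongs in that table is exactly the per-payload-type judgment described above. The sketch removes volatile keys recursively, and canonical JSON (sorted keys, fixed separators) makes key order irrelevant.

```python
import hashlib
import json
from typing import Any

# Hypothetical per-type lists of keys that change on re-emission without
# changing meaning. Filling these in is the deliberate, per-type design work.
VOLATILE_KEYS: dict[str, frozenset[str]] = {
    "draft.revised": frozenset({"id", "emitted_at", "seq", "trace_id"}),
}
DEFAULT_VOLATILE = frozenset({"id", "emitted_at", "seq"})


def _strip(value: Any, volatile: frozenset[str]) -> Any:
    # Recursively drop volatile keys from nested dicts; lists keep their order.
    if isinstance(value, dict):
        return {k: _strip(v, volatile) for k, v in value.items() if k not in volatile}
    if isinstance(value, list):
        return [_strip(v, volatile) for v in value]
    return value


def fingerprint(event_type: str, subject: str, payload: dict) -> str:
    volatile = VOLATILE_KEYS.get(event_type, DEFAULT_VOLATILE)
    normalised = _strip(payload, volatile)
    # Canonical JSON: sorted keys and fixed separators, so key order never matters.
    canonical = json.dumps([event_type, subject, normalised],
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recursive stripping is itself a choice, and it leans toward "too loose". A nested `id` that names a different referenced entity is meaning, not envelope. Dropping it would make distinct work collide.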

**Break loops loudly.** The suppressed firing is itself recorded; the detected loop raises an
alarm that names the wiring responsible; a repeat offender trips a kill switch on that wiring,
not on the whole system. Silent suppression converts an obvious bug into an unexplained absence —
the most expensive kind of failure to debug.
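The sketch below shows what "loudly" can mean in code. It assumes an upstream detector has already classified a firing as a loop and knows which wiring produced it. `LoopBreaker`, its audit list, and `kill_threshold` are hypothetical. In production, the record and the alarm would go to durable storage and a pager, not to a Python list and the `logging` module.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field

log = logging.getLogger("loops")


@dataclass
class LoopBreaker:
    """Records suppressed firings, alarms by wiring, and disables only that wiring."""
    kill_threshold: int = 3                                 # hypothetical repeat limit
    suppressed: list[dict] = field(default_factory=list)    # the audit trail
    loop_counts: Counter = field(default_factory=Counter)   # wiring -> loops detected
    disabled: set[str] = field(default_factory=set)         # wirings switched off

    def on_loop(self, wiring: str, root_id: str, event_type: str, subject: str) -> None:
        # 1. The suppressed firing is itself a record, not a silent drop.
        self.suppressed.append({"wiring": wiring, "root": root_id,
                                "type": event_type, "subject": subject})
        self.loop_counts[wiring] += 1
        # 2. The alarm names the wiring responsible.
        log.error("loop broken: wiring=%s root=%s type=%s subject=%s count=%d",
                  wiring, root_id, event_type, subject, self.loop_counts[wiring])
        # 3. A repeat offender loses that wiring only, never the whole system.
        if self.loop_counts[wiring] >= self.kill_threshold and wiring not in self.disabled:
            self.disabled.add(wiring)
            log.critical("kill switch tripped for wiring=%s", wiring)

    def allowed(self, wiring: str) -> bool:
        return wiring not in self.disabled
```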

**Bound spirals with budgets, never with loop-breakers.** Rate limits, cost and token budgets,
and a generous depth ceiling as a backstop — alarmed when hit
([ZFN-13](/zfn/13-load-shedding-and-flow-control/) is the same discipline applied to retries).
A spiral that exhausts its budget surfaces as "this lineage hit its budget — is it converging?",
which is a conversation with its owner. A loop is a defect to fix. Don't let one mechanism
mangle both.
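Here is a per-lineage budget sketch. Token counts stand in for cost, and all limits and names are hypothetical. Rate limiting is omitted for brevity. The sketch never consults fingerprints, so it cannot judge whether recurrence is legitimate. It only caps what a lineage may spend, and it raises an exception that the caller should alarm on and route to the lineage's owner.

```python
from dataclasses import dataclass


class BudgetExhausted(Exception):
    """Surfaced to the lineage's owner as 'is it converging?', not as a bug report."""


@dataclass
class LineageBudget:
    max_tokens: int = 200_000  # hypothetical cost budget, in model tokens
    max_depth: int = 200       # generous backstop, not the primary control
    spent_tokens: int = 0

    def charge(self, root_id: str, depth: int, tokens: int) -> None:
        if depth > self.max_depth:
            raise BudgetExhausted(f"lineage {root_id} hit depth ceiling {self.max_depth}")
        if self.spent_tokens + tokens > self.max_tokens:
            raise BudgetExhausted(
                f"lineage {root_id} would exceed {self.max_tokens} tokens "
                f"(spent {self.spent_tokens}); is it converging?"
            )
        self.spent_tokens += tokens


class Budgets:
    """One budget per lineage root. Fingerprints are never consulted."""

    def __init__(self, max_tokens: int = 200_000, max_depth: int = 200) -> None:
        self._limits = {"max_tokens": max_tokens, "max_depth": max_depth}
        self._by_root: dict[str, LineageBudget] = {}

    def charge(self, root_id: str, depth: int, tokens: int) -> None:
        budget = self._by_root.get(root_id)
        if budget is None:
            budget = self._by_root[root_id] = LineageBudget(**self._limits)
        budget.charge(root_id, depth, tokens)
```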

**Exempt the must-deliver class.** If your domain has signal that must always reach a human —
pages, distress calls, life-safety alerts — loop-breaking may coalesce duplicates for
presentation, but it must never eat the signal. A deduplicated page is fine; a suppressed one is
an outage with a cover story.
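The exemption can be sketched like this, assuming the loop detector's verdict arrives as a flag. `MUST_DELIVER` and the class names in it are hypothetical. For the exempt class, the sketch ignores the verdict entirely. A repeat is coalesced into the notification already delivered as a visible counter, and it never lands in the suppressed list.

```python
from dataclasses import dataclass, field

# Hypothetical names for the class of signal that must always reach a human.
MUST_DELIVER = frozenset({"page", "distress", "life_safety"})


@dataclass
class Notification:
    kind: str
    subject: str
    fingerprint: str
    duplicates: int = 0  # coalesced repeats, shown alongside the original


@dataclass
class Outbox:
    delivered: list = field(default_factory=list)
    suppressed: list = field(default_factory=list)  # recorded, never silently dropped
    shown: dict = field(default_factory=dict)       # (kind, subject, fingerprint) -> Notification

    def submit(self, n: Notification, is_loop: bool) -> None:
        if n.kind in MUST_DELIVER:
            key = (n.kind, n.subject, n.fingerprint)
            existing = self.shown.get(key)
            if existing is None:
                self.shown[key] = n
                self.delivered.append(n)
            else:
                # Coalesce for presentation: folded into the delivered page as a count.
                existing.duplicates += 1
            return  # the loop verdict is ignored for this class
        if is_loop:
            self.suppressed.append(n)
        else:
            self.delivered.append(n)
```

In this sketch, coalescing never expires. A fresh page after resolution would fold into the old one. A real system would close the entry once the page is acknowledged.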

## Consequences

**Easier:**

- You can wire agents and handlers event-to-event without the standing fear that the next
  topology change melts the system or the bill. A closed cycle becomes an alarm, not an incident.
- Legitimate iterative work survives: refinement and convergence are no longer punished for
  recurring, because recurrence alone is no longer the trigger.
- Every "why did this fire?" has a walkable answer. Postmortems read the lineage instead of
  reconstructing causality from timestamps.

**Harder:**

- Provenance is envelope plumbing through every hop — including boundaries you don't control. An
  event that leaves through a webhook and returns through an API has its lineage severed unless
  you propagate correlation through the round-trip; external echoes are the loops you'll still
  miss.
- Fingerprint normalisation is per-payload-type design work, and both failure directions cost
  you (missed loops, broken spirals).
- Lineage state needs storage and an index, and loop detection is a read on the hot path of
  every fan-out.
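The sketch below carries lineage across an external round-trip, assuming you control both the outbound webhook and the inbound API. The far side only ever sees an opaque token, while the lineage it stands for stays in the runtime. An echo can therefore rejoin its lineage, but it cannot claim a shallower depth or a fresh root. `BoundaryCorrelator` and the token scheme are hypothetical. How the token travels (a header, a metadata field) depends on what the external system will echo back.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lineage:
    root_id: str
    cause_id: str  # the outbound event that left the system
    depth: int


class BoundaryCorrelator:
    """Maps opaque tokens sent outside back to lineage held inside the runtime."""

    def __init__(self) -> None:
        # token -> (outbound event id, root id, depth). Tokens are kept rather than
        # popped because the far side may call back more than once; a real store
        # would expire them.
        self._pending: dict[str, tuple[str, str, int]] = {}

    def outbound(self, event_id: str, root_id: str, depth: int) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = (event_id, root_id, depth)
        return token  # attach to the outgoing call in whatever field gets echoed

    def inbound(self, token: Optional[str]) -> Optional[Lineage]:
        # None means the echo cannot be correlated: the runtime must start a new
        # root, and any cycle through this boundary stays invisible.
        if token is None or token not in self._pending:
            return None
        event_id, root_id, depth = self._pending[token]
        # Count the external hop so depth keeps growing across the round-trip.
        return Lineage(root_id=root_id, cause_id=event_id, depth=depth + 1)
```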

**New obligations:**

- Provenance fields live in the event envelope from day one — retrofitting causality onto an
  existing stream is miserable.
- Suppressed firings and budget-exhausted lineages are visible somewhere, and someone looks.
- Every standing wiring has an owner, so the alarm has a name to page.

## References

- [ZFN-13](/zfn/13-load-shedding-and-flow-control/) — budgets, backoff, and shedding; the same
  instinct that bounds retries bounds spirals.
- [ZFN-12](/zfn/12-queues-topics-journals/) — the journal as the ordered, replayable event
  substrate the lineage rides on.
- [ZFN-24](/zfn/24-one-transactional-store-per-write/) — asynchronous propagation between
  stores, which multiplies the surfaces where cycles can close.
- [RFC 3834 — Recommendations for Automatic Responses to Electronic Mail](https://www.rfc-editor.org/rfc/rfc3834) —
  the mail-storm rules: mark automation, never auto-respond to automation.
- [RFC 791 — Internet Protocol (TTL)](https://www.rfc-editor.org/rfc/rfc791) — the canonical
  depth cap, and its bluntness.

## Changelog

- **2026-06-12**: First published as a Field Note.
