Skip to main content

Recovering from Slot Invalidation

PostgreSQL can invalidate a replication slot when the server needs to remove WAL segments that the slot still references. Once invalidated, the slot can no longer be used for replication. Laredo must reset the source and perform a full re-baseline.

What is slot invalidation

PostgreSQL retains WAL segments as long as a replication slot needs them. When the max_slot_wal_keep_size server setting is configured and lag exceeds that threshold, PostgreSQL removes the WAL segments and marks the slot as invalidated. An invalidated slot cannot resume streaming — the WAL data it needs no longer exists on disk.

This is distinct from normal slot lag. With lag, the data is still available and the consumer can catch up. With invalidation, the data is gone.

Detecting slot invalidation

Error messages in Laredo logs

When Laredo encounters an invalidated slot during streaming, the PostgreSQL source logs an error like:

pg source: replication slot "laredo_01" has been invalidated; it must be dropped and recreated

or:

pg source: cannot read from logical replication slot "laredo_01", it has been invalidated

Checking slot status in PostgreSQL

Query the pg_replication_slots view directly:

SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE slot_name = 'laredo_01';
ColumnMeaning
wal_status = 'lost'Slot is invalidated — WAL has been removed
wal_status = 'extended'Slot is retaining WAL beyond max_wal_size but not yet invalidated
wal_status = 'reserved'Normal operation
safe_wal_sizeBytes remaining before invalidation; NULL means unlimited

Monitoring

The laredo_source_lag_bytes Prometheus metric tracks how far behind the slot is. Alert when this approaches max_slot_wal_keep_size:

# Prometheus alert rule
- alert: LaredoSlotNearInvalidation
expr: laredo_source_lag_bytes > 0.8 * <max_slot_wal_keep_size_bytes>
for: 5m
annotations:
summary: "Laredo slot lag is approaching invalidation threshold"

Recovery

Step 1: Reset the source via CLI

Use the reset-source command to drop the invalidated slot and recreate it:

laredo reset-source pg_main

This drops the replication slot and clears all position tracking. Laredo will create a new slot and perform a full baseline on the next startup cycle.

If the publication also needs to be recreated (for example, if the table set has changed):

laredo reset-source pg_main --drop-publication

Step 2: Verify recovery

After the reset, Laredo performs a full re-baseline automatically. Monitor progress:

# Check source state
laredo source pg_main

# Check pipeline states
laredo pipelines

All pipelines on the affected source will transition through BASELINE and then STREAMING.

Step 3: Verify data consistency

After the baseline completes, verify row counts match expectations:

laredo query count public.config_document

Prevention

Configure max_slot_wal_keep_size appropriately

In postgresql.conf, set a WAL retention limit that balances disk usage against your tolerance for re-baselines:

# Keep up to 10 GB of WAL for replication slots
max_slot_wal_keep_size = '10GB'

A value of 0 (default in PostgreSQL 13+) means unlimited retention — the slot will never be invalidated, but WAL can grow without bound.

Use snapshot-on-shutdown

Configure Laredo to take a snapshot before shutting down. This reduces the amount of WAL needed on restart, since the engine can restore from the snapshot and only replay changes since the snapshot was taken:

snapshot {
on_shutdown = true
store = local
config {
directory = "/var/lib/laredo/snapshots"
}
}

Keep targets healthy

Slot lag grows when targets are slow to acknowledge changes. Monitor target health and fix downstream issues promptly:

  • HTTP sync targets returning errors slow down ACK progression
  • Fan-out targets with many slow clients can apply backpressure
  • Dead letters indicate persistent downstream failures

Monitor and alert

Set up alerts on slot lag well before the invalidation threshold:

# Check lag via CLI
laredo source pg_main
# Shows: Lag: 1.2 KB

Use laredo_source_lag_bytes for automated monitoring. A sudden spike in lag often precedes invalidation if left unaddressed.

Automatic recovery

When Laredo detects that a slot has been invalidated during streaming, it automatically drops the invalidated slot and initiates a full re-baseline. No manual intervention is required in this case — the recovery happens transparently. The source transitions through RECONNECTING and then restarts the baseline process.

Manual reset-source is only needed when the slot is invalidated while Laredo is stopped, or when automatic recovery fails (for example, if PostgreSQL permissions have changed).