Recovering from Slot Invalidation

PostgreSQL can invalidate a replication slot when the server needs to remove WAL segments that the slot still references. Once invalidated, the slot can no longer be used for replication. Laredo must reset the source and perform a full re-baseline.

What is slot invalidation

PostgreSQL retains WAL segments as long as a replication slot needs them. When the max_slot_wal_keep_size server setting is set to a limit and the WAL retained for a slot exceeds that limit, PostgreSQL removes the excess WAL segments at the next checkpoint and marks the slot as invalidated. An invalidated slot cannot resume streaming, because the WAL data it needs no longer exists on disk.

This is distinct from normal slot lag. With lag, the data is still available and the consumer can catch up. With invalidation, the data is gone.
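Lag is simply a distance in bytes between two WAL positions. The following is a minimal Python sketch of that arithmetic, assuming the two LSNs were read by hand (for example from pg_current_wal_lsn() and the confirmed_flush_lsn column of pg_replication_slots). The helper names are illustrative and not part of Laredo, and this page does not state whether laredo_source_lag_bytes is computed the same way.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits and the
    low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_wal_lsn: str, slot_lsn: str) -> int:
    """Bytes of WAL between the server's current position and the slot's position."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(slot_lsn)


# Example values, not taken from a real server.
print(lag_bytes("0/3000000", "0/2000000"))  # 16777216 (16 MiB)
```

As long as the WAL covering that distance is still on disk, the consumer can catch up; invalidation is the case where part of it has already been removed.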

Detecting slot invalidation

Error messages in Laredo logs

When Laredo encounters an invalidated slot during streaming, the PostgreSQL source logs an error like:

pg source: replication slot "laredo_01" has been invalidated; it must be dropped and recreated

or:

pg source: cannot read from logical replication slot "laredo_01", it has been invalidated

Checking slot status in PostgreSQL

Query the pg_replication_slots view directly:

SELECT slot_name, active, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE slot_name = 'laredo_01';

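Column or value	Meaning
`wal_status = 'lost'`	Slot is invalidated; required WAL has been removed
`wal_status = 'unreserved'`	Slot no longer retains the required WAL; some of it will be removed at the next checkpoint
`wal_status = 'extended'`	Slot is retaining WAL beyond `max_wal_size` but not yet invalidated
`wal_status = 'reserved'`	Normal operation
`safe_wal_size`	Bytes that can still be written to WAL before the slot risks becoming `lost`; NULL if `max_slot_wal_keep_size` is -1 (unlimited) or the slot is already lost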
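For scripted checks, the query result can be reduced to a single health label. The sketch below is one possible mapping, not Laredo behavior: the function name, the labels, and the 1 GiB warning margin are assumptions chosen for illustration. It also handles 'unreserved', a wal_status value PostgreSQL reports when required WAL is about to be removed.

```python
from typing import Optional

GIB = 1024 ** 3


def slot_health(wal_status: Optional[str],
                safe_wal_size: Optional[int],
                warn_below_bytes: int = GIB) -> str:
    """Map one pg_replication_slots row to a coarse health label.

    The labels and the default warning margin are illustrative assumptions.
    """
    if wal_status == "lost":
        return "invalidated"  # WAL is gone; a full re-baseline is needed
    if wal_status == "unreserved":
        return "at-risk"      # WAL may be removed at the next checkpoint
    if safe_wal_size is not None and safe_wal_size < warn_below_bytes:
        return "at-risk"      # little room left before invalidation
    if wal_status == "extended":
        return "lagging"      # retaining extra WAL, still recoverable
    if wal_status == "reserved":
        return "ok"
    return "unknown"          # e.g. a slot that has not reserved WAL yet


print(slot_health("extended", 10 * GIB))  # lagging
print(slot_health("extended", 1024))      # at-risk
print(slot_health("lost", None))          # invalidated
```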

Monitoring

The laredo_source_lag_bytes Prometheus metric tracks how far behind the slot is. Alert when this approaches max_slot_wal_keep_size:

# Prometheus alert rule
- alert: LaredoSlotNearInvalidation
  expr: laredo_source_lag_bytes > 0.8 * <max_slot_wal_keep_size_bytes>
  for: 5m
  annotations:
    summary: "Laredo slot lag is approaching invalidation threshold"
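The placeholder in the rule has to be replaced with a plain byte count. A small Python sketch for deriving it from the configured setting follows; it assumes PostgreSQL's 1024-based memory units and that a bare number for this setting is in megabytes. The function names are illustrative.

```python
import re

# PostgreSQL memory units are powers of 1024.
_UNITS = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_pg_size(value: str) -> int:
    """Convert a setting value such as '10GB' to bytes.

    A value without a unit is treated as megabytes, which is the base unit
    of max_slot_wal_keep_size.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(B|kB|MB|GB|TB)?\s*", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or "MB"]


def alert_threshold_bytes(keep_size: str, percent: int = 80) -> int:
    """Byte value to use in the alert expression (integer arithmetic only)."""
    return parse_pg_size(keep_size) * percent // 100


print(alert_threshold_bytes("10GB"))  # 8589934592
```

With max_slot_wal_keep_size = '10GB', the expression becomes laredo_source_lag_bytes > 8589934592.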

Recovery

Step 1: Reset the source via CLI

Use the reset-source command to drop the invalidated slot and recreate it:

laredo reset-source pg_main

This drops the replication slot and clears all position tracking. Laredo will create a new slot and perform a full baseline on the next startup cycle.

If the publication also needs to be recreated (for example, if the table set has changed):

laredo reset-source pg_main --drop-publication

Step 2: Verify recovery

After the reset, Laredo performs a full re-baseline automatically. Monitor progress:

# Check source state
laredo source pg_main

# Check pipeline states
laredo pipelines

All pipelines on the affected source will transition through BASELINE and then STREAMING.
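If you script this check, the condition to wait for is that no pipeline remains in a state other than STREAMING. The sketch below assumes you have already collected the states into a mapping by some means; it does not parse CLI output, whose format is not described here, and the pipeline names are hypothetical.

```python
from typing import List, Mapping


def pipelines_not_streaming(states: Mapping[str, str]) -> List[str]:
    """Return the names of pipelines that have not reached STREAMING, sorted."""
    return sorted(name for name, state in states.items() if state != "STREAMING")


# Hypothetical pipeline names and states, for illustration only.
observed = {"config_document": "STREAMING", "audit_log": "BASELINE"}
print(pipelines_not_streaming(observed))  # ['audit_log']
```

Recovery is complete once the returned list is empty.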

Step 3: Verify data consistency

After the baseline completes, verify row counts match expectations:

laredo query count public.config_document

Prevention

Configure max_slot_wal_keep_size appropriately

In postgresql.conf, set a WAL retention limit that balances disk usage against your tolerance for re-baselines:

# Keep up to 10 GB of WAL for replication slots
max_slot_wal_keep_size = '10GB'

A value of -1 (the default; the setting was introduced in PostgreSQL 13) means unlimited retention: the slot will never be invalidated for retaining too much WAL, but WAL can grow without bound.
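One way to choose a limit is to work backward from the longest consumer outage you want to survive without a re-baseline. The following Python sketch shows that arithmetic; the WAL rate, outage length, and 1.5x headroom factor are example inputs you would replace with your own measurements, and the function name is illustrative.

```python
import math


def min_keep_size_bytes(wal_bytes_per_second: float,
                        outage_seconds: float,
                        headroom: float = 1.5) -> int:
    """Smallest retention limit that covers an outage at a given WAL rate.

    headroom pads the estimate for write bursts; 1.5 is an arbitrary example.
    """
    return math.ceil(wal_bytes_per_second * outage_seconds * headroom)


# Example: 2 MiB/s of WAL, one-hour outage, 1.5x headroom.
needed = min_keep_size_bytes(2 * 1024 ** 2, 3600)
print(needed)                         # 11324620800
print(round(needed / 1024 ** 3, 2))   # 10.55 (GiB)
```

In this example a 10GB limit would fall slightly short of the one-hour target, so either the limit or the tolerated outage would need adjusting.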

Use snapshot-on-shutdown

Configure Laredo to take a snapshot before shutting down. This reduces the amount of WAL needed on restart, since the engine can restore from the snapshot and only replay changes since the snapshot was taken:

snapshot {
  on_shutdown = true
  store = local
  config {
    directory = "/var/lib/laredo/snapshots"
  }
}

Keep targets healthy

Slot lag grows when targets are slow to acknowledge changes. Monitor target health and fix downstream issues promptly:

HTTP sync targets returning errors slow down ACK progression
Fan-out targets with many slow clients can apply backpressure
Dead letters indicate persistent downstream failures

Monitor and alert

Set up alerts on slot lag well before the invalidation threshold:

# Check lag via CLI
laredo source pg_main
# Shows: Lag: 1.2 KB

Use laredo_source_lag_bytes for automated monitoring. A sudden rise in lag that does not recover often precedes invalidation.

Automatic recovery

When Laredo detects that a slot has been invalidated during streaming, it automatically drops the invalidated slot and initiates a full re-baseline. No manual intervention is required in this case. The source transitions through RECONNECTING and then restarts the baseline process.

Manual reset-source is only needed when the slot is invalidated while Laredo is stopped, or when automatic recovery fails (for example, if PostgreSQL permissions have changed).
