Snapshot Writer (laredo-snapshotter)

laredo-snapshotter is a standalone process that subscribes to a fan-out table and continuously writes it to durable storage as a base snapshot plus a stream of diffs, indexed by a manifest. Cold and cross-account consumers can then read the table from object storage on their own schedule. For the design and the full picture, see Snapshot Writer — Architecture.

What it produces

For each table, under a key prefix on each destination:

config_document/
  manifest.json                 # the index: latest state + the artifact chain
  epoch=1/
    snapshot-0_19F000.jsonl     # base snapshot (full table at a WAL position)
    diff-0_19F000-0_1A2B3C.jsonl
    ...
  epoch=2/
    snapshot-0_2B0000.jsonl     # a re-base started a new epoch
    ...

A consumer reads manifest.json, loads the newest snapshot, and applies the diffs recorded after it, in order, to reconstruct the table as of the manifest's head_position.
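The reconstruction step can be sketched in Go. This is a minimal in-memory sketch, not the project's reader. The Row, Change, and Diff types and the "upsert"/"delete" op names are hypothetical. So is the assumption that each diff carries its from/to positions, which the diff-<from>-<to> file names suggest. Parsing manifest.json and the JSONL files is left out because the manifest schema is not described here.

```go
package main

import "fmt"

// Row is one table row keyed by column name (hypothetical representation).
type Row = map[string]string

// Change is a hypothetical diff record: an upsert or delete of one row by key.
type Change struct {
	Op   string // "upsert" or "delete" (assumed op names)
	Key  string // primary key of the affected row
	Data Row    // new row contents, used only for upserts
}

// Diff holds the changes between two WAL positions, mirroring the
// diff-<from>-<to>.jsonl naming in the archive layout.
type Diff struct {
	From, To string
	Changes  []Change
}

// reconstruct copies the snapshot taken at snapPos and follows the diff chain
// (each diff's From must equal the previous diff's To) until it reaches head.
func reconstruct(snapshot map[string]Row, snapPos, head string, diffs []Diff) (map[string]Row, error) {
	table := make(map[string]Row, len(snapshot))
	for k, v := range snapshot {
		table[k] = v
	}
	byFrom := make(map[string]Diff, len(diffs))
	for _, d := range diffs {
		byFrom[d.From] = d
	}
	pos := snapPos
	for steps := 0; pos != head; steps++ {
		if steps >= len(diffs) {
			return nil, fmt.Errorf("chain from %s never reaches head %s", snapPos, head)
		}
		d, ok := byFrom[pos]
		if !ok {
			return nil, fmt.Errorf("no diff starts at %s: chain is broken", pos)
		}
		for _, c := range d.Changes {
			switch c.Op {
			case "upsert":
				table[c.Key] = c.Data
			case "delete":
				delete(table, c.Key)
			default:
				return nil, fmt.Errorf("unknown op %q in diff %s-%s", c.Op, d.From, d.To)
			}
		}
		pos = d.To
	}
	return table, nil
}
```

Following the chain by position, rather than by listing order, means a gap in the chain surfaces as an error instead of silently producing a stale table.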

Install

# from source
go build -o laredo-snapshotter ./cmd/laredo-snapshotter

# container image
docker pull ghcr.io/zourzouvillys/laredo-snapshotter:latest

Configure

The config file is HOCON. A minimal local config follows (see examples/snapshotter/ for a full example):

snapshotter {
  source { server = "localhost:4001", schema = public, table = test_users }
  diff     { interval = 5s }
  snapshot { min_interval = 30s, max_interval = 10m, max_churn_records = 1000 }
  destinations = [ { type = local, path = "./.laredo-archive" } ]
  formats { snapshot = [ jsonl ], diff = [ jsonl ] }
  http { port = 8080 }
}

Run it:

laredo-snapshotter --config snapshotter.conf
# or: LAREDO_SNAPSHOTTER_CONFIG=/etc/laredo/snapshotter.conf laredo-snapshotter

Re-base thresholds

A diff is written every diff.interval. Whenever any one of the thresholds below fires, a fresh base snapshot is written instead, subject to snapshot.min_interval:

Key                          Meaning
snapshot.min_interval        Floor: never re-base more often than this
snapshot.max_interval        Ceiling: always re-base at least this often
snapshot.max_diff_bytes      Re-base when a serialized diff reaches this size
snapshot.max_diff_fraction   Re-base when a diff reaches this fraction of the last snapshot's size
snapshot.max_churn_records   Re-base when this many rows have changed since the last snapshot
snapshot.max_churn_fraction  Re-base when the changed rows reach this fraction of the dataset

Omit a key to disable that trigger.

Multiple tables

One process can materialize several tables. Supply a tables array in which each entry is a full table block; process-wide settings such as http and credentials stay at the top level:

snapshotter {
  http { port = 8080 }
  credentials { s3w { type = ambient } }
  tables = [
    { source { server = "laredo:4001", schema = public, table = users }
      destinations = [ { type = s3, bucket = b, prefix = "users/", credentials = s3w } ]
      formats { snapshot = [ jsonl, protobuf ], diff = [ protobuf ] } },
    { source { server = "laredo:4001", schema = public, table = orders }
      destinations = [ { type = local, path = "/var/lib/laredo/orders" } ] }
  ]
}

Destinations, formats, events, credentials

  • Destinations (type = local | s3): an artifact counts as durable only once it has been written to every destination. Each S3 destination names a credentials profile.
  • Formats (jsonl, protobuf): snapshots and diffs may use different formats, and each may be emitted in several formats. Every format is written as a separate object referenced from the manifest.
  • Events (sns, sqs, kinesis): advisory, at-least-once notifications published after the manifest head advances. An event may be delivered more than once, and consumers must still poll the manifest as the source of truth.
  • Credentials: named profiles referenced by each AWS-backed component, so one process can use different roles for different actions:
credentials {
  s3w { type = ambient }                                   # SDK default chain (env, IRSA, task role)
  pub { type = assume_role
        role_arn = "arn:aws:iam::222233334444:role/laredo-events-pub"
        external_id = "laredo" }
}
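The durability rule for destinations amounts to this: every write succeeds, or the artifact is not committed. Below is a minimal sketch with a hypothetical Destination interface, which is not the project's API. It uses errors.Join (Go 1.20+) so that every failed destination is reported, not just the first.

```go
package main

import (
	"errors"
	"fmt"
)

// Destination is a hypothetical abstraction over local and S3 targets.
type Destination interface {
	Name() string
	Put(key string, data []byte) error
}

// publish writes one artifact to every destination. It returns nil only if
// all writes succeeded, i.e. only then may the manifest reference the artifact.
func publish(dests []Destination, key string, data []byte) error {
	var errs []error
	for _, d := range dests {
		if err := d.Put(key, data); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", d.Name(), err))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

// memDest is an in-memory Destination used for illustration and testing.
type memDest struct {
	name    string
	objects map[string][]byte
	fail    bool // simulate a failing destination
}

func (m *memDest) Name() string { return m.name }

func (m *memDest) Put(key string, data []byte) error {
	if m.fail {
		return errors.New("write failed")
	}
	m.objects[key] = data
	return nil
}
```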

Operate

laredo-snapshotter serves an HTTP API on http.port:

Endpoint            Purpose
GET /health/live    Process is up
GET /health/ready   Every table has written its initial base snapshot
GET /status         Per-table position, epoch, buffer depth, churn, and last snapshot
POST /snapshot      Force an immediate re-base on every table
GET /metrics        Prometheus metrics: snapshotter_epoch, snapshotter_buffer_depth,
                    snapshotter_churn_records, snapshotter_snapshot_age_seconds
                    (per table), plus process and Go runtime metrics

On SIGTERM/SIGINT the writer flushes a final diff so no buffered changes are lost. See the runbook for incident procedures.