PostgreSQL CDC → JSONL
Change data capture (CDC) streams every INSERT/UPDATE/DELETE from a
PostgreSQL table by reading its write-ahead log via logical replication — no
polling, no updated_at column required.
Prepare Postgres
CDC needs logical replication enabled (wal_level = logical) and a publication
for the tables you want to follow:
CREATE TABLE IF NOT EXISTS users (id int4 PRIMARY KEY, name text);
CREATE PUBLICATION faucet_pub FOR TABLE users;
The bundled examples/docker-compose.yml
starts a Postgres already configured for logical replication.
Config
version: 1
pipeline:
source:
type: postgres-cdc
config:
connection_url: postgres://faucet:faucet@localhost:5432/appdb
slot_name: faucet_slot
publication_name: faucet_pub
create_slot_if_missing: true
idle_timeout: 30
sink:
type: jsonl
config:
path: ./out/changes.jsonl
append: true
state:
type: file
config:
path: ./state
faucet run postgres_cdc_to_jsonl.yaml
Open a psql session and INSERT/UPDATE/DELETE some rows — the connector
drains them every fetch cycle until idle_timeout fires.
Why the state store matters here
The CDC source advances Postgres’s confirmed_flush_lsn (the point up to which
Postgres may recycle WAL) only from a durable bookmark — i.e. after the
pipeline has persisted the position. It never confirms WAL for changes that
haven’t been written to the sink. That means a crash mid-run cannot lose data:
on restart the source resumes from the last persisted bookmark. The tradeoff is
that WAL is retained until the next run advances the bookmark, so don’t point a
CDC slot at a table and then never run it.
The state key is postgres-cdc:<slot>. Use a durable backend (redis /
postgres) in production so the bookmark survives the loss of the local disk.
Slot lifecycle
slot_type: temporarydrops the slot when the connection closes — good for experiments.permanent(the default) keeps it, which retains WAL until you drop it.- Free an abandoned slot’s WAL with
PostgresCdcSource::drop_slot()(library) or by dropping the replication slot in Postgres. tls: disable | require | verify_ca | verify_fullconfigures the replication connection (defaultdisable= plaintext; useverify_fullover untrusted networks).