Incremental replication & state

For pipelines that run repeatedly, you usually want to fetch only what’s new. That requires two things: an incremental replication method on the source and a state store to persist the bookmark between runs.

Replication methods

  • FullTable: fetch everything on every run.
  • Incremental: track a high-water mark on a cursor_field (such as updated_at or an auto-incrementing id) and emit only records past the last seen value.

source:
  type: rest
  config:
    # …
    replication_method:
      type: Incremental
      cursor_field: updated_at
    primary_keys: [id]
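
The Incremental method can be pictured as a filter over the cursor field. The following is a minimal Python sketch of the idea, not faucet's implementation: the function names `emit_incremental` and `next_bookmark` are hypothetical, and the strict "greater than" comparison is an assumption based on the phrase "past the last seen value".

```python
from typing import Any, Iterable, Iterator, Optional


def emit_incremental(
    records: Iterable[dict],
    cursor_field: str,
    bookmark: Optional[Any],
) -> Iterator[dict]:
    """Yield only records whose cursor value is strictly past the bookmark.

    A bookmark of None means no previous run, so every record is emitted.
    """
    for record in records:
        if bookmark is None or record[cursor_field] > bookmark:
            yield record


def next_bookmark(emitted: Iterable[dict], cursor_field: str, bookmark: Optional[Any]) -> Optional[Any]:
    """Return the highest cursor value seen, or the old bookmark if nothing was emitted."""
    values = [record[cursor_field] for record in emitted]
    if not values:
        return bookmark
    newest = max(values)
    return newest if bookmark is None else max(bookmark, newest)


# Illustrative data only. ISO-8601 timestamps in one fixed format sort correctly as strings.
rows = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00Z"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00Z"},
]
fresh = list(emit_incremental(rows, "updated_at", "2024-01-01T00:00:00Z"))
print([r["id"] for r in fresh])                                      # [2, 3]
print(next_bookmark(fresh, "updated_at", "2024-01-01T00:00:00Z"))    # 2024-01-03T00:00:00Z
```

With FullTable there is no bookmark at all, which corresponds to calling the filter with `bookmark=None` on every run.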

State stores

Attach a state: block so the bookmark survives between runs:

state:
  type: file          # built into faucet-core
  config:
    path: ./state

Available backends:

Backend    Crate                   Use when
memory     faucet-core             tests, one-shot runs (not persistent)
file       faucet-core             single host; one JSON file per key, atomic writes
redis      faucet-state-redis      shared/ephemeral state across hosts
postgres   faucet-state-postgres   shared, durable, transactional state

# Redis
state:
  type: redis
  config:
    connection_url: redis://localhost:6379
    namespace: faucet

# Postgres
state:
  type: postgres
  config:
    connection_url: postgres://user:pass@localhost/faucet
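
To make "one JSON file per key, atomic writes" concrete, here is a rough Python sketch of how a file-backed store of that shape could work. It is an illustration under assumptions, not the faucet-core code: the class name `FileStateStore`, its `get`/`set` methods, and the percent-encoded file naming are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional
from urllib.parse import quote


class FileStateStore:
    """Hypothetical file-backed store: one JSON file per state key."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Percent-encode the key so characters like ':' and '/' are safe in a
        # file name and two different keys never map to the same file.
        return self.root / f"{quote(key, safe='')}.json"

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def set(self, key: str, bookmark: Any) -> None:
        # Write to a temporary file in the same directory, then rename it over
        # the target. os.replace is atomic, so a reader sees either the old
        # bookmark or the new one, never a half-written file.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(bookmark, handle)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp, self._path(key))
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
```

The write-then-rename pattern is what makes a file store safe on a single host. It gives no coordination between hosts, which is the gap the redis and postgres backends cover.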

How bookmarks advance

The pipeline reads the bookmark before fetching, and persists a new one only after the sink confirms the page. Most sources emit a bookmark on the final page; CDC-style sources emit one per committed transaction and get per-transaction durability automatically. Either way, a crash can never advance the bookmark past data that wasn’t written — the next run re-fetches from the last confirmed point.
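
The ordering described above (read, fetch, write, then persist) can be sketched in a few lines of Python. This is a simplified model rather than the actual pipeline: `run_once`, `fetch_pages`, and `write_page` are hypothetical names, a plain dict stands in for the state store, and a sink failure is modeled as an exception.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

# A page is its records plus the bookmark emitted with it, if any.
Page = Tuple[list, Optional[Any]]


def run_once(
    state: dict,
    key: str,
    fetch_pages: Callable[[Optional[Any]], Iterable[Page]],
    write_page: Callable[[list], None],
) -> None:
    bookmark = state.get(key)              # 1. read the bookmark before fetching
    for records, new_bookmark in fetch_pages(bookmark):
        write_page(records)                # 2. raises if the sink cannot confirm the page
        if new_bookmark is not None:
            state[key] = new_bookmark      # 3. persist only after confirmation


def two_pages(bookmark: Optional[Any]) -> Iterable[Page]:
    # Illustrative source: the bookmark arrives with the final page only.
    yield [{"id": 1}], None
    yield [{"id": 2}], 2


def broken_sink(records: list) -> None:
    raise RuntimeError("sink unavailable")


state = {"orders::0": 0}
try:
    run_once(state, "orders::0", two_pages, broken_sink)
except RuntimeError:
    pass
print(state)  # {'orders::0': 0}
```

Because step 3 is unreachable when step 2 fails, the stored bookmark still points at the last confirmed position and the next run starts from there.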

State keys

Each invocation has a state key so concurrent matrix rows don’t collide: {name}::{row_id} for roots and {name}::{row_id}::{parent_record_key} for DAG children. The CDC source uses postgres-cdc:<slot>.
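
As an illustration of the key formats above, a small Python helper could build them as follows. The function names are hypothetical, and how faucet renders `row_id` and `parent_record_key` as strings is an assumption here.

```python
from typing import Optional


def state_key(name: str, row_id: str, parent_record_key: Optional[str] = None) -> str:
    """Build {name}::{row_id} for roots, with ::{parent_record_key} appended for DAG children."""
    parts = [name, row_id]
    if parent_record_key is not None:
        parts.append(parent_record_key)
    return "::".join(parts)


def cdc_state_key(slot: str) -> str:
    """Build the postgres-cdc:<slot> key used by the CDC source."""
    return f"postgres-cdc:{slot}"


print(state_key("orders", "0"))            # orders::0
print(state_key("line_items", "0", "42"))  # line_items::0::42
print(cdc_state_key("faucet_slot"))        # postgres-cdc:faucet_slot
```

Two matrix rows of the same pipeline differ in `row_id`, so their bookmarks land under different keys even when both write to the same backend.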