Core concepts

faucet-stream is built from a handful of small pieces. Understanding them makes both the YAML config and the Rust API obvious.

Source

A source fetches records from an external system (a REST API, a database, a Kafka topic, an object store, …) and yields them as JSON values. Sources stream in batches via stream_pages, so memory stays bounded no matter how much data flows through.

Sink

A sink writes records to an external system. Sinks accept batches and most expose a batch_size knob that controls the natural unit of work (a multi-row INSERT, a _bulk body, an insertAll request, and so on).

Transform

An optional transform reshapes each record between source and sink. The config-exposed transforms are flatten, rename_keys, and snake_case; additional custom transforms are available from Rust.

Pipeline

The pipeline connects a source to a sink. It drives the source’s stream_pages, applies transforms, and writes each page to the sink as it arrives — then flushes and records progress. Memory is bounded at one batch_size page on both sides regardless of total volume.

let result = Pipeline::new(&source, &sink).run().await?;

State store & bookmarks

For incremental and resumable runs, a state store persists a bookmark after each page the sink confirms. On the next run the source resumes from that bookmark. Built-in backends are memory and file (in faucet-core); redis and postgres backends live in their own crates.

This is what makes change-data-capture safe: the PostgreSQL CDC source only tells Postgres it can recycle write-ahead log up to a bookmark that has actually been persisted.

Dead-letter queue (DLQ)

A pipeline can attach a DLQ sink. When a sink reports per-row failures, the failing rows are wrapped in a fixed-shape envelope and routed to the DLQ before the page’s bookmark advances — so a few bad records don’t abort the whole run. The on_batch_error policy (propagate vs dlq_all) decides what happens when a sink can’t report per-row results.

Matrix & DAGs

A single config can fan out into many invocations with a matrix: block — either independent rows or a parent/child DAG where a child runs once per record produced by its parent. See the matrix DAG tutorial.

Observability

Every source, sink, transform, and state operation is automatically wrapped to emit tracing spans and metrics counters/histograms — no per-connector code. See Observability.

Keyboard shortcuts

faucet-stream