Connector catalog

faucet-stream ships 21 sources and 17 sinks. Each is a Cargo feature (source-<name> / sink-<name>) and an independently published crate. Full API docs are on docs.rs.

Run faucet list to see what’s compiled into your binary, and faucet schema source <name> / faucet schema sink <name> for a connector’s exact config fields. Not sure which to pick? See Choosing a connector.

Legend: ✓ supported · ✗ not applicable.

Sources

| Connector | Feature | Streams¹ | Resumable² | Compression | Underlying primitive |
|---|---|---|---|---|---|
| REST | source-rest | | | | HTTP + 6 pagination styles, JSONPath extraction |
| GraphQL | source-graphql | | | | cursor pagination, variable injection |
| XML / SOAP | source-xml | | | | streaming XML→JSON, dot-path extraction |
| gRPC | source-grpc | ✓³ | | | dynamic protobuf; unary + server-streaming |
| PostgreSQL | source-postgres | | | | SQL query, rows as JSON |
| PostgreSQL CDC | source-postgres-cdc | | | | logical replication (pgoutput), LSN bookmarks |
| MySQL | source-mysql | | | | SQL query, rows as JSON |
| Microsoft SQL Server | source-mssql | | ✓⁷ | | SQL query (tiberius), rows as JSON |
| SQLite | source-sqlite | | | | SQL query, rows as JSON |
| AWS S3 | source-s3 | ✓⁴ | | | object reader: JSONL, JSON array, raw text |
| Google Cloud Storage | source-gcs | ✓⁴ | | | object reader: JSONL, JSON array, raw text |
| MongoDB | source-mongodb | | | | find() with filter/projection/sort |
| Redis | source-redis | | | | streams, lists, key patterns |
| Webhook | source-webhook | ✗⁵ | | | temporary HTTP server collecting POSTs |
| WebSocket | source-websocket | | | | live push feed; subscribe frames, reconnect, ping keepalive |
| CSV | source-csv | | | | CSV files as JSON |
| Elasticsearch | source-elasticsearch | | | | search/scroll API |
| Apache Kafka | source-kafka | | | | consumer; idle/max-messages termination, offset bookmarks |
| Apache Parquet | source-parquet | | | | local/glob/S3, vectorized Arrow reader, projection |
| BigQuery | source-bigquery | | | | jobs.query + pageToken pagination |
| Snowflake | source-snowflake | | | | SQL REST API, server-side partitions |

¹ Streams = yields records in bounded-memory batches rather than buffering the whole result.

² Resumable = persists a bookmark to a state store so re-runs continue where they left off (incremental replication / CDC / Kafka offsets).

³ gRPC streams natively in server-streaming mode; unary buffers the single response.

⁴ S3/GCS stream in JSONL and raw-text modes; JSON-array mode buffers one object.

⁵ Webhook is buffer-shaped by nature (it collects POSTs over a window).

⁷ MSSQL is resumable only in replication: incremental mode (it persists a tracking-column bookmark); in full mode it is not.
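To make the "Resumable" footnote concrete, here is a minimal sketch of a tracking-column bookmark. It is not faucet-stream's implementation: `StateStore`, `Row`, and `run_incremental` are hypothetical names, and the state store is an in-memory map where a real one would persist between runs.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory state store; a real one would persist to disk or a database.
#[derive(Default)]
struct StateStore {
    bookmarks: HashMap<String, u64>,
}

/// A row with a monotonically increasing tracking column (for example an id or version).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    tracking: u64,
    payload: String,
}

/// Return the rows newer than the stored bookmark, then advance the bookmark
/// to the highest tracking value that was emitted.
fn run_incremental(store: &mut StateStore, stream: &str, table: &[Row]) -> Vec<Row> {
    let bookmark = store.bookmarks.get(stream).copied();
    let fresh: Vec<Row> = table
        .iter()
        .filter(|r| bookmark.map_or(true, |b| r.tracking > b))
        .cloned()
        .collect();
    if let Some(max) = fresh.iter().map(|r| r.tracking).max() {
        store.bookmarks.insert(stream.to_string(), max);
    }
    fresh
}

fn main() {
    let mut store = StateStore::default();
    let mut table = vec![
        Row { tracking: 1, payload: "a".into() },
        Row { tracking: 2, payload: "b".into() },
    ];
    // First run has no bookmark, so it reads everything.
    let first = run_incremental(&mut store, "orders", &table);
    table.push(Row { tracking: 3, payload: "c".into() });
    // Second run only sees the row added after the bookmark.
    let second = run_incremental(&mut store, "orders", &table);
    println!("first run: {} rows, second run: {} rows", first.len(), second.len());
}
```

A source without a bookmark (full mode, in the MSSQL footnote's terms) would behave like the first run every time.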

Sinks

Every sink exposes a batch_size knob for write-side re-chunking. For the file/append sinks (jsonl, csv, stdout) it’s a no-op — they write per record.

| Connector | Feature | batch_size | Compression | Write unit |
|---|---|---|---|---|
| BigQuery | sink-bigquery | | | tabledata.insertAll (per-row DLQ) |
| PostgreSQL | sink-postgres | | | multi-row INSERT (JSONB or mapped cols) |
| JSON Lines | sink-jsonl | no-op | | buffered file append |
| Snowflake | sink-snowflake | | | SQL REST API |
| MySQL | sink-mysql | | | multi-row INSERT |
| Microsoft SQL Server | sink-mssql | | | multi-row INSERT (2100-param auto-split, per-row DLQ) |
| SQLite | sink-sqlite | | | transaction-wrapped batch |
| AWS S3 | sink-s3 | | | JSONL objects, parallel uploads |
| Google Cloud Storage | sink-gcs | | | JSONL objects |
| MongoDB | sink-mongodb | | | insert_many |
| Redis | sink-redis | | | streams, lists, key-value (pipelined) |
| CSV | sink-csv | no-op | | buffered file rows |
| Elasticsearch | sink-elasticsearch | | | _bulk NDJSON (per-row DLQ) |
| HTTP | sink-http | | | POST, concurrent under a semaphore |
| Stdout | sink-stdout | no-op | | JSON Lines / pretty JSON / TSV |
| Apache Kafka | sink-kafka | | | producer, batched sends, multi-topic routing |
| Apache Parquet | sink-parquet | | ✗⁶ | local/S3, schema inference, row/byte rollover |

⁶ Parquet has internal columnar compression, so the file-level compression feature doesn’t apply.
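The "2100-param auto-split" noted for the SQL Server sink can be illustrated with a little arithmetic. The sketch below is an assumption about how such a split could work, not the sink's actual code: it treats 2100 as a per-statement parameter budget with one bound parameter per column value, and `rows_per_statement` and `split_batch` are hypothetical helpers. A real implementation may reserve a few parameters for the driver.

```rust
/// Assumed parameter budget per statement, taken from the "2100-param" note above.
const MAX_PARAMS: usize = 2_100;

/// Hypothetical helper: how many rows fit in one multi-row INSERT
/// when every column value is bound as its own parameter.
fn rows_per_statement(columns: usize) -> usize {
    assert!(columns > 0 && columns <= MAX_PARAMS, "column count out of range");
    MAX_PARAMS / columns
}

/// Hypothetical helper: split a batch of `rows` rows into statement-sized groups
/// and return the row count of each group.
fn split_batch(rows: usize, columns: usize) -> Vec<usize> {
    let per_statement = rows_per_statement(columns);
    let mut sizes = Vec::new();
    let mut remaining = rows;
    while remaining > 0 {
        let n = remaining.min(per_statement);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}

fn main() {
    // A 1000-row batch with 5 columns needs 5000 parameters in total,
    // so it is split into statements of at most 420 rows (420 * 5 = 2100).
    let sizes = split_batch(1_000, 5);
    println!("{:?}", sizes);
}
```

Under these assumptions, wide tables shrink the effective write unit well below batch_size: with 50 columns only 42 rows fit in a single statement.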

Authentication at a glance

| Family | Auth options |
|---|---|
| REST / GraphQL / XML | Bearer, Basic, ApiKey (header), ApiKeyQuery, OAuth2 (client-credentials), TokenEndpoint, Custom headers; see Auth cookbook |
| BigQuery | service-account key (path or inline JSON), application-default credentials |
| Snowflake | JWT key-pair, OAuth |
| Kafka | SASL (PLAIN/SCRAM) + TLS |
| WebSocket | none, Bearer token, Custom headers |
| Elasticsearch | basic, API key, bearer, none |
| S3 / GCS | cloud SDK credential chains (env, profile, metadata) |
| SQL databases | connection URL (with embedded credentials / TLS params) |

Inspect any connector’s exact auth shape with faucet schema source <name> / faucet schema sink <name>.

Batching

Default batch_size is 1000; max is 1,000,000. batch_size: 0 means “no batching” — the source emits the whole result set in one page and the sink writes it in one request (good for small lookup tables or load-job-style sinks). See Performance tuning.
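The batching rules above can be sketched as a small re-chunking function. This is an illustration only, not faucet-stream's code: `rechunk` is a hypothetical helper, and clamping values above the maximum is an assumption (the real tool may reject them instead).

```rust
/// Default and maximum batch sizes as stated in the text above.
const DEFAULT_BATCH_SIZE: usize = 1_000;
const MAX_BATCH_SIZE: usize = 1_000_000;

/// Hypothetical helper: split `records` into write batches.
/// `batch_size == 0` means "no batching": everything goes out as one batch.
fn rechunk<T: Clone>(records: &[T], batch_size: usize) -> Vec<Vec<T>> {
    if records.is_empty() {
        return Vec::new();
    }
    if batch_size == 0 {
        return vec![records.to_vec()];
    }
    // Assumption: oversized values are clamped rather than rejected.
    let size = batch_size.min(MAX_BATCH_SIZE);
    records.chunks(size).map(|chunk| chunk.to_vec()).collect()
}

fn main() {
    let records: Vec<u32> = (0..2_500).collect();

    // Default batching: 2500 records become batches of 1000, 1000, and 500.
    let batched = rechunk(&records, DEFAULT_BATCH_SIZE);
    let sizes: Vec<usize> = batched.iter().map(|b| b.len()).collect();
    println!("default: {:?}", sizes);

    // batch_size 0: a single batch holding the whole result set.
    let unbatched = rechunk(&records, 0);
    println!("no batching: {} batch of {} records", unbatched.len(), unbatched[0].len());
}
```

With `batch_size: 0` the whole result set is held in memory at once, which is why the text recommends it only for small lookup tables or load-job-style sinks.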