Compression
The file-shaped connectors can read and write gzip / zstd transparently. Enable
the compression feature, then set a compression: field on the connector.
Enable the feature
# CLI
cargo install faucet-cli --features compression
# Library (umbrella) — activates compression on whichever file connectors you've enabled
faucet-stream = { version = "1.0", features = ["sink-jsonl", "source-csv", "compression"] }
The compression aggregate feature forwards to whichever of the supported
connectors you’ve already opted into; it doesn’t pull in connectors by itself.
The full feature includes compression.
Connectors that support it
source-csv, source-s3, source-gcs, sink-jsonl, sink-csv, sink-s3,
sink-gcs.
Config
sink:
  type: jsonl
  config:
    path: ./out/records.jsonl.gz
    compression: auto # none | gzip | zstd | auto (default)
- auto chooses from the filename suffix: .gz → gzip, .zst → zstd, anything else → none.
- Explicit gzip / zstd / none override the suffix.
Auto-detection runs per file at I/O time, so one matrix run can read a mix of
.jsonl, .jsonl.gz, and .jsonl.zst objects.
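
The resolution rule above can be sketched in a few lines of Rust. This is a minimal illustration, not faucet's actual implementation: the `Setting` and `Codec` enums and the `resolve` function are hypothetical names, and the case-sensitive suffix match is an assumption.

```rust
use std::path::Path;

/// Hypothetical mirror of the `compression:` field; not faucet's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Setting {
    Auto,
    None,
    Gzip,
    Zstd,
}

/// The codec that ends up being applied to one file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Codec {
    None,
    Gzip,
    Zstd,
}

/// Resolve the codec for a single path.
/// Explicit settings win over the suffix; `Auto` inspects only the final extension.
/// Assumption: the suffix comparison is case-sensitive.
fn resolve(setting: Setting, path: &str) -> Codec {
    match setting {
        Setting::None => Codec::None,
        Setting::Gzip => Codec::Gzip,
        Setting::Zstd => Codec::Zstd,
        Setting::Auto => match Path::new(path).extension().and_then(|e| e.to_str()) {
            Some("gz") => Codec::Gzip,
            Some("zst") => Codec::Zstd,
            _ => Codec::None,
        },
    }
}

fn main() {
    // Resolution is per file, so a mixed listing resolves independently.
    for path in ["a.jsonl", "b.jsonl.gz", "c.jsonl.zst"] {
        println!("{} -> {:?}", path, resolve(Setting::Auto, path));
    }
    // An explicit value overrides whatever the suffix suggests.
    println!("d.jsonl (explicit gzip) -> {:?}", resolve(Setting::Gzip, "d.jsonl"));
}
```

Because the decision depends only on the setting and the individual path, nothing is carried over from one file to the next, which is what lets a single run mix compressed and uncompressed objects.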
Notes
- File sinks finalize the encoder on flush(); later writes reopen in append mode, producing a multi-member compressed file that gzip/zstd decoders read back transparently.
- S3 and GCS sinks do not set a Content-Encoding header; consumers must decompress explicitly.
- Parquet, Kafka, HTTP, stdout, and the database sinks are intentionally out of scope: Parquet has internal columnar compression and the others have native protocol-level options.