Running faucet as a service (faucet serve)
faucet serve turns faucet from a one-shot CLI into a long-running HTTP control
plane: an orchestrator (Airflow, Temporal, Dagster, Argo) submits pipeline
configs over HTTP, polls status, cancels runs, and streams logs — while faucet
amortizes startup (TLS handshakes, connection pools, schema introspection)
across many runs in one process. It is the second supported runtime mode
alongside one-shot faucet run and the cron
faucet schedule.
The full endpoint/schema reference is in HTTP API reference;
this page is the guided tour. serve requires the serve Cargo feature
(cargo install faucet-cli --features serve, or --features full).
Quickstart
# Start the server (loopback by default). Auth is mandatory — see below.
FAUCET_SERVE_AUTH_TOKEN=s3cret faucet serve --listen 127.0.0.1:8080
# Submit a run.
curl -XPOST http://127.0.0.1:8080/v1/runs \
-H "Authorization: Bearer s3cret" -H 'content-type: application/json' \
-d '{"config":"version: 1\npipeline:\n source: {type: csv, config: {path: in.csv}}\n sink: {type: jsonl, config: {path: out.jsonl}}\n","name":"adhoc"}'
# → {"run_id":"0192…","status":"queued","submitted_at":"…"}
# Poll it to completion.
curl -H "Authorization: Bearer s3cret" http://127.0.0.1:8080/v1/runs/0192…
# Tail its logs (SSE).
curl -N -H "Authorization: Bearer s3cret" http://127.0.0.1:8080/v1/runs/0192…/logs
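For orchestrators that call the API from code rather than curl, the same submit-then-poll flow might look like the Python sketch below. It uses only the standard library. The helper names (`build_submit_request`, `fetch_run`, `wait_for_terminal`) are hypothetical, and the status name `succeeded` is an assumption: this page only shows `queued`, `failed`, and `cancelled`, so check the HTTP API reference for the real set.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # matches the --listen address in the quickstart

# Assumption: "succeeded" is a guessed name for the success status.
# "failed" and "cancelled" are the terminal statuses mentioned on this page.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def build_submit_request(token, config_yaml, name):
    """Build (but do not send) the POST /v1/runs request from the quickstart."""
    body = json.dumps({"config": config_yaml, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/runs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def fetch_run(token, run_id):
    """GET /v1/runs/{id} and decode the JSON run record."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/runs/{run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_terminal(get_status, interval_secs=2.0, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it."""
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_secs)
```

A caller would send the request with `urllib.request.urlopen(build_submit_request(...))`, read `run_id` from the JSON response, and then poll with `wait_for_terminal(lambda: fetch_run(token, run_id)["status"])`. That last expression assumes the run record carries the same `status` field as the submit response.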
⚠️ Security model — read before exposing
serve executes arbitrary client-supplied pipeline configs with the server’s
identity. That is a real privilege surface:
- Full interpolation: submitted configs resolve ${env:…}, ${file:…}, ${secret:…}, and ${vault:…}/${aws-sm:…}/… against the server’s environment, filesystem, and credentials, exactly like faucet run. An authenticated caller can read any secret the server can reach.
- SSRF / egress: a submitted REST/HTTP source can be pointed at 169.254.169.254 or internal services and will be fetched with the server’s network identity.
Mitigations are deployment-level and mandatory:
- Never run with --no-auth on a non-loopback bind. The no-auth gate is explicit: without --auth-token/FAUCET_SERVE_AUTH_TOKEN and without --no-auth, startup fails.
- Run single-tenant, behind authentication, behind egress controls / network policy. The default loopback bind (127.0.0.1) is deliberate; exposing externally is an explicit choice.
- Terminate TLS at a proxy/ingress (serve speaks plain HTTP).
- Prefer FAUCET_SERVE_AUTH_TOKEN over --auth-token (the latter leaks through ps and /proc).
- Never run a serve process at FAUCET_LOG=debug when submitted configs hold resolved secrets: only faucet’s own log output is redacted, not third-party connector debug logging.
Bounded concurrency & backpressure
--max-concurrent-runs (default min(16, cpu_count())) bounds how many runs
execute at once; --max-queued-runs (default 8×) bounds the queue. A submit
past the queue cap returns 429 with Retry-After. Note that total concurrent
pipeline work ≈ max-concurrent-runs × each config's execution.max_concurrent.
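As a worked illustration of the numbers above, the Python sketch below computes the documented default for --max-concurrent-runs, the approximate total-concurrency product, and a client-side wait after a 429. The function names are hypothetical. The Retry-After parsing assumes the header carries delta-seconds; this page does not say which form faucet emits, so the sketch falls back to a caller-chosen delay otherwise.

```python
import os


def default_max_concurrent_runs(cpu_count=None):
    """Documented default for --max-concurrent-runs: min(16, cpu_count())."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(16, cpus)


def approx_total_concurrency(max_concurrent_runs, per_run_max_concurrent):
    """Rough estimate: concurrent runs times each config's execution.max_concurrent."""
    return max_concurrent_runs * per_run_max_concurrent


def retry_after_secs(headers, fallback_secs=5.0):
    """Seconds to wait after a 429, read from a Retry-After header.

    `headers` is a plain dict keyed by the exact name "Retry-After".
    Only the delta-seconds form is parsed (an assumption about the server).
    """
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        # Header missing or in HTTP-date form: use the caller's fallback.
        return fallback_secs
```

For instance, a 16-run server whose configs each set execution.max_concurrent to 4 can have roughly 64 units of pipeline work in flight, which is the figure to size connection limits against.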
Idempotency
Supply idempotency_key to make retries safe (Stripe-style):
- First submit with a key → runs normally.
- Re-submit the same key + same request within --idempotency-retention-secs (default 24h) → returns the original run_id (replayed, no new run).
- Same key + a different request → 409 Conflict.
- After the retention window, the key is re-usable for a fresh run.
- Deleting a run also frees its idempotency key immediately: a later submit with that key starts a fresh run rather than 404-ing on the deleted record.
The “request” identity covers the merged config and the run-affecting
request fields — clock, timeout_secs, and labels. In particular, a retry
that reuses the key but changes the backfill clock is a 409, not a replay of
the original window (so you can’t silently get the original clock’s results).
The claim is atomic, so concurrent retries can’t both start a run.
Degraded mode: while the persistent history backend is degraded (see Run history), the in-memory fallback can’t see claims the database made before the outage. Rather than risk a duplicate run, submissions carrying an idempotency key are rejected with 503 until the backend recovers. Retry then, or resubmit without a key if at-least-once is acceptable. Submissions without a key are unaffected.
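The rules in this section can be summarized as a small state model. The Python sketch below is a toy, single-threaded, in-memory illustration of the documented outcomes, not faucet's implementation: the class, the outcome labels, and the SHA-256 fingerprint are all assumptions made for the example, and the real claim is atomic in the history backend.

```python
import hashlib
import json

RETENTION_SECS = 24 * 60 * 60  # documented default for --idempotency-retention-secs


def request_fingerprint(merged_config, clock=None, timeout_secs=None, labels=None):
    """Hash the fields this page lists as the request identity."""
    identity = {
        "config": merged_config,
        "clock": clock,
        "timeout_secs": timeout_secs,
        "labels": labels or {},
    }
    canonical = json.dumps(identity, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyModel:
    """Toy model of keyed submits; outcome labels are illustrative only."""

    def __init__(self):
        self._claims = {}  # key -> (fingerprint, run_id, claimed_at)
        self._next_id = 0

    def submit(self, key, fingerprint, now, degraded=False):
        """Return (outcome, run_id) for a keyed submit at time `now` (seconds)."""
        if degraded:
            return "rejected", None  # 503 while the history backend is degraded
        claim = self._claims.get(key)
        if claim is not None and now - claim[2] < RETENTION_SECS:
            if claim[0] == fingerprint:
                return "replayed", claim[1]  # original run_id, no new run
            return "conflict", None  # 409: same key, different request
        # No live claim (never claimed, expired, or freed): start a fresh run.
        self._next_id += 1
        run_id = f"run-{self._next_id}"
        self._claims[key] = (fingerprint, run_id, now)
        return "started", run_id

    def delete_run(self, run_id):
        """Deleting a run frees its idempotency key immediately."""
        self._claims = {k: c for k, c in self._claims.items() if c[1] != run_id}
```

Because the fingerprint includes `clock`, a retry that reuses a key with a different backfill clock lands in the `conflict` branch rather than replaying the original window.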
Cancellation
POST /v1/runs/{id}/cancel cooperatively cancels an in-flight run (202); on an
already-terminal run it’s a 200 no-op. The same cooperative path handles a run
that hits its timeout_secs and the server-shutdown drain.
Cancellation is flush-completing: the pipeline stops at its next page
boundary and flushes the sink, so a buffered sink (e.g. Parquet, whose footer is
only written on flush) commits the rows written so far rather than orphaning the
whole file (#146 H16). The run is then marked cancelled — there is no
cross-process resume, so re-submit to continue. A run still stuck mid-write
after a bounded flush grace is hard-dropped (its buffered output may be lost),
so a hung run can’t wedge shutdown.
--default-config (workspace defaults)
Pass --default-config <file> to merge shared settings under every submitted
run (submitted values win; objects merge, scalars/arrays replace). Pin state:,
execution:, and the auth: catalog once instead of repeating them per request.
See cli/examples/serve_minimal.yaml.
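The merge rule (submitted values win; objects merge, scalars/arrays replace) can be sketched in a few lines of Python over already-parsed configs. This is an illustration of the stated rule, not faucet's code, and `merge_config` is a hypothetical name.

```python
def merge_config(defaults, submitted):
    """Merge `submitted` over `defaults`.

    Nested dicts (objects) merge key by key; scalars and lists (arrays)
    from `submitted` replace the default value wholesale.
    """
    merged = dict(defaults)  # shallow copy so `defaults` is not mutated
    for key, value in submitted.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Under this rule, a default of `{"execution": {"max_concurrent": 4}}` merged with a submitted `{"execution": {"max_concurrent": 8}}` yields 8, while any sibling keys under `execution` that the submission omits keep their default values.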
Cardinality: a config’s name: field drives the metric pipeline label and the state-key prefix. Use a stable name: per logical pipeline, never an ad-hoc per-run string, or Prometheus cardinality blows up. The request-level name/labels are run-record metadata only, never metric labels.
Run history & persistence
By default run records live in memory and are lost on restart. For durable
history across restarts, point --history at a database (requires the matching
build feature):
# Postgres (feature: serve-history-postgres)
faucet serve --history 'postgres://user:pw@db/faucet'
# SQLite (feature: serve-history-sqlite)
faucet serve --history 'sqlite:/var/lib/faucet/runs.db'
Both create their schema on first connect. If the backend is unreachable at
startup, or fails at runtime, serve degrades to the in-memory store so it
stays up: it logs once, sets the faucet_serve_history_degraded gauge, and
/readyz returns 503. Persisted records are not migrated into the fallback —
degraded mode is a stay-alive, not a replica. Terminal records are retained for
--retain-terminal-runs-secs (default 7 days).
Multi-instance orphan recovery (run-ownership leases)
A persistent backend can be shared by several faucet serve instances (an
HA pair, a rolling/blue-green deploy). Each instance gets a fresh id at startup
and owns the runs it executes; while a run is in flight its owner heartbeats a
lease on the run record (at ~⅓ of --lease-ttl-secs, default 30s). A run is
only recovered (marked failed with "owning serve instance's lease expired")
once its lease has expired, i.e. its owner stopped heartbeating (crashed or was
shut down). Recovery runs both at startup and periodically, so a surviving
instance reclaims a dead peer’s orphans without waiting for a restart.
This means a starting or running instance never fails another live
instance’s in-flight runs — the gap that an unscoped “fail every non-terminal
run at startup” sweep would open on a shared database. Tune --lease-ttl-secs
above your worst-case GC/IO stall so a healthy-but-slow instance is never
falsely reclaimed (a longer TTL is safer but slows how quickly a crashed
instance’s runs are cleaned up). The in-memory backend is single-process and
unshared, so leases don’t apply to it. There is still no cross-process resume:
a recovered run is marked failed, not continued — re-submit to retry.
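To make the lease timing concrete, the Python sketch below models the expiry check and shows why the TTL should exceed the worst-case stall. It is a simplified model, not faucet's recovery code: the record fields (`run_id`, `terminal`, `last_heartbeat`) and function names are hypothetical, and it reads "default 30s" above as the default for --lease-ttl-secs.

```python
DEFAULT_LEASE_TTL_SECS = 30.0  # assumed default for --lease-ttl-secs


def heartbeat_interval_secs(lease_ttl_secs=DEFAULT_LEASE_TTL_SECS):
    """Owners heartbeat at roughly one third of the lease TTL."""
    return lease_ttl_secs / 3.0


def lease_expired(last_heartbeat, now, lease_ttl_secs=DEFAULT_LEASE_TTL_SECS):
    """True once more than one full TTL has passed since the last heartbeat."""
    return now - last_heartbeat > lease_ttl_secs


def recoverable_runs(runs, now, lease_ttl_secs=DEFAULT_LEASE_TTL_SECS):
    """Ids of runs a recovery sweep would mark failed.

    Only non-terminal runs whose lease has expired qualify, so runs owned
    by a live, heartbeating peer are left alone.
    """
    return [
        run["run_id"]
        for run in runs
        if not run["terminal"]
        and lease_expired(run["last_heartbeat"], now, lease_ttl_secs)
    ]
```

With a 30s TTL, a healthy instance that stalls for 35s misses its window and `lease_expired(0, 35, 30)` is true; raising the TTL to 60s keeps that instance's runs safe, at the cost of waiting up to a minute before a genuinely crashed peer's runs are reclaimed.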
Graceful shutdown
SIGTERM/SIGINT stops accepting new connections, drains in-flight runs up to
--shutdown-grace-secs (default 60), then cancels the remainder (marked failed).
Health & observability
- /healthz: liveness (always 200 while serving).
- /readyz: 503 when history is degraded or the queue is full.
- /metrics: Prometheus, including faucet_serve_* series. /metrics is unauthenticated; restrict it at the network layer if its labels are sensitive.