REST API → BigQuery (incremental)
This tutorial pulls records from a paginated REST API and streams them into a BigQuery table, then converts the pipeline to an incremental one that fetches only new or updated rows on each run.
Full-table version
version: 1
name: rest_to_bigquery
pipeline:
  source:
    type: rest
    config:
      base_url: https://api.example.com
      path: /v1/events
      method: GET
      name: events
      auth:
        type: basic
        config:
          username: ${env:API_USER}
          password: ${env:API_PASS}
      records_path: $.events[*]
      pagination:
        type: PageNumber
        param_name: page
        start_page: 1
        page_size: 500
        page_size_param: per_page
      max_pages: 200
      timeout: 45
      max_retries: 5
      retry_backoff: 2
      tolerated_http_errors: [404]
      replication_method:
        type: FullTable
      primary_keys: [event_id]
      schema_sample_size: 100
  sink:
    type: bigquery
    config:
      project_id: my-gcp-project
      dataset_id: analytics
      table_id: events
      auth:
        type: service_account_key_path
        config:
          path: service-account.json
      batch_size: 1000
Secrets come from the environment via ${env:VAR} — keep credentials out of the
config file. Put them in a sibling .env or export them before running.
export API_USER=… API_PASS=…
faucet run rest_to_bigquery.yaml
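To make the ${env:VAR} substitution concrete, here is a minimal Python sketch of how such placeholders could be resolved. It is an illustration only, not faucet's actual implementation: the function name resolve_env, the accepted variable-name pattern, and the choice to raise on a missing variable are all assumptions.

```python
import os
import re

# Matches placeholders of the form ${env:NAME} (the name pattern is an assumption).
_ENV_PLACEHOLDER = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value: str, environ=os.environ) -> str:
    """Replace each ${env:NAME} in value with environ[NAME].

    Raises KeyError when a referenced variable is not set, so a missing
    secret fails loudly instead of being sent as an empty string.
    """
    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return _ENV_PLACEHOLDER.sub(_lookup, value)


if __name__ == "__main__":
    # A plain dict stands in for the real environment here.
    fake_env = {"API_USER": "alice"}
    print(resolve_env("${env:API_USER}", fake_env))
```

Failing on an unset variable is a design choice in this sketch; a tool could equally fall back to a default value.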
The records_path is a JSONPath that selects the array of records inside each
response body; pagination walks pages until an empty page or max_pages. See
the pagination cookbook for the other styles.
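The stopping rule described above (walk pages until one comes back empty or max_pages is reached) can be sketched in Python. This is a simplified illustration, not faucet's code: paginate, make_fake_api, and fetch_page are hypothetical names, the API is an in-memory stub, and the JSONPath $.events[*] is approximated by reading the "events" key directly.

```python
from typing import Callable, Dict, Iterator


def paginate(
    fetch_page: Callable[[int, int], Dict],
    start_page: int = 1,
    page_size: int = 500,
    max_pages: int = 200,
) -> Iterator[Dict]:
    """Yield records page by page until an empty page or max_pages."""
    for offset in range(max_pages):
        body = fetch_page(start_page + offset, page_size)
        # Stand-in for the JSONPath $.events[*]: every element of "events".
        records = body.get("events", [])
        if not records:
            return  # an empty page means there is nothing left to fetch
        yield from records


def make_fake_api(total: int) -> Callable[[int, int], Dict]:
    """Hypothetical in-memory API serving `total` events, for demonstration."""
    events = [{"event_id": i} for i in range(total)]

    def fetch_page(page: int, per_page: int) -> Dict:
        start = (page - 1) * per_page
        return {"events": events[start:start + per_page]}

    return fetch_page


if __name__ == "__main__":
    rows = list(paginate(make_fake_api(1200)))
    print(len(rows))
```

With the defaults mirroring the config above, max_pages acts as a safety cap: an API that never returns an empty page still stops after 200 requests.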
Make it incremental
Switch replication_method from FullTable to a key-based incremental method
and attach a state store so progress survives between runs:
pipeline:
  source:
    type: rest
    config:
      # … as above …
      replication_method:
        type: Incremental
        cursor_field: updated_at
      primary_keys: [event_id]
  sink:
    # … as above …
  state:
    type: file
    config:
      path: ./state
Now each run records the maximum updated_at it saw; the next run resumes from
that bookmark. Swap the file state store for redis or postgres to get shared,
durable state across machines; see the state documentation.
Tip: run faucet schema source rest and faucet schema sink bigquery to see every available config field with its type and default.