Hi Dataform Community,
I'm running into issues dynamically passing variables (specifically dataset names) from a shell script into my Dataform project (version 2.6.7). Here's a simplified, anonymized example that illustrates my scenario:
I have four key files:
datasets.csv:

datasetA,datasetB
{ "warehouse": "bigquery", "defaultSchema": "sample_schema", "defaultDatabase": "sample_project", "vars": { "PROJECT_ID": "sample_project", "SCHEMA": "sample_schema", "DATASETS": "" } }
The .sqlx table definition:

config { type: "table", schema: "sample_schema" }

SELECT *
FROM ${ref("source_table")}
WHERE dataset_id IN (${dataform.projectConfig.vars.DATASETS.split(",").map(d => `'${d.trim()}'`).join(",")})
The shell script:

#!/bin/bash
# Join the non-empty lines of datasets.csv into a comma-separated list
DATASETS=$(awk 'NF' datasets.csv | paste -sd, -)
# Back up dataform.json, then inject the dataset list into vars.DATASETS
cp dataform.json dataform.json.bak
jq --arg DATASETS "$DATASETS" '.vars.DATASETS=$DATASETS' dataform.json > temp.json && mv temp.json dataform.json
# Run Dataform, then restore the original dataform.json
dataform run --credentials "./envs/.df-credentials.json"
mv dataform.json.bak dataform.json
When I execute the script locally, dataform.json is updated correctly and the Dataform run succeeds with the expected data. However, once deployed in a cloud workflow environment, the Dataform run still completes successfully but produces empty tables. It seems the values from datasets.csv aren't being recognized or passed through correctly in the workflow context.
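For what it's worth, here's how I've been checking that locally (these are just my own debugging additions, not part of the script above):

echo "DATASETS resolved to: $DATASETS"   # prints: datasetA,datasetB
jq '.vars.DATASETS' dataform.json        # prints: "datasetA,datasetB" after the jq rewrite

Both look right on my machine, so I suspect the issue is specific to how the project gets compiled in the workflow environment.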
I want the compiled SQL in Dataform to reflect the dynamic datasets passed from datasets.csv:
dataset_id IN ('datasetA','datasetB')
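For comparison, if the default empty DATASETS value from dataform.json is what actually gets compiled (i.e. the jq rewrite never takes effect in the workflow), then "".split(",") yields [""] and the same expression would compile to:

dataset_id IN ('')

which matches nothing and would explain the empty tables. That's my current guess, but I haven't been able to confirm it.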
Is passing dynamic variables by modifying dataform.json at runtime considered best practice?
Is there a better, officially supported method (perhaps environment variables or CLI parameters) to dynamically pass variables into Dataform at runtime?
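For example, would something like the CLI's --vars option (if it's available in 2.6.7 and can handle a value that itself contains commas) be the recommended route? Roughly:

DATASETS=$(awk 'NF' datasets.csv | paste -sd, -)
# hypothetical alternative to rewriting dataform.json; I haven't verified this flag or how it parses commas inside a value
dataform run --credentials "./envs/.df-credentials.json" --vars=DATASETS="$DATASETS"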
Any help, advice, or examples would be greatly appreciated. Thanks in advance!
@ms4446