The default BigQuery priority for Dataform execution is set to INTERACTIVE. Is it possible to set it to BATCH by default?
While Dataform doesn't have a direct setting for the default BigQuery priority, you can achieve this through configuration:
1. Edit dataform.json Configuration:
Open your `dataform.json` file and add or modify the `defaultConfig` section:
```
{
  "defaultConfig": {
    "bigquery": {
      "priority": "BATCH"
    }
  }
}
```
This sets BATCH as the default priority for all BigQuery operations in your Dataform project.
2. Configure in config Block (Granular Control):
For specific SQLX scripts, you can set the priority directly in the `config` block:
```
config {
  type: "table",
  bigquery: {
    priority: "BATCH"
  }
}

-- Your SQL code here
```
This allows you to control which scripts use BATCH priority.
Considerations:
- Performance: BATCH priority is slower; it is designed for less time-sensitive workloads.
- Cost: BATCH can be more cost-effective for large queries.
Potential Workarounds:
- Custom Profiles: Create Dataform profiles with BATCH priority set, using `--profile` when running.
- Pre/Post Operations: Use scripts to adjust priority before or after execution (requires scripting).
- Feature Request: Suggest this feature to the Dataform team for future releases.
I tried both approaches.
1. With the first suggestion, BigQuery doesn't seem to pick up the setting. I double-checked https://cloud.google.com/dataform/docs/reference/dataform-core-reference#iprojectconfig; `defaultConfig` is not part of IProjectConfig.
2. With the second suggestion, Dataform failed to compile with the error:
```
Error: Unexpected property "priority" in bigquery config. Supported properties are: ["partitionBy","clusterBy","updatePartitionFilter","labels","partitionExpirationDays","requirePartitionFilter","additionalOptions"]
```
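For what it's worth, a config block that sticks to the properties listed in that error does compile. A minimal sketch (the partition column, cluster column, and label are placeholders):
```
config {
  type: "table",
  bigquery: {
    partitionBy: "DATE(created_at)",
    clusterBy: ["customer_id"],
    labels: { "team": "analytics" }
  }
}

SELECT 1 AS example
```
Note that `priority` is not among the accepted properties, so there is no per-script way to express it here.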
Sorry for the confusion. Currently, Dataform does not provide a direct configuration option to set the default BigQuery priority to BATCH in the `dataform.json` file or within individual SQLX scripts. There are some potential workarounds you can consider:
- Custom Execution Profiles: This is the most viable workaround. Create custom Dataform execution profiles where you explicitly set the BigQuery priority to BATCH using flags in your execution command. For example, if using the `dataform run` command, you could add `--vars='{"priority":"BATCH"}'`. This approach requires specifying the profile or variables each time you run Dataform, but gives you the desired control.
- Pre-Operation or Post-Operation Hooks: You might be able to use pre-operation or post-operation scripts (if your Dataform setup allows them) to programmatically adjust the BigQuery job priority before or after execution. This would require some scripting and integration with the BigQuery API.
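If you go the scripting route, the priority is set on each query job through the BigQuery client library rather than through Dataform itself. A minimal sketch with the Python client, assuming the compiled SQL is available as a string (the project ID and query are placeholders):
```
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# BATCH priority queues the query until idle resources are available,
# instead of running it immediately like INTERACTIVE.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

query_job = client.query(
    "SELECT 1 AS example",  # placeholder for the compiled Dataform SQL
    job_config=job_config,
)
query_job.result()  # blocks until the batch job completes
```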
Did anyone check this solution?
How can I set it up from Airflow/Composer, though?
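The closest thing I can think of is bypassing Dataform's execution and submitting the compiled SQL as a BigQuery job from Airflow itself, since the BigQuery job configuration accepts a priority field. A minimal sketch with BigQueryInsertJobOperator (the DAG ID, task ID, and SQL are placeholders):
```
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="dataform_batch_example",  # placeholder DAG ID
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    # Submit a query job directly to BigQuery at BATCH priority.
    # The SQL is a placeholder for the compiled Dataform statement.
    run_batch_query = BigQueryInsertJobOperator(
        task_id="run_batch_query",
        configuration={
            "query": {
                "query": "SELECT 1 AS example",
                "useLegacySql": False,
                "priority": "BATCH",
            }
        },
    )
```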
I checked. Apparently, the answer was generated by AI, but the options it describes were never implemented.