I'm currently working in an environment where pretty much everything is done with scheduled queries. There are no checks for dependencies and such, so I want to improve that by moving everything to Dataform. I did some tutorials where you create some simple views and tables and schedule them. That all seems to work, but I don't fully understand the interaction between release configurations and workflow configurations. I understand using release configurations to set up different environments such as prod, uat and dev. I understand that a release configuration compiles the SQL workflow, which can then be used by workflow configurations. But when I execute a release configuration it also generates/updates the simple tables I defined. So is the difference between release configurations and workflow configurations that the workflow configuration skips the compile step? And if that is the case, when do I run a release configuration (there is an option to never run it automatically)?
Can someone help me understand the difference?
In my use case I have some raw data being dumped in BQ every 30-60 minutes that I want to process into some intermediate tables, on which I will then build some tables that can be used in reports and by end users.
Release configurations are used to compile your SQL workflow code into executable SQL for a specific environment. Compilation translates the project's code into SQL that can be run against BigQuery; it produces a compilation result and does not by itself create or update any tables. Release configurations can also set compilation overrides and variables, such as the Google Cloud project ID, a schema suffix, or a table prefix, which can vary between environments (dev, uat, prod).
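To make the idea of per-environment overrides concrete, here is a toy Python model of how one table definition could resolve to different fully qualified names depending on the release configuration. This is only a sketch and not Dataform's compiler: the `Overrides` class, the `resolve_target` function, and the project names are all made up for illustration, and the underscore-joining of suffix and prefix reflects my understanding of how Dataform applies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overrides:
    """Hypothetical stand-in for the overrides set on a release configuration."""
    project: str             # Google Cloud project the tables are written to
    schema_suffix: str = ""  # appended to the dataset name when set
    table_prefix: str = ""   # prepended to the table name when set


def resolve_target(schema: str, table: str, overrides: Overrides) -> str:
    """Return the fully qualified table name for one environment."""
    full_schema = f"{schema}_{overrides.schema_suffix}" if overrides.schema_suffix else schema
    full_table = f"{overrides.table_prefix}_{table}" if overrides.table_prefix else table
    return f"{overrides.project}.{full_schema}.{full_table}"


# Assumed example environments; the project IDs are placeholders.
prod = Overrides(project="my-prod-project")
dev = Overrides(project="my-dev-project", schema_suffix="dev")

print(resolve_target("reporting", "daily_sales", prod))
print(resolve_target("reporting", "daily_sales", dev))
```

The same source definition ends up in a different project and dataset per environment, which is why one repository can serve prod, uat and dev side by side.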
Workflow configurations are used to schedule executions of compilation results from release configurations. A workflow configuration can also select which actions from the workflow are executed. For example, your prod workflow configuration could run the latest compilation result of the prod release configuration every 30 minutes and execute only the selected actions. If you select only the actions that create the final tables, the intermediate tables are not rebuilt on each run unless you also include dependencies. Since your final tables are built on the intermediate tables, you will probably want to include dependencies (or select all actions) so that each run processes the raw data into the intermediate tables first and then into the final tables.
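The effect of selecting actions with or without their dependencies can be sketched in a few lines of Python. This is a simplified model of the selection logic, not Dataform's implementation; the action names and the `actions_to_run` helper are hypothetical, and raw source tables are left out for brevity.

```python
# Hypothetical workflow: action name -> the actions it directly depends on.
DEPENDENCIES = {
    "int_sessions": [],
    "int_orders": [],
    "report_daily_sales": ["int_sessions", "int_orders"],
}


def actions_to_run(selected, dependencies, include_dependencies):
    """Return the set of actions one execution would run.

    With include_dependencies=False only the selected actions run.
    With include_dependencies=True every upstream action is added too.
    """
    to_run = set(selected)
    if include_dependencies:
        stack = list(selected)
        while stack:
            for dep in dependencies[stack.pop()]:
                if dep not in to_run:
                    to_run.add(dep)
                    stack.append(dep)
    return to_run


print(sorted(actions_to_run(["report_daily_sales"], DEPENDENCIES, False)))
print(sorted(actions_to_run(["report_daily_sales"], DEPENDENCIES, True)))
```

In the first call only the final table is rebuilt, on top of whatever the intermediate tables currently contain. In the second call the intermediate tables are refreshed as part of the same run.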
In your use case, you could create a release configuration for each environment (prod, uat, dev). Each release configuration could have different environment variables set, such as the BigQuery project ID and the dataset name. You could then create a workflow configuration for each environment that schedules the execution of the corresponding release configuration.
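As a rough sketch of that setup, the snippet below builds one release configuration and one workflow configuration per environment as plain Python dictionaries. The field names (`gitCommitish`, `codeCompilationConfig`, `cronSchedule`, `invocationConfig`, and so on) follow my reading of the Dataform REST resources and should be checked against the API reference before use. The project IDs, repository path, branch names, and schedules are assumptions for illustration only.

```python
import json

# Placeholder repository path; replace with your own resource name.
REPOSITORY = "projects/my-admin-project/locations/europe-west1/repositories/my-repo"

# Assumed environments: BigQuery project, Git branch, and run schedule.
ENVIRONMENTS = {
    "prod": {"project": "my-prod-project", "branch": "main", "cron": "*/30 * * * *"},
    "uat": {"project": "my-uat-project", "branch": "uat", "cron": "0 * * * *"},
    "dev": {"project": "my-dev-project", "branch": "dev", "cron": "0 6 * * *"},
}


def release_config(env):
    """What to compile, and with which overrides, for one environment."""
    settings = ENVIRONMENTS[env]
    return {
        "name": f"{REPOSITORY}/releaseConfigs/{env}",
        "gitCommitish": settings["branch"],
        "codeCompilationConfig": {"defaultDatabase": settings["project"]},
    }


def workflow_config(env):
    """When to execute the compilation result of the matching release configuration."""
    settings = ENVIRONMENTS[env]
    return {
        "name": f"{REPOSITORY}/workflowConfigs/{env}-schedule",
        "releaseConfig": f"{REPOSITORY}/releaseConfigs/{env}",
        "cronSchedule": settings["cron"],
        # Run the selected actions together with everything upstream of them.
        "invocationConfig": {"transitiveDependenciesIncluded": True},
    }


for env in ENVIRONMENTS:
    print(json.dumps(release_config(env), indent=2))
    print(json.dumps(workflow_config(env), indent=2))
```

Deriving both dictionaries from the same environment name keeps each workflow configuration tied to the release configuration of its own environment.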
For example, your prod workflow configuration could schedule the execution of the prod release configuration every 30 minutes. This would ensure that your raw data is processed into intermediate tables and then into final tables on a regular basis.
Whichever way you split the workflow configurations, make sure each one points at the release configuration for its own environment, so that, for example, a prod schedule never executes code compiled with dev settings.
Here is a table that summarizes the key differences between release configurations and workflow configurations:
| Feature | Release configuration | Workflow configuration |
|---|---|---|
| Purpose | Compiles SQL workflow code into executable SQL for a specific environment | Schedules executions of compilation results |
| Can be run manually | Yes | Yes, but typically scheduled |
| Can be used to set environment variables | Yes | No |
| Can be used to select which actions from a workflow should be executed | No | Yes |