Execution order of Dataform workflows with the same tag name.

Hello Community,

I have the following situation in my project: several Dataform workflows share the same tag name. Each of them merges data from one table into another (a different pair of tables in each workflow).

My question is: is there any way to choose the execution order of these Dataform workflows? I mean, can I force Dataform to execute the workflows with the same tag name in a given order? Or at least force Dataform to execute them sequentially? I want to avoid concurrent execution of the Dataform workflows with the same tag name.

Thank you.

ACCEPTED SOLUTION

Unfortunately, Dataform has no tag-level setting for controlling the execution order of workflows that share a tag, although declared dependencies give you that control indirectly. Here's an explanation of this limitation and some potential strategies you can employ:

Why Direct Ordering Isn't Possible:

  • Tag-Based Execution: Executing by tag selects every workflow that carries the tag and runs them all in a single invocation. A tag only selects what runs; it carries no priority or sequence, so workflows with no declared dependency between them can start at the same time, which may cause issues where execution order is critical.

  • Parallel Execution: Dataform runs independent workflows in parallel to maximize efficiency. Workflows that share a tag can therefore be processed concurrently unless a dependency forces them to run one after another.

Potential Workarounds:

These strategies might help you manage or circumvent issues related to concurrent executions:

  • Dependencies:

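    • Logical Dependencies: You can use Dataform’s dependency management feature if one workflow must precede another (e.g., its output is necessary for the other workflow, or the two must simply not overlap). Declare the dependency in the downstream SQLX file, either implicitly by referencing the upstream table with ref() or explicitly through the dependencies property of the config block, so that the downstream workflow does not start until its prerequisite has finished.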

  • Separate Tags:

    • Control Over Timing: If strict ordering is essential and dependencies alone are inadequate, consider assigning unique tags to each workflow. This approach allows for meticulous control over execution timing, which can be coordinated using external tools such as Google Cloud Composer or Google Cloud Scheduler.

  • Custom Orchestration:

    • Handling Complex Scenarios: For more complex requirements, external orchestration might be necessary. This can be achieved through services like Google Cloud Functions or Google Cloud Run, which can trigger workflows in a specific sequence based on custom logic or external triggers.
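
As a rough illustration of the dependencies workaround, here is a minimal sketch of two SQLX files that keep the same tag but run one after the other. Everything specific in it is an assumption made up for the example: the tag nightly_merge, the file names merge_orders.sqlx and merge_customers.sqlx, and the dataset, table, and column names. Adapt them to your own project.

First file, assumed to be definitions/merge_orders.sqlx:

```sqlx
config {
  type: "operations",
  // Hypothetical tag shared by both files.
  tags: ["nightly_merge"]
}

MERGE `my_dataset.orders_target` AS t
USING `my_dataset.orders_staging` AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, amount) VALUES (s.order_id, s.amount)
```

Second file, assumed to be definitions/merge_customers.sqlx:

```sqlx
config {
  type: "operations",
  tags: ["nightly_merge"],
  // Wait for the action defined in merge_orders.sqlx to finish first.
  dependencies: ["merge_orders"]
}

MERGE `my_dataset.customers_target` AS t
USING `my_dataset.customers_staging` AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET email = s.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, email) VALUES (s.customer_id, s.email)
```

Because both files carry the same tag, executing that tag selects both of them, and the dependencies entry should make the second merge wait for the first. To order more than two workflows, chain them the same way, with each file naming the one that must run before it.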

Important Considerations:

  • Data Integrity: Concurrent modifications to the same data sets can lead to inconsistencies or race conditions. It's crucial to manage the execution order carefully when workflows share data dependencies.

  • Performance Implications: Introducing external orchestration solutions can add complexity and overhead to your data pipeline. It is important to evaluate these potential impacts against the benefits of achieving controlled execution.

Hello @ms4446 ,

Thank you very much for the detailed explanations and for the proposed solutions. Your kindness is much appreciated!

Hello @ms4446,

I have kept the same tag name, but I have added the Logical Dependencies as you suggested.
With this workaround I am able to set the order of execution between workflows that have the same tag name.

Thank you for the provided solution.