Does Dataform's built-in scheduling provide a downstream triggering option?

I have two Dataform repositories: "Repo A" and "Repo B". I want to schedule the Repo A workflow (to run at 5 AM every day, say), and the Repo B workflow should run only after the Repo A run has finished successfully. I can make Repo B depend on Repo A by declaring one of Repo A's tables as a source in Repo B, but will Dataform's built-in scheduling honor that? Basically, does Dataform's built-in scheduling provide downstream triggering the way Airflow and other orchestration tools do?

Accepted Solution

 

Dataform's built-in scheduling does not currently support downstream triggering capabilities similar to those found in orchestration tools like Airflow. Instead, it focuses on scheduling individual workflows or repositories at specific times or intervals, without providing the ability to create dependencies between them.

Declaring a table from Repo A as a source in Repo B creates an implicit dependency on that table's data, but it does not make Repo B trigger automatically after Repo A finishes. Dataform's scheduler operates independently for each workflow, based on that workflow's own schedule. This means that Repo B runs according to its schedule, regardless of whether Repo A has completed successfully.

Moreover, Dataform's scheduler does not have a mechanism to check the success status of Repo A before triggering Repo B. This could result in Repo B running even if Repo A fails, potentially leading to incomplete or inaccurate data processing.

Alternatives for Downstream Triggering

To achieve downstream triggering, you can consider the following alternatives:

  1. External Orchestration Tools: Utilizing external orchestration tools like Airflow, Dagster, or Prefect allows you to manage dependencies and triggering more effectively. You can schedule Repo A in Dataform and have your chosen orchestration tool monitor its completion to subsequently trigger Repo B.

  2. Custom Scripting: Writing a custom script, such as one using Cloud Functions, can also help. This script would be triggered upon the successful completion of Repo A and could then initiate the Repo B workflow through Dataform's API.

  3. Dataform API: The Dataform API can trigger workflows programmatically. A custom solution could monitor the status of the Repo A run and use the API to trigger Repo B once Repo A completes successfully.
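
The "monitor Repo A, then trigger Repo B" logic behind options 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the three callables (`start_upstream`, `get_state`, `start_downstream`) are hypothetical placeholders that you would back with your own Dataform API calls, and the state strings (`"SUCCEEDED"`, `"FAILED"`, `"CANCELLED"`) are assumptions about what your status check returns.

```python
import time
from typing import Callable, Optional

# Assumed terminal state names; adjust to whatever your status check returns.
SUCCESS_STATE = "SUCCEEDED"
FAILURE_STATES = {"FAILED", "CANCELLED"}


def run_downstream_after_success(
    start_upstream: Callable[[], str],
    get_state: Callable[[str], str],
    start_downstream: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Start Repo A, poll until it finishes, and start Repo B only on success.

    All three callables are hypothetical hooks:
      start_upstream()   -> starts the Repo A workflow, returns a run id
      get_state(run_id)  -> returns the current state of that run as a string
      start_downstream() -> starts the Repo B workflow, returns its run id

    Returns the Repo B run id, or None if Repo A failed or was cancelled.
    Raises TimeoutError if Repo A does not finish within timeout_seconds.
    """
    run_id = start_upstream()
    deadline = time.monotonic() + timeout_seconds

    while True:
        state = get_state(run_id)
        if state == SUCCESS_STATE:
            # Repo A finished cleanly, so it is safe to run Repo B.
            return start_downstream()
        if state in FAILURE_STATES:
            # Repo A did not succeed; skip Repo B to avoid stale or partial data.
            return None
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Upstream run {run_id} did not finish in time")
        sleep(poll_seconds)
```

A function like this could run inside a Cloud Function or a small scheduled job that fires at 5 AM, in place of a separate schedule for Repo B. The `sleep` parameter is injectable only so that the polling loop can be exercised without real waiting.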


Replies

 

This makes sense. Thank you so much for the prompt response.