Hello,
I'm having two problems when running Dataform ETLs from Airflow. The first is performance: an ETL that takes 5 minutes when I run it locally from the production release in Dataform takes 7 or 8 minutes when triggered from Airflow. It consistently takes 40% to 60% longer. Am I missing something?
I run each ETL as a task group containing three tasks/operators.
The second problem is that the DataformWorkflowInvocationStateSensor always hits a timeout and gets retried. Any ideas?
Hi @caracena,
Welcome to Google Cloud Community!
The difference in execution time between running your Dataform ETL locally and running it from an Airflow DAG points to overhead introduced by the Airflow environment.
Here are some potential causes and solutions for both your performance and timeout issues:
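Performance Issues (Increased ETL Time in Airflow):
- DataformCreateCompilationResultOperator and DataformCreateWorkflowInvocationOperator add extra API calls before the ETL starts executing. Because each operator also runs as its own Airflow task, the scheduling latency between tasks adds to the total run time.

Improving Performance:
- DataformWorkflowInvocationStateSensor: check its poke_interval. A sensor only notices that the workflow has finished at its next check, so a long interval can add up to one full interval of waiting after the ETL is already done (Airflow's default poke_interval is 60 seconds).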
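To see how the per-task overhead and the sensor's polling add up, here is a rough timing model of the compile, invoke, sensor chain. It is a sketch under simplifying assumptions, not a description of how Airflow or Dataform actually schedule work: each task starts a fixed gap after the previous one, the workflow starts when the invoke task finishes, and the sensor checks immediately and then every poke_interval. The function `estimate_wall_clock` and all numbers are hypothetical, not measured from your environment.

```python
import math


def estimate_wall_clock(
    run_s: float,
    poke_interval_s: float,
    task_gap_s: float,
    compile_s: float = 0.0,
    invoke_s: float = 0.0,
) -> float:
    """Rough end-to-end time (seconds) for a compile -> invoke -> sensor chain.

    Simplifying assumptions (all hypothetical):
    - each task starts task_gap_s after the previous one finishes;
    - compile_s / invoke_s are the durations of the two API-calling tasks;
    - the Dataform workflow starts when the invoke task finishes and runs run_s;
    - the sensor checks as soon as it starts, then every poke_interval_s.
    """
    if poke_interval_s <= 0:
        raise ValueError("poke_interval_s must be positive")
    workflow_start = compile_s + task_gap_s + invoke_s
    sensor_start = workflow_start + task_gap_s
    # Work still left when the sensor makes its first check (may be <= 0).
    remaining = run_s - task_gap_s
    # Extra poke intervals needed until a check lands at or after completion.
    pokes_after_first = max(0, math.ceil(remaining / poke_interval_s))
    return sensor_start + pokes_after_first * poke_interval_s


if __name__ == "__main__":
    # Illustrative numbers only: a 300 s workflow, 20 s between tasks.
    for poke in (60, 10):
        total = estimate_wall_clock(300, poke, task_gap_s=20, compile_s=10, invoke_s=5)
        print(f"poke_interval={poke}s -> ~{total}s end to end")
```

With these assumed values the model gives 355 seconds with a 60-second interval and 335 seconds with a 10-second interval. A shorter interval only trims the detection delay; the gaps between tasks remain, so it is worth measuring both in your Airflow task logs.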
Timeouts and Retries:
The timeout you're encountering with DataformWorkflowInvocationStateSensor likely indicates that the Dataform workflow is taking longer to complete than the sensor's configured timeout period.
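One way to size the timeout is from durations you have actually observed rather than a guess. The sketch below uses a hypothetical helper, `suggest_sensor_timeout`, which takes recent run durations in seconds, scales the slowest one by a safety factor, adds one poke interval so the last check is not cut off, and caps the result so a genuinely stuck workflow still fails. The safety factor and the 4-hour cap are arbitrary placeholders.

```python
import math
from typing import Sequence


def suggest_sensor_timeout(
    recent_run_seconds: Sequence[float],
    poke_interval_s: float,
    safety_factor: float = 1.5,
    cap_s: int = 4 * 3600,
) -> int:
    """Hypothetical helper: derive a sensor timeout (seconds) from past runs."""
    if not recent_run_seconds:
        raise ValueError("need at least one observed run duration")
    if safety_factor < 1:
        raise ValueError("safety_factor should be >= 1")
    # Slowest observed run, scaled up for headroom.
    budget = max(recent_run_seconds) * safety_factor
    # Add one poke interval so the final check is not cut off by the timeout.
    budget += poke_interval_s
    # Cap the value so a stuck workflow still fails in bounded time.
    return min(math.ceil(budget), cap_s)


if __name__ == "__main__":
    # Assumed durations of recent runs, in seconds (illustrative only).
    print(suggest_sensor_timeout([300, 420, 480], poke_interval_s=60))
```

For runs of 300, 420 and 480 seconds with a 60-second poke interval, this yields 780 seconds (13 minutes). If the sensor's current timeout is below your slowest observed run, timeouts are expected regardless of any other tuning.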
Possible solutions for timeouts:
- Increase the timeout of the DataformWorkflowInvocationStateSensor, but avoid excessively long timeouts. It's better to identify and resolve the root cause of the long execution time.

By systematically investigating these factors and implementing the proposed solutions, you should be able to reduce the execution time and avoid the timeouts in your Airflow Dataform ETL pipeline. Remember to test any changes thoroughly to ensure they achieve the desired result without introducing new problems.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.