Dataform ETLs too slow when run from Airflow

Hello,

I'm having two problems when running Dataform ETLs from Airflow. For example, an ETL that takes 5 minutes when I run it locally from the production release in Dataform takes 7 or 8 minutes from Airflow. It always takes 40% to 60% longer. Am I missing something?

The way I'm doing it is by using 3 tasks/operators per ETL and a task group:

  • TaskGroup
  • DataformCreateCompilationResultOperator
  • DataformCreateWorkflowInvocationOperator
  • DataformWorkflowInvocationStateSensor
 
 
Is there a better (faster) way?

Also, I always get a retry on the DataformWorkflowInvocationStateSensor because of a timeout. Any ideas?

Hi @caracena,

Welcome to Google Cloud Community!

The discrepancy in execution time between running your Dataform ETL locally versus within an Airflow DAG suggests potential overhead introduced by the Airflow environment.

Here are some potential causes and solutions for both your performance and timeout issues:

  • Performance Issues (Increased ETL Time in Airflow):

    • Network Latency: Latency between the Airflow workers and the Dataform API can slow down each call. Ensure both are in the same region to reduce it.
    • Resource Contention: Airflow workers may be overloaded, which delays task start and polling. Monitor resource usage and consider increasing resources or adding more workers to alleviate bottlenecks.
    • Orchestration Overhead: The Airflow DAG introduces overhead from scheduling and monitoring tasks. Additionally, the DataformCreateCompilationResultOperator and DataformCreateWorkflowInvocationOperator add extra API calls before ETL execution.
    • XCom Overhead: Using XComs to pass data between tasks can add overhead. Minimize the data transferred and optimize the transfer method if possible.
  • Improving Performance:

    • Optimize Network Configuration: Ensure your Airflow environment and Dataform repository are in the same Google Cloud region, and check for any network bottlenecks.
    • Resource Allocation: Increase the CPU and memory resources allocated to your Airflow worker nodes. If feasible, consider using more powerful machine types.
    • Reduce API Calls: Check if you can combine the compilation result and workflow invocation into a single operation by reviewing Dataform's API for a direct invocation method.
    • Monitoring: Implement more granular monitoring to pinpoint the exact stage where the slowdown occurs. This could help identify whether it's the Airflow overhead, the Dataform execution time itself, or network latency.
  • DataformWorkflowInvocationStateSensor Timeouts and Retries:

    The timeout you're encountering with DataformWorkflowInvocationStateSensor likely indicates that the Dataform workflow is taking longer to complete than the sensor's configured timeout period.

    Possible solutions for timeouts:

    • Increase Timeout: Increase the timeout in your DataformWorkflowInvocationStateSensor, but avoid excessively long timeouts. It's better to identify and resolve the root cause of the long execution time.
    • Poke Interval: Adjust the poke_interval to balance between frequent status checks and resource usage. A shorter interval increases API calls, while a longer interval reduces overhead but delays detection of completion or failure.
    • Improved Error Handling: Rather than relying solely on timeouts, implement more robust error handling. Consider adding a task to check for Dataform errors directly after the workflow invocation (e.g., a custom operator that retrieves the workflow's logs) instead of simply waiting for completion.
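
To make the poke_interval trade-off more concrete, here is a minimal sketch in plain Python (it does not import Airflow). The helper name sensor_settings, the safety_factor headroom, and the example runtimes are assumptions for illustration only. The keys poke_interval, timeout, and mode are standard Airflow sensor arguments, so the sensor_kwargs dictionary could be unpacked into DataformWorkflowInvocationStateSensor next to its own arguments. Note that Airflow sensors poll every 60 seconds by default, so a long interval alone can add a noticeable delay before completion is detected.

```python
import math


def sensor_settings(expected_runtime_s: int, poke_interval_s: int,
                    safety_factor: float = 2.0) -> dict:
    """Build sensor kwargs and estimate the polling cost of one workflow run.

    expected_runtime_s: usual Dataform workflow duration (assumed to be known).
    poke_interval_s:    seconds between status checks.
    safety_factor:      hypothetical headroom multiplier for the sensor timeout.
    """
    if expected_runtime_s <= 0 or poke_interval_s <= 0:
        raise ValueError("runtime and poke interval must be positive")

    # The sensor checks at t=0 and then every poke_interval_s seconds, so
    # completion is first seen at the next check after the workflow finishes.
    status_checks = math.ceil(expected_runtime_s / poke_interval_s) + 1
    detected_at_s = (status_checks - 1) * poke_interval_s

    return {
        "sensor_kwargs": {
            "poke_interval": poke_interval_s,
            # Timeout with headroom, so normal runs do not trigger a retry.
            "timeout": int(expected_runtime_s * safety_factor),
            # "reschedule" releases the worker slot between checks.
            "mode": "reschedule",
        },
        "status_checks": status_checks,
        "detection_delay_s": detected_at_s - expected_runtime_s,
    }


if __name__ == "__main__":
    # Illustrative numbers only: a workflow that takes a little over 5 minutes.
    for interval in (60, 15):
        print(interval, sensor_settings(310, interval))
```

In this estimate, a shorter interval trades more status checks for a smaller detection delay. The real timings will vary with scheduler load, especially in reschedule mode, so treat the output as a rough guide rather than a measurement.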

By systematically investigating these factors and implementing the proposed solutions, you should be able to significantly reduce the execution time and eliminate the timeouts within your Airflow Dataform ETL pipeline. Remember to thoroughly test any changes you make to ensure they achieve the desired result without introducing new problems.
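
One way to start that investigation is to compare the duration of each Airflow task with the run time that Dataform itself reports for the workflow invocation. The sketch below is plain Python. The function name overhead_breakdown, the task labels, and the sample durations are hypothetical; in practice the numbers would be read from the Airflow task durations and the Dataform workflow invocation details.

```python
def overhead_breakdown(task_durations_s: dict[str, float],
                       dataform_runtime_s: float) -> dict[str, float]:
    """Split an Airflow run into Dataform work and orchestration overhead.

    task_durations_s:   seconds spent in each Airflow task (hypothetical labels).
    dataform_runtime_s: run time reported by Dataform for the same invocation.
    Scheduler gaps between tasks are only counted if passed as an extra entry.
    """
    total = sum(task_durations_s.values())
    if total <= 0 or dataform_runtime_s <= 0:
        raise ValueError("durations must be positive")

    overhead = total - dataform_runtime_s
    report = {
        "total_s": total,
        "overhead_s": overhead,
        # Overhead relative to the time Dataform actually needed.
        "overhead_pct_of_dataform": round(100 * overhead / dataform_runtime_s, 1),
    }
    # Share of the total wall time spent in each Airflow task.
    for name, seconds in task_durations_s.items():
        report[f"{name}_share_pct"] = round(100 * seconds / total, 1)
    return report


if __name__ == "__main__":
    # Made-up durations for illustration, not measurements.
    sample = {"compile": 20, "invoke": 10, "sensor": 420}
    print(overhead_breakdown(sample, dataform_runtime_s=300))
```

If most of the extra time sits in the sensor task while Dataform reports a much shorter run, the polling settings are the first thing to look at; if it sits in the compilation or invocation tasks, the API calls or worker start-up are more likely candidates.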

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.