Hello,
We are seeing Dataflow pipelines take 2x to 3x longer to run with Apache Beam SDK 2.50 than with Apache Beam SDK 2.44. While troubleshooting, we compared the DAGs for 2.44 and 2.50. The BigQuery read-from-table step (a full table scan using DIRECT_TABLE_ACCESS) takes 3 sec to read 19 records / 13 KB in 2.44, while the exact same pipeline, reading the same 19 records and 13 KB, takes 1 min 5 sec in 2.50. Has this API degraded in 2.50? The throughput for this DAG step is also much higher in 2.44 than in 2.50. The throughput graphs (elements/sec) for both versions are below.
Throughput in ver 2.44 --> 0.15 elements/sec (high)
Throughput in ver 2.50 --> 0.083 elements/sec (low)
[Image: throughput graph, Apache Beam ver 2.44]
[Image: throughput graph, Apache Beam ver 2.50]
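For reference, the step timings quoted above can be turned into an effective per-step rate. This is only arithmetic on the numbers in the post (19 records, 3 sec, 1 min 5 sec); the variable names are illustrative, and the result need not match the values read off the throughput graphs.

```python
# Figures taken from the post: 19 records read in 3 s (2.44) and 65 s (2.50).
RECORDS = 19
SECONDS_2_44 = 3
SECONDS_2_50 = 65  # 1 min 5 sec

# Effective elements/sec for the BigQuery read step in each version.
rate_2_44 = RECORDS / SECONDS_2_44
rate_2_50 = RECORDS / SECONDS_2_50

# How many times longer the same step takes in 2.50.
slowdown = SECONDS_2_50 / SECONDS_2_44

print(f"2.44: {rate_2_44:.2f} elements/sec")
print(f"2.50: {rate_2_50:.2f} elements/sec")
print(f"read step slowdown: {slowdown:.1f}x")
```

By this measure the read step alone is roughly 21.7x slower in 2.50, which is a much larger gap than the 2x to 3x seen for the pipeline as a whole.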
Here are some enhanced troubleshooting steps and considerations:
Small Scale Reproduction:
Different Beam Runners:
Environment and Configuration:
Monitoring and Logging:
Additional Considerations:
Documentation Review:
Small Scale Reproduction:
That is exactly what I did. As you can see, the dataset is very small (19 records and 13 KB of data read).
Different Beam Runners:
I have tried this option as well, and I see the same behaviour with both the Direct Runner and the Dataflow runner.
For the rest of the options I have just one comment: it is the exact same pipeline, with the same input data, acting on the same dataset and table. The only difference is the Apache Beam SDK version.
The fact that the performance issue is reproducible on a small scale and with different Beam runners suggests that the issue is most likely related to the Beam SDK itself.
I would recommend that you open a bug report with the Beam team to report the issue. Be sure to include all of the relevant information, such as the Beam SDK version that you are using, the BigQuery table that you are reading from, and the throughput graph images that you have provided.
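A bug report tends to be easier to act on when it includes a minimal timing script that can be run unchanged under both SDK versions. The following is only a sketch under stated assumptions: it assumes the Python SDK (the thread does not say which language the pipeline uses), it assumes the read corresponds to `ReadFromBigQuery` with `Method.DIRECT_READ`, and the table name and the `time_call` and `run_read` helpers are hypothetical placeholders. Depending on your environment you may also need to pass project or credential options.

```python
import time


def time_call(fn, repeats=3):
    """Return the wall-clock duration in seconds of each of `repeats` calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def run_read(table):
    """Read a whole table and print its row count (assumes the Python SDK)."""
    # Imported lazily so time_call can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io.gcp.bigquery import ReadFromBigQuery

    with beam.Pipeline() as p:  # default runner; pass options to use Dataflow
        _ = (
            p
            | "Read" >> ReadFromBigQuery(
                table=table,
                method=ReadFromBigQuery.Method.DIRECT_READ,
            )
            | "Count" >> beam.combiners.Count.Globally()
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    import apache_beam

    # Hypothetical table name; replace with the table from your pipeline.
    TABLE = "my-project:my_dataset.my_table"
    print("Beam SDK version:", apache_beam.__version__)
    print("durations (s):", time_call(lambda: run_read(TABLE)))
```

Running the same script in two separate virtual environments, one with 2.44 installed and one with 2.50, and attaching both outputs would give the Beam team a like-for-like comparison.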
In the meantime, you may want to consider using Beam SDK version 2.44 until the performance issue in version 2.50 is resolved.
I apologize for the inconvenience that this is causing you.
Thanks much, @ms4446. I shall open a ticket with Beam support.