
BQ I/O Connector - Storage Read API (Direct table access) - Performance degradation seen in Apache Beam 2.50

Hello,

We are seeing Dataflow pipelines take 2x to 3x longer to run with Apache Beam SDK 2.50 than with Apache Beam SDK 2.44. While troubleshooting, we compared the DAGs for 2.44 and 2.50. The BigQuery read-from-table step (a full table scan using DIRECT_TABLE_ACCESS) takes 3 sec to read 19 records (13 KB) in 2.44, while the exact same pipeline, with the same 19 records and 13 KB, takes 1 min 5 sec in 2.50. Has this API degraded in 2.50? I also see that throughput for this DAG step is much higher in 2.44 than in 2.50. Please find the throughput graphs (elements/sec) for both versions below.

Throughput in ver 2.44 --> 0.15 elements/sec (High)

Throughput in ver 2.50 --> 0.083 elements/sec (Low)
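
For scale, here is a rough sketch of what the quoted step timings work out to. It uses only the numbers from the post (19 records, 3 sec in 2.44, 1 min 5 sec in 2.50). The function and variable names are illustrative and not part of any Beam API. The result is an average over the whole step duration, so it will not necessarily match the values read off the throughput graphs.

```python
# Figures quoted in the post for the BigQuery read step.
RECORDS = 19
SECONDS_2_44 = 3.0    # Beam 2.44: 3 sec
SECONDS_2_50 = 65.0   # Beam 2.50: 1 min 5 sec


def step_throughput(records: int, seconds: float) -> float:
    """Average elements/sec over the whole step duration."""
    return records / seconds


# Ratio of the two step durations (how many times slower 2.50 is).
slowdown = SECONDS_2_50 / SECONDS_2_44

print(f"2.44: {step_throughput(RECORDS, SECONDS_2_44):.2f} elements/sec")
print(f"2.50: {step_throughput(RECORDS, SECONDS_2_50):.2f} elements/sec")
print(f"step slowdown: {slowdown:.1f}x")
```

On these numbers the read step alone is roughly 21.7x slower in 2.50, which is a much larger gap than the 2x to 3x seen for the pipeline as a whole.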

[Images: throughput graphs for Apache Beam 2.44 and Apache Beam 2.50]

ACCEPTED SOLUTION

Thanks very much, @ms4446. I shall open a ticket with Beam support.


4 REPLIES