
Calling a PySpark stored procedure in BigQuery takes a long time

Is there a way to speed this up? The script itself is very fast, but the act of calling the stored procedure takes over 60 seconds. For example, I ran the CALL statement at 3:24:38 PM. The first log statement was not printed until 3:25:53 PM. What is happening in those 75 seconds between starting the run with the CALL statement and actually executing the PySpark script?
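For reference, here is roughly how the call is made and timed from the google-cloud-bigquery Python client (the project, dataset, and procedure names are placeholders, not my real ones):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder name for the PySpark stored procedure being called.
job = client.query("CALL `my_project.my_dataset.my_spark_proc`()")
job.result()  # block until the procedure finishes

# The job's own timeline fields show where the time went.
print("created:", job.created)
print("started:", job.started)
print("ended:  ", job.ended)
```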

Is it possible that the 75 seconds is the execution of the script, and the log statements (print statements within the PySpark script) are all dumped at the end? The first and last log statements are tagged within the same second, even though, watching the log stream live, they do not arrive in the same second. One way to check is sketched below.
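A sketch of that check, assuming the script logs via plain print: timestamp every line inside the PySpark script and flush stdout immediately, so the driver's output buffering can't collapse everything into the same second:

```python
import time
from datetime import datetime, timezone

def log(msg: str) -> None:
    # Explicit wall-clock timestamp plus flush=True, so each line is written
    # when it is generated rather than when the stdout buffer drains.
    print(f"{datetime.now(timezone.utc).isoformat()} {msg}", flush=True)

log("script started")
time.sleep(2)  # stand-in for the real work
log("script finished")
```

If the embedded timestamps differ while the platform's log tags are identical, the lines were generated over time and only delivered in a batch.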

Creation time: Jun 13, 2024, 3:24:38 PM UTC-4

Start time: Jun 13, 2024, 3:24:38 PM UTC-4
End time: Jun 13, 2024, 3:26:18 PM UTC-4
Duration: 1 min 40 sec
 
First log statement:
2024-06-13 15:25:53.000 EDT: Using the default container image
(^ Jun 13, 2024, 3:25:53 PM UTC-4)
Last log statement:
2024-06-13T19:25:53.000 UTC: success
(^ Jun 13, 2024, 3:25:53 PM UTC-4)
