Hi team, I'm trying to use the following code to get the SlotMs of a BigQuery job
val slotMs = bigquery.getJob(JobId.of(bqJobId.jobId)).getStatistics[QueryStatistics].getTotalSlotMs
and calculate the cost based on a charge of $12 per slot per month:
val cost = (slotMs * 12).toFloat / (1000.0f * 60 * 60 * 24 * 30)
While testing it, I ran the same query job twice. The first run took more than 4 hours; the second finished in a little over 10 minutes, so it seems the second job served its result directly from the cache. But the slotMs and cost produced by the code above are very similar for both runs ($7.3 vs. $7.7). That is counter-intuitive, since the second job took much less time and should therefore cost much less. Can someone help me understand why? Thanks a lot in advance.
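For what it's worth, here is how I would double-check that the second run really was a cache hit; a minimal sketch, assuming JobStatistics.QueryStatistics from the same com.google.cloud.bigquery client is imported (getCacheHit returns a java.lang.Boolean and may be null):

// Check whether a run was served from the query results cache.
val stats = bigquery.getJob(JobId.of(bqJobId.jobId)).getStatistics[QueryStatistics]()
val servedFromCache = Option(stats.getCacheHit).exists(_.booleanValue)
println(s"cacheHit=$servedFromCache, totalSlotMs=${stats.getTotalSlotMs}")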
Hi @songxxx,
Welcome to Google Cloud Community!
While BigQuery also offers flat-rate (capacity-based) pricing, the default and most common billing method is on-demand pricing. Under on-demand pricing you are charged for the amount of data your queries process, not for execution time or slots used. Slot usage does affect how quickly a query completes (more slots generally means faster processing), but the cost is determined by the volume of data scanned, regardless of how many slots are allocated or how long the query runs.
The small difference in slotMs between your two runs suggests that the overhead of utilizing the cache was minimal compared to the actual data processing time of your initial query.
Note: Don't rely solely on slotMs to estimate BigQuery costs.
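To see this directly in a job's statistics, here is a minimal sketch, assuming the standard com.google.cloud.bigquery Java client from your snippet and an on-demand rate of $6.25 per TiB (the rate is an assumption; please check current pricing for your region):

import com.google.cloud.bigquery.{BigQueryOptions, JobId}
import com.google.cloud.bigquery.JobStatistics.QueryStatistics

// Fetch the job's query statistics (same API as the original snippet).
val bigquery = BigQueryOptions.getDefaultInstance.getService
val stats = bigquery.getJob(JobId.of("your-job-id")).getStatistics[QueryStatistics]()

// These getters return java.lang.* wrappers and may be null.
val cacheHit = Option(stats.getCacheHit).exists(_.booleanValue)
val bytesBilled = Option(stats.getTotalBytesBilled).map(_.longValue).getOrElse(0L)

// On-demand cost follows bytes billed, not slot milliseconds: a cache hit
// bills zero bytes, so it costs nothing regardless of the reported slotMs.
val onDemandCostUsd = (bytesBilled.toDouble / (1L << 40)) * 6.25 // 2^40 bytes per TiB
println(s"cacheHit=$cacheHit, bytesBilled=$bytesBilled, costUsd=$onDemandCostUsd")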
I hope the above information is helpful.
Hi @songxxx
Thanks for the answer. I am calculating the cost based on total_bytes_processed, which I get from the INFORMATION_SCHEMA jobs view, but sometimes total_bytes_processed comes back as 0 while total_slot_ms is a huge number. Can you please help me understand what to do in this case? How can I calculate the cost for long-running queries when only total_slot_ms is populated?
Can you please guide or help on this? A sketch of what I have so far is below.
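Minimal sketch of what I have, assuming on-demand billing with total_bytes_billed from INFORMATION_SCHEMA.JOBS and an assumed rate of $6.25 per TiB (the rate and the zero-bytes interpretation are assumptions I would like confirmed):

// On-demand cost from bytes billed; slot milliseconds are ignored here.
// A total_bytes_billed of 0 alongside a large total_slot_ms would, as far
// as I understand, indicate a cache hit or script/metadata job (free on-demand).
val usdPerTiB = 6.25
def onDemandCostUsd(totalBytesBilled: Long): Double =
  if (totalBytesBilled <= 0L) 0.0
  else (totalBytesBilled.toDouble / (1L << 40)) * usdPerTiB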