Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Same query, different costs [BigQuery]

I have a table with daily partitions by the triggered_at (timestamp) column. When writing a query in the Google cloud console, it is possible to see the costs of processing.

Today I noticed such an anomaly. I wrote a query using where clause on the triggered_atfield, the cost was 4.49 GB. Then I accidentally added whitespace character to my query and the cost decreased = 163.63 MB. After that I could repeat the action and the cost constantly changed. For the sake of interest, I tried two more ways of using where clause and the cost dropped dramatically and did not change anymore (14.27 and 11.85 MB).

first_attempt_1.jpg

first_attempt_2.jpg

second_attempt.jpg

third_attempt.jpg

I tried to execute the first version of the query a couple of times. As a result, on job information tab I could see bytes processed/bytes billed, and they were:

first attempt = 3.35 GB 
second attempt = 3.37 GB
third attempt = 1.31 GB (maybe already cached)

Actually there are two questions:

  1. why adding just whitespace character affects costs?

  2. why is there such a serious difference between options:

date(triggered_at, 'Europe/Moscow') = date'2023-10-01'
triggered_at = timestamp'2023-10-01 00:00:00+0300'
triggered_at = '2023-10-01'

I assume that my table is partitioned by UTC time and in case of switching to the Moscow timezone the BigQuery optimizer checks more than one partition. But why do the costs not change in the second case?

0 1 1,552
1 REPLY 1

In Google BigQuery, the cost of processing a query is determined by the amount of data that the query scans. Here are some insights into the behavior you've observed:

  1. Columnar Storage:

    • BigQuery's columnar storage system is designed to efficiently read and process data by scanning only the necessary columns. The presence of whitespace in a query should not influence the amount of data read from storage, as BigQuery's query parser ignores extraneous whitespace.
  2. Cost Model:

    • The cost of a query in BigQuery is based on the volume of data processed. While it's true that efficient queries can reduce costs by minimizing the data scanned, the addition of whitespace does not inherently make a query less efficient or cause it to scan more data.
  3. Caching:

    • BigQuery's caching system can indeed affect query costs. If the same query is executed multiple times and the underlying data has not changed, the results may be served from the cache, which does not incur additional costs. However, the cache is based on the exact text of the query, so whitespace should not impact the cache hit or miss.
  4. Query Optimization:

    • The differences in cost between the various WHERE clause formulations are likely due to how BigQuery's query optimizer interprets the different expressions. The optimizer decides the best way to execute a query, which can result in different amounts of data being scanned, especially when time zone conversions or different timestamp formats are involved.
  5. Tips for Writing Efficient BigQuery Queries:

    • Ensure you're using the appropriate data types for your columns.
    • Write concise WHERE clauses to filter your data effectively.
    • Use GROUP BY and ORDER BY clauses judiciously to aggregate and sort your data.
    • Avoid unnecessary subqueries when possible.
    • Implement data partitioning and clustering to optimize query performance.

If you're encountering inconsistent query costs, especially with changes as minor as adding whitespace, it may be worth reaching out to Google Cloud support for assistance. They can provide a more detailed analysis and help ensure that your queries are both cost-effective and performant.