Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Question about filtering/billable bytes using date partition column with Function applied

Hi, 

I tried to do some tests and its not very clear to me. So, wanted to ask here as a newbie to BigQuery.

1. If I use a filter on partitioned date column(monthly partition), and I want to apply a function like below, will Big Query be able to do partition pruning if EXTRACT function is applied on partition col ? are there any better alternatives to writing this where clause filter ?
WHERE EXTRACT(YEAR FROM part_dt) BETWEEN 2019 and 2024.

2.  How can I check if the partition/cluster pruning actually happened ? is there a way to check in INFORMATION_SCHEMA for a query ?

3. If my query has two reference tables in such a query, how can I find out the billable bytes from each referenced tables ? 

0 3 497
3 REPLIES 3

Hello @crazyrevol,

Welcome to Google Cloud Community!

Please see my answers inline with your questions below:

  1. Yes, you can apply the EXTRACT function filter on the partition column. You can also refer to the other functions used in BigQuery to apply to your syntax.
  2. The easiest way to check if the table is partitioned is by viewing by the Details tab, as illustrated in the image below. Alternatively, you can use INFORMATION_SCHEMA.PARTITIONS or INFORMATION_SCHEMA.COLUMNS, to gather information about partitioned and clustered tables or columns. For more details, you can also refer to this documentation. Please note that this may incur cost when running and requires the use of standard SQL syntax.

Screenshot 2024-06-27 9.18.12 PM.png

3. BigQuery does not provide a direct way to determine the billable bytes from each table in a query at the table level. The total_bytes_billed can only be accessed within the INFORMATION_SCHEMA.JOBS view. I recommend checking the Estimate query costs for the lists of best practices in controlling costs in BigQuery

For more information about your table, you can also check INFORMATION_SCHEMA.TABLE. Additionally, you can also refer to BigQuery pricing for an overview.

I hope this helps.

 

 



Thank you for your reply @caryna . Clarifying question with regards to your answer to 1 and 2. 

1. you said I can use EXTRACT function on partition column. My question was NOT if I can use the function(or any of the functions here) but "Does usage of these functions on the partition column, prevent BQ from performing partition pruning ?"

2. How can I know if partition pruning has happened( if a query that using partition column filter in WHERE clause) in a query by lookin at the JOBS view, if at all possible. 

Hi @crazyrevol,

Please see my response to your questions below:

1. In addition to my previous response, while the EXTRACT function can be useful, its effectiveness can vary based on the context. Using it might potentially hinder optimization and result in less efficient queries. For best performance, make sure your partition column is either of type DATE or TIMESTAMP.

Alternatively, the optimal approach is to do the following:

  • Use Range Comparisons. 

Here's an example:

WHERE partition_table_name >= '2019-01-01' AND partition_table_name < '2025-01-01' 

Result: This will only scan from January 2019 to December 2024

  • Create Partitioned View
CREATE `your_project.your_dataset.createTableName_view`
PARTITION BY EXTRACT(YEAR FROM part_dt) AS year
AS
SELECT *
FROM `your_project.your_dataset.your_base_table`

Then create query to select data from the created partitioned view table:

SELECT *
FROM `your_project.your_dataset.createTableName_view`
WHERE year BETWEEN 2019 AND 2024

Result: This view will only scan data for years between 2019 and 2024.

2. You cannot directly determine if partition pruning occurred by looking at the JOBS view in BigQuery. The Jobs view is only limited to general information about the job. You can check the following schema under Jobs view, this includes creation time, job stages, etc.

Alternatively, here are the possible indicators when partition pruning works:

  • Check the query duration if it has a shorter execution time, this means pruning is happening.
  • Monitor  "BYTES_PROCESSED" and "BYTES_BILLED" metrics in the JOBS view. If they have smaller values than the total size of your table, it proves that pruning is also working.
  • You can also refer to optimize query computation for faster query execution and cost reduction.

Note: These indicators aren't concrete proof of partition pruning, but they can suggest its use.

I hope the above information is helpful.