Hi,
I tried to do some tests and its not very clear to me. So, wanted to ask here as a newbie to BigQuery.
1. If I use a filter on partitioned date column(monthly partition), and I want to apply a function like below, will Big Query be able to do partition pruning if EXTRACT function is applied on partition col ? are there any better alternatives to writing this where clause filter ?
WHERE EXTRACT(YEAR FROM part_dt) BETWEEN 2019 and 2024.
2. How can I check if the partition/cluster pruning actually happened ? is there a way to check in INFORMATION_SCHEMA for a query ?
3. If my query has two reference tables in such a query, how can I find out the billable bytes from each referenced tables ?
Hello @crazyrevol,
Welcome to Google Cloud Community!
Please see my answers inline with your questions below:
3. BigQuery does not provide a direct way to determine the billable bytes from each table in a query at the table level. The total_bytes_billed can only be accessed within the INFORMATION_SCHEMA.JOBS view. I recommend checking the Estimate query costs for the lists of best practices in controlling costs in BigQuery
For more information about your table, you can also check INFORMATION_SCHEMA.TABLE. Additionally, you can also refer to BigQuery pricing for an overview.
I hope this helps.
Thank you for your reply @caryna . Clarifying question with regards to your answer to 1 and 2.
1. you said I can use EXTRACT function on partition column. My question was NOT if I can use the function(or any of the functions here) but "Does usage of these functions on the partition column, prevent BQ from performing partition pruning ?"
2. How can I know if partition pruning has happened( if a query that using partition column filter in WHERE clause) in a query by lookin at the JOBS view, if at all possible.
Hi @crazyrevol, Please see my response to your questions below: 1. In addition to my previous response, while the EXTRACT function can be useful, its effectiveness can vary based on the context. Using it might potentially hinder optimization and result in less efficient queries. For best performance, make sure your partition column is either of type DATE or TIMESTAMP. Alternatively, the optimal approach is to do the following:
Here's an example:
Result: This will only scan from January 2019 to December 2024
Then create query to select data from the created partitioned view table:
Result: This view will only scan data for years between 2019 and 2024. 2. You cannot directly determine if partition pruning occurred by looking at the JOBS view in BigQuery. The Jobs view is only limited to general information about the job. You can check the following schema under Jobs view, this includes creation time, job stages, etc. Alternatively, here are the possible indicators when partition pruning works:
Note: These indicators aren't concrete proof of partition pruning, but they can suggest its use. I hope the above information is helpful. |