Solved: Big Query table partition behavior with views - Page 2

a_shah · 10-10-2023 03:38 AM

Hello All.

I have a bq table that is partitioned on one of the date columns. A view was created on top of this tables as (select * from table).

If I use this view to query the table and filter on the partition column then what will happen

a. the query will scan the whole table based on the view definition?

b. bq will intelligently use the partition column filter in my query to select limited data?

Please share your thoughts on this.

~Ashish

ms4446

BigQuery will intelligently use the partition column filter in your query to select limited data. BigQuery is capable of pruning partitions based on the partition column filter in your query. This means that BigQuery will only scan the partitions that match the filter criteria and skip the remaining partitions. This can significantly improve the performance of your queries, especially if you are querying a large partitioned table.

For example, if you have a table partitioned on a column named <partition_column>, and you create a view on top of this table as SELECT * FROM table, then the following query will only scan the partition for data with a value of '2023-10-10' for that column:

SELECT * FROM view WHERE <partition_column> = '2023-10-10'

BigQuery will do this even if the partition column isn't explicitly selected in the view. This is because BigQuery is able to infer that the partition column filter in the query should be applied to the underlying table of the view.

However, there are a few things to keep in mind when using partition pruning with views:

The view must be based on a partitioned table.
The partition column filter in the query must match the partition column of the underlying table.
Even if the partition column isn't explicitly selected in the view, BigQuery can still perform partition pruning when querying the underlying table. However, for clarity and best practices, it's generally a good idea to include the partition column in the view.
Always test and verify that your queries are being optimized as expected by examining the query execution details to ensure only the expected partitions are being read.

If any of these conditions are not met, BigQuery might not be able to prune partitions efficiently and could end up scanning more data than necessary, leading to increased costs and query times.

View solution in original post