Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Filtering Big Query view with partition column

Hello.
I have a case where I need to filter data using a view and a partition column.

Table:

CREATE TABLE `testing-442412.test_dataset.Product`(
   Id INTEGER NOT NULL,
   UserId INTEGER NOT NULL,
   CompanyId INTEGER NOT NULL,
   DateCreated DATETIME NOT NULL
) PARTITION BY DATE_TRUNC(DateCreated, MONTH) CLUSTER BY UserId, CompanyId;

View:

CREATE VIEW `testing-442412.test_dataset.ProductView` AS
SELECT
   UserId,
   CompanyId,
   EXTRACT(YEAR FROM DateCreated) AS Year,
   EXTRACT(MONTH FROM DateCreated) AS Month,
   COUNT(DISTINCT Id) AS Total
   FROM `testing-442412.test_dataset.Product`
GROUP BY
   UserId,
   CompanyId,
   Year,
   Month

I WANT TO FILTER MY AGGREGATED DATA USING QUERY LIKE THIS:

SELECT * FROM `testing-442412.test_dataset.ProductView` WHERE Year = 2024 AND Month = 1;

However, when I do this, I notice that partition pruning doesn't seem to be working, as many records are read in the execution details.
How can I improve my view in this case? Does partition pruning even work with the EXTRACT function? Or do I need to use the DateCreated column directly without any manipulation, with a query like this:

SELECT * FROM `testing-442412.test_dataset.Product` WHERE DateCreated BETWEEN DATETIME("2024-01-01") AND DATETIME("2024-01-30");

I don't really want to force my clients to use table Product directly, so I hope you will be able to help me.

0 1 223
1 REPLY 1

Hi @DepressedCookie,

Welcome to Google Cloud Community!

Your original query couldn't be pruned because BigQuery couldn't easily translate the Year and Month conditions back to the underlying DateCreated column. The EXTRACT function obscures the relationship, preventing the optimizer from taking advantage of the partitioning.

To improve your view to enable partition pruning and allow you to query the aggregated data efficiently, you may include a column in your view that directly represents the partition key, without transformation, along with your aggregated data. This will allow queries against the view to filter on this column and leverage partition pruning.

For more guidance, feel free to check this introduction to partitioned tables. You can also refer to BigQuery's documentation on querying partitioned tables and best practices for partition pruning.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.