BigQuery not taking advantage of WHERE predicate

We tried comparing the performance of BigQuery on the Star Schema Benchmark described here. The benchmark creates a table which is partitioned on Year and clustered on date.

morpheus_0-1665965592523.png

When following query was run (Q1.3):

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toISOWeek(LO_ORDERDATE) = 6) AND (toYear(LO_ORDERDATE) = 1994) AND ((LO_DISCOUNT >= 5) AND (LO_DISCOUNT <= 7)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
we found BQ did a scan of all the rows in partition corresponding to the year 1994. 
 
Expected: since the data is clustered (ordered) on the date, it should only have scanned a much smaller subset of the data. For comparison Clickhouse does just that - it only processes the rows corresponding to week 6 which is 50x less (there are 52 weeks in a year).
 
Conclusion: clustering the data has seemingly no benefit
 
Why is it like this? Can anyone explain this?
 
We saw same behavior for queries Q1.2, Q3.4. In those queries there is a predicate on the month clause in addition to the year. But BQ ends up processing the entire partition instead of just the rows in the month.
 
Any explanation and guidance appreciated. we are quite impressed with BQ's speed and massive parallelization.
0 2 139
2 REPLIES 2

RC1
Bronze 4
Bronze 4

@morpheus I think clustering benefits when your data size is greater than 1 GB. What is the size of the data in 1994 partition and what is the size of your overall data ?

here are the details of the table we are querying:

morpheus_0-1666022326113.png

the 1994 partition has 303M rows in it.