I have encountered a weird bug relating to required partition filter columns and merge statements.
The summary is basically when there is both a filter on the partition column AND a `false` condition (which may occur as we use dbt and dynamically populate this value with true or false), the query itself runs with no issue but when it gets used as part of the merge statement, BigQuery returns the `Cannot query over table 'test_partitioned_table' without a filter over column(s) 'tstamp' that can be used for partition elimination` error, despite the filter clearly being in the query.
I assume that the optimizer is somehow removing this filter (because it sees the other clause will always be false), but this is leading to the error. For some reason this isn't removed when it isn't part of a merge statement.
A minimum working example to see the issue is below, does anyone have any experience or can confirm this is a bug?
CREATE TABLE
test_partitioned_table ( id string,
tstamp timestamp )
PARTITION BY
TIMESTAMP_TRUNC(tstamp, DAY) OPTIONS ( require_partition_filter = TRUE);
INSERT INTO
dbt_ryan.test_partitioned_table (id,
tstamp)
VALUES
('hello', timestamp '2023-05-01');
INSERT INTO
test_partitioned_table (id,
tstamp)
VALUES
('world', timestamp '2023-05-02');
CREATE TABLE
test_partitioned_table2 ( id string,
start_tstamp timestamp )
PARTITION BY
TIMESTAMP_TRUNC(start_tstamp, DAY) OPTIONS ( require_partition_filter = FALSE);
-- this works with no issue
SELECT
id,
tstamp AS start_tstamp
FROM
test_partitioned_table
WHERE
tstamp BETWEEN CAST('2023-04-01 00:00:00+00:00' AS timestamp )
AND CAST('2023-05-03 00:00:00+00:00' AS timestamp )
AND FALSE
-- This causes and error based on not filtering on the partition key
MERGE INTO
test_partitioned_table2 dest
USING
(
SELECT
id,
tstamp AS start_tstamp
FROM
test_partitioned_table
WHERE
tstamp BETWEEN CAST('2023-04-01 00:00:00+00:00' AS timestamp )
AND CAST('2023-05-03 00:00:00+00:00' AS timestamp )
AND FALSE ) src
ON
(dest.id = src.id)
WHEN MATCHED THEN UPDATE SET dest.id = src.id, dest.start_tstamp = src.start_tstamp
WHEN NOT MATCHED
THEN
INSERT
(id,
start_tstamp)
VALUES
(id, start_tstamp)