Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Bigquery Error when using partition filter and false condition in merge subquery

I have encountered a weird bug relating to required partition filter columns and merge statements.

The summary is basically when there is both a filter on the partition column AND a `false` condition (which may occur as we use dbt and dynamically populate this value with true or false), the query itself runs with no issue but when it gets used as part of the merge statement, BigQuery returns the `Cannot query over table 'test_partitioned_table' without a filter over column(s) 'tstamp' that can be used for partition elimination` error, despite the filter clearly being in the query.

I assume that the optimizer is somehow removing this filter (because it sees the other clause will always be false), but this is leading to the error. For some reason this isn't removed when it isn't part of a merge statement. 

A minimum working example to see the issue is below, does anyone have any experience or can confirm this is a bug?

 

CREATE TABLE
 test_partitioned_table ( id string,
    tstamp timestamp )
PARTITION BY
  TIMESTAMP_TRUNC(tstamp, DAY) OPTIONS ( require_partition_filter = TRUE);

INSERT INTO
  dbt_ryan.test_partitioned_table (id,
    tstamp)
VALUES
  ('hello', timestamp '2023-05-01');

INSERT INTO
 test_partitioned_table (id,
    tstamp)
VALUES
  ('world', timestamp '2023-05-02');

CREATE TABLE
 test_partitioned_table2 ( id string,
    start_tstamp timestamp )
PARTITION BY
  TIMESTAMP_TRUNC(start_tstamp, DAY) OPTIONS ( require_partition_filter = FALSE);

-- this works with no issue
SELECT
  id,
  tstamp AS start_tstamp
FROM
 test_partitioned_table
WHERE
  tstamp BETWEEN CAST('2023-04-01 00:00:00+00:00' AS timestamp )
  AND CAST('2023-05-03 00:00:00+00:00' AS timestamp )
  AND FALSE 

-- This causes and error based on not filtering on the partition key
MERGE INTO
 test_partitioned_table2 dest
USING
  (
  SELECT
    id,
    tstamp AS start_tstamp
  FROM
   test_partitioned_table
  WHERE
    tstamp BETWEEN CAST('2023-04-01 00:00:00+00:00' AS timestamp )
    AND CAST('2023-05-03 00:00:00+00:00' AS timestamp )
    AND FALSE ) src
ON
  (dest.id = src.id)
  WHEN MATCHED THEN UPDATE SET dest.id = src.id, dest.start_tstamp = src.start_tstamp
  WHEN NOT MATCHED
  THEN
INSERT
  (id,
    start_tstamp)
VALUES
  (id, start_tstamp)

 

4 14 13.9K
14 REPLIES 14