Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataform Incremental table - How to limit the table scan on the target table

Hi All,

I've the following target: minimize the Mbytes used by Bigquery to load an incremental table.

This is the case:

Source table is a view which select only new added data from a source table:

 

 

 

create or replace SOURCE_VIEW
as
select *
from SOURCE_TABLE
WHERE ingestionTime >= maxIngestionTime

 

 

 

I've partitioned the target table on a date field which should be a good indicator if a record could be modified or not

 

 

 

config {
    name    : "whmovementTEST",
    schema  : "ptestds1_bronze",
    type    : "incremental",
    uniqueKey : ["key1", "key2"],
    bigquery: {
        partitionBy: "TIMESTAMP_TRUNC(compDate, MONTH)"
    }
}

 

 

 

I'd like that dataform generates in incremental mode a merge code like following:

 

 

 

MERGE into ${self()} AS target
USING (
select * 
from ${ref("SOURCE_VIEW")}
) as source
ON  target.key1= source.key1
AND target.key2= source.key2
${when(incremental(),
    `AND  target.comDate> (SELECT TIMESTAMP_ADD(MAX(compDate), -1, MONTH) FROM ${self()})`)}
}
WHEN MATCHED UPDATE ....
WHEN NOT MATCHED INSER ....

 

 

 

to scan only some of the partions on target table

How to customize the "ON" condition ? There could be an other way to reach the same purpose?

Thanks

 

0 1 115
1 REPLY 1

have you tried the updatePartitionFilter option in the config block ?