Hello, i'm testing DataForm instead of DBT, and i'm missing a feature we use a lot.
In DBT incremental mode we have a mode called incremental_overwrite, what it does is the following:
```
- create a temp table with the data you want to add incrementally (table_temp)
- set partitions_to_delete = select distinct <partition_column> from table_temp
- MERGE on <table> source USING(table_temp dest) ON False
when not matched by source and dest.<partition_column> IN (partitions_to_delete) then delete
when not matched by source then insert ...
```
It's pretty nice when the data volume is importance (making merge too slow) and no clear id exists to deduplicate.
I'm looking to do this and struggle a bit, i saw those conversations about custom merge logic: https://www.googlecloudcommunity.com/gc/Data-Analytics/Use-custom-merge-logic-for-incremental-table-... and the option Creating a Custom Operations SQLX File.
Would those same options apply ? or is there an easier way ? or a repo where i could check a few merge strategies already written ?
Thanks
While Dataform doesn't have a direct equivalent to incremental_overwrite
, you can achieve a similar effect with the following approach:
Custom Merge Logic:
type: "incremental"
.post_operations
block. This will handle partition deletion and insertion based on your data.Example:
config { type: "incremental" }
pre_operations {
-- Logic to create/populate 'table_temp' if needed
}
post_operations {
DELETE FROM `${self()}`
WHERE <partition_column> IN (SELECT DISTINCT <partition_column> FROM table_temp);
INSERT INTO `${self()}`
SELECT * FROM table_temp;
}
Important Considerations:
partition_column
for optimal efficiency.