Hi,
I have a Dataform incremental table where I need to delete the past 3 days of data before each load. However, it seems like `delete from ${self()} where event_date >= event_date_checkpoint` is scanning the entire table (the execution estimate shows the script will process 930.32 GB). If I run the same delete in the BigQuery UI, it shows only 0 bytes.
This will be very costly when the incremental table runs every day if the delete alone processes ~900 GB. Is that bytes-processed estimate accurate? If so, how do I optimize the delete so it only touches the specific date partitions?
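For reference, the SQLX looks roughly like this (I've simplified the source table name; the checkpoint is declared in `pre_operations`):

```
config {
  type: "incremental",
  bigquery: {
    partitionBy: "event_date"
  }
}

pre_operations {
  declare event_date_checkpoint default (
    ${when(incremental(),
      `select date_sub(max(event_date), interval 3 day) from ${self()}`,
      `select date("2000-01-01")`)}
  );
  ${when(incremental(),
    `delete from ${self()} where event_date >= event_date_checkpoint`)}
}

select *
from ${ref("events_source")}  -- simplified source name
where event_date >= event_date_checkpoint
```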
Any guidance or suggestions would be greatly appreciated!
Hi @Yinnnnn,
The estimate suggests the delete isn't fully leveraging your table's partitioning. Most likely, because `event_date_checkpoint` is a scripting variable whose value is only known at run time, BigQuery can't prune partitions when it estimates the job, so the estimate assumes a full-table scan. With a filter built from constant expressions such as `CURRENT_DATE()`, BigQuery can prune partitions up front, and a delete that removes entire partitions can even be handled as a metadata-only operation, which is why the same statement shows 0 bytes in the BigQuery UI.
Here's how you can optimize this:
Direct Partition Deletion:

Since your deletion criteria align exactly with the `event_date` partitioning column, this is the most efficient approach. The following query uses constant date expressions, so BigQuery can target only the relevant partitions:

```
DELETE FROM your_table  -- replace `your_table` (or use ${self()} inside Dataform)
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
  AND event_date < CURRENT_DATE();
```
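If you want to sanity-check which partitions that filter will touch before deleting, one option is to look at the partition metadata (the dataset and table names below are placeholders):

```
-- Lists the daily partitions inside the 3-day window;
-- `my_dataset` and 'your_table' are placeholders for your own names.
SELECT partition_id, total_rows, total_logical_bytes
FROM `my_dataset.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'your_table'
  AND partition_id >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY))
  AND partition_id < FORMAT_DATE('%Y%m%d', CURRENT_DATE());
```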
Dataform when() Helper (Optional):

To control when the deletion runs from inside the SQLX file, wrap it in a `pre_operations` block using Dataform's built-in `when()` helper, so the delete only executes on incremental runs (on a full refresh the table is rebuilt from scratch, so the delete isn't needed). A minimal sketch:

```
config {
  type: "incremental",
  bigquery: {
    partitionBy: "event_date"
  }
}

pre_operations {
  ${when(incremental(),
    `DELETE FROM ${self()}
     WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)
       AND event_date < CURRENT_DATE()`)}
}
```

This keeps the constant-date filter from the query above, so the delete stays partition-pruned, and it ensures the deletion only happens during incremental executions.
Additional Recommendations: