Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataform Incremental Table Merge to also DELETE

Hello Google Community, hoping for some help here!

I've got a Dataform JavaScript file that needs to merge a source to a target using a specific condition -

If unique key is matched and field1 = value1: DELETE
If unique key is matched and field1 != value1: UPDATE
If unique key is not matched : INSERT

Currently I'm publishing the source as an incremental table with the following config -

 publish(table_name,
{
type: 'incremental',
uniqueKey: [`${tabledata['key']}`],
bigquery: {
partitionBy: "DATE(modified_pt)",
stamp_sub(current_timestamp(), interval 24 hour)"
}
}
)

This takes care of the INSERT and UPDATE operations, but I'm having to do the DELETE as a postOps. I'm not too pleased with this as the source (that populates the table publish) is generated through a complex set of CTEs and I'm having to execute the exact same CTEs again for the postOps DELETE. The only difference is the field1:value1 condition that decides whether I need to DELETE v/s UPDATE, so I'm wondering if there's a way to do all define the conditions for 3 operations just once?

Looking for something similar to the below example that I found here - https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement

MERGE dataset.NewArrivals T
USING
(SELECT * FROM dataset.NewArrivals WHERE warehouse <> 'warehouse #2') S
ON T
.product = S.product
WHEN MATCHED AND T
.warehouse = 'warehouse #1' THEN
  UPDATE SET quantity
= T.quantity + 20
WHEN MATCHED THEN
  DELETE

 

1 1 2,509
1 REPLY 1

Yes, you can achieve all three operations (INSERT, UPDATE, and DELETE) with a single Dataform JavaScript file using the BigQuery MERGE statement. Here's how you can adapt the example you found to your scenario:

 
-- Define source table with CTEs
WITH source_data AS (
  -- ... Your complex CTE definitions here ...
)

-- Merge source data into target table
MERGE INTO your_target_table_name AS T
USING source_data AS S
ON T.your_unique_key = S.your_unique_key

-- Handle UPDATE and DELETE conditions based on field1
WHEN MATCHED AND S.field1 = 'value1' THEN DELETE
WHEN MATCHED THEN UPDATE SET
  -- ... Column assignments for UPDATE ...
WHEN NOT MATCHED THEN INSERT (
  -- ... Column names for INSERT ...
) VALUES (
  -- ... Corresponding values for INSERT ...
)
  • WHEN MATCHED AND S.field1 = 'value1' THEN DELETE: This deletes rows from the target table where the unique key matches and field1 equals "value1".
  • WHEN MATCHED THEN UPDATE: This updates rows in the target table where the unique key matches and field1 is not equal to "value1". You need to specify the specific column assignments for the update in this section.
  • WHEN NOT MATCHED THEN INSERT: This inserts new rows into the target table where the unique key doesn't match any existing rows. You need to specify the columns and their corresponding values for the insert operation.

This approach eliminates the need for a separate postOps deletion step, as you can define all conditions within the MERGE statement itself. Additionally, you don't need to specify the CTEs twice, improving the overall efficiency of your code.

Here are some additional points to consider:

  • Modify the WHEN clauses and update/insert logic to match your specific needs.
  • Replace your_target_table_name and your_unique_key with the actual name of your target table and the field name that represents the unique key in your data.
  • If you need to handle additional field comparisons or complexities, you can further customize the WHEN clauses and the corresponding actions.
  • Test the MERGE statement with a subset of data before applying it to the entire dataset to ensure it behaves as expected and to prevent data integrity issues.
  • Monitor the performance of the MERGE operation, especially if dealing with large datasets, and optimize the query if necessary.
  • Use comments within the script to explain complex logic or specific decisions made in the CTEs and MERGE statement for better maintainability and readability.

By utilizing the MERGE statement in your Dataform JavaScript file, you can efficiently handle all three operations (INSERT, UPDATE, and DELETE) based on various conditions without duplicating the CTE definitions. This approach improves the clarity and conciseness of your code while ensuring accurate data manipulation.