Duplicate records in BigQuery that cannot be deleted

1. Our source is Oracle and our destination is BigQuery, and we are using Datastream to migrate the data. The source has no duplicate records, but we find duplicate records in BigQuery and are not able to delete them there. How can we solve this issue?
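One common workaround when duplicates cannot be deleted in place is to write a deduplicated copy of the table and read from that copy instead. The Python sketch below only builds the BigQuery SQL for such a copy; it keeps one row per key by numbering rows with `ROW_NUMBER()` and keeping the first. The helper name `build_dedup_query`, the table names, the key column `ORDER_ID` and the ordering column `datastream_metadata.source_timestamp` are all assumptions for illustration; check the actual schema of the Datastream-written table and use the Oracle table's real primary key.

```python
from typing import Sequence


def build_dedup_query(source_table: str, target_table: str,
                      key_columns: Sequence[str], order_expr: str) -> str:
    """Build a BigQuery SQL statement (hypothetical helper) that copies
    source_table into target_table, keeping one row per key: the row with
    the largest order_expr value."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Backtick-quote each key column so names are used verbatim.
    partition = ", ".join(f"`{col}`" for col in key_columns)
    return (
        f"CREATE OR REPLACE TABLE `{target_table}` AS\n"
        "SELECT * EXCEPT (dedup_rn)\n"
        "FROM (\n"
        "  SELECT *,\n"
        f"    ROW_NUMBER() OVER (PARTITION BY {partition} "
        f"ORDER BY {order_expr} DESC) AS dedup_rn\n"
        f"  FROM `{source_table}`\n"
        ")\n"
        "WHERE dedup_rn = 1"
    )


if __name__ == "__main__":
    # Hypothetical project, dataset, table, key and ordering names.
    print(build_dedup_query(
        "my-project.my_dataset.ORDERS",
        "my-project.my_dataset.ORDERS_dedup",
        ["ORDER_ID"],
        "datastream_metadata.source_timestamp",
    ))
```

The generated statement can be run in the BigQuery console; comparing `COUNT(*)` and `COUNT(DISTINCT ORDER_ID)` on the copy is a quick check that one row per key remains. Writing to a separate table leaves the Datastream-managed table untouched.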

2. Configuring Datastream to Ignore Deletes:

Disable Delete Handling: Adjusting the Datastream configuration to disable delete handling can prevent Datastream from applying delete operations to the target BigQuery table.

How can we implement the above operation? Can you provide a step-by-step implementation?

There are two approaches you can take to solve this issue:

Preventing Delete Operations:

  1. Access Datastream Configuration: Navigate to the Datastream section in the Google Cloud Console.

  2. Select Data Stream: Choose the stream that handles data migration from Oracle to BigQuery.

  3. Edit Stream Configuration: Click on the "Edit" button to modify the stream's settings.

  4. Locate Delete Handling Options: Identify options related to handling delete operations.

  5. Disable Delete Handling: Check the appropriate box or select the option to ignore delete operations.

  6. Save and Restart Stream: Save the changes and restart the stream for the new settings to take effect.
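To make clear what "ignoring deletes" means for the target table, here is a conceptual Python model of applying change events to a table keyed by primary key. It is not Datastream's API or configuration; the `ChangeEvent` type, the operation labels and the `ignore_deletes` flag are hypothetical. It shows two things: with deletes ignored, a deleted source row stays in the target; and when changes are applied by key, replaying the same event does not create a second row.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                               # hypothetical labels: "INSERT", "UPDATE", "DELETE"
    key: Any                              # primary-key value of the source row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(events: Iterable[ChangeEvent],
                  ignore_deletes: bool = False) -> Dict[Any, Dict[str, Any]]:
    """Apply change events to an in-memory table keyed by primary key."""
    table: Dict[Any, Dict[str, Any]] = {}
    for event in events:
        if event.op in ("INSERT", "UPDATE"):
            # Upsert by key: the latest image replaces any earlier one,
            # so a replayed event never adds a second row.
            table[event.key] = event.row
        elif event.op == "DELETE":
            if not ignore_deletes:
                table.pop(event.key, None)
            # With ignore_deletes=True the row simply stays in the target.
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return table


if __name__ == "__main__":
    events = [
        ChangeEvent("INSERT", 1, {"ID": 1, "STATUS": "new"}),
        ChangeEvent("INSERT", 2, {"ID": 2, "STATUS": "new"}),
        ChangeEvent("UPDATE", 1, {"ID": 1, "STATUS": "shipped"}),
        ChangeEvent("UPDATE", 1, {"ID": 1, "STATUS": "shipped"}),  # replayed event
        ChangeEvent("DELETE", 2),
    ]
    print(apply_changes(events))                       # row 2 is removed
    print(apply_changes(events, ignore_deletes=True))  # row 2 is kept
```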

Handling Duplicates with Dataflow and UDF:

  1. Create Duplicate Detection UDF: Develop a JavaScript UDF that identifies and filters out duplicate records.

  2. Upload UDF to Cloud Storage: Upload the UDF file to an accessible Cloud Storage location.

  3. Set Up Dataflow Job: Create a Dataflow job that reads data from the Datastream output.

  4. Configure Dataflow Job: Instruct the Dataflow job to apply the UDF for duplicate detection.

  5. Specify Dataflow Output: Configure the Dataflow job to write the processed (duplicate-free) data to BigQuery.

  6. Deploy Dataflow Job: Launch the Dataflow job to start removing duplicates and processing the data.
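The duplicate-removal step in this second approach comes down to keeping one record per primary key. The Python sketch below shows only that logic; it is not the JavaScript UDF itself. UDFs for the Google-provided Dataflow templates are typically applied to one record at a time, so deduplicating across records generally needs a group-by-key step in the pipeline or a follow-up query in BigQuery. The function name `dedupe_latest` and the field names `ID` and `ts` are hypothetical; in practice the key would be the Oracle table's primary key and the ordering field a change timestamp.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def dedupe_latest(records: Iterable[Dict[str, Any]],
                  key_fields: Sequence[str],
                  ts_field: str) -> List[Dict[str, Any]]:
    """Keep one record per key: the one with the largest ts_field value.
    On a timestamp tie, the record seen first is kept."""
    latest: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        current = latest.get(key)
        if current is None or rec[ts_field] > current[ts_field]:
            latest[key] = rec
    # Output order follows the first appearance of each key.
    return list(latest.values())


if __name__ == "__main__":
    rows = [
        {"ID": 1, "NAME": "a", "ts": 100},
        {"ID": 1, "NAME": "a2", "ts": 200},
        {"ID": 2, "NAME": "b", "ts": 150},
        {"ID": 1, "NAME": "a2", "ts": 200},  # exact duplicate
    ]
    print(dedupe_latest(rows, ["ID"], "ts"))
```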

I can't find the option to disable delete handling in Datastream. Can you please elaborate on that?