Hey guys!
I have an active connection from Datastream to Bigquery, but when changing or deleting records in my source, Bigquery is replicating this. I would like to use the change stream method, keeping the change history in the BigQuery table.
Is it possible to do this with a connection between Datastream and BigQuery?
Implementing a CDC workflow using Datastream and BigQuery allows you to capture and preserve historical data changes from your source databases in near real-time. Here's a detailed breakdown:
Prerequisites
Steps
Datastream CDC:
BigQuery Schema Design for CDC:
_CHANGE_TYPE
Column: Add a column (e.g., _CHANGE_TYPE
) to your BigQuery table to indicate the type of change ('INSERT', 'UPDATE', 'DELETE').change_timestamp
) to record the time of each change.Data Transformation for CDC:
_CHANGE_TYPE
and change_timestamp
columns based on the changes received from Datastream. This can be done in several ways:
Querying CDC Data:
SELECT *
FROM my_cdc_table
WHERE record_id = 123
AND _CHANGE_TYPE = 'UPDATE'
ORDER BY change_timestamp;
Important Considerations:
Monitoring and Maintenance: