I would like to know whether there is a standard Pub/Sub to BigQuery Dataflow template that inserts a record if it is new and updates it if it already exists. If not, what is the recommended approach: a UDF or a custom template? Does anyone have any examples?
Unfortunately, there is no standard Pub/Sub to BigQuery Dataflow template that directly supports an insert-or-update (upsert) use case. However, you have a couple of options depending on your specific requirements:
Using a Custom Dataflow Template:
Using BigQuery's Merge Statement:
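As a rough illustration of the MERGE approach, here is a minimal sketch that builds an upsert statement. It assumes the Pub/Sub messages have already been landed in a staging table; the table names, column names, and the `build_merge_sql` helper are all hypothetical. Because Pub/Sub can deliver a message more than once, the sketch keeps only the newest staging row per key before merging.

```python
def build_merge_sql(target, staging, key_cols, value_cols, order_col):
    """Return a BigQuery MERGE statement that upserts staging rows into target.

    All arguments are illustrative: table names are fully qualified strings,
    key_cols identify a record, value_cols are overwritten on a match, and
    order_col decides which duplicate staging row is the newest.
    """
    on_clause = " AND ".join(f"T.{c} = S.{c}" for c in key_cols)
    set_clause = ", ".join(f"{c} = S.{c}" for c in value_cols)
    all_cols = list(key_cols) + list(value_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"S.{c}" for c in all_cols)
    partition = ", ".join(key_cols)
    return (
        f"MERGE `{target}` T\n"
        f"USING (\n"
        f"  SELECT * FROM `{staging}`\n"
        f"  WHERE TRUE\n"
        # Keep one row per key so each target row matches at most one source row.
        f"  QUALIFY ROW_NUMBER() OVER (PARTITION BY {partition} "
        f"ORDER BY {order_col} DESC) = 1\n"
        f") S\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN\n"
        f"  UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN\n"
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical tables and columns, for illustration only.
sql = build_merge_sql(
    target="my_project.my_dataset.customers",
    staging="my_project.my_dataset.customers_staging",
    key_cols=["id"],
    value_cols=["name", "updated_at"],
    order_col="updated_at",
)
print(sql)

# To execute it, one option is the google-cloud-bigquery client:
#   from google.cloud import bigquery
#   bigquery.Client().query(sql).result()
```

The statement could be run on a schedule (for example as a scheduled query) after the streaming pipeline has appended new rows to the staging table. How often to run it depends on how fresh the target table needs to be.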
Recommendation:
Thanks @ms4446 for the response. In the DataflowTemplates GitHub repository, I see PubSubToBigQuery under v1 and PubSubCdcToBigQuery under v2. Is the v2 PubSubCdcToBigQuery template meant to update existing records in BigQuery?
The PubSubCdcToBigQuery template under v2 in Google Cloud's DataflowTemplates is specifically designed to handle Change Data Capture (CDC) events from Pub/Sub and update existing records in BigQuery. This template facilitates maintaining a near-real-time replica of your data in BigQuery, streamlining the process without the need for custom scripts or complex workflows.
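To make the idea of applying CDC events to a replica concrete, here is a small conceptual sketch in plain Python. It is not the template's implementation: the event shape (an `op` field plus a `data` row), the `id` key, and the `apply_cdc_events` helper are assumptions chosen for illustration. It only shows how a stream of insert, update, and delete events folds into the current state of a table keyed by primary key.

```python
def apply_cdc_events(replica, events, key="id"):
    """Fold CDC events into a dict that maps primary key -> current row.

    The event format is hypothetical: {"op": "insert"|"update"|"delete",
    "data": {...row fields, including the key...}}.
    """
    for event in events:
        row = event["data"]
        pk = row[key]
        if event["op"] == "delete":
            # Remove the row if present; ignore deletes for unknown keys.
            replica.pop(pk, None)
        else:
            # Insert a new row, or overlay changed fields on the existing one.
            replica[pk] = {**replica.get(pk, {}), **row}
    return replica


events = [
    {"op": "insert", "data": {"id": 1, "name": "Ada", "city": "London"}},
    {"op": "update", "data": {"id": 1, "city": "Paris"}},
    {"op": "insert", "data": {"id": 2, "name": "Bob", "city": "Oslo"}},
    {"op": "delete", "data": {"id": 2}},
]
replica = apply_cdc_events({}, events)
print(replica)
```

In this sketch events are assumed to arrive in order. A real pipeline would also need an ordering field (such as a commit timestamp or sequence number) to handle out-of-order or duplicate delivery.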
Key Differences Between the Two Templates:
PubSubToBigQuery (v1):
PubSubCdcToBigQuery (v2):
Use Cases:
Additional Considerations:
Helpful Resources: