
Dataflow Pipeline BigTable Change Streams to Pub/Sub

I have been testing a Dataflow pipeline using the Bigtable change streams to Pub/Sub template. While filling out the template, I noticed that some options are greyed out, specifically the at-least-once streaming mode. Why is that? From what I have read in the docs here: https://cloud.google.com/dataflow/docs/guides/streaming-modes my use case should be supported, at least in theory. The docs specifically mention "change data capture." Here is the direct quote about at-least-once use cases:

  • Map-only pipelines with no aggregations. Examples include log processing, change data capture, or extract, transform, and load (ETL) jobs, in which the pipeline performs only per-element transforms, such as schema translation.


Furthermore, I don't see any reason why the template code would fail for at least once delivery https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/main/v2/googlecloud-to-googlecloud/src.... The validation logic seems to allow for it. 

Is there a reason I'm missing for why the option is greyed out? Any details are appreciated.

Thanks in advance.

 

3 REPLIES

Thanks for bringing this issue to our attention! 

Can you confirm if your Pub/Sub topic subscription has exactly-once delivery enabled?

 

Why would the subscription settings matter? The fields for setting up a Dataflow job do not ask for an active subscription, only a topic. The topic itself doesn't have any setting for delivery method as far as I can tell. A subscription, however, can be configured for exactly-once delivery.

Yeah, you're right. The Pub/Sub topic subscription delivery method doesn't matter here. 

We've looked into this and found that the Dataflow at-least-once streaming mode is a newly released feature, so it hasn't been enabled for all Dataflow templates yet. We expect at-least-once streaming mode to be enabled for the Bigtable change streams Dataflow templates by the end of March 2024. In the meantime, you can follow the instructions at https://cloud.google.com/dataflow/docs/guides/streaming-modes#set-template-streaming-mode to configure it manually.
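
For anyone who wants to try the manual route before the console option is enabled, here is a minimal sketch of launching the template from the command line with at-least-once mode. It assumes the mode is requested through the `streaming_mode_at_least_once` experiment passed via `--additional-experiments` on `gcloud dataflow flex-template run`, as the linked guide describes for templates. The job name, region, template path, and parameter names below are placeholders I made up for illustration, so check them against the template's own documentation before running anything.

```python
import shlex


def build_launch_command(job_name, region, template_gcs_path, parameters):
    """Build the argv for launching a Flex Template in at-least-once mode.

    The function only assembles the command; it does not run it.
    """
    # gcloud expects template parameters as a single comma-separated key=value list.
    params = ",".join(f"{key}={value}" for key, value in sorted(parameters.items()))
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--region={region}",
        f"--template-file-gcs-location={template_gcs_path}",
        f"--parameters={params}",
        # Requests at-least-once streaming mode instead of the default exactly-once.
        "--additional-experiments=streaming_mode_at_least_once",
    ]


# All values below are hypothetical placeholders, not verified settings.
cmd = build_launch_command(
    job_name="bigtable-cdc-to-pubsub",
    region="us-central1",
    template_gcs_path="gs://YOUR_TEMPLATE_BUCKET/PATH/TO/TEMPLATE_SPEC",
    parameters={
        "bigtableReadInstanceId": "my-instance",  # assumed parameter name
        "bigtableReadTableId": "my-table",        # assumed parameter name
        "pubSubTopic": "my-topic",                # assumed parameter name
    },
)

# Print a copy-pasteable shell command for review before running it.
print(shlex.join(cmd))
```

Printing the command rather than executing it lets you review the flags first. Once the values look right, you can paste the output into a shell or pass `cmd` to `subprocess.run`.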