Read messages from Pub/Sub into GCS Parquet files using a Dataflow streaming template

Team,

Is there a default template available for consuming messages from a Pub/Sub topic and writing them to Parquet files on GCS using Dataflow in streaming mode?

Thanks,

Samy


I'm going to say "not explicitly". My thinking is that Parquet can't be a true streaming destination. Parquet is a file format for a collection of records and, to the best of my knowledge, one can't "append" a new record to an existing Parquet file. That makes streaming record by record into Parquet a poor fit. What one might do instead is stream to an appendable target, such as BigQuery or a GCS text file, and then periodically export or convert the content to Parquet. Alternatively, one might be able to stream if one sets a fixed window period (say 10 minutes, 30 minutes, 1 hour or 1 day), processes all the messages that arrive on the topic within each window, and writes each window's messages to its own distinct Parquet file.
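
To make the windowing idea a little more concrete, here is a minimal sketch in plain Python of how messages might be bucketed into fixed 30-minute windows, with one output file name per window. It is only an illustration of the grouping logic, not Dataflow or Beam code: the function names, the `output/events` prefix and the file naming scheme are all assumptions of mine, and in a real Dataflow job the runner's own fixed-window support would do this grouping and a Parquet writer would produce the files.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 30 * 60  # assumed window length: 30 minutes


def window_start(event_ts: float, size: int = WINDOW_SECONDS) -> int:
    """Return the epoch second at which the fixed window containing event_ts begins."""
    return int(event_ts // size) * size


def group_by_window(messages, size: int = WINDOW_SECONDS):
    """Group (timestamp, payload) pairs by the fixed window they fall into."""
    windows = defaultdict(list)
    for event_ts, payload in messages:
        windows[window_start(event_ts, size)].append(payload)
    return dict(windows)


def output_name(start: int, prefix: str = "output/events") -> str:
    """Build a hypothetical per-window file name; each window gets its own file."""
    stamp = datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}-{stamp}.parquet"


if __name__ == "__main__":
    # Three sample messages: two land in the first window, one in the second.
    sample = [(0.0, "a"), (1799.0, "b"), (1800.0, "c")]
    for start, payloads in sorted(group_by_window(sample).items()):
        print(output_name(start), payloads)
```

The point of the sketch is that each window closes before its file is written, so no file ever needs to be appended to, which is what sidesteps the Parquet limitation described above.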

Thanks, Kolban. A periodic process, such as one that runs every 30 minutes, will work for me.