
Kafka to Parquet

I am trying to run a pipeline that writes Kafka messages to a Parquet file, but it fails with the error 'ValueError: GroupByKey cannot be applied to an unbounded PCollection with global windowing and a default trigger'. A non-global window is defined in the pipeline, though. Can you help me find a solution?

Here is the pipeline:

import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.io.kafka import ReadFromKafka
import pyarrow as pa

kafka_pipeline = (
    p
    | 'read kafka' >> ReadFromKafka(
        consumer_config={'bootstrap.servers': '11.11.111.11:9094',
                         'auto.offset.reset': 'earliest'},
        topics=['test_topic'])
    | 'Fixed window 5s' >> beam.WindowInto(window.FixedWindows(5))
    | 'write' >> beam.io.WriteToParquet(
        file_path_prefix=file_path,
        schema=pa.schema([('key', pa.string())])))
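For reference, since the error message mentions "a default trigger", here is what the windowing step looks like with an explicit trigger declared. This is only a guess on my part: the AfterWatermark trigger and DISCARDING accumulation mode are my assumptions, and I have not confirmed that this is enough to let WriteToParquet handle the unbounded input.

from apache_beam.transforms import trigger

# ... rest of the pipeline unchanged, only the WindowInto step modified:
    | 'Fixed window 5s' >> beam.WindowInto(
        window.FixedWindows(5),
        # Fire when the watermark passes the end of each 5s window,
        # instead of relying on the default trigger, and discard
        # already-fired panes.
        trigger=trigger.AfterWatermark(),
        accumulation_mode=trigger.AccumulationMode.DISCARDING)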
