
Dataflow provisioning time and optimization

We are using Dataflow for small batch workloads, and on our future roadmap we want to enable streaming workloads as well. We trigger the jobs from Python-based microservices. Below are a few queries where we need assistance.

1) Dataflow is taking a minimum of 4 minutes to provision workers, even with the following configuration: a minimal machine type, a custom SDK container image, and all resources confined to a single region within the same project.

Can you please suggest how this provisioning time can be further reduced?
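For reference, the options we pass when launching a job look roughly like this (a minimal sketch; the flag names are standard Apache Beam DataflowRunner options, but the project, bucket, and image values below are placeholders, not our real ones):

```python
# Sketch of the pipeline options our Python microservice passes when
# triggering a batch Dataflow job. Project, bucket, and image names are
# placeholders for illustration only.
def build_dataflow_args(job_name: str) -> list[str]:
    return [
        f"--job_name={job_name}",
        "--runner=DataflowRunner",
        "--project=my-project",                    # placeholder project id
        "--region=us-central1",                    # single region, same as the data
        "--machine_type=n1-standard-1",            # minimal machine type
        "--sdk_container_image=gcr.io/my-project/beam-sdk:latest",  # custom SDK image
        "--num_workers=1",
        "--staging_location=gs://my-bucket/staging",
        "--temp_location=gs://my-bucket/temp",
    ]

args = build_dataflow_args("batch-job-001")
print(args[1])  # --runner=DataflowRunner
```

Even with these settings the startup time stays around 4 minutes, so we suspect the bottleneck is the worker VM boot and container pull rather than our configuration.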

2) Can we configure Dataflow so that workers are not terminated after processing but instead stay idle like listeners and run workloads based on an event? Alternatively, can a single Dataflow job process multiple workloads sequentially on the same provisioned instances?

3) Can we pre-provision the resources periodically (warm start) and process the data once it becomes available?
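To make the warm-start idea concrete, this is roughly the scheduling logic we have in mind (a hedged sketch only: `should_prelaunch` is a hypothetical helper, and the 5-minute buffer is an assumption based on the ~4-minute provisioning time we observe):

```python
import datetime

# Warm-start sketch: pre-launch the job a few minutes before the expected
# data-arrival window so workers are already provisioned when data lands.
# The actual launch call would be our existing microservice trigger.
PROVISIONING_BUFFER = datetime.timedelta(minutes=5)  # covers ~4 min observed startup

def should_prelaunch(now: datetime.datetime,
                     expected_arrival: datetime.datetime,
                     already_running: bool) -> bool:
    """Return True once we are inside the buffer window and no job is running."""
    return not already_running and expected_arrival - now <= PROVISIONING_BUFFER

# Example: data expected at 10:00, checked at 09:56 -> inside the buffer.
now = datetime.datetime(2024, 1, 1, 9, 56)
arrival = datetime.datetime(2024, 1, 1, 10, 0)
print(should_prelaunch(now, arrival, already_running=False))  # True
```

Is something along these lines supported natively, or would we have to build this scheduling loop ourselves around the Dataflow API?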

Thanks in advance for the help.
