
Batch task scheduling inefficiency

Hi, I noticed an inefficiency in Batch task scheduling. Suppose I have 14 tasks and each machine runs 4 tasks. Batch schedules only 12 tasks to run in parallel and runs the remaining 2 tasks only after those 12 have completed.

So in total the job takes 2*per_task_time to finish instead of just 1*per_task_time (which is what it would take if all 14 tasks were scheduled to run at the same time).
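To make the arithmetic concrete, here is a minimal sketch of the behavior described above. It is only a model of what I observe, not Batch's actual scheduling logic, and the function names are hypothetical. It compares provisioning only fully packed machines against rounding the machine count up.

```python
import math


def waves_with_full_machines_only(task_count: int, tasks_per_machine: int) -> int:
    """Sequential waves needed if only fully packed machines are provisioned.

    Assumption: models the observed behavior, not Batch's real scheduler.
    """
    full_machines = task_count // tasks_per_machine
    if full_machines == 0:
        return 1  # fewer tasks than one machine holds: a single wave
    slots = full_machines * tasks_per_machine
    return math.ceil(task_count / slots)


def waves_with_rounded_up_machines(task_count: int, tasks_per_machine: int) -> int:
    """Sequential waves needed if one extra, partially filled machine is allowed."""
    machines = math.ceil(task_count / tasks_per_machine)
    slots = machines * tasks_per_machine
    return math.ceil(task_count / slots)


if __name__ == "__main__":
    tasks, per_machine = 14, 4
    observed = waves_with_full_machines_only(tasks, per_machine)
    expected = waves_with_rounded_up_machines(tasks, per_machine)
    print(f"observed: {observed} * per_task_time")
    print(f"expected: {expected} * per_task_time")
```

With 14 tasks and 4 tasks per machine, the first function models 3 machines (12 slots) followed by a second wave for the last 2 tasks, while the second models 4 machines and a single wave.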

Can the team help me look into how to resolve this? Thanks!


What value do you have set for the `parallelism` parameter in the job configuration file? It controls how many tasks can run concurrently. Have you tried increasing it?
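For reference, a rough sketch of where `parallelism` sits in a job configuration, built here as a Python dict and printed as JSON. The values are illustrative assumptions that mirror the 14-task example, not your actual configuration, and the script runnable is a placeholder.

```python
import json

# Illustrative values only (assumption): 14 tasks, 4 tasks per VM,
# and parallelism high enough to allow every task to run at once.
job_config = {
    "taskGroups": [
        {
            "taskCount": 14,         # total number of tasks in the group
            "parallelism": 14,       # maximum number of tasks running concurrently
            "taskCountPerNode": 4,   # maximum number of tasks per VM
            "taskSpec": {
                "runnables": [
                    # Placeholder workload; replace with the real runnable.
                    {"script": {"text": "echo task ${BATCH_TASK_INDEX}"}}
                ]
            },
        }
    ]
}

print(json.dumps(job_config, indent=2))
```

If `parallelism` is lower than `taskCount`, the remaining tasks wait for a running task to finish, so that is the first thing worth ruling out.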

My parallelism is set to 600, so I do not think that is relevant. I think the issue is that Batch tries to saturate each machine. If I have 4n+2 tasks and each machine can run 4 tasks, it always leaves 2 tasks until later. I have encountered several similar situations.

Can you share your job configuration? 

If quota/capacity is available, Batch should run all tasks in parallel in this case. If you have a recent example, could you send the job UID to gcp-batch-preview@google.com, along with a link to this post, so we can investigate?