Hi, I noticed an inefficiency in Batch task scheduling. Suppose I have 14 tasks and each machine runs 4 tasks: Batch schedules only 12 tasks to run in parallel and starts the remaining 2 only after those 12 have completed.
So in total the job takes 2*per_task_time to finish, instead of just 1*per_task_time if all 14 tasks could be scheduled to run at the same time.
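To make the arithmetic concrete, here is a minimal Python sketch, assuming every task takes the same time and each machine runs at most `tasks_per_machine` tasks at once. The `makespan` function and its parameters are hypothetical illustrations, not part of the Batch API or a model of its actual scheduler.

```python
import math


def makespan(task_count: int, machines: int, tasks_per_machine: int,
             per_task_time: float) -> float:
    """Wall-clock time when equal-length tasks run in waves.

    Simplified model (an assumption): every task takes per_task_time and
    each machine runs at most tasks_per_machine tasks concurrently.
    """
    slots = machines * tasks_per_machine
    if slots <= 0:
        raise ValueError("need at least one concurrent task slot")
    # Number of rounds needed to drain the queue of tasks.
    waves = math.ceil(task_count / slots)
    return waves * per_task_time


# 3 machines x 4 slots = 12 concurrent tasks: 2 of the 14 wait for a second wave.
print(makespan(14, machines=3, tasks_per_machine=4, per_task_time=1.0))  # 2.0
# 4 machines x 4 slots = 16 concurrent tasks: all 14 finish in one wave.
print(makespan(14, machines=4, tasks_per_machine=4, per_task_time=1.0))  # 1.0
```

In this model, one extra machine is enough to halve the total runtime for 14 tasks.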
Could the team take a look at how to resolve this? Thanks!
What value do you have set for the `parallelism` parameter in the job configuration file? It controls how many tasks can run concurrently. Have you tried increasing it?
My parallelism is set to 600, so I don't think that is the cause. I think the issue is that Batch tries to fully saturate each machine: if I have 4n+2 tasks and each machine can run 4 tasks, it always leaves the last 2 tasks until later. I have run into several similar situations.
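The hypothesis above can be checked with a quick Python sketch. `saturating_machines` models the suspected behavior (provision only fully packed machines, i.e. round down) and `machines_needed` models the expected one (round up so every task gets a slot). Both are hypothetical helpers for illustration, not Batch's actual provisioning logic.

```python
def machines_needed(task_count: int, tasks_per_machine: int) -> int:
    """Machines required to run every task at once (ceiling division)."""
    return -(-task_count // tasks_per_machine)


def saturating_machines(task_count: int, tasks_per_machine: int) -> int:
    """Hypothesized behavior: only machines that can be fully packed (floor)."""
    return task_count // tasks_per_machine


TASKS_PER_MACHINE = 4
for n in range(1, 4):
    tasks = 4 * n + 2
    full = saturating_machines(tasks, TASKS_PER_MACHINE)
    left_over = tasks - full * TASKS_PER_MACHINE
    print(tasks, full, machines_needed(tasks, TASKS_PER_MACHINE), left_over)
# Prints:
# 6 1 2 2
# 10 2 3 2
# 14 3 4 2
```

For any 4n+2 tasks, rounding down gives n machines and always strands exactly 2 tasks, while rounding up gives n+1 machines and runs everything in one wave.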
Can you share your job configuration?
If quota/capacity is available, Batch should run all tasks in parallel in this case. If you have a recent example, could you send the job UID to gcp-batch-preview@google.com with a link to this post so we can investigate?