Is there any information on whether or when more advanced scheduling options might become available?
I am particularly interested in fair-share scheduling (https://aws.amazon.com/blogs/hpc/introducing-fair-share-scheduling-for-aws-batch/) and gang scheduling.
Google Batch does not support fair-share scheduling. We expect to add it, but there is no specific timeline yet. You should be able to simulate gang scheduling with a "barrier" runnable in your tasks: every task must reach the barrier before any of them moves forward.
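To illustrate the barrier approach, here is a minimal sketch in Python that builds a job config with a barrier between a setup step and the main work. The field names (`taskGroups`, `taskCount`, `parallelism`, `taskSpec`, `runnables`, `script`, `barrier`) follow the Batch v1 job JSON as I understand it, so please check them against the current reference. The helper `build_gang_job`, the barrier name, and the example commands are hypothetical, and setting `parallelism` equal to `taskCount` is my assumption about what the barrier needs.

```python
import json


def build_gang_job(task_count: int, setup_cmd: str, work_cmd: str) -> dict:
    """Return a job config dict whose tasks all wait at one shared barrier."""
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                # Assumption: every task must be able to run at the same time,
                # otherwise tasks waiting at the barrier could block the rest.
                "parallelism": task_count,
                "taskSpec": {
                    "runnables": [
                        # Per-task preparation, runs before the barrier.
                        {"script": {"text": setup_cmd}},
                        # No task continues until all tasks reach this point.
                        {"barrier": {"name": "all-tasks-ready"}},
                        # Main work, starts only after the barrier is passed.
                        {"script": {"text": work_cmd}},
                    ]
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be saved to a file and submitted.
    print(json.dumps(build_gang_job(4, "echo setup", "echo work"), indent=2))
```

The printed JSON can be saved to a file and submitted with `gcloud batch jobs submit`, passing the file through `--config`. A real job would usually also need an allocation policy and logging settings, which this sketch leaves at their defaults.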
OK, that makes sense. When simulating gang scheduling, is there any way to restart all tasks, or to fail the job, if a single task fails?
Batch does not restart jobs or the tasks within a job. The usual way to retry is to delete the job and resubmit it. There is a trick, though, if you want the job to fail when a single task fails. If you put a barrier runnable at the end of each task, the system will reason that the barrier can no longer be reached once one task has failed, and it will fail the job. I haven't tried this personally, but it should work that way in theory. Let us know if that works for you.
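The trailing-barrier trick could be sketched as follows. This is only an illustration of where the barrier would go, not a tested recipe: whether the job actually fails on a single task failure is unverified, as noted above. The helper `with_trailing_barrier` and the barrier name are hypothetical, and the `barrier` runnable shape is assumed from the Batch v1 job JSON.

```python
import copy


def with_trailing_barrier(job_config: dict, name: str = "all-tasks-done") -> dict:
    """Return a copy of job_config with a barrier appended to every task group."""
    result = copy.deepcopy(job_config)  # leave the caller's config untouched
    for group in result.get("taskGroups", []):
        runnables = group.setdefault("taskSpec", {}).setdefault("runnables", [])
        # The barrier is the last runnable, so a task that fails earlier
        # never reaches it and the remaining tasks cannot pass it.
        runnables.append({"barrier": {"name": name}})
    return result


if __name__ == "__main__":
    job = {
        "taskGroups": [
            {
                "taskCount": 4,
                "parallelism": 4,
                "taskSpec": {"runnables": [{"script": {"text": "echo work"}}]},
            }
        ]
    }
    print(with_trailing_barrier(job))
```

If the job does fail as expected, retrying still means deleting the job and resubmitting it.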