Hello, I have a perfectly parallel task that I want to move to the cloud. It involves a function that takes around an hour to complete, and I would like to run this function around 1000 times in parallel.
Does anyone have any suggestions for how to approach this problem? I have tried using Ray clusters to no avail. Which Google Cloud product can accomplish this most easily?
Again, each job is independent of the others; they do not need to know anything beyond the fact that a job has left the queue.
Any tutorials or documentation I should read would be greatly appreciated!
For anyone else trying to build a lot of models at once with a custom framework: I got this working using custom containers and custom jobs on Vertex AI.
You just need to dockerize your scripts and push the image to Artifact Registry; then you can run the jobs much as you would with AutoML.
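For the dockerizing step, a minimal image just needs your script and its dependencies as the container entrypoint. This is a sketch, not my exact setup; the file names (`train.py`, `requirements.txt`) and registry path are placeholders you'd swap for your own:

```dockerfile
# Minimal training container for a Vertex AI custom job.
# Placeholder names: train.py, requirements.txt.
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .

# Vertex AI runs this command in each job's container.
ENTRYPOINT ["python", "train.py"]
```

Build and push it to Artifact Registry with `docker build` and `docker push` (the repository must exist first, e.g. `us-central1-docker.pkg.dev/PROJECT/REPO/trainer:latest`).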
This helped me: https://cloud.google.com/vertex-ai/docs/training/create-custom-container
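To fan the work out ~1000 ways, you submit one single-replica custom job per task. Here's a hedged sketch of building the `worker_pool_specs` payloads in Python; the image URI, machine type, and `--task-index` flag are assumptions standing in for your own container and arguments:

```python
# Sketch: build one single-replica worker-pool spec per independent task.
# The image URI and the --task-index flag are placeholders for illustration.

def build_worker_pool_spec(image_uri, args, machine_type="n1-standard-4"):
    """Return the worker_pool_specs payload for one single-replica job."""
    return [{
        "machine_spec": {"machine_type": machine_type},
        "replica_count": 1,  # each job is one independent worker
        "container_spec": {"image_uri": image_uri, "args": args},
    }]

# One spec per task; each becomes its own CustomJob, so the jobs stay
# fully independent of one another.
specs = [
    build_worker_pool_spec(
        "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest",
        args=["--task-index", str(i)],
    )
    for i in range(1000)
]
```

With the `google-cloud-aiplatform` SDK, each spec can then be launched non-blocking with something like `aiplatform.CustomJob(display_name=f"task-{i}", worker_pool_specs=spec).submit()` after `aiplatform.init(project=..., location=...)`. Check your region's quota for concurrent custom job CPUs before submitting all 1000 at once.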