
Perfectly parallel Python jobs

Hello, I have a perfectly parallel task that I want to move to the cloud. It involves a function that takes around an hour to complete, and I would like to run this function around 1,000 times in parallel.

Does anyone have any suggestions for how to approach this problem? I have tried using Ray clusters to no avail. Which Google Cloud product can accomplish this most easily?

Again, each job is agnostic of the others; they do not need to know anything besides the fact that a job has left the queue.

Any tutorials or documentation I should read would be greatly appreciated! 


5 REPLIES

RC1

@blaise 

What does that function normally do? Does it involve Spark?

Approach 1 

If Spark is involved, you can use a Dataproc cluster or Dataproc Serverless to deploy your job, run it to completion, and then destroy the cluster. Here Spark itself handles the distributed computing, provided you give it a decent cluster.
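A minimal PySpark sketch of what that fan-out could look like, assuming a hypothetical `train_one(task_id)` wrapper around your hour-long function (not anything from the original post):

```python
# Hypothetical sketch: fan 1,000 independent runs of a long-running function
# out across a Dataproc cluster's executors using PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-training").getOrCreate()
sc = spark.sparkContext

def train_one(task_id: int) -> str:
    # Placeholder for the real ~1 hour training function; return something
    # small (a GCS path, a metric) so the driver only collects a summary.
    return f"task {task_id} done"

# One slice per task so each executor core picks up one job at a time.
results = sc.parallelize(range(1000), numSlices=1000).map(train_one).collect()
print(len(results), "tasks finished")
```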

Approach 2
You can also use a multi-core Compute Engine instance and run your task on it. But here you have to use a multithreading or multiprocessing library and handle the distribution yourself: the compute is handled by Compute Engine, but the work distribution is done by you.
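A minimal sketch of that do-it-yourself option with the standard library's `multiprocessing` module, again assuming a hypothetical `train_one(task_id)` wrapper around the real function:

```python
# Hypothetical sketch: spread the tasks across all cores of a single
# multi-core Compute Engine VM with multiprocessing.
import multiprocessing as mp

def train_one(task_id: int) -> str:
    # Placeholder for the real ~1 hour training function.
    return f"task {task_id} done"

if __name__ == "__main__":
    # Note: with ~1,000 one-hour tasks, wall-clock time is roughly
    # 1000 / cpu_count() hours, so a single VM only scales so far.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        for result in pool.imap_unordered(train_one, range(1000)):
            print(result)
```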

By the way, what exactly does your function do, and where exactly does it spend most of its time? Without knowing your use case, it's difficult to suggest a good/optimized solution.

No Spark involved, just Python. The function does some ML training on a given dataset. I would prefer not to use a multiprocessing library and distribute the tasks myself.

I think you need to check this out => https://cloud.google.com/ai-platform/training/docs/overview

1) Here you can use a Compute Engine instance with a GPU so that the training is distributed by the ML libraries that you use.

2) You can also check out Colab Pro notebooks.

3) https://cloud.google.com/ai-platform/training/docs/overview#distributed_training_structure

https://cloud.google.com/ai-platform/docs/technical-overview

https://towardsdatascience.com/how-to-train-machine-learning-models-in-the-cloud-using-cloud-ml-engi...

Cool, thanks for those resources. Digging into Vertex AI and AI Platform: do you know if it's possible to use my own proprietary tuning logic? For example, if I have a custom package built on top of sklearn, can I run that in its own container, or do I need to use Google's frameworks?

For anyone else trying to build a lot of models at once with a custom framework: I got this working using custom containers and custom jobs on Vertex AI.

You just need to dockerize your scripts and push them to Artifact Registry; then you can go about running the jobs mostly as you would with AutoML.

This helped me: https://cloud.google.com/vertex-ai/docs/training/create-custom-container
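As an illustration, here is a minimal sketch of submitting one such custom-container job with the `google-cloud-aiplatform` SDK; the project, region, bucket, and image URI are placeholders, and you would submit many jobs (e.g. in a loop with different args) to get the 1,000-way fan-out:

```python
# Hypothetical sketch: submit one custom-container training job to Vertex AI.
# Launch many of these (with different args) to run the jobs in parallel.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",             # placeholder project ID
    location="us-central1",           # placeholder region
    staging_bucket="gs://my-bucket",  # placeholder staging bucket
)

job = aiplatform.CustomJob(
    display_name="parallel-training-0",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {
            # The image you dockerized and pushed to Artifact Registry.
            "image_uri": "us-central1-docker.pkg.dev/my-project/my-repo/trainer:latest",
            "args": ["--task-id", "0"],
        },
    }],
)

# sync=False returns immediately so the next job can be submitted.
job.run(sync=False)
```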