
Does Dynamic Workload Scheduler support GCE MIG setup?

Can I leverage the features of Dynamic Workload Scheduler (DWS) in a Managed Instance Group (MIG) deployment for AI/ML workloads?

1 ACCEPTED SOLUTION

In which case a MIG of size 1 could also be an approach.


11 REPLIES

Yes, what you're after is a MIG with a resize request. For more details, see this page:

https://cloud.google.com/compute/docs/instance-groups/about-resize-requests-mig
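For anyone who wants to see roughly what a resize request looks like on the wire, here is a minimal Python sketch that only builds the REST URL and JSON body, without sending anything. The helper name `build_resize_request` and the example project, zone, and MIG names are made up for illustration, and the `v1` path and the `resizeBy` / `requestedRunDuration` fields are my reading of the Compute Engine API, so check them against the page linked above before relying on them.

```python
import json

# Assumption: the v1 Compute Engine endpoint; resize requests may require a
# different API version depending on when you read this.
COMPUTE_API = "https://compute.googleapis.com/compute/v1"


def build_resize_request(project, zone, mig, name, resize_by, run_seconds=None):
    """Return (url, body) for creating a resize request on a zonal MIG.

    Hypothetical helper: it only assembles the request, it does not call the API.
    """
    if resize_by < 1:
        raise ValueError("resize_by must be at least 1")
    url = (
        f"{COMPUTE_API}/projects/{project}/zones/{zone}"
        f"/instanceGroupManagers/{mig}/resizeRequests"
    )
    body = {"name": name, "resizeBy": resize_by}
    if run_seconds is not None:
        # Optional run duration, expressed as a Duration object in seconds.
        body["requestedRunDuration"] = {"seconds": run_seconds}
    return url, body


if __name__ == "__main__":
    # Placeholder names; substitute your own project, zone, and MIG.
    url, body = build_resize_request(
        "my-project", "us-central1-a", "ml-mig", "rr-1", resize_by=1, run_seconds=3600
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The body would then be sent as an authenticated POST with whichever HTTP client or Google client library you already use.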

Thanks @alexmoore 

Hey @alexmoore, sorry to bother you again. Can I use DWS with a single GCE VM, or does that not make sense?

It can certainly make sense.  Have you looked at DWS on Google Cloud Batch too?

https://cloud.google.com/batch/docs/create-run-job-gpus

Thanks @alexmoore. What I need is an always-on mode for ML processing with occasional shutdown during non-business hours.

In which case a MIG of size 1 could also be an approach.
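To make the "size 1 MIG with shutdown outside business hours" idea concrete, here is a small Python sketch of just the scheduling decision: given a timestamp, should the group hold 1 VM or 0? The function name `target_size` and the 08:00 to 18:00 weekday window are assumptions for illustration only. How the result gets applied (for example, a scheduled job that creates a new resize request in the morning and scales the group to 0 in the evening) is left out, and with DWS the morning scale-up depends on capacity being available.

```python
from datetime import datetime, time


def target_size(now, start=time(8, 0), end=time(18, 0)):
    """Return 1 during weekday business hours, otherwise 0.

    The window is an assumed example; `now` should already be in the
    timezone that defines "business hours" for the workload.
    """
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    in_hours = start <= now.time() < end    # start inclusive, end exclusive
    return 1 if (is_weekday and in_hours) else 0


if __name__ == "__main__":
    print(target_size(datetime.now()))
```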

@alexmoore We have models deployed using Vertex AI. Since Vertex AI uses GCE VMs behind the scenes, does DWS work with Vertex AI?

@alexmoore It looks like autoscaling configuration does not work with MIG resize requests?

That's the case as of today. Improvements are coming all the time, of course, so keep an eye out for changes. What is the use case?

Hi @alexmoore, the use case is that we have a client that runs an ML inference pipeline comprising 4 models, orchestrated using Python code and deployed with a third-party inference hosting provider on A100 GPUs. The objective is to move to GCP to save costs and reduce the operational toil that comes with the current setup, and our solution was to migrate to a MIG deployment on GCP. Maybe a GKE solution would have been better?

Kueue on GKE integrates with DWS and supports a range of open source workload types, including RayJobs from Ray, which may be worth exploring if the pipeline is very Python-centric, although other options are also available.

See: https://cloud.google.com/kubernetes-engine/docs/how-to/provisioningrequest
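As a rough illustration of the Kueue route, the Python sketch below builds a plain Kubernetes Job manifest that is handed to Kueue through the `kueue.x-k8s.io/queue-name` label and created suspended, so Kueue decides when it starts. It covers only the Job side: the ClusterQueue, LocalQueue, and DWS provisioning setup described on the linked page are assumed to exist already. The helper name `build_kueue_job`, the queue name, and the image are hypothetical.

```python
import json


def build_kueue_job(name, image, local_queue, gpus=1):
    """Return a batch/v1 Job manifest (as a dict) queued through Kueue.

    Hypothetical helper; `local_queue` must name a LocalQueue that already
    exists in the target namespace.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Label Kueue uses to associate the Job with a LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": local_queue},
        },
        "spec": {
            # Created suspended; Kueue unsuspends it once it is admitted.
            "suspend": True,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; kubectl accepts JSON as well as YAML manifests.
    job = build_kueue_job("inference-batch", "example.com/inference:latest", "dws-local-queue")
    print(json.dumps(job, indent=2))
```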