Can I leverage the features of Dynamic Workload Scheduler in a Managed Instance Group (MIG) deployment for AI/ML workloads?
Yes - what you're after is a MIG with a resize request, for more details check this page here:
https://cloud.google.com/compute/docs/instance-groups/about-resize-requests-mig
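To make that concrete, here is a rough sketch of the flow from that page. All names, the zone, the instance counts, and the run duration below are placeholders, not values from this thread:

```shell
# Sketch only: names, zone, counts, and duration are placeholders.
# Create the MIG empty; DWS adds the capacity via a resize request
# once it can fulfil the whole request at once.
gcloud compute instance-groups managed create my-dws-mig \
    --zone=us-central1-a \
    --template=my-a100-template \
    --size=0

# Ask DWS for 4 instances for a bounded run duration.
gcloud compute instance-groups managed resize-requests create my-dws-mig \
    --zone=us-central1-a \
    --resize-request=my-resize-request \
    --resize-by=4 \
    --requested-run-duration=7d
```

The key idea is that the request is all-or-nothing: the VMs are only created when DWS can provision the full count together, and they run for at most the requested duration.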
Thanks @alexmoore
Hey @alexmoore Sorry to bother you again. Can I use DWS with a single GCE VM, or does that not make sense?
It can certainly make sense. Have you looked at DWS on Google Cloud Batch too?
Thanks @alexmoore . What I need is an always-on mode for ML processing, with occasional shutdowns outside business hours.
In which case a MIG of size of 1 could also be an approach.
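A hedged sketch of that single-instance pattern, with placeholder names and durations (note that DWS-provisioned VMs are deleted rather than stopped, so any state you need must live off the VM):

```shell
# Sketch, placeholder names: a MIG that normally holds one instance.
gcloud compute instance-groups managed create single-vm-mig \
    --zone=us-central1-a \
    --template=my-ml-template \
    --size=0

# Obtain the single VM through a DWS resize request for the working day.
gcloud compute instance-groups managed resize-requests create single-vm-mig \
    --zone=us-central1-a \
    --resize-request=workday-request \
    --resize-by=1 \
    --requested-run-duration=12h

# Outside business hours, shrink the group back to zero.
gcloud compute instance-groups managed resize single-vm-mig \
    --zone=us-central1-a \
    --size=0
```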
@alexmoore We have models deployed using Vertex AI. Vertex AI uses GCE VMs behind the scenes, does DWS work with Vertex AI?
@alexmoore It looks like autoscaling configuration does not work with MIG resize requests?
That's the case as of today. Of course, improvements are coming all the time, so keep an eye out for changes. What is the use case?
Hi @alexmoore The use case here is that we have a client that runs an ML inference pipeline comprising 4 models, orchestrated with Python code and deployed with a 3rd-party inference hosting provider on A100 GPUs. The objective is to move to GCP to save costs and reduce the operational toil that comes with the current setup, and our proposed solution was a MIG deployment on GCP. Maybe a GKE solution would have been better?
Kueue on GKE integrates with DWS and supports a range of open-source workload types, including RayJobs from Ray, which may be worth exploring if the pipeline is very Python-centric, although other options are also available.
See: https://cloud.google.com/kubernetes-engine/docs/how-to/provisioningrequest
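As a rough sketch of the GKE side of that page: a GPU node pool with queued provisioning enabled, which Kueue can then drive. Cluster, pool, location, machine type, and node counts here are all placeholders:

```shell
# Sketch, placeholder names: a GPU node pool using DWS queued
# provisioning. It starts at zero nodes; the autoscaler adds nodes
# only when a ProvisioningRequest (e.g. created by Kueue) is fulfilled.
gcloud container node-pools create dws-a100-pool \
    --cluster=my-cluster \
    --location=us-central1 \
    --enable-queued-provisioning \
    --accelerator="type=nvidia-tesla-a100,count=1" \
    --machine-type=a2-highgpu-1g \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=4 \
    --location-policy=ANY \
    --reservation-affinity=none
```

With this in place, Kueue holds jobs (RayJobs, batch Jobs, etc.) in a queue and admits them once DWS delivers the requested nodes, which maps fairly directly onto the 4-model pipeline described above.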