Can I leverage the features of Dynamic Workload Scheduler in a Managed Instance Group (MIG) deployment for AI/ML workloads?
Yes - what you're after is a MIG with a resize request, for more details check this page here:
https://cloud.google.com/compute/docs/instance-groups/about-resize-requests-mig
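To make that concrete, here is a rough sketch of the flow from that page. All names, the zone, the instance counts, and the run duration below are placeholders, not values from this thread:

```shell
# Sketch only: names, zone, counts, and duration are placeholders.
# Create the MIG empty; DWS adds the capacity via a resize request
# once it can fulfil the whole request at once.
gcloud compute instance-groups managed create my-dws-mig \
    --zone=us-central1-a \
    --template=my-a100-template \
    --size=0

# Ask DWS for 4 instances for a bounded run duration.
gcloud compute instance-groups managed resize-requests create my-dws-mig \
    --zone=us-central1-a \
    --resize-request=my-resize-request \
    --resize-by=4 \
    --requested-run-duration=7d
```

The key idea is that the request is all-or-nothing: the VMs are only created when DWS can provision the full count together, and they run for at most the requested duration.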
Thanks @alexmoore
Hey @alexmoore Sorry to bother you again. Can I use DWS with a single GCE VM, or does that not make sense?
It can certainly make sense. Have you looked at DWS on Google Cloud Batch too?
Thanks @alexmoore . What I need is an always-on mode for ML processing, with occasional shutdowns outside business hours.
In which case a MIG of size of 1 could also be an approach.
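A hedged sketch of that single-instance pattern, with placeholder names and durations (note that DWS-provisioned VMs are deleted rather than stopped, so any state you need must live off the VM):

```shell
# Sketch, placeholder names: a MIG that normally holds one instance.
gcloud compute instance-groups managed create single-vm-mig \
    --zone=us-central1-a \
    --template=my-ml-template \
    --size=0

# Obtain the single VM through a DWS resize request for the working day.
gcloud compute instance-groups managed resize-requests create single-vm-mig \
    --zone=us-central1-a \
    --resize-request=workday-request \
    --resize-by=1 \
    --requested-run-duration=12h

# Outside business hours, shrink the group back to zero.
gcloud compute instance-groups managed resize single-vm-mig \
    --zone=us-central1-a \
    --size=0
```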
@alexmoore We have models deployed using Vertex AI. Vertex AI uses GCE VMs behind the scenes, does DWS work with Vertex AI?
@alexmoore It looks like autoscaling configuration does not work with MIG resize requests?
That's the case as of today. Of course, improvements are coming all the time, so keep an eye out for changes. What is the use case?
Hi @alexmoore The use case here is that we have a client that runs an ML inference pipeline comprising 4 models, orchestrated with Python code and deployed with a 3rd-party inference hosting provider on A100 GPUs. The objective is to move to GCP to save costs and reduce the operational toil that comes with the current setup, and our proposed solution was a MIG deployment on GCP. Maybe a GKE solution would have been better?
Kueue on GKE integrates with DWS and supports a range of open-source workload types, including RayJobs from Ray, which may be worth exploring if the pipeline is very Python-centric, although other options are also available.
See: https://cloud.google.com/kubernetes-engine/docs/how-to/provisioningrequest
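As a rough sketch of the GKE side of that page: a GPU node pool with queued provisioning enabled, which Kueue can then drive. Cluster, pool, location, machine type, and node counts here are all placeholders:

```shell
# Sketch, placeholder names: a GPU node pool using DWS queued
# provisioning. It starts at zero nodes; the autoscaler adds nodes
# only when a ProvisioningRequest (e.g. created by Kueue) is fulfilled.
gcloud container node-pools create dws-a100-pool \
    --cluster=my-cluster \
    --location=us-central1 \
    --enable-queued-provisioning \
    --accelerator="type=nvidia-tesla-a100,count=1" \
    --machine-type=a2-highgpu-1g \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=4 \
    --location-policy=ANY \
    --reservation-affinity=none
```

With this in place, Kueue holds jobs (RayJobs, batch Jobs, etc.) in a queue and admits them once DWS delivers the requested nodes, which maps fairly directly onto the 4-model pipeline described above.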