Agreed, Cloud Run with GPU is a big plus that opens the door to running AI/ML workloads on Cloud Run. I am interested in knowing how these GPUs attach to a Cloud Run instance. We have a few open models that use NVIDIA T4 GPUs, currently served from a GPU-attached GCE VM, and we are contemplating moving to Cloud Run even though the feature is in preview. The important factor for us, however, is the ability to scale GPUs. Is it a 1:1 mapping between a Cloud Run instance and a GPU? Also, Cloud Run does not automatically scale the number of instances based on GPU utilization, which is a big disadvantage in my opinion.
In summary, I would like to know how autoscaling works on Cloud Run with a GPU attached, and how I can optimize GPU costs (remember, GPUs are expensive).
Today we don't support GPU-usage-based autoscaling; we plan to add that capability in the future. As @knet said, scaling is CPU based. However, if you want to ensure more instances are scaled out, this can be driven by the number of incoming requests: if you lower the concurrency setting of your Cloud Run service, each instance accepts fewer simultaneous requests, so requests queue up sooner and more instances are provisioned.
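To make the relationship between concurrency and instance count concrete, here is a back-of-the-envelope sketch (my own simplified model of request-based scaling, not an official Cloud Run formula; the function name and numbers are illustrative):

```python
import math

def estimated_instances(requests_per_sec: float,
                        avg_latency_sec: float,
                        concurrency: int) -> int:
    """Rough estimate of the Cloud Run instance count for steady traffic.

    Little's law gives the average number of in-flight requests
    (requests_per_sec * avg_latency_sec); the autoscaler roughly
    provisions enough instances so that each handles at most
    `concurrency` of them at once.
    """
    in_flight = requests_per_sec * avg_latency_sec
    return max(1, math.ceil(in_flight / concurrency))

# 50 req/s at 2 s average latency = ~100 in-flight requests.
print(estimated_instances(50, 2.0, 80))  # high concurrency -> few instances
print(estimated_instances(50, 2.0, 4))   # low concurrency -> many instances
```

With concurrency 80 the model yields 2 instances; dropping concurrency to 4 yields 25 instances for the same traffic, which is why tuning concurrency is an indirect lever over how many GPUs are provisioned.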
Hi @dheerajpanyam,
Yes, you are correct. You can configure one GPU per Cloud Run instance, and Cloud Run does not automatically scale the number of instances based on GPU utilization. However, Cloud Run autoscaling still applies: it is an on-demand service that automatically scales the number of instances to match workload demand.
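For reference, attaching one GPU per instance at deploy time looks roughly like this in the preview (the flag names, resource minimums, and the supported GPU type may change while the feature is in preview, so treat this as a sketch and check the current docs; SERVICE, IMAGE, and the region are placeholders):

```shell
# Deploy a Cloud Run service with one GPU per instance (preview).
# GPU services currently require CPU always allocated and a larger
# CPU/memory floor; verify the exact minimums in the current docs.
gcloud beta run deploy SERVICE \
  --image IMAGE \
  --region us-central1 \
  --gpu 1 \
  --gpu-type nvidia-l4 \
  --no-cpu-throttling \
  --cpu 4 --memory 16Gi \
  --concurrency 4 \
  --max-instances 5
```

Note that the preview exposes NVIDIA L4 rather than T4, so a move from a T4-backed GCE VM would also mean a GPU generation change.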
Yes, GPUs are expensive. However, instances of a Cloud Run service configured to use a GPU can scale down to zero when not in use, which optimizes cost efficiency. You may check the Cloud Run pricing page for reference.
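To make the scale-to-zero point concrete, here is a toy cost comparison (the hourly rate and utilization figures below are made-up placeholders, not real Cloud Run or GCE prices; use the pricing pages for actual numbers):

```python
def monthly_gpu_cost(rate_per_hour: float, billed_hours: float) -> float:
    """Cost of the GPU time actually billed over a month."""
    return rate_per_hour * billed_hours

HOURS_PER_MONTH = 730
GPU_RATE = 0.70  # placeholder $/GPU-hour; NOT a real price

# An always-on GCE VM bills the GPU 24/7, even when idle.
vm_cost = monthly_gpu_cost(GPU_RATE, HOURS_PER_MONTH)

# A scale-to-zero Cloud Run service bills only while instances are
# serving, e.g. ~6 busy hours per day over a 30-day month.
run_cost = monthly_gpu_cost(GPU_RATE, 6 * 30)

print(f"always-on VM:  ${vm_cost:.2f}/month")
print(f"scale-to-zero: ${run_cost:.2f}/month")
```

The absolute numbers are meaningless, but the ratio is the point: if the service is busy only a fraction of the day, billing stops when instances scale to zero, whereas the VM bills the GPU around the clock.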
Also, you can check this informative blog regarding Cloud Run GPU and LLMs for additional reference.
I hope the above information is helpful.
Curious to know whether GPU metrics can be monitored when GPUs are used in conjunction with a Cloud Run service. Are they exposed as metrics like CPU and memory are? @ronnelg
Yes, GPU usage and utilization are included in the Cloud Run service metrics.
Thanks @ronnelg, I can probably use these metrics to scale GPUs.
Hello @dheerajpanyam, from what I've seen, at least for basic chat apps, the number of GPU instances scales well with user traffic. As @ronnelg said, the scaling is based on CPU utilization; most likely your application is using some CPU as well.
Thanks @sagarrandive and others, closing this post.