Hi
I have a setup with multiple PyTorch models deployed using Vertex AI's prebuilt PyTorch serving container. My use case requires that a few of these models can autoscale based on traffic.
From what I understand, if I deploy a model to an endpoint with autoscaling enabled, a compute instance is launched for that model. However, does the autoscaling apply to the compute instance or to the container running the model?
Additionally, I don't want a separate compute instance for each model, since the models are small enough that several fit on one instance, so I want to use a deployment resource pool to deploy multiple models on the same instance. Here too, the autoscaling of the resource pool seems to apply to the compute instance, and if I use a resource pool I cannot set any autoscaling options for the individual models themselves.
What I am trying to understand is:
If I use a deployment resource pool, will the individual model containers in the pool autoscale based on their traffic, or does scaling only happen at the level of the shared compute instances?
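For context, this is roughly how I'm setting things up with the google-cloud-aiplatform SDK. This is only a sketch: the project, region, pool ID, and model resource name are placeholders, and I'm going from memory on the exact parameter names, so treat it as illustrative rather than exact.

```python
from google.cloud import aiplatform

# Placeholder project and region.
aiplatform.init(project="my-project", location="us-central1")

# Create a shared deployment resource pool. Note that the autoscaling
# bounds (min/max replicas) are configured here, at the pool level --
# there is nowhere to set them per model.
pool = aiplatform.DeploymentResourcePool.create(
    deployment_resource_pool_id="shared-pool",  # placeholder ID
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3,
)

# Deploy several small models into the same pool.
endpoint = aiplatform.Endpoint.create(display_name="shared-endpoint")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")  # placeholder
model.deploy(endpoint=endpoint, deployment_resource_pool=pool)
```

Since `min_replica_count` / `max_replica_count` live on the pool, my reading is that scaling happens per replica (i.e., per compute instance running all co-deployed containers), which is what prompts the question above.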