Hi
I have a setup with multiple PyTorch models deployed using Vertex AI's prebuilt PyTorch serving container. My use case requires that a few of these models can autoscale based on traffic.
From what I understand, if I deploy a model to an endpoint with autoscaling enabled, a compute instance is launched for that model. However, does the autoscaling apply to the compute instance or to the container running the model?
Additionally, I don't want a separate compute instance for each model, since the models are small enough that several fit on one instance, so I want to use a deployment resource pool to deploy multiple models on the same instance. Here too, the autoscaling of the resource pool seems to apply to the compute instance, and if I use a resource pool I cannot set any autoscaling options for the individual models themselves.
What I am trying to understand is:
If I use a deployment resource pool, will the individual model containers in the pool autoscale based on their traffic, or does scaling only happen at the level of the shared compute instances?
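For context, this is roughly how I'm setting things up with the google-cloud-aiplatform SDK. This is only a sketch: the project, region, pool ID, and model resource name are placeholders, and I'm going from memory on the exact parameter names, so treat it as illustrative rather than exact.

```python
from google.cloud import aiplatform

# Placeholder project and region.
aiplatform.init(project="my-project", location="us-central1")

# Create a shared deployment resource pool. Note that the autoscaling
# bounds (min/max replicas) are configured here, at the pool level --
# there is nowhere to set them per model.
pool = aiplatform.DeploymentResourcePool.create(
    deployment_resource_pool_id="shared-pool",  # placeholder ID
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=3,
)

# Deploy several small models into the same pool.
endpoint = aiplatform.Endpoint.create(display_name="shared-endpoint")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")  # placeholder
model.deploy(endpoint=endpoint, deployment_resource_pool=pool)
```

Since `min_replica_count` / `max_replica_count` live on the pool, my reading is that scaling happens per replica (i.e., per compute instance running all co-deployed containers), which is what prompts the question above.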