I am using GCP Vertex AI online prediction to deploy a model with a custom container. The deployed model works fine with a small minimum node count (<50) on the n1-highmem-2 machine type.
But when I set a higher minimum node count (>50), I get the following error:
Error Messages: model server container out of memory, please use a larger machine type for model deployment:
https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types
I don't understand why increasing the minimum number of nodes causes an out-of-memory error. My understanding was that Vertex AI deploys the container (pulled from Artifact Registry) on each node, and each node downloads the model from GCS independently. So if the same model runs fine with a smaller minimum node count, why does increasing the minimum node count lead to out of memory?
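For context, my deployment call looks roughly like the sketch below (using the google-cloud-aiplatform SDK; the project, region, model ID, and replica counts here are placeholders, not my real values). The only thing I change between the working and failing deployments is min_replica_count:

```python
from google.cloud import aiplatform

# Placeholder project/region/model ID -- substituted for my real values.
aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"
)

endpoint = model.deploy(
    machine_type="n1-highmem-2",  # 2 vCPUs, 13 GB RAM per node
    min_replica_count=60,         # works when this is <50, OOMs when >50
    max_replica_count=100,
)
```

The machine type and container are identical in both cases, which is why the error pointing me at a larger machine type is confusing.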
Thanks in advance
Regards,
Anil