I am using GCP Vertex AI online prediction to deploy a model with a custom container. The deployed model works fine with a small minimum node count (<50) on the n1-highmem-2 machine type.
But when I set a higher minimum node count (>50), I get the following error:
Error Messages: model server container out of memory, please use a larger machine type for model deployment:
https://cloud.google.com/vertex-ai/docs/predictions/configure-compute#machine-types
I don't understand why increasing the minimum number of nodes causes an out-of-memory error. My understanding was that Vertex AI deploys the container (pulled from Artifact Registry) on each node, and each node downloads the model from GCS independently. So if the same model runs fine with a smaller minimum node count, why does increasing the minimum node count lead to out of memory?
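For context, my deployment call looks roughly like the sketch below (using the google-cloud-aiplatform SDK; the project, region, model ID, and replica counts here are placeholders, not my real values). The only thing I change between the working and failing deployments is min_replica_count:

```python
from google.cloud import aiplatform

# Placeholder project/region/model ID -- substituted for my real values.
aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"
)

endpoint = model.deploy(
    machine_type="n1-highmem-2",  # 2 vCPUs, 13 GB RAM per node
    min_replica_count=60,         # works when this is <50, OOMs when >50
    max_replica_count=100,
)
```

The machine type and container are identical in both cases, which is why the error pointing me at a larger machine type is confusing.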
Thanks in advance
Regards,
Anil