What quota(s) do I need to increase to create a model serving endpoint in GCP?

I am trying to create an endpoint for an LLM in Vertex AI with a V100 GPU in the us-central1 region. This will be my only endpoint in any region. Here are my quotas and their limits:

  • Committed NVIDIA V100 GPUs : 1
  • NVIDIA V100 GPUs : 2
  • Preemptible NVIDIA V100 GPUs : 1
  • Managed Notebooks NVIDIA V100 GPUs per region : 1
  • Custom model serving Nvidia V100 GPUs per region : 6
  • Custom model training Nvidia V100 GPUs per region : 6
  • Custom model training preemptible Nvidia V100 GPUs per region : 10
  • GPUs (all regions) : 6

When I try to create the endpoint with a single V100 GPU, I get this error:

Error Messages: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion

I am using an NVIDIA V100 GPU in a managed notebook, so I should have room for one more. There are other quotas I can't change, which are not tied to any region:

  • Custom model serving Nvidia V100 GPUs per region (default) : 0
  • Preemptible NVIDIA V100 GPUs (default) : 1
  • Preemptible NVIDIA V100 GPUs (default) : Unlimited
  • Committed NVIDIA V100 GPUs (default) : 0
  • NVIDIA V100 GPUs (default) : 1
  • NVIDIA V100 GPUs (default) : Unlimited

When I mouse over these quotas to change them, this message appears:

Edit is not allowed for this quota.

I have the Owner role for this project. Do I need to reach out to sales to get these changed? How can I do that when I only have the basic support plan? Is there another hidden quota somewhere? Or do I just need to increase my quotas further?

3 REPLIES

I also encountered the same problem.

Have you solved it?

No.

I wonder if the quota needs to be one more than you actually intend to use to allow for things like rolling restarts?
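
To make that guess concrete, here is a rough sketch of the arithmetic in Python. Nothing in this thread confirms that Vertex AI counts quota this way; the function names and the `max_surge` parameter are hypothetical and only illustrate the idea that a rolling restart briefly runs extra replicas alongside the old ones.

```python
def peak_gpus(replicas: int, gpus_per_replica: int, max_surge: int = 1) -> int:
    """GPUs needed at the busiest moment of a rolling replacement.

    Assumption (not confirmed by the thread): up to `max_surge` extra
    replicas run next to the existing ones while they are swapped out.
    """
    return (replicas + max_surge) * gpus_per_replica


def fits_quota(limit: int, in_use: int, requested: int) -> bool:
    """True if the request stays within the quota limit."""
    return in_use + requested <= limit


# One replica with one V100, as in the question.
needed = peak_gpus(replicas=1, gpus_per_replica=1)

# Compare against a few limits: 6 (the listed serving quota),
# 1 (exactly what is intended), and 0 (the non-editable default).
for limit in (6, 1, 0):
    print(limit, fits_quota(limit, in_use=0, requested=needed))
```

Under this assumption a limit equal to the intended GPU count (1) would be too small, a limit of 6 would still cover a surge of one, and a limit of 0 would reject any request at all.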