What quota(s) do I need to increase to create a model serving endpoint in GCP?

achang23 · 09-12-2023 11:40 AM

I am trying to create an endpoint for an LLM in Vertex AI with a V100 GPU in the us-central1 region. This will be my only endpoint in any region. Here are my quotas and their limits:

Committed NVIDIA V100 GPUs : 1
NVIDIA V100 GPUs : 2
Preemptible NVIDIA V100 GPUs : 1
Managed Notebooks NVIDIA V100 GPUs per region : 1
Custom model serving Nvidia V100 GPUs per region : 6
Custom model training Nvidia V100 GPUs per region : 6
Custom model training preemptible Nvidia V100 GPUs per region : 10
Custom model serving Nvidia V100 GPUs per region : 6
GPUs (all regions) : 6

When I try to create the endpoint with a single V100 GPU, I get the error:

Error Messages: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion

I am using an NVIDIA V100 GPU in a managed notebook, so I should have room for one more. There are other quotas that I can't change and that are not tied to any region:

Custom model serving Nvidia V100 GPUs per region (default) : 0
Preemptible NVIDIA V100 GPUs (default) : 1
Preemptible NVIDIA V100 GPUs (default) : Unlimited
Committed NVIDIA V100 GPUs (default) : 0
NVIDIA V100 GPUs (default) : 1
NVIDIA V100 GPUs (default) : Unlimited

When I mouse over these quotas to change them, this message appears:

Edit is not allowed for this quota.

I have the Owner role for this project. Do I need to reach out to sales to try and get these changed? How can I do that when I only have the basic support plan? Is there another hidden quota somewhere? Do I just need to increase my quotas more?
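For anyone comparing an error like this against a long quota list, it can help to pull the exact quota ID out of the message first, since that ID is the one to look for on the quotas page. Below is a minimal sketch that does only that string handling. The function name `exceeded_quotas` is made up for illustration, and the comma-separated format for multiple quotas is an assumption; the error in this post names only one.

```python
import re

# Error text copied from the post above.
ERROR = (
    "Error Messages: The following quotas are exceeded: "
    "CustomModelServingV100GPUsPerProjectPerRegion"
)


def exceeded_quotas(message: str) -> list[str]:
    """Return the quota IDs that follow 'quotas are exceeded:' in a message.

    Assumption: if several quotas are reported, they are comma-separated.
    """
    match = re.search(r"quotas are exceeded:\s*(.+)", message)
    if not match:
        return []
    # Split on commas and drop surrounding whitespace and empty pieces.
    return [part.strip() for part in match.group(1).split(",") if part.strip()]


if __name__ == "__main__":
    for quota_id in exceeded_quotas(ERROR):
        print(quota_id)
```

Here the only ID returned is the custom model serving one, so the plain "NVIDIA V100 GPUs" and managed notebook entries in the list are probably not the ones being checked.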

BryanKWNI

I also encountered the same problem.

Have you solved it?

achang23

No.

jasonparker

I wonder if the quota needs to be one more than you actually intend to use to allow for things like rolling restarts?
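To make that guess concrete, here is a rough sketch of the arithmetic, assuming (and this is only the hypothesis above, not documented Vertex AI behavior) that a rollout briefly runs extra replicas alongside the existing ones. The names `peak_gpus`, `fits_quota`, and `surge_replicas` are hypothetical.

```python
def peak_gpus(replicas: int, gpus_per_replica: int, surge_replicas: int = 1) -> int:
    """GPUs in use at the worst moment of a rollout.

    Assumption: up to `surge_replicas` extra replicas exist temporarily
    while old replicas are still running.
    """
    return (replicas + surge_replicas) * gpus_per_replica


def fits_quota(limit: int, replicas: int, gpus_per_replica: int,
               surge_replicas: int = 1) -> bool:
    """True if the peak GPU count stays within the quota limit."""
    return peak_gpus(replicas, gpus_per_replica, surge_replicas) <= limit


if __name__ == "__main__":
    # One replica with one V100, as in the original post.
    for limit in (6, 1, 0):
        print(limit, fits_quota(limit, replicas=1, gpus_per_replica=1))
```

Under this assumption a single-GPU endpoint would need a limit of at least 2. The limit of 6 listed in the post would be enough, while a limit of 1 or 0 (like the non-editable default entry) would not.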
