
Quotas exceeded for endpoint API calls on free credits, but still able to create through the UI

Hello all,

I am building an experimental IaC system to deploy an ML inference architecture, so I created a GCP account and received the $300 in free credits. All was going well until this week, when I started getting the error below while trying to deploy a model to a Vertex AI endpoint using the Python API:

 

google.api_core.exceptions.ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingCPUsPerProjectPerRegion,CustomModelServingT4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingCPUsPerProjectPerRegion,CustomModelServingT4GPUsPerProjectPerRegion

 

The funny thing is that if I use the UI instead, it works.

Apparently, I cannot increase the quotas (the UI does not let me do it). Does anyone know how long I need to wait until those quotas are restored, or whether there is another way to increase them?

1 reply

Just for the record: it turned out the problem was that I was setting the autoscaling replica count to 10, while the quota for the free credits is 1. Even if the deployment won't need all the replicas right away, it fails to deploy the model with settings that would eventually exceed the quota. Setting max_replica_count to 1 solved it.
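
In case it helps anyone else, here is a rough sketch of the reasoning in Python. It assumes the quota check works by multiplying the per-replica resources by max_replica_count, which matches what I saw but is not something I have confirmed in the docs. The quota values and the names ReplicaSpec, exceeded_quotas and deploy_within_quota are hypothetical, made up for illustration. The only real API call is Model.deploy from the google-cloud-aiplatform package, and the machine and accelerator types are just example choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSpec:
    """Resources consumed by ONE serving replica (illustrative values)."""
    cpus: int
    t4_gpus: int


def peak_usage(spec, max_replica_count):
    """Resources needed if autoscaling ever reaches max_replica_count."""
    return {
        "CustomModelServingCPUsPerProjectPerRegion": spec.cpus * max_replica_count,
        "CustomModelServingT4GPUsPerProjectPerRegion": spec.t4_gpus * max_replica_count,
    }


def exceeded_quotas(spec, max_replica_count, quotas):
    """Return the sorted names of quotas that the peak usage would exceed."""
    usage = peak_usage(spec, max_replica_count)
    return sorted(name for name, needed in usage.items()
                  if needed > quotas.get(name, 0))


def deploy_within_quota(model, endpoint, spec, quotas, max_replica_count=1):
    """Check the (assumed) quota math locally, then deploy.

    `model` is expected to be a google.cloud.aiplatform.Model and
    `endpoint` a google.cloud.aiplatform.Endpoint.
    """
    exceeded = exceeded_quotas(spec, max_replica_count, quotas)
    if exceeded:
        raise ValueError("would exceed quotas: " + ",".join(exceeded))
    return model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",      # 4 vCPUs per replica
        accelerator_type="NVIDIA_TESLA_T4",
        accelerator_count=spec.t4_gpus,
        min_replica_count=1,
        max_replica_count=max_replica_count,
    )


# Hypothetical quota values; check the real ones on the Quotas page.
FREE_TIER_QUOTAS = {
    "CustomModelServingCPUsPerProjectPerRegion": 4,
    "CustomModelServingT4GPUsPerProjectPerRegion": 1,
}
SPEC = ReplicaSpec(cpus=4, t4_gpus=1)

print(exceeded_quotas(SPEC, 10, FREE_TIER_QUOTAS))  # both quotas listed
print(exceeded_quotas(SPEC, 1, FREE_TIER_QUOTAS))   # empty list
```

With max_replica_count=10 the local check flags the same two quotas named in the 429 error, and with max_replica_count=1 it flags none, so deploy_within_quota would go ahead and call Model.deploy.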