Solved: Re: Encountered 429 error "Quota exceeded for onli...

harrison_jung

I am using Claude 3 Haiku on Vertex AI and occasionally encounter the following error message:

{
  "code": 429,
  "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_concurrent_requests_per_base_model. Please submit a quota increase request.",
  "status": "RESOURCE_EXHAUSTED"
}

The message says that the quota online_prediction_concurrent_requests_per_base_model has been exceeded, yet I am making only a few requests (fewer than 5) per minute.
I have checked the Vertex AI quota page but cannot find any entry for this specific quota.
The quota for base_model: anthropic-claude-3-haiku-20240307 is 60 requests per minute.
I believe the error refers to a different quota, possibly concurrent_requests_per_base_model, which is not listed on the quota page.
I have searched for information about the concurrent_requests_per_base_model quota and checked the Vertex AI documentation, but have not found anything relevant.

-----

Can you please provide information about the concurrent_requests_per_base_model quota?
Is there a way to increase this quota?
How can I avoid encountering this error in the future?

nceniza

It is possible that the resources for that region are already exhausted. Can you try calling the model from a different region? You can also request a quota increase on the IAM Quotas page, using the little pencil icon ("EDIT QUOTAS") at the upper right of the console.

Manage your Quotas: https://cloud.google.com/docs/quotas/view-manage
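
Since the 429 appears only occasionally, one common complement to the suggestions above is to retry the failed call with exponential backoff. This was not proposed in the thread; it is a minimal sketch under assumptions. `ResourceExhaustedError` and `retry_on_429` are hypothetical names, and `call` stands for whatever function in your own code sends the request and raises that error on a 429 / RESOURCE_EXHAUSTED response.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ResourceExhaustedError(Exception):
    """Hypothetical error your client wrapper raises on HTTP 429 / RESOURCE_EXHAUSTED."""


def retry_on_429(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with jittered exponential backoff on ResourceExhaustedError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # The delay cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, cap] to avoid synchronized retries.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The attempt count and delays are illustrative defaults, not values recommended by Google or Anthropic.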

harrison_jung

I have tested this extensively, including with additional projects and billing accounts. As you suggested, it appears that the resources in that specific region are indeed exhausted.

(Despite minimal usage, I observed instances where resources were unavailable, while at other times, extensive usage did not cause any issues.)

I would also like to add that I was unable to find any information regarding this specific quota ("concurrent_requests_per_base_model") within the quota management section.

Therefore, on the assumption that regional resources are depleted, I have structured my system to use a combination of European and US regions, along with Anthropic's native API.
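
For anyone who wants to try the same multi-region approach, here is a rough sketch of the fallback logic, not my actual code. It assumes each backend (a US region, a European region, Anthropic's native API) is wrapped in a function that takes a prompt and raises a hypothetical `ResourceExhaustedError` on a 429. The names `call_with_fallback`, `send_us`, `send_eu`, and `send_anthropic` are placeholders for your own wrappers.

```python
from typing import Callable, Optional, Sequence, Tuple


class ResourceExhaustedError(Exception):
    """Hypothetical error a backend wrapper raises on HTTP 429 / RESOURCE_EXHAUSTED."""


def call_with_fallback(
    backends: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str]:
    """Try each (name, send) backend in order.

    Returns (backend_name, response) from the first backend that succeeds.
    Raises RuntimeError if every backend reports quota exhaustion.
    """
    last_error: Optional[Exception] = None
    for name, send in backends:
        try:
            return name, send(prompt)
        except ResourceExhaustedError as exc:
            # This backend is exhausted right now; move on to the next one.
            last_error = exc
    raise RuntimeError("all backends returned 429") from last_error
```

Usage would look like `call_with_fallback([("us", send_us), ("eu", send_eu), ("anthropic", send_anthropic)], prompt)`, with the preferred backend listed first. Only quota errors trigger a fallback; any other exception propagates immediately.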

Thank you for your assistance.