
Encountered 429 error "Quota exceeded for online_prediction_concurrent_requests_per_base_model" when

  • I am using Claude 3 Haiku on Vertex AI and occasionally encounter the following error message:

 

{
  "code": 429,
  "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_concurrent_requests_per_base_model. Please submit a quota increase request.",
  "status": "RESOURCE_EXHAUSTED"
}

 

  • The error message indicates that the quota for online_prediction_concurrent_requests_per_base_model has been exceeded. However, I am only making a few requests (fewer than 5) per minute.
  • I have checked the Vertex AI quota page but cannot find any information about this specific quota.
  • The quota for base_model : anthropic-claude-3-haiku-20240307 is 60 requests per minute.
  • I believe the error message refers to a different quota, possibly concurrent_requests_per_base_model, which is not listed on the quota page.
  • I have searched for information about the concurrent_requests_per_base_model quota but have not been able to find anything.
  • I have checked the Vertex AI documentation but have not found any relevant information.
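The quota name suggests a limit on requests that are in flight at the same moment rather than on requests per minute. That reading is an assumption, since the thread found no documentation for it. Under that assumption, a minimal sketch of capping in-flight calls on the client side could look like this. `MAX_IN_FLIGHT`, `call_with_cap`, `run_all`, and the `send_request` callable are hypothetical names, not part of any SDK.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap: the real value of the concurrent-request quota
# is not stated anywhere in this thread.
MAX_IN_FLIGHT = 2
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)


def call_with_cap(send_request, *args, **kwargs):
    """Run send_request while holding one of MAX_IN_FLIGHT slots."""
    with _slots:
        return send_request(*args, **kwargs)


def run_all(send_request, items, workers=8):
    """Process items on a thread pool; at most MAX_IN_FLIGHT calls overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: call_with_cap(send_request, item), items))
```

Here `send_request` stands for whatever function actually issues the model call. The cap only limits this one process, so it would not help if several processes or projects share the same quota.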

-----

  • Can you please provide information about the concurrent_requests_per_base_model quota?
  • Is there a way to increase this quota?
  • How can I avoid encountering this error in the future?
2 ACCEPTED SOLUTIONS

It is possible that the resources for that region are already exhausted. Can you try calling it from a different region? You can also request a quota increase on the IAM Quotas page, using the pencil icon ("EDIT QUOTAS") at the upper right of the console.

 

Manage your Quotas: https://cloud.google.com/docs/quotas/view-manage
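The suggestion to try a different region can be sketched as an ordered fallback. This is only an illustration: `QuotaExceeded` is a hypothetical stand-in for whatever 429 / RESOURCE_EXHAUSTED exception your client raises, and `send_request(region, prompt)` is a hypothetical callable that issues the request against one region.

```python
class QuotaExceeded(Exception):
    """Hypothetical stand-in for the client's 429 / RESOURCE_EXHAUSTED error."""


def call_with_region_fallback(send_request, regions, prompt):
    """Try each region in order; return (region, result) from the first success."""
    last_error = None
    for region in regions:
        try:
            return region, send_request(region, prompt)
        except QuotaExceeded as err:
            # This region reported a quota error, so move on to the next one.
            last_error = err
    raise RuntimeError("all regions returned quota errors") from last_error
```

The region list should contain only regions where the model is actually offered for your project, which is something to confirm in the console rather than assume.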


Regarding this issue, I have conducted extensive testing, including adding additional projects and billing accounts. However, as you mentioned, it appears that the resources in the specific region are indeed exhausted.

(Despite minimal usage, I observed instances where resources were unavailable, while at other times, extensive usage did not cause any issues.)

I would also like to add that I was unable to find any information regarding this specific quota ("concurrent_requests_per_base_model") within the quota management section.

Therefore, based on the assumption of regional resource depletion, I have structured my system to utilize a combination of European and US regions, along with Anthropic's native API.
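Because the failures described above were intermittent, retrying with exponential backoff is a common complement to spreading requests across regions and providers. The following is a minimal sketch, not the setup used in this post. `QuotaExceeded` is a hypothetical stand-in for the client's 429 error, `send_request` is a hypothetical zero-argument callable, and the delay values are arbitrary.

```python
import random
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the client's 429 / RESOURCE_EXHAUSTED error."""


def retry_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call send_request, retrying quota errors with jittered exponential delays."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last quota error
            # Upper bound doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random time between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so the function can be tested without real waiting.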

Thank you for your assistance.


4 REPLIES

I have the same problem. I have two unconnected accounts; it works perfectly on one but not on the other. I have sent emails and spoken to customer care, but so far there is no update or fix. Both are paid accounts that are fully activated. Any help would be appreciated. Here is my error:

  raise self._make_status_error_from_response(err.response) from None
anthropic.RateLimitError: Error code: 429 - [{'error': {'code': 429, 'message': 'Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: anthropic-claude-3-5-sonnet. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.', 'status': 'RESOURCE_EXHAUSTED'}}]
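Note that this error names online_prediction_requests_per_base_model, while the original post hit online_prediction_concurrent_requests_per_base_model, so it can help to log which metric tripped. Below is a minimal sketch that pulls the metric out of the message text, assuming the message keeps the format quoted in this thread. `parse_quota_error` is a hypothetical helper, not part of any SDK.

```python
import re

# Matches the message format quoted in this thread; other formats return None.
_QUOTA_RE = re.compile(
    r"Quota exceeded for (?P<service>[\w.-]+)/(?P<metric>\w+)"
    r"(?: with base model: (?P<base_model>[\w.-]+?))?\.(?:\s|$)"
)


def parse_quota_error(message):
    """Return a dict with service, metric, and base_model (or None if absent)."""
    match = _QUOTA_RE.search(message)
    if match is None:
        return None
    return match.groupdict()
```

Knowing the metric name makes it easier to search the Quotas page for the matching entry and to tell a per-minute limit from a concurrency limit.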

Hello,
I noticed your recent inquiry regarding an issue I had previously posted about. I wanted to provide an update on the matter.


At the time of my original post, I was developing an application using Claude 3.5 Sonnet. This choice was made because Sonnet was state-of-the-art at that time, and there weren't many alternatives offering comparable performance for our needs.


I'm pleased to share that I've since resolved the issue. Currently, I'm successfully implementing the desired functionality using Gemini Pro and Flash.


Regarding the error message you encountered, based on the information I gathered from various sources at the time, it was likely due to insufficient resources in the specific GCP region. However, I'm not certain if this particular issue still persists.