Encountered 429 error "Quota exceeded for online_prediction_concurrent_requests_per_base_model" when

  • I am using Claude 3 Haiku on Vertex AI and occasionally encounter the following error message:

 

{
  "code": 429,
  "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_concurrent_requests_per_base_model. Please submit a quota increase request.",
  "status": "RESOURCE_EXHAUSTED"
}

 

  • The error message indicates that the quota for online_prediction_concurrent_requests_per_base_model has been exceeded. However, I am making fewer than 5 requests per minute.
  • I have checked the Vertex AI quota page but cannot find any information about this specific quota.
  • The quota for base_model : anthropic-claude-3-haiku-20240307 is 60 requests per minute.
  • I believe the error message refers to a different quota, possibly concurrent_requests_per_base_model, which is not listed on the quota page.
  • I have searched for information about the concurrent_requests_per_base_model quota but have not been able to find anything.
  • I have checked the Vertex AI documentation but have not found any relevant information.

-----

  • Can you please provide information about the concurrent_requests_per_base_model quota?
  • Is there a way to increase this quota?
  • How can I avoid encountering this error in the future?
2 ACCEPTED SOLUTIONS

It is possible that the resources in that region are already exhausted. Can you try calling the model from a different region? You can also request a quota increase on the IAM Quotas page, using "EDIT QUOTAS" (the little pencil icon at the upper right of the console).

 

Manage your Quotas: https://cloud.google.com/docs/quotas/view-manage
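
Switching regions only helps if the failure really is this quota error, so it can be useful to check the error body before retrying elsewhere. Below is a minimal sketch, assuming the error body arrives as a JSON string shaped like the one in the question. The function name and the handling of an optional "error" wrapper key are assumptions for illustration, not part of any Google or Anthropic SDK.

```python
import json

# Metric name copied from the error message in the question.
QUOTA_METRIC = "online_prediction_concurrent_requests_per_base_model"


def is_concurrent_quota_error(body):
    """Return True if `body` looks like the 429 RESOURCE_EXHAUSTED error above.

    Hypothetical helper: it only inspects the JSON text and makes no API calls.
    """
    try:
        err = json.loads(body)
    except (TypeError, ValueError):
        return False  # not JSON at all, so not the error we are looking for
    if not isinstance(err, dict):
        return False
    # Assumption: some clients wrap the payload in an "error" key.
    err = err.get("error", err)
    if not isinstance(err, dict):
        return False
    return (
        err.get("code") == 429
        and err.get("status") == "RESOURCE_EXHAUSTED"
        and QUOTA_METRIC in str(err.get("message", ""))
    )
```

If the check returns True, retrying in another region is a reasonable next step; any other error probably deserves a closer look before retrying.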


Regarding this issue, I have conducted extensive testing, including adding additional projects and billing accounts. However, as you mentioned, it appears that the resources in the specific region are indeed exhausted.

(Despite minimal usage, I observed instances where resources were unavailable, while at other times, extensive usage did not cause any issues.)

I would also like to add that I was unable to find any information regarding this specific quota ("concurrent_requests_per_base_model") within the quota management section.

Therefore, based on the assumption of regional resource depletion, I have structured my system to utilize a combination of European and US regions, along with Anthropic's native API.
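
For anyone who wants to try the same approach, here is a rough sketch of the fallback idea, not my exact production code. Each backend is just a function that sends a prompt and returns text, so the Vertex AI regions and the native API can be plugged in however you call them. `QuotaExhausted`, `call_with_fallback`, and the backend labels are hypothetical names for illustration, and the backoff between rounds is an extra I added for the sketch rather than something the quota documentation prescribes.

```python
import random
import time


class QuotaExhausted(Exception):
    """Hypothetical marker raised by a backend on a 429 RESOURCE_EXHAUSTED."""


def call_with_fallback(prompt, backends, rounds=2, base_delay=1.0, sleep=time.sleep):
    """Try each (name, send) backend in order; back off and repeat if all fail.

    `backends` is a list of (label, callable) pairs. Each callable takes the
    prompt and returns the response text, or raises QuotaExhausted.
    Returns (label_of_backend_that_answered, response_text).
    """
    last_error = None
    for attempt in range(rounds):
        for name, send in backends:
            try:
                return name, send(prompt)
            except QuotaExhausted as exc:
                last_error = exc  # remember the failure and try the next backend
        if attempt < rounds - 1:
            # Exponential backoff with a little jitter before the next round.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise RuntimeError("all backends reported quota exhaustion") from last_error
```

In use, `backends` would look something like `[("vertex-us", send_us), ("vertex-eu", send_eu), ("anthropic-api", send_native)]`, where each `send_*` function is your own wrapper that converts a 429 response into `QuotaExhausted`. Errors other than quota exhaustion are deliberately not caught, so they surface immediately.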

Thank you for your assistance.
