Encountered 429 error "Quota exceeded for online_prediction_concurrent_requests_per_base_model" when using Claude 3 Haiku on Vertex AI

  • I am using Claude 3 Haiku on Vertex AI and occasionally encounter the following error message:


  "code": 429,
  "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_concurrent_requests_per_base_model. Please submit a quota increase request.",


  • The error message indicates that the quota for online_prediction_concurrent_requests_per_base_model has been exceeded. However, I am only making a few requests (fewer than five) per minute.
  • I have checked the Vertex AI quota page but cannot find any information about this specific quota.
  • The quota for base_model: anthropic-claude-3-haiku-20240307 is 60 requests per minute.
  • I believe the error message refers to a different quota, possibly concurrent_requests_per_base_model, which is not listed on the quota page.
  • I have searched for information about the concurrent_requests_per_base_model quota but have not been able to find anything.
  • I have checked the Vertex AI documentation but have not found any relevant information.


  • Can you please provide information about the concurrent_requests_per_base_model quota?
  • Is there a way to increase this quota?
  • How can I avoid encountering this error in the future?

It is possible that the resources for that region are already exhausted. Can you try calling the model from a different region? You can also request a quota increase on the IAM Quotas page by clicking the small pencil icon ("EDIT QUOTAS") at the upper right of the console.


Manage your Quotas: https://cloud.google.com/docs/quotas/view-manage
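Switching to another region when one returns a 429 can be automated. The following is a minimal Python sketch, not an official Vertex AI or Anthropic API: `send_request`, `QuotaExceededError`, and `call_with_region_fallback` are hypothetical names, and in real code you would map whatever exception your client library raises for HTTP 429 onto `QuotaExceededError`. The region IDs are illustrative only; check which regions actually offer the model for your project.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for an HTTP 429 'Quota exceeded' response."""


def call_with_region_fallback(
    send_request: Callable[[str], T],
    regions: Sequence[str],
) -> T:
    """Try each region in order, moving on only when a region returns 429.

    send_request is a hypothetical caller-supplied function that sends one
    prediction request to the given region and raises QuotaExceededError
    when that region answers with HTTP 429.
    """
    if not regions:
        raise ValueError("at least one region is required")
    last_error: Optional[QuotaExceededError] = None
    for region in regions:
        try:
            return send_request(region)
        except QuotaExceededError as err:
            # Capacity in this region looks exhausted; try the next one.
            last_error = err
    # Every region returned 429: surface the last error to the caller.
    assert last_error is not None
    raise last_error


if __name__ == "__main__":
    def fake_send(region: str) -> str:
        # Simulate an exhausted region for demonstration purposes.
        if region == "us-east5":
            raise QuotaExceededError("429 in " + region)
        return f"ok from {region}"

    print(call_with_region_fallback(fake_send, ["us-east5", "europe-west1"]))
    # ok from europe-west1
```

Other errors (authentication, invalid requests) are deliberately not caught, so only quota exhaustion triggers a region switch.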


I have tested this extensively, including with additional projects and billing accounts, and, as you suggested, the resources in that specific region do appear to be exhausted.

(Despite minimal usage, I observed instances where resources were unavailable, while at other times, extensive usage did not cause any issues.)

I would also like to add that I was unable to find any information regarding this specific quota ("concurrent_requests_per_base_model") within the quota management section.

Therefore, based on the assumption of regional resource depletion, I have structured my system to utilize a combination of European and US regions, along with Anthropic's native API.

Thank you for your assistance.
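A setup combining European and US regions with Anthropic's native API could look roughly like the following. This is a minimal sketch of one possible design, not the poster's actual code: `FailoverClient`, `QuotaExceededError`, the backend names, and the 30-second cool-down are all hypothetical. Backends are tried in priority order, and any backend that returns a 429 is skipped for a cool-down period, since regional capacity appeared to come and go independently of the caller's own usage.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class QuotaExceededError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from any backend."""


class FailoverClient:
    """Try backends in priority order, benching any that recently returned 429.

    Each backend is a (name, send) pair, where send(prompt) is a hypothetical
    caller-supplied function, e.g. Vertex AI in a European region, Vertex AI
    in a US region, or Anthropic's native API.
    """

    def __init__(
        self,
        backends: List[Tuple[str, Callable[[str], str]]],
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backends = backends
        self.cooldown_s = cooldown_s
        self.clock = clock
        # Backend name -> clock time until which it is skipped after a 429.
        self.benched_until: Dict[str, float] = {}

    def send(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_name, response) from the first backend that succeeds."""
        now = self.clock()
        last_error: Optional[Exception] = None
        for name, send in self.backends:
            if self.benched_until.get(name, 0.0) > now:
                continue  # Still cooling down after a recent 429.
            try:
                return name, send(prompt)
            except QuotaExceededError as err:
                # Skip this backend for a while instead of hammering it.
                self.benched_until[name] = now + self.cooldown_s
                last_error = err
        if last_error is None:
            raise QuotaExceededError("all backends are cooling down")
        raise last_error
```

The injectable `clock` parameter only exists to make the cool-down logic easy to test; production code can keep the default `time.monotonic`.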
