Quota exceeded error for anthropic-claude-3-5-sonnet-v2 for vertexai

I have activated my paid account and enabled the Claude 3.5 Sonnet model, but I am unable to make any queries because I get the error "Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: anthropic-claude-3-5-sonnet-v2. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai." However, I have checked my online_prediction_requests_per_base_model quota, and none of my requests exceed it. Could you help fix this issue?

[Attached screenshots: BrandonJ_0-1731954190483.png, BrandonJ_1-1731954201733.png]

 

10 REPLIES

Hi @BrandonJ,

Welcome to Google Cloud Community!

The error message suggests that your requests to the anthropic-claude-3-5-sonnet-v2 model have exceeded the prediction request limit that Google Cloud's Vertex AI platform sets for that model. This can happen even if your overall quota appears sufficient, because individual models and request types can have their own limits.

Here are some potential ways to address your issue:

  • Request a quota increase: As the error message suggests, it is essential to request a quota increase from Google Cloud support, even if you believe you haven’t reached your limit. They can assess your usage and potentially increase your quota.
  • Check rate limits: Review whether the Claude 3.5 Sonnet v2 model has specific rate limits. Different models may have different limits, which could be affecting your requests.
  • Review API key: You may consider regenerating your API key. API keys might occasionally become restricted or disabled, especially if they have been shared or misused.
  • Review regional limits: Make sure that your quota is configured at the regional level instead of just the project level. Sometimes, quotas are distributed across regions, and you might be encountering a regional limit.
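
When following the steps above, it helps to know exactly which quota metric and base model the error refers to. Below is a minimal sketch that pulls both out of the error text. The pattern is modelled only on the message quoted in this thread, and the function name `parse_quota_error` is a hypothetical helper, not part of any Google library; other errors may be worded differently.

```python
import re

# Pattern based on the error text quoted in this thread (an assumption;
# other Vertex AI errors may use different wording).
QUOTA_ERROR = re.compile(
    r"Quota exceeded for (?P<metric>\S+) with base model: (?P<base_model>[\w.-]+)"
)


def parse_quota_error(message):
    """Return (metric, base_model) from a quota error message, or None."""
    match = QUOTA_ERROR.search(message)
    if match is None:
        return None
    # The model name is followed by a sentence-ending period; drop it.
    return match.group("metric"), match.group("base_model").rstrip(".")


msg = (
    "Quota exceeded for aiplatform.googleapis.com/"
    "online_prediction_requests_per_base_model with base model: "
    "anthropic-claude-3-5-sonnet-v2. Please submit a quota increase request."
)
print(parse_quota_error(msg))
```

The two extracted values are the ones to search for on the Quotas page of the project before filing an increase request.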

If you continue to run into issues, consider reaching out to Google Cloud Support so they can investigate further. When you contact them, provide as much detail as possible and include screenshots. This will help them understand your problem and resolve it more quickly.

I hope the above information is helpful.

 

AndrewB
Community Manager

'Online prediction requests per base model per minute per region per base_model' is a separate quota, and you need both quotas to be nonzero for requests to work. If either one is 0, you'll see that error.
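
As a rough illustration of that rule, here is a small sketch that flags any quota whose limit is 0. The limits are hypothetical values typed in by hand from the console's Quotas page, and `blocking_quotas` is an illustrative helper rather than a Google Cloud API.

```python
def blocking_quotas(limits):
    """Return the names of quotas whose limit is 0 (or None)."""
    return [name for name, limit in limits.items() if not limit]


# Hypothetical numbers, copied manually from the Quotas page.
limits = {
    "Online prediction requests per base model": 60,
    "Online prediction requests per base model per minute per region per base_model": 0,
}

blocked = blocking_quotas(limits)
if blocked:
    print("Requests will fail until these quotas are raised:", blocked)
```

With these example values, the first quota looks fine but the per-region one still blocks every request, which matches the behaviour described above.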

Mine is still 0. Any request to raise this quota above 0 is denied, even to 1 request per minute. This is specifically for Anthropic models.

Same issue, and my quota request is getting denied. Have you resolved this?

I don't get it. We set up Anthropic in Vertex AI and started using it, only to find that we can't use it because of this strange quota of 0, which you can't adjust.

Why provide Anthropic if you can't use it??? This is so pointless and a waste of time.

I faced this exact same issue. I just enabled Claude 3.7 Sonnet on Vertex, but I can't even try it in the Vertex AI Studio chat section. I get a weird message:
Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: anthropic-claude-3-7-sonnet. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.

After spending half a day trying to understand what's what, I see I have 0/0 request quotas. I mean, does the whole user journey make any sense?

■■■■show. They just bait you with their free money, as always. Now they've got your data. You're the product.

Same issue. I tried to contact sales, but that required submitting a lot of information through a separate application process, so I gave up and turned to AWS because it is easier.

In fact, when I enabled the Claude model in Vertex AI, I had already filled in a lot of information and completed the verification. I don't understand why even the most basic test requires a separate quota application, or why it is so troublesome. If Google does not intend to provide this service, it could take it offline entirely; there is no need to do this.

What is worrying is that Google’s bureaucracy does not consider user experience at all. I plan to sell the few stocks I have immediately.

My problem was that AWS does not support concurrent jobs. I was hoping to find a solution with Google...