
"Quota Exceeded" for Anthropic Models with Quota + showing usage when I don't get a reply.

I am getting a "quota exceeded" error for a model I have a quota on. It is also showing usage. However, the only "usage" I have is from

[Screenshots: image_2025-04-06_135327924.png, image_2025-04-06_135526603.png]


Hi @Astarynight,

Welcome to Google Cloud Community!

Error code 429 with the message "Quota exceeded for online_prediction_requests_per_base_model" typically means that your requests to an Anthropic model on Google Cloud Platform (GCP) have hit the quota that Vertex AI defines for online prediction requests.
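If the 429 comes from a per-minute limit rather than from a quota of zero, retrying after a pause is often enough. Below is a minimal sketch of exponential backoff with jitter. `QuotaExceededError` and `send_request` are hypothetical placeholders, not part of any Google or Anthropic SDK; substitute whatever exception your client raises on HTTP 429.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request(), retrying on QuotaExceededError with backoff."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Retrying will not help if the quota for that model and region is actually 0, so it is still worth checking the assigned limit first.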

Here's how you can address it:

To view your current quotas, open the Quotas & System Limits page in the Google Cloud Console and filter for the specific Anthropic model you're using (e.g., anthropic-claude-3-7-sonnet) to see the limits assigned to your project. Step by step:

  1. Go to the IAM & Admin > Quotas page. This is the central location for managing your Google Cloud quotas.
  2. Filter by Service:
    • In the "Service" filter, type aiplatform.googleapis.com and select the Vertex AI API. For reference, you may check this documentation.
  3. Filter by Metric:
    • In the "Metric" filter, type online_prediction_requests_per_base_model. This narrows the list down to the relevant quotas.
  4. Filter by Name:
    • If you have a lot of models, you can also filter by the Anthropic model name.
  5. Examine the Quota Details:
    • You will see entries like these (which I see in your screenshot above):
      • "Regional online prediction tokens per minute per base model per minute per region per base model"
      • You'll likely see multiple entries for different regions (e.g., us-east5, europe-west1).

Make sure you're looking at the quota for the region where you are sending your requests (as indicated on the Vertex AI screen).
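To know what to type into those filters, it can help to pull the metric and base model straight out of the error text. The sketch below assumes the message looks like the one quoted above, optionally with a service prefix and a "with base model: ..." suffix. That exact shape is an assumption, so adjust the pattern to whatever your client actually returns.

```python
import re

# Assumed message shape (not an official format); tweak as needed.
QUOTA_PATTERN = re.compile(
    r"Quota exceeded for (?:(?P<service>[\w.-]+)/)?(?P<metric>\w+)"
    r"(?: with base model: (?P<base_model>[\w@-]+(?:\.[\w@-]+)*))?"
)


def parse_quota_error(message):
    """Return the service, metric and base model found in a 429 message.

    Fields that are absent from the message are left out of the result.
    Returns None when the message does not look like a quota error.
    """
    match = QUOTA_PATTERN.search(message)
    if match is None:
        return None
    return {key: value for key, value in match.groupdict().items()
            if value is not None}
```

The `metric` value is what goes into the "Metric" filter, and `base_model`, when present, is what you would use in the name filter.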

 

If your current quotas are too low, you can request an increase: go to the Quotas page in the Google Cloud Console, select the relevant quota, click "EDIT QUOTAS", and submit your request.

Note that approval times may vary and are subject to Google's policies and capacity limits.

 

Monitor and optimize usage:

Implement logging to monitor request rates and token usage, and optimize your application to use resources more efficiently. This can help minimize the need for higher quota allocations.
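As one way to do the logging part, here is a small sketch of a client-side sliding window that counts requests and tokens over the last minute. `UsageWindow` is a hypothetical helper, the token counts are whatever your client reports per response, and the totals it logs are only your own view of usage, not the figures Google enforces.

```python
import logging
import time
from collections import deque

logger = logging.getLogger("quota_monitor")


class UsageWindow:
    """Track requests and tokens seen in the last `window_seconds`."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self.events = deque()  # (timestamp, token_count) pairs, oldest first

    def _evict_old(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            self.events.popleft()

    def totals(self):
        """Return (request_count, token_count) for the current window."""
        self._evict_old(self.clock())
        return len(self.events), sum(tokens for _, tokens in self.events)

    def record(self, token_count):
        """Record one request and log the running totals."""
        self.events.append((self.clock(), token_count))
        requests, tokens = self.totals()
        logger.info("last %.0fs: %d requests, %d tokens",
                    self.window_seconds, requests, tokens)
```

Call `record()` once per request, including requests that fail, and compare the logged totals with the per-minute limits shown on the Quotas page.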

 

Quota limits can vary by region, so sending your requests to a region with higher available quota may help overcome certain limitations.
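If you have noted the per-region limits from the Quotas page, a small helper like the one below can pick the region with the most room left. The function name and all the numbers are made up for illustration, and nothing here queries Google Cloud. Also check that the model you need is actually offered in the region it picks.

```python
def region_with_most_headroom(limits, usage):
    """Return the region with the largest (limit - usage), or None.

    limits: dict of region -> requests-per-minute limit, copied by hand
            from the Quotas page.
    usage:  dict of region -> your own recent requests per minute.
    """
    if not limits:
        return None
    headroom = {region: limit - usage.get(region, 0)
                for region, limit in limits.items()}
    best = max(headroom, key=headroom.get)
    # No region is worth switching to if even the best one is exhausted.
    return best if headroom[best] > 0 else None


# Hypothetical numbers, for illustration only.
example_limits = {"us-east5": 60, "europe-west1": 30}
example_usage = {"us-east5": 58}
print(region_with_most_headroom(example_limits, example_usage))
```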

Proactively managing your quotas and monitoring usage can help you prevent errors and keep Anthropic models running smoothly within GCP.

If you need further assistance, you can reach out to Google Cloud Support at any time.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

 
