
Quota exceeded error for Generate content requests per minute per project per base model per minute

Hello,

I am getting a quota exceeded error for "Generate content requests per minute per project per base model per minute per region per base_model".

I am using the Vertex AI API in my app. When a user makes a query, they get a quota exceeded error message. Any suggestions on how to solve this?


Hi @Nitishjha0207,

Welcome to Google Cloud Community!

When you encounter a "quota exceeded" error message related to "Generate content requests per minute per project per base model per minute per region per base_model" while using Google Cloud's Vertex AI API, it indicates that your application has surpassed the allowed number of requests per minute for the specific resource.

To confirm whether the "Generate content requests per minute per project per base model per minute per region per base_model" quota has actually been exceeded in your Google Cloud project, open the Google Cloud Console, click "IAM & Admin" in the left-hand navigation pane, and select "Quotas & System Limits". From there you can filter by the specific service or quota that might be exceeded.

If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. Please note that quota increases are subject to approval and may take some time to process. Note also that the requests will be reviewed and granted for valid business cases.

You can read through this documentation for more information regarding quotas and limits for the Vertex AI API.
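While a quota increase is pending, a common client-side mitigation is to retry rate-limited calls with exponential backoff instead of surfacing the error to the user straight away. Below is a minimal sketch, not an official recipe. `QuotaExceededError` is a hypothetical stand-in for whatever quota error your client library raises (for example, `google.api_core.exceptions.ResourceExhausted` when using Google's Python libraries), and `call_with_backoff` and its parameters are illustrative names.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the SDK's quota-exceeded (HTTP 429) error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter on quota errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps concurrent clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

You would wrap the actual model call in a zero-argument function or lambda and pass it as `fn`. Backoff only smooths out short bursts; with a per-minute limit as low as 1, a quota increase is still needed for real traffic.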

I hope the above information is helpful.

Hello, 

My Vertex AI quota limit is very low. Its value is 1 for both Gemini Pro and Gemini Flash, but according to the documentation it should be around 300 for Gemini Pro and 200 for Gemini Flash.


How can I increase this value?

Regards

Nitish

Hi, 

I am still waiting for an answer. Can you please respond?

Hi @Nitishjha0207,

To increase the value, consider requesting a quota increase. You may follow the steps in this documentation. Keep in mind that these requests are subject to review and approval and may take some time to process. Additionally, quota increase requests are typically evaluated based on the validity of the business case provided.

I hope this helps.



This is just sad; it's like I can only send one prompt per minute at this point!

Nowhere in the documentation does it say that you start with a quota of just 1.

Any solutions? My paid account has the same problem.

I haven't gotten any solution from Google yet.

Same here.

But I wonder, what is the difference between:

  • Generate content requests per minute per project per base model per minute per region per base_model, and
  • Generate content input tokens per minute per base model per minute per region per base_model

because the second one is the one that is documented, and it is consistent: I got a 4M token limit.
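Going by the quota names alone, the first counts API calls per minute and the second counts input tokens per minute summed across those calls. The rough sketch below assumes that both limits apply independently over the same one-minute window. It uses the values reported in this thread (1 request per minute, 4M input tokens per minute); the function name and the 2,000-token average prompt size are illustrative only.

```python
def max_requests_per_minute(request_limit, token_limit, avg_input_tokens):
    """Return how many requests per minute fit under both quotas.

    Assumes (not confirmed by the thread) that the request quota and the
    input-token quota are enforced independently over the same window.
    """
    # How many average-sized prompts fit inside the token budget.
    by_tokens = token_limit // avg_input_tokens
    # The tighter of the two quotas is the one that actually binds.
    return min(request_limit, by_tokens)


# Values from this thread: request quota of 1, token quota of 4M.
print(max_requests_per_minute(1, 4_000_000, 2_000))    # prints 1
# With the request quota the poster expected from the documentation (300).
print(max_requests_per_minute(300, 4_000_000, 2_000))  # prints 300
```

Under these assumptions, the request quota of 1 is the binding limit, which would explain hitting the error even though the 4M token quota is nowhere near used up.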