Quota exceeded error for Generate content requests...

Nitishjha0207 · 08-27-2024 11:50 PM

Hello,

I am getting a quota exceeded error message for "Generate content requests per minute per project per base model per minute per region per base_model".

I am using the Vertex AI API in my app. When a user makes a query, they get a quota exceeded error message. Any suggestions on how to solve this?

ibaui

Hi @Nitishjha0207,

Welcome to Google Cloud Community!

When you encounter a "quota exceeded" error message related to "Generate content requests per minute per project per base model per minute per region per base_model" while using Google Cloud's Vertex AI API, it indicates that your application has surpassed the allowed number of requests per minute for the specific resource.
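Until the quota is raised, a common client-side mitigation is to retry the failed call with exponential backoff. The following is only a minimal sketch, not an official recommendation: `QuotaExceededError` and `call_with_backoff` are hypothetical names introduced for illustration. With the Vertex AI Python SDK, I believe the error typically surfaces as `google.api_core.exceptions.ResourceExhausted` (HTTP 429), but treat that as an assumption and check what your client actually raises.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the client's quota error (HTTP 429)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying with exponential backoff when the quota is hit."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Because this quota is counted per minute, a `base_delay` of a few seconds may not be enough on its own when the limit is very low. Backoff only smooths out bursts; it does not replace requesting a higher quota.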

To confirm that the "Generate content requests per minute per project per base model per minute per region per base_model" quota has actually been exceeded in your Google Cloud project, open the Google Cloud Console, click "IAM & Admin" in the left-hand navigation pane, and then select "Quotas & System Limits." You can filter by service to find the quota that has been exceeded.

If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. Please note that quota increases are subject to approval and may take some time to process. Note also that the requests will be reviewed and granted for valid business cases.

You can read through this documentation for more information regarding quotas and limits for the Vertex AI API.

I hope the above information is helpful.

Nitishjha0207

Hello,

My Vertex AI quota limit is very low. Its value is 1 for both Gemini Pro and Gemini Flash, but as per the documentation it should be around 300 for Gemini Pro and 200 for Gemini Flash.

How can I increase this value?

Regards

Nitish

Nitishjha0207

Hi,

I am waiting for an answer, can you please respond?

ibaui

Hi @Nitishjha0207,

To increase the value, consider requesting a quota increase. You may follow the steps in this documentation. Keep in mind that these requests are subject to review and approval and may take some time to process. Additionally, quota increase requests are typically evaluated based on the validity of the business case provided.

I hope this helps.

tanmaybagwe

This is just sad. It's like I can only do one prompt per minute at this point!

Nowhere in the documentation does it say that the quota will start at just 1.

bldev

Any solutions? My paid account has the same problem.

Nitishjha0207

Didn't get any solution from Google yet.

iqbalmaulana

Same here.

But I wonder, what is the difference between:

Generate content requests per minute per project per base model per minute per region per base_model, and
Generate content input tokens per minute per base model per minute per region per base_model

I ask because the second one is what is documented, and it is consistent: I have a 4M token limit.
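As far as I can tell, the first quota counts API calls and the second counts the input tokens those calls carry, and a call is rejected as soon as either one runs out. The sketch below only illustrates that reading; it is not how Google implements quota enforcement. `MinuteQuota` and `try_send` are hypothetical names, and the sliding 60-second window is an assumption.

```python
import time
from collections import deque


class MinuteQuota:
    """Tracks usage in a sliding window against a fixed limit."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, amount) pairs, oldest first

    def used(self):
        # Drop events that have aged out of the window, then sum the rest.
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(amount for _, amount in self.events)

    def fits(self, amount):
        return self.used() + amount <= self.limit

    def record(self, amount):
        self.events.append((self.clock(), amount))


def try_send(requests_quota, tokens_quota, input_tokens):
    """Return "ok" and record usage, or name the quota that blocks the call."""
    if not requests_quota.fits(1):
        return "requests"
    if not tokens_quota.fits(input_tokens):
        return "tokens"
    requests_quota.record(1)
    tokens_quota.record(input_tokens)
    return "ok"
```

With the values reported in this thread (1 request per minute, 4M input tokens per minute), a second call within the same minute is blocked by the requests quota even though the token quota is barely touched, which would explain why the token limit looks fine while the calls still fail.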
