
Gemini API - 429 Resource has been exhausted (e.g. check quota).

I have implemented logic that uses 15 API keys. All requests go to Gemini Pro 1.0, Google's AI model. The reason is that with a single API key the entire process takes 4-5 days, because I want to stay within the limit of 60 calls per minute per API key so that Gemini can be used for free. I now have 15 keys running in parallel, which should bring the time down to a few hours. However, I received the message "429 Resource has been exhausted (e.g. check quota)." as well as: "limit 'GenerateContent request limit per minute for a region' of service 'generativelanguage.googleapis.com' for consumer 'project_number:11111111111(sample)'". Could this be because almost all the API keys are associated with one project? How do I increase that limit, and what would it cost?


You are possibly sending too many requests in a short period of time. If possible, add a delay between your requests to avoid the error. As for a quota increase, you can follow the instructions provided here. Sample instructions are below:
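One way to add such a delay is a client-side limiter that every worker calls before sending a request. This is a minimal sketch, not an official client feature: `SlidingWindowLimiter` and its parameters are hypothetical names, and the 60 calls per minute figure is taken from the original post. Since the error message names the project as the consumer, the sketch assumes one limiter is shared by all keys that belong to the same project.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._stamps = deque()  # start times of the most recent calls

    def acquire(self):
        """Block until one more call fits inside the current window."""
        while True:
            with self._lock:
                now = self._clock()
                # Forget calls that have already left the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Time until the oldest recorded call leaves the window.
                wait = self.period - (now - self._stamps[0])
            # Sleep outside the lock so other threads are not blocked.
            self._sleep(wait)
```

Usage would be something like `limiter = SlidingWindowLimiter(60, 60.0)` created once, with `limiter.acquire()` called right before each request in every worker thread.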

 


  1. Go to the Quotas page. The remaining steps will appear automatically in the Google Cloud console.

  2. On the Quotas page, find the quota you want to increase in the Quota column. You can use the Filter search box to search for your quota.

  3. Select the checkbox to the left of your quota.

  4. Click EDIT QUOTAS. The Quota changes form displays.

  5. In the Quota changes form, enter the increased quota that you want for your project in the New limit field.

  6. Complete any additional fields in the form, and then click DONE.

  7. Click SUBMIT REQUEST.


 

Yesterday I used Gemini to translate an eBook. Now every request I send fails with error 429 Resource has been exhausted (e.g. check quota), i.e. too many requests. I went to the Quotas page, but I can't find any quota over 90%, so I don't know which one needs to be changed.

 

I am facing the same issue. 429 all over the place.
The free tier limits are:
1M tokens > I send 1.5k-2k max
1500 requests > I send an average of 2 requests per minute

So I am at 1% of the available quota and still getting the error. Reading through multiple comments here, it seems to be an API issue.

This is exactly what's happening to me too. Did you ever find a solution for it?

This is exactly what's happening to me; I'm only at 1%. Did you ever get a solution for this?

I'm running into this same issue. I'm using workflows to do two generative tasks (via Cloud Functions) that build on each other. I'm using 2 different API keys, but the second task always fails, with the Cloud Function giving me a final error of 429.
Even after implementing 30 seconds of wait time in the workflow, it still happens, with 2 different keys, using 2 different models.

Are we limited at the project level in how often we can call the Gemini Generative endpoint?

My theory is that the gemini-pro-1.5-latest endpoint has some sort of other limit, that we as users can't see when using the "generativeai" python SDK. The only thing that shows up in metrics is failed API calls, but NOT limit hits. 

The way around this, I believe, would be to use the Vertex SDK directly, not the GenAI API.




I found this Google AI API vs Vertex AI API situation a whole mess. The Google AI API won't allow us to use PDFs in a prompt even for gemini-1.5-pro, but we can do it using Vertex AI. Not sure why a company like Google has failed to develop a good API that's easy to understand.

Honestly, this doesn't seem like just their AI suite of APIs either. Trying to manage the YouTube Data API for example may be the most unintuitive and strangely documented experience I've had with such a major platform.

I'm looking at my quotas and I'm not even seeing anything for GenAI content generation even though the metrics show 400 requests.

Hi, I have the same problem. In my case I use Gemini 1.5 Flash at <= 50 requests per minute and I get the same error.

Exactly, I believe I am hitting the 10k quota per day with my paid account, but I can't find where this quota is shown in GCP, I've even asked Gemini itself (which I would assume is trained to help with GCP) and it just points me in the wrong direction 😀
I am coming from OpenAI/AWS which is much more simple to manage IMO, and I can't tell if I am not very smart or if the ecosystem in GCP of quotas/projects/API keys/OAuth is as confusing as I feel it is!



I am struggling to understand the relationships and layout of GCP, AI Studio, and the product line of Vertex/Gemini/etc.

Why can I use an API key to make requests to base Gemini models, but a clunky OAuth is needed for managing and USING fine-tuned models? 

Why am I getting "429 Resource has been exhausted (e.g. check quota)"... This is not an acceptable level of logging in my opinion, what quota, where is it managed, and what project/application/credential is the issue? 

Yes, I agree with you. The documentation for Gemini is really hard to understand; it's different from the ChatGPT model.

You can read about the quotas for Gemini here: https://cloud.google.com/vertex-ai/generative-ai/docs/quotas

If you go to the quota page you can search for Gemini, you can find the quota and increase it. Make sure you do it for the right region.

I figured out that in europe-west1 the default is 10 requests per minute, whereas in us-central1 it's 300.

If you are looking for it, you can type: base_model:gemini-pro

I faced the same issue; in my case it seems to be one of the free quota limits. The main problem is that the Quotas page won't show that you've reached the limit. So what's the point of having it there? Quite misleading.

Any luck with this issue, anyone?

Their APIs, and in particular GenAI vs Vertex, are INSANELY confusing for ME, and I'm a well-seasoned beta tester of every AI product they put out. I participated in a one-on-one research call where I spent an hour telling them all about these kinds of issues, so they are well aware of it!

How can this issue be resolved?
We also integrated a 5-second delay mechanism, but we are still getting the error below:
DEBUG: Retrying due to 503 The model is overloaded. Please try again later., sleeping 0.3s ...
2024-11-19 17:44:05,387 DEBUG: Retrying due to 503 The model is overloaded. Please try again later., sleeping 0.3s ...

generate_text_manipulation_response
    raise google.api_core.exceptions.ResourceExhausted("Quota exhausted after retries.")
google.api_core.exceptions.ResourceExhausted: 429 Quota exhausted after retries.

Hi,

We received information that the issue is caused by resource problems due to a new way of handling quotas and requests. This will happen for version 002 (when many users try to use 002 at the same time, for the specific region), but 001 will still work, as it uses the old way of handling quotas and requests. As I understood it, they are investigating it at the moment.

Other ways to solve it are to try another region (if pay-as-you-go) or to buy dedicated GSUs via Provisioned Throughput (for production environments).


I'm working in a pay-as-you-go account using the gemini-1.5-pro-002 model. The only quotas that show up in my Quotas page are the 2 API services: Generate Content and Search Grounding, both far below the quota, yet I'm still getting 429 errors. The first time I got the error, I waited 24 hours and then it worked again until later that day, when it popped up again. I'm running it on roughly 10,000 rows and only have 776 rows left, so it can't be a per-minute rate issue or a daily limit issue, since I've waited another 24 hours and it's still not working.

I was having this problem too, and found a solution: after your .generate_content(prompt) call, put a time.sleep(1) in there. I went from all but one request returning 429 to none. A 1-second pause might be a deal breaker for some, but it should solve at least half the use cases.
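A minimal sketch of that workaround, assuming a sequential loop over prompts. `generate_all` is a hypothetical helper name, and `generate` stands in for whatever function sends the request (for example the `.generate_content` method mentioned above).

```python
import time


def generate_all(prompts, generate, pause_seconds=1.0, sleep=time.sleep):
    """Call `generate(prompt)` for each prompt, pausing after every call."""
    results = []
    for prompt in prompts:
        results.append(generate(prompt))
        sleep(pause_seconds)  # fixed pause after each request
    return results
```

With a 1-second pause the loop sends at most 60 requests per minute, so it only helps when the limit being hit is at or above that rate.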

We have a similar problem. Under Quota, the request limit per model per minute for a project in paid tier 1 is 2000. However, we are still limited to 5-10 requests per minute.

 

Under Gemini for Google Cloud API, "API requests per minute per user" is 120 in our case, which should be what limits the number of requests per user per minute. Request a larger value for this quota to increase the number of requests per minute.

@kiranvaranasi I have implemented exponential backoff with the baseline time.sleep() set to 4 seconds. Looking at my quotas, my requests per minute per project for both models I am using (1.5-flash and pro) are far below the quota limit. I also know this because it runs fine but stops at various times (sometimes I get ~300 rows done, sometimes ~100). Google's Quotas and System Limits tab offers no helpful insight into what is going on. There is no consistent pattern to the quota limits I am reaching either, which makes it a nightmare to work with given my deadline.
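For anyone who wants to try the same approach, here is a rough sketch of exponential backoff with jitter, using the 4-second baseline mentioned above. It is not the poster's code: `call_with_backoff`, `QuotaError` and all parameter names are hypothetical. In the traceback earlier in this thread the 429 surfaces as `google.api_core.exceptions.ResourceExhausted`, so that is presumably the class you would pass as `retry_on`.

```python
import random
import time


class QuotaError(Exception):
    """Stand-in for the SDK's 429 error (hypothetical name)."""


def call_with_backoff(fn, retry_on=(QuotaError,), base_delay=4.0,
                      max_delay=64.0, max_attempts=6,
                      sleep=time.sleep, rng=random.random):
    """Call `fn()`, retrying on quota errors with growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the original error
            # Double the delay on every failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep between 50% and 100% of the computed delay.
            sleep(delay * (0.5 + rng() / 2))
```

The jitter is there so that parallel workers which fail at the same moment do not all retry at the same moment as well.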

Hello community! 

I came here for the http status 429 -- thanks for all the input, has been helpful. (-; 

I added billing info so that I can test without hitting the rate limit so often, and all is good now. However, the quota page has very detailed and categorized information about request limits on the free tier, so I organized it into the following table.

Model                      Request limit per day   Request limit per minute   Theoretical minutes until exhaustion
gemini-2.0-pro-exp                            50                          2                                     25
gemini-2.0-flash                           1,500                         15                                    100
gemini-2.0-flash-lite                      1,500                         10                                    150
gemini-2.0-exp                               100                          5                                     20
gemini-1.5-pro                                50                          2                                     25
gemini-1.5-pro-exp                           100                          5                                     20
gemini-1.5-flash-exp                       1,500                          5                                    300
gemini-1.5-flash-8b                        1,500                         15                                    100
gemini-1.5-flash-8b-exp                    1,500                         15                                    100
gemini-1.5-flash                           1,500                         15                                    100
gemini-1.0-pro                             1,500                         15                                    100
Total                                     10,800                        104                                  1,040
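The last column of the table appears to be the daily limit divided by the per-minute limit, i.e. how many minutes of calling at the maximum per-minute rate it takes to use up the day's requests. A small sketch that reproduces it from the first two columns (the function name is just for illustration):

```python
# Free-tier limits copied from the table above: (per day, per minute).
limits = {
    "gemini-2.0-pro-exp": (50, 2),
    "gemini-2.0-flash": (1500, 15),
    "gemini-2.0-flash-lite": (1500, 10),
    "gemini-2.0-exp": (100, 5),
    "gemini-1.5-pro": (50, 2),
    "gemini-1.5-pro-exp": (100, 5),
    "gemini-1.5-flash-exp": (1500, 5),
    "gemini-1.5-flash-8b": (1500, 15),
    "gemini-1.5-flash-8b-exp": (1500, 15),
    "gemini-1.5-flash": (1500, 15),
    "gemini-1.0-pro": (1500, 15),
}


def minutes_until_exhausted(per_day, per_minute):
    """Minutes of calling at the per-minute limit before the daily limit is hit."""
    return per_day / per_minute


timings = {
    model: minutes_until_exhausted(per_day, per_minute)
    for model, (per_day, per_minute) in limits.items()
}

for model, minutes in timings.items():
    print(f"{model}: {minutes:.0f} minutes")
```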

This is by no means a hint to abuse the API; it is meant to show how long you can continue testing your application. As you can see, with some of the models the limits are so low that you can only run one or two calls in between coffee breaks. However, as someone pointed out earlier, sometimes Google's quota dashboard will say you're over the limit but you can still call the API, so the HTTP status 429 is what we will use to trigger exponential back-off.

Does anyone know if AI Premium / Gemini Advanced subscribers get higher API rate limits? If so, what do the limits become? I'm unable to confirm this yet; if a subscriber can shed some light here, that would be great!

BTW, Google is doing a great job building their products. The Gemini API helps me be really productive and makes my projects interesting, so keep up the good work, Google! We need at least 3 hours of continuous working time before resource exhaustion, so please improve it!

How many API keys can we create on the free plan of Google Gemini?