Gemini Pro Quota Exceeded

Hi Community,

I am getting a quota exhaustion message: "quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: gemini-pro".

However, I haven't even started using the gemini-pro model, had just been using bison so far and have a 60 QPM limit. 

Any guidance on how to resolve this?


Hello,

Gemini is available under :streamGenerateContent and not :predict. Can you please check and confirm that you are using the API below?

https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.publishers.models/strea...
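For reference, a rough Python sketch of calling that endpoint directly with the requests library (the project ID, region, and access token below are placeholders, so adjust them to your own setup):

import requests

PROJECT_ID = "your-project-id"   # placeholder
LOCATION = "us-central1"         # placeholder
ACCESS_TOKEN = "..."             # e.g. the output of `gcloud auth print-access-token`

# gemini-pro is served via :streamGenerateContent, not the :predict method
# that text-bison uses.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/gemini-pro:streamGenerateContent"
)

payload = {"contents": [{"role": "user", "parts": [{"text": "Hello, Gemini"}]}]}

resp = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"}, json=payload)
print(resp.status_code)
print(resp.text)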

Thanks!

I'm facing a similar issue: the quota for gemini-pro is set to 5 requests per minute, in contrast to the 300 requests per minute limit mentioned in the documentation. I have also tried to edit the quota, but there is a hard cap of 5 imposed. How can I resolve this issue?

Hi amira,

Could you share your Quota page where it says you have 5 requests per minute? By default, for gemini-pro, all regions should have 10 and us-central1 should have 300.

Regards,

It seems that by default, it has a cap of 5 requests per minute despite the documentation. I have upgraded the billing account to paid and made a payment, but nothing has changed.

5 requests per minute is too low for us.


[Attachment: Screenshot 2024-05-02 at 10.43.47.png]

Hi saumil,

Could you share the code where you initialize and call the model? As my colleague mentioned, you may be using the wrong method.

Regards,

Hi there,

I am also getting a similar kind of error: '429 Resource has been exhausted (e.g. check quota).'

 

Here is the code:

import google.generativeai as genai

genai.configure(api_key=GOOGLE_API_KEY)  # GOOGLE_API_KEY is loaded elsewhere

generation_config = {
    "candidate_count": 1,
    "max_output_tokens": 256,
    "temperature": 1.0,
    "top_p": 0.7,
}

safety_settings = [
    {"category": "HARM_CATEGORY_DANGEROUS", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro",
    # generation_config=generation_config,
    safety_settings=safety_settings,
)


def ner(text):
    """Extract the requested named entities from the given text."""
    try:
        response = model.generate_content(f"""
You are a helpful assistant that can analyze text and extract named entities (NER).

The extracted entities should include the organizations, project names, scheme names and company names mentioned in the article.

Extract ONLY the above specified entities from the following text: {text}

The output should contain only the extracted entities.
""")
        return response.text
    except Exception as e:
        print(e)
        return 'No Response'

Hi Larry,

The thing about the quotas is that they are region bound, but at least as far as the generativeai Python package goes, there is no interface to define the region endpoint.

How do we work out which region to raise the quota in? The API key issued when registering for the Gemini API is one per account/project, but I could not find any place in the configuration to define the region that the project's API (or my whole account?) should work with...

Is there such a configuration?
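For comparison, the Vertex AI SDK does take a location when you initialize it; a rough sketch of the kind of setting I mean (the project ID and region below are placeholders):

import vertexai
from vertexai.generative_models import GenerativeModel

# In the Vertex AI SDK the region is explicit at init time, which is the kind
# of knob I cannot find in the google.generativeai package.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-pro")
print(model.generate_content("Hello").text)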

It appears there might be a misconfiguration or issue with the quota settings. Double-check your AI Platform quotas, ensuring that the gemini-pro model is correctly associated with its designated quota and not impacting the bison model's limit. If the problem persists, consider reaching out to Google Cloud Support for further investigation and resolution of the quota exceeded issue.

Same issue here,

I am using Vertex AI to generate content. Can you please guide me through the solution?

Here is the quota view:

[Attachment: Capture d’écran 2024-03-20 113803.png]

We are experiencing the same problem. Any solutions?

For those encountering the quota error but seeing 0% usage when you check your Quota page, it is likely that you are calling the wrong API. Note that `text-bison` and `gemini-pro` use different methods.

If possible, you can use tools such as LangChain, which provide a wrapper around the model call, so you don't have to worry as much about different models using different methods; see the rough sketch at the end of this post.

For those who see quota usage somewhere between 0 and the limit (300, 10, or whatever number applies to your region), click the usage chart icon for more detail; the number on the first page may show usage at a different time than the moment you hit the quota.
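As a rough sketch of the LangChain route (this assumes the langchain-google-vertexai package and an environment that is already authenticated against your project):

from langchain_google_vertexai import ChatVertexAI

# The wrapper takes care of calling the method that matches the model name,
# so the same call pattern works for gemini-pro and other chat models.
llm = ChatVertexAI(model_name="gemini-pro", temperature=0.2)
print(llm.invoke("Say hello").content)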

Hello Larry,

We've implemented LangChain with a wrapper as you suggested, which forces us to use Vertex AI. We're currently using Gemini 1.5 Pro but have encountered a quota limit, which is still set at 5.

The specific error we're seeing is:

"Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: gemini-pro. Please submit a quota increase request." More details can be found here.

Below is our setup:

 

JavaScript code:

import { ChatVertexAI } from '@langchain/google-vertexai';

const model = new ChatVertexAI({
  maxConcurrency: 8,
  modelName: 'gemini-1.5-pro-latest',
  temperature: 0.8,
});

We have already reached out to Google support for a quota increase and have paid the requested $10 fee. However, it's been over two business days and we have yet to receive a response.

Could you advise on how we might expedite this matter?


I feel a bit dumb, but... ahm.. where is the quota editing page for gemini???

And yes, same issue as the one above... gemini-pro-1.5 seems to be ... exhausted?

To fix this, you can either request a quota increase, or do what I do: wait some time and resubmit the request. For example, I wrote a small Python function that, every time it gets this exception, waits a few seconds and resubmits the request. You can adapt it to your JavaScript code; here is how it looks:

import time

from google.api_core.exceptions import ResourceExhausted


def invoke(model, text: str) -> str:
    retries = 0
    sleep_time = 2
    while True:
        try:
            response = model.generate_content(text)
        except ResourceExhausted as re:
            print(f"ResourceExhausted exception occurred while processing property: {re}")
            retries += 1
            if retries > 5:
                print("ResourceExhausted exception occurred 5 times in a row. Exiting.")
                raise
            time.sleep(sleep_time)
            sleep_time *= 2  # back off a little longer before each retry
        else:
            return response.text
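Calling it is just (where model is whatever GenerativeModel instance you already have, so that part is an assumption):

# 'model' is assumed to be the GenerativeModel instance created earlier in the thread.
result = invoke(model, "Extract the organizations mentioned in this article ...")
print(result)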

I hope it helps.