Current Rate Limits on Gemini Pro Vision

wkrt
New Member

Hi! 
What are the current rate limits on Gemini Pro Vision?
According to this page, the only limit is 60 QPM.
But are there any other limits, like QPD or TPM?


Roderick
Community Manager

Hey there @wkrt,

 

I took a look at the support documents and you are correct that Gemini Pro Vision (and Gemini Pro) models typically have a rate limit of 60 requests per minute (RPM).

 

Limits beyond Requests per Minute (RPM)

While the primary, publicly documented limit is the 60 RPM, large language model APIs can have additional constraints and usage considerations:

 

Token Limits:

  • Input token limit: the maximum number of tokens (a token is roughly 4 characters of text) you can send in a single request. For Gemini Pro Vision, the input token limit is 12,288.
  • Output token limit: the maximum number of tokens the model can generate in a single response. For Gemini Pro Vision, this is 4,096.
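As a quick sanity check before sending a request, you can apply the ~4-characters-per-token heuristic mentioned above. This is only a rough sketch — the limits hard-coded here are the ones quoted in this thread, and the heuristic is approximate; always confirm the real values against the official documentation (or a proper tokenizer):

```python
# Rough token-budget check using the ~4 characters-per-token heuristic.
# INPUT_TOKEN_LIMIT is the Gemini Pro Vision value quoted in this thread;
# verify it against the current official documentation before relying on it.
INPUT_TOKEN_LIMIT = 12_288

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_input_limit(prompt: str, limit: int = INPUT_TOKEN_LIMIT) -> bool:
    """True if the prompt's estimated token count is within the input limit."""
    return estimate_tokens(prompt) <= limit

prompt = "Describe the objects in this image."
print(estimate_tokens(prompt), fits_input_limit(prompt))
```

For anything precise, prefer the token-counting endpoint your client library exposes rather than a character-count heuristic.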

Project or API-Level Quotas (QPD):

  • Some cloud-based AI services might impose daily quotas (requests per day) at the project level or for their API usage in general. These are often in place to ensure fair resource distribution and prevent abuse.
  • It's a good idea to check the specific documentation or quota pages for Google Cloud AI services (Vertex AI) to see if any daily or project-wide quotas apply to Gemini Pro Vision usage.

Technical Maintenance or Downtime: Cloud services undergo updates and planned maintenance, which could temporarily affect API availability and usage.

 

Best Practices and Tips:

 

Check Official Documentation: Always refer to the latest official documentation from Google Cloud (Vertex AI) related to Gemini models for the most up-to-date rate limits and quotas.

Contact Google Cloud Support: If you have specific use cases that might push these limits or require further clarification, reach out to Google Cloud Support for guidance related to your project.

 

Be Mindful of Token Usage: Long inputs and requests for verbose outputs will consume more tokens. Optimize your prompts and requests to stay within the token limits.

 

Implement Error Handling and Retries: Build in retry mechanisms in your application to handle cases when you might exceed rate limits.
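A common pattern for this is exponential backoff with jitter: wait progressively longer between retries so you stop hammering the API while it is rate-limiting you. Here is a minimal, library-agnostic sketch — the function you wrap and the choice of which exceptions count as retryable (e.g. whatever your client raises for an HTTP 429) are up to your application:

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter whenever a
    retryable error (e.g. a rate-limit / HTTP 429 error) is raised.

    Delays grow as base_delay * 2**attempt, plus a little random jitter
    so many clients don't all retry at the same instant.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

You would wrap your actual API call, e.g. `call_with_backoff(lambda: model.generate_content(prompt))`, narrowing `retryable` to the specific rate-limit exception your client library raises.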

Let me know if you have any other questions!

What about the new Gemini 1.5 Pro model? Its RPM is very low compared to Gemini 1.0 Pro's 60 RPM. Is there a way to increase this limit?