Hi!
What are the current rate limits on Gemini Pro Vision?
According to this page, the only limit is 60 QPM.
But are there any other limits, like QPD or TPM?
Hey there @wkrt,
I took a look at the support documents and you are correct that Gemini Pro Vision (and Gemini Pro) models typically have a rate limit of 60 requests per minute (RPM).
Limits beyond Requests per Minute (RPM)
While RPM is the primary, publicly documented limit, large language model APIs can have additional constraints and usage considerations:
Token Limits: Beyond the per-minute request cap, each request is constrained by the model's maximum input and output token counts, and token throughput per minute (TPM) may also be metered.
Project or API-Level Quotas (QPD): Your project may also be subject to a daily request quota (queries per day) on top of the per-minute limit; these quotas apply at the project level.
Technical Maintenance or Downtime: Cloud services undergo updates and planned maintenance, which could temporarily affect API availability and usage.
Best Practices and Tips:
Check Official Documentation: Always refer to the latest official documentation from Google Cloud (Vertex AI) related to Gemini models for the most up-to-date rate limits and quotas.
Contact Google Cloud Support: If you have specific use cases that might push these limits or require further clarification, reach out to Google Cloud Support for guidance related to your project.
Be Mindful of Token Usage: Long inputs and requests for verbose outputs will consume more tokens. Optimize your prompts and requests to stay within the token limits.
Implement Error Handling and Retries: Build retry logic into your application to handle cases where you exceed the rate limits (a sketch follows below).
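For the last point, here is a minimal sketch of retrying with exponential backoff using the Python `google-generativeai` SDK. The API key placeholder, the model name, the retry parameters, and the `generate_with_retry` helper are all illustrative assumptions on my part, not official guidance:

```python
import time

import google.generativeai as genai
from google.api_core import exceptions

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key
model = genai.GenerativeModel("gemini-pro")  # for gemini-pro-vision, include an image in the contents

def generate_with_retry(contents, max_retries=5, base_delay=2.0):
    """Call generate_content, backing off exponentially when rate limited."""
    for attempt in range(max_retries):
        try:
            return model.generate_content(contents)
        except exceptions.ResourceExhausted:
            # HTTP 429: the per-minute quota was hit; wait before retrying.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("Still rate limited after all retries")

response = generate_with_retry("Summarize the Gemini rate limits in one sentence.")
print(response.text)
```

The same pattern applies to any model variant; only the model name and the contents you pass change.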
Let me know if you have any other questions!
What about the new Gemini 1.5 Pro model? Its RPM is very low compared to Gemini 1.0 Pro, which allows 60 RPM. Is there a way to increase this limit?