Hello,
I've been making API requests to Gemini 1.5 Pro from an app, but starting about 2 days ago some of my requests have been failing with rate-limit errors (error code 429). The volume of requests made to Gemini hasn't changed much, and I've confirmed that my usage, in both requests and tokens, is well below my quotas (as seen in the images below). Could anyone provide any support here?
Thanks,
Brian
Are you using gemini-1.5-pro-002?
Beginning on September 24th [release notes], the latest versions of Gemini 1.5 Flash (gemini-1.5-flash-002) and Gemini 1.5 Pro (gemini-1.5-pro-002) use dynamic shared quota, which distributes on-demand capacity among all queries being processed. I wonder whether this is why you are now seeing 429s when you weren't before.
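If shared capacity does turn out to be the cause, a common mitigation is to retry 429 responses with exponential backoff and jitter. Here is a minimal sketch of that pattern, not a definitive implementation. RateLimitError and send_request are hypothetical placeholders (they are not part of any Google SDK): replace them with whatever exception your client raises on a 429 and whatever function issues your request.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with backoff and jitter.

    send_request is an assumed zero-argument callable that performs one
    API request and returns its result.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            # The delay cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, cap] so clients don't retry in sync.
            sleep(random.uniform(0, cap))
```

The sleep parameter is only there so the waiting can be stubbed out in tests; in normal use you would leave it at the default.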
Hi Andrew,
Thanks for the response! We're submitting to gemini-1.5-pro-001 only. Is there a reason the 001 version might be affected, or could there be some other issue?
I wonder if you may be hitting limits related to your image, audio, video, or PDF inputs? See https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models?hl=en
To clarify - our requests are text-only, so that shouldn't be the case! Would you happen to know of any other possible causes?