Hi, I'm very frequently getting Error 429 when using the Vertex AI API, even though my usage is far below the quota for online prediction requests per minute for the gemini-2.0-flash model.
How can I remediate this issue (I'm on a paid tier already)?
Which other LLM (or other location) available at GCloud would you suggest to increase throughput?
As the next step in my project, I wanted to try genetic algorithms for prompt optimization. Unfortunately, with such frequent 429 errors that is practically impossible.
Hi @mariuszknowak,
Welcome to Google Cloud Community!
Regarding the error you received: error code 429 is returned when the number of your requests exceeds the capacity allocated to process them. You may check this page for guidance on how to rectify this issue.
In addition, according to this documentation, Gemini 2.0 Flash supports Dynamic Shared Quota (DSQ), which eliminates the need to set quota limits and to submit quota increase requests (QIRs). Because DSQ capacity is shared dynamically among customers, a 429 can occur even when your own usage is well below the per-minute figure you see. If you need higher throughput, consider Google's Provisioned Throughput. Note that it is currently in Preview and access must be requested. To reserve your throughput, you must specify the model and the available locations in which it runs.
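In the meantime, a common client-side mitigation for intermittent 429 responses is to retry with exponential backoff and jitter. The snippet below is only a minimal sketch of that idea, not an official Vertex AI recipe: `RateLimitError` and `call_with_backoff` are hypothetical names, and you would replace `RateLimitError` with whatever exception your client library actually raises on HTTP 429 (with the Google API Python libraries this is often `google.api_core.exceptions.ResourceExhausted`, but please verify for your SDK version).

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(call, max_attempts=6, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Run call() and retry on RateLimitError using exponential backoff with full jitter.

    call         -- zero-argument function that sends one request
    max_attempts -- total number of tries before giving up
    base_delay   -- upper bound (seconds) of the first wait; doubles each retry
    max_delay    -- cap (seconds) on the upper bound of any single wait
    sleep, rng   -- injectable for testing; default to time.sleep / random.random
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Exponential upper bound, capped, then a random fraction of it (full jitter)
            upper = min(max_delay, base_delay * 2 ** attempt)
            sleep(upper * rng())
```

You would wrap each model request in a zero-argument function (for example a `lambda`) and pass it to `call_with_backoff`. For a batch workload such as prompt optimization, it may also help to cap the number of concurrent requests so that retries do not pile up on top of new traffic.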
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.