Hi, I very frequently get Error 429 when using the Vertex AI API with the gemini-2.0-flash model, even though my usage is far below the quota for online prediction requests per minute.
How can I remediate this issue? I'm already on a paid tier.
Which other LLM (or other location) available on Google Cloud would you suggest to increase throughput?
As the next step in my project, I wanted to try genetic algorithms for prompt optimization. Unfortunately, with 429 errors this frequent, that is practically impossible.
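
For context, here is a minimal sketch of the kind of client-side retry commonly suggested for 429 responses: exponential backoff with jitter. It is a generic illustration, not Vertex AI's API. `RateLimitError` and `call_with_backoff` are hypothetical names, and `fn` stands for whatever function actually sends the request to the model.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception a client raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    The delay cap doubles after each failed attempt (base_delay, 2*base_delay, ...)
    up to max_delay; the actual wait is drawn uniformly from [0, cap].
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

Usage would look like `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is again a placeholder. Retries like this only smooth over short bursts; they do not raise the underlying capacity, which is why I am also asking about other models or locations.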