Hi everyone,
I'm encountering consistent 429 errors (rate limit exceeded) when using the Vertex AI gemini-2.0-flash-thinking-exp-01-21 API. Here’s a brief summary of my situation:
Yesterday, I used Vertex AI to process data with the gemini-2.0-flash-thinking-exp-01-21 API.
After processing around 5000 data items, I started receiving 429 errors.
When I interrupted the task and restarted it, all subsequent requests returned a 429 error.
My code employs a ThreadPoolExecutor from Python’s concurrent.futures with max_workers set to 32. Even reducing max_workers to 16 doesn't alleviate the issue.
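For context, the pattern looks roughly like this (a simplified sketch; the item payloads, the `process_item` name, and the actual model call are stand-ins, not my real code):

```python
import concurrent.futures

def process_item(item):
    # Placeholder for one gemini-2.0-flash-thinking-exp-01-21 request
    # via the Vertex AI client; the real call is omitted here.
    return f"processed {item}"

items = range(100)

# 32 threads fire requests concurrently; executor.map preserves input order.
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    results = list(executor.map(process_item, items))
```

With 32 (or even 16) workers there is no pacing between requests, so bursts can exceed a per-minute quota even when the total volume seems modest.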
Has anyone experienced similar issues or can offer insights into why this might be happening?
Thanks in advance for your help!
Hi @laolaorkkkkk,
Welcome to the Google Cloud Community!
It looks like you are encountering 429 Too Many Requests errors due to hitting the rate limits of the Vertex AI gemini-2.0-flash-thinking-exp-01-21 API. Despite reducing your concurrency, the issue persists, suggesting the rate limit is likely applied at your project or user level rather than just per connection.
Here are a few approaches that might help with your use case:

- Implement retries with exponential backoff and jitter instead of retrying immediately; once you receive a 429, hammering the endpoint keeps you throttled.
- Throttle the overall request rate (requests per minute), not just the worker count; fewer workers without pacing can still burst past the quota.
- Check your current quota and usage under IAM & Admin > Quotas in the Google Cloud console to confirm which limit you are hitting.
- Consider a non-experimental model where possible, since experimental models typically have low, fixed quotas.
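As an illustration of the backoff suggestion, here is a minimal, generic retry sketch. The `RateLimitError` class and `call_with_backoff` helper are hypothetical names for this example, not part of the Vertex AI SDK:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error raised by the client library."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    # Generic exponential backoff with jitter: wait 1s, 2s, 4s, ...
    # (plus a random offset) between attempts, re-raising on the last one.
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Demo: a flaky call that raises 429 twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

result = call_with_backoff(flaky, base_delay=0.01)
```

The jitter spreads retries out so that many workers throttled at the same moment do not all retry in lockstep.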
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi, it seems the default quota of 10 isn't enough. I also emailed Google Cloud Platform Support, but they told me that the quota for the experimental API cannot be increased.