I'm encountering a "429 Too Many Requests" error when calling my distilled model (based on text-bison) from a Firebase function with the Vertex AI client library. The error message is:
ClientError: [VertexAI.ClientError]: got status: 429 Too Many Requests. {"error":{"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: text-bison. Please submit a quota increase request."
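For reference, the function calls the model roughly like this (a minimal sketch; the project ID, region, and tuned-model resource name below are placeholders for my actual values):

import { onRequest } from "firebase-functions/v2/https";
import { VertexAI } from "@google-cloud/vertexai";

// Placeholders: substitute the real project ID and region.
const vertexAI = new VertexAI({ project: "my-project-id", location: "us-central1" });

// The distilled model is referenced by its tuned-model resource name (placeholder);
// its requests are counted against the text-bison base-model quota.
const model = vertexAI.getGenerativeModel({
  model: "projects/my-project-id/locations/us-central1/models/1234567890",
});

export const generate = onRequest(async (req, res) => {
  // Each invocation counts toward
  // generate_content_requests_per_minute_per_project_per_base_model.
  const result = await model.generateContent(String(req.query.prompt ?? ""));
  res.json(result.response);
});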
Any suggestions on how to resolve this issue or workarounds would be greatly appreciated.
Hi @ess-cr,
Welcome to Google Cloud Community!
The "429 Too Many Requests" HTTP error code specifically indicates you've reached the rate limit for your project using the text-bison model in Vertex AI. This means you're exceeding the allowed number of requests per minute for that model.
Here are some workarounds to consider:
Check the Vertex AI Generative AI quota limits. If you need more, you can request a quota increase through the Google Cloud console; keep in mind that increase requests are reviewed on a case-by-case basis and granted for valid business cases. To learn more, see Work with quotas.
If your usage consistently exceeds the standard quotas, it's essential to engage with Google Cloud support to discuss a long-term solution, such as dedicated capacity or other customized options.
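Additionally, until an increase is granted, a common client-side mitigation is to catch the 429 and retry with exponential backoff, so short bursts stay under the per-minute limit. Here is a minimal sketch (the attempt count and delays are illustrative, not official values):

// Retry a Vertex AI call with exponential backoff when it fails with a 429.
async function generateWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // The client library surfaces quota errors as a ClientError whose
      // message contains "429"; rethrow anything else, or the final failure.
      const is429 = err instanceof Error && err.message.includes("429");
      if (!is429 || attempt >= maxAttempts - 1) throw err;
      // Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
      const delayMs = baseDelayMs * 2 ** attempt + Math.random() * 1000;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap the existing call, e.g.
// const result = await generateWithBackoff(() => model.generateContent(prompt));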
I hope the above information is helpful.
Thank you for your prompt reply!
I appreciate the guidance regarding the "429 Too Many Requests" error. However, I noticed something unusual in the error message. It specifically mentions exceeding the quota for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model using the text-bison model.
When I searched through the Quotas & System Limits page in the Google Cloud console, I could only find quotas related to "Online prediction requests per base model per minute per region per base_model." As shown in the attached screenshot, I currently have sufficient quota for these online prediction requests.
Could there be another quota or limit that I'm overlooking? Or is there a possible misconfiguration that might be causing the discrepancy between the error message and the actual quotas available?
Thanks again for your help!