I am testing a platform that I developed using Anthropic's offerings. We decided to go single-cloud, and I made an impassioned plea to have Google be our single cloud provider. Over the past week I've been getting 429 errors roughly 50% of the time. I'm nowhere near meeting or exceeding quota. It's a very lightweight application when it comes to utilizing the AI service, but that service is critical. Is there any way to determine the root cause of this?
And if the root cause is that it's generally not as available in the region, how can Google offer a service that's only available half of the time? If these 429 errors aren't counted in their SLO/SLA regime, then how can I trust that Provisioned Throughput won't give me the same treatment?
Hi @andylite,
Welcome to Google Cloud Community!
I understand that you've already exhausted troubleshooting efforts regarding the 429 errors on the mentioned models. Error code 429 generally indicates that your resource quota has been exhausted under the Pay-As-You-Go quota framework, or that there are too many requests for Provisioned Throughput.
Since you've already checked your quotas and confirmed that you're not exceeding them, a possible reason could be that the models are experiencing high usage in a specific region, and not because they are generally unavailable in that region. Other contributing factors may include rate limiting and network issues. A workaround for Pay-As-You-Go users is to try accessing the service from a different region. However, since the service is critical, I recommend switching to Provisioned Throughput.
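As a rough illustration of the region workaround, here is a minimal Python sketch that retries on 429 with exponential backoff and jitter while rotating through a list of regions. Everything named here is an assumption for illustration: `RateLimitedError` stands in for whatever exception your client library raises on HTTP 429, `send` is your own function that issues one request against a given region, and the region names in the usage note are placeholders, so check which regions actually serve your model.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_fallback(send, regions, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call send(region), rotating regions and backing off after each 429.

    send    -- your own callable taking a region name (hypothetical)
    regions -- ordered list of regions to try, cycled in order
    sleep   -- injectable so the delay can be skipped in tests
    """
    last_error = None
    for attempt in range(max_attempts):
        region = regions[attempt % len(regions)]
        try:
            return send(region)
        except RateLimitedError as err:
            last_error = err
            if attempt < max_attempts - 1:
                # Full jitter: wait a random time up to the capped
                # exponential delay before trying the next region.
                delay = min(max_delay, base_delay * (2 ** attempt))
                sleep(random.uniform(0, delay))
    raise last_error
```

For example, `call_with_fallback(my_send, ["us-east5", "europe-west1"])` would try the first region, wait briefly on a 429, then try the second. Only 429s are retried here; other errors propagate immediately so that genuine bugs are not masked.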
To address the 429 errors in relation to the SLA and Provisioned Throughput, please refer to the documentation quoted below:
“For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves the purchased amount of throughput for the project's actual usage. When you're using less than your purchased throughput amount, errors that might otherwise return as 429 are returned as 5XX and are counted as part of the error rate that is described in the SLA. When you're using more than your purchased throughput amount, the additional requests are processed as pay-as-you-go”.
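To make the quoted rule concrete, here is a toy Python model of it. This only restates the paragraph above and is not how Vertex AI is implemented; the function name, the abstract throughput units, and the treatment of usage exactly equal to the purchased amount (the quote only covers "less than" and "more than") are assumptions.

```python
def expected_capacity_error(usage, purchased):
    """Toy restatement of the quoted documentation, not Vertex AI's logic.

    usage     -- current throughput, in the same units as purchased
    purchased -- purchased Provisioned Throughput (0 if none)

    Returns (path, status) describing how a capacity-related rejection
    would be surfaced for a request at this usage level.
    """
    if usage < purchased:
        # Within the reserved amount: a would-be 429 is returned as 5XX
        # and counts toward the SLA error rate.
        return ("provisioned", "5XX")
    # No purchase, or at/over the purchased amount (equality is an
    # assumption): the request is handled as pay-as-you-go, where
    # capacity rejections surface as 429.
    return ("pay-as-you-go", "429")
```

So a project using 40 units against 100 purchased would see `("provisioned", "5XX")`, while the same project at 120 units would fall into the pay-as-you-go path for the overflow.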
However, not all 429 errors will be turned into 5XX errors. You might still see 429 errors due to things like bugs in your code or regional outages. If you're getting 429 errors on Provisioned Throughput, check this documentation for more details. Additionally, you can also check this documentation for models that support Provisioned Throughput as reference.
If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.