I'm using the Vertex AI API in us-central1. I get a 429 (Too Many Requests) error after 3 or 4 requests per minute.
What's strange is that nothing in the Vertex API usage metrics in the console tallies with this. I saw another post about regional resource depletion, but my behaviour is consistent every minute: if I wait out the minute, I get another three requests the next minute, and I can spread them out over that minute. If the cause were resource depletion at the data center, I'd expect even my first call to fail when I make it late in the minute. So it definitely seems to be a personal limit, and it seems to be 3 or 4 requests per minute as opposed to the listed 30,000.
One other clue:
But if I click through those links, the page I'm on reloads with the filter applied and nothing listed:
Hi @robinsouthgate,
Welcome to the Google Cloud Community!
It looks like you are encountering 429 (Too Many Requests) errors from the Vertex AI API, even though your usage appears to be within the allocated quota for online prediction requests per minute.
Here are some approaches that might help with your use case:
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi Marvin,
I did implement a retry/backoff strategy to work around the rate limiting, but I'm still none the wiser as to why the limit is so low. I've checked the quota, the project and the service account. All appear to be correctly configured, and the request volume I'm seeing is far lower than the configured limits.
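For anyone landing on this thread with the same symptom, here is a minimal sketch of the kind of retry/backoff workaround mentioned above. It is not my exact code and it does not explain the low limit. RateLimitError and call_with_backoff are hypothetical names used only for illustration; in practice you would catch whatever exception your client raises for a 429 (with the Google Python client libraries that is typically google.api_core.exceptions.ResourceExhausted, but check your own setup).

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the 429 error raised by your client."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    The `sleep` parameter is injectable only so the function can be tested
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Upper bound grows as base_delay * 2^attempt, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] to avoid retry bursts.
            sleep(random.uniform(0, cap))
```

Usage would look like result = call_with_backoff(lambda: make_request(prompt)), where make_request is your own wrapper around the prediction call. With a limit that appears to reset every minute, a max_delay of around 60 seconds seems a reasonable starting point, but that value is a guess based on the behaviour described in this thread.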