
429 error on Vertex after only a few requests per minute with relatively small token sizes

I'm using the Vertex AI API in us-central1, and I get this error after only 3 or 4 requests per minute.

What's strange is that nothing in the Vertex API usage metrics in the console tallies with this:

[Screenshot 2025-03-22 at 11.12.17 AM] [Screenshot 2025-03-22 at 11.13.10 AM]

I've seen another post about regional resource depletion, but this is consistent every minute: if I wait out the minute, I get another three requests in the next minute, and I can spread them out across that minute. If it were a resource shortage at the data center, I'd expect it to fail sometimes on my first call, even late in the minute. So this definitely seems to be a personal limit, and it seems to be 3 or 4 requests rather than the listed 30,000.
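To make the reasoning above concrete, here is a minimal sketch of the hypothesis being tested: a per-project quota enforced as a fixed window that resets at each minute boundary. This is an assumed model for illustration, not a description of how Vertex AI actually enforces quotas; `FixedWindowQuota` and the limit of 3 are hypothetical. Under this model, three requests spread across a minute succeed, a fourth in the same minute is rejected (a 429), and the next minute starts fresh, which is the pattern described. A shortage of shared regional capacity would not be expected to follow the clock like this.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindowQuota:
    """Hypothetical model of a per-minute quota that resets at each window boundary."""

    limit: int                       # requests admitted per window (assumed, e.g. 3)
    window_seconds: float = 60.0     # window length in seconds
    _window_start: float = field(default=0.0, init=False)
    _count: int = field(default=0, init=False)

    def try_request(self, now: float) -> bool:
        """Return True if a request at time `now` (seconds) is admitted, False for a 429."""
        # Start of the window that `now` falls into (e.g. 61 -> 60.0).
        window = (now // self.window_seconds) * self.window_seconds
        if window != self._window_start:
            # A new minute has begun: the counter resets.
            self._window_start = window
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False


# Three requests spread over the first minute, a fourth late in that minute,
# then one more just after the minute boundary.
quota = FixedWindowQuota(limit=3)
results = [quota.try_request(t) for t in (5, 20, 40, 55, 61)]
print(results)  # [True, True, True, False, True]
```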

One other clue:

[Screenshot 2025-03-22 at 11.15.45 AM]

But if I click through those links, it reloads the page I'm on with the filter applied and nothing listed:
[Screenshot 2025-03-22 at 11.17.21 AM]


Hi @robinsouthgate,

Welcome to the Google Cloud Community!

It looks like you are encountering 429 (Too Many Requests) errors from the Vertex AI API, even though your usage appears to be within the allocated quota for online prediction requests per minute.

Here are some things that might help with your use case:

  • Retries and Backoff: Implement retry logic with exponential backoff in your code. A temporary network glitch that causes a few failed requests could trigger internal rate limits, even if your overall request rate is low.
  • Not the Right Quota: Verify that you are looking at the right quota. The one being enforced may have a less intuitive name, or it may be tied to a specific sub-component of Vertex AI that you are using.
  • Project-Specific Limits: Quotas are frequently tied to individual projects. Ensure that you are reviewing the quota for the specific Google Cloud project you are working with.
  • Service Account Permissions: Double-check that the service account your application uses has the permissions it needs to access the Vertex AI API. Insufficient permissions on a service account can sometimes look like rate limiting.
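The retry-and-backoff suggestion above can be sketched as follows. This is a minimal, dependency-free example: `TooManyRequests` and `call_with_backoff` are hypothetical names standing in for whatever your client raises on a 429. With the Google Cloud Python client libraries, a 429 typically surfaces as `google.api_core.exceptions.ResourceExhausted`, which you would catch instead. The delay ceiling doubles on each attempt (capped at `max_delay`), and a random "full jitter" delay below that ceiling is used so that many clients don't retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical stand-in for the 429 error raised by your client library."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on TooManyRequests with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Ceiling doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))  # full jitter


# Demonstration: a call that fails twice with a 429, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TooManyRequests()
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))  # ok 2
```

Retrying smooths over transient 429s, but it does not raise the limit being enforced; if every minute still allows only a handful of requests, the backoff will simply spread them out.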

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hi Marvin,

I did implement a retry/backoff strategy to work around the rate limiting, but I'm still none the wiser as to why the limit is so low. I've checked the quota, the project, and the service account. All appear to be correctly configured, and the number of requests I'm making is far lower than the configured limits.
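Because retries only react after a 429 has already happened, a complementary workaround is to pace requests on the client side so they stay under whatever limit is actually being enforced. Below is a minimal sketch, assuming a limit of about 3 requests per rolling 60 seconds (the figure observed in this thread, not a documented quota). `SlidingWindowThrottle` is a hypothetical helper, not part of any Google Cloud SDK. The clock and sleep functions are injectable so the demonstration runs instantly with a fake clock.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Hypothetical client-side limiter: at most `max_requests` in any `window` seconds."""

    def __init__(
        self,
        max_requests: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of requests still in the window

    def acquire(self) -> None:
        """Block until another request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest recorded request leaves the window.
            self.sleep(self.window - (now - self._sent[0]))


class FakeClock:
    """Demo-only clock: sleeping just advances the simulated time."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


clock = FakeClock()
throttle = SlidingWindowThrottle(3, 60.0, clock=clock.now, sleep=clock.sleep)
times = []
for _ in range(5):
    throttle.acquire()  # in real code, call the Vertex AI API right after this
    times.append(clock.now())
print(times)  # [0.0, 0.0, 0.0, 60.0, 60.0]
```

A rolling window is the conservative choice here: it never sends more than `max_requests` in any 60-second span, so it also stays within a fixed per-minute window like the one the thread suspects.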