Online predictions request per base model per minute per region per base_model goes from 5 to 0

[Screenshot attached: Screenshot 2024-11-10 095320.png]

Why does my "Online predictions request per base model per minute per region per base_model" quota drop from 5 to 0 within about ten minutes, even though I haven't made any predictions?


Hi @RubberDucky,

Welcome to Google Cloud Community!

It appears that you are encountering a rate limit issue with your online predictions. Google Cloud’s Vertex AI enforces a quota on the number of online prediction requests per minute (RPM) for each base model in each region. Once that limit is reached, further requests are rejected (typically with an HTTP 429 "Resource exhausted" error) until the per-minute quota window resets.
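A common client-side way to live within a per-minute quota is to retry rejected requests with exponential backoff and jitter. The sketch below is generic and does not call any Vertex AI API: `QuotaExceededError` is a hypothetical stand-in for a 429 response (with the Google Cloud Python client libraries, quota errors typically surface as `google.api_core.exceptions.ResourceExhausted`), and `call_with_backoff` and `flaky_predict` are illustrative names, not library functions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a 429 / RESOURCE_EXHAUSTED quota error."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff and full jitter on quota errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the quota error to the caller
            # The delay ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # sleeping a random fraction of it keeps concurrent clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a fake request that is rejected twice with a quota error, then succeeds.
attempts = {"n": 0}


def flaky_predict() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise QuotaExceededError("429: quota exceeded")
    return "prediction"


print(call_with_backoff(flaky_predict, sleep=lambda s: None))  # prints "prediction"
```

Backoff only helps while the quota limit is above zero. If the quota value itself reads 0, as in the question, every retry will keep failing, and the limit has to be raised (for example through a quota increase request) before requests can succeed.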

Here are some potential ways to address your issue:

I hope the above information is helpful.