Online predictions request per base model per minu...

RubberDucky · 11-10-2024 06:57 AM

Why does my Online predictions request per base model per minute per region per base_model go from 5 to 0 in about ten minutes, without even using any predictions?

MarvinLlamas

Hi @RubberDucky,

Welcome to Google Cloud Community!

It appears that you are encountering a rate limit issue with your online predictions. Google Cloud’s Vertex AI sets a limit on the number of requests per minute (RPM) for each base model. Once the limit is reached, any further requests are blocked until the quota resets.
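To make the per-minute limit concrete, here is a minimal client-side sketch that keeps calls under a fixed number of requests in any rolling 60-second window. The class name `MinuteRateLimiter` and its parameters are hypothetical and not part of any Google Cloud library, and the limit of 5 is taken from the value mentioned in the question. The server-side quota may be counted differently, so treat this as an illustration rather than a guarantee.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical helper: allow at most `max_calls` in any rolling window."""

    def __init__(self, max_calls=5, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()   # timestamps of recent calls, oldest first

    def _purge(self, now):
        # Drop timestamps that have fallen out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self._clock()
        self._purge(now)
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            self._sleep(delay)
        now = self._clock()
        self._purge(now)
        self._stamps.append(now)
```

Calling `limiter.acquire()` immediately before each prediction request would space the requests out on the client side.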

Here are some potential ways to address your issue:

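Check your available quota: In the Google Cloud console, the Quotas & System Limits page (under IAM & Admin) shows the current limit and usage of this quota for your project and region.
Request a quota increase: If the limit is too low for your workload, you can request a higher value for your project from the same page.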
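While you wait for a quota change, a common client-side pattern is to retry rate-limited calls with exponential backoff. The sketch below is only an illustration: `QuotaExceededError` is a hypothetical stand-in for whatever error your client raises when the quota is exhausted (often reported as an HTTP 429 response), and `call_with_backoff` is not a function from any Google Cloud SDK.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the rate-limit error raised by your client."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, jitter=random.random):
    """Call fn(), retrying with exponential backoff on QuotaExceededError."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter scales the delay to between 50% and 100% of its value
            # so that parallel clients do not all retry at the same moment.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

You would wrap your own prediction call in a function that raises `QuotaExceededError` on a rate-limit response and pass that function as `fn`.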

I hope the above information is helpful.
