Why does my "Online prediction requests per base model per minute per region per base_model" quota go from 5 to 0 in like ten minutes, without even using any predictions?
Hi @RubberDucky,
Welcome to Google Cloud Community!
It appears that you are encountering a rate limit issue with your online predictions. Google Cloud’s Vertex AI sets a quota on the number of online prediction requests per minute (RPM) for each base model, in each region. Once that limit is reached, further requests are rejected (typically with an HTTP 429 `RESOURCE_EXHAUSTED` error) until the per-minute window resets.
Here are some potential ways to address your issue:
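One generic client-side way to cope with a per-minute quota is to retry rejected requests with exponential backoff. The sketch below is only an illustration under stated assumptions: `RateLimitError`, `call_with_backoff`, and `send_request` are hypothetical names, not part of the Vertex AI SDK, and you would need to map the SDK's actual quota error onto `RateLimitError` yourself.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a quota error such as HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff on RateLimitError.

    send_request is any zero-argument callable (an assumption for this sketch)
    that raises RateLimitError when the service reports an exhausted quota.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Double the wait on each retry, cap it, then add random jitter
            # so that several clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Here `send_request` would be whatever function issues your prediction call. Because this quota is counted per minute, a `max_delay` of around 60 seconds gives the window a chance to reset before the final attempts.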
I hope the above information is helpful.