I keep receiving 429 Quota exceeded errors when trying to create embeddings. Looking at my quotas, I can see "online_prediction_requests_per_base_model" is limited to 5/minute.
This seems to contradict this page, which suggests the limit should be 600. https://cloud.google.com/vertex-ai/docs/quotas
Is there a reason why I cannot receive a higher quota?
Many thanks
Hi @stars,
Welcome and thank you for reaching out to our community.
I understand that our documentation can be confusing at times but let me help you get a better picture of our quotas and limits.
The base_model:textembedding-gecko does have a quota of 600 requests per minute, but each request is limited to 5 input texts. In other words, you can send up to 600 requests per minute, each containing at most 5 input texts, as shown in the screenshot you provided.
You can also reach out to Vertex AI Support to discuss this in more detail.
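To stay within both limits, one option is to split the inputs into groups of at most 5 and pace the requests. The following is only a rough sketch under those assumptions: `embed_batch` is a hypothetical placeholder for whatever call you use to request embeddings for one batch, and the helper names and default values are illustrative, not part of any Vertex AI API.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batch_texts(texts: Sequence[str], max_per_request: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `max_per_request` texts."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(texts), max_per_request):
        yield list(texts[start:start + max_per_request])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], list],  # hypothetical: sends ONE request
    max_per_request: int = 5,                  # per-request input limit from the answer
    requests_per_minute: int = 600,            # per-minute request quota from the answer
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Embed all texts, one batch per request, spacing requests evenly."""
    min_interval = 60.0 / requests_per_minute  # seconds between requests
    results = []
    for i, batch in enumerate(batch_texts(texts, max_per_request)):
        if i > 0:
            sleep(min_interval)  # simple pacing; no wait before the first request
        results.extend(embed_batch(batch))
    return results
```

For example, 12 texts with `max_per_request=5` would be sent as three requests of 5, 5, and 2 texts. If the per-request limit for your project is different, pass that value as `max_per_request` instead.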
Thank you for the clarification! 😀
I've just discovered that the new limit now seems to be 250 input texts per request, compared to 5 before.