I'm planning to use Google Cloud Platform's Vertex AI for a few projects. While reading the rate limits section of the documentation, I came across this:
https://cloud.google.com/vertex-ai/generative-ai/docs/quotas
But I haven't found any information about the algorithm that enforces these limits. I have two scenarios in mind:
Or maybe it's different from the scenarios outlined.
I would appreciate it if someone could explain how Google calculates it, or point me to the section of the documentation that covers this, since I haven't found it.
Hi @diegol116,
Welcome to Google Cloud Community!
Vertex AI Generative AI quotas are expressed as requests per minute (RPM), and each quota covers a base model together with all of its versions, identifiers, and tuned versions. The quotas apply per Google Cloud project and per supported region, and some are shared across all applications and IP addresses within a project. There are also separate quotas for specific services such as RAG Engine and the Gen AI Evaluation Service. Unfortunately, Google doesn't publicly disclose the exact algorithm used to calculate or enforce these limits.
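Since the server-side algorithm isn't documented, one practical option is to throttle on the client so it doesn't matter much which algorithm is in use. Below is a minimal sketch of a rolling-window limiter. It is not Google's method: the class name `SlidingWindowLimiter`, its methods, and the limit of 60 RPM in the usage note are all hypothetical and only for illustration. Use the actual quota shown for your project and region. A rolling window is the conservative choice here: if no 60-second span ever contains more than N requests, a fixed per-minute counter of N is, in principle, never exceeded either.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `max_requests` in any
    rolling window of `window_seconds`. Not Google's server-side algorithm."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.max_requests:
            return 0.0
        # The oldest request in the window must age out first.
        return self.window_seconds - (now - self.sent[0])

    def try_acquire(self):
        """Record a request and return True if the budget allows it."""
        if self.wait_time() > 0.0:
            return False
        self.sent.append(self.clock())
        return True
```

To use it, create something like `limiter = SlidingWindowLimiter(max_requests=60)` once per project, region, and base model, then before each call sleep for `limiter.wait_time()` seconds and call `limiter.try_acquire()`. Because the client and server clocks are not synchronized, it is still worth retrying with backoff if a request is rejected for exceeding the quota.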
I hope the above information is helpful.