How are Vertex AI rate limits calculated on GCP?

I'm planning to use Google Cloud Platform's Vertex AI for a few projects. While looking through the documentation on rate limits, I came across this page:

https://cloud.google.com/vertex-ai/generative-ai/docs/quotas

But I haven't found any information about the algorithm that enforces these limits. I have two scenarios in mind:

First scenario: The limits apply to fixed time windows. For example, between 08:00:00 AM and 08:00:59 AM there are 4 million tokens available, and at 08:01:00 AM the count resets.
Second scenario: The limits move as requests are made, i.e. usage is counted over a rolling window that looks back from each request.

Or maybe it works differently from both of these scenarios.
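
To make the two scenarios concrete, here is a rough Python sketch of what I mean by each. This is only my own illustration, not how Google implements it: the class names, the `allow` method, and the 60-second window are all assumptions I made up for the example.

```python
from collections import deque


class FixedWindowLimiter:
    """Scenario 1 (hypothetical): the budget resets at fixed clock boundaries."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_id = None  # index of the window currently being counted
        self.used = 0

    def allow(self, tokens, now):
        # Each window is identified by the integer number of whole windows
        # elapsed; when that index changes, the counter starts from zero.
        window_id = int(now // self.window_seconds)
        if window_id != self.window_id:
            self.window_id = window_id
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


class SlidingWindowLimiter:
    """Scenario 2 (hypothetical): only the last window_seconds of usage counts."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.used = 0

    def allow(self, tokens, now):
        # Drop usage that has aged out of the rolling window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_tokens = self.events.popleft()
            self.used -= old_tokens
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True
```

With a 4 million token limit, suppose the whole budget is spent at second 59. Under the fixed-window sketch a request at second 61 is allowed again, because a new minute has started. Under the sliding-window sketch that same request is rejected, and capacity only comes back once the earlier usage is 60 seconds old. That difference is what I am trying to find out about.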

I would appreciate it if someone could explain how Google calculates this, or point me to the section of the documentation that covers it, since I haven't found it.
