Hi, I have some questions regarding the quota limitations for Gemini-1.5-flash and Gemini-1.5-pro in the Tokyo region.
Quota discrepancies
Quota increase request & Dynamic Shared Quotas (DSQ)
"Dynamic shared quota (DSQ) distributes available on-demand capacity among all queries being processed by Google Cloud services for specific models. This capability eliminates the need to set quota limits and to submit quota increase requests (QIRs)."
Understanding DSQ behavior
I would appreciate any clarification on these points. Thank you!
Hi @takenoko,
Welcome to Google Cloud Community!
Your questions are addressed in the following breakdown:
Why is there a difference between these two sources? Which value should I rely on when planning my usage?
The Google Cloud Console displays your specific, current quota allocation. The documentation usually shows the default or potential maximum limits. Always rely on the Cloud Console for your actual usable quota.
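If you plan around the value shown in the Console, one option is to pace requests on the client side so they stay under that number. The following is only a minimal sketch, not part of any Google SDK: `RpmPacer` is a hypothetical helper, and the 5 RPM figure is just the value mentioned in this thread.

```python
import time


class RpmPacer:
    """Hypothetical helper: keeps calls at least 60/rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm   # minimum spacing between calls, in seconds
        self._clock = clock          # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()
        self._next_allowed = now + self.interval
        return delay
```

To use it, create `pacer = RpmPacer(5)` once and call `pacer.wait()` immediately before each request. With 5 RPM this spaces requests 12 seconds apart.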
If DSQ dynamically adjusts quota allocation, what is the 5 RPM limit shown in the Google Cloud Console actually referring to?
With Dynamic Shared Quota (DSQ), the 5 RPM limit is a baseline level of service that remains available even when the system is very busy. It also serves as a starting point when traffic is light; being held to it indicates that the service is under heavy load.
In addition, according to the DSQ documentation, Gemini 1.5 Flash and Gemini 1.5 Pro support Dynamic Shared Quota (DSQ), which eliminates the need to set quota limits and to submit quota increase requests (QIRs). If you need higher throughput, consider Google's Provisioned Throughput. Note that it is currently in Preview and access must be requested.
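Because DSQ shares capacity among all callers, individual requests can still be rejected during busy periods, so retrying with exponential backoff is a common pattern. The sketch below assumes that such a rejection surfaces as a 429-style error; `QuotaExceededError` and `send` are hypothetical placeholders for whatever exception and request function your client library actually provides.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the client's 429 / resource-exhausted error."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send() and retry on QuotaExceededError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay cap doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, cap] to spread out retries.
            sleep(random.uniform(0, cap))
```

Wrap each model call in a zero-argument function and pass it as `send`. The random jitter keeps many clients from retrying at the same instant.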
For a deeper understanding of DSQ behavior, I suggest contacting Google Cloud support.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.