
How does Dynamic Shared Quota (DSQ) work for Gemini 1.5 models?

Hey everyone! I’m digging into Google Cloud and have a few questions about Dynamic Shared Quota (DSQ) for Gemini 1.5 Flash (gemini-1.5-flash-002) and Gemini 1.5 Pro (gemini-1.5-pro-002).

1. How exactly does DSQ work?

Is it an organization-wide quota that is dynamically allocated to projects as needed?

Or is there no organization-wide quota, and projects can borrow unused quota from other projects when they exceed their own limits?

2. For models that do not support DSQ, each project has its own quota limits. However, for models that support DSQ, do all projects share the same quota instead of having separate limits per project?

3. According to the docs, these models support DSQ and are available by default. Does this mean DSQ is automatically applied, and if so, is there a way to disable it if needed?

4. I checked the “Quotas & Limits” section in both the organization and project settings, but I couldn’t find any DSQ-related entries. Is there a way to verify if DSQ is in effect or to view its status?

Thanks a lot for any help or clarification!

Accepted solution

Dynamic Shared Quota (DSQ) dynamically distributes the available on-demand capacity among all requests being processed for a given model, such as Gemini 1.5 Flash and Gemini 1.5 Pro. In other words, capacity is shared across projects rather than fixed as a per-project quota, which allows more flexible use. DSQ is enabled by default for these models.

The trade-off is that when demand in a region exceeds the available capacity, requests can fail with 429 errors. If you need consistent performance, consider purchasing Provisioned Throughput, which reserves committed capacity for you.

There is currently no direct way to disable DSQ or to check its status in the "Quotas & Limits" section, so the practical approach is to monitor your application's performance and error rates and manage consumption accordingly.
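Since 429 errors are the main visible symptom when shared capacity runs short, one common way to cope is to retry with exponential backoff and count how often throttling happens. The snippet below is only a minimal sketch of that idea, not an official Google Cloud recipe: `ResourceExhaustedError`, `call_with_backoff`, and `send_request` are hypothetical names made up for illustration, and the retry limits are arbitrary defaults.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for whatever 429 error your client library raises."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send_request(), retrying on 429-style errors.

    Uses exponential backoff with full jitter. Returns a tuple of
    (result, throttled_count) so the caller can track the 429 rate.
    Re-raises the last error if every attempt is throttled.
    """
    throttled = 0
    for attempt in range(max_attempts):
        try:
            return send_request(), throttled
        except ResourceExhaustedError:
            throttled += 1
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            # Upper bound doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

To use it, wrap your model call in a function that raises `ResourceExhaustedError` (or adapt the `except` clause to your client's exception; in the Google Python clients a 429 usually surfaces as `google.api_core.exceptions.ResourceExhausted`, but check this for your SDK version). Logging the returned throttled count over time gives a rough signal of whether Provisioned Throughput is worth considering.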
