Discrepancies in Gemini-1.5 Quotas and Understanding Dynamic Shared Quotas (DSQ)

Hi, I have some questions regarding the quota limitations for Gemini-1.5-flash and Gemini-1.5-pro in the Tokyo region.

  1. Quota discrepancies

    • According to the official documentation on quota limits (reference), the RPM (requests per minute) limits are:
      • Gemini-1.5-flash: 200 RPM
      • Gemini-1.5-pro: 60 RPM
    • However, when I check the "Quota and system limits" page in the Google Cloud Console, both models show a limit of 5 RPM.
    • Why is there a difference between these two sources? Which value should I rely on when planning my usage?
  2. Quota increase request & Dynamic Shared Quotas (DSQ)

    • I plan to request a quota increase since the default RPM values seem too low.
    • I believe that Gemini-1.5 Flash and Gemini-1.5 Pro support DSQ, but the DSQ documentation (reference) states:

      "Dynamic shared quota (DSQ) distributes available on-demand capacity among all queries being processed by Google Cloud services for specific models. This capability eliminates the need to set quota limits and to submit quota increase requests (QIRs)."

    • If DSQ dynamically adjusts quota allocation, what is the 5 RPM limit shown in the Google Cloud Console actually referring to?
  3. Understanding DSQ behavior

    • The DSQ documentation provides an example where two customers share a total service capacity of 100 QPM (queries per minute), and their allocations are dynamically adjusted.
    • I have a few questions about this:
      • Is QPM equivalent to RPM, or is there any difference?
      • If one project in my organization exceeds its DSQ allocation and gets throttled, will this also affect other projects within the same organization?
      • If I use a different organization’s project, will it be treated separately, or could DSQ still impact its quota allocation?

I would appreciate any clarification on these points. Thank you!

ACCEPTED SOLUTION

Hi @takenoko,

Welcome to Google Cloud Community!

Your questions are addressed in the following breakdown:

Why is there a difference between these two sources? Which value should I rely on when planning my usage?

The Google Cloud Console displays your specific, current quota allocation. The documentation usually shows the default or potential maximum limits. Always rely on the Cloud Console for your actual usable quota.
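If you plan around the value shown in the Console, one option is to pace requests on the client side so you never exceed it. Below is a minimal sketch of a rolling 60-second limiter. The class name `MinuteRateLimiter` and its parameters are my own illustration, not part of any Google Cloud library, and the value 5 simply mirrors the RPM shown in your Console.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Hypothetical client-side limiter: at most `rpm` calls per rolling 60 s."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock    # injectable so the limiter can be tested
        self.sleep = sleep
        self.sent = deque()   # timestamps of requests inside the window

    def acquire(self):
        """Block until a request may be sent, then record it."""
        while True:
            now = self.clock()
            # Forget requests that are at least 60 seconds old.
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest request ages out.
            self.sleep(60.0 - (now - self.sent[0]))


# Assumed limit, taken from the Console value discussed above.
limiter = MinuteRateLimiter(rpm=5)
# Call limiter.acquire() immediately before each model request.
```

This only keeps your own client below the configured number; it does not change the quota itself.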

If DSQ dynamically adjusts quota allocation, what is the 5 RPM limit shown in the Google Cloud Console actually referring to?

With Dynamic Shared Quota (DSQ), the 5 RPM limit is a baseline level of service that remains available even when the system is very busy. It also serves as a starting point when there isn't much traffic; being held to it indicates that the service is under heavy load.

In addition, according to this documentation, Gemini 1.5 Flash and Gemini 1.5 Pro support Dynamic Shared Quota (DSQ), which eliminates the need to set quota limits and to submit quota increase requests (QIRs). If you need higher throughput, consider Google's Provisioned Throughput. Note that it is currently in Preview and access must be requested.
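Because capacity under DSQ is shared, requests can still be rejected when the service is busy, so it may help to retry with exponential backoff. Here is a minimal sketch under stated assumptions: `QuotaExceededError` is a hypothetical stand-in for whatever error your client library raises when quota is exhausted (typically an HTTP 429 response), and `send` is any function of yours that issues one request.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the quota error your client raises."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call send(); on a quota error, wait and retry with growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt, capped at max_delay, then is
            # scaled by a random factor in [0, 1) to spread out retries.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())
```

The delay values here are illustrative defaults, not recommendations from the documentation; tune them to your workload.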

For your questions about DSQ behavior, I suggest contacting Google Cloud support for a more detailed explanation.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
