GCP & AI Managing Quotas/Rate limits - What am I missing

pseudopuma · 06-17-2024 01:48 PM

I've been trying to transition a project from OpenAI/AWS to Gemini/GCP, and I am extremely frustrated. I cannot get a clear answer other than "Just look at quotas".

I have a project configured to use OAuth, as for some reason we can't use API keys.
I make tens of thousands of requests with GCP's suite of Python packages to generate content from my trained model.
At some point, it just starts throwing "Resource has been exhausted (e.g. check quota)." without any details about where, what, or why the resource is exhausted.
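
Because this error is the HTTP 429 / gRPC RESOURCE_EXHAUSTED case, a common mitigation for short-window (per-minute) limits is to retry with capped exponential backoff and jitter. The sketch below is a generic, stdlib-only version; `QuotaExhausted` and `flaky_generate` are hypothetical stand-ins, and in real code the predicate would more likely test for `google.api_core.exceptions.ResourceExhausted`. google-api-core also ships `google.api_core.retry.Retry`, which accepts a predicate such as `retry.if_exception_type(...)` and may be preferable. Backoff does not help once a daily quota is used up.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class QuotaExhausted(Exception):
    """Hypothetical stand-in for google.api_core.exceptions.ResourceExhausted."""


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn(), retrying retryable errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-retryable errors, and give up after the last attempt.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0.0, cap))
    raise RuntimeError("unreachable")


# Usage: a fake call that hits the quota twice, then succeeds.
attempts = {"n": 0}


def flaky_generate() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise QuotaExhausted("Resource has been exhausted (e.g. check quota).")
    return "generated text"


waits = []
result = call_with_backoff(
    flaky_generate,
    is_retryable=lambda e: isinstance(e, QuotaExhausted),
    sleep=waits.append,  # record the waits instead of sleeping, for the demo
    rng=random.Random(0),
)
```

Injecting `sleep` and `rng` keeps the retry policy testable without real delays.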

If I go into quotas, I see nothing.

After modifying google.api_core.grpc_helpers.py to print the full exception, I finally see that it is referring to `/google.ai.generativelanguage.v1beta.GenerativeService/GenerateContent` for the method, and I am assuming the `resource` is `generativelanguage.googleapis.com`.
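
Rather than patching `grpc_helpers.py`, it may be possible to read the structured error details Google APIs can attach to an error (`google.rpc.ErrorInfo`, `google.rpc.QuotaFailure`) when the service includes them. Recent versions of google-api-core appear to expose these on the raised exception (for example via its `details` attribute), but check your installed version. The stdlib-only sketch below works on the REST-style JSON error body instead; the sample payload and every value in it are hypothetical, not a captured response.

```python
import json
from typing import Any, Dict


def summarize_quota_error(body: str) -> Dict[str, Any]:
    """Collect quota-related fields from a google.rpc.Status-style JSON error body."""
    error = json.loads(body).get("error", {})
    summary: Dict[str, Any] = {
        "code": error.get("code"),
        "status": error.get("status"),
        "message": error.get("message"),
        "error_info": [],
        "violations": [],
    }
    for detail in error.get("details", []):
        kind = detail.get("@type", "")
        if kind.endswith("google.rpc.ErrorInfo"):
            # ErrorInfo: a machine-readable reason, its domain, and a metadata map.
            summary["error_info"].append({
                "reason": detail.get("reason"),
                "domain": detail.get("domain"),
                "metadata": detail.get("metadata", {}),
            })
        elif kind.endswith("google.rpc.QuotaFailure"):
            # QuotaFailure: one entry per quota check that failed.
            summary["violations"].extend(detail.get("violations", []))
    return summary


# Hypothetical error body: the shape follows google.rpc.Status, the values are made up.
SAMPLE_BODY = json.dumps({
    "error": {
        "code": 429,
        "message": "Resource has been exhausted (e.g. check quota).",
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {
                "@type": "type.googleapis.com/google.rpc.ErrorInfo",
                "reason": "EXAMPLE_REASON",
                "domain": "googleapis.com",
                "metadata": {"service": "generativelanguage.googleapis.com",
                             "quota_metric": "example_metric"},
            },
            {
                "@type": "type.googleapis.com/google.rpc.QuotaFailure",
                "violations": [{"subject": "example_subject",
                                "description": "example description"}],
            },
        ],
    }
})

summary = summarize_quota_error(SAMPLE_BODY)
```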

So, since there seem to be no resources from Google on this, I assume the issue is the 10k-requests-per-day limit. Yet, for some reason I don't understand, I can still make requests to the base `Gemini-1.5-pro` model.
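
Whatever the exact limit turns out to be, throttling on the client side keeps a batch job from running into it. The sketch below is a rolling-window limiter, assuming a hypothetical 60 requests/minute and the 10k requests/day figure guessed above; the real values are whatever the quotas page shows for your project. The server may reset daily quotas at a fixed time rather than on a rolling window, so this is a cautious approximation, not a mirror of Google's accounting.

```python
import collections
import time
from typing import Callable, Deque, List


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps: Deque[float] = collections.deque()

    def _evict(self, now: float) -> None:
        # Forget requests that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()

    def wait_time(self) -> float:
        """Seconds until a slot is free (0.0 if one is free now)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.limit:
            return 0.0
        return self._stamps[0] + self.window_s - now

    def record(self) -> None:
        """Count one request at the current time."""
        self._stamps.append(self.clock())


def acquire_all(limiters: List[SlidingWindowLimiter]) -> bool:
    """Take a slot from every limiter, or from none if any of them is full."""
    if any(lim.wait_time() > 0 for lim in limiters):
        return False
    for lim in limiters:
        lim.record()
    return True


def wait_for_slot(limiters: List[SlidingWindowLimiter],
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until every limiter has room, then take a slot from each."""
    while not acquire_all(limiters):
        sleep(max(lim.wait_time() for lim in limiters))


# Hypothetical quotas: 60 requests/minute and the 10k requests/day assumed above.
per_minute = SlidingWindowLimiter(limit=60, window_s=60)
per_day = SlidingWindowLimiter(limit=10_000, window_s=24 * 60 * 60)
# Before each generate_content call: wait_for_slot([per_minute, per_day])
```

Checking every limiter before recording in any of them avoids spending a per-minute slot on a request the daily limit would refuse anyway.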

1. Why are there so many different terms with no clear description of how they relate? What is a quota vs. a metric vs. a method vs. a target vs. a resource vs. a system limit? Even here in the forum, the available labels for my Cloud AI question include "Gemini", "Generative AI Studio", "Google AI Studio", and "Vertex AI Platform". Are these all different products?
2. Within the context of rate limits, quotas, and the other terms, how are credentials related to my rate limits and quotas? Is there anywhere in the console where you can see the actual quotas for that method/target for the credential I am using (and I mean beyond just filtering the metrics graphs by credential)?
3. Beyond credentials, how do projects affect rate limits and quotas? I can't access my models using an API key; it has to be OAuth. Yet it seems like AI Studio, at the very least, wants you to use API keys.

jaia

Hello,

I understand your frustration with the lack of clarity around quotas and resource limitations in the transition from OpenAI/AWS to Gemini/GCP. Here's a breakdown of the concepts you mentioned and how they relate to your situation:

Terminology:

Quota: A limit on the amount of a resource you can use within a specific timeframe (e.g., requests per day).
Metric: A measurable value that tracks resource usage (e.g., number of requests made).
Method: A specific API function you're calling (e.g., GenerateContent).
Target: The resource being used (e.g., generativelanguage.googleapis.com).
Resource: A general term for any GCP service or component you're using (e.g., Compute Engine instances, Cloud Storage buckets).
System Limit: A fixed limit imposed by the system for security or stability reasons; unlike a quota, it cannot be raised by requesting an increase.
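
One way to see how these terms fit together is to model them as data: a metric counts usage, a quota limit caps one metric over a time window and scope (optionally split by dimensions such as model), and a system limit is a limit that cannot be raised. The sketch below is illustrative only: the metric name, model names, and values are hypothetical, and the unit strings imitate, as we understand it, the "1/min/{project}" style used in Google service configurations. If limits are split per model, exhausting the limit for one model would not block calls to another, which is one possible explanation for the base model still responding.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Dims = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class QuotaLimit:
    """One cap on one metric; every value used below is hypothetical."""
    metric: str              # the counter being limited
    unit: str                # window and scope, e.g. "1/min/{project}" or "1/d/{project}"
    value: int               # maximum count allowed per window
    dims: Dims = ()          # optional extra scoping, e.g. (("model", "some-model"),)
    adjustable: bool = True  # False models a "system limit" that cannot be raised


# Usage counters keyed like the limits: (metric, unit, dims) -> count in the current window.
Usage = Dict[Tuple[str, str, Dims], int]


def exhausted(limits: List[QuotaLimit], usage: Usage) -> List[QuotaLimit]:
    """Return the limits whose counter has reached the cap."""
    return [lim for lim in limits
            if usage.get((lim.metric, lim.unit, lim.dims), 0) >= lim.value]


TUNED = (("model", "my-tuned-model"),)
BASE = (("model", "base-model"),)
limits = [
    QuotaLimit("generate_content_requests", "1/d/{project}", 10_000, TUNED),
    QuotaLimit("generate_content_requests", "1/d/{project}", 10_000, BASE),
    QuotaLimit("generate_content_requests", "1/min/{project}", 60, adjustable=False),
]
usage: Usage = {
    ("generate_content_requests", "1/d/{project}", TUNED): 10_000,
    ("generate_content_requests", "1/d/{project}", BASE): 42,
    ("generate_content_requests", "1/min/{project}", ()): 5,
}
blocked = exhausted(limits, usage)  # only the tuned model's daily limit
```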

Products:
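Gemini: Google's family of generative models. It is served both through the Gemini API (the Generative Language API, generativelanguage.googleapis.com) and through Vertex AI.
Google AI Studio: A web interface for prototyping with Gemini; it uses the Generative Language API and is where Gemini API keys are typically created.
Generative AI Studio (now Vertex AI Studio): The Google Cloud console interface for generative models on Vertex AI.
Vertex AI (formerly AI Platform): Google Cloud's broader ML platform. It serves Gemini through its own API (aiplatform.googleapis.com), whose quotas are separate from those of the Generative Language API.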

Credentials and Quotas:
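Credentials (API keys or OAuth tokens) identify the caller and control access to GCP resources. They don't carry quotas of their own: quotas are set at the project level, and the credential determines which project a request is counted against (the project that owns the API key or, for OAuth, the project of the OAuth client or the configured quota project).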
Unfortunately, there isn't a single place in the GCP Console to see quotas for specific methods/targets per credential. However, you can:
1. Filter Quota Metrics: In the Cloud Monitoring section, filter quota metrics by your project ID and method (e.g., GenerateContent) to see usage trends.
2. Quotas page: In the console, open the Generative Language API under APIs & Services and check its quotas tab for the limits configured for your project.
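
For the Cloud Monitoring route, quota usage is reported against the `consumer_quota` resource type. The helper below only builds a filter string, which could be pasted into Metrics Explorer or passed as the `filter` of a `list_time_series` request in the google-cloud-monitoring client. The metric type (`serviceruntime.googleapis.com/quota/rate/net_usage`) and label names (`service`, `method`) are our understanding of the schema and should be checked in Metrics Explorer before relying on them; the method value in the usage line is hypothetical.

```python
from typing import Optional


def _quote(value: str) -> str:
    # Monitoring filters take double-quoted string values; escape backslashes and quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quota_usage_filter(
    service: str = "generativelanguage.googleapis.com",
    metric_type: str = "serviceruntime.googleapis.com/quota/rate/net_usage",
    method: Optional[str] = None,
) -> str:
    """Build a Cloud Monitoring filter for one service's per-project quota metrics."""
    clauses = [
        f"metric.type = {_quote(metric_type)}",
        'resource.type = "consumer_quota"',
        f"resource.labels.service = {_quote(service)}",
    ]
    if method:
        # Optionally narrow to a single API method (e.g. the GenerateContent RPC).
        clauses.append(f"metric.labels.method = {_quote(method)}")
    return " AND ".join(clauses)


# Hypothetical method label value; copy the real one from Metrics Explorer.
example = quota_usage_filter(
    method="google.ai.generativelanguage.v1beta.GenerativeService.GenerateContent"
)
```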

If the 10k limit is insufficient, enable billing on the project to move to a paid tier with higher quotas, or request a quota increase.
If you encounter further issues, contact GCP support[1] for clarification on specific quotas or unexpected behavior.

[1] https://cloud.google.com/support/docs/manage-cases#creating_cases

Regards,
Jai Ade

jaia

Hello,

Thank you for your engagement regarding this issue. We haven’t heard back from you for some time now, so I'm going to close this issue; it will no longer be monitored. If you have any new issues, please don’t hesitate to create a new issue. We will be happy to assist you.

Regards,
Jai Ade
