
Sachin Duggal : API Rate Limits and Token Usage for Deploying Google Generative AI Applications

Hi everyone, my name is Sachin Duggal, and I own a bag manufacturing business. I've developed a program that uses the Google Generative AI API, and I'm planning to deploy it for public use. I'm concerned about potential limits related to "tokens" and how they might affect the program's performance, especially if a large number of users send many requests simultaneously.

Regards

Sachin Duggal

 


Hi @emilianosmith,

I understand that you'd like to know more about the rate limits for Generative AI in terms of tokens, and how they apply as the number of simultaneous requests from your users grows.

Have you seen this documentation, which covers the rate limits for Gen AI? (I'm not sure exactly which base model you're using.) It may help you find the rate limits for each base model in terms of tokens per minute. It also covers the requests per minute (RPM) quotas that apply to a base model and to all versions, identifiers, and tuned versions of that model.

In addition, you can view your quotas or request a quota increase through the Google Cloud console.
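If it helps while you wait on a quota decision, here is a rough sketch of how an application might keep itself under per-minute request and token quotas on the client side. It does not call any Google API; the class name `MinuteQuotaLimiter`, its parameters, and the limit values in the usage note are all hypothetical, and the real numbers for your model would come from the quota documentation or the console. It assumes you can estimate the token count of a request before sending it.

```python
import threading
import time
from collections import deque


class MinuteQuotaLimiter:
    """Hypothetical client-side guard for per-minute request and token quotas.

    Uses a sliding 60-second window. The limits passed in are assumptions
    supplied by the caller, not values read from any Google service.
    """

    def __init__(self, max_requests_per_minute, max_tokens_per_minute,
                 clock=time.monotonic):
        self.max_requests = max_requests_per_minute
        self.max_tokens = max_tokens_per_minute
        self.clock = clock              # injectable so the window can be tested
        self.events = deque()           # (timestamp, tokens) per admitted request
        self.tokens_in_window = 0
        self.lock = threading.Lock()    # many users may hit this concurrently

    def _evict_expired(self, now):
        # Drop admitted requests that have aged out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, estimated_tokens):
        """Record the request and return True if it fits both quotas, else False."""
        with self.lock:
            now = self.clock()
            self._evict_expired(now)
            if len(self.events) + 1 > self.max_requests:
                return False
            if self.tokens_in_window + estimated_tokens > self.max_tokens:
                return False
            self.events.append((now, estimated_tokens))
            self.tokens_in_window += estimated_tokens
            return True
```

For example, `limiter = MinuteQuotaLimiter(60, 30000)` followed by `limiter.try_acquire(500)` before each call would tell the application whether to send the request now or to queue it and try again a little later. A guard like this only smooths your own traffic; the service still enforces the actual quota, so the application should also handle quota errors returned by the API.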

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.