I'm following this example from the documentation using the GenAI Python SDK.
I'm able to successfully create embeddings for my text input. However, I'd like to know what the quotas are for the "text-embedding-005" model I am using. Where can I find this?
I can find a quota on the number of parallel batch prediction jobs I can run, but I want to know how many requests per minute (RPM) my batch prediction job will make, so I can estimate how long the job will run in production.
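For context, this is the kind of estimate I'm trying to make. A minimal sketch, assuming a hypothetical RPM quota value (the actual number is exactly what I'm asking for):

```python
def estimate_job_minutes(num_requests: int, rpm_quota: int) -> float:
    """Rough lower bound on batch job duration, assuming the job is
    throttled at rpm_quota requests per minute. Ignores retries,
    ramp-up, and per-request latency."""
    return num_requests / rpm_quota


# Example: 120,000 embedding requests at an assumed 1,500 RPM quota
# (1,500 is a placeholder, not a documented value for text-embedding-005)
print(estimate_job_minutes(120_000, 1_500))  # → 80.0 minutes
```

Without knowing the real RPM (or token) quota applied inside the batch job, I can't fill in `rpm_quota` here.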
There is also a quota for "base_model: textembedding_gecko", which is the base model of "text-embedding-005". That quota applies to direct API usage, not to batch prediction jobs, and I don't see it being consumed when my jobs run.
I would like to know the token and/or RPM quota for "text-embedding-005" that applies inside my batch prediction job.