Hi everyone,
I am trying to submit a batch prediction job using the gemini-2.0-flash-001 model, but I keep getting the following error:
google.api_core.exceptions.ResourceExhausted: 429
The following quota metrics exceed quota limits:
aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs
I am not using gemini-pro, but rather gemini-2.0-flash-001, so I am unsure why this quota error is occurring.
I also checked the "Quotas" section in Google Cloud Console, but I couldn't find any quota related to aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs.
Could this be related to my project’s quota limits? If so, is there a way to check and increase the allowed concurrent batch prediction jobs?
Any insights or solutions would be greatly appreciated!
Thanks in advance!
Hi @Ramazan19x,
Welcome to Google Cloud Community!
The error message you're encountering, google.api_core.exceptions.ResourceExhausted: 429, indicates that your project has exceeded the quota for concurrent batch prediction jobs associated with the aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs metric. This quota pertains to the Gemini Pro model, which is distinct from the gemini-2.0-flash-001 model you're using.
Regarding this error, here are some possible approaches you can consider to address the issue:
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Re: quota mismatch.
Google applies the `gemini_pro_concurrent_batch_prediction_jobs` quota to all Gemini batch prediction jobs, not only Gemini Pro ones. For example, both gemini-1.5-flash-002 and gemini-1.5-pro-002 count against it. The mismatch appears to be a misconfiguration or misnaming of the quota metric on Google's side.
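If the concurrent-job limit really is shared across all Gemini batch jobs, one workaround while waiting for a quota increase is to retry the submission after a delay once earlier jobs have finished. Below is a minimal sketch of that idea, not an official recommendation. `submit_with_backoff`, `QuotaExceeded`, and the delay values are hypothetical names and numbers chosen for illustration; `QuotaExceeded` is only a stand-in so the snippet runs without the Google libraries installed.

```python
import time


class QuotaExceeded(Exception):
    """Stand-in for google.api_core.exceptions.ResourceExhausted (HTTP 429)."""


def submit_with_backoff(submit, retry_on=QuotaExceeded, max_attempts=5,
                        base_delay=30.0, max_delay=600.0, sleep=time.sleep):
    """Call submit() and retry with exponential backoff on a quota error.

    submit   -- zero-argument callable that creates the batch job (hypothetical)
    retry_on -- exception type that signals the quota was exceeded
    sleep    -- injectable so the waiting can be skipped in tests
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original quota error
            sleep(delay)  # wait for running jobs to finish and free a slot
            delay = min(delay * 2, max_delay)  # double the wait, capped
```

In a real project you would pass `retry_on=google.api_core.exceptions.ResourceExhausted` (the exception shown in the original error) and a `submit` callable that wraps your own batch job creation call. Batch jobs can run for a long time, so the delays here are only a starting point.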