Why am I getting "ResourceExhausted: 429" error for Gemini-2.0-Flash-001 in batch prediction?

Hi everyone,

I am trying to submit a batch prediction job using the gemini-2.0-flash-001 model, but I keep getting the following error:

google.api_core.exceptions.ResourceExhausted: 429
The following quota metrics exceed quota limits:
aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs

I am not using gemini-pro, but rather gemini-2.0-flash-001, so I am unsure why this quota error is occurring.

I also checked the "Quotas" section in Google Cloud Console, but I couldn't find any quota related to aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs.

Could this be related to my project’s quota limits? If so, is there a way to check and increase the allowed concurrent batch prediction jobs?

Any insights or solutions would be greatly appreciated!

Thanks in advance!

Hi @Ramazan19x,

Welcome to Google Cloud Community!

The error message you're encountering, google.api_core.exceptions.ResourceExhausted: 429, indicates that your project has exceeded the quota for concurrent batch prediction jobs associated with the aiplatform.googleapis.com/gemini_pro_concurrent_batch_prediction_jobs metric. This quota pertains to the Gemini Pro model, which is distinct from the gemini-2.0-flash-001 model you're using.
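Because the 429 means the concurrent-job limit was already reached when the job was submitted, one client-side workaround is to retry the submission after a delay, once earlier jobs have had time to finish. Below is a minimal, stdlib-only Python sketch of capped exponential backoff with jitter. The helper name `call_with_backoff` and its parameters are hypothetical. The intended predicate is an `isinstance` check against `google.api_core.exceptions.ResourceExhausted`, the exception from the original error. The default delays are illustrative, not recommended values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    max_delay: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with capped exponential backoff and full jitter.

    Only exceptions for which is_retryable(exc) is True are retried;
    anything else, or a failure on the last attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Upper bound doubles each attempt (base, 2*base, 4*base, ...), capped.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so parallel
            # clients do not all retry at the same moment.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

For example, with a hypothetical zero-argument `submit_job` function that wraps your batch submission, you would call `call_with_backoff(submit_job, lambda e: isinstance(e, ResourceExhausted))`. Batch jobs can run for a long time, so a generous `base_delay` is probably more useful here than rapid retries.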

  • Quota Mismatch: The error references a quota metric (gemini_pro_concurrent_batch_prediction_jobs) linked to the Gemini Pro model. Since you're using the gemini-2.0-flash-001 model, this suggests a potential misconfiguration or a misunderstanding of the quota metrics.

Here are some approaches you can try to address the issue:

  1. Verify Model and Quota Association: Ensure that the batch prediction job is correctly configured to use the gemini-2.0-flash-001 model and not the Gemini Pro model. Misconfigurations can lead to quota errors.
  2. Check for Dynamic Quotas: Some models, like Gemini 2.0, have dynamic quotas that may not appear in the Quotas section of the Google Cloud Console. This means you might not see the specific quota for gemini-2.0-flash-001 even if it's being exceeded.
  3. Contact Google Cloud Support: If the issue persists and you're unable to identify the cause, consider reaching out to Google Cloud Support for assistance. They can provide insights into your project's quota usage and help resolve any discrepancies.
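For steps 1 and 3, it can help to tally which models your currently active batch jobs use before you contact support. The sketch below assumes you have already exported each job's name, model ID, and state name, for example from the Console or the Vertex AI SDK. The `JobInfo` record and the `active_jobs_by_model` helper are hypothetical. The set of states treated as "active" is also an assumption, not a documented list of what counts toward the quota.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Assumption: non-terminal Vertex AI job states treated as "active".
# This list is illustrative, not a documented definition of what the
# concurrency quota counts.
ACTIVE_STATES = frozenset({
    "JOB_STATE_QUEUED",
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
    "JOB_STATE_CANCELLING",
})


@dataclass(frozen=True)
class JobInfo:
    """Hypothetical record for one batch prediction job."""
    name: str
    model: str  # e.g. "gemini-2.0-flash-001"
    state: str  # e.g. "JOB_STATE_RUNNING"


def active_jobs_by_model(jobs: Iterable[JobInfo]) -> Counter:
    """Count active jobs per model ID; finished jobs are ignored."""
    return Counter(job.model for job in jobs if job.state in ACTIVE_STATES)


if __name__ == "__main__":
    # Illustrative data only, not real job output.
    jobs = [
        JobInfo("job-a", "gemini-2.0-flash-001", "JOB_STATE_RUNNING"),
        JobInfo("job-b", "gemini-1.5-pro-002", "JOB_STATE_PENDING"),
        JobInfo("job-c", "gemini-2.0-flash-001", "JOB_STATE_SUCCEEDED"),
    ]
    counts = active_jobs_by_model(jobs)
    print(dict(counts))          # {'gemini-2.0-flash-001': 1, 'gemini-1.5-pro-002': 1}
    print(sum(counts.values()))  # 2
```

Suppose the total includes jobs for models other than gemini-2.0-flash-001 and sits at or near your limit. That pattern would be consistent with a single quota shared across models.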

Re quota mismatch.

Google uses the `gemini_pro_concurrent_batch_prediction_jobs` quota for all Gemini-related batch prediction jobs. This includes, for example, both gemini-1.5-flash-002 and gemini-1.5-pro-002. This appears to be a misconfiguration or a misunderstanding of the quota metrics BY GOOGLE!
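If one concurrency quota really is shared by every Gemini batch job, a per-model count can look fine while the project-wide total is already at the limit. Under that assumption, a client-side guard could poll the total number of active jobs across all models and proceed only when it drops below your limit. Here is a minimal sketch. `count_active_jobs` is a hypothetical callable you would implement yourself, for instance by listing the project's batch jobs. `max_concurrent` is whatever limit applies to your project, which this thread does not state.

```python
import time
from typing import Callable


def wait_for_free_slot(
    count_active_jobs: Callable[[], int],
    max_concurrent: int,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Block until fewer than max_concurrent batch jobs are active.

    count_active_jobs() should count active jobs across ALL Gemini models,
    not just the model you are about to submit (shared-quota assumption).
    Returns the active count seen when a slot was free; raises
    TimeoutError if none frees up within timeout_seconds.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    deadline = clock() + timeout_seconds
    while True:
        active = count_active_jobs()
        if active < max_concurrent:
            return active
        if clock() >= deadline:
            raise TimeoutError(
                f"{active} batch jobs still active after {timeout_seconds} s"
            )
        # Wait before re-checking; running batch jobs may take a while to finish.
        sleep(poll_seconds)
```

This check and the submission that follows it are not atomic. Another process submitting at the same moment could therefore still trigger a 429.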