code=RESOURCE_EXHAUSTED, message=The following quo...

kaz07 · 02-13-2025 03:20 PM

Hi all,

I have been trying to create very small, simple pipelines with only a couple of components that wrap simple Python functions, but they all fail with the following error:

"com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_cpus, cause=null; Failed to handle the pipeline task"

Even when I copy simple pipeline code directly from the GCP documentation, it fails with the same error. I have also checked my quotas and limits, and nothing appears to be exhausted, which is what I expected given how small and simple the pipelines are.

I get the same error when trying to train the simplest ML model within Vertex AI.

Any ideas on why this is happening?

Thank you.

ibaui

Hi @kaz07,

Welcome to Google Cloud Community!

The "RESOURCE_EXHAUSTED" error, which specifically mentions aiplatform.googleapis.com/custom_model_training_cpus, indicates that the job is requesting more CPU resources than your project's allocated quota allows. Even if your pipelines seem small, the default configuration might request more CPUs than your project's quota permits, particularly in a new project or in a region with high demand. Quotas exist to ensure fair usage and to prevent any single user from overloading the system. Here are some approaches you can consider to address the issue:

Quota Limits: Confirm whether you have reached the quota limit assigned to your project. In the Google Cloud Console, open the left-hand navigation panel, click "IAM & Admin", and then select "Quotas & System Limits". Use the Filter search box to find the quota named in the error.
Request a Quota Increase: If you see this error even though your quotas do not appear to be exhausted, consider requesting a quota increase from Google Cloud. You may follow the steps in this documentation. Keep in mind that these requests are subject to review and approval and may take some time to process. Quota increase requests are typically evaluated based on the validity of the business case provided.
Training Quota: Vertex Pipelines uses the Vertex training service to execute its containers, so the exhausted resource is the Vertex training quota. You can find more details about this quota here.
Regional Availability: Sometimes, a specific machine type you've chosen for your training job might not be available in your chosen region. This can indirectly lead to resource exhaustion errors. Try using a different, more common machine type.
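
To make the quota check above more concrete, here is a rough Python sketch (standard library only, not an official Google Cloud tool). It pulls the quota metric names out of the error text from the original post, then compares the vCPUs a single step would request against a quota limit. The function names, the assumption that a predefined machine type name ends in its vCPU count (as in e2-standard-4), and the limit values used in the example are all illustrative assumptions; read your real limit from the Quotas & System Limits page.

```python
import re

# Error text copied from the original post.
ERROR = (
    "com.google.cloud.ai.platform.common.errors.AiPlatformException: "
    "code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed "
    "quota limits: aiplatform.googleapis.com/custom_model_training_cpus, "
    "cause=null; Failed to handle the pipeline task"
)


def exceeded_quota_metrics(error_text):
    """Return the quota metric names listed in a RESOURCE_EXHAUSTED message."""
    match = re.search(
        r"exceed quota limits:\s*(.*?)(?:,\s*cause=|;|$)", error_text
    )
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def vcpus_for(machine_type, replicas=1):
    """Estimate requested vCPUs, ASSUMING a name like <family>-<class>-<vcpus>."""
    suffix = machine_type.rsplit("-", 1)[-1]
    if not suffix.isdigit():
        raise ValueError(f"cannot infer vCPU count from {machine_type!r}")
    return int(suffix) * replicas


def fits_quota(machine_type, replicas, quota_limit, in_use=0):
    """True if the request plus current usage stays within the quota limit."""
    return in_use + vcpus_for(machine_type, replicas) <= quota_limit


if __name__ == "__main__":
    print(exceeded_quota_metrics(ERROR))
    # Hypothetical limits: a 4-vCPU machine cannot fit under a limit of 1,
    # even when current usage is 0.
    print(fits_quota("e2-standard-4", replicas=1, quota_limit=1))
    print(fits_quota("e2-standard-4", replicas=1, quota_limit=8))
```

The second check shows how a job can fail while the quota page shows no usage: if the limit itself is lower than what one step requests, the request is rejected before anything is consumed.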

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
