Vertex AI Resource Exhaustion Error but resources are not even close to exhausted?

Churns · 05-04-2023 02:22 AM

I'm attempting to run a basic pipeline using Kubeflow in Vertex AI. However, when I run it I receive a RESOURCE_EXHAUSTED error in the logs relating to aiplatform.googleapis.com/custom_model_training_cpus.

Checking my quotas, I can see that I am not anywhere close to exhausting any of the quotas under aiplatform.googleapis.com/custom_model_training_cpus (including the region I'm using - us-central1).

Has anyone had a similar issue, and does anyone know what is going on here?

Churns

And here's the exact error:

rubenszmm

"Resource exhausted" usually refers to the amount of memory used while running the code, not necessarily the regional quotas. Did you define the machine resources for the Kubeflow training and deployment steps, for example like this?

training_op = train_model(epochs).set_cpu_limit('16').set_memory_limit('32G').set_caching_options(False)

deploy_op = deploy_model(training_op.outputs["xx"], "project", "us-central1").set_cpu_limit('8').set_memory_limit('16G').set_caching_options(False)
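If it helps, here is a rough way to sanity-check those CPU limits against the custom_model_training_cpus quota before submitting the pipeline. This is only a sketch: the helper name steps_exceeding_quota and the quota value of 8 are hypothetical placeholders, and it assumes each step's CPU request counts against the regional quota. Substitute the limit shown on your own Quotas page.

```python
# Hypothetical helper (not part of KFP or the Vertex AI SDK): flags pipeline
# steps whose CPU request does not fit in the remaining quota.
def steps_exceeding_quota(step_cpus, quota_limit, quota_in_use=0):
    """Return the names of steps whose CPU request exceeds the remaining quota."""
    remaining = quota_limit - quota_in_use
    return [name for name, cpus in step_cpus.items() if int(cpus) > remaining]


# CPU limits copied from the snippet above (set_cpu_limit takes strings).
requests = {"train_model": "16", "deploy_model": "8"}

# Placeholder limit: replace 8 with the value from your Quotas page for us-central1.
print(steps_exceeding_quota(requests, quota_limit=8))
```

A step can be flagged even when current usage is zero, because the check compares the size of a single request with the limit rather than with what is already in use.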
