Hello,
I tried to run a function to tune PaLM. The pipeline shows the following error:
RuntimeError: Job failed with:
code: 9
message: "The DAG failed because some tasks failed. The failed tasks are: [large-language-model-tuner].; Job (project_id = ai-assisted-vma-v2, job_id = 4202643503638904832) is failed due to the above error.; Failed to handle the job: {project_number = 117401557332, job_id = 4202643503638904832}"
The node info on Pipeline Run Analysis is:
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota limits: aiplatform.googleapis.com/restricted_image_training_tpu_v3_pod, cause=null; Failed to create custom job.Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004; Failed to create external task or refresh its state. Task:Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004; Failed to handle the pipeline task. Task: Project number: 117401557332, Job id: 4202643503638904832, Task id: 8599389292487245824, Task name: large-language-model-tuner, Task state: DRIVER_SUCCEEDED, Execution name: projects/117401557332/locations/europe-west4/metadataStores/default/executions/16871158654916253004
I am unable to identify what the issue is. How can I resolve it?
Thanks in advance!
Hi @midmurali57,
It appears that your project has hit a quota limit. Error code 9 on the job only tells you that the large-language-model-tuner task failed; the underlying cause is in the node info, which reports RESOURCE_EXHAUSTED:
com.google.cloud.ai.platform.common.errors.AiPlatformException: code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed quota
You can check this on your project's Quotas page: look for the metric named in the error (aiplatform.googleapis.com/restricted_image_training_tpu_v3_pod) to see its current limit and usage. There is also a troubleshooting guide available for you.
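I believe this can be resolved by requesting a higher quota limit for the metric named in the error, in the region where the pipeline runs (europe-west4, going by the execution name in your log).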
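If the node info is long and you are not sure which quota to request, a small script can pull the metric names out of the error text. This is only a rough sketch: the function name exhausted_quota_metrics is made up for illustration, and the pattern assumes the metrics always look like "<service>.googleapis.com/<metric_name>", as in the message above. It does not call any Google Cloud API.

```python
import re

# Assumption: quota metrics appear as "<service>.googleapis.com/<metric_name>",
# matching the shape seen in the error message from this thread.
METRIC_PATTERN = re.compile(r"[a-z0-9.-]+\.googleapis\.com/[A-Za-z0-9_-]+")


def exhausted_quota_metrics(error_text):
    """Return the distinct quota metric names found in a RESOURCE_EXHAUSTED error.

    Hypothetical helper for illustration; returns an empty list if the text
    does not mention RESOURCE_EXHAUSTED at all.
    """
    if "RESOURCE_EXHAUSTED" not in error_text:
        return []
    metrics = []
    for metric in METRIC_PATTERN.findall(error_text):
        if metric not in metrics:  # keep first-seen order, drop repeats
            metrics.append(metric)
    return metrics


if __name__ == "__main__":
    # Excerpt of the node info quoted in the question.
    node_info = (
        "com.google.cloud.ai.platform.common.errors.AiPlatformException: "
        "code=RESOURCE_EXHAUSTED, message=The following quota metrics exceed "
        "quota limits: aiplatform.googleapis.com/restricted_image_training_tpu_v3_pod, "
        "cause=null; Failed to create custom job."
    )
    for name in exhausted_quota_metrics(node_info):
        print(name)
```

Paste the full node info into node_info and the printed names are the ones to search for on the Quotas page.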
Hope this helps.