Re: Failed to Upload Model in Vertex AI Embedding ...

achakr37 · 10-11-2024 08:42 AM

I am following this documentation (https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-embeddings) to fine-tune text-embedding-004 and textembedding-gecko@003.

All the earlier steps pass, but the pipeline fails at the Upload Model step with this error:

INFO 2024-10-11T10:59:33.361440027Z [resource.labels.taskName: workerpool0-0] Traceback (most recent call last):
INFO 2024-10-11T10:59:33.361447834Z [resource.labels.taskName: workerpool0-0] File "/tmp/tmp.f4uQryejXU/ephemeral_component.py", line 123, in text_embedding_model_uploader
INFO 2024-10-11T10:59:33.361455365Z [resource.labels.taskName: workerpool0-0] upload_model_lro = remote_runner.poll_lro(lro=upload_model_lro)
INFO 2024-10-11T10:59:33.361462735Z [resource.labels.taskName: workerpool0-0] File "/usr/local/lib/python3.8/dist-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/lro_remote_runner.py", line 129, in poll_lro
INFO 2024-10-11T10:59:33.361469125Z [resource.labels.taskName: workerpool0-0] lro = self.request(
INFO 2024-10-11T10:59:33.361480927Z [resource.labels.taskName: workerpool0-0] File "/usr/local/lib/python3.8/dist-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/lro_remote_runner.py", line 76, in request
INFO 2024-10-11T10:59:33.361487656Z [resource.labels.taskName: workerpool0-0] raise RuntimeError(
INFO 2024-10-11T10:59:33.361495015Z [resource.labels.taskName: workerpool0-0] RuntimeError: Failed to create the resource. Error: {'code': 13, 'message': 'INTERNAL'}
INFO 2024-10-11T10:59:33.361502225Z [resource.labels.taskName: workerpool0-0] [KFP Executor 2024-10-11 10:59:32,996 ERROR]: Failed to create the resource. Error: {'code': 13, 'message': 'INTERNAL'}
INFO 2024-10-11T10:59:33.361508673Z [resource.labels.taskName: workerpool0-0] Traceback (most recent call last):
INFO 2024-10-11T10:59:33.361515174Z [resource.labels.taskName: workerpool0-0] File "/tmp/tmp.f4uQryejXU/ephemeral_component.py", line 138, in text_embedding_model_uploader
INFO 2024-10-11T10:59:33.361521544Z [resource.labels.taskName: workerpool0-0] raise e
INFO 2024-10-11T10:59:33.361528391Z [resource.labels.taskName: workerpool0-0] File "/tmp/tmp.f4uQryejXU/ephemeral_component.py", line 123, in text_embedding_model_uploader
INFO 2024-10-11T10:59:33.361535448Z [resource.labels.taskName: workerpool0-0] upload_model_lro = remote_runner.poll_lro(lro=upload_model_lro)
INFO 2024-10-11T10:59:33.361542097Z [resource.labels.taskName: workerpool0-0] File "/usr/local/lib/python3.8/dist-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/lro_remote_runner.py", line 129, in poll_lro
INFO 2024-10-11T10:59:33.361549228Z [resource.labels.taskName: workerpool0-0] lro = self.request(
INFO 2024-10-11T10:59:33.361578314Z [resource.labels.taskName: workerpool0-0] File "/usr/local/lib/python3.8/dist-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/lro_remote_runner.py", line 76, in request
INFO 2024-10-11T10:59:33.361585728Z [resource.labels.taskName: workerpool0-0] raise RuntimeError(
INFO 2024-10-11T10:59:33.361595837Z [resource.labels.taskName: workerpool0-0] RuntimeError: Failed to create the resource. Error: {'code': 13, 'message': 'INTERNAL'}

These are the IAM permissions I have configured:

Any help would be appreciated, as the error message doesn't give me much information.

ibaui

Hi @achakr37,

Welcome to Google Cloud Community!

The error message "Failed to create the resource. Error: {'code': 13, 'message': 'INTERNAL'}" is quite generic. Code 13 is the gRPC INTERNAL status, which signals a server-side error, so it seems the failure happened on the Vertex AI side while it was processing the upload. The following checks might help you narrow down the issue:

Retry Upload: INTERNAL errors are sometimes transient. Try running the upload again after a short wait.
Check Vertex AI Quotas: Verify that you haven't hit any resource quotas in your Google Cloud project for the number of models you can create or store. In the Google Cloud Console, open "IAM & Admin" in the left-hand navigation panel and select "Quotas & System Limits". Filter by the service that might be affected (for example, the Vertex AI API). If you're close to a limit, consider requesting an increase.
API Rate Limiting: If you're making too many requests to the Vertex AI API in a short period, you might be rate-limited. Consider implementing exponential backoff for retries to reduce the load on the API.
Review Logs: Open the complete logs for your Vertex AI pipeline run and check the entries for the failing upload step and the steps before it. They might explain why the resource creation failed in more detail than the generic INTERNAL error does.
Regional Consistency: Ensure that your Vertex AI resources, your Cloud Storage bucket, and your pipeline components are all in the same Google Cloud region. Regional mismatches can cause failures.
Check IAM Permissions: Even though you've shared your IAM configuration, make sure that the service account the pipeline actually runs as (and therefore the one that uploads the model) has the necessary permissions, either through a predefined role such as Vertex AI User (roles/aiplatform.user) or an equivalent custom role that allows it to create, upload, and manage models in your project.
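To make the Retry Upload and API Rate Limiting suggestions concrete, here is a minimal Python sketch of retrying with exponential backoff. It assumes you can wrap whatever call triggers the upload in a zero-argument function (any such `upload_fn` is hypothetical), and it treats only a `RuntimeError` whose text contains `'code': 13`, the format seen in the log above, as retryable; every other error is re-raised immediately. The `sleep` parameter is injectable so the helper can be tested without actually waiting.

```python
import random
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Matches the status code in messages such as
# "Failed to create the resource. Error: {'code': 13, 'message': 'INTERNAL'}".
_INTERNAL_CODE = re.compile(r"'code':\s*13\b")


def is_internal_error(exc: Exception) -> bool:
    """Treat only RuntimeErrors carrying status code 13 (INTERNAL) as retryable."""
    return isinstance(exc, RuntimeError) and bool(_INTERNAL_CODE.search(str(exc)))


def retry_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool] = is_internal_error,
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-retryable errors, and give up after the last attempt.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # The delay cap doubles each time (2 s, 4 s, 8 s, ... by default) up to
            # max_delay; the actual wait is a random value below the cap ("full jitter").
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

For example, `retry_with_backoff(upload_fn, max_attempts=4)` would try up to four times, waiting a random delay of at most 2 s, 4 s, and then 8 s between attempts. The random jitter spreads retries out so that several clients don't hit the API at the same moment. If INTERNAL keeps coming back after several spaced-out retries, the failure is probably not transient.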
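For the Review Logs step, a long job log can be condensed to its distinct error lines so that anything more specific than the generic INTERNAL message stands out. This Python sketch assumes one entry per line in the `SEVERITY TIMESTAMP [labels] message` layout shown in the original post, and its rules for what counts as an error (Python exception lines and KFP executor lines logged at ERROR level) are assumptions based on that excerpt; adjust the regular expressions for your own export.

```python
import re
from typing import List

# Assumed prefix: "INFO 2024-10-11T10:59:33.361Z [resource.labels.taskName: workerpool0-0] <message>"
_PREFIX = re.compile(r"^[A-Z]+\s+\S+\s+\[[^\]]*\]\s?(?P<message>.*)$")
# A Python exception line such as "RuntimeError: ..." or "ValueError: ...".
_EXCEPTION = re.compile(r"^\w*(?:Error|Exception)\b:")


def distinct_errors(log_text: str) -> List[str]:
    """Return the distinct error messages in log order, with the log prefix removed."""
    seen = set()
    errors: List[str] = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = _PREFIX.match(line)
        message = match.group("message") if match else line
        # Keep exception lines and lines the KFP executor logged at ERROR level.
        if _EXCEPTION.match(message) or "ERROR]" in message:
            if message not in seen:
                seen.add(message)
                errors.append(message)
    return errors
```

Applied to the traceback in the original post, this reduces the 19 log lines to two entries, both carrying the same `{'code': 13, 'message': 'INTERNAL'}` payload. That suggests the excerpt itself has nothing more specific, so the rest of the run's logs are worth checking.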
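For Regional Consistency, Vertex AI resource names have the form `projects/{project}/locations/{region}/...`, so the region can be read straight from each name you use. This Python sketch flags every name whose region differs from the one you expect. The example names below are made-up placeholders. A Cloud Storage bucket's location is not part of its `gs://` URI, so such entries are reported with an unknown location (`None`) and need to be checked separately, for example in the bucket's details in the Cloud Console.

```python
import re
from typing import Dict, Iterable, Optional

# Vertex AI resource names look like projects/{project}/locations/{region}/...
_LOCATION = re.compile(r"^projects/[^/]+/locations/([^/]+)/")


def location_of(resource_name: str) -> Optional[str]:
    """Return the region embedded in a resource name, or None if it has none."""
    match = _LOCATION.match(resource_name)
    return match.group(1) if match else None


def region_mismatches(
    resource_names: Iterable[str], expected_region: str
) -> Dict[str, Optional[str]]:
    """Map each name whose region differs from expected_region (or is unknown) to its region."""
    mismatches: Dict[str, Optional[str]] = {}
    for name in resource_names:
        region = location_of(name)
        if region != expected_region:
            mismatches[name] = region
    return mismatches
```

For example, with the placeholders `projects/my-project/locations/us-central1/pipelineJobs/123`, `projects/my-project/locations/europe-west4/models/456`, and `gs://my-bucket/tuning-output` and an expected region of `us-central1`, the function flags the `europe-west4` model and reports `None` for the bucket URI.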

I hope the above information is helpful.

Abiramibee04

Hi @achakr37,

I am facing the same issue. Were you able to resolve it?
