Whenever I try to deploy my registered model to an endpoint on Vertex AI, it fails at this point. I don't even know what the error is or which step failed to execute. I can't find any useful information in Logs Explorer to help me solve it. This is so weird, and I really need help!
ERROR 2023-09-18T03:31:41.183270692Z 0%| | 0/10 [00:00<?, ?it/s] 10%|█ | 1/10 [00:16<02:24, 16.09s/it]
{
  "insertId": -,
  "jsonPayload": {
    "message": "\r 0%| | 0/10 [00:00<?, ?it/s]\r 10%|█ | 1/10 [00:16<02:24, 16.09s/it]",
    "levelname": "ERROR",
    "logTag": "F"
  },
  "resource": {
    "type": "aiplatform.googleapis.com/Endpoint",
    "labels": {
      "location": "us-central1",
      "endpoint_id": -,
      "resource_container": -
    }
  },
  "timestamp": "2023-09-18T03:31:41.183270692Z",
  "severity": "ERROR",
  "logName": -,
  "receiveTimestamp": "2023-09-18T03:31:42.189719321Z"
}
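One thing worth noting about the entry above: the "message" is just a tqdm progress bar, and progress bars write to stderr by default. Anything a serving container writes to stderr is recorded at ERROR severity in Cloud Logging, so this entry may not be a real failure at all. A minimal sketch of how the output stream determines the logged severity (the helper function here is hypothetical, not part of any SDK):

```python
import sys

def log_progress(msg: str, to_stderr: bool = True) -> None:
    """tqdm-style progress goes to stderr by default; Cloud Logging
    records a container's stderr at ERROR severity and stdout at INFO."""
    stream = sys.stderr if to_stderr else sys.stdout
    print(msg, file=stream)

# Routing the same message to stdout would make it appear as INFO
# instead of ERROR in Logs Explorer.
log_progress("10%|█ | 1/10 [00:16<02:24, 16.09s/it]", to_stderr=False)
```

If the real failure is elsewhere, filtering out these stderr progress lines can make it easier to spot.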
Hi @Walter23,
Welcome and we appreciate you reaching out to our community.
I understand that you are having issues deploying your model to an endpoint, and I agree that the error message lacks the substance needed to start troubleshooting.
I looked up our service health incidents, but none related to your case has been reported. Since you have already checked your logs and there is no documentation that covers this error, I can only share some things to consider when deploying a model.
As a last resort, you could start again from scratch: recreate your model and redeploy it. Here are some resources that may help.
Hoping that a resolution will come your way.
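In addition to the suggestions above, the prediction container's logs can be pulled directly from the CLI, which can make them easier to search than the Logs Explorer UI. A sketch with a hypothetical endpoint ID:

```shell
# Pull recent log entries for a specific endpoint (endpoint ID is
# hypothetical; substitute your own).
gcloud logging read \
  'resource.type="aiplatform.googleapis.com/Endpoint" AND resource.labels.endpoint_id="1234567890"' \
  --limit=50 \
  --format=json
```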
Hi @lsolatorio,
Thank you so much for your kind help! However, I still cannot deploy my registered model to the endpoint on Vertex AI. Logs Explorer reported an error that I do not know how to solve. Could you give me some tips? Do you also know where I can find examples of a deployable model format?
{
insertId: "6xz81afdkaiqr"
jsonPayload: {
levelname: "ERROR"
logTag: "F"
message: "10.124.7.1 - - [28/Sep/2023 03:21:56] "GET /v1/endpoints/2815772312920391680/deployedModels/9213636960902447104 HTTP/1.1" 404 -"
}
logName: "projects/myprojectid/logs/aiplatform.googleapis.com%2Fprediction_container"
receiveTimestamp: "2023-09-28T03:21:56.900395566Z"
resource: {
labels: {
endpoint_id: "2815772312920391680"
location: "us-central1"
resource_container: "projects/78123506305"
}
type: "aiplatform.googleapis.com/Endpoint"
}
severity: "ERROR"
timestamp: "2023-09-28T03:21:56.742937326Z"
}
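The 404 in this log is on a GET to /v1/endpoints/.../deployedModels/..., which looks like the default health-check path Vertex AI probes: the serving container is expected to return 200 on the route passed in via the AIP_HEALTH_ROUTE environment variable, and a container that never registers that route will 404 every probe until the deployment is marked failed. A minimal stdlib sketch of a serving entrypoint that honors those variables (the default routes and port here are assumptions, not values Vertex guarantees):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Vertex AI injects these variables into custom serving containers.
# The defaults below are placeholders for running the sketch locally.
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == HEALTH_ROUTE:
            # Health probes must get a 200, or the deployment fails.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            # An unregistered path produces exactly the 404 seen above.
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        if self.path == PREDICT_ROUTE:
            body = b'{"predictions": []}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", PORT), Handler).serve_forever()
```

A real prediction server would of course parse the request body and run the model, but even this skeleton is enough to pass the health check.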
Hello Google Cloud Community,
I am currently working on deploying a custom model to Google Cloud's Vertex AI, and I am encountering a persistent issue that I need help resolving.
I am building a system that utilizes a custom LLM (Large Language Model) called Qwen, which is containerized using Docker. The model is designed to handle text-based tasks, and I have trained it for my specific use case. This model is pushed to Artifact Registry and is intended to be deployed to a Vertex AI endpoint for real-time inference.
I created a Docker image for my custom LLM, and it was successfully uploaded to Google Cloud Artifact Registry.
I then created a Vertex AI endpoint using the gcloud CLI and tried to deploy the model.
During deployment, I used the following command:
However, the deployment is continuously failing with the error message:
The model fails to deploy properly, and the status on the Vertex AI dashboard shows "Endpoint Active but Deployed Model Failed." When I check the logs or use the gcloud ai operations describe command, I see no specific error messages that provide clear guidance on the cause of failure.
After several attempts, I get the error: "FAILED_PRECONDITION".
Additionally, during some attempts, I received the message: "Field: version.deployment_uri Error: Deployment directory gs://...artifacts/ is expected to contain exactly one of: [model.pkl, model.joblib]". However, I do not have .pkl or .joblib files for this model. Instead, my model is containerized in Docker and is compatible with the cloud infrastructure.
What I have tried so far:
Ensured that the Artifact Registry permissions were correctly set up for the Vertex AI Service Agent.
Verified that the model container image in Artifact Registry exists and is accessible.
Ensured that the endpoint was created successfully before deploying the model.
Tried multiple times, even after waiting for extended periods, but the issue persists.
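For what it's worth, the model.pkl / model.joblib message appears to come from the prebuilt serving containers (e.g. scikit-learn), which expect a serialized model in a GCS artifact directory. For a Docker-packaged model, the upload should point at the container image rather than an artifact URI. A sketch of an upload-then-deploy flow with hypothetical names, ports, and routes:

```shell
# Upload the model as a custom container (no --artifact-uri, so the
# prebuilt containers' model.pkl/model.joblib expectation never applies).
# Image path, routes, and port below are hypothetical.
gcloud ai models upload \
  --region=us-central1 \
  --display-name=qwen-custom \
  --container-image-uri=us-central1-docker.pkg.dev/MY_PROJECT/MY_REPO/qwen:latest \
  --container-health-route=/health \
  --container-predict-route=/predict \
  --container-ports=8080

# Deploy the uploaded model to an existing endpoint.
gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=us-central1 \
  --model=MODEL_ID \
  --display-name=qwen-deployment \
  --machine-type=n1-standard-8
```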
I would appreciate any guidance or suggestions on how to resolve the issue. Specifically:
What could be causing the model deployment to fail even though the endpoint is active?
Are there specific configurations or permissions that need to be adjusted to ensure the model container is properly deployed to Vertex AI?
Is there a specific format or configuration required for Docker containers used in Vertex AI deployments?
Your help would be greatly appreciated, and I look forward to hearing from the community.
I am having the same issues. I'm not sure it's even possible to deploy with Vertex AI. Has anyone actually managed to deploy? I've made countless corrections. The errors sometimes don't show up in the log; it's very inconsistent. At one point I got a message that one of the endpoints wasn't created, then found out later that it actually had been. Very frustrating. How can you fix an issue if the log entry isn't available to use?
I have the same problem.
I've been fine-tuning some embedding models on Vertex AI. At first I could deploy any of the embedding models, but now I can't, after frequently switching which model was deployed in order to evaluate each one.
I tried copying a tuned model to another project or region, but that raised the same error.
I'm trying to train the model in another region, and I'll check whether I can deploy it or not.
I'm facing the same error. Can you please share the solution if you find one?
I've been stuck on this problem for a couple of weeks without success.
Hi Walter, how did you solve that problem? Please can you tell me?