
Vertex AI Failed to deploy model: Deployment directory is expected [model.pkl, model.joblib]

I'm having trouble replicating the notebook at this link.

The notebook itself runs without problems, but when I try to replicate the process with my own sklearn model, I'm not able to deploy the model from the Model Registry.

This is the error:

Failed to deploy model "my_model" to endpoint "my_endpoint" due to the error: APPLICATION_ERROR;google.cloud.ml.v1/ModelService.CreateVersion;Field: version.deployment_uri Error: Deployment directory gs://caip-tenant-a781594e-7c47-400f-b646-f01453f02b27/1601535442878988288/artifacts/ is expected to contain exactly one of: [model.pkl, model.joblib].;AppErrorCode=9;StartTimeMs=1700659051985;unknown;ResFormat=uncompressed;ServerTimeSec=0.310175309;LogBytes=256;Non-FailFast;EffSecLevel=none;ReqFormat=uncompressed;ReqID=c7ef928b4eff3260;GlobalID=0;Server=[2002:a17:907:908a:b0:a03:9f9c:c3bb]:4002

Description:

I trained the model with the prebuilt container "sklearn-cpu.1-0".

The training completes and stores the model in a GCS bucket (size: 4 MB).

The model appears in the Model Registry.

When I try to deploy the model I receive the error above. The endpoint is created, but its status is "Endpoint active but deployed model failed".

I tried changing the containers for training and deployment and the scikit-learn version (1.3, 1.2, 1.0), but without success.

Any suggestions are welcome.

Thanks

3 REPLIES

I am having a similar issue and would appreciate suggestions toward a solution. I created a custom container for training, and the training was successful. I then created another container for serving and can clearly see it in my Artifact Registry repo as mlplatform-serving:latest. However, when I try to deploy the model to an endpoint with this container, I get an error similar to the one above:

google.api_core.exceptions.FailedPrecondition: 400 <eye3 title='/ModelService.CreateVersion, FAILED_PRECONDITION'/> APPLICATION_ERROR;google.cloud.ml.v1/ModelService.CreateVersion;Field: model.model_container_spec.image_uri Error: Failed to read the container uri [region-docker.pkg.dev/x9a5871c592b92a33-tp/ucaip-deployed-model-7275556401924014080/mlplatform-serving:latest]. Please make sure that the image exists; Failed to validate image using service account cloud-ml-platform-ucaip@system.gserviceaccount.com;AppErrorCode=9;StartTimeMs=1701235978362;unknown;ResFormat=uncompressed;ServerTimeSec=0.144279109;LogBytes=256;Non-FailFast;EffSecLevel=none;ReqFormat=uncompressed;ReqID=98afc0822b913811;GlobalID=0;Server=[2002:a1c:4b16:0:b0:40b:360e:29e7]:4002 9: <eye3 title='/ModelService.CreateVersion, FAILED_PRECONDITION'/> APPLICATION_ERROR;google.cloud.ml.v1/ModelService.CreateVersion;Field: model.model_container_spec.image_uri Error: Failed to read the container uri [xxxxx-docker.pkg.dev/x9a5871c592b92a33-tp/ucaip-deployed-model-7275556401924014080/mlplatform-serving:latest]. Please make sure that the image exists; Failed to validate image using service account cloud-ml-platform-ucaip@system.gserviceaccount.com;AppErrorCode=9;StartTimeMs=1701235978362;unknown;...
It seems Vertex AI takes the image from Artifact Registry and creates another repo to hold it, but then complains that it is unable to access the image.
Any help would be appreciated.

I have run into the exact same problem so I would be very interested if you find any solution to this problem.

The error message indicates that the deployment directory is expected to contain either model.pkl or model.joblib, but it seems to be empty or doesn’t contain one of these files.

If you train the model with scikit-learn, export it either with the joblib library as `model.joblib` or with Python's pickle module as `model.pkl`. Please refer to this documentation.
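As a rough illustration of the export step, here is a minimal sketch that pickles a model under the exact file name `model.pkl`. The helper name `export_model` and the dictionary standing in for a fitted estimator are assumptions made so the snippet runs without scikit-learn installed; in a real training script you would pass your fitted estimator instead.

```python
import os
import pickle
import tempfile


def export_model(model, out_dir):
    """Pickle `model` into `out_dir` as model.pkl and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    # The file name must match what the error message asks for.
    path = os.path.join(out_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


# Placeholder for a fitted scikit-learn estimator (hypothetical stand-in).
placeholder_model = {"coef": [0.5, -1.2], "intercept": 0.1}

out_dir = tempfile.mkdtemp()
saved_path = export_model(placeholder_model, out_dir)

# Load the file back to confirm it was written completely.
with open(saved_path, "rb") as f:
    restored = pickle.load(f)

print(os.path.basename(saved_path), restored == placeholder_model)
```

If you prefer joblib, the equivalent call would be `joblib.dump(model, os.path.join(out_dir, "model.joblib"))`. Either way, only one of the two files should end up in the directory you deploy from.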

Ensure that after the model training is completed, you save the model with either the name `model.pkl` or `model.joblib`. Double-check the code used for saving the model and the naming conventions. You mentioned that the model is about 4 MB, which should not be an issue. However, confirm that the file is fully written and accessible in the GCS bucket. Partial uploads or corrupted files can cause issues during deployment.
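One way to double-check the naming is to list the objects in the bucket (for example with `gsutil ls -r`) and run the names through a small check like the sketch below. The function names and the sample listings are hypothetical, and the rule that a file nested in a subfolder does not count is my reading of the error message ("expected to contain exactly one of"), not a documented guarantee.

```python
EXPECTED_NAMES = ("model.pkl", "model.joblib")


def find_model_files(object_names, directory_prefix):
    """Return the expected model files that sit directly under directory_prefix."""
    if directory_prefix and not directory_prefix.endswith("/"):
        directory_prefix += "/"
    found = []
    for name in object_names:
        if not name.startswith(directory_prefix):
            continue
        relative = name[len(directory_prefix):]
        # A name with a further "/" lives in a subfolder and never matches here.
        if relative in EXPECTED_NAMES:
            found.append(relative)
    return found


def check_artifact_dir(object_names, directory_prefix):
    """Summarize whether the directory holds exactly one expected model file."""
    found = find_model_files(object_names, directory_prefix)
    if len(found) == 1:
        return "ok: " + found[0]
    if not found:
        return "missing: no model.pkl or model.joblib directly in the directory"
    return "ambiguous: " + ", ".join(sorted(found))


# Hypothetical listings: the first nests the file one level too deep.
nested_listing = ["artifacts/model/model.joblib", "artifacts/metrics.json"]
flat_listing = ["artifacts/model.joblib", "artifacts/metrics.json"]

print(check_artifact_dir(nested_listing, "artifacts"))
print(check_artifact_dir(flat_listing, "artifacts"))
```

A "missing" result for a listing that clearly contains the file usually means it was saved under a different name or one folder deeper than the directory being deployed.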

Ensure that the deployment configuration points to the correct directory in the GCS bucket where the model artifact (`model.pkl` or `model.joblib`) is stored. The Cloud Storage path must be the directory that contains the file, not the file itself, e.g. `gs://BUCKET_NAME/models/`.
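To make the directory-versus-file point concrete, here is a hedged sketch using the `google-cloud-aiplatform` Python SDK. The helper names `artifact_dir` and `upload_and_deploy`, the machine type, and all argument values are assumptions for illustration; `upload_and_deploy` is only defined here, not called, since running it needs the SDK, credentials, and a real project.

```python
def artifact_dir(gcs_path):
    """Return the directory URI for a gs:// path that may point at the model file."""
    if not gcs_path.startswith("gs://"):
        raise ValueError("expected a gs:// URI, got " + repr(gcs_path))
    last = gcs_path.rsplit("/", 1)[-1]
    if last in ("model.pkl", "model.joblib"):
        # Strip the file name so the URI points at its parent directory.
        gcs_path = gcs_path[: -len(last)]
    return gcs_path if gcs_path.endswith("/") else gcs_path + "/"


def upload_and_deploy(project, location, display_name, gcs_path, serving_image_uri):
    """Register the model from its artifact directory and deploy it (sketch)."""
    # Imported lazily so artifact_dir() can be used without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_dir(gcs_path),
        serving_container_image_uri=serving_image_uri,
    )
    # The machine type is an assumption; pick one that suits your model.
    return model.deploy(machine_type="n1-standard-2")


print(artifact_dir("gs://BUCKET_NAME/models/model.joblib"))
print(artifact_dir("gs://BUCKET_NAME/models"))
```

Both calls print `gs://BUCKET_NAME/models/`, which is the form the artifact URI should take when the model is registered.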

Also check the permissions and ensure that the deployment process has access to the GCS bucket.

If you've already gone through these steps and the issue persists, it might be worth reaching out to Google Cloud support or to the Engineering Team. They might have specific insights or solutions tailored to your deployment environment or configuration.