
Deploy to local endpoint Vertex AI - Health check never succeeds

Hello,

I am trying to use the deploy_to_local_endpoint function, but without success.

The first step is:

from google.cloud.aiplatform.prediction import LocalModel
from src_dir.predictor import MyCustomPredictor
import os

local_model = LocalModel.build_cpr_model(
    LOCAL_SOURCE_DIR,
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    predictor=MyCustomPredictor,
    requirements_path=os.path.join(LOCAL_SOURCE_DIR, "requirements.txt"),
)

/opt/conda/lib/python3.10/subprocess.py:955: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdin = io.open(p2cwrite, 'wb', bufsize)

This step runs successfully, although I get the above-mentioned warning.
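For reference, MyCustomPredictor follows the standard CPR Predictor interface; simplified, it looks roughly like this (the exact pre- and post-processing here is illustrative):

import pickle

import numpy as np
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class MyCustomPredictor(Predictor):
    def load(self, artifacts_uri: str) -> None:
        """Copies the artifacts locally and deserializes model.pkl."""
        prediction_utils.download_model_artifacts(artifacts_uri)
        with open("model.pkl", "rb") as f:
            self._model = pickle.load(f)

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        # The request body is the parsed instances.json payload.
        return np.asarray(prediction_input["instances"])

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        return {"predictions": prediction_results.tolist()}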

Unfortunately, the second step fails. The request runs for about 2 minutes and then I get "The health check never succeeds" before execution even reaches the predict_response line.

with local_model.deploy_to_local_endpoint(
    artifact_uri='model_artifacts/',  # local path to artifacts
) as local_endpoint:
    predict_response = local_endpoint.predict(
        request_file='instances.json',
        headers={"Content-Type": "application/json"},
    )

    health_check_response = local_endpoint.run_health_check()

The model_artifacts/ folder contains a model.pkl file.

Would appreciate your help!


Hi @aela

Thank you for reaching out to the community for help. Let me try to help you with some pointers.

Passing buffering=1 to the open() function requests line buffering, which flushes the stream after every newline. Line buffering is only supported in text mode; binary streams have no notion of lines, so when subprocess opens the child process's stdin pipe in binary mode ('wb') with buffering=1, Python falls back to the default buffer size and emits that RuntimeWarning. I suspect this buffering mismatch is conflicting with the deploy_to_local_endpoint function, since it will open your model artifacts in binary mode, which does not support line buffering.
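You can reproduce the warning in isolation to confirm where it comes from; a minimal sketch (the file name is just a placeholder):

import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Requesting line buffering (buffering=1) on a binary-mode stream
    # triggers the same RuntimeWarning that subprocess surfaced.
    with open("demo.bin", "wb", buffering=1) as f:
        f.write(b"payload")

print(caught[0].category.__name__, caught[0].message)
# RuntimeWarning line buffering (buffering=1) isn't supported in binary mode, ...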

As shown in the traceback of the warning you received (io.open(p2cwrite, 'wb', bufsize)), I suggest setting the buffering value to zero (0), meaning unbuffered, as shown below.

io.open(p2cwrite, 'wb', buffering=0)

Ideally, this should prevent the RuntimeWarning from showing up and allow the deploy_to_local_endpoint function to proceed successfully.
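Independently of the warning, the container logs are usually the quickest way to see why the health check keeps failing. Here is a sketch of how you might surface them; container_ready_timeout and print_container_logs are my assumptions about the SDK's local-endpoint interface, so please verify them against your installed aiplatform version:

# Assumes local_model was built with LocalModel.build_cpr_model as above.
with local_model.deploy_to_local_endpoint(
    artifact_uri='model_artifacts/',
    container_ready_timeout=300,  # assumed parameter: seconds to wait for health
) as local_endpoint:
    # Assumed method: dumps the model server's stdout/stderr, which usually
    # contains the traceback that makes the health check fail.
    local_endpoint.print_container_logs(show_all=True)
    health_check_response = local_endpoint.run_health_check()
    print(health_check_response.status_code)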

Just an additional note: the deploy_to_local_endpoint function may also fail if the model artifact file is not in the correct format. It must be a pickle file that contains the model's serialized parameters and configuration, for example one written as shown below.
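A minimal sketch of writing such an artifact, assuming for illustration that the model is a scikit-learn estimator (swap in your own model object and training data):

import pickle

from sklearn.linear_model import LogisticRegression

# Placeholder training data -- replace with your own.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

# The file name must match what the predictor's load() method expects.
with open("model_artifacts/model.pkl", "wb") as f:
    pickle.dump(model, f)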

Hope this helps.