
Batch prediction on custom model

yao
Bronze 1

Hi,

I used custom containers for training and prediction to create a model on Vertex AI. Now I want to run batch prediction against it, but I get an error message that says "Unable to start batch prediction job due to the following error: A model using a third-party image must specify PredictRoute and HealthRoute in ContainerSpec."

I checked the documentation, which gives AIP_HEALTH_ROUTE = /v1/endpoints/ENDPOINT/deployedModels/DEPLOYED_MODEL

Does this mean that the model has to be deployed to an endpoint in order to generate the value of the AIP_ENDPOINT_ID variable?

However, the documentation “Get batch predictions” says: “Requesting a batch prediction is an asynchronous request (as opposed to online prediction, which is a synchronous request). You request batch predictions directly from the model resource; you don't need to deploy the model to an endpoint.”

I am confused about whether, in my situation, the model has to be deployed first. Also, are there any resources on hosting custom models for batch predictions?

7 REPLIES

If you are using a custom container, you can read this information about how to use a custom container for prediction.

About your confusion: if you are using the API to create batch predictions, you need to send the request to a service endpoint.

 “To create batch predictions, we recommend that you select input and output locations that are in the same region as your model. If you use the API to create batch predictions, send requests to a service endpoint (such as https://us-central1-aiplatform.googleapis.com) that is in the same region or geographically close to your input and output locations.”
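For example, with the Vertex AI Python SDK, setting the location determines which regional service endpoint the requests go to. This is only a minimal sketch; the project ID and bucket name are placeholders you would replace with your own:

```python
from google.cloud import aiplatform

# Initializing the SDK with a location makes it call the matching regional
# service endpoint (here us-central1-aiplatform.googleapis.com).
# The project ID and staging bucket below are placeholders.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket",  # keep in the same region as the model
)
```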

Thanks for the reply. The custom container link you shared is about using a custom container for (online) prediction. My confusion now is: if I only want the trained model to serve batch predictions rather than online predictions, do I still need a custom prediction container? Would a model with only a training container suffice?

You can upload your models in two ways:

1. With a pre-built container (supported: TensorFlow, XGBoost, scikit-learn)
2. With a custom container

Both options support batch predictions. With batch predictions, you don't need to deploy your model to an endpoint. Uploading it to Vertex AI is enough. 

A custom container is only needed if you use another ML framework that is not supported by the pre-built containers, or if you need additional logic as part of your prediction, such as pre- or post-processing.
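For reference, here is a rough sketch with the Vertex AI Python SDK of uploading a model backed by a custom container and then running batch prediction directly against it, without deploying to an endpoint. The image URI, bucket paths, and machine type are placeholders, and the predict/health routes must match whatever your model server actually exposes; setting them on upload is what the "must specify PredictRoute and HealthRoute in ContainerSpec" error is asking for:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the model with a custom serving container, declaring the predict
# and health routes in the container spec.
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    serving_container_image_uri="us-central1-docker.pkg.dev/my-project/my-repo/my-server:latest",
    serving_container_predict_route="/predict",  # must match your server's route
    serving_container_health_route="/health",    # must match your server's route
    serving_container_ports=[8080],
)

# Batch prediction runs against the uploaded model resource; no endpoint
# deployment is needed.
batch_job = model.batch_predict(
    job_display_name="my-batch-job",
    instances_format="jsonl",
    gcs_source="gs://my-bucket/input/instances.jsonl",
    gcs_destination_prefix="gs://my-bucket/output/",
    machine_type="n1-standard-4",
    sync=True,
)
```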

I don't know if you have solved your problem yet, but hopefully this helps. I can see how this is confusing; it was the same for me.

Under the hood, batch prediction is similar to Vertex AI endpoint prediction. When you start a batch job, a model endpoint is created to serve predictions, along with a Dataflow job that fetches the data, splits it into batches, gets predictions from the endpoint, and writes the results to GCS or BigQuery. All of this happens in a Google-managed project, so you won't see the model endpoint or the Dataflow job in your own project.

So your custom container needs to include the model server code that runs your model. You can build your own model server with Flask or FastAPI, or you can use custom prediction routines, which handle the server part for you so you can focus only on the model logic.

To answer your question about the predict route and health route: you need to specify '/predict' and '/health', or whatever names you give your routes. I am also working on this currently, so if I am wrong about anything above, please correct me.
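For what it's worth, here is a minimal sketch of such a model server using FastAPI. The prediction logic is a placeholder, and it assumes the routes declared on the container spec are /predict and /health; Vertex AI also passes the route and port via the AIP_PREDICT_ROUTE, AIP_HEALTH_ROUTE, and AIP_HTTP_PORT environment variables, which are used here as defaults:

```python
import os

import uvicorn
from fastapi import FastAPI, Request

app = FastAPI()

# Vertex AI injects these environment variables; they should match the
# routes declared in the container spec (e.g. /predict and /health).
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PORT = int(os.environ.get("AIP_HTTP_PORT", 8080))


@app.get(HEALTH_ROUTE)
def health():
    # Return 200 once the model is loaded and ready to serve.
    return {"status": "healthy"}


@app.post(PREDICT_ROUTE)
async def predict(request: Request):
    body = await request.json()
    instances = body["instances"]  # Vertex AI sends {"instances": [...]}
    # Placeholder prediction logic; replace with your model's inference.
    predictions = [len(str(instance)) for instance in instances]
    return {"predictions": predictions}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=PORT)
```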

Hi, I've encountered the same problem with batch prediction here too! Have you been able to solve it? The error shown when running the custom model with a custom container is below:

  • Model server terminated: model server container terminated: go/debugproto exit_code: 127 reason: "Error" started_at { seconds: 1729064186 } finished_at { seconds: 1729064186 } .

I am not sure how to structure my Dockerfile to solve the problem yet.

Hi,

Has anyone solved this?
Does anyone have any sample code for this scenario (containerized custom model) for batch predictions?
I'm also struggling to get predictions out of the model.
Thanks

I also have the same problem. No log is generated and it's like a black box:
Model server terminated: model server container terminated: go/debugproto exit_code: 127 reason: "Error" started_at { seconds:...