
Asynchronous HTTP request using Flask for health endpoint

Hi,

I have successfully deployed a custom-trained model on Vertex AI. It includes a /health endpoint that responds to health check requests. However, while the process is running it takes about 2-3 minutes to load the model files, which results in the JSON error response below. Despite this, the backend continues to execute and streams the logs until the process is complete.

import os
from flask import Flask, jsonify

app = Flask(__name__)
AIP_HEALTH_ROUTE = os.environ.get('AIP_HEALTH_ROUTE', '/health')

@app.route(AIP_HEALTH_ROUTE, methods=["GET"])
def health():
    return jsonify({'health': 'ok'})

 

 

{
  "error": {
    "code": 503,
    "message": "Took too long to respond when processing endpoint_id: 175231367141916672, deployed_model_id: 546162609888428032",
    "status": "UNAVAILABLE"
  }
}

 

The message below is from the logs:

 

{
  "insertId": "jzha6yg14hht6n",
  "jsonPayload": {
    "message": "10.0.1.65 - - [28/Jan/2025 01:13:24] \"GET /health HTTP/1.1\" 200 -"
  },
  "resource": {
    "type": "aiplatform.googleapis.com/Endpoint",
    "labels": {
      "location": "us-west1",
      "endpoint_id": "175231367141916672",
      "resource_container": "projects/786116219701"
    }
  },
  "timestamp": "2025-01-28T01:13:24.744819164Z",
  "severity": "ERROR",
  "labels": {
    "replica_id": "predictor-resource-pool-5120759902087675904-75f9dbbb7-c5p9l",
    "deployed_model_id": "<546162609888428032423>"
  },
  "logName": "projects/<project_id>/logs/aiplatform.googleapis.com%2Fprediction_container",
  "receiveTimestamp": "2025-01-28T01:13:25.208217740Z"
}

 

Thanks in advance

1 REPLY

Hi @YashwanthGC,

Welcome to the Google Cloud Community!

The 503 error code you encountered typically indicates that the service is temporarily unavailable. This can happen for several reasons, such as server overload, a health check interval that is too frequent for the server to keep up with, or the server being in maintenance mode.

Here are some possible workarounds that might help you resolve the issue:

  1. Increase Timeout Settings - To ensure your model has sufficient time to load, you may need to increase the timeout settings for both the health check and model loading in your deployment configuration.
  2. Optimize Model Loading Time - If possible, reduce the loading time of your model files, for example by shrinking the model, improving the efficiency of your loading code, or using a more efficient file format. You can also load the model asynchronously so the health endpoint is not blocked (see the Flask sketch after this list).
  3. Optimize Server Performance - Ensure that your serving replica has enough resources (CPU, memory) to handle incoming requests (see the deployment sketch below).
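
For point 2, here is a minimal sketch of the asynchronous pattern from the thread title: the model files are loaded in a background thread when the container starts, so Flask can answer the health check immediately, and the health route only reports 200 once loading has finished. The dummy loader, routes, and port handling are placeholders, so please adapt them to your own framework and container setup.

import os
import threading
import time

from flask import Flask, jsonify, request

app = Flask(__name__)

AIP_HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
AIP_PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

model = None
model_ready = threading.Event()


def load_model():
    """Runs in a background thread so the slow load does not block the web server."""
    global model
    # Placeholder for your real 2-3 minute loading code (e.g. joblib.load / torch.load).
    time.sleep(5)
    model = lambda instances: [0 for _ in instances]  # dummy stand-in for the loaded model
    model_ready.set()


# Start loading as soon as the container starts; Flask can answer /health right away.
threading.Thread(target=load_model, daemon=True).start()


@app.route(AIP_HEALTH_ROUTE, methods=["GET"])
def health():
    # Report healthy only once the model is in memory; 503 tells Vertex AI to keep probing.
    if model_ready.is_set():
        return jsonify({"health": "ok"}), 200
    return jsonify({"health": "loading"}), 503


@app.route(AIP_PREDICT_ROUTE, methods=["POST"])
def predict():
    if not model_ready.is_set():
        return jsonify({"error": "model is still loading"}), 503
    payload = request.get_json(silent=True) or {}
    instances = payload.get("instances", [])
    return jsonify({"predictions": model(instances)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", 8080)))

Whether the health route should return 200 or 503 while the model is still loading depends on how you want Vertex AI's readiness checks to treat a warming-up replica; returning 503 until the model is in memory keeps traffic away from a replica that cannot serve predictions yet.

For point 3, a rough sketch of redeploying on a larger machine type with the google-cloud-aiplatform SDK; the project, region, model ID, and machine type below are placeholders for your own values:

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-west1")

model = aiplatform.Model(model_name="your-model-id")
endpoint = model.deploy(
    machine_type="n1-standard-8",  # more CPU/RAM can shorten the model loading time
    min_replica_count=1,
    max_replica_count=1,
)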

If the issue persists, I suggest contacting Google Cloud Support, as they can provide more insight into whether the behavior you've encountered is a known issue or specific to your project.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.