
Issues getting batch prediction results from model deployed from model garden

I've deployed BioGPT from Model Garden to an endpoint, and I'm trying to use it to get batch predictions of text responses, to no avail.

I'm not seeing any errors in the logging, even though the job failed with every prompt failing. The error messages I get are cryptic, just this:

('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 500 ({\n  "code": 500,\n  "type": "InternalServerException",\n  "message": "Worker died."\n}\n) from server, retry=3, ellapsed=56.66s.', 64)
('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 503 (no healthy upstream) from server, retry=3, ellapsed=0.01s.', 48656)
('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 503 (no healthy upstream) from server, retry=3, ellapsed=0.02s.', 1216)
('Post request fails. Cannot get predictions. Error: Exceeded retries: Non-OK result 500 ({\n  "code": 500,\n  "type": "InternalServerException",\n  "message": "Worker died."\n}\n) from server, retry=3, ellapsed=56.62s.', 64)
 

What is the problem? I've already tried changing the number of samples, the length of the prompts, etc.
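For triage, the failure tuples above can be tallied by HTTP status code. This is a minimal sketch that assumes the second element of each tuple is a count of failed instances (an inference from the output above, not documented behavior):

```python
import re
from collections import Counter

# Abbreviated copies of the (message, count) tuples from the batch job output.
results = [
    ("Post request fails. ... Non-OK result 500 (Worker died.) from server, retry=3, ellapsed=56.66s.", 64),
    ("Post request fails. ... Non-OK result 503 (no healthy upstream) from server, retry=3, ellapsed=0.01s.", 48656),
]

def tally_by_status(results):
    """Sum the failure counts per HTTP status code extracted from each message."""
    totals = Counter()
    for message, count in results:
        match = re.search(r"Non-OK result (\d{3})", message)
        status = match.group(1) if match else "unknown"
        totals[status] += count
    return dict(totals)

print(tally_by_status(results))  # {'500': 64, '503': 48656}
```

Seeing both 500s and 503s broken out this way makes it clearer that two different failure modes are involved, not one.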

 

2 REPLIES

From what I understand, the health endpoint is not available while a prediction is running, so if the prediction takes too long, the automatic health checks will fail. Were you able to solve your problem? If so, how?
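If long-running requests really are starving the health checks, one client-side workaround is to keep each request small so it completes well within the health-check window. A minimal sketch (the chunk size here is a placeholder you would tune for your endpoint's latency):

```python
def chunk(prompts, size):
    """Split a prompt list into fixed-size batches so each request stays short."""
    return [prompts[i:i + size] for i in range(0, len(prompts), size)]

prompts = [f"prompt {i}" for i in range(10)]
batches = chunk(prompts, 4)

# 10 prompts in batches of 4 -> 3 requests of sizes 4, 4, 2
print([len(b) for b in batches])  # [4, 4, 2]
```

Each smaller batch would then be sent as its own prediction request, trading more requests for shorter per-request latency.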

When deploying and using BioGPT, cryptic error messages like these can be frustrating. Based on the details shared, the failures are server-side problems with your endpoint. The two status codes point to two distinct issues: the 500 "Worker died." responses indicate that the serving container's worker process crashed (commonly an out-of-memory condition for large models), and the 503 "no healthy upstream" responses indicate that no healthy replica was available to serve the request.
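While a crashed worker restarts, the 503s are typically transient, so client-side retries with exponential backoff can smooth over the gap. A sketch of the idea with a stand-in for the real predict call (the function and error class here are illustrative, not a Vertex AI API):

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable failure such as a 503 'no healthy upstream'."""

def retry_with_backoff(fn, retries=5, base_delay=0.01):
    """Call fn, retrying on TransientError with exponential backoff; re-raise when exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except TransientError:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stand-in predict call: fails twice with a 503-like error, then succeeds.
calls = {"n": 0}
def flaky_predict():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("503 no healthy upstream")
    return {"predictions": ["ok"]}

result = retry_with_backoff(flaky_predict)
print(result, "after", calls["n"], "attempts")  # {'predictions': ['ok']} after 3 attempts
```

Retries only help with the transient 503s, though; the underlying 500 "Worker died." crashes are better addressed by deploying on a machine type with more memory or a larger GPU.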