
Generic 500 server error when querying XGBoost model deployed on Vertex via BigQuery

Hi there, any help would be massively appreciated.

I have a custom XGBoost model which I trained locally and have loaded into Vertex. I've created an endpoint for that model and have been able to successfully query it, both within the interface and with Python code like this:

import json
import requests

ENDPOINT_URL = "https://europe-west4-aiplatform.googleapis.com/v1/projects/project-aira-gsc-pipeline/locations/europe-west4/endpoints/{endpoint_id}:predict"
HEADERS = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json"
}

# Manually format input data to match your trained model
payload = {
    "instances": [list(map(float, [
        2, 0, 5, 0, 0, 0, 0, 4, 0, 0,
        1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0
    ]))]
}

response = requests.post(ENDPOINT_URL, headers=HEADERS, data=json.dumps(payload))
print(response.json())

The response I have got back from that request looks like this:
{
  "predictions": [[2.4399355424975511e-07, 0.9999997615814209]],
  "deployedModelId": "id",
  "model": "projects/id/locations/europe-west4/models/id",
  "modelDisplayName": "name",
  "modelVersionId": "1"
}
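For completeness: the access_token in the Python snippet above just comes from Application Default Credentials. Below is a minimal sketch of one way to fetch it (running "gcloud auth print-access-token" on the command line does the same job):

import google.auth
import google.auth.transport.requests

# One way to obtain a bearer token, assuming Application Default Credentials
# are set up (e.g. via "gcloud auth application-default login")
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(google.auth.transport.requests.Request())
access_token = credentials.token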

I've created an external connection in BigQuery and used it to create the remote model, which seems to have worked fine:
CREATE OR REPLACE MODEL `project_id.dataset_id.model_name`
INPUT (instances ARRAY<FLOAT64>)
OUTPUT (predictions ARRAY<FLOAT64>)
REMOTE WITH CONNECTION `project_id.eu.connection_name`
OPTIONS(endpoint = 'https://europe-west4-aiplatform.googleapis.com/v1/projects/project_id/locations/europe-west4/endpoints/endpoint_id:predict')

However, when I try running simple queries just to test the model (like the one below):
SELECT *
FROM ML.PREDICT(
MODEL `project_id.dataset_id.model_name`,
(SELECT ARRAY<FLOAT64>[
2, 0, 5, 0, 0, 0, 0, 4, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0
] AS instances)
)

I get back a table with an empty "predictions" column, and a "remote_model_status" column which just has the message "INTERNAL error occurred."
 
From what I can see, the logs show loads of "200 OK" responses with the occasional 500 Internal Server Error, and the log entry for that server error doesn't seem to offer any information beyond the fact that it was an error.
 
My data is in the "EU" multi-region, which is not an available option for Vertex models and endpoints, but my external connection is also in "EU", so I'm assuming that takes care of the requirement that everything be in the same region.

I know XGBoost models can also be deployed directly in BigQuery, but for this project it has to be done via Vertex. For now we're just building out the pipeline - we're going to do more training and test/update models after we've got it working.
 
I've made sure that my test in BigQuery has exactly the same numbers (and same quantity of numbers) as my successful Python test.
 
I can't figure out what I'm missing here - have I got the model definition format wrong? Is there an error in my testing SQL? Help docs and LLMs have me going in circles. Any help would be really appreciated!
 
Thanks in advance!
4 REPLIES

Hi @r0-0l,

Welcome to Google Cloud Community!

The 500 error is usually caused by one of the following:

  • A temporary server failure, such as a network connection problem or a server overload
  • An internal error occurring within BigQuery

To troubleshoot this, you can review and apply the recommended guidelines stated in the documentation.

The error might also suggest that your query in BigQuery is only providing the raw values to Vertex AI, when they should be nested under the instances field. Try checking the TO_JSON_STRING function and applying it to your query.

Also check whether your BigQuery connection's permissions are sufficient to access the Vertex AI endpoint. Lastly, ensure there is no mismatch between the endpoint URL in your BigQuery model definition and the URL you used in Python.

If the issue persists, please contact Google Cloud Support. When reaching out, include detailed information and relevant screenshots of the errors you’ve encountered. This will assist them in diagnosing and resolving your issue more efficiently.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hi Caryna, thanks very much for coming back about this!

I am relatively confident it wasn't a temporary server error because I was able to get a successful response using the endpoint in Python in between unsuccessful responses in BigQuery, and I've checked the BigQuery response over a few days.

I think my BigQuery request is wrapping the information in "instances" - I've included the deliberately stripped-down "let's just test this" SQL above; am I on the right track there?

Thanks for the note about converting it to JSON; I had a go at implementing that:

SELECT *
FROM ML.PREDICT(
    MODEL `model_location`,
    (SELECT TO_JSON_STRING(ARRAY<FLOAT64>[
        2, 0, 5, 0, 0, 0, 0, 4, 0, 0, 
        1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0
    ]) AS instances))

And I got this error, which suggests to me that the endpoint does just want the array:

Invalid table-valued function ML.PREDICT Column instances with type STRING cannot be converted to type ARRAY<FLOAT64> from training implicitly according to the coercion rule: https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules. at [2:6]

I think the permissions and endpoint match should be alright because it would have thrown an error when I ran the "CREATE MODEL" step, right? I know that if I forget to set the permissions before I run that step, it throws an error. Likewise when I've written out the template but haven't yet pasted in the correct URL.

Thanks for your help!

Former Community Member

Were you able to solve this one, @r0-0l? I am getting into something similar with an XGBoost remote model.

Hi there - I ended up resorting to loading the model directly into BigQuery rather than hosting on Vertex. It requires an older xgboost version but it is way faster and more robust, plus you don't get the "always on" fees of Vertex. Better in almost every metric.
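For anyone curious, this is roughly the shape of the direct-load approach, run through the Python BigQuery client here. The bucket path, dataset, table and model names are all placeholders, and the exact CREATE MODEL options and supported xgboost versions for imported models are worth double-checking against the current BigQuery ML docs:

from google.cloud import bigquery

client = bigquery.Client(project="your-project")

# Import a locally trained booster from Cloud Storage into BigQuery ML
# (placeholder names throughout; model format requirements are in the
# imported-XGBoost-model documentation)
ddl = """
CREATE OR REPLACE MODEL `your-project.your_dataset.xgb_imported`
  OPTIONS (MODEL_TYPE = 'XGBOOST',
           MODEL_PATH = 'gs://your-bucket/xgb_model/*')
"""
client.query(ddl).result()  # wait for the DDL job to finish

# Predictions then run entirely inside BigQuery
sql = """
SELECT *
FROM ML.PREDICT(MODEL `your-project.your_dataset.xgb_imported`,
                (SELECT * FROM `your-project.your_dataset.features`))
"""
for row in client.query(sql).result():
    print(dict(row))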

Based on my experience and the people I've spoken to, in future I think I will either be uploading models directly to BigQuery when I can, or using Python or something to manage the Vertex queries (roughly along the lines of the sketch below).
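A rough sketch of that Python route, using the aiplatform SDK rather than hand-building the REST call. The project, region, endpoint ID and feature values here are placeholders, not the real ones from this thread:

from google.cloud import aiplatform

# Placeholder project / region / endpoint ID
aiplatform.init(project="your-project", location="europe-west4")
endpoint = aiplatform.Endpoint("1234567890123456789")

# 55 feature values, in the same order the model was trained on (placeholders here)
features = [2, 0, 5, 0, 0, 0, 0, 4, 0, 0] + [0] * 45

prediction = endpoint.predict(instances=[[float(x) for x in features]])
print(prediction.predictions)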