
Generic 500 server error when querying XGBoost model deployed on Vertex via BigQuery

Hi there, any help would be massively appreciated.

I have a custom XGBoost model that I trained locally and have uploaded to Vertex. I've created an endpoint for that model and have been able to query it successfully, both in the interface and with Python code like this:

import json
import requests

# access_token and endpoint_id are set earlier (redacted here)
ENDPOINT_URL = f"https://europe-west4-aiplatform.googleapis.com/v1/projects/project-aira-gsc-pipeline/locations/europe-west4/endpoints/{endpoint_id}:predict"
HEADERS = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json",
}

# Manually format input data to match the trained model (55 features)
payload = {
    "instances": [list(map(float, [
        2, 0, 5, 0, 0, 0, 0, 4, 0, 0,
        1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0
    ]))]
}

response = requests.post(ENDPOINT_URL, headers=HEADERS, data=json.dumps(payload))
print(response.json())

The response I have got back from that request looks like this:

{ "predictions": [ [ 2.4399355424975511e-07, 0.9999997615814209 ] ], "deployedModelId": "id", "model": "projects/id/locations/europe-west4/models/id", "modelDisplayName": "name", "modelVersionId": "1" }

I've created an external connection in BigQuery and have tried creating the remote model there, which seems to have worked fine:

CREATE OR REPLACE MODEL `project_id.dataset_id.model_name`
INPUT (instances ARRAY<FLOAT64>)
OUTPUT (predictions ARRAY<FLOAT64>)
REMOTE WITH CONNECTION `project_id.eu.connection_name`
OPTIONS(endpoint = 'https://europe-west4-aiplatform.googleapis.com/v1/projects/project_id/locations/europe-west4/endpoints/endpoint_id:predict')

However, when I try running simple queries just to test the model (like the one below)

SELECT *
FROM ML.PREDICT(
  MODEL `project_id.dataset_id.model_name`,
  (SELECT ARRAY<FLOAT64>[
    2, 0, 5, 0, 0, 0, 0, 4, 0, 0,
    1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0
  ] AS instances)
)

I get back a table with an empty "predictions" column, and a "remote_model_status" column which just has the message "INTERNAL error occurred."
 
From what I can see, the logs are mostly "200 OK" entries with the occasional 500 Internal Server Error, and the log entry for that server error doesn't seem to offer any information beyond the fact that it was an error.
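
In case it helps, this is roughly how I've been pulling those error entries out of Cloud Logging (just a sketch using the google-cloud-logging client; I'm not 100% sure the resource type or filter is exactly right for endpoint prediction logs, so adjust as needed):

# Rough sketch: list only the error-level entries for the Vertex endpoint,
# so the 200 OK noise is filtered out. Assumes the google-cloud-logging client
# and that endpoint prediction logs use resource.type="aiplatform.googleapis.com/Endpoint".
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="project-aira-gsc-pipeline")

log_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '
    'AND severity>=ERROR'
)

for entry in client.list_entries(filter_=log_filter, order_by=cloud_logging.DESCENDING):
    # The payload is where a traceback from the serving container would show up,
    # but for these 500s it only seems to say that an error occurred.
    print(entry.timestamp, entry.severity, entry.payload)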
 
My data is in the "EU" multi-region, which is not an available option for Vertex models and endpoints, but my external connection is also in "EU", so I'm assuming that satisfies the requirement that everything be in the same region.

I know XGBoost can also be deployed directly in BigQuery but this has to be done via Vertex. For now we're just building out the pipeline - we're going to do more training and test/update models after we've got it working.
 
I've made sure that my test in BigQuery has exactly the same numbers (and same quantity of numbers) as my successful Python test.
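
For example, this is the kind of quick check I've been doing to keep the two tests in sync (just a sketch; the row is built once and then turned into both the REST payload and the SQL array literal):

# Sanity check (illustrative only): build the 55-value row once, then derive both
# the REST payload and the SQL ARRAY literal from it so the two tests can't drift apart.
row = [
    2, 0, 5, 0, 0, 0, 0, 4, 0, 0,
    1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0,
]
assert len(row) == 55  # same number of features in both tests

payload = {"instances": [list(map(float, row))]}                        # for the REST call
sql_literal = "ARRAY<FLOAT64>[" + ", ".join(str(v) for v in row) + "]"  # for ML.PREDICT
print(sql_literal)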
 
I can't figure out what I'm missing here - have I got the model definition format wrong? Is there an error in my testing SQL? Help docs and LLMs have me going in circles. Any help would be really appreciated!
 
Thanks in advance!