BigQuery ML AUTOML_REGRESSOR weird behaviour

milosmilenkovic · 04-09-2024 06:57 AM

I trained an AutoML model about 2 months ago and it worked as expected. Yesterday I created a new model from the same training data, but it is not predicting label values as expected: its predictions are on average about 3 times lower than those of the original model.

The code is very simple:

CREATE OR REPLACE MODEL `my-project.my_dataset.my_model_v2`
  OPTIONS(model_type='AUTOML_REGRESSOR',
          input_label_cols=['label_column'],
          budget_hours=4)
AS
SELECT
  input_column_1,
  input_column_2,
  input_column_3,
  input_column_4,
  input_column_5,
  input_column_6,
  label_column
FROM
  `my-project.my_dataset.my_training_table`

The training table is exactly the same as before, but all of a sudden the predictions for the label column are much lower than those of the original model. Literally nothing has changed, except that I trained the first model at the beginning of March. Does anyone have an idea what could cause this behaviour?
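To put a number on the gap, a side-by-side comparison along these lines could be run. This is only a sketch under a few assumptions: `my_model_v1` is a placeholder for the original model (its real name is not shown above), the output column `predicted_label_column` follows BigQuery ML's usual `predicted_<label>` naming, and scoring the training table itself is just a quick sanity check, not a proper evaluation.

```sql
-- Average prediction vs. average actual label for each model version.
-- NOTE: `my_model_v1` is a hypothetical name for the original model.
SELECT
  'original' AS model_version,
  AVG(predicted_label_column) AS avg_prediction,
  AVG(label_column) AS avg_actual
FROM
  ML.PREDICT(MODEL `my-project.my_dataset.my_model_v1`,
             TABLE `my-project.my_dataset.my_training_table`)
UNION ALL
SELECT
  'retrained' AS model_version,
  AVG(predicted_label_column) AS avg_prediction,
  AVG(label_column) AS avg_actual
FROM
  ML.PREDICT(MODEL `my-project.my_dataset.my_model_v2`,
             TABLE `my-project.my_dataset.my_training_table`)
```

If `avg_actual` is identical in both rows while `avg_prediction` differs by roughly a factor of 3, that would point to the models themselves rather than the input data.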

EDIT: Another thing I noticed is that the Compress stage is not showing in the execution graph for the new model.

Before I had:

Validate
Preprocess
Train
Compress
Evaluate

and now "Compress" is missing.

One more thing I noticed after exporting both versions of the model to a GCS bucket: the new model is 30-40% smaller than the old one.
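For anyone who wants to repeat the size comparison, an export of this kind can be done with `EXPORT MODEL`. The sketch below uses placeholder values: the bucket path `gs://my-bucket/...` and the name `my_model_v1` for the original model are assumptions, not the real ones.

```sql
-- Export each model version to its own GCS folder so the sizes can be compared.
-- NOTE: bucket path and `my_model_v1` are hypothetical placeholders.
EXPORT MODEL `my-project.my_dataset.my_model_v1`
  OPTIONS(URI = 'gs://my-bucket/automl_models/my_model_v1/');

EXPORT MODEL `my-project.my_dataset.my_model_v2`
  OPTIONS(URI = 'gs://my-bucket/automl_models/my_model_v2/');
```

The total size of each folder can then be compared in the Cloud Console or with a storage listing tool.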