
Discrepancies in Confusion Matrix Item Counts and Metrics Between BigQuery ML and Vertex AI XGBoost

**Question 1:**
Why does the total number of items in the confusion matrix come to approximately 200,000 when an XGBoost model is trained on about 100,000 records with BigQuery ML, while Vertex AI AutoML reports around 100,000?
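
For context, the total in question can be reproduced by summing the cells that `ML.CONFUSION_MATRIX` returns. Below is a minimal sketch using the google-cloud-bigquery Python client; the `***` project placeholder is kept as elsewhere in this post and must be filled in with the real project:

```python
# Minimal sketch: sum the cells of the confusion matrix that BigQuery ML
# returns. Assumes the google-cloud-bigquery client library is installed
# and authenticated; `***` is a placeholder kept from this post.
from google.cloud import bigquery

client = bigquery.Client()

cm = client.query("""
    SELECT *
    FROM ML.CONFUSION_MATRIX(MODEL `***.model_xg_boost.model_best_params`)
""").to_dataframe()

# ML.CONFUSION_MATRIX returns one row per expected label and one count
# column per predicted label; the grand total of the count columns is the
# number of items the matrix covers.
count_columns = cm.columns.drop("expected_label")
total_items = cm[count_columns].sum().sum()
print("Total items in confusion matrix:", total_items)
```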

**Question 2:**
Why are the precision and recall values lower with BigQuery ML than when training XGBoost in a Vertex AI notebook, despite using roughly the same hyperparameters?

Notebook metrics: `{'precision': 0.8919952583956949, 'recall': 0.753797304483842, 'f1': 0.8078981867164722, 'loss': 0.014006406471894417}`
BigQuery ML metrics: `{'precision': 0.6134, 'recall': 0.3719, 'f1': 0.4630, 'loss': 0.0446}`
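
For context, the notebook run follows roughly the shape sketched below. Assumptions: the xgboost and scikit-learn libraries, a binary 0/1 `reaction` label, an 80/20 random split mirroring `DATA_SPLIT_EVAL_FRACTION = 0.2`, and the same `***`/`...` placeholders as in the query further down; this is a sketch of the setup, not the exact notebook code:

```python
# Hedged sketch of the notebook-side run, mapping the BigQuery ML options
# to their usual XGBoost equivalents. Placeholders (`***`, the elided
# feature list) are kept from the original post.
import xgboost as xgb
from google.cloud import bigquery
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

client = bigquery.Client()
df = client.query("""
    SELECT reaction, year, month, day, hour  -- plus the remaining features
    FROM `***.***.***`
""").to_dataframe()

X = df.drop(columns=["reaction"])
y = df["reaction"]  # assumes a binary 0/1 label

# 80/20 random split, mirroring DATA_SPLIT_EVAL_FRACTION = 0.2.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = xgb.XGBClassifier(
    booster="gbtree",    # BOOSTER_TYPE = 'GBTREE'
    learning_rate=0.01,  # LEARN_RATE = 0.01
    n_estimators=300,    # MAX_ITERATIONS = 300
    max_depth=5,         # MAX_TREE_DEPTH = 5
    subsample=0.9,       # SUBSAMPLE = 0.9
    reg_lambda=0.1,      # L2_REG = 0.1
)
model.fit(X_train, y_train)

pred = model.predict(X_eval)
print({
    "precision": precision_score(y_eval, pred),
    "recall": recall_score(y_eval, pred),
    "f1": f1_score(y_eval, pred),
})
```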

Below is the query for BigQuery ML:
```sql
CREATE OR REPLACE MODEL `***.model_xg_boost.model_best_params`
OPTIONS(
  MODEL_TYPE = 'BOOSTED_TREE_CLASSIFIER',
  BOOSTER_TYPE = 'GBTREE',
  LEARN_RATE = 0.01,
  MAX_ITERATIONS = 300,
  MAX_TREE_DEPTH = 5,
  SUBSAMPLE = 0.9,
  EARLY_STOP = FALSE,
  L2_REG = 0.1,
  DATA_SPLIT_METHOD = 'RANDOM',
  DATA_SPLIT_EVAL_FRACTION = 0.2,
  INPUT_LABEL_COLS = ['reaction']
) AS
SELECT
  reaction,
  year,
  month,
  day,
  hour,
  ...
FROM
  `***.***.***`;
```
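
To put both sets of numbers on the same footing, the BigQuery ML side can be pulled with `ML.EVALUATE`; a short sketch reusing the same client setup as the earlier snippets:

```python
# Sketch: fetch the BigQuery ML evaluation metrics for the model trained
# by the query above, for a line-by-line comparison with the notebook
# output. `***` is a placeholder kept from this post.
from google.cloud import bigquery

client = bigquery.Client()

metrics = client.query("""
    SELECT precision, recall, f1_score, log_loss
    FROM ML.EVALUATE(MODEL `***.model_xg_boost.model_best_params`)
""").to_dataframe()
print(metrics)
```

Differences in the default classification threshold and in BigQuery ML's automatic feature preprocessing are worth ruling out before comparing these numbers directly with the notebook's.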
