Hello,
I am trying to run the following queries:
```sql
CREATE OR REPLACE MODEL `energy_generation.data_arima_model_ND`
OPTIONS(
  MODEL_TYPE = "ARIMA_PLUS",
  TIME_SERIES_TIMESTAMP_COL = "TIMESTAMP",
  TIME_SERIES_DATA_COL = "ND"
) AS
SELECT TIMESTAMP, ND FROM `energy_generation.historic_demand_2016`;

SELECT * FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold),
  (
    SELECT * FROM `energy_generation.historic_demand_2016`
  )
);
```
Every row comes back with NULL in is_anomaly, lower_bound, upper_bound and anomaly_probability.
For context, I will eventually point ML.DETECT_ANOMALIES at a separate table, which is why I am testing it with an explicit input query rather than letting it run on the model alone.
I have also tried adding TIME_SERIES_ID_COL, but I get the same result.
I am not sure what I am missing and would appreciate any help!
Thank you for your help! Point 5 was the most useful tip. I had to use `ML.FORECAST` rather than `ML.PREDICT`, and once I did, I realised that anomaly detection on a separate dataset only works with future data, i.e. rows with timestamps after the end of the training data. Passing the same table the model was trained on returns NULLs because none of its rows count as 'future data'.
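A minimal sketch of that approach, assuming a hypothetical table `energy_generation.historic_demand_2017` with the same TIMESTAMP and ND columns holding later readings. The first query uses ML.FORECAST to show the intervals the model produces past the end of its training data (the horizon of 48 points is an arbitrary illustrative value). The second restricts the ML.DETECT_ANOMALIES input to rows after the last training timestamp so that, going by the behaviour described above, every row is treated as future data.

```sql
-- Forecast past the end of the training data (horizon of 48 points is arbitrary).
SELECT *
FROM ML.FORECAST(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(48 AS horizon, 0.9 AS confidence_level)
);

-- Detect anomalies in a separate table, keeping only rows that fall after
-- the last timestamp the model was trained on.
-- `energy_generation.historic_demand_2017` is a hypothetical table name.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold),
  (
    SELECT TIMESTAMP, ND
    FROM `energy_generation.historic_demand_2017`
    WHERE TIMESTAMP > (
      SELECT MAX(TIMESTAMP)
      FROM `energy_generation.historic_demand_2016`
    )
  )
);
```

Filtering on the training table's maximum timestamp keeps the query correct even if the separate table overlaps the 2016 data.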
I find this really interesting! When you don't pass an input dataset, ML.DETECT_ANOMALIES runs on the data the model was trained on (not future data), but if you do pass one, it has to contain future data. Was this a design choice?
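For comparison, this is the form that omits the input query. Based on the observation above, it scores the rows the model was trained on (the 2016 demand data here) rather than requiring future data.

```sql
-- No input query: anomalies are detected on the model's own training data.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold)
);
```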