
ML.DETECT_ANOMALIES returns null values when using data from a table separate from the model's training data

Hello, 

I am trying to run the following queries:

CREATE OR REPLACE MODEL `energy_generation.data_arima_model_ND` 
OPTIONS(
  MODEL_TYPE="ARIMA_PLUS", 
  TIME_SERIES_TIMESTAMP_COL="TIMESTAMP",
  TIME_SERIES_DATA_COL="ND"
) AS 
SELECT TIMESTAMP, ND FROM `energy_generation.historic_demand_2016`;

SELECT * FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold),
  (
    SELECT * FROM `energy_generation.historic_demand_2016`
  )
);

Every output value I get back is null: `is_anomaly`, `lower_bound`, `upper_bound`, and `anomaly_probability` are all null.

For context, I will eventually point `ML.DETECT_ANOMALIES` at a separate table. That is why I am testing with the table specified explicitly rather than letting the function fall back to the data the model was trained on.

I have tried adding 'TIME_SERIES_ID_COL' but I am getting the same issue. 

I am not sure what I am missing and would appreciate any help!

1 ACCEPTED SOLUTION

Thank you for your help! Point 5 was the most useful tip. I had to use `ML.FORECAST` rather than `ML.PREDICT`, but once I did, I realised that anomaly detection on a separately specified dataset only works with future data. Specifying the same dataset the model was trained on fails because that data isn't 'future data'.

I find this really interesting! When you don't specify a separate dataset, the anomaly detection runs on the same dataset the model was trained on (not future data), but if you do specify a dataset, it has to contain future data. Was this a design choice?
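
For anyone hitting the same thing, here is a minimal sketch of the two call shapes described above. It reuses the model from the question; `energy_generation.historic_demand_2017` is a hypothetical table name standing in for data with timestamps later than the training data, and the assumption that "future data" means timestamps after the training period is my reading of the behaviour, not something confirmed in this thread.

```sql
-- Case 1: no input data is passed, so anomaly detection runs over
-- the data the model was trained on.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold)
);

-- Case 2: input data is passed explicitly. Based on the behaviour
-- described above, this data needs to be "future data" relative to
-- the training data, otherwise the output columns come back null.
-- NOTE: `historic_demand_2017` is a hypothetical table used only
-- for illustration.
SELECT *
FROM ML.DETECT_ANOMALIES(
  MODEL `energy_generation.data_arima_model_ND`,
  STRUCT(0.9 AS anomaly_prob_threshold),
  (
    -- Column names must match those used when training the model.
    SELECT TIMESTAMP, ND
    FROM `energy_generation.historic_demand_2017`
  )
);
```

In the original question, the third argument pointed at the training table itself, which matches the all-null result reported there.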


3 REPLIES