
AutoML forecasting: understanding the rolling forecast window during model evaluation

I trained a time series forecasting model with AutoML. During training, I checked the option to "Export test dataset to BigQuery." I have a question about how to understand the data that appears in the exported table.

My understanding is that a "predicted_on" timestamp is essentially the first date of the forecast horizon of a rolling forecast window. I see that for each "predicted_on" timestamp, there are 6 timestamps from the test data split of my training data. This suggests that the forecast horizon is 6 weeks long; i.e., for each "predicted_on" date (and starting on that date) it predicts 6 weeks of data.

My question is, where does the number 6 come from? (When I trained the model, I specified that the forecast horizon is 26 weeks, not 6...)
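One way to check the effective horizon length is to count the distinct target timestamps grouped by each `predicted_on` value in the exported table. Below is a minimal pandas sketch against toy data; the column names `predicted_on` and `timestamp` are assumptions, and the actual column names in the BigQuery export may differ:

```python
import pandas as pd

# Toy stand-in for the exported test table: 3 rolling-window starts,
# each with 6 weekly horizon steps (the behavior described above).
rows = []
for po in pd.date_range("2023-01-02", periods=3, freq="W-MON"):
    for step in range(6):
        rows.append({"predicted_on": po,
                     "timestamp": po + pd.Timedelta(weeks=step)})
df = pd.DataFrame(rows)

# Infer the horizon length: distinct target timestamps per predicted_on.
horizon = df.groupby("predicted_on")["timestamp"].nunique()
print(horizon.max())  # prints 6; expect this to equal the configured horizon
```

If this number disagrees with the forecast horizon you configured at training time (26 here), that is a sign something about the input data was not interpreted as intended.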

ACCEPTED SOLUTION

To answer my question:

I'm not sure where the number 6 came from, but I've since discovered the following.

The timestamp format I was using is not among the timestamp formats supported by Google according to this documentation. I changed the format of my timestamps. I also ensured that every number in my target column has a decimal (it was previously a mix of integers and decimal numbers).

After making these changes, I trained a new model and examined the data exported to BigQuery.

Now I see that there are 26 weeks of timestamps from the test data split associated with the first `predicted_on` timestamp. This would suggest that the forecast horizon of the rolling forecast window is 26 weeks long. This is what I would expect, given that I set the forecast horizon to 26 when I trained the model.
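For anyone hitting the same issue, the two data fixes described above can be applied before uploading the training data. This is a minimal pandas sketch under assumptions: the timestamp strings are day-first, the target column is named `sales`, and `%Y-%m-%d` is used as the output format (it is among the formats listed in Google's documentation, but verify against the current docs):

```python
import pandas as pd

# Example training rows: a non-standard day-first timestamp format and a
# target column that mixes integers and decimals (the situation above).
df = pd.DataFrame({
    "ts": ["02/01/2023", "09/01/2023", "16/01/2023"],
    "sales": [10, 12.5, 11],
})

# Rewrite timestamps into a supported format such as "%Y-%m-%d".
df["ts"] = pd.to_datetime(df["ts"], dayfirst=True).dt.strftime("%Y-%m-%d")

# Cast every target value to float so the column is uniformly decimal.
df["sales"] = df["sales"].astype(float)
```

After these changes, retraining produced the expected 26-week horizon in the exported test table.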


6 REPLIES

Hi, just wanted to ask if you are currently following any documentation for this task? This is for replication purposes.

Thanks!

Hello,
Yes, the documentation I've read that feels most relevant to this question is:

Best practices for tabular forecasting models

Data splits for forecasting

I've also read every article that is linked on the Forecasting overview page.

And I've followed this Build an AutoML Forecasting Model with Vertex AI lab.

Thank you for this information; I will attempt to replicate your scenario. I also assume this is related to this question: https://www.googlecloudcommunity.com/gc/AI-ML/AutoML-forecast-model-batch-predictions-quot-rows-with... Thanks!


Thanks, I really appreciate the help!

Yes, my other question you linked is related in the sense that both questions are regarding the same forecasting model. (The question you linked is more pressing, since it's about how we're struggling to get any actual predictions from the model - that's our top priority at the moment.)

@nceniza Just checking in to see if you've had a chance to look into this, in particular the issue described in my other question which you linked (i.e., the error when we attempt to get batch predictions: "There are rows with non-empty target values after this row.")

We are really stuck on this error. Although we've invested time and money in training the model, this error is preventing us from getting predictions. We are so close, yet we can't get our project over the finish line. If there is anything you can do to help, we'd really appreciate it. Thanks!
