
How to set multiple series identifier columns on tabular forecast?

Hello,

I tried BigQuery ML's ARIMA+ to predict sales data, but the results were not particularly good.

So I wanted to try adding weather as a feature to the dataset, which requires using Vertex AI tabular forecasting (AutoML).

The dataset looks like this.

date, store, product, total_amount_sold, temperature, is_rainy

When using ARIMA+, multiple series identifier columns can be specified with the following option.

TIME_SERIES_ID_COL = ['store', 'product']

How do I set multiple series identifier columns in AutoML? Should I consider merging the store and product columns into one column (e.g., tokyo_pixel6)?

1 REPLY

I found this section of the documentation, which might be helpful: 

One of your columns in your training data for a forecasting model must be specified as the time series identifier. Forecasting training data usually includes multiple time series, and the identifier tells Vertex AI which time series a given observation in the training data is part of. All of the rows in a given time series have the same value in the time series identifier column.

Some common time series identifiers might be the product ID, a store ID, or a region. When you have multiple time series in your training data, there should be a specific column that differentiates them.

You can train a forecasting model on a single time series (in other words, the time series identifier column contains the same value for all rows). However, Vertex AI is a better fit for training data that contains two or more time series. For best results, you should have at least 10 time series for every column used to train the model.
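
Given that the documentation describes the time series identifier as a single column, one option is to merge store and product into one column before import, as the question suggests. Below is a minimal pandas sketch of that idea. The sample rows, the `add_series_id` helper, the `series_id` column name, and the `_` separator are all made up for illustration and are not part of any Vertex AI API.

```python
import pandas as pd


def add_series_id(df, id_cols, sep="_", out_col="series_id"):
    """Return a copy of df with id_cols joined into one string column.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    out = df.copy()
    # Cast every identifier column to str, then join the values row by row.
    out[out_col] = out[id_cols].astype(str).apply(sep.join, axis=1)
    return out


# Made-up sample rows using the column layout from the question.
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"],
    "store": ["tokyo", "osaka", "tokyo", "osaka"],
    "product": ["pixel6", "pixel6", "pixel6", "pixel6"],
    "total_amount_sold": [10, 7, 12, 5],
    "temperature": [5.0, 6.5, 4.0, 7.0],
    "is_rainy": [False, True, False, False],
})

df = add_series_id(df, ["store", "product"])

# Number of distinct time series in the data, useful for checking it
# against the "at least 10 time series" guidance quoted above.
n_series = df["series_id"].nunique()
print(df[["date", "series_id", "total_amount_sold"]])
print("distinct series:", n_series)
```

The merged `series_id` column would then be the one selected as the time series identifier. If store or product values can themselves contain the separator, pick a different one so that two different (store, product) pairs cannot collapse into the same identifier.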