Scenario
In a related post, I illustrated a pattern that allows for some lightweight linear forecasting of measures using LookML. Naturally, the next question I received was is there is a pattern to consider seasonality? In this post, I will dive a little deeper into a pattern that allows for seasonal forecasting, leveraging SQL and your historical data.
The philosophy for this pattern is as follows: In order to predict sales for a day in the future, we will look at that day’s sales in the prior year. We’ll then multiply the current YTD growth rate to that number to get our forecasted amount. This takes into consideration current growth velocity with seasonality of sales in the prior year. Additionally, this means projections will get stronger each day that new data is collected.
Please note, this article contains the use of Looker’s derived tables.
Goal
Let’s create a sustainable pattern, leveraging SQL and LookML to create seasonal forecasting for any measure, with no ongoing maintenance needed.
Step 1: Create a date table
In order to report a measure on future dates, we must first create a table that creates these future dates (and continues to create). Depending on the dialect, this could be a bit tricky to accomplish. For this step, you can learn more about the intricacies of the pattern by referring to this post. Let’s start with Redshift:
WITH DATE_TABLE AS (
SELECT X.*,
EXTRACT(YEAR FROM DAY) AS YEAR,
EXTRACT(MONTH FROM DAY) AS MONTH
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
DATE(DATEADD('DAY',F.NUMBER,'2013-12-31')) AS DAY
FROM PUBLIC.ORDER_ITEMS
LEFT JOIN PUBLIC.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN PUBLIC.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
CROSS JOIN (SELECT
ROW_NUMBER() OVER () as number
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7,
(SELECT 0 as n UNION SELECT 1) p8,
(SELECT 0 as n UNION SELECT 1) p9,
(SELECT 0 as n UNION SELECT 1) p10,
(SELECT 0 as n UNION SELECT 1) p11,
(SELECT 0 as n UNION SELECT 1) p12) F
GROUP BY 1,2
ORDER BY 1,2) X)
A few notes:
Step 2: Find historical data at the daily level
According to the premise of this pattern stated above, we will need to first aggregate our sales data (or any measure you’re forecasting) at the daily level prior to applying our growth rate to project future sales. I’ll call this our DAILY_HISTORY table:
DAILY_HISTORY AS
(
SELECT D.*,
SUM(REVENUE) OVER(PARTITION BY BRAND,YEAR ORDER BY DAY ROWS UNBOUNDED PRECEDING) AS YTD_REVENUE
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
EXTRACT(YEAR FROM ORDER_ITEMS.CREATED_AT) AS YEAR,
DATE(ORDER_ITEMS.CREATED_AT) AS DAY,
SUM(SALE_PRICE) AS REVENUE
FROM PUBLIC.ORDER_ITEMS
LEFT JOIN PUBLIC.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN PUBLIC.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
GROUP BY 1,2,3) D)
A few notes:
Step 3: Bringing together our CTE’s and applying forecasting logic
To this point, we have a table that consists of each brand and each day in the future. We also have a table that consists of each brand and the total sales for each day in the past (along with our rolling YTD sales for each of those dates). It is now time to bring these two tables and apply our forecasting logic. To reiterate, the logic is as follows:
Forecasted Sales 1/12/2019 (Future Date) =
Total Sales 1/12/2018 * (1.0+(YTD Sales 1/12/2019 - YTD Sales 1/12/2018) / YTD Sales 1/12/2018)
Here’s the SQL:
SELECT ROW_NUMBER() OVER (ORDER BY DAY) AS PK,
BRAND,
YEAR,
DAY,
CASE WHEN DAY < CURRENT_DATE THEN REVENUE
ELSE LAST_REVENUE*(1+LAST_VALUE(PERCENT_CHANGE IGNORE NULLS) OVER (PARTITION BY BRAND ORDER BY DAY ROWS UNBOUNDED PRECEDING))
END AS REVENUE,
CASE WHEN DAY >= CURRENT_DATE THEN 'Yes' ELSE 'No' END AS PROJECTED
FROM (
SELECT Y.*,
1.0*(YTD_REVENUE - LAST_YTD_REVENUE)/NULLIF(LAST_YTD_REVENUE,0) AS PERCENT_CHANGE
FROM (
SELECT DT.BRAND AS BRAND,
DT.YEAR AS YEAR,
DT.DAY AS DAY,
DH.REVENUE AS REVENUE,
DH.YTD_REVENUE AS YTD_REVENUE,
LAG(DH.REVENUE,CASE WHEN MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2 THEN 366 ELSE 365 END) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY) AS LAST_REVENUE,
LAG(DH.YTD_REVENUE,CASE WHEN MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2 THEN 366 ELSE 365 END) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY) AS LAST_YTD_REVENUE
FROM DATE_TABLE DT
LEFT JOIN DAILY_HISTORY DH ON DATE(DT.DAY) = DATE(DH.DAY) AND DT.BRAND = DH.BRAND
ORDER BY 1,2,3) Y) YY
A few notes:
CASE WHEN DAY < CURRENT_DATE THEN REVENUE ELSE LAST_REVENUE*(1+LAST_VALUE(PERCENT_CHANGE IGNORE NULLS) OVER (PARTITION BY BRAND ORDER BY DAY ROWS UNBOUNDED PRECEDING)) END AS REVENUE
This essentially will fill in all future dates with our forecasted sales logic for that day. Therefore, you can always predict one year in the future of your most recent sales date. This will leave us with a table that includes brand, date and revenue, with historical dates containing actual revenue and future dates containing projected revenue.
Here’s the full query in multiple dialects:
Redshift:
WITH DATE_TABLE AS (
SELECT X.*,
EXTRACT(YEAR FROM DAY) AS YEAR,
EXTRACT(MONTH FROM DAY) AS MONTH
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
DATE(DATEADD('DAY',F.NUMBER,'2013-12-31')) AS DAY
FROM PUBLIC.ORDER_ITEMS
LEFT JOIN PUBLIC.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN PUBLIC.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
CROSS JOIN (SELECT
ROW_NUMBER() OVER () as number
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7,
(SELECT 0 as n UNION SELECT 1) p8,
(SELECT 0 as n UNION SELECT 1) p9,
(SELECT 0 as n UNION SELECT 1) p10,
(SELECT 0 as n UNION SELECT 1) p11,
(SELECT 0 as n UNION SELECT 1) p12) F
GROUP BY 1,2
ORDER BY 1,2) X),
DAILY_HISTORY AS
(
SELECT D.*,
SUM(REVENUE) OVER(PARTITION BY BRAND,YEAR ORDER BY DAY ROWS UNBOUNDED PRECEDING) AS YTD_REVENUE
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
EXTRACT(YEAR FROM ORDER_ITEMS.CREATED_AT) AS YEAR,
DATE(ORDER_ITEMS.CREATED_AT) AS DAY,
SUM(SALE_PRICE) AS REVENUE
FROM PUBLIC.ORDER_ITEMS
LEFT JOIN PUBLIC.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN PUBLIC.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
GROUP BY 1,2,3) D)
SELECT ROW_NUMBER() OVER (ORDER BY DAY) AS PK,
BRAND,
YEAR,
DAY,
CASE WHEN DAY < CURRENT_DATE THEN REVENUE
ELSE LAST_REVENUE*(1+LAST_VALUE(PERCENT_CHANGE IGNORE NULLS) OVER (PARTITION BY BRAND ORDER BY DAY ROWS UNBOUNDED PRECEDING))
END AS REVENUE,
CASE WHEN DAY >= CURRENT_DATE THEN 'Yes' ELSE 'No' END AS PROJECTED
FROM (
SELECT Y.*,
1.0*(YTD_REVENUE - LAST_YTD_REVENUE)/NULLIF(LAST_YTD_REVENUE,0) AS PERCENT_CHANGE
FROM (
SELECT DT.BRAND AS BRAND,
DT.YEAR AS YEAR,
DT.DAY AS DAY,
DH.REVENUE AS REVENUE,
DH.YTD_REVENUE AS YTD_REVENUE,
LAG(DH.REVENUE,CASE WHEN MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2 THEN 366 ELSE 365 END) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY) AS LAST_REVENUE,
LAG(DH.YTD_REVENUE,CASE WHEN MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2 THEN 366 ELSE 365 END) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY) AS LAST_YTD_REVENUE
FROM DATE_TABLE DT
LEFT JOIN DAILY_HISTORY DH ON DATE(DT.DAY) = DATE(DH.DAY) AND DT.BRAND = DH.BRAND
ORDER BY 1,2,3) Y) YY
Snowflake:
WITH DATE_TABLE AS (
SELECT X.*,
EXTRACT(YEAR FROM DAY) AS YEAR,
EXTRACT(MONTH FROM DAY) AS MONTH
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
TO_DATE(DATEADD('DAY',F.NUMBER,'2014-01-01')) AS DAY
FROM ECOMM.ORDER_ITEMS
LEFT JOIN ECOMM.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN ECOMM.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
CROSS JOIN (SELECT SEQ8() AS NUMBER FROM TABLE(GENERATOR(ROWCOUNT => 3650))) F
GROUP BY 1,2
ORDER BY 1,2)
X),
DAILY_HISTORY AS
(
SELECT D.*,
SUM(REVENUE) OVER(PARTITION BY BRAND,YEAR ORDER BY DAY ROWS UNBOUNDED PRECEDING) AS YTD_REVENUE
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
EXTRACT(YEAR FROM ORDER_ITEMS.CREATED_AT) AS YEAR,
TO_DATE(ORDER_ITEMS.CREATED_AT) AS DAY,
SUM(SALE_PRICE) AS REVENUE
FROM ECOMM.ORDER_ITEMS
LEFT JOIN ECOMM.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN ECOMM.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
GROUP BY 1,2,3) D)
SELECT ROW_NUMBER() OVER (ORDER BY DAY) AS PK,
BRAND,
YEAR,
DAY,
CASE WHEN DAY < CURRENT_DATE() THEN REVENUE
ELSE LAST_REVENUE*(1+LAST_VALUE(PERCENT_CHANGE IGNORE NULLS) OVER (PARTITION BY BRAND ORDER BY DAY ROWS UNBOUNDED PRECEDING))
END AS REVENUE,
CASE WHEN DAY >= CURRENT_DATE() THEN 'Yes' ELSE 'No' END AS PROJECTED
FROM (
SELECT Y.*,
1.0*(YTD_REVENUE - LAST_YTD_REVENUE)/NULLIF(LAST_YTD_REVENUE,0) AS PERCENT_CHANGE
FROM (
SELECT DT.BRAND AS BRAND,
DT.YEAR AS YEAR,
DT.DAY AS DAY,
DH.REVENUE AS REVENUE,
DH.YTD_REVENUE AS YTD_REVENUE,
CASE WHEN (MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2)
THEN LAG(DH.REVENUE,366) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
ELSE LAG(DH.REVENUE,365) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
END AS LAST_REVENUE,
CASE WHEN (MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2)
THEN LAG(DH.YTD_REVENUE,366) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
ELSE LAG(DH.YTD_REVENUE,365) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
END AS LAST_YTD_REVENUE
FROM DATE_TABLE DT
LEFT JOIN DAILY_HISTORY DH ON TO_DATE(DT.DAY) = TO_DATE(DH.DAY) AND DT.BRAND = DH.BRAND
ORDER BY 1,2,3) Y) YY
LookML example (Snowflake) using a derived table:
view: projected_revenue {
derived_table: {
sql: WITH DATE_TABLE AS (
SELECT X.*,
EXTRACT(YEAR FROM DAY) AS YEAR,
EXTRACT(MONTH FROM DAY) AS MONTH
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
TO_DATE(DATEADD('DAY',F.NUMBER,'2014-01-01')) AS DAY
FROM ECOMM.ORDER_ITEMS
LEFT JOIN ECOMM.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN ECOMM.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
CROSS JOIN (SELECT SEQ8() AS NUMBER FROM TABLE(GENERATOR(ROWCOUNT => 3650))) F
GROUP BY 1,2
ORDER BY 1,2)
X),
DAILY_HISTORY AS
(
SELECT D.*,
SUM(REVENUE) OVER(PARTITION BY BRAND,YEAR ORDER BY DAY ROWS UNBOUNDED PRECEDING) AS YTD_REVENUE
FROM (
SELECT PRODUCTS.BRAND AS BRAND,
EXTRACT(YEAR FROM ORDER_ITEMS.CREATED_AT) AS YEAR,
TO_DATE(ORDER_ITEMS.CREATED_AT) AS DAY,
SUM(SALE_PRICE) AS REVENUE
FROM ECOMM.ORDER_ITEMS
LEFT JOIN ECOMM.INVENTORY_ITEMS ON ORDER_ITEMS.INVENTORY_ITEM_ID = INVENTORY_ITEMS.ID
LEFT JOIN ECOMM.PRODUCTS ON INVENTORY_ITEMS.PRODUCT_ID = PRODUCTS.ID
GROUP BY 1,2,3) D)
SELECT ROW_NUMBER() OVER (ORDER BY DAY) AS PK,
BRAND,
YEAR,
DAY,
CASE WHEN DAY < CURRENT_DATE() THEN REVENUE
ELSE LAST_REVENUE*(1+LAST_VALUE(PERCENT_CHANGE IGNORE NULLS) OVER (PARTITION BY BRAND ORDER BY DAY ROWS UNBOUNDED PRECEDING))
END AS REVENUE,
CASE WHEN DAY >= CURRENT_DATE() THEN 'Yes' ELSE 'No' END AS PROJECTED
FROM (
SELECT Y.*,
1.0*(YTD_REVENUE - LAST_YTD_REVENUE)/NULLIF(LAST_YTD_REVENUE,0) AS PERCENT_CHANGE
FROM (
SELECT DT.BRAND AS BRAND,
DT.YEAR AS YEAR,
DT.DAY AS DAY,
DH.REVENUE AS REVENUE,
DH.YTD_REVENUE AS YTD_REVENUE,
CASE WHEN (MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2)
THEN LAG(DH.REVENUE,366) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
ELSE LAG(DH.REVENUE,365) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
END AS LAST_REVENUE,
CASE WHEN (MOD(DT.YEAR,4) = 0 AND DT.MONTH > 2)
THEN LAG(DH.YTD_REVENUE,366) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
ELSE LAG(DH.YTD_REVENUE,365) OVER(PARTITION BY DT.BRAND ORDER BY DT.DAY)
END AS LAST_YTD_REVENUE
FROM DATE_TABLE DT
LEFT JOIN DAILY_HISTORY DH ON TO_DATE(DT.DAY) = TO_DATE(DH.DAY) AND DT.BRAND = DH.BRAND
ORDER BY 1,2,3) Y) YY
;;
sql_trigger_value: SELECT current_date() ;;
}
dimension: pk {
primary_key: yes
hidden: yes
type: string
sql: ${TABLE}.pk ;;
}
dimension: year {
hidden: yes
type: number
sql: ${TABLE}.year ;;
}
dimension: brand {
hidden: yes
type: string
sql: ${TABLE}.brand ;;
}
dimension_group: day {
label: "Created"
timeframes: [raw,date,day_of_month,week_of_year,month,quarter,year,day_of_year,month_name,month_num]
type: time
sql: TO_TIMESTAMP(${TABLE}.day) ;;
}
dimension: revenue {
type: number
sql: ${TABLE}.revenue ;;
value_format_name: usd
}
measure: total_revenue {
description: "This measure can only be grouped by brand and date formats"
type: sum
sql: ${revenue} ;;
value_format_name: usd_0
drill_fields: [day_month,actual_revenue,projected_revenue,total_revenue]
}
measure: projected_revenue {
description: "This measure can only be grouped by brand and date formats"
type: sum
sql: ${revenue} ;;
filters: {
field: projected
value: "Yes"
}
value_format_name: usd_0
drill_fields: [day_month,total_revenue]
}
measure: actual_revenue {
description: "This measure can only be grouped by brand and date formats"
type: sum
sql: ${revenue} ;;
filters: {
field: projected
value: "No"
}
value_format_name: usd_0
drill_fields: [day_month,total_revenue]
}
dimension: projected {
type: yesno
sql: ${TABLE}.projected = 'Yes' ;;
}
}