Scenario
Predictive analytics and machine learning have been popular buzzwords as of late and can provide tremendous value for organizations. But what about folks who don’t have a data science team or the resources to build out machine learning models? There is still tremendous value in being able to provide lightweight projections or linear forecasts to see how your business may be pacing through the end of the month, quarter, year, etc. Using simple modeling techniques in Looker, we can provide some baseline predictions depending on how we’re performing thus far.
Please note, this article contains the use of Looker Parameters and Liquid. Here’s some additional resources:
The Dimension, Filter and Parameter Types page indicates which LookML type parameter values can be used with parameter fields.
Goal
Let’s create a pattern that allows us to take a look at how we’re forecasting for any measure, for any period of time. Below is an example dashboard to illustrate.
Step 1: Create Dynamic Forecasting Timeframes
The core philosophy is to find how far along that period we are, so we can linearly forecast sales through the end of the period.
First, we need to create a parameter to give us our forecast period options. Then we’ll need to find how far along we are into the selected forecast period. For example, if today is October 18th, and we’ve selected month, we’re 18 days in. For this example, we’ll create week, month, quarter, and year. See below.
parameter: prediction_window {
allowed_value: {
label: "Week"
value: "week"
}
allowed_value: {
label: "Month"
value: "month"
}
allowed_value: {
label: "Quarter"
value: "quarter"
}
allowed_value: {
label: "Year"
value: "year"
}
}
Now that we have our parameter which allows our users to select which time period their interested in forecasting, we’ll need to create a dimension that dynamically finds the first day of that selected period. Here’s our first use of the Liquid Variable Reference:
dimension: first_day {
type: date
sql: date_trunc({% parameter prediction_window %},getdate()) ;;
convert_tz: no
}
Next, we’ll have to find the last day of the period, so we can calculate what percentage of the period is complete, and how much we have left to go. We’ll do this with a dimension and a Liquid IF statement that leverages our prediction window parameter. If you have a more simplistic application of this in mind (for example, your prediction period is always the same), see below for examples without the use of Liquid.
dimension: last_day {
type: date
sql: {% if prediction_window._parameter_value == "'quarter'" %}
last_day(dateadd(month,2,${first_day}))
{% elsif prediction_window._parameter_value == "'month'" %}
last_day(${first_day})
{% elsif prediction_window._parameter_value == "'year'" %}
last_day(dateadd(month,11,${first_day}))
{% elsif prediction_window._parameter_value == "'week'" %}
dateadd(day,6,${first_day})
{% endif %} ;;
convert_tz: no
}
dimension: hours_since_start {
hidden: yes
type: number
sql: datediff(hour,${first_day},getdate()) ;;
}
This will calculate the number of hours into the selected time period. This is important for finding what percentage of the period has been completed, so we can forecast the remainder. It’s important to do this using hours rather than days to prevent 0% completion on the first day of a period (i.e. October 1st).
dimension: hours_until_last_day {
hidden: yes
type: number
sql: datediff(hour,getdate(),${last_day})+24 ;;
}
We add +24 here to account for the current day. Date Diff is going to give us the difference in hours between two dates. For example, SELECT DATEDIFF(hour,'2018-10-01 00:00:00','2018-10-18 00:00:00')
will give us 408 hours, or 17 days. We need to account for “today” or the 18th in this example.
Finally, we’ll bring it together to find the total hours in the period so we can complete the percentage of period calculation as seen below.
dimension: hours_in_period {
hidden: yes
type: number
sql: ${hours_since_start}+${hours_until_last_day} ;;
}
dimension: percentage_of_period_completed {
type: number
sql: 1.0*${hours_since_start}/NULLIF(${hours_in_period},0) ;;
value_format_name: percent_2
}
Step 2: Scale This Pattern Across Multiple Measures
In order to reuse the pattern we’re creating for multiple measures to prevent repeat code writing, we can introduce a metric selector. Below is the syntax to do so. Here, we’re leveraging these example dimensions, but these could be any fields you care about.
Sample Dimensions:
dimension: sale_price {
type: number
sql: ${TABLE}.sale_price ;;
value_format_name: usd
}
dimension: order_id {
type: number
sql: ${TABLE}.order_id ;;
}
dimension: user_id {
type: number
sql: ${TABLE}.user_id ;;
}
Metric Selector:
parameter: metric_selector {
allowed_value: {
label: "Sales Price"
value: "total_sale_price"
}
allowed_value: {
label: "Customers"
value: "customers"
}
allowed_value: {
label: "Orders"
value: "orders"
}
}
dimension: selected_metric {
hidden: yes
type:number
sql:
{% if metric_selector._parameter_value == "'total_sale_price'" %} ${sale_price}
{% elsif metric_selector._parameter_value == "'customers'" %}${user_id}
{% elsif metric_selector._parameter_value == "'orders'" %} ${order_id}
{% endif %} ;;
value_format_name: decimal_0
}
Step 3: Apply Metric Selector to This Periods vs. Projected Period Measures
Now we have our forecast period defined as well as the metric we want. The next step is to calculate the value for the current and projected period. This will result in the following:
Up until this point, we have modeled both filter parameters and all columns except the last two. First, we’ll calculate the value of the measure for period-to-date. We’ll then divide that by the % of the period we’ve completed (i.e. October 15th is 48% through the period, or 15/31 days, 360/744 hours). This will give us the linear forecast for the entire time period. Here’s where it gets a bit tricky, see below for the syntax!
measure: this_period_measure {
type: number
sql:
{% if metric_selector._parameter_value == "'total_sale_price'" %}
SUM
{% elsif metric_selector._parameter_value == "'customers'" %}
COUNT
{% elsif metric_selector._parameter_value == "'orders'" %}
COUNT
{% endif %} (CASE WHEN ${created_raw} >= ${first_day} THEN ${selected_metric} END)
;;
value_format_name: decimal_0
}
You may ask yourself why we’re doing the aggregation of our dimensions (i.e. sale_price) in the “this_period_measure” instead of pointing to a measure that’s already performing the aggregation. We need to refer to dimensions in the first part of our CASE statement: CASE WHEN ${created_raw} >= ${first_day}
. Therefore, we need to group by those dimensions in order to perform the aggregation. For example, CASE WHEN ${created_raw} >= ${first_day} THEN SUM(${sale_price})
will return an error. The above is a way to dynamically wrap the entire CASE statement in whichever type of aggregation is needed for our analysis.
measure: projected_measure {
type: number
sql: 1.0*${this_period_measure}/NULLIF(${percentage_of_period_completed},0) ;;
value_format_name: decimal_0
}
For those of you working with COUNT DISTINCTS, below is the syntax for that.
measure: this_period_measure {
type: number
sql:
{% if metric_selector._parameter_value == "'total_sale_price'" %}
SUM
{% elsif metric_selector._parameter_value == "'customers'" %}
COUNT(DISTINCT
{% elsif metric_selector._parameter_value == "'orders'" %}
COUNT(DISTINCT
{% endif %} (CASE WHEN ${created_raw} >= ${first_day} THEN ${selected_metric} END)
{% if metric_selector._parameter_value == "'customers'" %}
)
{% elsif metric_selector._parameter_value == "'orders'" %}
)
{% endif %}
;;
value_format_name: decimal_0
}
And for the final product:
Solution without using Parameters or Liquid:
### TIME WINDOW DIMENSIONS ###
dimension: first_day {
type: date
sql: date_trunc('quarter',getdate()) ;;
convert_tz: no
}
dimension: last_day {
type: date
sql: last_day(dateadd(month,2,${first_day})) ;;
convert_tz: no
}
dimension: hours_since_start {
type: number
sql: datediff(hour,${first_day},getdate()) ;;
}
dimension: hours_until_last_day {
type: number
sql: datediff(hour,getdate(),${last_day})+24 ;;
}
dimension: hours_in_period {
type: number
sql: ${hours_since_start}+${hours_until_last_day} ;;
}
dimension: percentage_of_period_completed {
type: number
sql: 1.0*${hours_since_start}/NULLIF(${hours_in_period},0) ;;
value_format_name: percent_2
}
### THIS PERIOD AND PROJECTED PERIOD MEASURES ###
measure: this_period_measure {
type: sum
sql:CASE WHEN ${created_raw} >= ${first_day} THEN ${sales_price} END;;
value_format_name: decimal_0
}
measure: projected_measure {
type: number
sql: 1.0*${this_period_measure}/NULLIF(${percentage_of_period_completed},0);;
value_format_name: decimal_0
}