I recently created a Look that seems that it might not be common knowledge but is a possibly common scenario. The ability to set a relative date range that starts with a date within the database and ends some number of days, weeks, or months after. And use that range to do a year-over-year comparison.
We have data that has unique dates per grouping. When creating a Look, the user should be able to set a timeframe window using this date as a “starting date” and the “end date” relative to a certain number of days/weeks/months. Any events that happen within that range would be counted. To take it one step further, we also wanted to do a year-over-year analysis of that range to compare before and after we launched.
To make it easier, let’s pretend we are a service that makes riding buses easier. Each the cities already have existing state-run bus services and meaningful ridership data that they’d be willing to share. We are working with the state to launching to new cities. After some time, we’ve rolled out across multiple cities at different times and we want to report back to the state. Let’s limit it to just comparing the total number of riders before and after we launched across each city.
We’d likely create a look that filters for a single state and columns that are as follows: State, City, Launch Date, Total Riders. We wouldn’t be able to filter for a hard set date because it’s all relative to a single row’s date.
We can use parameters to allow the Look user to set the relative range and apply that directly to the SQL with liquid syntax. It should be noted that in this example, everything SQL is in RedShift.
Setting up the Parameters
parameter: timeframe_rage_number {
label: "Comparison: Unit Number"
description: "Number of date units (e.g. 90) that will be added to the start date and create an end date"
type: number
default_value: "90"
}
parameter: timeframe_rage_units {
label: "Comparison: Unit Type"
description: "Type of date units (e.g. days) that will be added to the start date and create an end date"
type: string
allowed_value: { label:"Day(s)" value: "day" }
allowed_value: { label:"Week(s)" value: "week" }
allowed_value: { label:"Month(s)" value: "month" }
allowed_value: { label:"Quarter(s)" value: "quarter" }
allowed_value: { label:"Year(s)" value: "year" }
default_value: "day"
}
By default, the timeframe_rage_number
parameter is set to 90
and the timeframe_rage_units
parameter is set to day
. This will be inputted into the SQL to add 90 days to a date range. Within the Look, the Filter section will show these options and allow them to be changed on the fly.
2 - Setup Range & Check if an event is within it
dimension: has_before_events {
hidden: yes
type: yesno
sql: ${event_date} between dateadd(year, -1, ${launch_date}) and dateadd(year, -1, dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date}));;
}
dimension: has_after_events {
hidden: yes
type: yesno
sql: ${event_date} between ${launch_date} and dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date});;
}
After we’ve set the date ranges, we need to find the rows that have events date that fit within the range for both the before launch range and the after. Also, so it’s noted, I hide these parameters because they’re not used outside of other measures.
Also let’s breakdown the dateadd()
function, which is Redshift (SQL) specific.
dateadd(
{% parameter timeframe_rage_units %}, # default: 90
{% parameter timeframe_rage_number %}, # default: 'day'
${launch_date}
)
Assuming default values, in SQL this will output as dateadd('day', 90, table.launch_date)
. While it says that a parameter requires a string, Looker is smart enough to convert to a number when appropriate (at least in this scenario).
Additionally, the dateadd(year, -1, ...)
will remove a year from the date. So if the date was 2020-06-2019
, this function will change it to 2019-06-2019
3 - Outputting a Measure
measure: before_total_rides {
label: "Before – Total Riders"
description: "Total number of riders who road within the set timeframe for before launch"
type: sum
sql: ${daily_rides} ;;
filters: [has_before_events: "yes"]
}
measure: after_total_rides {
label: "After – Total Riders"
description: "Total number of riders who road within the set timeframe for after launch"
type: sum
sql: ${daily_rides} ;;
filters: [has_after_events: "yes"]
}
Lastly, we want to take the numbers that fit within the range and roll it up. If you’re counting rows, then you’d instead set type:
to count
or count_dist
. In this example, we assume that each row has a number of daily riders already calculated and we’re simply adding them up where data exists from our previous yesno
filtering.
+---------+--------------+-------------+-----------------------+----------------------+
| State | City | Launch Date | Before – Total Riders | After – Total Riders |
+---------+--------------+-------------+-----------------------+----------------------+
| Florida | Miami | 2019-02-11 | 153 | 234 |
| Florida | Orlando | 2019-02-18 | 101 | 150 |
| Florida | Jacksonville | 2019-04-01 | 89 | 100 |
+---------+--------------+-------------+-----------------------+----------------------+
I figured I’d be easier to make the numbers up in an ASCII table (sorry). For completeness, here’s what an output might look like with the above LookML.
I initially thought the best way would be to create a filter, but if a filter is not wrapped in a condition that is true, it seems it doesn’t return a value. I then thought a parameter with parameter_name._parameter_value
setup, but it seems this only works within liquid syntax (like if statements) and doesn’t output in the SQL. And of course, after some testing, {% parameter parameter_name %}
held the key.
I was also successful at setting up a simple user can set number for equations. So if we know that bus rides are typically 80% full and we had an average maximum capacity for the buses. We could to set that up as a parameter that defaults to 80
and set up a measure that has SQL something like this
measure: avg_occupacity {
type: number
sql: ({avg_bus_capacitiy} * {% parameter set_default_occupacity_pct %})/100.00;;
value_format_name: percent_0
}
Where set_default_occupacity_pct
is a parameter set up the same as timeframe_rage_number
. It’s important to note that you have to multiply by 100.00
not just 100
or you won’t see floating-point numbers.