How to create a relative timeframe from column dat... - Page 2

mprogano · 06-17-2020 03:49 PM

I recently created a Look that seems that it might not be common knowledge but is a possibly common scenario. The ability to set a relative date range that starts with a date within the database and ends some number of days, weeks, or months after. And use that range to do a year-over-year comparison.

Background

We have data that has unique dates per grouping. When creating a Look, the user should be able to set a timeframe window using this date as a “starting date” and the “end date” relative to a certain number of days/weeks/months. Any events that happen within that range would be counted. To take it one step further, we also wanted to do a year-over-year analysis of that range to compare before and after we launched.

Scenario

To make it easier, let’s pretend we are a service that makes riding buses easier. Each the cities already have existing state-run bus services and meaningful ridership data that they’d be willing to share. We are working with the state to launching to new cities. After some time, we’ve rolled out across multiple cities at different times and we want to report back to the state. Let’s limit it to just comparing the total number of riders before and after we launched across each city.

We’d likely create a look that filters for a single state and columns that are as follows: State, City, Launch Date, Total Riders. We wouldn’t be able to filter for a hard set date because it’s all relative to a single row’s date.

Solution

We can use parameters to allow the Look user to set the relative range and apply that directly to the SQL with liquid syntax. It should be noted that in this example, everything SQL is in RedShift.

Setting up the Parameters

parameter: timeframe_rage_number {
    label: "Comparison: Unit Number"
    description: "Number of date units (e.g. 90) that will be added to the start date and create an end date"
    type: number
    default_value: "90"
}
parameter: timeframe_rage_units {
    label: "Comparison: Unit Type"
    description: "Type of date units (e.g. days) that will be added to the start date and create an end date"
    type: string
    allowed_value: { label:"Day(s)" value: "day" }
    allowed_value: { label:"Week(s)" value: "week" }
    allowed_value: { label:"Month(s)" value: "month" }
    allowed_value: { label:"Quarter(s)" value: "quarter" }
    allowed_value: { label:"Year(s)" value: "year" }
    default_value: "day"
}

By default, the timeframe_rage_number parameter is set to 90 and the timeframe_rage_units parameter is set to day. This will be inputted into the SQL to add 90 days to a date range. Within the Look, the Filter section will show these options and allow them to be changed on the fly.

2 - Setup Range & Check if an event is within it

dimension: has_before_events {
    hidden: yes
    type: yesno
    sql: ${event_date} between dateadd(year, -1, ${launch_date}) and dateadd(year, -1, dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date}));;
}
dimension: has_after_events {
    hidden: yes
    type: yesno
    sql: ${event_date} between ${launch_date} and dateadd({% parameter timeframe_rage_units %}, {% parameter timeframe_rage_number %}, ${launch_date});;
}

After we’ve set the date ranges, we need to find the rows that have events date that fit within the range for both the before launch range and the after. Also, so it’s noted, I hide these parameters because they’re not used outside of other measures.

Also let’s breakdown the dateadd() function, which is Redshift (SQL) specific.

dateadd(
    {% parameter timeframe_rage_units %},     # default: 90
    {% parameter timeframe_rage_number %},    # default: 'day'
    ${launch_date}
)

Assuming default values, in SQL this will output as dateadd('day', 90, table.launch_date). While it says that a parameter requires a string, Looker is smart enough to convert to a number when appropriate (at least in this scenario).

Additionally, the dateadd(year, -1, ...) will remove a year from the date. So if the date was 2020-06-2019, this function will change it to 2019-06-2019

3 - Outputting a Measure

measure: before_total_rides {
    label: "Before – Total Riders"
    description: "Total number of riders who road within the set timeframe for before launch"
    type: sum
    sql: ${daily_rides} ;;
    filters: [has_before_events: "yes"]
}
measure: after_total_rides {
    label: "After – Total Riders"
    description: "Total number of riders who road within the set timeframe for after launch"
    type: sum
    sql: ${daily_rides} ;;
    filters: [has_after_events: "yes"]
}

Lastly, we want to take the numbers that fit within the range and roll it up. If you’re counting rows, then you’d instead set type: to count or count_dist. In this example, we assume that each row has a number of daily riders already calculated and we’re simply adding them up where data exists from our previous yesno filtering.

Output

+---------+--------------+-------------+-----------------------+----------------------+
|  State  |     City     | Launch Date | Before – Total Riders | After – Total Riders |
+---------+--------------+-------------+-----------------------+----------------------+
| Florida | Miami        | 2019-02-11  |                   153 |                  234 |
| Florida | Orlando      | 2019-02-18  |                   101 |                  150 |
| Florida | Jacksonville | 2019-04-01  |                    89 |                  100 |
+---------+--------------+-------------+-----------------------+----------------------+

I figured I’d be easier to make the numbers up in an ASCII table (sorry). For completeness, here’s what an output might look like with the above LookML.

Conclusion

I initially thought the best way would be to create a filter, but if a filter is not wrapped in a condition that is true, it seems it doesn’t return a value. I then thought a parameter with parameter_name._parameter_value setup, but it seems this only works within liquid syntax (like if statements) and doesn’t output in the SQL. And of course, after some testing, {% parameter parameter_name %} held the key.

I was also successful at setting up a simple user can set number for equations. So if we know that bus rides are typically 80% full and we had an average maximum capacity for the buses. We could to set that up as a parameter that defaults to 80 and set up a measure that has SQL something like this

measure: avg_occupacity {
    type: number
    sql: ({avg_bus_capacitiy} * {% parameter set_default_occupacity_pct %})/100.00;;
    value_format_name: percent_0
}

Where set_default_occupacity_pct is a parameter set up the same as timeframe_rage_number. It’s important to note that you have to multiply by 100.00 not just 100 or you won’t see floating-point numbers.

How to create a relative timeframe from column data and do a YoY comparison with it

Background

Scenario

Solution

Output

Conclusion