Rolling Average Daily Active Users - Redshift with a Cross Join

Note: there is a newer version of this analytical pattern available: [Analytic Block] Daily, Weekly, Monthly Active Users. Check it out for a more detailed walkthrough and additional features!

We derive two tables: a simple date table with a single column of dates, and a daily-use table with two columns, activity_date and user_id.

We cross join these two tables to create a table with three columns: user_id, activity_date, and window_date. The destination table contains a row for every day in the 30 days following each transaction.
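As a quick illustration of the fan-out (user 42 and the dates here are made up), a single order placed by user 42 on 2014-07-01 becomes 30 rows in the destination table:

    user_id | activity_date | window_date
    --------+---------------+------------
         42 |    2014-07-01 |  2014-07-01
         42 |    2014-07-01 |  2014-07-02
        ... |           ... |         ...
         42 |    2014-07-01 |  2014-07-30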

Some Notes:

  1. Redshift's generate_series is broken (it runs only on the leader node), so we hack one together using an arbitrary table and ROW_NUMBER().
  2. Create a simple daily_use table of date/user_id for each day a user did something
  3. Cross join the daily_use table with the prior 30 days to get a table that looks like [user_id, activity_date, window_date]
  4. This table can get large, so we provide filters to limit the range of values we have to query ({% condition %} expands to a date range filter against the dates; see the expansion sketch after this list).
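For reference, here is roughly how one of those templated filter blocks expands; the dates are only an example, and the exact SQL Looker emits may differ slightly:

    -- with a filter of "window_date in July 2014" applied in the Explore:
    AND {% condition window_date %} wd.date {% endcondition %}
    -- expands to something like:
    AND (wd.date >= '2014-07-01' AND wd.date < '2014-08-01')
    -- with no filter applied it becomes a no-op:
    AND (1=1)

So the derived table only has to scan all of history when no date filter is supplied.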

Uses the common Ecommerce Views

- include: ecommerce.view.lookml

- view: dates
  derived_table:
    sql: |
      -- ## 1) Create a Date table with a row for each date.
      SELECT '2001-01-01'::DATE + d AS date
      FROM 
        -- generate series should work, but it doesn't in redshift
        -- generate_series(0,20000) d
        (SELECT ROW_NUMBER() OVER(ORDER BY id) - 1 AS d FROM orders ORDER BY id LIMIT 20000) AS d
    sortkeys: [date]
    persist_for: 10 minutes

- explore: use_rolling_30_day_window
  joins:
  - join: users
    foreign_key: user_id
  
- view: use_rolling_30_day_window
  derived_table:
    sql: | 
      WITH daily_use AS (
        -- ## 2) Create a table of days and activity by user_id
        SELECT 
          user_id
          , DATE_TRUNC('day', created_at) as activity_date
        FROM orders
      )
      --  ## 3) Cross join activity and dates to get a row for each
      --   date in the window.
      SELECT
          daily_use.user_id
          , wd.date as window_date
          , daily_use.activity_date
      FROM daily_use, ${dates.SQL_TABLE_NAME} AS wd 
      WHERE
          wd.date >= daily_use.activity_date
          AND wd.date < daily_use.activity_date + interval '30 day'
          -- ## 4) Filter the range of dates we use so we don't have to scan all of history
          AND {% condition activity_date %} daily_use.activity_date {% endcondition %}
          AND {% condition window_date %} wd.date {% endcondition %}
  fields:
    - dimension_group: activity_date
      type: date
      sql: ${TABLE}.activity_date
 
    - dimension_group: window_date
      type: date
      sql: ${TABLE}.window_date
      
    - dimension: user_id
      
    - dimension: active_day
      type: yesno
      sql: ${TABLE}.activity_date = ${TABLE}.window_date
      
    - measure: user_count_active_30_days
      type: count_distinct
      sql: ${user_id}
      detail: [users.id, users.name]
      
    - measure: user_count_active_this_day
      type: count_distinct
      sql: ${user_id}
      detail: [users.id, users.name]
      filters:
        active_day: yes

Detailed Look at 2014-07-01

We count a user as active on this date if they have bought anything in the last 30 days; the raw output comes straight from the rolling window table.
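Here is roughly the query that pulls those raw rows; use_rolling_30_day_window below stands in for the persisted derived table (referenced as ${use_rolling_30_day_window.SQL_TABLE_NAME} inside LookML):

    -- Users counted as active on 2014-07-01, with the order date that qualifies them
    SELECT
        user_id
        , activity_date
        , window_date
        , (activity_date = window_date) AS active_day
    FROM use_rolling_30_day_window
    WHERE window_date = '2014-07-01'
    ORDER BY user_id, activity_date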

Rolling 30 Day Average Count of Users Placing Orders
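That series is just the user_count_active_30_days measure grouped by window_date; in plain SQL against the same derived table it looks roughly like this (the date range is an example):

    -- Rolling 30-day distinct-user count, one row per day
    SELECT
        window_date
        , COUNT(DISTINCT user_id) AS user_count_active_30_days
        , COUNT(DISTINCT CASE WHEN activity_date = window_date
                              THEN user_id END) AS user_count_active_this_day
    FROM use_rolling_30_day_window
    WHERE window_date BETWEEN '2014-06-01' AND '2014-07-31'
    GROUP BY window_date
    ORDER BY window_date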
