[Analytic Block] Retention Analysis

 

Looker has created this block to make analyzing data easier and more efficient. This Data Block is made available on an “as is” basis, meaning there will not be updates moving forward.

About This Block

One of the most common requests we receive at Looker is for a better way to perform cohort analysis. Charts like these have become commonplace in modern data applications:

Retention Analysis Image 1

From gaming apps to retail stores, countless businesses want to understand how well they are retaining various cohorts of customers and how to effectively leverage that understanding. This means that countless businesses have also felt the pain of having to write some very complex queries.

Retention analysis works with several different data structures. The pattern below utilizes data that contains a users table and a transactions table.

Try It Yourself

It’s difficult to analyze cohort-based retention when we use traditional SQL, because we want to simultaneously see the number of users in a cohort (time-invariant) and the number of users in that cohort with activity in a given month (time-variant). So, we want to group by two different months: signup_month and activity_month. But, when we group by activity_month, we want to make sure we don’t exclude users without any activity that month. Nasty stuff.

Traditionally, the solution would be to calculate each number separately with multiple queries and then stitch the results together in Excel. But that approach isn’t flexible: It doesn’t allow us to slice and dice by all the other user attributes in the database (among other applications), and we certainly don’t get to drill in to the individual users. Seems like a waste of all that data!

Lookering for a Better Way

The trick is using derived tables to define one table with a large join. Remember, a derived table is not designed to provide an answer to an actual business question; instead, it transforms the underlying data so that, when users start exploring, they see a more meaningful result.

For example, we want our users to be able to explore in Looker to answer a question like this one: “For users acquired from paid search versus organic traffic, what percentage of users are still active after three months, and have those percentages been getting better or worse over time?”

To achieve this, we need the underlying data to contain a row for each user_month combination that demonstrates each user’s behavior in each month — including months with no user activity at alland has the ability to join in all relevant user attributes. To visualize this, imagine a gaming company has users in an app.

Table 1: Users

 
user_id signup_date acqsource
1 1/20/15 paid_search
2 2/15/2015 paid_search
3 3/2/2015 organic

Table 2: App Logins

event_id user_id event_date
1 1 1/21/15
2 1 3/4/15
3 1 4/1/15
4 2 5/1/15
5 2 2/15/15
6 2 2/16/15
7 2 2/17/15
8 3 3/5/15
9 3 4/1/15
10 3 5/1/15

Now, we can use the following LookML pseudo-code to transform the data into a derived table, which gives us one row for each user-month combination and tells us about each user’s activity in that month, including months with no activity.

 

Starting in Looker 7.4, the filters parameter syntax has changed. See the filters parameter documentation page to view the new syntax.

view: monthly_activity {

derived_table: {

sql_trigger_value: select current_date ;;

sortkeys: ["signup_month"]

distribution: "user_id"

sql: SELECT



users.id as user_id

, date_trunc ('month', users.created_at) as signup_month

, month_list.purchase_month as purchase_month

, COALESCE(data.monthly_purchases, 0) as monthly_purchases

, COALESCE(data.total_purchase_amount, 0) as monthly_spend

, row_number() over() AS key

FROM

users



LEFT JOIN



(

SELECT

DISTINCT(date_trunc('month', order_items.created_at)) as purchase_month

FROM order_items

) as month_list

ON month_list.purchase_month >= date_trunc ('month', users.created_at) -- your dialect will vary



LEFT JOIN



(

SELECT

o.user_id

, date_trunc('month', o.created_at) as purchase_month

, COUNT(distinct o.id) AS monthly_purchases

, sum(o.sale_price) AS total_purchase_amount



FROM order_items o

GROUP BY 1,2

) as data

ON data.purchase_month = month_list.purchase_month

AND data.user_id = users.id

;;

}



dimension: user_id {

type: number

sql: ${TABLE}.user_id ;;

}



dimension_group: signup {

type: time

timeframes: [month]

sql: ${TABLE}.signup_month ;;

}



dimension_group: purchase {

type: time

timeframes: [month]

sql: ${TABLE}.purchase_month ;;

}



dimension: months_since_signup {

type: number

sql: datediff('month', ${TABLE}.signup_month, ${TABLE}.purchase_month) ;;

}



dimension: monthly_purchases {

type: number

sql: ${TABLE}.monthly_purchases ;;

}



dimension: monthly_spend {

type: number

sql: ${TABLE}.monthly_spend ;;

}



measure: total_users {

type: count_distinct

sql: ${user_id} ;;

drill_fields: [users.id, users.age, users.name, user_order_facts.lifetime_orders]

}



measure: total_active_users {

type: count_distinct

sql: ${user_id} ;;

drill_fields: [users.id, users.age, users.name, user_order_facts.lifetime_orders]



filters: {

field: monthly_purchases

value: ">0"

}

}



measure: percent_of_cohort_active {

type: number

value_format_name: percent_1

sql: 1.0 * ${total_active_users} / nullif(${total_users},0) ;;

drill_fields: [user_id, monthly_purchases, total_amount_spent]

}



measure: total_amount_spent {

type: sum

value_format_name: usd

sql: ${monthly_spend} ;;

drill_fields: [detail*]

}



measure: spend_per_user {

type: number

value_format_name: usd

sql: ${total_amount_spent} / nullif(${total_users},0) ;;

drill_fields: [user_id, monthly_purchases, total_amount_spent]

}



measure: spend_per_active_user {

type: number

value_format_name: usd

sql: ${total_amount_spent} / nullif(${total_active_users},0) ;;

drill_fields: [user_id, total_amount_spent]

}



dimension: key {

type: number

primary_key: yes

hidden: yes

sql: ${TABLE}.key ;;

}



set: detail {

fields: [user_id, signup_month, monthly_purchases, monthly_spend]

}

}

The resulting table would show this:

user_id first_active_month activity_month monthly_visits
1 1/15 1/15 1
1 1/15 2/15 0
1 1/15 3/15 1
1 1/15 4/15 1
1 1/15 5/15 1
2 2/15 2/15 3
2 2/15 3/15 0
2 2/15 4/15 0
2 2/15 5/15 0
3 3/15 3/15 1
3 3/15 3/15 1
3 3/15 3/15 1

Our last step is to define this derived table view as an Explore and join our users table:

explore: user_monthly_activity {

join: users

sql_on: ${user_monthly_activity.user_id} = ${users.id}

relationship: many_to_one

}

Now, when our users select activity_month, signup_month, and percent_of_cohort_still_active, they will see the percentage of user retention by monthly activity cohort, all from a single query. Then, our users can filter or pivot by user-acquisition source, drill in to specific users, and discover trends that may impact how their marketing team chooses to spend user-acquisition dollars.

Retention Analysis Image 2
11 13 8,527
13 REPLIES 13
Top Labels in this Space
Top Solution Authors