A daily_usage
table can be useful for summarizing usage per day. But what if there is a day without any usage information? Having missing dates when there was no usage could make analysis misleading or confusing. Thus in this case, we will want to have one row for each date, regardless of whether there is usage information on that date.
This can be done in a derived table using a generated date series as the base table. Lets create a simple daily_usage
derived table that counts total events per day:
- view: daily_active_users_new
derived_table:
sql: |
SELECT general_date.date AS general_date
, COUNT(*) AS event_count
FROM
events
view: daily_active_users_new {
derived_table: {
sql: SELECT general_date.date AS general_date
, COUNT(*) AS event_count
FROM
events
;;
}
}
However, this table may be missing rows for dates without events. To ensure that all dates are included in this daily_usage
table, we can put a generated date series in the FROM
clause, and then join events
to that date series:
Note: the generated date series will vary based on the dialect of your database.
- view: daily_active_users_new
derived_table:
sql: |
SELECT general_date.date AS date
, COUNT(*) AS event_count
FROM (SELECT GENERATE_SERIES('2012-10-12', CURRENT_DATE, '1 day')::DATE AS date) AS general_date
LEFT JOIN events
ON events.created_date = general_date.date
fields:
- dimension_group: event
type: time
timeframes: [time, date, month, year]
sql: ${TABLE}.date
- dimension: event_count
type: number
sql: ${TABLE}.event_count
- view: daily_active_users_new
derived_table:
sql: |
SELECT general_date.date AS date
, COUNT(*) AS event_count
FROM (
SELECT
DATE_ADD('2001-01-01', INTERVAL numbers.number DAY) AS date
--DATEADD(day,number::int,'2001-01-01') AS date ##redshift syntax
FROM
(SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7) as numbers) as general_date
LEFT JOIN events
ON events.created_date = general_date.date
fields:
- dimension_group: event
type: time
timeframes: [time, date, month, year]
sql: ${TABLE}.date
- dimension: event_count
type: number
sql: ${TABLE}.event_count
- view: daily_active_users_new
derived_table:
sql: |
SELECT general_date.date AS date
, COUNT(*) AS event_count
FROM (
SELECT date
FROM (
SELECT curdate() - interval (a.a + (10 * b.a) + (100 * c.a) + (1000 * d.a) + (10000 * e.a)) day as date
FROM (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as e
) dates
WHERE date >= '2012-01-01'
ORDER BY date) as general_date
LEFT JOIN events
ON events.created_date = general_date.date
fields:
- dimension_group: event
type: time
timeframes: [time, date, month, year]
sql: ${TABLE}.date
- dimension: event_count
type: number
sql: ${TABLE}.event_count
There are many data dimension generators out there. Look for the the Kimball Group Generator as one example. These date and time dimension generators allow for adding columns for more complex time groupings than what looker provides out of the box.
I just remembered about this generator which is very handy. Remember that one of the things about a date dimension is that it give you the power to do exotic things with grouping. Things like adding a column for third of a day or same day of week last quarter without complicated sql. I highly recommend you read up on the Data Warehouse Toolkit’s Date Dimension sections in Kimbal’s Book. They are the standard.
Just wanted to drop a quick reminder that now Looker can fill in gaps in dates without a date dimension. Thanks @lindsey1!
Hi @Rex1,
Correct, with Looker 4.0+, we have introduced Dimension Fill that will do this for you. More on that here. With Looker 4.2+, Dimension Fill now be a parameter allow_fill
for relevant dimensions. Learn more about that here.
Cheers,
Vince
If you’re doing this using BigQuery, this article from SO may be helpful to create a range of dates.
Skipping straight to the dessert, to get a list of date values between two dates (2015-06-1 until the current date in this example):
SELECT day FROM
UNNEST(GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)) AS day