Excluding weekends from time differences

LookerExperts · ‎03-27-2022

This content, written by Scott Hoover, was initially posted in Looker Blog on Mar 27, 2014. The content is subject to limited support.

Recently, the Looker Analytics team was posed with a unique challenge by a client: come up with a way to exclude weekends from an analysis that calculates the duration between two times.

The data in question captures the rate at which jobs move from various stages in production. Each row of the table of interest (hereinafter velocity) represents a single job — which, along with other attributes, has an approved and a completed time.

velocity table

1	2013-07-31 04:00:00	2013-08-01 11:00:00	2013-08-04 13:00:00	...	1	1
2	2013-03-07 07:00:00	2013-03-07 10:00:00	2013-03-12 14:00:00	...	4	1
...	...	...	...	...	...	...
n	2013-05-07 09:00:00	2013-05-08 02:00:00	2013-05-16 12:00:00	...	3	2

The first pass at addressing approved-to-complete velocity relied on a simple date difference:

    SELECT
     
    job_id
    
    ,
     
    approved_at
    
    ,
     
    published_at
    
    ,
     
    EXTRACT
    
    (
    
    'EPOCH'
     
    FROM
     
    (
    
    completed_at
     
    -
     
    approved_at
    
    ))
    
    /
    
    3600
     
    AS
     
    velocity_in_hours
    
    FROM
     
    velocity

The problem with this approach is that some jobs start in one week, extend over one or more weekends, and are completed in some subsequent week; however, work is not done on weekends, and the velocity for jobs that extend over one or more weekends is penalized in a sense. This is why Looker was asked to come up with a solution to exclude weekends from the analysis.

Solution

A few approaches were posed, but the final solution relied on a look-up table containing three columns: a date_time field, a day_of_week field, and an integer, hour_seq, indicating the number of hours since the first hour in this table (hereinafter sequence). The table would be used to get a corresponding integer (again, the number of hours since the beginning of the sequence table) for the approved_at and completed_at for each job and then subtract the two, yielding the number of hours between the two times.

The key to the solution was that the integers in the sequence table would not increment on weekend hours. The resulting table would look like this:

sequence table

2008-01-01 00:00:00	2	1
2008-01-01 01:00:00	2	2
2008-01-01 02:00:00	2	3
...	...	...
2008-01-04 21:00:00	5	93
2008-01-04 22:00:00	5	94
2008-01-04 23:00:00	5	95
2008-01-05 00:00:00	6	95
2008-01-05 01:00:00	6	95
2008-01-05 02:00:00	6	95
...	...	...
2008-01-06 23:00:00	0	95
2008-01-07 00:00:00	1	96
2008-01-07 01:00:00	1	97

To achieve this, we needed to start with a table of hours from the beginning of time (or 2008, when the business started) going well into the future. We relied on recursive common table expressions (CTEs) in PostgreSQL to generate a sequence of numbers and then add hours accordingly, but this could achieved in a number of ways for dialects without recursive CTEs.

    WITH
     
    RECURSIVE
     
    seq
    
    (
    
    n
    
    )
     
    AS
     
    (
    
    SELECT
     
    1
    
    UNION
     
    ALL
    
    SELECT
     
    n
     
    +
     
    1
     
    FROM
     
    seq
    
    )
           
    --generate a sequence of numbers
    
    ,
     
    hours
     
    AS
     
    (
    
    SELECT
     
    TIMESTAMP
     
    '2008-01-01 00:00:00'
     
    +
     
    n
     
    *
     
    INTERVAL
     
    '1 hour'
     
    AS
     
    date_time
    
    FROM
     
    seq
    
    LIMIT
     
    87600
         
    )
           
    --generate ten years of hours from 2008-01-01 00:00:00 into the future
    
    SELECT
     
    date_time
    
    ,
     
    EXTRACT
    
    (
    
    dow
     
    FROM
     
    date_time
    
    )
     
    AS
     
    day_of_week
    
    ,
     
    SUM
    
    (
    
    CASE
    
    WHEN
     
    EXTRACT
    
    (
    
    dow
     
    FROM
     
    date_time
    
    )
     
    NOT
     
    IN
     
    (
    
    6
    
    ,
    
    0
    
    )
    
    THEN
     
    1
    
    ELSE
     
    NULL
    
    END
    
    )
     
    OVER
     
    (
    
    ORDER
     
    BY
     
    date_time
    
    )
     
    AS
     
    hour_seq
    
    FROM
     
    hours

The penultimate step was to twice join the sequence table into velocity and map each created_at and completed_at entry to its corresponding hour_seq. The final velocity table looked something like this:

velocity table

1	2013-07-31 04:00:00	2013-08-01 11:00:00	34979	2013-08-04 13:00:00	35015	...	1	1
2	2013-03-07 07:00:00	2013-03-07 10:00:00	32458	2013-03-12 14:00:00	32534	...	4	1
...	...	...	...	...	...	...	...	...
n	2013-05-07 09:00:00	2013-05-08 02:00:00	33506	2013-05-16 12:00:00	33660	...	3	2

With a velocity table that had corresponding hour sequence numbers for each approved and completed time, we could simply difference these two values to see the time spent in production, in hours. This could be easily extended to capture any timeframe.

      SELECT
       
      approved_at
      
      ,
       
      completed_at
      
      ,
       
      completed_int
       
      -
       
      approved_int
       
      AS
       
      duration_in_hours
      
      FROM
       
      velocity

— fin —