This content, written by Joshua Moskovitz, was initially posted in Looker Blog on Apr 10, 2014. The content is subject to limited support.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are often starving for data, but are forced to make important decisions without access to data that could guide them in the right direction.
Looker addresses both of these problems with a YAML-based modeling language called LookML. With LookML, you build your business logic, defining your important metrics once and then reusing them throughout a model. That means you can unleash them on your business users to manipulate, iterate, and transform in any way they see fit.
A key difference of LookML is that, unlike older approaches, it combines modeling, transformations, and derivations at the same layer (late-binding modeling). This allows vast amounts of data to be captured in relatively inexpensive databases (mirrored or copied), and then derivations and transformations occur much closer to, or at, query time. The traditional approach is to transform the data as it’s loaded (ETL), whereas LookML allows for transform and derivation on demand (ELT). The result is a very agile data environment where user questions can change and the data environment can better keep up.
Let’s take a look at a simple e-commerce example. I’ll show how we can create a dimension, the Total Cost of Order, which can be reused and built on throughout a single LookML model.
First, a quick primer on a typical e-commerce data model, which will help us answer questions about the buying and selling of items online. In our example, we’ll work with a subset of tables: Orders, Order Items, and Inventory Items. As a business that tracks Orders, it’s probably important for us to determine the distribution of our customers’ orders based on cost. In our current Orders table, we don’t have a field that tells us the cost of an order, because each order contains multiple items of varying costs. So we need to calculate a cost of an order by summing over the sale prices of the items in the order.
Orders table
1 | 2014-04-01 | 5656 |
2 | 2014-04-01 | 7269 |
Order Items table
1 | 2014-04-01 | 5656 | 3 | $12 |
2 | 2014-04-03 | 7263 | 5 | $45 |
Inventory Items table
3 | 2014-04-01 | $8.50 | 2014-04-05 | 5 |
6 | 2014-04-02 | $24.00 | 2014-04-04 | 7 |
Suppose we want to calculate a new dimension for Orders that will determine the Total Cost of an Order. In this case, the field is not stored in our database, but can be calculated from the sale price of order items in the order. The technique we will use here is a , a simple and easy approach. (Looker supports more complicated and powerful mechanisms via derived tables for databases that do not support this feature or when performance becomes a problem. We'll cover derived tables in a future post.)
For any given order, the SQL to calculate the Total Cost of Order is:
SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id
We sum over the sale price associated for each item in a given order, where the Order Items Order_Id field matches with the primary key in the Orders table. In Looker, we’d want to create this dimension in the Orders view, since it’s an attribute of an Order.
- view: Order
fields:
- dimension: total_amount_of_order_usd
type: number
decimals: 2
sql:
(SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
Now let's see our dimension in action:
Looks good, but we’ve got a wide range of order amounts, so it probably makes sense to bucket these values across set intervals. Normally, if we were writing SQL, we’d have to make a CASE WHEN statement for each discrete bucket. Conveniently, LookML has a tier function, so let’s use that.
- dimension: total_amount_of_order_usd_tier
type: tier
sql: ${total_amount_of_order_usd}
tiers: [0,10,5,150,500,1000]
Notice that we can reference our existing Total Amount of Order dimension in the ‘sql:’ parameter of the measure. Now when we use the tier, we bucket Orders into their respective tiers:
What if we wanted to know more about our Order, maybe the profit? To determine the profit of an order, we will need a Total Cost of Order dimension.
- dimension: total_cost_of_order
type: number
decimals: 2
sql:|
(SELECT SUM(inventory_items.cost)
FROM order_items
LEFT JOIN inventory_items ON order_items.inventory_item_id =
inventory_items.id
WHERE order_items.order_id = orders.id)
When using this Order Profit, Looker will substitute the existing business logic for both the Total Order Amount and Total Cost of Order. Let’s run a new query using the Order Profit dimension.
Another valuable metric for our e-commerce business may be Profit Per User. In Looker we can reference dimensions or measures from other views. In this case, to determine the Profit Per User, we’ll reference our Count measure from the Users view as the denominator of a measure in the Orders view, where the numerator is our Order Profit dimension. We use the Count measure from the Users view to scope the count with ‘users.’
- measure: profit_per_user
type: number
decimals: 2
sql: 100.0 *${order_profit}/NULLIF(${users.count},0)
Now we can see how our Profit Per User varies by every Order dimension. In this case, we see how it varies by Order Date:
What if we wanted a measure that computes the Average Total Order Amount whenever we group by a dimension in Looker? For instance, we might group by average total order amount in a certain Month, by orders from users in a certain State, or by the Lifetime Number of Orders of a user. When we create a measure in Looker, we can reuse it in many different contexts.
Let's first build our Average Total Cost of Order measure.
- measure: average_total_amount_of_order_usd
type: average
sql: ${total_amount_of_order_usd}
decimals: 2
Again, we can reference our already existing Total Order Amount dimension and set the dimension type as an average. Now when we use this dimension, it will aggregate over all total order amounts within that group, calculating the average.
Here we see how our Average Total Order Amount varies by the Lifetime Number Orders of Customers and by the Week the order was created:
We can also create measures that calculate Total Order Amounts based on conditions of the order, such as whether it was a customer’s first purchase or if it’s made by a return customer. This way, we can determine how much revenue was generated from new or returning customers. It’s likely we have discrete teams focused on new user acquisition and on current user retention, so it may be important we break these revenues apart.
- measure: total_first_purchase_revenue
type: sum
sql: ${total_amount_of_order_usd}
decimals: 2
filters:
is_first_purchase: yes
- measure: total_returning_shopper_revenue
type: sum
sql: ${total_amount_of_order_usd}
decimals: 2
filters:
is_first_purchase: no
Again, both of these measures — Total First Purchase Revenue and Total Returning Shopper Revenue — take advantage of our existing Total Amount of Order dimensions. We can now directly compare both types of revenue.
Given the dimensions and measures we’ve just created, let’s build a report that shows us Total Returning Shopping Revenue, Total First Purchase Revenue, Average Total Amount of Order, and Average Order Profit, broken out by the Total Amount of Order tiered and the Week in which the order was created.
There we go.
Now let’s compare this with the SQL we’d have to write to generate such a result set.
SELECT COALESCE (CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 0 THEN 'T00 (-inf,0)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 0 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 10 THEN 'T01 [0,10)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 50 THEN 'T02 [10,50)'
WHEN SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 50 AND (SELECT SUM(order_items.sale_price)-----
FROM order_items
WHERE order_items.order_id = orders.id)
< 150 THEN 'T03 [50,150)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 500 THEN 'T04 [150,500)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 500 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 1000 THEN 'T05 [500,1000)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 1000 THEN 'T06 [1000,inf)'
ELSE 'TXX Undefined'
END,'') AS
"_p1",COALESCE(DATE_ADD(DATE(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles')),
INTERVAL
(0-(DAYOFWEEK(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles'))+5)%7) DAY),'')
AS "_g1", DATE_ADD(DATE(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles')),
INTERVAL
(0-(DAYOFWEEK(CONVERT_TZ(orders.created_at,'UTC','America/Los_Angeles'))+5)%7) DAY)
AS `orders.created_week`,CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 0 THEN 'T00 (-inf,0)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 0 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 10 THEN 'T01 [0,10)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 50 THEN 'T02 [10,50)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 50 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 150 THEN 'T03 [50,150)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 500 THEN 'T04 [150,500)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 500 AND (SELECT SUM(order_items.sale_pr
FROM order_items
WHERE order_items.order_id = orders.id)
< 1000 THEN 'T05 [500,1000)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 1000 THEN 'T06 [1000,inf)'
ELSE 'TXX Undefined'
END AS 'orders.total_amount_of_order_usd_tier',
AVG((SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
) AS 'orders.average_total_amount_of_order_usd',
SUM (CASE WHEN ((SELECT COUNT(*)
FROM orders o
WHERE o.id < orders.id
AND o.user_id=orders.user_id) + 1
) = 1 THEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
ELSE NULL END) AS 'orders.total_first_purchase_revenue',
SUM (CASE WHEN NOT COALESCE((( SELECT COUNT(*)
FROM orders o
WHERE o.id < orders.id
AND o.user_id=orders.user_id) + 1
) = 1, FALSE) THEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
ELSE NULL END) AS 'orders.total_returning_shopper_revenue',
AVG(((SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
- (SELECT SUM(inventory_items.cost)
FROM order_items
LEFT JOIN inventory_items ON order_items.inventory_item_id = inventory_items.id
WHERE order_items.order_id = orders.id)
)) AS `orders.average_order_profit`
FROM orders
WHERE
orders.created_at BETWEEN (CONVERT_TZ(DATE_ADD(CURDATE(),INTERVAL -29 day),
'America/Los_Angeles','UTC')) AND (CONVERT_TZ(DATE_ADD(DATE_ADD(DATE_ADD
(CURDATE(),INTERVAL -29 day),INTERVAL 30 day),INTERVAL -1 second),
'America/Los_Angeles','UTC'))
GROUP BY 1,2
ORDER BY CASE
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 0 THEN 'T00 (-inf,0)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 0 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 10 THEN 'T01 [0,10)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 10 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 50 THEN 'T02 [10,50)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 50 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 150 THEN 'T03 [50,150)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 150 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 500 THEN 'T04 [150,500)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 500 AND (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
< 1000 THEN 'T05 [500,1000)'
WHEN (SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id)
>= 1000 THEN 'T06 [1000,inf)'
ELSE 'TXX Undefined'
END
LIMIT 500
Maybe this makes sense to write one time, but what if we want to look at this by a Customer's state instead of by order week?
Or maybe by Lifetime Number of Purchases by a customer, tiered?
As you can see, all these reports can be generated, altered, and updated—without the need to rewrite any SQL. In LookML, we abstract the essential business logic once, then reference it within other dimensions and measures—allowing quick, rapid iteration of data exploration, while also ensuring the accuracy of the SQL that’s generated. If a business user wants a new tier, just add it to the dimension. If they want to determine revenue from customers with more than 10 purchases, just create a new measure that sums total order amount and filters on customers with more than 10 purchases. Small updates are quick and can be made immediately available to end users. That frees you up to define the that will take your business to the next level.