Re: data source calculated metrics VS SQL Aggregat...

DZlAmir · 08-29-2024 07:08 AM

Hello Guys I have a hard problem which I need your help in
We are trying to make one common place for all our metrics and there are two ways to do that

Either by using Looker studio data sources calculated metrics
Or by using directly the aggregation function on big query like (SUM, ...)

The second option have way more benefits for us but I have one problem with the second option

lets say a stakeholder want a daily and monthly chart for the number of users

1. with the first approach, I will do this

SELECT * FROM orders

and after that i will create a calculated filed called nbr_users

COUNT_DISINCT(userID)

this way it will work perfectly with the first approach
e.g.: I have user order 1 time for 3 days in a row so when i do daily count i should see 1 user one each day and when i do monthly I get only 1 user (it works this way with the first approach)

2. but with the second approach the problem appear
i will do this query

SELECT
  COUNT(DISTINCT(userID)),
  DATE(creationDate) AS day
FROM
  orders
GROUP BY
 DATE(creationDate)

and then i do nothing on the data source i just connect it with this custom BigQuery SQL

but this way when we get back to our example we will have an issue
on the daily we will get the user once daily for three days
but when we do monthly we will get 3 users but the truth its only one user that had orders on different days

The only way to fix this i can think about is doing different sql for each groupping (monthly or daily or weekly, ...)

but this way is hard to handle creating 5 sql just to do the same thing with different grouping

lets say stakeholders want daily, weekly, monthly, quarterly, and yearly its like 5 SQLs

marcwo

Hi @DZlAmir

can you please clarify exactly what you are trying to achieve?

On your first point, that makes total sense- you are doing a distinct count and group it by different aggregates. A user that has done 3 orders on different dates within a month will have a count of 3 if grouped by date, and a count of 1 if grouped by month.

I cant figure out your second point. You are using a custom query that returns two columns only - date and distinct count, so it returns a date and an integer...how do you anticipate the count distinct on that integer to work if you group by monthly?

I have mocked up a quick table with what I think you are trying to achieve

I have date and customer ID and I `select * from table` as my datasource. I can use this then to show both use cases you want to show:

all data visibledata grouped by date, count distinct on user iddata grouped by month (same dimension as source), count distinct on user_id

DZlAmir

Hey Thanks for your response, I know this is possible by using `SELECT * FROM table` and after that using the calculated fields to do both monthly and daily

I want to achieve the same thing but only using bigquery and this is like impossible
the aim of doing this is to create a robust view that have all the needed aggregations for the business and with this way I can use dataplex and its cataloging version control with git, but I couldn't find anyway to do that with only one custom SQL, while its possible in looker studio data source with the calculated fields

marcwo

I am not sure I understand.

This is one view in BigQuery, no calculated fields required, to be used in Looker Studio.

DZlAmir

Yeah I know but this is just a simple example metric lets we are trying to do order frequency it will need a calculated field but am trying to do it just by SQL and I don't think its possible

data source calculated metrics VS SQL Aggregations