Solved: Re: Combine AVG function with DISTINCT records

GertVH · 08-20-2024 05:40 AM

I'm using a scorecard to calculate the average of a column. The information used for the average is shown in a table. If I compare the data in the table with the average, the calculation of the average does not appear to be correct. When I dive deaper into the data, the problem seems to be caused because the data-source contains "duplicate records". The records aren't really duplicate, but when you take into account the calculation of the average, the diferences aren't relevant.
Question: Is it possible to calculate the average based on distinct records in the score card
Table in Looker:
In blue is the column I want the average to be calculated on. As you can see, the average of 5+4+4+2 is not 4.5 but 3.75. This caused by multiple occurences in line 1 (8 times) and 3 (10 times) in the datasource.

Mehdi_Oudjida

Hi @GertVH

Inside Looker Studio, you need to think different. Most of the solutions for aggregation and unaggregation are based on data blending.

You probably need to create a single-table data blending only with dimensions to deduplicate your rows, it will keep only the different rows thanks to a merging of rows sharing the same values (year and your stats as 2 dimensions if I understood the use case).
Then, use the blending as the data source of your scorecard and apply an AVG on the stat.

So, to sum-up:
- the blending plays the role of DISTINCT
- the chart applies the AVG aggreg on the blended duplicated data

PS: AVG(DISTINCT ...) could work (never tested it but COUNT(DISTINCT ... do) but it is not officially supported, so I don't recommend to use it

Voila.

Mehdi

View solution in original post

sam8

I think Looker Studio would need an AVERAGE_DISTINCT metric type in order to support this. I can't think of any way to work around that, but I do recommend creating a feature request!

Mehdi_Oudjida

Hi @GertVH

Inside Looker Studio, you need to think different. Most of the solutions for aggregation and unaggregation are based on data blending.

You probably need to create a single-table data blending only with dimensions to deduplicate your rows, it will keep only the different rows thanks to a merging of rows sharing the same values (year and your stats as 2 dimensions if I understood the use case).
Then, use the blending as the data source of your scorecard and apply an AVG on the stat.

So, to sum-up:
- the blending plays the role of DISTINCT
- the chart applies the AVG aggreg on the blended duplicated data

PS: AVG(DISTINCT ...) could work (never tested it but COUNT(DISTINCT ... do) but it is not officially supported, so I don't recommend to use it

Voila.

Mehdi

GertVH

Hi Mehdi,
Thanks for your quick support! The problem seems to be solved!
Thanks
Gert

Mehdi_Oudjida

👍 Good News. Happy to help