Announcements
This site is in read only until July 22 as we migrate to a new platform; refer to this community post for more details.

PERCENTILE() is not working correctly, only considers distinct values

Hello,

As I was working with Looker Studio today, I have realized that I was getting strange values when calculating the percentiles. After extracting the data to Google Sheets I was able to understand that the PERCENTILE() function only considers distinct values and not all values present. Is this intended? It seems like a bug because metrics like the median are not correctly represented this way 

1 7 78
7 REPLIES 7

@HugoTorres 

I don't see any bug on my side.
Sanity check: Have you applied a deduplication of the dimension field values before applying the percentile, like a data blending?

The problem is not duplicate values, is the fact the Looker removes them when calculating the percentile. If I have 100 values, but 5 of them are duplicates, Looker Studio is only considering 95 unique values for the percentile calculation

I previously understood the problem. Have you applied the percentile directly to the field returning the 100 values or have you used some intermediary treatments like a blending? Please, can you describe the details of your implementation.

Mehdi Oudjida

No intermediary tables. Data is ingested from Snowflake and each row has a numerical value. The percentile function is applied directly to that column

Oh Snowflake...it can be a problem of connector and what is used as the underlying data set.

I've made a quick test of Median with a Google Sheets as a data source (1,2,3,3,3 asking the percentile 50) without problem.

Your field is displayed in green as a dimension so without aggregation?
If so, Have you got a "record count" metric in you data source?

Currently the connection to Snowflake in simply select * from table_a, in which table_a is a materialized table on the data warehouse. I have also tested the record count and it gives me the right amount of values. I have also checked and the field is green and numerical

Tested now from a Google Sheet and the median provides the correct value. However, for the 99th percentile it gives me the maximum, which should not happen. Is there any way to understand which data and which formula Looker is using for this calculation?