How to Use ThetaSketch for COUNT DISTINCT on Druid?

pandog
New Member

When we try to run a COUNT DISTINCT query, it fails with:

Remote driver error: QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder
-> QueryInterruptedException: Incompatible type for metric[id], expected a HyperUnique, got a class org.apache.druid.query.aggregation.datasketches.theta.SketchHolder

We want to use ThetaSketch instead of HyperUnique. How can we change that?


bump, is there any update here? We’re trying to do the same.

pandog
New Member

Yes. Our team developed a UDF called APPROX_COUNT_DISTINCT_DS_THETA to do this. Please try this out. It’s open sourced.
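For anyone landing here, a rough sketch of how the function might be called through Druid's SQL HTTP API (a POST to /druid/v2/sql with a JSON body). The datasource name `my_datasource` and the helper `build_theta_count_distinct_query` are made up for illustration; the column `id` is taken from the error message above. This assumes the druid-datasketches extension is loaded and only builds the request payload, it does not send it.

```python
import json


def build_theta_count_distinct_query(datasource: str, column: str, size: int = 16384) -> dict:
    """Build a Druid SQL API payload that counts distinct values with a Theta sketch.

    `datasource` and `column` are placeholders supplied by the caller.
    `size` is the sketch size and is expected to be a power of two.
    """
    # Reject sizes that are not a positive power of two.
    if size <= 0 or size & (size - 1) != 0:
        raise ValueError("size must be a positive power of two")

    sql = (
        f'SELECT APPROX_COUNT_DISTINCT_DS_THETA("{column}", {size}) AS distinct_count '
        f'FROM "{datasource}"'
    )
    # The SQL endpoint accepts a JSON object with the statement under "query".
    return {"query": sql}


# Hypothetical datasource name; "id" is the metric from the error message.
payload = build_theta_count_distinct_query("my_datasource", "id")
print(json.dumps(payload))
```

The payload can then be posted to the Broker or Router with any HTTP client. If the column was ingested as a Theta sketch, as the error suggests, the function should merge the stored sketches rather than trying to read them as HyperUnique.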

@pandog Very cool, thanks! On that note, has your team used Druid SQL to compile down to TopN queries?
