Skip to content
Advertisement

How to use multiple count distinct in the same query with other columns in Druid SQL?

I’m trying to use three projections in same query like below in a Druid environment:

select
  __time,
  count(distinct col1),
  count(distinct case when (condition1 and condition2 then (concat(col2,TIME_FORMAT(__time))) else 0 end )
from table
where condition3
GROUP BY __time

But instead I get an error saying – Unknown exception / Cannot build plan for query

It seems to work perfectly fine when I put just one count(distinct) in the query.

How can this be resolved?

Advertisement

Answer

As the Druid documentation points out:

COUNT(DISTINCT expr) Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of HyperLogLog. To get exact counts set “useApproximateCountDistinct” to “false”. If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also APPROX_COUNT_DISTINCT(expr). In exact mode, only one distinct count per query is permitted.

So this is a Druid limitation: you either need to disable exact mode, or else limit yourself to one distinct count per query.

On a side note, other databases typically do not have this limitation. Apache Druid is designed for high performance real-time analytics, and as a result, its implementation of SQL has some restrictions. Internally, Druid uses a JSON-based query language. The SQL interface is powered by a parser and planner based on Apache Calcitea, which translates SQL into native Druid queries.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement