I’m trying to use three projections in same query like below in a Druid environment:
select __time, count(distinct col1), count(distinct case when (condition1 and condition2 then (concat(col2,TIME_FORMAT(__time))) else 0 end ) from table where condition3 GROUP BY __time
But instead I get an error saying – Unknown exception / Cannot build plan for query
It seems to work perfectly fine when I put just one count(distinct) in the query.
How can this be resolved?
Advertisement
Answer
As the Druid documentation points out:
COUNT(DISTINCT expr)
Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of HyperLogLog. To get exact counts set “useApproximateCountDistinct” to “false”. If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See alsoAPPROX_COUNT_DISTINCT(expr)
. In exact mode, only one distinct count per query is permitted.
So this is a Druid limitation: you either need to disable exact mode, or else limit yourself to one distinct count per query.
On a side note, other databases typically do not have this limitation. Apache Druid is designed for high performance real-time analytics, and as a result, its implementation of SQL has some restrictions. Internally, Druid uses a JSON-based query language. The SQL interface is powered by a parser and planner based on Apache Calcitea, which translates SQL into native Druid queries.