I have a table that looks like this:
+----+-------+ | id | col2 | +----+-------+ | 1 | a | | 2 | b | | 3 | ,b | | 4 | c | | 5 | d,a | | 6 | e,a,b | +----+-------+
What is the most efficient way to query it and return the following ?
+------+----------+ | col1 | count_id | +------+----------+ | a | 3 | | b | 3 | | c | 1 | | d | 1 | | e | 1 | +------+----------+
I was thinking to use case when
statements but it seems messy.
Advertisement
Answer
In Presto you can split the delimited list into an array, then unnest the array. This gives you one record per element in each list. The rest is just aggregation:
select s.colx, count(*) cnt from mytable t cross join unnest(split(t.col2, ',')) as s(colx) group by s.colx
If you want the count of distinct id
s (in case there are duplicates within the delimited lists):
select s.colx, count(distinct t.id) cnt from mytable t cross join unnest(split(t.col2, ',')) as s(colx) group by s.colx