Skip to content
Advertisement

SQL: Efficient way to count and group results by like value

I have a table that looks like this:

+----+-------+
| id | col2  |
+----+-------+
|  1 | a     |
|  2 | b     |
|  3 | ,b    |
|  4 | c     |
|  5 | d,a   |
|  6 | e,a,b |
+----+-------+

What is the most efficient way to query it and return the following ?

+------+----------+
| col1 | count_id |
+------+----------+
| a    |        3 |
| b    |        3 |
| c    |        1 |
| d    |        1 |
| e    |        1 |
+------+----------+

I was thinking to use case when statements but it seems messy.

Advertisement

Answer

In Presto you can split the delimited list into an array, then unnest the array. This gives you one record per element in each list. The rest is just aggregation:

select s.colx, count(*) cnt 
from mytable t
cross join unnest(split(t.col2, ',')) as s(colx)
group by s.colx

If you want the count of distinct ids (in case there are duplicates within the delimited lists):

select s.colx, count(distinct t.id) cnt 
from mytable t
cross join unnest(split(t.col2, ',')) as s(colx)
group by s.colx
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement