Skip to content
Advertisement

Complex SQL query aggregation and grouping on athena

I have a table like this:

|     db      |      chat_id      |   Admin    |     user      |
+-------------+-------------------+------------+---------------+
|    db_1     |      chat_id1     |    max     |     greg      |
|    db_1     |      chat_id2     |    max     |     bob       |
|    db_1     |      chat_id3     |    max     |     greg      |
|    db_1     |      chat_id2     |    helen   |     greg      |
|    db_2     |      chat_id1     |    alan    |     greg      |

I would like to retrieve the number of chat performed by users for each database (db) and the last part where I fail, retrieve also a list of all mentors by users.

The final output should be like this for example (notice there is only one time max for greg in the admin column)

|     db      |      user     |  nb_of_chat  |     admins    |
+-------------+---------------+--------------+---------------+
|    db_1     |      greg     |      3       |   max, helen  |
|    db_1     |      bob      |      1       |      max      |
|    db_2     |      greg     |      1       |      alan     |

I wrote the following query but it doesn’t aggregate the admins and i have separated nb_of chats/mentors.

SELECT db, user, COUNT(chat_id), admins
FROM "chat_db"."chats" 
GROUP BY db, user, admins;

As expected I am getting the following result (but I only want it on one line by db/user with grouped admin in the same column):

|     db      |      user     |  nb_of_chat  |     admins    |
+-------------+---------------+--------------+---------------+
|    db_1     |      greg     |      2       |       max     |
|    db_1     |      greg     |      1       |      helen    |
|    ...      |      ...      |     ...      |      ...      |

Have you an idea how to perform it ?

Thank you for your time !

Regards.

Advertisement

Answer

Firsly, remove adminsfrom the group by clause, since you want to aggregate it. Then, in Presto, you can do string aggregation as follows:

select db,user, count(*) no_of_chats, 
    array_join(array_agg(admins), ', ') all_admins
from  "chat_db"."chats" 
group by db, user;

You can add an order by clause to array_agg() if needed:

select db,user, count(*) no_of_chats, 
    array_join(array_agg(admins order by admins), ', ') all_admins
from  "chat_db"."chats" 
group by db, user;

Note that I changed count(chat_id) to count(*): both are equivalent (since chat_id probably is a non-nullable column), and the former is (sligthly) more efficient, and makes the intent clearer in my opinion.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement