Snowflake: ListAgg data based on condition

I have a table P that looks something like this:

ID | Status | Env
1  |   1    | Linux
1  |   1    | Windows
1  |   3    | Mac
2  |   1    | Linux
2  |   1    | Windows
2  |   1    | Mac
3  |   3    | Linux
3  |   0    | Windows
3  |   3    | Mac

​x
 
ID | Status | Env1  |   1    | Linux1  |   1    | Windows1  |   3    | Mac2  |   1    | Linux2  |   1    | Windows2  |   1    | Mac3  |   3    | Linux3  |   0    | Windows3  |   3    | Mac​

Here, 1 signifies test success whereas any other number signify some kind of failure. I want to aggregate this data in such a way that for every failed test I have a comma separated list of failed environments in every row. And if there are no failures, there should be NULL in the new columns. The output would look something like

ID | Status | Env     | Failure_list
1  |   1    | Linux   | Mac
1  |   1    | Windows | Mac
1  |   3    | Mac     | Mac
2  |   1    | Linux   | Null
2  |   1    | Windows | Null
2  |   1    | Mac     | Null
3  |   3    | Linux   | Linux, Windows, Mac
3  |   0    | Windows | Linux, Windows, Mac
3  |   3    | Mac     | Linux, Windows, Mac

 
ID | Status | Env     | Failure_list1  |   1    | Linux   | Mac1  |   1    | Windows | Mac1  |   3    | Mac     | Mac2  |   1    | Linux   | Null2  |   1    | Windows | Null2  |   1    | Mac     | Null3  |   3    | Linux   | Linux, Windows, Mac3  |   0    | Windows | Linux, Windows, Mac3  |   3    | Mac     | Linux, Windows, Mac​

I am using the snowflake LISTAGG() function in a query like

SELECT ID, STATUS, LISTAGG(ENV, ', ') 
FROM P
GROUP BY ID, STATUS

 
SELECT ID, STATUS, LISTAGG(ENV, ', ') FROM PGROUP BY ID, STATUS​

This is the output I get:

ID | Status | Env
1  |   1    | Linux, Windows
1  |   3    | Mac
2  |   1    | Linux, Windows, Mac
3  |   0    | Windows
3  |   3    | Linux, Mac

 
ID | Status | Env1  |   1    | Linux, Windows1  |   3    | Mac2  |   1    | Linux, Windows, Mac3  |   0    | Windows3  |   3    | Linux, Mac​

How can I change this query to get the output that I am looking for?

Answer

You could solve this with a correlated subquery:

select
    t.*,
    (
        select listagg(t1.env)
        from mytable t1
        where t1.id = t.id and t1.status <> 1
    ) failure_list
from mytable t

 
select    t.*,    (        select listagg(t1.env)        from mytable t1        where t1.id = t.id and t1.status <> 1    ) failure_listfrom mytable t​

Or better yet using listagg() as a window function, which Snowflake supports:

select 
    t.*,
    listagg(case when status <> 1 then env end) over(partition by id) failure_list
from mytable t

 
select     t.*,    listagg(case when status <> 1 then env end) over(partition by id) failure_listfrom mytable t​

Advertisement

Answer