I have a table P
that looks something like this:
x
ID | Status | Env
1 | 1 | Linux
1 | 1 | Windows
1 | 3 | Mac
2 | 1 | Linux
2 | 1 | Windows
2 | 1 | Mac
3 | 3 | Linux
3 | 0 | Windows
3 | 3 | Mac
Here, 1
signifies test success whereas any other number signify some kind of failure. I want to aggregate this data in such a way that for every failed test I have a comma separated list of failed environments in every row. And if there are no failures, there should be NULL
in the new columns. The output would look something like
ID | Status | Env | Failure_list
1 | 1 | Linux | Mac
1 | 1 | Windows | Mac
1 | 3 | Mac | Mac
2 | 1 | Linux | Null
2 | 1 | Windows | Null
2 | 1 | Mac | Null
3 | 3 | Linux | Linux, Windows, Mac
3 | 0 | Windows | Linux, Windows, Mac
3 | 3 | Mac | Linux, Windows, Mac
I am using the snowflake LISTAGG() function in a query like
SELECT ID, STATUS, LISTAGG(ENV, ', ')
FROM P
GROUP BY ID, STATUS
This is the output I get:
ID | Status | Env
1 | 1 | Linux, Windows
1 | 3 | Mac
2 | 1 | Linux, Windows, Mac
3 | 0 | Windows
3 | 3 | Linux, Mac
How can I change this query to get the output that I am looking for?
Advertisement
Answer
You could solve this with a correlated subquery:
select
t.*,
(
select listagg(t1.env)
from mytable t1
where t1.id = t.id and t1.status <> 1
) failure_list
from mytable t
Or better yet using listagg()
as a window function, which Snowflake supports:
select
t.*,
listagg(case when status <> 1 then env end) over(partition by id) failure_list
from mytable t