There is a really complicated query, which I cannot wrap my head around alone. In the queried table there are 5 columns: PK_ID
, which is the unique identifier of the value, ID
, which connects the aggregated values in one group, the count
, which says, how many value groups should be aggregated (if there aren’t enough, the group with such ID
should be ommited from the query results), the num
, which says the number of the aggregated value inside the group, and the value
to be aggregated.
(Table T1) PK_ID | ID | num | count | value ------+-------+-------+--------+-------- 1 | 1 | 1 | 3 | A ------+-------+-------+--------+-------- 2 | 1 | 2 | 3 | B ------+-------+-------+--------+-------- 3 | 1 | 3 | 3 | C ------+-------+-------+--------+-------- 4 | 1 | 1 | 3 | D ------+-------+-------+--------+-------- 5 | 1 | 3 | 3 | E ------+-------+-------+--------+-------- 6 | 1 | 2 | 3 | F ------+-------+-------+--------+-------- 7 | 1 | 3 | 3 | G ------+-------+-------+--------+-------- 8 | 2 | 1 | 2 | H ------+-------+-------+--------+-------- 9 | 2 | 2 | 2 | I ------+-------+-------+--------+-------- 10 | 2 | 1 | 2 | J ------+-------+-------+--------+-------- 11 | 3 | 1 | 5 | X
The results of such query would be:
PK_ID | T1_ID | cross_aggr | T1_PK_IDs ------+-------+------------+----------- 1 | 1 | {A, B, C} | {1, 2, 3} ------+-------+------------+----------- 2 | 1 | {A, B, E} | {1, 2, 5} ------+-------+------------+----------- 3 | 1 | {A, B, G} | {1, 2, 7} ------+-------+------------+----------- 4 | 1 | {A, F, C} | {1, 6, 3} ------+-------+------------+----------- 5 | 1 | {A, F, E} | {1, 6, 5} ------+-------+------------+----------- 6 | 1 | {A, F, G} | {1, 6, 7} ------+-------+------------+----------- 7 | 1 | {D, B, C} | {4, 2, 3} ------+-------+------------+----------- 8 | 1 | {D, B, E} | {4, 2, 5} ------+-------+------------+----------- 9 | 1 | {D, B, G} | {4, 2, 7} ------+-------+------------+----------- 10 | 1 | {D, F, C} | {4, 6, 3} ------+-------+------------+----------- 11 | 1 | {D, F, E} | {4, 6, 5} ------+-------+------------+----------- 12 | 1 | {D, F, G} | {4, 6, 7} ------+-------+------------+----------- 13 | 2 | {H, I} | {8, 9} ------+-------+------------+----------- 14 | 2 | {J, I} | {10, 9}
So it returned every possible combination of value
s inside each group, defined by ID
and described by count
, and those values are aggregated in the order of each individual num
.
It looks like I have to use the CROSS JOIN
(in the same manner as the cartesian product would work in linear algebra), but the CROSS JOIN
seems to work only if the number of JOIN
ed tables is known beforehand, but there is the count
complication. The count
itself could potentially (I mean it doesn’t but it could) range from 2
to infinity
.
I have thought of having it processed by a WINDOW
function, PARTITION BY
the ID
and by num
, but after that I’m stuck, since there seems no way to have a cartesian product of those partitions.
Should I call it a day and find another way to process the data, or is there a way to compose such a query?
Advertisement
Answer
This reads like a graph-walking problem – which requires a recursive query.
with recursive cte as ( select id, num, cnt, 1 as lvl, array[value::text] as arr_values, array[pk_id::int] as arr_pk_id from mytable where num = 1 union all select c.id, t.num, c.cnt, c.lvl + 1, c.arr_values || t.value::text, c.arr_pk_id || t.pk_id from cte c inner join mytable t on t.id = c.id and t.num = c.num + 1 and t.num <= c.cnt ) select id, arr_values, arr_pk_id from cte where array_length(arr_pk_id, 1) = cnt order by arr_pk_id
For each id
, the recursive query sarts at num
1
and follows all possible paths. The outer query just filters on the end of paths.
id | arr_values | arr_pk_id -: | :--------- | :-------- 1 | {A,B,C} | {1,2,3} 1 | {A,B,E} | {1,2,5} 1 | {A,B,G} | {1,2,7} 1 | {A,F,C} | {1,6,3} 1 | {A,F,E} | {1,6,5} 1 | {A,F,G} | {1,6,7} 1 | {D,B,C} | {4,2,3} 1 | {D,B,E} | {4,2,5} 1 | {D,B,G} | {4,2,7} 1 | {D,F,C} | {4,6,3} 1 | {D,F,E} | {4,6,5} 1 | {D,F,G} | {4,6,7} 2 | {H,I} | {8,9} 2 | {J,I} | {10,9}