Skip to content
Advertisement

SQL – cartesian product of PARTITIONs

There is a really complicated query, which I cannot wrap my head around alone. In the queried table there are 5 columns: PK_ID, which is the unique identifier of the value, ID, which connects the aggregated values in one group, the count, which says, how many value groups should be aggregated (if there aren’t enough, the group with such ID should be ommited from the query results), the num, which says the number of the aggregated value inside the group, and the value to be aggregated.

(Table T1)

PK_ID |   ID  |  num  |  count |  value
------+-------+-------+--------+--------
 1    |   1   |   1   |   3    |   A
------+-------+-------+--------+--------
 2    |   1   |   2   |   3    |   B
------+-------+-------+--------+--------
 3    |   1   |   3   |   3    |   C
------+-------+-------+--------+--------
 4    |   1   |   1   |   3    |   D
------+-------+-------+--------+--------
 5    |   1   |   3   |   3    |   E
------+-------+-------+--------+--------
 6    |   1   |   2   |   3    |   F
------+-------+-------+--------+--------
 7    |   1   |   3   |   3    |   G
------+-------+-------+--------+--------
 8    |   2   |   1   |   2    |   H
------+-------+-------+--------+--------
 9    |   2   |   2   |   2    |   I
------+-------+-------+--------+--------
 10   |   2   |   1   |   2    |   J
------+-------+-------+--------+--------
 11   |   3   |   1   |   5    |   X

The results of such query would be:

PK_ID | T1_ID | cross_aggr | T1_PK_IDs
------+-------+------------+-----------
  1   |   1   | {A, B, C}  | {1, 2, 3}
------+-------+------------+-----------
  2   |   1   | {A, B, E}  | {1, 2, 5}
------+-------+------------+-----------
  3   |   1   | {A, B, G}  | {1, 2, 7}
------+-------+------------+-----------
  4   |   1   | {A, F, C}  | {1, 6, 3}
------+-------+------------+-----------
  5   |   1   | {A, F, E}  | {1, 6, 5}
------+-------+------------+-----------
  6   |   1   | {A, F, G}  | {1, 6, 7}
------+-------+------------+-----------
  7   |   1   | {D, B, C}  | {4, 2, 3}
------+-------+------------+-----------
  8   |   1   | {D, B, E}  | {4, 2, 5}
------+-------+------------+-----------
  9   |   1   | {D, B, G}  | {4, 2, 7}
------+-------+------------+-----------
  10  |   1   | {D, F, C}  | {4, 6, 3}
------+-------+------------+-----------
  11  |   1   | {D, F, E}  | {4, 6, 5}
------+-------+------------+-----------
  12  |   1   | {D, F, G}  | {4, 6, 7}
------+-------+------------+-----------
  13  |   2   | {H, I}     | {8, 9}
------+-------+------------+-----------
  14  |   2   | {J, I}     | {10, 9}

So it returned every possible combination of values inside each group, defined by ID and described by count, and those values are aggregated in the order of each individual num.

It looks like I have to use the CROSS JOIN (in the same manner as the cartesian product would work in linear algebra), but the CROSS JOIN seems to work only if the number of JOINed tables is known beforehand, but there is the count complication. The count itself could potentially (I mean it doesn’t but it could) range from 2 to infinity.

I have thought of having it processed by a WINDOW function, PARTITION BY the ID and by num, but after that I’m stuck, since there seems no way to have a cartesian product of those partitions.

Should I call it a day and find another way to process the data, or is there a way to compose such a query?

Advertisement

Answer

This reads like a graph-walking problem – which requires a recursive query.

with recursive cte as (
    select id, num, cnt, 1 as lvl,
        array[value::text] as arr_values, 
        array[pk_id::int] as arr_pk_id
    from mytable where num = 1
    union all
    select c.id, t.num, c.cnt, c.lvl + 1, 
        c.arr_values || t.value::text, 
        c.arr_pk_id  || t.pk_id
    from cte c
    inner join mytable t 
        on  t.id = c.id 
        and t.num = c.num + 1 
        and t.num <= c.cnt
)
select id, arr_values, arr_pk_id 
from cte 
where array_length(arr_pk_id, 1) = cnt
order by arr_pk_id

For each id, the recursive query sarts at num 1 and follows all possible paths. The outer query just filters on the end of paths.

Demo on DB Fiddle:

id | arr_values | arr_pk_id
-: | :--------- | :--------
 1 | {A,B,C}    | {1,2,3}  
 1 | {A,B,E}    | {1,2,5}  
 1 | {A,B,G}    | {1,2,7}  
 1 | {A,F,C}    | {1,6,3}  
 1 | {A,F,E}    | {1,6,5}  
 1 | {A,F,G}    | {1,6,7}  
 1 | {D,B,C}    | {4,2,3}  
 1 | {D,B,E}    | {4,2,5}  
 1 | {D,B,G}    | {4,2,7}  
 1 | {D,F,C}    | {4,6,3}  
 1 | {D,F,E}    | {4,6,5}  
 1 | {D,F,G}    | {4,6,7}  
 2 | {H,I}      | {8,9}    
 2 | {J,I}      | {10,9}   
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement