i have a table like this:
id | arr_val | grp ----------------- 1 | {10,20} | - 2 | {20,30} | - 3 | {50,5} | - 4 | {30,60} | - 5 | {1,5} | - 6 | {7,6} | -
I want to find out which rows are in a group together. In this example 1,2,4 are one group because 1 and 2 have a common element and 2 and 4. 3 and 5 form a group because they have a common element. 6 Has no common elments with anybody else. So it forms a group for itself. The result should look like this:
id | arr_val | grp ----------------- 1 | {10,20} | 1 2 | {20,30} | 1 3 | {50,5} | 2 4 | {30,60} | 1 5 | {1,5} | 2 6 | {7,6} | 3
I think i need recursive cte because my problem is graphlike but i am not sure how to that.
Additional info and background:
The Table has ~2500000 rows.
In reality the problem i try to solve has more fields and conditions for finding a group:
id | arr_val | date | val | grp --------------------------------- 1 | {10,20} | - 2 | {20,30} | -
Not only do the element of a group need to be linked by common elements in arr_val. They all need to have the same value in val and need to be linked by a timespan in date (gaps and islands). I solved the other two but now the condition of my question was added. If there is an easy way to do all three together in one query that would be awesome but it is not necessary.
—-Edit—–
While both answer work for the example of five rows they do not work for a table with a lot more rows. Both answers have the problem that the number of rows in the recursive part explodes and only reduce them at the end. A solutiuon should work for data like this too:
id | arr_val | grp ----------------- 1 | {1} | - 2 | {1} | - 3 | {1} | - 4 | {1} | - 5 | {1} | - 6 | {1} | - 7 | {1} | - 8 | {1} | - 9 | {1} | - 10 | {1} | - 11 | {1} | - more rows........
Is there a solution to that problem?
Advertisement
Answer
You can handle this as a recursive CTE. Define the edges between the ids based on common values. Then traverse the edges and aggregate:
with recursive nodes as ( select id, val from t cross join unnest(arr_val) as val ), edges as ( select distinct n1.id as id1, n2.id as id2 from nodes n1 join nodes n2 on n1.val = n2.val ), cte as ( select id1, id2, array[id1] as visited, 1 as lev from edges where id1 = id2 union all select cte.id1, e.id2, visited || e.id2, lev + 1 from cte join edges e on cte.id2 = e.id1 where e.id2 <> all(cte.visited) ), vals as ( select id1, array_agg(distinct id2 order by id2) as id2s from cte group by id1 ) select *, dense_rank() over (order by id2s) as grp from vals;
Here is a db<>fiddle.