Skip to content
Advertisement

How can i find all linked rows by array values in postgres?

i have a table like this:

id | arr_val  | grp
-----------------
1  | {10,20}  | -
2  | {20,30}  | -
3  | {50,5}   | -
4  | {30,60}  | -
5  | {1,5}    | -
6  | {7,6}    | -

I want to find out which rows are in a group together. In this example 1,2,4 are one group because 1 and 2 have a common element and 2 and 4. 3 and 5 form a group because they have a common element. 6 Has no common elments with anybody else. So it forms a group for itself. The result should look like this:

id | arr_val  | grp
-----------------
1  | {10,20}  | 1
2  | {20,30}  | 1
3  | {50,5}   | 2
4  | {30,60}  | 1
5  | {1,5}    | 2
6  | {7,6}    | 3

I think i need recursive cte because my problem is graphlike but i am not sure how to that.

Additional info and background:

The Table has ~2500000 rows.

In reality the problem i try to solve has more fields and conditions for finding a group:

id | arr_val  | date | val | grp
---------------------------------
1  | {10,20}  | -
2  | {20,30}  | -

Not only do the element of a group need to be linked by common elements in arr_val. They all need to have the same value in val and need to be linked by a timespan in date (gaps and islands). I solved the other two but now the condition of my question was added. If there is an easy way to do all three together in one query that would be awesome but it is not necessary.

—-Edit—–

While both answer work for the example of five rows they do not work for a table with a lot more rows. Both answers have the problem that the number of rows in the recursive part explodes and only reduce them at the end. A solutiuon should work for data like this too:

id | arr_val  | grp
-----------------
1  | {1}      | -
2  | {1}      | -
3  | {1}      | -
4  | {1}      | -
5  | {1}      | -
6  | {1}      | -
7  | {1}      | -
8  | {1}      | -
9  | {1}      | -
10 | {1}      | -
11 | {1}      | -
more rows........

Is there a solution to that problem?

Advertisement

Answer

You can handle this as a recursive CTE. Define the edges between the ids based on common values. Then traverse the edges and aggregate:

with recursive nodes as (
      select id, val
      from t cross join
           unnest(arr_val) as val
     ),
     edges as (
      select distinct n1.id as id1, n2.id as id2
      from nodes n1 join
           nodes n2
           on n1.val = n2.val
     ),
     cte as (
      select id1, id2, array[id1] as visited, 1 as lev
      from edges
      where id1 = id2
      union all
      select cte.id1, e.id2, visited || e.id2,
             lev + 1
      from cte join
           edges e
           on cte.id2 = e.id1
      where e.id2 <> all(cte.visited) 
     ),
     vals as (
      select id1, array_agg(distinct id2 order by id2) as id2s
      from cte
      group by id1
    )
select *, dense_rank() over (order by id2s) as grp
from vals;

Here is a db<>fiddle.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement