Let’s say we have a Sqlite table containing:
name;city;age;id;foo Alice;New-York;25;13782749;12 Eve;Chicago;23;1938679;34 Bob;New-York;25;824697624;56 Jack;Denver;30;239679163;12 Simone;Denver;30;1687631;99
How to group by h=CONCAT(city,age)
but also add in the group the users that have a column foo
identical to someone in the group?
As seen in Group items by 2 columns, here is how to group by h
:
select dense_rank() over (order by city, age) as grpnum, name, id from t;
How to add this second condition on foo
?
Example: here Alice and Bob are in the same group1
because they have identical h=CONCAT(city,age)
, but Jack should also be in group1
because he has same foo value as Alice: 12
. Simone has same CONCAT(city,age)
than Jack, therefore she is in the same group as Jack, in group1
as well.
Advertisement
Answer
This is much trickier. This is a graph-walking problem, which requires a recursive CTE (Common Table Expression).
The idea is to generate all the edges between names. Then use a recursive CTE to visit all connected nodes (name
s), avoiding visiting existing ones.
Here is one method:
with recursive edges as ( select distinct t1.name as name1, t2.name as name2 from t t1 join t t2 on t1.city = t2.city and t1.age = t2.age or t1.foo = t2.foo ), cte as ( select name1, name2, min(name1, name2) as first_name, ',' || name1 || ',' || name2 || ',' as visited from edges union all select cte.name1, e.name2, min(cte.first_name, e.name2), visited || e.name2 || ',' from cte join edges e on e.name1 = cte.name2 where visited not like '%,' || e.name2 || ',%' ) select t.*, cte.grpnum from t join (select name1, min(first_name), dense_rank() over (order by min(first_name)) as grpnum from cte group by name1 ) cte on t.name = cte.name1;
And here is a db<>fiddle.