Skip to content
Advertisement

Using global list in recursive SQL query to avoid visted nodes

I have a self-referential table User:

id          | follower
------------|------------
1 (adam)    | 2 (bob)
1 (adam)    | 3 (charlie)
2 (bob)     | 1 (adam)
2 (bob)     | 3 (charlie)

Note that there are circular references.

I want to get all the followers, and followers of the followers, and so on of a user such that all the followers are presented in a flattened list, with their respective depths

For Adam:

id | follower    | depth
---|-------------|-------
1  | 1 (bob)     | 0
2  | 3 (charlie) | 0
3  | 1 (adam)    | 1 (bob -> adam)
4  | 3 (charlie) | 1 (bob -> charlie)

Problem

I want to avoid the rows 3 and 4, which represents two problems:

  1. adam -> bob -> adam because it’s circular.

  2. adam -> bob -> charlie because charlie has already appeared before.

I’m able to solve problem #1 by using the following query by keeping a path column of visited ids in a branch

WITH RECURSIVE cte AS (
  SELECT id, follower, 0 as depth, ARRAY[id] AS path
  FROM user
  UNION ALL
  SELECT id, follower, depth + 1, id || path
  FROM user
  JOIN cte ON user.id = cte.follower
  WHERE NOT path @> Array[user.id]
)
SELECT * from cte

But it doesn’t resolve problem #2.

It gives the following result:

follower    | depth | path
------------|-------|-----
2 (bob)     | 0     | {2}
3 (charlie) | 0     | {3}
3 (charlie) | 1     | {2, 3}

It still has problem #2 (duplicate charlie entry) because path column only keeps a list of ids in a specific branch.

How do I fix problem #2?

Possible solution

I can solve it in my code (Node.JS) by keeping a global cache (path equivalent).

const list = {}; /* <-- GLOBAL cache */
function recurse(user, depth = 0) {
  for(const { id, followers } of user.followers) {
    if (!(id in list)) {
      list[id] = {id, depth}
      recurse({ followers }, depth + 1);
    }
  }
}

Whereas, as far as I can tell, the above SQL query is the equivalent of something like:

function recursive() {
  const list = {}; /* <-- LOCAL cache */
  for(const {id} of followers)
    if (!(id in list)) ...

How to I replicate my solution in code using a global cache in SQL?

Or any other way I can achieve the desired result?

I’m using Node.JS and PostgreSQL

Advertisement

Answer

If I understand correctly, you want to select only one row per follower after the recursive search:

WITH RECURSIVE cte AS (
      SELECT id, follower, 0 as depth, ARRAY[id] AS path
      FROM user
      UNION ALL
      SELECT id, follower, depth + 1, id || path
      FROM user
      JOIN cte ON user.id = cte.follower
      WHERE NOT path @> Array[user.id]
     )
SELECT DISTINCT ON (follower) *
FROM cte
ORDER BY follower, depth;
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement