Skip to content
Advertisement

Is branch pruning possible for recursive cte query

This is inspired by question Retrieve a list of lists in one SQL statement – I have come up with a solution, but I have doubts on its efficiency.

To restate the problem:

  • we have 2 Tables: Person and Parent
  • Person contains basic data about each person
  • Parent is a join table relating person with its parents
  • each Person can have multiple parents
  • we want to receive each person data with list of all their ancestors – each ancestor in its own row
  • if there are no ancestors, we have only one row for that person, with null parentId

Here is the data format:

Person table

Id <PK>
Name

Parent table

Id<PK>
ParentPersonId <FK into Person >

Person has rows with values PK

1, 'Jim'
2, 'John'
3, 'Anna'
4, 'Peter'

Parent has rows with values

1, 2
1, 3
2, 3
3, 4

So person 1 has ancestors 2, 3, 4

I want the output in the following form

id  name    parentPersonId
--------------------------
1   Jim     2
1   Jim     3
1   Jim     4
2   John    3
2   John    4
3   Anna    4
4   Peter   (null)

My solution used recursive CTE query, but my fear is that it produces too many rows, as each subtree can be entered multiple times. I needed to filter out duplicates with distinct, and execution plan shows that, even with this simple data, sorting for distinct takes 50% of the time. Here is my query:

WITH cte_org AS 
(
    SELECT per.id, per.name, par.parentPersonId
    FROM Person per 
    LEFT JOIN Parent par ON per.id = par.id
    UNION ALL
    SELECT o.id, o.name, rec.parentPersonId
    FROM Parent rec
    INNER JOIN cte_org o ON o.parentPersonId = rec.id
    WHERE rec.parentPersonId IS NOT NULL
)
SELECT DISTINCT *
FROM cte_org
ORDER BY id, parentPersonId;

http://sqlfiddle.com/#!18/d7d62/4

My questions:

  • can I somehow prune the already visited branches, so that recursive-CTE does not produce duplicate rows, and final distinct is not necessary
  • is recursive CTE a right approach to this problem?

Advertisement

Answer

On PostgreSQL you can acheive that by replacing UNION ALL with UNION. So the query looks like that:

WITH RECURSIVE cte_org AS (
    select per.id, per.name, par.parentPersonId
    from Person per left join Parent par 
    on per.id = par.id
    UNION 
    SELECT 
        o.id, 
        o.name,
        rec.parentPersonId
    FROM 
        Parent rec
        INNER JOIN cte_org o 
            ON o.parentPersonId = rec.id
        where rec.parentPersonId is not null
)
SELECT *
FROM cte_org
ORDER BY id, parentPersonId;

http://sqlfiddle.com/#!17/225cf4/4

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement