This is inspired by question Retrieve a list of lists in one SQL statement – I have come up with a solution, but I have doubts on its efficiency.
To restate the problem:
- we have 2 Tables:
PersonandParent Personcontains basic data about each personParentis a join table relating person with its parents- each Person can have multiple parents
- we want to receive each person data with list of all their ancestors – each ancestor in its own row
- if there are no ancestors, we have only one row for that person, with null parentId
Here is the data format:
Person table
Id <PK> Name
Parent table
Id<PK> ParentPersonId <FK into Person >
Person has rows with values PK
1, 'Jim' 2, 'John' 3, 'Anna' 4, 'Peter'
Parent has rows with values
1, 2 1, 3 2, 3 3, 4
So person 1 has ancestors 2, 3, 4
I want the output in the following form
id name parentPersonId -------------------------- 1 Jim 2 1 Jim 3 1 Jim 4 2 John 3 2 John 4 3 Anna 4 4 Peter (null)
My solution used recursive CTE query, but my fear is that it produces too many rows, as each subtree can be entered multiple times. I needed to filter out duplicates with distinct, and execution plan shows that, even with this simple data, sorting for distinct takes 50% of the time. Here is my query:
WITH cte_org AS
(
SELECT per.id, per.name, par.parentPersonId
FROM Person per
LEFT JOIN Parent par ON per.id = par.id
UNION ALL
SELECT o.id, o.name, rec.parentPersonId
FROM Parent rec
INNER JOIN cte_org o ON o.parentPersonId = rec.id
WHERE rec.parentPersonId IS NOT NULL
)
SELECT DISTINCT *
FROM cte_org
ORDER BY id, parentPersonId;
http://sqlfiddle.com/#!18/d7d62/4
My questions:
- can I somehow prune the already visited branches, so that recursive-CTE does not produce duplicate rows, and final distinct is not necessary
- is recursive CTE a right approach to this problem?
Advertisement
Answer
On PostgreSQL you can acheive that by replacing UNION ALL with UNION.
So the query looks like that:
WITH RECURSIVE cte_org AS (
select per.id, per.name, par.parentPersonId
from Person per left join Parent par
on per.id = par.id
UNION
SELECT
o.id,
o.name,
rec.parentPersonId
FROM
Parent rec
INNER JOIN cte_org o
ON o.parentPersonId = rec.id
where rec.parentPersonId is not null
)
SELECT *
FROM cte_org
ORDER BY id, parentPersonId;