Here is the code:
CREATE TABLE audit_trail ( old_email TEXT NOT NULL, new_email TEXT NOT NULL ); INSERT INTO audit_trail(old_email, new_email) VALUES ('harold_gim@yahoo.com', 'hgimenez@hotmail.com'), ('hgimenez@hotmail.com', 'harold.gimenez@gmail.com'), ('harold.gimenez@gmail.com', 'harold@heroku.com'), ('foo@bar.com', 'bar@baz.com'), ('bar@baz.com', 'barbaz@gmail.com'); WITH RECURSIVE all_emails AS ( SELECT old_email, new_email FROM audit_trail WHERE old_email = 'harold_gim@yahoo.com' UNION SELECT at.old_email, at.new_email FROM audit_trail at JOIN all_emails a ON (at.old_email = a.new_email) ) SELECT * FROM all_emails; old_email | new_email --------------------------+-------------------------- harold_gim@yahoo.com | hgimenez@hotmail.com hgimenez@hotmail.com | harold.gimenez@gmail.com harold.gimenez@gmail.com | harold@heroku.com (3 rows) select old_email, new_email into iter1 from audit_trail where old_email = 'harold_gim@yahoo.com'; select * from iter1; -- old_email | new_email -- ----------------------+---------------------- -- harold_gim@yahoo.com | hgimenez@hotmail.com -- (1 row) select a.old_email, a.new_email into iter2 from audit_trail a join iter1 b on (a.old_email = b.new_email); select * from iter2; -- old_email | new_email -- ----------------------+-------------------------- -- hgimenez@hotmail.com | harold.gimenez@gmail.com -- (1 row) select * from iter1 union select * from iter2; -- old_email | new_email -- ----------------------+-------------------------- -- hgimenez@hotmail.com | harold.gimenez@gmail.com -- harold_gim@yahoo.com | hgimenez@hotmail.com -- (2 rows)
As you can see the recursive code gives the result in right order, but the non-recursive code does not.
They both use union
, why the difference?
Advertisement
Answer
Basically, your query is incorrect to begin with. Use UNION ALL
, not or you would incorrectly remove duplicate entries. (There is nothing to say the trail cannot switch back and forth between the same emails.)UNION
The Postgres implementation for UNION ALL
returns values in the sequence as appended – as long as you do not add ORDER BY
at the end or do anything else with the result.
Be aware though, that each SELECT
returns rows in arbitrary order unless ORDER BY
is appended. There is no natural order in tables.
The same is not true for UNION
, which has to process all rows to remove possible duplicates. There are various ways to determine duplicates, the resulting order of rows depends on the chosen algorithm and is implementation-dependent and completely unreliable – unless, again, ORDER BY
is appended.
So use instead:
SELECT * FROM iter1 UNION ALL -- union all! SELECT * FROM iter2;
To get a reliable sort order, and “simulate the record of growth”, you can track levels like this:
WITH RECURSIVE all_emails AS ( SELECT *, 1 AS lvl FROM audit_trail WHERE old_email = 'harold_gim@yahoo.com' UNION ALL -- union all! SELECT t.*, a.lvl + 1 FROM all_emails a JOIN audit_trail t ON t.old_email = a.new_email ) TABLE all_emails ORDER BY lvl;
Aside: if old_email
is not defined UNIQUE
in some way, you can get multiple trails. You would need a unique column (or combination of columns) to keep it unambiguous. If all else fails you can (ab-)use the internal tuple ID ctid
for the purpose of telling trails apart. But you should rather use your own columns. (Added example in the fiddle.)
Consider: