I am trying to achieve a 1:1 mapping between two tables based on their content. Unfortunately, the mapping cannot be perfect, because the two tables do not contain exactly the same set of data. Still, I want the best possible match.
Let the code talk in an example:
    /* (Re-)create first table */
    DROP TABLE IF EXISTS Nodes;
    CREATE TABLE Nodes (
        ID   INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE NOT NULL,
        Data INTEGER NOT NULL
    );

    /* Fill some data into the first table */
    INSERT INTO Nodes (Data) VALUES
        (2), (3), (9), (20), (19), (13), (29), (25), (9), (25), (20), (24);

    /* (Re-)create second table */
    DROP TABLE IF EXISTS Links;
    CREATE TABLE Links (
        ID     INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE NOT NULL,
        Data   INTEGER NOT NULL,
        NodeID INTEGER
    );

    /* Fill some data into the second table */
    INSERT INTO Links (Data) VALUES
        (9), (9), (13), (19), (20), (20), (21), (24), (25), (25), (29), (30), (32);

    /* Now try to match the two tables */
    UPDATE Links
    SET NodeID = (
        SELECT Nodes.ID
        FROM Nodes
        WHERE Nodes.Data = Links.Data
          /* The following line seems to be executed once per UPDATE,
             but not for each row of the update, which seems to be my problem */
          AND Nodes.ID NOT IN (SELECT NodeID FROM Links WHERE NodeID IS NOT NULL)
    );
My expected output would be something like this:
    Links:
     ID | Data | NodeID
    ----+------+--------
      1 |    9 |      3
      2 |    9 |      9
    ...
      5 |   20 |      4
      6 |   20 |     11
However, what I get is:
    Links:
     ID | Data | NodeID
    ----+------+--------
      1 |    9 |      3
      2 |    9 |      3   <- Fail
    ...
      5 |   20 |      4
      6 |   20 |      4   <- Fail
As mentioned in the comment above the last line of the code, my problem is that the update ignores previously updated rows and therefore assigns a single Nodes.ID multiple times. However, I want the Links.NodeID column to be unique after this step (not unique in general, only for this step).
Any ideas on how to achieve that? I would be thankful for any tip, as I have been stuck on this for days now.
Thanks a lot, DonLuigi
Answer
Use the ROW_NUMBER() window function on each of the tables, so that the n-th row with a given Data value in Links is paired with the n-th row with the same Data value in Nodes:
    UPDATE Links
    SET NodeID = (
        SELECT n.ID
        FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Nodes
        ) n
        INNER JOIN (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Links
        ) l ON l.Data = n.Data AND l.rn = n.rn
        WHERE n.Data = Links.Data AND l.ID = Links.ID
    );
Or with a CTE:

    WITH cte AS (
        SELECT n.ID nID, n.Data nData, l.ID lID, l.Data lData, l.NodeID lNodeID
        FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Nodes
        ) n
        INNER JOIN (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Links
        ) l ON l.Data = n.Data AND l.rn = n.rn
    )
    UPDATE Links
    SET NodeID = (
        SELECT nID
        FROM cte
        WHERE nData = Links.Data AND lID = Links.ID
    );
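To try the ROW_NUMBER() approach outside of a SQL console, here is a minimal sketch that runs it against an in-memory database from Python. It assumes Python's standard sqlite3 module is linked against SQLite 3.25 or newer (window functions are not available before that). The names match_links, UPDATE_SQL, NODES and LINKS are made up for this illustration; the table layout and data are the ones from the question.

```python
import sqlite3

# Sample data from the question, in insertion order (IDs are assigned 1, 2, 3, ...).
NODES = [2, 3, 9, 20, 19, 13, 29, 25, 9, 25, 20, 24]
LINKS = [9, 9, 13, 19, 20, 20, 21, 24, 25, 25, 29, 30, 32]

# Pair the n-th Links row of each Data value with the n-th Nodes row of that value.
UPDATE_SQL = """
UPDATE Links
SET NodeID = (
    SELECT n.ID
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Nodes
    ) n
    INNER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Links
    ) l ON l.Data = n.Data AND l.rn = n.rn
    WHERE n.Data = Links.Data AND l.ID = Links.ID
)
"""


def match_links(nodes, links):
    """Build both tables in memory, run the update, return the Links rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Nodes (
            ID   INTEGER PRIMARY KEY AUTOINCREMENT,
            Data INTEGER NOT NULL
        );
        CREATE TABLE Links (
            ID     INTEGER PRIMARY KEY AUTOINCREMENT,
            Data   INTEGER NOT NULL,
            NodeID INTEGER
        );
    """)
    conn.executemany("INSERT INTO Nodes (Data) VALUES (?)", [(d,) for d in nodes])
    conn.executemany("INSERT INTO Links (Data) VALUES (?)", [(d,) for d in links])
    conn.execute(UPDATE_SQL)
    rows = conn.execute("SELECT ID, Data, NodeID FROM Links ORDER BY ID").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for link_id, data, node_id in match_links(NODES, LINKS):
        print(link_id, data, node_id)
```

Links rows whose Data value has no remaining partner in Nodes keep NodeID as NULL, which Python reports as None.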
Results:
| ID  | Data | NodeID |
| --- | ---- | ------ |
| 1   | 9    | 3      |
| 2   | 9    | 9      |
| 3   | 13   | 6      |
| 4   | 19   | 5      |
| 5   | 20   | 4      |
| 6   | 20   | 11     |
| 7   | 21   |        |
| 8   | 24   | 12     |
| 9   | 25   | 8      |
| 10  | 25   | 10     |
| 11  | 29   | 7      |
| 12  | 30   |        |
| 13  | 32   |        |
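Since the goal is that no Nodes.ID is assigned twice, it may be worth checking the result with a GROUP BY ... HAVING query. The sketch below is one way to do that from Python; duplicate_node_ids and DUPLICATE_CHECK are names invented for this example, and it loads the given rows into a throwaway in-memory Links table rather than touching a real database.

```python
import sqlite3

# List every NodeID that is used by more than one Links row.
DUPLICATE_CHECK = """
SELECT NodeID, COUNT(*) AS uses
FROM Links
WHERE NodeID IS NOT NULL
GROUP BY NodeID
HAVING COUNT(*) > 1
ORDER BY NodeID
"""


def duplicate_node_ids(link_rows):
    """Return (NodeID, uses) pairs for NodeIDs assigned to several links.

    link_rows is a list of (ID, Data, NodeID) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Links ("
        "ID INTEGER PRIMARY KEY, Data INTEGER NOT NULL, NodeID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Links (ID, Data, NodeID) VALUES (?, ?, ?)", link_rows
    )
    dupes = conn.execute(DUPLICATE_CHECK).fetchall()
    conn.close()
    return dupes


if __name__ == "__main__":
    # Rows shown in the question's failing output: NodeIDs 3 and 4 are reused.
    failing = [(1, 9, 3), (2, 9, 3), (5, 20, 4), (6, 20, 4)]
    # The same rows after the ROW_NUMBER() update.
    fixed = [(1, 9, 3), (2, 9, 9), (5, 20, 4), (6, 20, 11)]
    print(duplicate_node_ids(failing))
    print(duplicate_node_ids(fixed))
```

An empty list means every assigned NodeID is unique for this step; NULL entries are skipped on purpose, because unmatched links are expected.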