I am trying to achieve a 1:1 mapping between two tables based on their content. Unfortunately, the mapping cannot be perfect, because the two tables do not contain exactly the same set of data. Still, I want the best possible match.
Here is an example:
/* (Re-)create first table */
DROP TABLE IF EXISTS Nodes;
CREATE TABLE Nodes (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT
                 UNIQUE
                 NOT NULL,
    Data INTEGER NOT NULL
);
/* Fill some data into the first table */
INSERT INTO Nodes (Data) VALUES (2), (3), (9), (20), (19), (13), (29), (25), (9), (25), (20), (24);
/* (Re-)create second table */
DROP TABLE IF EXISTS Links;
CREATE TABLE Links (
    ID     INTEGER PRIMARY KEY AUTOINCREMENT
                   UNIQUE
                   NOT NULL,
    Data   INTEGER NOT NULL,
    NodeID INTEGER
);
/* Fill some data into the second table */
INSERT INTO Links (Data) VALUES (9), (9), (13), (19), (20), (20), (21), (24), (25), (25), (29), (30), (32);
/* Now try to match the two tables */
UPDATE Links
SET NodeID = (
    SELECT Nodes.ID
    FROM Nodes
    WHERE Nodes.Data = Links.Data
    /* The following condition seems to be evaluated once per UPDATE statement rather than once per updated row, which seems to be my problem */
    AND Nodes.ID NOT IN (SELECT NodeID FROM Links WHERE NodeID IS NOT NULL)
);
My expected output would be something like this:
Links:
ID | Data | NodeID
----+------+--------
1 | 9 | 3
2 | 9 | 9
5 | 20 | 4
6 | 20 | 11
However, what I get is:
Links:
ID | Data | NodeID
----+------+--------
1 | 9 | 3
2 | 9 | 3 <- Fail
5 | 20 | 4
6 | 20 | 4 <- Fail
As mentioned in the comment above the last line of code, my problem is that the update ignores previously updated rows and therefore assigns the same Nodes.ID more than once. However, I want the values in Links.NodeID to be unique after this step (not unique in general, only for this step).
Any ideas on how to achieve that? I would be thankful for any tip or idea, as I have been stuck on this for days.
Thanks a lot, DonLuigi
Answer
Use the ROW_NUMBER() window function on each of the tables, so that you can correctly link the rows you want to update:
UPDATE Links
SET NodeID = (
    SELECT n.ID
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Nodes
    ) n INNER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Links
    ) l ON l.Data = n.Data AND l.rn = n.rn
    WHERE n.Data = Links.Data AND l.ID = Links.ID
);
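To see why this pairing works, it can help to look at the rn values on their own. The sketch below is an illustration, not part of the original answer: it assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (needed for window functions), shortens the table definition to the two relevant columns, and loads only the Nodes sample data from the question. The name numbered_nodes is made up for the example.

```python
import sqlite3

# In-memory database holding only the Nodes sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nodes (
        ID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Data INTEGER NOT NULL
    );
    INSERT INTO Nodes (Data)
    VALUES (2), (3), (9), (20), (19), (13), (29), (25), (9), (25), (20), (24);
""")

# Number the rows inside each group of equal Data values, ordered by ID.
# Only the duplicated values 9 and 20 are shown to keep the output short.
numbered_nodes = conn.execute("""
    SELECT ID, Data,
           ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) AS rn
    FROM Nodes
    WHERE Data IN (9, 20)
    ORDER BY Data, rn
""").fetchall()

for row in numbered_nodes:
    print(row)  # (ID, Data, rn)
```

The first Nodes row with Data = 9 gets rn = 1 and the second gets rn = 2. Numbering Links the same way lets the join pair the k-th occurrence of a value in one table with the k-th occurrence of that value in the other, so no Nodes.ID is handed out twice.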
The same update can also be written with a CTE:
WITH cte AS (
    SELECT n.ID nID, n.Data nData, l.ID lID, l.Data lData, l.NodeID lNodeID
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Nodes
    ) n INNER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
        FROM Links
    ) l ON l.Data = n.Data AND l.rn = n.rn
)
UPDATE Links
SET NodeID = (
    SELECT nID FROM cte
    WHERE nData = Links.Data AND lID = Links.ID
);
Results:
| ID | Data | NodeID |
| --- | ---- | ------ |
| 1 | 9 | 3 |
| 2 | 9 | 9 |
| 3 | 13 | 6 |
| 4 | 19 | 5 |
| 5 | 20 | 4 |
| 6 | 20 | 11 |
| 7 | 21 | |
| 8 | 24 | 12 |
| 9 | 25 | 8 |
| 10 | 25 | 10 |
| 11 | 29 | 7 |
| 12 | 30 | |
| 13 | 32 | |
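
One way to double-check a result like this is to run the whole example and then query for NodeID values that were assigned more than once. The following is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later. The table definitions are shortened to the columns that matter, and the names duplicates, unmatched_links and mapping are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data from the question, followed by the ROW_NUMBER()-based update.
conn.executescript("""
    CREATE TABLE Nodes (
        ID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Data INTEGER NOT NULL
    );
    CREATE TABLE Links (
        ID     INTEGER PRIMARY KEY AUTOINCREMENT,
        Data   INTEGER NOT NULL,
        NodeID INTEGER
    );
    INSERT INTO Nodes (Data)
    VALUES (2), (3), (9), (20), (19), (13), (29), (25), (9), (25), (20), (24);
    INSERT INTO Links (Data)
    VALUES (9), (9), (13), (19), (20), (20), (21), (24), (25), (25), (29), (30), (32);

    UPDATE Links
    SET NodeID = (
        SELECT n.ID
        FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Nodes
        ) n INNER JOIN (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY Data ORDER BY ID) rn
            FROM Links
        ) l ON l.Data = n.Data AND l.rn = n.rn
        WHERE n.Data = Links.Data AND l.ID = Links.ID
    );
""")

# NodeID values used by more than one Links row (should be none).
duplicates = conn.execute("""
    SELECT NodeID, COUNT(*)
    FROM Links
    WHERE NodeID IS NOT NULL
    GROUP BY NodeID
    HAVING COUNT(*) > 1
""").fetchall()

# Links rows that found no partner in Nodes.
unmatched_links = [
    row[0]
    for row in conn.execute("SELECT ID FROM Links WHERE NodeID IS NULL ORDER BY ID")
]

# Full (Links.ID, NodeID) mapping for comparison with the table above.
mapping = conn.execute("SELECT ID, NodeID FROM Links ORDER BY ID").fetchall()

print("duplicates:", duplicates)
print("unmatched links:", unmatched_links)
```

An empty duplicates list means every Nodes.ID was used at most once. The unmatched Links rows are the ones whose Data value has no remaining partner in Nodes, which is the imperfect part of the 1:1 mapping described in the question.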