I have a table with the following structure:
IdM|IdS ------- 1 | 2 1 | 3 1 | 4 2 | 1 2 | 3 2 | 4 3 | 1 3 | 2 3 | 3 3 | 4
How could I make a select statement on this table, which will return some rows of this table, where in each row, a specific id appears only one, indifferent on which column it is specified?
For the above result set, I would like a query that would return:
------- 1 | 2 3 | 4 -------
To give another example, if you would omit the first row in the original dataset:
IdM|IdS ------- 1 | 3 1 | 4 2 | 1 2 | 3 2 | 4 3 | 1 3 | 2 3 | 3 3 | 4
the result set should be:
------- 1 | 3 2 | 4 -------
Advertisement
Answer
That’s actually an interesting problem. If I follow you correctly, you want to iterate through the dataset and only retain rows where both values were never seen before. You could use a recursive query:
with recursive data as ( select idm, ids, row_number() over(order by idm, ids) rn from mytable where idm <> ids ), cte as ( select idm, ids, rn, 1 as to_keep , concat(idm, ',', ids) visited from data where rn = 1 union all select d.idm, d.ids, d.rn, (not find_in_set(d.idm, c.visited) and not find_in_set(d.ids, c.visited)), case when (not find_in_set(d.idm, c.visited) and not find_in_set(d.ids, c.visited)) then concat_ws(',', c.visited, d.idm, d.ids) else c.visited end from cte c inner join data d on d.rn = c.rn + 1 ) select idm, ids from cte where to_keep
The first CTE enumerates the rows ordered by both columns. Then the recursive query walks the resultset, checks if both values are new, and sets a flag accordingly of the columns. Flagged numbers are retained to be used for filtering in the following iteration.
Note that, given your requirement, not all values may appear in the resultset. Consider the following dataset:
idm ids +-----+--- 1 2 1 3 1 4
Your logic will only return the first row.