In my problem, I have these rows e.g.:
id1 |name1| id2 | name2| c 1 10994,Apple,22265,Banana,103 2 22265,Banana,10994,Apple,103 3 20945,Coconut,20391,Date,101 4 20391,Date,20945,Coconut,101
They show pair-wise combinations of ids and names, together with another column c.
I consider row 1+2, and 3+4 as duplicates, considering the pairings of ids and names. Rows 1+2, or 3+4 show basically the same information.
I had no luck removing the duplicates with grouping, because id1 + id2, or name1 + name2 respectively are distinct columns.
Can I add something to my SQL query that removes these ‘duplicates’, so that only rows 1+3 are output?
Advertisement
Answer
One approach here would to be aggregate by the c
column and then use a least/greatest trick to tease apart the duplicates. Here is a general solution which should work on any version of SQL:
SELECT DISTINCT c, CASE WHEN id1 < id2 THEN id1 ELSE id2 END AS id1, CASE WHEN id1 < id2 THEN id2 ELSE id1 END AS id2, CASE WHEN name1 < name2 THEN name1 ELSE name2 END AS name1, CASE WHEN name1 < name2 THEN name2 ELSE name1 END AS name2 FROM yourTable;
Here is a working demo.
Note that on databases which have a LEAST
and GREATEST
function, the above can be simplified to:
SELECT DISTINCT c, LEAST(id1, id2) AS id1, GREATEST(id1, id2) AS id2, LEAST(name1, name2) AS name1, GREATEST(name1, name2) AS name2 FROM yourTable;