Remove subset rows of other rows in SQL

Question

I&#8217;ve this table below that includes ID, and five indicator columns: x1, &#8230;, x5: I need to remove duplicates based on this logic: For each ID, we look at the values of x1, &#8230;, x5, and we remove the ones that are subset of other row. For example, for ID=1, row #3 is a subset of row #2, so we

Accepted Answer

First concatenate all values of the 5 columns so that you get a binary string like ‘100101’ which can be converted to a base 10 number, say aliased value, with the function CONV().Assuming there are no duplicate rows in the table as you mention in your comment, a row #X should be considered a subset of another row #Y if the result of the bitwise OR between the values of the 2 rows returns the value of #Y:SELECT t1.*FROM tablename t1WHERE NOT EXISTS ( SELECT * FROM tablename t2 WHERE t2.ID = t1.ID AND (t1.x1, t1.x2, t1.x3, t1.x4, t1.x5) <> (t2.x1, t2.x2, t2.x3, t2.x4, t2.x5) AND CONV(CONCAT(t1.x1, t1.x2, t1.x3, t1.x4, t1.x5), 2, 10) | CONV(CONCAT(t2.x1, t2.x2, t2.x3, t2.x4, t2.x5), 2, 10) = CONV(CONCAT(t2.x1, t2.x2, t2.x3, t2.x4, t2.x5), 2, 10));Or, for MySql 8.0+:WITH cte AS ( SELECT *, CONV(CONCAT(x1, x2, x3, x4, x5), 2, 10) value FROM tablename)SELECT t1.ID, t1.x1, t1.x2, t1.x3, t1.x4, t1.x5FROM cte t1WHERE NOT EXISTS ( SELECT * FROM cte t2 WHERE t2.ID = t1.ID AND t2.value <> t1.value AND t1.value | t2.value = t2.value);If you want to delete the subset rows, use a self join of the table like this:DELETE t1FROM tablename t1 INNER JOIN tablename t2ON t2.ID = t1.IDAND (t1.x1, t1.x2, t1.x3, t1.x4, t1.x5) <> (t2.x1, t2.x2, t2.x3, t2.x4, t2.x5)AND CONV(CONCAT(t1.x1, t1.x2, t1.x3, t1.x4, t1.x5), 2, 10) | CONV(CONCAT(t2.x1, t2.x2, t2.x3, t2.x4, t2.x5), 2, 10) = CONV(CONCAT(t2.x1, t2.x2, t2.x3, t2.x4, t2.x5), 2, 10);See the demo.

Advertisement

Answer