Skip to content
Advertisement

Remove subset rows of other rows in SQL

I’ve this table below that includes ID, and five indicator columns: x1, …, x5:

enter image description here

I need to remove duplicates based on this logic:

For each ID, we look at the values of x1, …, x5, and we remove the ones that are subset of other row. For example, for ID=1, row #3 is a subset of row #2, so we remove row #3. Also, row #4 is NOT a subset of row #2, so we keep it.

Here is the expected final view of the table:

enter image description here

Advertisement

Answer

First concatenate all values of the 5 columns so that you get a binary string like ‘100101’ which can be converted to a base 10 number, say aliased value, with the function CONV().

Assuming there are no duplicate rows in the table as you mention in your comment, a row #X should be considered a subset of another row #Y if the result of the bitwise OR between the values of the 2 rows returns the value of #Y:

Or, for MySql 8.0+:

If you want to delete the subset rows, use a self join of the table like this:

See the demo.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement