Skip to content
Advertisement

Identify columns causing duplicates

I have a table that gets loaded with data. It SHOULD NOT contain duplicate ID’s, but it is not a perfect world, people make errors and duplicates occure. My table looks like this, but with many, MANY more columns:

ID    col_a    col_b    col_c    col_d
1     1        d        f        e    
2     r        g        v        s  
2     r        s        v        s
3     dd       ee       tt       ww
3     rf       DD       tt       ww
3     dd       DD       tt       ww
3     rf       ee       tt       ww
4     a        a        a        a

as you can see cloumn col_b causes a duplicate for ID = 2 and col_a and col_b causes duplicates for ID = 3. I can identify the ID’s with duplicate rows with a simple query like this:

select * from (select ID,  count row from TABLE  group by ID) where ROW > 1;

this will return something like:

ID    ROW
2     2
3     4

but what would REALLY help me is a query that would return something like this:

ID    col_a    col_b    col_c    col_d    ROW
2              TRUE                       2
3     TRUE     TRUE                       4

…or any OTHER solution that would highlight not only the duplicated ID’s but also the columns which are causing them.

Oh, I am using DB2.

Advertisement

Answer

select id
       , case when c_a = 1 then null else 'True' end COL_A
       , case when c_b = 1 then null else 'True' end COL_B
       , case when c_c = 1 then null else 'True' end COL_C
       , case when c_d = 1 then null else 'True' end COL_D
       , c ROW
from (select id
            , count(distinct col_a) c_a
            , count(distinct col_b) c_b
            , count(distinct col_c) c_c
            , count(distinct col_d) c_d
            , count(*) c
      from test
      group by id)

Here is a demo

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement