I have a table that gets loaded with data. It SHOULD NOT contain duplicate ID’s, but it is not a perfect world, people make errors and duplicates occure. My table looks like this, but with many, MANY more columns:
ID col_a col_b col_c col_d 1 1 d f e 2 r g v s 2 r s v s 3 dd ee tt ww 3 rf DD tt ww 3 dd DD tt ww 3 rf ee tt ww 4 a a a a
as you can see cloumn col_b causes a duplicate for ID = 2 and col_a and col_b causes duplicates for ID = 3. I can identify the ID’s with duplicate rows with a simple query like this:
select * from (select ID, count row from TABLE group by ID) where ROW > 1;
this will return something like:
ID ROW 2 2 3 4
but what would REALLY help me is a query that would return something like this:
ID col_a col_b col_c col_d ROW 2 TRUE 2 3 TRUE TRUE 4
…or any OTHER solution that would highlight not only the duplicated ID’s but also the columns which are causing them.
Oh, I am using DB2.
Advertisement
Answer
select id , case when c_a = 1 then null else 'True' end COL_A , case when c_b = 1 then null else 'True' end COL_B , case when c_c = 1 then null else 'True' end COL_C , case when c_d = 1 then null else 'True' end COL_D , c ROW from (select id , count(distinct col_a) c_a , count(distinct col_b) c_b , count(distinct col_c) c_c , count(distinct col_d) c_d , count(*) c from test group by id)