Skip to content
Advertisement

Sql query to flter by column only if duplicate exists

I’m writing a query on data with a ‘provider_id’ column. The data is mostly unique except for some entries with all columns but ‘provider_id’ being unique. Most of the values for provider_id are 1 but when there are duplicates as described above I want to remove the row with provider_id = 1. There are only ever 2 duplicates at a time.

The query below lets me find all the duplicates but I’m not sure how to then select the correct row based off this.

select position_key, pricing_provider
from customer_data
group by position_key, pricing_provider
having count(*) > 0

How can I select out where pricing_provider != 1?

Ex:

 position_key | pricing_provider |account | user 
152894  1   2   5
152704  1   2   1084
152678  6   2   43
152513  6   2   38
152305  1   2   1121
152305  6   2   1121
152300  6   2   1121
152300  1   2   1121

I want:

 position_key | pricing_provider |account | user 
152894  1   2   5
152704  1   2   1084
152678  6   2   43
152513  6   2   38
152305  6   2   1121
152300  6   2   1121

Advertisement

Answer

If you care about duplicates, I would expect > 1, not > 0. Then, window functions are probably the best solution:

select position_key, pricing_provicer
from (select position_key, pricing_provider,
             count(*) filter (where pricing_provider <> 1) over (partition by position_key) as cnt_not_1
      from customer_data
      group by position_key, pricing_provider
      having count(*) > 1
     ) pp
where pricing_provider <> 1 or cnt_not_1 = 0;
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement