I’m writing a query on data with a ‘provider_id’ column. The data is mostly unique except for some entries with all columns but ‘provider_id’ being unique. Most of the values for provider_id are 1 but when there are duplicates as described above I want to remove the row with provider_id = 1. There are only ever 2 duplicates at a time.
The query below lets me find all the duplicates but I’m not sure how to then select the correct row based off this.
select position_key, pricing_provider from customer_data group by position_key, pricing_provider having count(*) > 0
How can I select out where pricing_provider != 1?
Ex:
position_key | pricing_provider |account | user 152894 1 2 5 152704 1 2 1084 152678 6 2 43 152513 6 2 38 152305 1 2 1121 152305 6 2 1121 152300 6 2 1121 152300 1 2 1121
I want:
position_key | pricing_provider |account | user 152894 1 2 5 152704 1 2 1084 152678 6 2 43 152513 6 2 38 152305 6 2 1121 152300 6 2 1121
Advertisement
Answer
If you care about duplicates, I would expect > 1
, not > 0
. Then, window functions are probably the best solution:
select position_key, pricing_provicer from (select position_key, pricing_provider, count(*) filter (where pricing_provider <> 1) over (partition by position_key) as cnt_not_1 from customer_data group by position_key, pricing_provider having count(*) > 1 ) pp where pricing_provider <> 1 or cnt_not_1 = 0;