Select all columns but ignore records having duplicate values on some of them

Let’s say I have a cars table. It has following columns: id, color, engine, price, upholstery, model_name. So far I’ve been querying for all cars based on criteria passed from the front end app. However, now I’m told that if multiple cars have the same value of engine, model_name and price they shouldn’t be returned as they are duplicates. In this case, one car should be returned with a number of duplicates instead. So I probably need to do some sort of a group_by on engine, model_name, price but I still need to get values of the other (not-grouped) columns (id, color, upholstery) for each record, as well as duplicates count.

My API needs to stay compliant with the old one. Currently it returns:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback'},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx'},
  {id: 3, color: 'yellow', engine: '2.5', price: 20000, upholstery: 'leather', model_name: 'wrx'},
]

Now it should be:

[
  {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback', count: 1},
  {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx', count: 2},
]

Performance does matter in this case. In reality there’s way more columns which are queried and joined from multiple tables.

Answer

You can do this with distinct on and an analytic function:

select distinct on (engine, model_name, price) c.*,
       count(*) over (partition by engine, model_name, price) as cnt
from cars c
order by engine, model_name, price;

Advertisement

Answer