Let’s say I have a cars
table. It has following columns: id
, color
, engine
, price
, upholstery
, model_name
. So far I’ve been querying for all cars based on criteria passed from the front end app. However, now I’m told that if multiple cars have the same value of engine
, model_name
and price
they shouldn’t be returned as they are duplicates. In this case, one car should be returned with a number of duplicates instead. So I probably need to do some sort of a group_by
on engine, model_name, price
but I still need to get values of the other (not-grouped) columns (id
, color
, upholstery
) for each record, as well as duplicates count.
My API needs to stay compliant with the old one. Currently it returns:
[ {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback'}, {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx'}, {id: 3, color: 'yellow', engine: '2.5', price: 20000, upholstery: 'leather', model_name: 'wrx'}, ]
Now it should be:
[ {id: 1, color: 'blue', engine: '2.0', price: 20000, upholstery: 'leather', model_name: 'outback', count: 1}, {id: 2, color: 'red', engine: '2.5', price: 20000, upholstery: 'material', model_name: 'wrx', count: 2}, ]
Performance does matter in this case. In reality there’s way more columns which are queried and joined from multiple tables.
Advertisement
Answer
You can do this with distinct on
and an analytic function:
select distinct on (engine, model_name, price) c.*, count(*) over (partition by engine, model_name, price) as cnt from cars c order by engine, model_name, price;