Skip to content
Advertisement

How to find the three greatest values in each category in PostgreSQL?

I am a SQL beginner. I have trouble on how to find the top 3 max values in each category. The question was

“For order_ids in January 2006, what were the top (by revenue) 3 product_ids for each category_id? “

Table A:                    
(Column name)         
customer_id            
order_id              
order_date   
revenue  
product_id

Table B:  
product_id  
category_id

I tried to combine table B and A using an Inner Join and filtered by the order_date. But then I am stuck on how to find the top 3 max values in each category_id. Thanks.

This is so far what I can think of

SELECT B.product_id, category_id FROM A

JOIN B ON B.product_id = A.product_id

WHERE order_date BETWEEN ‘2006-01-01’ AND ‘2006-01-31’

ORDER BY revenue DESC

LIMIT 3;

Advertisement

Answer

This kind of query is typically solved using window functions

select *
from (
  SELECT b.product_id, 
         b.category_id,
         a.revenue,
         dense_rank() over (partition by b.category_id, b.product_id order by a.revenue desc) as rnk
  from A
    join b ON B.product_id = A.product_id
  where a.order_date between date '2006-01-01' AND date '2006-01-31'
) as t
where rnk <= 3
order by product_id, category_id, revenue desc;

dense_rank() will also deal with ties (products with the same revenue in the same category) so you might actually get more than 3 rows per product/category.

If the same product can show up more than once in table b (for the same category) you need to combine this with a GROUP BY to get the sum of all revenues:

select *
from (
  SELECT b.product_id, 
         b.category_id,
         sum(a.revenue) as total_revenue,
         dense_rank() over (partition by b.category_id, a.product_id order by sum(a.revenue) desc) as rnk
  from a
    join b on B.product_id = A.product_id
  where a.order_date between date '2006-01-01' AND date '2006-01-31'
  group by b.product_id, b.category_id
) as t
where rnk <= 3
order by product_id, category_id, total_revenue desc;

When combining window functions and GROUP BY, the window function will be applied after the GROUP BY.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement