I am a SQL beginner. I have trouble on how to find the top 3 max values in each category. The question was
“For order_ids in January 2006, what were the top (by revenue) 3 product_ids for each category_id? “
Table A: (Column name) customer_id order_id order_date revenue product_id Table B: product_id category_id
I tried to combine table B and A using an Inner Join and filtered by the order_date. But then I am stuck on how to find the top 3 max values in each category_id. Thanks.
This is so far what I can think of
SELECT B.product_id, category_id FROM A JOIN B ON B.product_id = A.product_id WHERE order_date BETWEEN ‘2006-01-01’ AND ‘2006-01-31’ ORDER BY revenue DESC LIMIT 3;
Advertisement
Answer
This kind of query is typically solved using window functions
select * from ( SELECT b.product_id, b.category_id, a.revenue, dense_rank() over (partition by b.category_id, b.product_id order by a.revenue desc) as rnk from A join b ON B.product_id = A.product_id where a.order_date between date '2006-01-01' AND date '2006-01-31' ) as t where rnk <= 3 order by product_id, category_id, revenue desc;
dense_rank()
will also deal with ties (products with the same revenue in the same category) so you might actually get more than 3 rows per product/category.
If the same product can show up more than once in table b
(for the same category) you need to combine this with a GROUP BY to get the sum of all revenues:
select * from ( SELECT b.product_id, b.category_id, sum(a.revenue) as total_revenue, dense_rank() over (partition by b.category_id, a.product_id order by sum(a.revenue) desc) as rnk from a join b on B.product_id = A.product_id where a.order_date between date '2006-01-01' AND date '2006-01-31' group by b.product_id, b.category_id ) as t where rnk <= 3 order by product_id, category_id, total_revenue desc;
When combining window functions and GROUP BY, the window function will be applied after the GROUP BY.