I have two tables: Orders and Customers. Orders lists all bought items (policy_id) and Customers information such as their age. The two tables are linked via the customer_id. Now I want to find the most popular items within 3 age groups (18-30, 31-60, >60). How do I approach this problem?
The following gives me a list of all items and their popularity split into the age groups but I only want the most popular ones per age group.
SELECT o.policy_id, count(o.policy_id) as ntotal, CASE WHEN c.Age BETWEEN 18 and 30 then '18 - 30' WHEN c.Age BETWEEN 31 and 60 then '31 - 60' else 'above 60' end as RANGErrr FROM Orders o INNER JOIN Customer c ON (o.customer_id=c.customer_id) GROUP BY RANGErrr, o.policy_id
Do I need to add something or am I on the completely wrong track?
Advertisement
Answer
You can use window functions for this. You can do so right along with the aggregation. For the top policy_id
per age range:
SELECT co.* FROM (SELECT co.*, ROW_NUMBER() OVER (PARTITION BY RANGErrr ORDER BY ntotal DESC) as seqnum FROM (SELECT o.policy_id, count(*) as ntotal, (CASE WHEN c.Age BETWEEN 18 and 30 then '18 - 30' WHEN c.Age BETWEEN 31 and 60 then '31 - 60' else 'above 60' END) as RANGErrr FROM Orders o INNER JOIN Customer c ON o.customer_id = c.customer_id GROUP BY RANGErrr, o.policy_id ) co ) co WHERE seqnum = 1;
If you want a different number, just change the WHERE
clause.