I have two tables: Orders and Customers. Orders lists all bought items (policy_id) and Customers information such as their age. The two tables are linked via the customer_id. Now I want to find the most popular items within 3 age groups (18-30, 31-60, >60). How do I approach this problem?
The following gives me a list of all items and their popularity split into the age groups but I only want the most popular ones per age group.
x
SELECT o.policy_id, count(o.policy_id) as ntotal, CASE
WHEN c.Age BETWEEN 18 and 30 then '18 - 30'
WHEN c.Age BETWEEN 31 and 60 then '31 - 60'
else 'above 60' end as RANGErrr
FROM Orders o INNER JOIN Customer c ON (o.customer_id=c.customer_id)
GROUP BY RANGErrr, o.policy_id
Do I need to add something or am I on the completely wrong track?
Advertisement
Answer
You can use window functions for this. You can do so right along with the aggregation. For the top policy_id
per age range:
SELECT co.*
FROM (SELECT co.*,
ROW_NUMBER() OVER (PARTITION BY RANGErrr ORDER BY ntotal DESC) as seqnum
FROM (SELECT o.policy_id, count(*) as ntotal,
(CASE WHEN c.Age BETWEEN 18 and 30 then '18 - 30'
WHEN c.Age BETWEEN 31 and 60 then '31 - 60'
else 'above 60'
END) as RANGErrr
FROM Orders o INNER JOIN
Customer c
ON o.customer_id = c.customer_id
GROUP BY RANGErrr, o.policy_id
) co
) co
WHERE seqnum = 1;
If you want a different number, just change the WHERE
clause.