I have a table Customer_Order_Data
with these columns:
- customer_id
- order_id
- product_id
- quantity
- order_date
For all the dataset available I want to find out the most frequently purchased quantity and most recently purchased quantity for a customer-product pair.
Let’s say I have the following data –
Order_id | Customer_id | product_id | Quantity | order_date |
---|---|---|---|---|
00001 | ABCD | B019 | 7 | 2020-01-01 |
00002 | ABCD | B019 | 6 | 2020-05-23 |
00003 | EFGH | B018 | 8 | 2020-01-12 |
00004 | ABCD | B019 | 7 | 2020-02-14 |
00005 | EFGH | B018 | 6 | 2020-04-18 |
00006 | ABCD | B019 | 7 | 2020-04-19 |
00007 | EFGH | B018 | 8 | 2020-03-12 |
I want to group the rows based on customer_id.
Result expected –
Customer_id | product_id | Most frequently Purachsed | Most Recently Purchased |
---|---|---|---|
ABCD | B019 | 7 | 6 |
EFGH | B018 | 8 | 6 |
I am unable to add dummy columns to vend out this type of view.
PostgreSQL 8.0.26
Advertisement
Answer
You are describing the statistical mode – and a top-1-per-group problem. A few databases have a built-in aggregate function to compute it, but a generic approach is:
select * from ( select customer_id, product_id, quantity, count(*) as cnt, rank() over(partition by customer_id, product_id order by count(*) desc) rn1, max(case when rn = 1 then quantity end) as most_recently_purchased from ( select t.*, row_number() over(partition by customer_id, product_id order by order_date desc) rn from mytable t ) t group by customer_id, product_id, quantity ) t where rn1 = 1