CREATE TABLE sales ( id SERIAL PRIMARY KEY, event_date DATE, customer VARCHAR, orderID VARCHAR, sales_volume DECIMAL ); INSERT INTO sales (event_date, customer, orderID, sales_volume) VALUES ('2020-01-08', 'Customer_A', 'Order_001', '130'), ('2020-01-12', 'Customer_A', 'Order_002', '120'), ('2020-01-22', 'Customer_A', 'Order_003', '115'), ('2020-01-22', 'Customer_C', 'Order_001', '300'), ('2020-01-23', 'Customer_C', 'Order_002', '500'), ('2020-04-08', 'Customer_B', 'Order_001', '325'), ('2020-04-12', 'Customer_B', 'Order_002', '875'), ('2020-04-15', 'Customer_B', 'Order_003', '910'), ('2020-04-20', 'Customer_B', 'Order_004', '723'), ('2020-04-30', 'Customer_C', 'Order_003', '665'), ('2020-06-01', 'Customer_B', 'Order_005', '982'), ('2020-06-15', 'Customer_B', 'Order_006', '100'), ('2020-06-19', 'Customer_C', 'Order_004', '250'), ('2020-06-20', 'Customer_C', 'Order_005', '322'), ('2020-06-25', 'Customer_A', 'Order_004', '445');
Exptected Result:
customer | orderid | event_date | sales_volume -------------|---------------|--------------------|---------------------- Customer_A | Order_001 | 2020-01-08 | 130 Customer_A | Order_003 | 2020-01-22 | 115 Customer_C | Order_002 | 2020-01-23 | 500 Customer_C | Order_001 | 2020-01-22 | 300 -------------|---------------|--------------------|------------------------ Customer_B | Order_002 | 2020-04-12 | 875 Customer_B | Order_003 | 2020-04-15 | 910 Customer_C | Order_003 | 2020-04-30 | 665 -------------|---------------|--------------------|------------------------ Customer_A | Order_004 | 2020-06-25 | 445 Customer_B | Order_005 | 2020-06-01 | 982 Customer_B | Order_006 | 2020-06-15 | 100 Customer_C | Order_005 | 2020-06-20 | 322 Customer_C | Order_004 | 2020-06-19 | 250
I have a huge database and need to extract some data from it for a case study.
The problem is that I need to extract the full year of the data because I want to be able to conduct a monthly analysis in the case study. Therefore, I can not limit the extract with dates
or LIMIT
.
Thus, my idea to solve this issue is a query which extracts randomly maximal two orders per customer per month.
Do you have any idea if this is possible?
If yes, how do I need to modify the below query?
SELECT s.customer, s.orderID, s.event_date, SUM(s.sales_volume) AS sales_volume FROM sales s GROUP BY 1,2,3 ORDER BY 1,2,3;
Advertisement
Answer
Thus, my idea to solve this issue is a query which extracts randomly maximal two orders per customer per month.
You can use:
select s.* from (select s.*, row_number() over (partition by customer, date_trunc('month', event_date) order by random()) as seqnum from sales s ) s where seqnum <= 2;
To be honest, though, for analytic purposes, I would prefer to take a random sample of customers — say 1% or 5% — and all the transactions for them.